Martini 3 Coarse‐Grained Force Field: Small Molecules

The recent re‐parametrization of the Martini coarse‐grained force field, Martini 3, improved the accuracy of the model in predicting molecular packing and interactions in molecular dynamics simulations. Here, we describe how small molecules can be accurately parametrized within the Martini 3 framework and present a database of validated small molecule models. We pay particular attention to the description of aliphatic and aromatic ring‐like structures, which are ubiquitous in small molecules such as solvents and drugs or in building blocks constituting macromolecules such as proteins and synthetic polymers. In Martini 3, ring‐like structures are described by models that use higher resolution coarse‐grained particles (small and tiny particles). As such, the present database constitutes one of the cornerstones of the calibration of the new Martini 3 small and tiny particle sizes. The models show excellent partitioning behavior and solvent properties. Miscibility trends between different bulk phases are also captured, completing the set of thermodynamic properties considered during the parametrization. We also show how the new bead sizes allow for a good representation of molecular volume, which translates into better structural properties such as stacking distances. We further present design strategies to build Martini 3 models for small molecules of increased complexity.


Introduction
The use of coarse-grained (CG) models in molecular dynamics (MD) simulations is of paramount importance to explore length and time scales beyond what is feasible with atomistic, or all-atom (AA), models. [1][2][3][4][5][6] In CG models, atoms are bundled together into supra-atomic particles, or beads. Besides widening the accessible spatiotemporal scales, such models can provide a more insightful picture with respect to atomistic ones, as the reduction in the number of degrees of freedom translates into a simpler representation of the system which highlights its key features. Thus, CG simulations are an indispensable tool to complement experimental methods to study processes such as spontaneous self-assembly, transformation between phases, domain formation, and host-guest binding. [1][2][3][4][5][6] CG models can be categorized as mainly following a systematic (also known as hierarchical) or a building block approach. [4,7] Models developed on the basis of the former principle reproduce the underlying atomistic structural data accurately, but require reparametrization whenever any condition changes, which makes the parametrization procedure more time-consuming. Also, potential forms required are often complex, which can result in lower performance and therefore less sampling. On the other side, models developed following a building block approach are often cheaper due to the use of simpler potential forms and the only partial parametrization required. Moreover, the parametrization of the building blocks enables their use as part of similar moieties in different molecules, providing a second advantage of the building block approach: the transferability of the models developed within such a framework. However, these advantages come at the cost of a more limited structural accuracy, given the necessarily suboptimal representation of the underlying atomistic detail inherent to building block-based CG models. Most of the CG models, however, do not fit purely in either of the categories but often include features of both CG modeling approaches. [4] Among the building block approaches to coarse-graining, notably the Martini CG force field [8] has seen many applications due to successes achieved in the description of a wide range of biomolecular systems [9] We note that the force field, originally intended for biomolecular simulations, has been successfully applied to describe systems relevant in soft materials science, such as (block co)polymers, self-assembled supramolecular materials, and organic semiconductors. [10] The wide use of the Martini model, however, highlighted a number of shortcomings, in particular in the description of structures which are modeled with a finer resolution than Martini's standard resolution of 4 non-hydrogen atoms per bead (4-to-1). [11][12][13][14][15] Finer 3-to-1, or 2-to-1 mappings are required when modeling ring-like structures, [8] or to adhere to the symmetry of the repeat unit of a polymer. [16] For such finer mappings, "small" [8] and subsequently even smaller, "tiny" beads [17] were introduced. However, the parametrization of the cross-interactions between small and tiny beads and regular beads was performed on an ad hoc basis. As we have shown in previous work, [11] this ad hoc parametrization in concert with deviations from the standard bonded parameters used in the original Martini parametrization [8] results in small but systematic deviations in packing and intermolecular interactions. [11] Such deviations are one of the main reasons for the observed overestimated interactions between Martini models of proteins and sugars, [14,18] and the necessity for the development of custom beads for polymers and other molecules which use 3-to-1 or 2-to-1 mappings. [19,20] Given the ubiquitous role of ring-like structures across the chemical space, addressing such shortcomings was important for applications ranging from protein-ligand binding [21] to imidazolium-based ionic liquids [22] and conjugated molecular materials. [23] To address these shortcomings, a new version of the Martini model has been recently developed, Martini 3: with a re-balanced set of nonbonded interactions and expanded variety of CG particle types, the new version of the force field shows improved accuracy overall in predicting molecular packing and interactions. [24] In light of the recent development of Martini 3, [24] we here describe how small molecules can be accurately parameterized within the Martini 3 framework and present a database of validated small molecule models. We pay particular attention to the description of aliphatic and aromatic ring-like structures, given their ubiquity in small molecules such as solvents and drugs or in building blocks constituting macromolecules such as proteins and synthetic polymers. Such ring-like structures are described by models that use small and tiny CG particle types. As such, the present database of small molecules constitutes one of the cornerstones of the calibration of the new Martini 3 small and tiny particle sizes. We comprehensively describe how to parameterize small molecules, describe which properties have been considered during the parametrization, present design strategies to build complex small molecules, show what can be achieved with the new models, and discuss expected strengths and potential weaknesses of the new models. With this work, we initiate a database of Martini 3 models for small molecules (currently 90 entries) that includes Martini 3 models, atomistic reference structures, atomistic-to-CG (and vice versa) mapping files, and reference and computed properties (available at https: //github.com/ricalessandri/Martini3-small-molecules and http: //cgmartini.nl).
This paper is structured as follows. In Section 2, we start by summarizing the main features of Martini 3 and we subsequently describe in detail how to model small molecules' aromatic and aliphatic core elements (Section 2.1.1), substituents and functional groups (Section 2.1.2), and molecules of increased complexity (Section 2.1.3); we moreover describe the database (Section 2.2) and methods used (Section 2.3). In Section 3, we de-scribe how the models perform in terms of partitioning behavior between water and apolar phases (Section 3.1), solvent properties such as mass density and heat of vaporization (Section 3.2), miscibility (Section 3.3), and stacking interactions (Section 3.4). Last, in Section 3.5, we concisely summarize the key points, limitations, and envisioned applications of the models.

Basics
Nonbonded Interactions: For a thorough description of the Martini 3 force field, we refer to the main publication. [24] Briefly, as in the previous version of the force field, the main parametrization target remains the free energy of transfer between several pairs of solvents, such as hexadecane/water, octanol/water, and chloroform/water. Comparison to infinite-dilution properties such as transfer free energies allows to probe mainly solvent-solute crossinteraction, albeit solvent-solvent self-interaction is also probed for the considered solvents. Moreover, in Martini 3, miscibility data constitute a second core parametrization target: depending on the available reference data, miscibility can be checked by either qualitatively looking at the miscibility of two phases or by computing excess free energies of mixing, as it will also be shown in this contribution. Comparison to miscibility data allows to probe at the same time not only the cross-interaction between the two molecular species but also the relative self-interaction of the two species.
Molecular fragments are used as building blocks (beads), whose nonbonded interactions are described by modified Lennard-Jones (LJ) interactions. To describe the chemical nature of the underlying atomistic structure, CG beads with more or less polar character exists. As in Martini 2, there are four main types of beads: polar (P), intermediately polar (N), apolar (C), and charged (Q). Moreover, a separate water (W) bead, a bead for divalent ions (D), and a set of beads (X) dedicated to groups containing halogen atoms are now included. The P-, N-, C-, Q-, and X-bead types are in turn divided in subtypes based on their hydrogen-bonding capabilities, degree of polarity, or charge softness. [24] To keep the model simple, the levels of interactions (LJ ) are discretized. In Martini 3, the number of possible interaction levels was extended from 10 to 22. Similarly to the LJ , also the sizes of the beads (LJ ) are discretized. There are three bead sizes-regular (R), small (S), and tiny (T), with of 0.47, 0.41, and 0.34 nm, respectivelyand cross-interactions across sizes (R-S, R-T, S-T) are specifically parameterized. In particular, the three available bead sizes are now fully balanced, with regular, small, and tiny beads intended to be used for 4-to-1, 3-to-1, and 2-to-1 atoms-to-bead mappings, respectively. As such, small and tiny beads are not only appropriate to represent ring-like fragments, but also suited for cases involving 3-to-1 and 2-to-1 mapping of linear and branched chemical groups. Moreover, when fully branched fragments (e.g., quaternary carbon atoms or tertiary amine groups) need to be described, one should usually use beads of a smaller size, such as an S-bead for a fully branched fragment containing four nonhydrogen atoms. The rationale for this is that the central atom Figure 1. Examples of mappings of small molecules of increasing complexity. 2D chemical structures are shown along with renderings of their corresponding Martini 3 CG models. From left to right: (first row) cyclohexane, benzene, piperidine, p-cresol, naphthalene, and caffeine; (second row) the molecular dopant N-DMBI, the drug dasatinib. CG beads are color-coded as follows: apolar aliphatic (C1-C3 type) beads are in cyan, apolar aromatic (C4-C6) beads are in silver, intermediately polar (N1-N6) beads in blue, polar (P1-P6) beads in red, halogen-containing (X1-X4) beads in green, and dummy beads in white.
of a fully branched fragment is not exposed to the environment, and this reduces its influence on the environment. A thorough description of Martini 3 is presented in Supporting Information of ref. [24].
Bonded Interactions: In contrast to Martini 2, which relied on using the center of mass (COM) of the mapped chemical groups to define the CG bonded parameters of a molecule, Martini 3 follows a "size-shape concept" aimed at preserving the volume and shape of molecules with respect to the underlying atomistic reference structures. [24] In general, bond lengths obtained from COM mapping proved to be unsatisfactory, usually leading to too high packing densities. This is exemplified for benzene in Figure S1, Supporting Information and it is not unexpected: the original Martini 2.0 benzene model already uses bond lengths of 0.27 nm, ≈25% longer than bond lengths extracted following COM mapping. [8] The inadequacy of COM-mapped bond lengths is due to the geometrical mismatch enforced by the lower resolution of the CG representation (e.g., triangular versus hexagonal shape of benzene) and by not taking into account the volume of hydrogen atoms. Instead, center-of-geometry (COG)-based mapping performed taking into account the hydrogen atoms leads to better molecular (e.g., solvent accessible surface area) and bulk (e.g., mass densities) properties (Section 3.2). COG-mapping is especially important the more the mapping is close to atomistic resolution, that is, when using T-beads. COM and COG are indeed almost equivalent in the case of a 4-to-1 mapped alkane chain (see Figure S2, Supporting Information). In contrast, they provide considerably different bond lengths in the case of 2-to-1 mapped rings, as just discussed for benzene. COG-based (like COM-based) bond lengths can be directly extracted from AA models: such a direct link to the underlying atomistic structure is important, as it, for example, makes them suitable for automated topology building. [12,[25][26][27][28] Hence, using COG-based mapping of atomistic structures constitute the standard procedure for obtaining bonded parameters in Martini 3; such parameters can then, if necessary, be refined, as discussed in more detail in Section 3.2.
Aromatic Rings. Aromatic (i.e., planar, atom-thick) structures are best described by tiny beads. The prototypical aromatic compound, benzene, is described by a three-bead model reminiscent of its Martini 2 model [8] (see also Figure 1): each of the three beads represents two consecutive carbon atoms and their corresponding hydrogen atoms. However, there are a few important differences. First, the model now uses T-beads rather than S-beads. In this way, the six non-hydrogen atoms which constitute benzene are described by three beads which have been parameterized specifically for 2-to-1 mappings. Moreover, the radius of Tparticles is such that stacking distances between Martini models of aromatic structures agree with atomistic and/or experimental stacking distances (see also Section 3.4). For benzene, or more generally nonsubstituted −C=C− groups part of aromatic compounds, the Martini bead type of choice is TC5, that leads to free energies of transfer in excellent agreement with experimental data in the literature (see Section 3.1); therefore benzene is represented by a TC5-TC5-TC5 model. A COG-based bond length of 0.29 nm is used (see also Figure S1, Supporting Information), which leads to a mass density for liquid benzene of 0.890 g cm −3 , www.advancedsciencenews.com www.advtheorysimul.com in excellent agreement (≈2% too high) with the experimental value of 0.877 g cm −3 .
Aromatic ring models are held together [8] by constraints. [29] Distributions of bond lengths obtained from mapped AA simulations between beads which are part of aromatic structures are indeed very narrow (Figure S1c, Supporting Information). As a consequence, they would require very stiff potentials, and such potentials would in turn demand very short time steps. However, at the Martini scale the fast stretching of these bonds becomes unimportant, and such stiff bonds can be replaced by constraints, that is, those bond lengths become constants. More extended stiff molecular structures can be conveniently described by using virtual sites, as discussed in detail in Section 2.1.3.
The parameters of benzene form the basis for all the aromatic ring-like structures, some of which are shown in Figure 1. Together with the other topologies presented in this work, this set should guide the development of new Martini 3 models for aromatic molecules and molecular fragments.
Aliphatic Rings. Aliphatic ring-like structures are described by small beads. The prototypical aliphatic cyclic compound, cyclohexane, is described by a two-S-bead model, differentiating it from the Martini 2 model. [8] The model has been devised to capture the larger bulkiness of cyclohexane, as compared to its aromatic counterpart, that is, benzene. The beads used to describe cyclohexane are of type SC3 based on partitioning data (see Section 3.1). Bonds keep together aliphatic models, as bond distributions are broader in the case of aliphatic rings (compare Figure S1c and Figure S1f, Supporting Information). A bond length of 0.378 nm is used, leading to a mass density for liquid cyclohexane of 0.793 g cm −3 , in excellent agreement (≈2% too high) with the experimental value of 0.774 g cm −3 .
Analogously to benzene, cyclohexane constitutes the basis for the aliphatic ring-like structures. Accordingly, the present pool of aliphatic compounds should serve as reference for the development of new Martini 3 models of aliphatic molecules and molecular fragments. Concluding this section we also note that, although the CG models do not preserve the symmetries of both the prototypical aromatic (benzene) and aliphatic (cyclohexane) compounds, they do increasingly capture the size, shape, and symmetries of increasingly larger molecules.

Mapping Small Molecules
Bead Type Choice. While benzene and cyclohexane form the basis of the, respectively, aromatic and aliphatic ring-like structures which appear in small molecules, most of the interesting compounds also contain heteroatoms either within the ring structures themselves or as substituents. Table 1 lists a subset of the small and tiny Martini particle types and the corresponding chemical building block which they describe in Martini 3 models, along with examples of small molecules which employ that particle type. Note that in some cases a (group of) atom(s) is shared between two neighboring beads: this is indicated with the superscript a) in Table 1 and throughout this manuscript. Atom-sharing between beads is sometimes necessary in order to keep the symmetry of the underlying atomistic structure-see, for example, phenol, tetrahydrofuran, or toluene in Table 1. Note that atomsharing is important also because it needs to be taken into ac-count when extracting bonded interactions for the CG models from COG-mapped atomistic simulations. In particular, when an atom is shared between two beads, the atom's contribution to the position of either of the beads should be halved. For example, in toluene, the position of the SC4 bead is defined by the group −CH a) =C(−CH 3 )−CH a) = (see Table 1). This means that the position of this bead, necessary, for example, to extract bond lengths, will be determined by five atoms-the C atom and the three H atoms of the methyl group and the C atom connected to the methyl group-with a weight, w, of 1 each, while the two C atoms in the "ortho" position, the shared atoms, will contribute to the position of the bead each with w = 1 2 . Note that we found that not including the hydrogens of shared atoms when extracting COG-based bonded parameters leads to properties (see Results Section) in slightly better agreement with reference data.
Beads are chosen based on partitioning data, the Martini hallmark. [9] More bead types are now available within Martini 3 with respect to Martini 2, thus, for example, the three groups a) − (found in 1-methylimidazole, furan, and tetrahydrofuran, respectivelysee also Table 1) described in Martini 2 by a SN0 bead are now described by a TN1, TN2a, and TN4a bead, respectively. Table 1, listing mapping of CG particle types to chemical building blocks, serves as an additional guide towards the assignment of CG particle types. More bead assignments are exemplified in Tables S24 and S25, Supporting Information of ref. [24].
Besides expanding the applicability of the hydrogen bonding "labels", or interaction modifiers, already available in Martini 2, Martini 3 introduced also electron polarizability labels. [24] These labels mimic changes to the interactions of a molecular fragment due to inductive/conjugate effects caused by neighboring chemical groups. Accordingly, C-and X-beads can acquire an electrondonor (or "enriched," label "e") or electron-acceptor ("vacancy", label "v") character. Such labels were shown to nicely capture preferential orientations in aedamers. [24] Moreover, such labels could be used to capture halogen-bonding capabilities of X-beads, as such interactions have been recently shown to be important in, for example, membrane-small molecule interactions. [30] The "e" label is also applied to, for example, the central bead describing naphthalene-which is hence a TC5e bead, see Table 1-being that an electron-rich region of the molecule.
Mapping Substituted Ring-like Fragments. Figure 2 shows how to map substituted ring-like fragments as a function of substituent number and size. Such mappings are based on the following two principles: 1) map all the non-hydrogen atoms with the minimum possible number of beads and 2) preserve the symmetry, volume, and shape of the molecule as much as possible, with aromatic rings being best described by T-beads and aliphatic rings by S-beads. Principle (1) alone would, for example, demand a model for benzene made by either two S-beads or one R-bead and one T-bead, as such models would use the minimum possible number of beads (given the six non-hydrogen atoms of benzene that need to be described). However, principle 2) tells us to disregard both of these models, since the molecular volume and shape of benzene are best described by T-beads. Hence, a three-T-bead model is the best choice for benzene.
The same principles can guide the mapping of substituted benzene shown in Figure 2. For example, when adding a methyl group to benzene to obtain toluene (top row of Figure 2), one Table 1. Small molecule building blocks. CG particle type, corresponding chemical building block, and examples of molecules in which such a block appears. The atoms the CG block is taken to represent are also shown in the 2D chemical structures, with T-beads, S-beads, and R-beads depicted in blue, green, and red, respectively. examples type chemical building block 2D name (mapping) Adv. Theory Simul. 2022, 5, 2100391 Adv. Theory Simul. 2022, 5, 2100391 Indicates that the (group of) atom(s) is shared with neighboring beads; b) indicates molecules considered in this work; others are from ref. [24]. In the case of di-substituted rings, we distinguish between para-and meta-or ortho-substitutions, the latter two to be treated identically in terms of mapping. Small differences in bonded parameters may arise when extracting them from COG-mapped atomistic simulations. Regular, small, and tiny beads are indicated in red, green, and blue, respectively, but not drawn to scale.
of the three beads used to describe the benzene plane become a larger S-bead to account for the extra carbon atom. When instead adding an ethyl group as substituent on benzene-to obtain ethyl-benzene-an extra T-bead should be used, as shown in Figure 2 (top row). Such a choice allows to keep a three-T-bead model for the benzene moiety, which is the most accurate model for benzene in Martini 3. At the same time, the two extra carbon atoms of the ethyl group can be modelled by a T-bead, which we recall are parametrized specifically for 2-to-1 mappings.

Advanced Model Design Strategies
Building More Complex Models. We now describe a number of design strategies useful to build more complex small molecule models, that is, models comprising more than one aromatic/aliphatic ring-like structure. Necessarily, the strategies described do not represent a comprehensive list, nor are they the only way to build a model for a certain molecular construction within the Martini framework. However, we found them to be very helpful when building Martini models for more complex small molecules, and we thus describe them here with the aim of enabling a more facile construction of Martini models by the Martini user community. The strategies can be used in combination and can be adapted to different chemistries than the ones described in the examples. Furthermore, they form an initial set of "rules" which may be implemented in the next generation of programs aimed at automated topology building of Martini models. [12,26,28] Many of the model design strategies outlined below leverage virtual (interaction) sites. [31] In general, the use of virtual sites improves the numerical stability of Martini models of small molecules which are comprised of multiple, or extended, aro-matic structures. As mentioned earlier, Martini aromatic ring structures are held together by constraints. As more extended stiff molecular structures need to be described (e.g., in polycyclic aromatic compounds), the number of constraints that need to be used in their Martini representations grows. However, a highly extended network of constraints leads to numerical instabilities, as it is increasingly complicated to satisfy a growing number of connected constraints (we note that we use GROMACS [32] and the LINCS [29] constraint solver). Saving on the number of constraints by using virtual sites not only increases the numerical stability of the simulations but also improves their performance. Construction 1: "Hinge" Model. The first design strategy is useful to build models for rigid, fused polycyclic compounds and can be illustrated by the proposed model for the molecule of naphthalene (NAPH)-- Figure 3a. The molecule consists of ten aromatic carbon atoms, which translates into a 5-TC5-bead Martini model. Simply interconnecting the five beads with constraints leads to a network of eight constraints (see Figure S3a, Supporting Information). Such a network of constraints already starts to be hard to satisfy: whereas one molecule in vacuum is still numerically stable with a time step of 20 fs, a condensed phase of such a model (432 molecules in a periodic box) readily leads to numerical instabilities with time steps as low as 10 fs, thus requiring even smaller time steps. This makes largescale simulations impracticable. The idea to overcome this limitation is inspired by the "hinge" construction used in the latest cholesterol model proposed by Melo and coworkers. [13] Indeed, another model which could be devised for naphtalene exploits the "hinge" construction for the four external beads (Figure 3a and Figure S3b, Supporting Information), while the central bead is described as a virtual interaction site, that is, a particle whose position is completely defined by its constructing particles (in this case, the other four beads). The displacement of the virtual site . Advanced design strategies to build Martini models for small molecules of increasing complexity. a) Two fused rings can be modeled with a "hinge" construction, as exemplified for naphthalene. b) Two (or more) rigid planar fragments, whose relative dihedral angle can be controlled, can be modeled via a "divide and conquer" strategy; this is exemplified for 2,2 ′ -bithiophene. c) Ring systems linked via a sp 2 hybridized carbon, such as in 2-(phenylmethylidene)cyclopent-4-ene-1,3-dione, represent a "molecular turn" that can be modeled with a specific construction. Schematic representations of the CG models are drawn on top of the chemical structures: a gray circle with solid red contour indicates a CG particle, a gray circle with a dashed cyan contour represents a virtual interaction site, and a smaller gray circle with a dashed gray contour represents a virtual dummy site. Solid red lines indicate constraints, while double blue lines indicate harmonic bonds.
itself is not calculated by the integrator algorithm, but its position is recalculated from the new positions of the constructing particles after each integration step. [32] Of course, forces on the virtual interaction site are accounted for-these are distributed to the constructing atoms in each integration step. Virtual sites are also mass-less, thus the mass of the bead they represent must be accounted for; in the NAPH case, it is evenly added to the masses of the constructing particles. Due to the reduction on the number of constraints used (from 8 to 5), the use of the hinge model also accelerates the calculation: in the case of the same NAPH condensed phase, the simulation runs ≈1.5 times faster. Note also that an improper dihedral is applied in order to keep the hinge model flat (see the comparison of the improper dihedral distribution to AA reference data in Figure S3c, Supporting Information). Finally, exclusions are applied between the two beads of the hinge that are not connected by any bond. It is worth nothing that Martini does not have a defined exclusion rule in its parametrization protocol. In particular, although by default in Martini nonbonded exclusions are defined only between directly connected beads (i.e., only 1-2 interactions are excluded), for rigid fragments or molecules nonbonded exclusions are commonly applied between all the beads of the rigid fragment/molecule (see, e.g., the case of naphthalene just discussed). However, in general, nonbonded interactions are important for more flexible fragments or molecules, where keeping nonbonded interactions between beads separated by 2 (i.e., 1-3 interactions) or 3 (i.e., 1-4 interactions) bonds may be beneficial to capture nontrivial features of the interactions.
To get even more speed-up and get rid of the improper dihedral potential, one could think of reducing further the number of particles to three-the minimal amount of particles needed to define the NAPH plane-and then construct the remaining two as virtual interaction sites. However, this would make NAPH an absolutely rigid body. When torque is generated at one end of the model, the consequent rigid-body rotation can cause a very large displacement on the opposite end, possibly resulting in unphysical overlap with other system components. [13] The hinge model, on the other hand, works as a shock absorber, [13] preventing such an absolute rigid body behavior. It is thus recommended to opt for such a nonrigid body model. This strategy was used to develop models for other extended ring structures, such as caffeine and polyacenes (see Figure S3d-f, Supporting Information). Note that in these other cases, more than one virtual site per hinge was defined.
Construction 2: "Divide and Conquer". A second design strategy is useful to build arbitrarily long chains of rigid planar fragments whose relative dihedral angle can be controlled; it is illustrated by the proposed model for 2,2 ′ -bithiophene (2T)- Figure 3b. This construction is relevant not only for small molecules but also for conjugated polymers, [33] Indeed, this second strategy, dubbed "divide and conquer," is inspired by the polymer backbone construction used in Martini polythiophene models. [33,34] The two thiophene rings are described by three Tbeads each. To connect them in 2,2 ′ -bithiophene while allowing for control of the dihedral angle between the two thiophene planes, we use two virtual dummy sites at the center of geometry of the thiophenes. Note that by dummy we mean a site that does not interact via nonbonded interactions; this type of virtual site is to be contrasted with the virtual interaction site of the previous construction which instead does interact as a standard CG particle. The two dummy sites are then connected by an harmonic bond. In this way, we can apply a dihedral potential i-D i -D j -j, where D i and D j are the dummy sites on the two thiophenes and i and j are two particles on either of the thiophenes; for i and j, it is convenient to take the beads describing the sulfur atoms. Such dihedral represents the torsion between the thiophene rings, and a dihedral potential obtained from quantum mechanical calculation (V dih, QM ) can be directly implemented in such a construction if nonbonded interactions are excluded between the two thiophenes. If nonbonded interactions are not excluded, one should instead introduce in the CG model a dihedral potential which is the reference V dih, QM minus the energy profile around that dihedral that one obtains at the CG level solely due to the nonbonded interactions between adjacent thiophenes (the baseline CG energy profile around that dihedral, V dih, CG ). Although the latter option is of course legitimate and routinely done in QMbased refinement of AA force fields, its added complexity can be conveniently sidestepped at the CG level by using exclusions between the two adjacent rings (which leads to a "flat" V dih, CG ). Let us stress again that this is a very specific use of exclusions, and does not apply in general to Martini models (see discussion of the use of exclusions in Martini in the previous subsection). Finally, the definition of this dihedral is also reminescent of the dummyassisted dihedral described by Bulacu et al. in the context of increasing the numerical stability of CG models. [35] Indeed, by construction this dihedral prevents bead collinearities which lead to numerical instabilities, [35] hence providing a numerically robust way of connecting rigid planar fragments whose relative dihedral angle can be controlled.
Construction 3: A "Molecular Turn." Ring-systems linked through an sp 2 hybridized carbon represent one of the more complicated groups of small molecules to coarsegrain within the Martini framework. As exemplified by the 2-(phenylmethylidene)cyclopent-4-ene-1,3-dione molecule (PMCD, Figure 3c), this type of coupling induces a "molecular turn" in the link between the two ring-systems, which calls for special attention to preserve the rotation axis for the torsional motion of the two rings relative to each other. Also here, virtual dummy sites come in handy. By defining such a virtual dummy site in the COG of each of the ring-systems and a third virtual dummy site directly on top of the linking sp 2 hybridized carbon, it is possible to allow the correct torsion around the single bond despite not having an actual CG bead at the rotation axis. The phenyl ring and the dione-substituted cyclopentene are, respectively, mapped using the standard benzene construction and a hinge-type construction similar to the one described above (cf. Figure 3a); the virtual dummy site on the linking sp 2 hybridized carbon is constructed from the two oxo group beads and the methylidene bead (using [ virtualsites3 ] in GROMACS). The connection to the phenyl group is taken care of by using a harmonic bond between the COG virtual dummy site of the phenyl group and the virtual dummy site of the linker, and the angle between the two ring-systems is conserved by a harmonic angle potential between the three virtual sites. This means that the torsion of the phenyl group relative to the rest of the molecule can be described by a proper dihedral potential centered on the COG virtual dummy site of the phenyl group and the virtual dummy site of the linker. Note that it is also necessary to add an improper dihedral potential to keep the phenyl CG beads in plane with the virtual dummy site of the linker as well as one keeping the COG virtual dummy site of the phenyl group in plane with the CG beads of the dione-substituted cyclopentene. With this construction, it is thus possible to maintain the correct rotation axes between ring-systems linked through an sp 2 hybridized carbon, which is relevant, for example, for several small molecules used in organic electronics. [36]

Database Description
A database containing 90 small molecules (see Table S2, Supporting Information for the full list of molecules), is available at https://github.com/ricalessandri/Martini3-small-molecules and on the Martini portal http://cgmartini.nl. The database comprises small molecules containing a range of mono-and polycyclic aromatic and aliphatic structures, and covering most of the chemical functional groups, thus including a wide range of (bio)molecular building blocks. However, it is necessarily far from being complete. The models included serve as a guide to build new models, or as building blocks to be used in macromolecules such as proteins or synthetic polymers. For example, the side chains of the four aromatic amino acids-4methylimidazole (histidine), p-cresol (tyrosine), toluene (phenylalanine), and 3-methyl-1H-indole (tryptophan)-have been developed as part of this database and are used in the Martini 3 proteins. [24] Similarly, the database contains models for monomers of synthetic polymers, such as 3-hexylthiophene, which when polymerized-which can be done in an automated fashion by the Polyply [37] program-constitutes the conjugated polymer poly(3-hexylthiophene).
For every molecule, a two to five letter long string is used as unique identifier (uID) and eight files are included, namely: 1. uID.smiles: the SMILES string representing the molecule; 2. uID.itp: the Martini 3 model topology, in GROMACS format; www.advancedsciencenews.com

www.advtheorysimul.com
Mapping and index files were generated using CGBuilder (https: //jbarnoud.github.io/cgbuilder/), and then modified manually if necessary, for example, to introduce atom-sharing which is not handled by CGBuilder. Table S2, Supporting Information lists the uID, common name, and SMILES string for all the molecules in the database. Reference and computed properties, needed to reproduce the plots shown in this paper, are available at https://github.com/ ricalessandri/Martini3-small-molecules. Moreover, a tutorial on how to parameterize a small molecule with Martini 3 is also available on the same GitHub repository and on the Martini portal (http://cgmartini.nl).

Simulation Settings and Procedure Details
General Simulation Settings. Settings for the CG simulations adhere to the "new" set of Martini run parameters, [41] using a time step of 20 fs. Specifically, the Verlet neighbor search algorithm was used to update the neighbor list, with a straight cutoff of 1.1 nm. The Parrinello-Rahman barostat [42] (coupling parameter of 12.0 ps) and the velocity-rescaling thermostat [43] (coupling parameter of 1.0 ps) were used to maintain pressure and temperature, respectively. CG simulation setting files are available on the Martini portal http://cgmartini.nl and at https://github.com/ ricalessandri/Martini3-small-molecules. A unique set of atomistic run parameters was used for the AA simulations, using a time step of 2 fs. The Verlet neighbor search algorithm was employed to update the neighbor list; a 1.4 nm cutoff for LJ and for Coulomb (reaction-field) interactions was employed. The Parrinello-Rahman barostat [42] (coupling parameter of 5.0 ps) and the Nosé-Hoover thermostat [44,45] (coupling parameter of 1.0 ps) were used to maintain pressure and temperature, respectively. All simulations were run with GROMACS [32] 2016.x or 2019.x.
All-Atom Models: For every molecule in the database, a OPLS-AA/1.14 * CM1A-LBCC model was obtained from the LigParGen server [38][39][40] by entering the SMILES string of the small molecule; for each molecule, a coordinate file (pdb format) and a topology file in GROMACS format (itp) were obtained. Additional all-atom models for a selected number of small molecules (for example the ones for which free energy surfaces of dimerization were computed, see Section 3.4) were obtained with either the GROMOS 54A7 [46] or the CHARMM General FF (CGenFF) [47,48] force fields. GROMOS 54A7 [46] all-atom models were obtained, in GROMACS format, from the ATB server. [49,50] CHARMM General FF (CGenFF) [47,48] models were obtained by uploading a molecule (in .mol2 format) to https://cgenff.umaryland.edu/ to obtain a .str file; subsequently, to obtain a topology in GRO-MACS format, both files were passed to cgenff_charmm2gmx.py (using the July2017 version, charmm36-jul2017.ff.tgz, both downloaded from http://mackerell.umaryland.edu/ charmm_ff.shtml. In the remainder of the article, we will refer to the AA force fields simply as OPLS, GROMOS, and CHARMM. Liquid and Gas Phase Simulations. A liquid phase was approximated as an equilibrated box of dimensions of about 5 × 5 × 5 nm 3 ; a gas phase as a single molecule occupying a large (7 × 7 × 7 nm 3 ) simulation box. Liquid phase simulations were performed in the NPT ensemble at 298 K and 1 bar, while gas phase ones in the NVT ensemble at 298 K. The enthalpy of vaporization (ΔH vap ) was computed as where U gas and U liq are the total energies (per mole) of the gas and liquid phase, respectively (both extracted from NVT simulations). Densities were extracted with the GROMACS tool gmx density. For this analysis, we used, for each bead, the mass of the constituting atoms rather than the typical bead masses that were used for the simulation. By default Martini 3, like Martini 2, uses standard masses for regular, small, and tiny beads of 72, 54, and 36 Da, respectively. Free Energies of Transfer Calculations. We used alchemical transformations to compute free energies of solvation ΔG S→Ø in a solvent S. ΔGs were computed between water (W) and a number of organic phases, namely hexadecane (HD), (hydrated) octanol (OCO), and chloroform (CLF). In Martini 3, these are described by a W, C1-C1-C1-C1, SC2-SC2-SP1, and X2 model, [24] respectively. Note that we simulate hydrated octanol, that is we add a ≈0.3 mole fraction of water according to experimental conditions. [51] A series of 21 simulations with equally spaced points going from 0 to 1 were performed using a stochastic integrator. Simulations were equilibrated for 2 ns and each point was run for 10 ns. A soft-core potential (with of 0.5 and power set to 1) was used to avoid the singularity in the potential when interactions were switched off. The free energies and corresponding errors were finally computed using the multistate Bennett acceptance ratio. [52] The free energy associated with transferring a solute from a solvent S 1 to a solvent S 2 (ΔG S 1 →S 2 ) was computed as the difference ΔG S 1 →Ø − ΔG S 2 →Ø . Dimerization Free Energy Surface Calculations. Free energy surface (FES) profiles for the dimerization of each molecule were computed by either umbrella sampling (US) [53] or (welltempered [54] ) metadynamics (MTD) [55] simulations. In US simulations, the two solute molecules were placed in a box and solvated. Umbrella windows were spaced 0.1 nm apart along the collective variable (CV), this being the distance between the COGs of the two solute molecules. For each window, this distance was kept fixed by an umbrella potential with a force constant of 1500 kJ mol −1 nm −2 . Each window was simulated for at least 150 and 500 ns in the case of AA and CG systems, respectively. The FES profiles were calculated using the weighted histogram analysis method (WHAM) [56] as implemented in the GROMACS tool gmx wham. In MTD simulations, the bias was added on both the distance between the COGs of the two small molecules and the torsional angle formed by atoms of the small molecules and their COGs in the case of AA simulations. In the CG simulations, the efficiency of Martini allowed us to use only the distance between the COG of the molecules as a CV and obtain converged profiles. Simulations were 100-150 ns or 1 s long in the AA and CG case, respectively. cos (where is the angle between the two vectors perpendicular to the planes of the two molecules), which describes the relative orientation of the ring planes, was used as a third CV to project the FES by using a reweighting algorithm. [57] The height of the deposited Gaussians was set to 1.0 kJ mol −1 and the width to 0.05 nm and 0.2 rad for the distance and torsion, respectively. Gaussians were deposited every 500 steps and the bias factor was set to 5. A wall, in the form of a harmonic potential with a force constant of 200 kJ mol −1 , was added at a distance beyond 2 nm to prevent the molecules from exploring conformations at distances irrelevant to binding. Block analysis was used to estimate the error from the free energy calculations (https://plumed.github.io/doc-v2.4/user-doc/html/trieste-4.html). Both US and MTD FES need to be entropy corrected and shifted so that the free energy at large distances (1.7-2.0 nm) is very close to zero. A stochastic integrator was employed in both US and MTD simulations. All simulations were performed with GROMACS 2016.x, [32] patched with PLUMED 2.4 [58] in the case of the MTD simulations.
Binary Mixture Simulations. An equal number, namely 500, of molecules of species A and B was placed randomly in a simulation box of dimensions 9 × 9 × 9 nm 3 . The box was then energy minimized, simulated at a higher pressure (100 bar) for 1 ns and then subsequently relaxed for 1 ns in the NPT ensemble at 1 bar and 300 K. Simulations were then run in the same ensemble for at least 400 ns. Miscibility was monitored following the number of A-B contacts with the GROMACS tool gmx mindist, with a cutoff distance of 0.6 nm, which comprises the nearest neighbor CG sites around a CG particle. Typical evolutions of the number of A-B contacts as a function of simulation time for mixtures with miscible and immiscible components are shown in Figure S6a, Supporting Information.
Vapor-Liquid Equilibrium Simulations. Vapor-liquid equilibrium simulations were performed by setting up systems of dimensions 7 × 7 × Z nm 3 where Z ranges from ≈12 to ≈16 nm depending on the mixture (benzene-cyclohexane, or benzenechloroform; note that in the case of cyclohexane we used a model with a bond length of 0.385 nm); at least 7 nm of the Z dimension is empty-constituting the starting configuration for the vapor phase. The remaining of the box contains a 50:50 mixture of two components-the starting configuration for the liquid phase. A rendering of this setup is shown in Figure 6c. Such starting simulation boxes were set up for all the compositions possible from x A = 0 to x A = 1 in steps of 0.05 where x A is the molar fraction of component A in the mixture. The vapor-liquid binary systems were then simulated for at least 2 s at each x A in the NVT ensemble at 300 K. This leads to statistically reliable vapor and liquid densities which can be extracted with the GROMACS tool gmx density, as described more in detail in Supporting Information. The equilibrium densities thus obtained were then used to compute the ΔG solv,i of the ith component in the mixture according to Equation (3) as described in Section 3.3.

Partitioning Behavior: Free Energies of Transfer
We first examine the performance of the new models in reproducing experimental free energies of transfer, as partitioning between different solvents constitutes the main target of the Martini force field parametrization. Transfer free energies of all the molecules in the database have been thus computed and compared to available experimental data [59][60][61][62][63][64] in order to settle on consistent mappings for all the molecules in the database. Experimental data are mainly available for hexadecane→water (HD→W), (hydrated) octanol→water (OCO→W) and chloroform→water (CLF→W) partitioning [59][60][61][62][63][64] (see Section 2.3 for the solvent models). Figure 4 displays the performance of the final models with respect to partitioning by showing computed versus experimental free energies in these three cases. The agreement is excellent in all three cases, with a mean absolute error of 1.8, 1.8, and 2.2 kJ mol −1 , for HD→W, OCO→W, and CLF→W, respectively.

Solvent Properties: Enthalpies of Vaporization and Densities
We now examine the performance of the small molecule models in reproducing mass densities and enthalpies of vaporization (ΔH vap ), a secondary parametrization target of the Martini 3 force field. We computed mass density and ΔH vap for the CG models presented here, and compared them to experimental values, [65] where available; the results are shown in Figure 5, where the CG values are computed as described in Section 2.3. In the case of the enthalpies of vaporization (Figure 5a,b), we also report the values for simpler, linear Martini 3 molecules, [24] in order to show the overall trend of ΔH vap of the Martini 3 force field. Indeed, we remark that Martini enthalpies of vaporization are notoriously systematically underestimated due to the limited fluid range of the employed 12-6 Lennard-Jones potential form. [8] Normalizing the data by dividing by the ΔH vap of water leads to the correlation plot presented in Figure 5b. Moreover, we recall [11] that Martini 2 ring-like models deviate from the overall force field trend and lead to systematically higher heat of vaporizations. This deviation was related to the "overmapping" of such molecules and the imperfect S-bead calibration. [11] This is now overcome in Martini 3, where all the small-molecules containing ring-like structures lie in the general trend of the Martini force field.
Computed mass densities for the solvents are overall in good agreement with experimental densities: the mean absolute percentage error is 5.8%. This is shown in Figure 5c, where CG versus experimental densities are plotted. Unsurprisingly, not only the LJ parameters but also the bond lengths used in the models have a large impact on the final densities of the simulated solvents. As anticipated earlier, bond lengths obtained from COGmapped structures in general lead to good molecular and bulk properties. For example, such bond lengths allow for a good reproduction of the solvent accessible surface area (SASA) of the AA model, as shown in Figure S5b, Supporting Information. This also translates into good agreement in case of mass densities (see Figure S5a, Supporting Information, mean absolute percentage error of 7.2 %). However, given the lower resolution of the model, perfectly matching densities cannot be obtained  Data points corresponding to aromatic and aliphatic compounds are indicated by green triangles and blue squares, respectively. In (a-c), also linear molecules from ref. [24] are reported to show the general trend of the Martini 3 force field. In (c) and (d), the dashed and the dotted lines indicated threshold error lines of ±5% and 10%, respectively. Experimental mass densities and enthalpies of vaporization are from ref. [65]. SASA reference values have been obtained with the OPLS all-atom force field. For details on the SASA calculations, see Supporting Information. , THF (tetrahydrofuran), TOLU (toluene), and W (water). b) Comparison between Martini and experimental free energy of mixing (ΔG mix ) and excess free energy of mixing (ΔG ex ) for the benzene-cyclohexane mixture as a function of the mixture composition. x 1 is the molar fraction of benzene. Experimental data are obtained from vapor-liquid equilibrium data. [76] Simulation data are obtained from vapor-liquid equilibrium simulations (see Experimental Section). Both experimental and CG data points are fitted to quadratic equations. A rendering of the setup in shown in (c); small and tiny bead radii are rendered in scale. for all cases simply by following a COG-based mapping scheme. Thus, bond lengths can be refined further if higher accuracy is required. This fine-tuning was done for the models where the mismatch between the experimental and simulated densities was the highest. Thus, some of the topologies contained in the database contain "optimized" bond lengths, which lead to correlation plot for the mass densities reported in Figure 5c-mean absolute percentage error of 5.8%-and for the SASA values reported in Figure 5d.

Solvent Mixtures: Martini Miscibility Table
We tested the capabilities of the models further by examining its performance in reproducing mixing behaviors of binary mixtures. Some force fields have been parameterized based on experimental data on binary mixtures, such as vapor-liquid equilibria (e.g., TraPPE), [66] while established force fields have been tested for their performance in reproducing such equilibria [67] or other related thermodynamic properties of mixtures (e.g., the free energy of mixing). [68] We performed mixing assays to estimate whether two Martini solvent models mixed or not, and compared the observed behavior to reference experimental data. Data regarding the most common solvents are commonly available as solvent miscibility tables. We selected the main aromatic or aliphatic ring-like small molecules which usually appear in such tables. The extracted experimental miscibility table is shown in Figure 6a; results for Martini 3 are in agreement with experimental data, so Figure 6a also represent the Martini miscibility table for ring-like small molecules (and water). In the simulations, the miscibility was monitored following the number of contacts between molecules A and molecules B of a equimolar A:B mixture. Starting from a random mixture of the two, where each species makes a certain number of contacts with the other, monitoring the number of contacts allows for detection of phase separation (see also Experimental Section). Immiscible phases readily demix (sharp decrease in the number of A-B contacts), while for miscible phases the average number of contacts remains constant (see Figure S6a, Supporting Information for typical evolution of number of A-B contacts for mixing and demixing binary mixtures).
A more quantitative estimation of how the new Martini model performs with respect to solvent mixtures has then been carried out for the benzene-cyclohexane mixture. We computed the free energy of mixing (ΔG mix ) as a function of the composition of the mixture. Considering a mixture of A and B, let x A (x B ) be the mole fraction of A (B) in the system, the (molar) ΔG mix is given by [69] where A ( B ) is the chemical potential of component A (B) in the mixture, and * A ( * B ) the chemical potential of component A (B) in its pure state. In this expression, the chemical potentials represent the reversible work to solvate the two components (A or B) in the mixture ( A or B ) or in their own liquid state ( * A or * B ). As such, they correspond to the solvation free energies of the two components in the mixture and in their own liquid state. The first term in Equation (2) can be referred to as the excess free energy of mixing (ΔG ex ), and has to do with the change in interactions experienced by A and B in the mixture with respect to their pure states. In the case of an ideal mixture, where the interactions between A (B) molecules are as strong as the A-B interactions, this term vanishes (and the experimental manifestation of it is Raoult's law). The second term in Equation (2) is an entropic contribution (S id ) due to the increased number of states available upon mixing (sometimes referred to as the ideal mixing free energy given the fact that it is the only term which remains in the case of ideal mixtures).
The chemical potentials which appear in Equation (2), that is, the solvation free energies of the two components in the mixture and in their own liquid state, can be computed directly from the equilibrium vapor and liquid densities of a binary liquid-gas system. Following the scheme of Ben-Naim, [69] the free energy of solvation in the mixture ΔG solv of the ith-component is given by: where vap,i and liq,i are the vapor and liquid densities, respectively, of the ith component in the mixture at the given composition. We obtained the equilibrium densities via "direct" simulations, that is, by simulating equilibrated binary liquid-vapor systems (see Figure 6c and Experimental Section). Such simulations can be very easily extended in the microsecond range with the Martini model, enough to obtain statistically reliable results on the equilibrium vapor and liquid densities. We did this for the benzene-cyclohexane mixture along the whole range of compositions-from pure cyclohexane (x 1 = 0) to pure benzene (x 1 = 1) in steps of 0.05 in x 1 . Figure 6b shows the ΔG mix and ΔG ex computed according to Equation (2). The agreement between simulated and experimental data is good. Experimentally, benzene-cyclohexane deviates positively from Raoult's law, showing less than ideal mixing. This feature is captured at the Martini level. The simulated ΔG ex is somewhat more positive than the experimental one over the whole composition range. It measures 0.74 kJ mol −1 for an equimolar mixture (x A = 0.5), to be compared to the experimental 0.37 kJ mol −1 . The benzenechloroform mixture has also been investigated with the same approach, and results are shown in Figure S6c, Supporting Information. Again, the agreement between experiments and CG is good, when taking into account that the difference between the ΔG ex for the equimolar mixture is around 0.3 kJ mol −1 . We propose that vapor-liquid equilibrium data can be used to further validate the miscibility of phases relevant to a certain application of the Martini force field. Given that evidence of constraints leading to temperature gradient and miscibility artifacts in ternary lipid mixtures has been recently presented, [70,71] and that many of the (aromatic) small molecules rely heavily on constraints, we tested the impact of the LINCS [29] constraint settings on mixtures. Namely, we investigated the impact on mixtures of small molecules by comparing the default LINCS settings (lincs-order = 4; lincs-iter = 1; time step of 20 fs) versus more stringent LINCS settings (lincs-order = 8; lincs-iter = 2) or a smaller time step (15 fs). As shown in Figure S4, Supporting Information, we found no significant impact on three representative mixtures-benzenecyclohexane, benzene-ethanol, and benzene-water-neither on the mixing (as quantified by the number of contacts) nor on eventual differences in temperatures between the molecules involved. However, for larger molecules with multiple constraints (such as cholesterol), we recommend the user to test these settings.

Stacking Interactions: Dimerization Free Energy Landscapes
We now turn to describe the performance of the model by investigating local structural properties, that is, properties which have to do with the "local" structure, such as molecular stacking or packing. In particular, we analyzed stacking behaviors for several aromatic rings, given the ubiquitous role played by aromatic structures in the self-assembly of soft matter. Figure 7 shows potentials of mean force (PMFs) of dimerization in water for several representative aromatic compounds-at the Martini 3, Martini 2, and atomistic levels-along the distance between the COGs of the two molecules. PMFs have been obtained either via Umbrella Sampling, or Metadynamics, as described in detail in Experimental Section. In particular, we analyze the following aromatic compounds: a-c) (poly)cyclic aromatic hydrocarbons, d) a heterocyclic five-membered ring, and e) mono-, and f) di-substituted aromatic compounds.
Overall, the depth of the minima in Martini 3 is in good agreement with the atomistic reference data, showing improvements with respect to Martini 2-in particular for the more hydrophobic compounds, such as benzene and thiophene. Moreover, desolvation barriers-the free energy barriers located between the first and the second minimum of each PMF-are reduced in Martini 3 as compared to Martini 2. This reduction is more in line with the atomistic force fields. The improvement is due to the introduction of LJ cross-interaction sizes, as opposed to Martini 2 were these were absent. [8,11] Note that, in most of the cases, we compare to reference AA PMFs obtained with more than one atomistic force field, since different force fields may provide different reference PMFs, in particular when the size of the molecule increases. [72] A second look at the data shows that, going from Figure 7a,c, that is, upon increasing the size of the aromatic system, the discrepancy between the overall shape of the AA and Martini 3 PMFs decreases. This can be explained by comparing the 2D free energy profiles of Figure 7g,h (AA) and Figure 7j,k (Martini 3). The free energy is now plotted on the 2D coordinate space formed by the distance between the COGs of the two molecules and the cosine of the angle ( ) between the two vectors perpendicular to the planes of the two aromatic molecules. A value of cos close to 1 ( = 0 • ) or −1 ( = 180 • ) indicates that the two aromatic molecules are perfectly stacked, that is, they are in a sandwich conformation. In contrast, cos values close to 0 ( = 90 • ) denote a T-shaped conformation. 1D and 2D free energy surfaces of dimerization for several aromatic small molecules in water. a-f) The free energy is plotted along the distance between the COGs of the two molecules. The profiles obtained with Martini 3 (green solid lines) are shown along with the Martini 2 (red dash-dotted lines) and the atomistic (GROMOS [46] , OPLS [38] , or CHARMM [47,48] , dashed lines) ones; (a-c) show how the dimerization profiles change upon increasing the molecular size of (poly)cyclic aromatic hydrocarbons (benzene (a), naphthalene (b), and pyrene (c)); (d) shows the profile for the five-membered heterocyclic aromatic compound thiophene; (e-f) show profiles for substituted aromatic compounds which possess a permanent dipole moment (chlorobenzene (e), and p-cresol (f)). g-l) The free energy surface is plotted on the 2D coordinate space formed by the distance used in (a-f) and cos , where is the angle between the two vectors perpendicular to the planes of the two molecules; (g-i) are all-atom, and (j-l) are Martini 3 surfaces. Molecular structures and mappings are shown as figure insets. www.advancedsciencenews.com

www.advtheorysimul.com
For representative sandwich and T-shaped conformations, see Figure S7e, Supporting Information. Figure 7g,h show such 2D surfaces for benzene and pyrene, respectively. It can be seen that in the case of the smaller benzene, a T-shaped dimer conformation is preferred at the AA level over the sandwich conformation. As the size of the molecule increases, there is a shift from the T-shaped minimum of benzene (and naphthalene, see Figure S7, Supporting Information) to the sandwich minimum of pyrene. This behavior is not captured by the Martini models; instead, Martini predicts the two conformations to be practically equivalent in terms of free energy for benzene, while it predicts decidedly the sandwich conformation to be the global minimum for the larger naphthalene and pyrene molecules. The T-shaped minimum of benzene is known to be a consequence of the quadrupole moment driving the interaction between the two molecules (given the null dipole moment). The quadrupole interaction appears to be the dominant driving force also in the case of naphthalene, as the sandwich conformation does not appear to be a minimum of the free energy landscape ( Figure S7b, Supporting Information). However, as the size of the molecule increases, the "cavity cost" prevails over the quadrupolar interaction, leading to a sandwich interaction which maximizes the water-water contacts. Standard Martini 3 models for benzene and naphthalene will not be able to capture this effect, which is driven by the specific distribution of charges which is absent in the models, and only accounted for effectively within the LJ parameters. In contrast, the "hydrophobic effect" is intrinsically captured by Martini. However, when the dominant interaction is not a quadrupolar one, Martini 3 models are now expected to reproduce experimental stacking distances-as can be seen in Figure 7c for pyrene-such as the one of conjugated polymers which were previously systematically off due to the larger size of S-beads. [33] The fact that Martini favors sandwich conformations with respect to T-shaped ones also explains the better agreement of the free energy profiles of chlorobenzene (CLBZ) and p-cresol (PCRE) of Figure 7e,f. Indeed, the 2D free energy landscape of Figure 7i (CLBZ) show the dominance of the sandwich conformation in molecules with a permanent dipole moment such as CLBZ (and PCRE, Figure S7a, Supporting Information). This is again captured by Martini 3 (Figure 7k and Figure S7c, Supporting Information).

Possible Applications and Limitations
The present database, along with the outlined model design strategies and the modularity of the Martini force field, open up possibilities to more easily model complex (bio)molecular systems. On the biomolecular side, a natural application of the small molecules presented here combined with the Martini protein models [24] is to investigate protein-ligand interactions in unbiased or biased MD simulations. [21,73] The parametrization guidelines and strategies are expected to facilitate the building of databases with accurate Martini 3 models of small molecules; this possibly will require the help of the next-generation of programs aimed at automated topology building of Martini models, [12,26,28] and automated parametrization of bonded parameters. [25,27] On the materials science side, Martini 3 and the presented parametrization guidelines are expected to open up the possibility of more easily model organic functional materials in general. For example, modeling of organic semiconductors-the key components of organic solar cells, organic transistors, and organic bioelectronic devices to name but a few-and their morphology is an area of particular richness and in need of computational insights at the length scales which can be reached with Martini. [10,74] Organic semiconductors are rich in complex conjugated structures which can now be accurately modeled with Martini 3 and the advanced model design strategies described here. [36] Despite the numerous improvements and even wider applications opened up by Martini 3, limitations of this coarsegraining approach must be kept in mind. Certain limitations apply generally to the Martini force field, [8,24,75] and include a nonquantitative agreement with experimental free energies of solvation and narrower fluid ranges due to the use of the 12-6 Lennard-Jones potential, and an entropy-enthalpy imbalancedue to enthalpy compensating for the loss of degrees of freedom upon coarse-graining-which is known to affect the temperature dependence of several properties, [10,24] Other limitations pertain more specifically to the systems subject of the present study, namely small-molecules containing ring-like structures described by finer mappings. As shown in Section 3.4, conformations driven by quadrupolar interactions, such as T-shaped stacking, are not captured by standard Martini models due to the lack of specific electrostatic interactions. Care must be taken thus if such quadrupolar interactions are expected to be driving a particular self-assembly process. The inclusion of partial charges may remedy to this, and this is now possible in Martini 3 thanks to the "q" labels. [24] Exploring such an avenue is a natural possibility now opened up by the increased flexibility of Martini 3. Additional model refinement is also made possible in Martini 3 by the introduction of the so-called selfinteraction labels. These more generic labels decrease or increase the self-interaction of a certain bead type without changing its free energy of transfer. Accordingly, they allow for quick model refinement and, together with the wider bead type selection and the other labels, are expected to greatly reduce the need for the development of ad hoc beads. Finally, although some model design strategies have been formulated in the present work, building models for extended planar (polymeric) systems can be challenging and time-consuming, as the extended networks of constraints often required by such structures easily lead to numerical instabilities. To fully harness the capabilities of Martini 3 CG modeling of (small) molecules, such strategies need to be implemented in the next-generation of programs aimed at automated topology building of Martini models. [12,26,28] In conclusion, we presented how small molecules can be accurately parameterized within the Martini 3 framework, and initiated a database of validated small molecule models. The present database, along with the outlined model design strategies and the modularity of the Martini force field, constitute a resource of models that 1) can be used "as is" in (bio)molecular simulations; 2) gives reference points for the construction of other small molecule models; 3) provides guidelines and reference data for automated topology builders; and 4) can be used as building blocks for the construction of more complex (macro)molecules, www.advancedsciencenews.com www.advtheorysimul.com hence enabling investigations of complex biomolecular [21,73] and soft material systems. [10,74]

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.