GA crystal structure prediction¶
Here we will use the GA to predict a crystal structure of a given chemical composition.
The implementation is based on:
M. Van den Bossche, H. Grönbeck, and B. HammerJ. Chem. Theory Comput. 2018, 14, 2797−2807
and has much the same functionality as e.g. the USPEX program from the Oganov group and the XtalOpt code by the Zurek group.
What sets crystal structure searches apart from other global optimization problems, is that typically the cell vectors need to be treated as additional degrees of freedom (in addition to the atomic coordinates). This results in the following ‘bulk GA’-specific functionality:
Generating random structures now also involves randomized unit cell vectors.
Cut-and-splice pairing needs to be done using scaled coordinates and includes a random combination of the two parent unit cells.
A ‘strain’ mutation applies a random deformation of the parent unit cell, similar to how the ‘rattle’ mutation acts on the atomic coordinates.
In a ‘permustrain’ mutation, atoms from different elements are swapped in addition to the ‘strain’ mutation.
A ‘soft’ mutation displaces the atoms along a low-frequency vibrational mode obtained via e.g. a simple harmonic bond model.
As an example, we will search for the most energetically stable Ag24 polymorphs using an EMT potential. Note that, in general, the number of atoms per unit cell should be chosen carefully, as one will only find crystal structures where the stoichiometry of the primitive cell is a divisor of the chosen stoichiometry. As the stress tensor is currently not implemented in the EMT calculator in ASE, the ASAP code will be used instead.
Initial population¶
The following script (ga_bulk_start.py
) creates a gadb.db
database containing 20 randomly generated initial structures.
from ase.data import atomic_numbers
from ase.ga.utilities import closest_distances_generator
from ase.ga.bulk_utilities import CellBounds
from ase.ga.bulk_startgenerator import StartGenerator
from ase.ga.data import PrepareDB
# Number of random initial structures to generate
N = 20
# Target cell volume for the initial structures, in angstrom^3
volume = 240.
# Specify the 'building blocks' from which the initial structures
# will be constructed. Here we take single Ag atoms as building
# blocks, 24 in total.
blocks = [('Ag', 24)]
# From these blocks we can determine the stoichiometry
# (as a list of atomic numbers)
stoichiometry = [atomic_numbers[atom] for atom, count in blocks
for _ in range(count)]
# Generate a dictionary with the closest allowed interatomic distances
atom_numbers = list(set(stoichiometry))
blmin = closest_distances_generator(atom_numbers=atom_numbers,
ratio_of_covalent_radii=0.5)
# Specify reasonable bounds on the minimal and maximal
# cell vector lengths (in angstrom) and angles (in degrees)
cellbounds = CellBounds(bounds={'phi': [35, 145],
'chi': [35, 145],
'psi': [35, 145],
'a': [3, 50], 'b': [3, 50], 'c': [3, 50]})
# Choose an (optional) 'cell splitting' scheme which basically
# controls the level of translational symmetry (within the unit
# cell) of the randomly generated structures. Here a 1:1 ratio
# of splitting factors 2 and 1 is used:
splits = {(2,): 1, (1,): 1}
# There will hence be a 50% probability that a candidate
# is constructed by repeating an randomly generated Ag12
# structure along a randomly chosen axis. In the other 50%
# of cases, no cell cell splitting will be applied.
# Initialize the random structure generator
sg = StartGenerator(blocks, blmin, volume, cellbounds=cellbounds,
splits=splits)
# Create the database
da = PrepareDB(db_file_name='gadb.db',
stoichiometry=stoichiometry)
# Generate N random structures
# and add them to the database
for i in range(N):
a = sg.get_new_candidate()
da.add_unrelaxed_candidate(a)
Run the GA search¶
Now we can start the actual search (ga_bulk_run.py
),
which should only take a few minutes to complete.
The relaxation function, which performs the variable-cell
local optimization, is imported from ga_bulk_relax.py
.
from ase.io import write
from ase.ga import get_raw_score
from ase.ga.data import DataConnection
from ase.ga.population import Population
from ase.ga.utilities import closest_distances_generator
from ase.ga.bulk_utilities import CellBounds
from ase.ga.ofp_comparator import OFPComparator
from ase.ga.offspring_creator import OperationSelector
from ase.ga.bulk_mutations import StrainMutation, SoftMutation
from ase.ga.bulk_crossovers import CutAndSplicePairing
from ga_bulk_relax import relax
# Connect to the database and retrieve some information
da = DataConnection('gadb.db')
atom_numbers_to_optimize = da.get_atom_numbers_to_optimize()
n_to_optimize = len(atom_numbers_to_optimize)
# Use Oganov's fingerprint functions to decide whether
# two structures are identical or not
comp = OFPComparator(n_top=n_to_optimize, dE=1.0,
cos_dist_max=1e-3, rcut=10., binwidth=0.05,
pbc=[True, True, True], sigma=0.05, nsigma=4,
recalculate=False)
# Define the cell and interatomic distance bounds
# that the candidates must obey
blmin = closest_distances_generator(atom_numbers_to_optimize, 0.5)
cellbounds = CellBounds(bounds={'phi': [20, 160],
'chi': [20, 160],
'psi': [20, 160],
'a': [2, 60], 'b': [2, 60], 'c': [2, 60]})
# Define a pairing operator with 100% (0%) chance that the first
# (second) parent will be randomly translated, and with each parent
# contributing to at least 15% of the child's scaled coordinates
pairing = CutAndSplicePairing(blmin, p1=1., p2=0., minfrac=0.15,
cellbounds=cellbounds, use_tags=False)
# Define a strain mutation with a typical standard deviation of 0.7
# for the strain matrix elements (drawn from a normal distribution)
strainmut = StrainMutation(blmin, stddev=0.7, cellbounds=cellbounds,
use_tags=False)
# Define a soft mutation; we need to provide a dictionary with
# (typically rather short) minimal interatomic distances which
# is used to determine when to stop displacing the atoms along
# the chosen mode. The minimal and maximal single-atom displacement
# distances (in Angstrom) for a valid mutation are provided via
# the 'bounds' keyword argument.
blmin_soft = closest_distances_generator(atom_numbers_to_optimize, 0.1)
softmut = SoftMutation(blmin_soft, bounds=[2., 5.], use_tags=False)
# By default, the operator will update a "used_modes.json" file
# after every mutation, listing which modes have been used so far
# for each structure in the database. The mode indices start at 3
# as the three lowest frequency modes are translational modes.
# Set up the relative probabilities for the different operators
operators = OperationSelector([4., 3., 3.],
[pairing, softmut, strainmut])
# Relax the initial candidates
while da.get_number_of_unrelaxed_candidates() > 0:
a = da.get_an_unrelaxed_candidate()
relax(a, cellbounds=cellbounds)
da.add_relaxed_step(a)
cell = a.get_cell()
if not cellbounds.is_within_bounds(cell):
da.kill_candidate(a.info['confid'])
# Initialize the population
population_size = 20
population = Population(data_connection=da,
population_size=population_size,
comparator=comp,
logfile='log.txt',
use_extinct=True)
# Update the scaling volume used in some operators
# based on a number of the best candidates
current_pop = population.get_current_population()
strainmut.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
# Test n_to_test new candidates; in this example we need
# only few GA iterations as the global minimum (FCC Ag)
# is very easily found (typically already after relaxation
# of the initial random structures).
n_to_test = 50
for step in range(n_to_test):
print('Now starting configuration number {0}'.format(step))
# Create a new candidate
a3 = None
while a3 is None:
a1, a2 = population.get_two_candidates()
a3, desc = operators.get_new_individual([a1, a2])
# Save the unrelaxed candidate
da.add_unrelaxed_candidate(a3, description=desc)
# Relax the new candidate and save it
relax(a3, cellbounds=cellbounds)
da.add_relaxed_step(a3)
# If the relaxation has changed the cell parameters
# beyond the bounds we disregard it in the population
cell = a3.get_cell()
if not cellbounds.is_within_bounds(cell):
da.kill_candidate(a3.info['confid'])
# Update the population
population.update()
if step % 10 == 0:
# Update the scaling volumes of the strain mutation
# and the pairing operator based on the current
# best structures contained in the population
current_pop = population.get_current_population()
strainmut.update_scaling_volume(current_pop, w_adapt=0.5,
n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
write('current_population.traj', current_pop)
print('GA finished after step %d' % step)
hiscore = get_raw_score(current_pop[0])
print('Highest raw score = %8.4f eV' % hiscore)
all_candidates = da.get_all_relaxed_candidates()
write('all_candidates.traj', all_candidates)
current_pop = population.get_current_population()
write('current_population.traj', current_pop)
All the bulk operators¶
-
class
ase.ga.bulk_crossovers.
CutAndSplicePairing
(blmin, n_top=None, p1=1.0, p2=0.05, minfrac=None, cellbounds=None, use_tags=False, test_dist_to_slab=True, verbose=False)[source]¶ A cut and splice operator for bulk structures.
For more information, see also:
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- n_top: int or None
The number of atoms to optimize (None = include all).
- p1: float or int between 0 and 1
Probability that a parent is shifted over a random distance along the normal of the cutting plane.
- p2: float or int between 0 and 1
Same as p1, but for shifting along the two directions in the cutting plane.
- minfrac: float or int between 0 and 1
Minimal fraction of atoms a parent must contribute to the child.
- cellbounds: ase.ga.bulk_utilities.CellBounds instance
Describing limits on the cell shape, see
CellBounds
.- use_tags: boolean
Whether to use the atomic tags to preserve molecular identity. Note: same-tag atoms are supposed to be grouped together.
- test_dist_to_slab: boolean
Whether also the distances to the slab should be checked to satisfy the blmin.
-
class
ase.ga.bulk_mutations.
StrainMutation
(blmin, cellbounds=None, stddev=0.7, use_tags=False, verbose=False)[source]¶ Mutates a candidate by applying a randomly generated strain.
For more information, see also:
After initialization of the mutation, a scaling volume (to which each mutated structure is scaled before checking the constraints) is typically generated from the population, which is then also occasionally updated in the course of the GA run.
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- cellbounds: ase.ga.bulk_utilities.CellBounds instance
Describing limits on the cell shape, see
CellBounds
.- stddev: float
Standard deviation used in the generation of the strain matrix elements.
- use_tags: boolean
Whether to use the atomic tags to preserve molecular identity.
-
class
ase.ga.bulk_mutations.
PermuStrainMutation
(permutationmutation, strainmutation, verbose=False)[source]¶ Combination of PermutationMutation and StrainMutation.
For more information, see also:
Parameters:
- permutationmutation: OffspringCreator instance
A mutation that permutes atom types.
- strainmutation: OffspringCreator instance
A mutation that mutates by straining.
-
class
ase.ga.bulk_mutations.
SoftMutation
(blmin, bounds=[0.5, 2.0], calculator=<class 'ase.ga.bulk_mutations.BondElectroNegativityModel'>, rcut=10.0, used_modes_file='used_modes.json', use_tags=False, verbose=False)[source]¶ Mutates the structure by displacing it along the lowest (nonzero) frequency modes found by vibrational analysis, as in:
As in the reference above, the next-lowest mode is used if the structure has already been softmutated along the current-lowest mode.
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- bounds: list
Lower and upper limits (in Angstrom) for the largest atomic displacement in the structure. For a given mode, the algorithm starts at zero amplitude and increases it until either blmin is violated or the largest displacement exceeds the provided upper bound). If the largest displacement in the resulting structure is lower than the provided lower bound, the mutant is considered too similar to the parent and None is returned.
- calculator: ASE calculator object
The calculator to be used in the vibrational analysis. The default is to use a calculator based on pairwise harmonic potentials with force constants from the “bond electronegativity” model described in the reference above. Any calculator with a working
get_forces()
method will work.- rcut: float
Cutoff radius in Angstrom for the pairwise harmonic potentials.
- used_modes_file: str or None
Name of json dump file where previously used modes will be stored (and read). If None, no such file will be used. Default is to use the filename ‘used_modes.json’.
- use_tags: boolean
Whether to use the atomic tags to preserve molecular identity.
-
class
ase.ga.bulk_mutations.
RotationalMutation
(blmin, n_top=None, fraction=0.33, tags=None, min_angle=1.57, test_dist_to_slab=True, verbose=False)[source]¶ Mutates a candidate by applying random rotations to multi-atom moieties in the structure (atoms with the same tag are considered part of one such moiety). Only performs whole-molecule rotations, no internal rotations.
For more information, see also:
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- n_top: int or None
The number of atoms to optimize (None = include all).
- fraction: float
Fraction of the moieties to be rotated.
- tags: None or list of integers
Specifies, respectively, whether all moieties or only those with matching tags are eligible for rotation.
- min_angle: float
Minimal angle (in radians) for each rotation; should lie in the interval [0, pi].
- test_dist_to_slab: boolean
Whether also the distances to the slab should be checked to satisfy the blmin.
-
class
ase.ga.bulk_mutations.
RattleRotationalMutation
(rattlemutation, rotationalmutation, verbose=False)[source]¶ Combination of RattleMutation and RotationalMutation.
Parameters:
- rattlemutation: OffspringCreator instance
A mutation that rattles atoms.
- rotationalmutation: OffspringCreator instance
A mutation that rotates moieties.
All the helper functions¶
-
class
ase.ga.bulk_utilities.
CellBounds
(bounds={})[source]¶ Class for defining as well as checking limits on cell vector lengths and angles
Parameters:
- bounds: dict
Any of the following keywords can be used, in conjunction with a [low, high] list determining the lower and upper bounds:
- a, b, c:
Minimal and maximal lengths (in Angstrom) for the 1st, 2nd and 3rd lattice vectors.
- alpha, beta, gamma:
Minimal and maximal values (in degrees) for the angles between the lattice vectors.
- phi, chi, psi:
Minimal and maximal values (in degrees) for the angles between each lattice vector and the plane defined by the other two vectors.
Example:
>>> from ase.ga.bulk_utilities import CellBounds >>> CellBounds(bounds={'phi': [20, 160], ... 'chi': [60, 120], ... 'psi': [20, 160], ... 'a': [2, 20], 'b': [2, 20], 'c': [2, 20]})
-
class
ase.ga.bulk_startgenerator.
StartGenerator
(blocks, blmin, volume, cellbounds=None, cell=None, splits={(1, ): 1})[source]¶ Class for generating random starting candidates of bulk structures. The candidates are generated by:
Randomly creating a cell that satisfies the specified volume.
Adding one building block at a time, respecting the minimal distances. Building blocks can be either single atoms or groups of atoms.
Parameters:
- blocks: list
List of building units for the structure, each item as a (A, B) tuple, with A either an Atoms object or string (element symbol or molecule name) and B the number of these repeating units. A few examples:
>>> sg = StartGenerator([('Ti', 4), ('O', 8)], ...) >>> sg = StartGenerator([('CO2', 10)], ...) >>> h2 = Atoms('H2', positions=[[0, 0, 0], [0.7, 0, 0]]) >>> sg = StartGenerator([(h2, 12)], ...)
- blmin: dict or int
Dictionary with minimal interatomic distances. If an integer is provided instead, the dictionary will be generated with this fraction of covalent bond lengths.
- volume: int or float
Initial guess for the unit cell volume.
- cellbounds: ase.ga.bulk_utilities.CellBounds instance
Describing limits on the cell shape, see
CellBounds
.- cell: 3x3 matrix or length 3 or 6 vector
If you want to keep the cell vectors fixed, the desired cell is to be specified here.
- splits: dict
Splitting scheme to be used, based on:
This should be a dict specifying the relative probabilities for each split, written as tuples. For example,
>>> splits = {(2,): 3, (1,): 1}
This means that, for each structure, either a splitting factor of 2 is applied to one randomly chosen axis, or a splitting factor of 1 is applied (i.e., no splitting). The probability ratio of the two scenararios will be 3:1, i.e. 75% chance for the former and 25% chance for the latter splitting scheme.
To e.g. always apply splitting factors of 2 and 3 along two randomly chosen axes:
>>> splits={(2, 3): 1}
-
class
ase.ga.ofp_comparator.
OFPComparator
(n_top=None, dE=1.0, cos_dist_max=0.005, rcut=20.0, binwidth=0.05, sigma=0.02, nsigma=4, pbc=True, maxdims=None, recalculate=False)[source]¶ Implementation of comparison using Oganov’s fingerprint (OFP) functions, based on:
Parameters:
- n_top: int or None
The number of atoms to optimize (None = include all).
- dE: float
Energy difference above which two structures are automatically considered to be different. (Default 1 eV)
- cos_dist_max: float
Maximal cosine distance between two structures in order to be still considered the same structure. Default 5e-3
- rcut: float
Cutoff radius in Angstrom for the fingerprints. (Default 20 Angstrom)
- binwidth: float
Width in Angstrom of the bins over which the fingerprints are discretized. (Default 0.05 Angstrom)
- pbc: list of three booleans or None
Specifies whether to apply periodic boundary conditions along each of the three unit cell vectors when calculating the fingerprint. The default (None) is to apply PBCs in all 3 directions.
Note: for isolated systems (pbc = [False, False, False]), the pair correlation function itself is always short-ranged (decays to zero beyond a certain radius), so unity is not substracted for calculating the fingerprint. Also the volume normalization disappears.
- maxdims: list of three floats or None
If PBCs in only 1 or 2 dimensions are specified, the maximal thicknesses along the non-periodic directions can be specified here (the values given for the periodic directions will not be used). If set to None (the default), the length of the cell vector along the non-periodic direction is used.
Note: in this implementation, the cell vectors are assumed to be orthogonal.
- sigma: float
Standard deviation of the gaussian smearing to be applied in the calculation of the fingerprints (in Angstrom). Default 0.02 Angstrom.
- nsigma: int
Distance (as the number of standard deviations sigma) at which the gaussian smearing is cut off (i.e. no smearing beyond that distance). (Default 4)
- recalculate: boolean
If True, ignores the fingerprints stored in atoms.info and recalculates them. (Default False)