Note
Go to the end to download the full example code.
First descriptor computation¶
This is an introduction to the featomic interface using a molecular crystals dataset using the Python interface. If you are interested in another programming language we recommend you first follow this tutorial and afterward take a look at the how-to guide on Computing SOAP features.
The dataset¶
The atomic configurations used in our documentation are a small subset of the ShiftML2
dataset containing molecular
crystals. There are four crystals - one with each of the elements [H, C]
, [H, C,
N, O]
, [H, C, N]
, and [H, C, O]
. For each crystal, we have 10 structures: the
first one is geometry-optimized. The following 9 contain atoms that are slightly
displaced from the geometry-optimized one. You can obtain the dataset from our
website
.
We will start by importing all the required packages: the classic numpy; chemfiles to load data, and featomic to compute representations. Afterward we will load the dataset using chemfiles.
import chemfiles
import numpy as np
from featomic import SphericalExpansion
with chemfiles.Trajectory("dataset.xyz") as trajectory:
structures = [s for s in trajectory]
print(f"The dataset contains {len(structures)} structures.")
The dataset contains 40 structures.
We will not explain here how to use chemfiles in detail, as we only use a few
functions. Briefly, chemfiles.Trajectory
loads structure data in a format
featomic can use. If you want to learn more about the possibilities take a look at the
chemfiles documentation.
Let us now take a look at the first structure of the dataset.
structure_0 = structures[0]
print(structure_0)
Frame with 20 atoms
With structure_0.atoms
we get a list of the atoms that make up the first
structure. The name
attribute gives us the name of the specified atom.
The first structure contains 4 C-atoms, 8 H-atoms, 4 N-atoms and 4 O-atoms.
Calculate a descriptor¶
We will now calculate an atomic descriptor for this structure using the SOAP spherical expansion as introduced by Bartók, Kondor, and Csányi.
To do so we define below a set of parameters telling featomic how the spherical expansion should be calculated. These parameters are also called hyper parameters since they are parameters of the representation, in opposition to parameters of machine learning models. Hyper parameters are a crucial part of calculating descriptors. Poorly selected hyper parameters will lead to a poor description of your dataset as discussed in the literature.
cutoff = {
"radius": 4.5,
"smoothing": {"type": "ShiftedCosine", "width": 0.5},
}
density = {
"type": "Gaussian",
"width": 0.3,
"radial_scaling": {
"type": "Willatt2018",
"scale": 2.0,
"rate": 1.0,
"exponent": 4,
},
}
basis = {
"type": "TensorProduct",
"max_angular": 5,
"radial": {"type": "Gto", "max_radial": 8},
}
After we set the hyper parameters we initialize a
featomic.calculators.SphericalExpansion
object with hyper parameters defined
above and run the featomic.calculators.CalculatorBase.compute()
method.
calculator = SphericalExpansion(cutoff=cutoff, density=density, basis=basis)
descriptor_0 = calculator.compute(structure_0)
print(type(descriptor_0))
<class 'metatensor.tensor.TensorMap'>
The descriptor format is a metatensor.TensorMap
object. Metatensor is like
numpy for storing representations of atomistic ML data. Extensive details on the
metatensor are covered in the corresponding documentation.
We will now have a look at how the data is stored inside metatensor.TensorMap
objects.
print(descriptor_0)
TensorMap with 96 blocks
keys: o3_lambda o3_sigma center_type neighbor_type
0 1 1 1
1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
0 1 1 6
1 1 1 6
2 1 1 6
3 1 1 6
4 1 1 6
5 1 1 6
0 1 1 7
1 1 1 7
2 1 1 7
3 1 1 7
4 1 1 7
5 1 1 7
0 1 1 8
1 1 1 8
2 1 1 8
3 1 1 8
4 1 1 8
5 1 1 8
0 1 6 1
1 1 6 1
2 1 6 1
3 1 6 1
4 1 6 1
5 1 6 1
0 1 6 6
1 1 6 6
2 1 6 6
3 1 6 6
4 1 6 6
5 1 6 6
0 1 6 7
1 1 6 7
2 1 6 7
3 1 6 7
4 1 6 7
5 1 6 7
0 1 6 8
1 1 6 8
2 1 6 8
3 1 6 8
4 1 6 8
5 1 6 8
0 1 7 1
1 1 7 1
2 1 7 1
3 1 7 1
4 1 7 1
5 1 7 1
0 1 7 6
1 1 7 6
2 1 7 6
3 1 7 6
4 1 7 6
5 1 7 6
0 1 7 7
1 1 7 7
2 1 7 7
3 1 7 7
4 1 7 7
5 1 7 7
0 1 7 8
1 1 7 8
2 1 7 8
3 1 7 8
4 1 7 8
5 1 7 8
0 1 8 1
1 1 8 1
2 1 8 1
3 1 8 1
4 1 8 1
5 1 8 1
0 1 8 6
1 1 8 6
2 1 8 6
3 1 8 6
4 1 8 6
5 1 8 6
0 1 8 7
1 1 8 7
2 1 8 7
3 1 8 7
4 1 8 7
5 1 8 7
0 1 8 8
1 1 8 8
2 1 8 8
3 1 8 8
4 1 8 8
5 1 8 8
The metatensor.TensorMap
contains multiple metatensor.TensorBlock
.
To distinguish them each block is associated with a unique key. For this example, we
have one block for each angular channel labeled by o3_lambda
(o3_sigma
is
always equal to 1 since we only consider proper tensors, not pseudo-tensors), the
central atom type center_type
and neighbor atom type labeled by neighbor_type
.
Different atomic types are represented using their atomic number, e.g. 1 for hydrogen,
6 for carbon, etc. To summarize, this descriptor contains 96 blocks covering all
combinations of the angular channels, central atom type, and neighbor atom types in
our dataset.
Let us take a look at the second block (at index 1) in detail. This block contains the descriptor for the \(\lambda = 1\) angular channel for hydrogen-hydrogen pairs.
block = descriptor_0.block(1)
print(descriptor_0.keys[1])
LabelsEntry(o3_lambda=1, o3_sigma=1, center_type=1, neighbor_type=1)
Metadata about the descriptor¶
The values of the representation are inside the blocks, in the block.values
array.
Each entry in this array is described by metadata carried by the block. For the
spherical expansion calculator used in this tutorial the values have three dimensions
which we can verify from the .shape
attribute.
print(block.values.shape)
(8, 3, 9)
The first dimension is described by the block.samples
, the intermediate dimension
by block.components
, and the last dimension by block.properties
. The sample
dimension has a length of eight because we have eight hydrogen atoms in the first
structure. We can reveal more detailed metadata information about the sample dimension
printing of the metatensor.TensorBlock.samples
attribute of the block
print(block.samples)
Labels(
system atom
0 12
0 13
...
0 18
0 19
)
In these labels, the first column indicates the structure, which is 0 for all because we only computed the representation of a single structure. The second entry of each tuple refers to the index of the central atom.
We can do a similar investigation for the second dimension of the value array,
descrbied by metatensor.TensorBlock.components
.
print(block.components)
[Labels(
o3_mu
-1
0
1
)]
Here, the components are associated with the angular channels of the representation.
The size of o3_mu
is \(2l + 1\), where \(l\) is the current o3_lambda
of the block. Here, its dimension is three because we are looking at the
o3_lambda=1
block. You may have noticed that the return value of the last call is
a list
of metatensor.Labels
and not a single Labels
instance.
The reason is that a block can have several component dimensions as we will see below
for the gradients.
The last dimension represents the radial channels of spherical expansion. These are
described in the metatensor.TensorBlock.properties
:
print(block.properties)
# There is only one column here, named **n**, and indicating which of the radial channel
# we are looking at, from 0 to ``max_radial``.
Labels(
n
0
1
...
7
8
)
The descriptor values¶
After looking at the metadata we can investigate the actual data of the representation in more details
print(block.values[0, 0, :])
[-0.00237693 0.01533393 -0.03954346 -0.20278318 -0.05190685 -0.14738365
-0.17762571 -0.05567855 -0.01701108]
By using [0, 0, :]
we selected the first hydrogen and the first m
channel. As
you the output shows the values are floating point numbers, representing the
coefficients of the spherical expansion.
Featomic is also able to process more than one structure within one function call. You can process a whole dataset with
descriptor_full = calculator.compute(structures)
block_full = descriptor_full.block(0)
print(block_full.values.shape)
(420, 1, 9)
Now, the 0th block of the metatensor.TensorMap
contains not eight but
420 entries in the first dimensions. This reflects the fact that in total we
have 420 hydrogen atoms in the whole dataset.
If you want to use another calculator instead of
featomic.calculators.SphericalExpansion
shown here check out the
Reference guides section.
Computing gradients¶
Additionally, featomic is also able to calculate gradients on top of the
values. Gradients are useful for constructing an ML potential and running
simulations. For example gradients
of the representation with respect to
atomic positions can be calculated by setting the gradients
parameter of
the featomic.calculators.CalculatorBase.compute()
method to
["positions"]
.
descriptor = calculator.compute(structure_0, gradients=["positions"])
block = descriptor.block(o3_lambda=1, center_type=1, neighbor_type=1)
gradient = block.gradient("positions")
print(gradient.values.shape)
(52, 3, 3, 9)
The calculated descriptor contains the values and in each block the associated
position gradients as another metatensor.TensorBlock
, containing both
gradient data and associated metadata.
Compared to the features where we found three dimensions, gradients have four. Again
the first is called samples
and the properties
. The dimensions between the
sample and property dimensions are denoted by components
.
Looking at the shape in more detail we find that we have 52 samples, which is much
more compared to features where we only have eight samples. This arises from the fact
that we compute positions gradient for each samples in the features with respect to
the position of other atoms. Since we are looking at the block with neighbor_type
= 1
, only hydrogen neighbors will contribute to these gradients.
The samples shows this in detail:
print(gradient.samples.print(max_entries=10))
sample system atom
0 0 12
0 0 13
0 0 14
0 0 15
0 0 16
...
7 0 15
7 0 16
7 0 17
7 0 18
7 0 19
Here, the sample column indicate which of the sample of the features we are taking
the gradients of; and (system, atom) indicate which atom’s positions we are
taking the gradient with respect to. For example, looking at the gradient sample at
index 3
:
print("taking the gradient of", block.samples[gradient.samples[3]["sample"]])
print(
"with respect to the position of system =",
gradient.samples[3]["system"],
"and atom =",
gradient.samples[3]["atom"],
)
taking the gradient of LabelsEntry(system=0, atom=12)
with respect to the position of system = 0 and atom = 15
Now looking at the components:
print(gradient.components)
[Labels(
xyz
0
1
2
), Labels(
o3_mu
-1
0
1
)]
we find two of them. Besides the o3_mu
component that is also present in the
features position gradients also have a component indicating the direction of the
gradient vector.
Finally, the properties are the same as the features
print(gradient.properties)
Labels(
n
0
1
...
7
8
)
Featomic can also calculate gradients with respect to the strain (i.e. the virial).
For this, you have to add "strain"
to the list parsed to the gradients
parameter of the featomic.calculators.CalculatorBase.compute()
method.
Strain gradients/virial are useful when computing the stress and the pressure.
If you want to know about the effect of changing hypers take a look at the next tutorial. If you want to solve an explicit problem our How-to guides might help you.