First descriptor computation

This is an introduction to the featomic interface using a molecular crystals dataset using the Python interface. If you are interested in another programming language we recommend you first follow this tutorial and afterward take a look at the how-to guide on Computing SOAP features.

The dataset

The atomic configurations used in our documentation are a small subset of the ShiftML2 dataset containing molecular crystals. There are four crystals - one with each of the elements [H, C], [H, C, N, O], [H, C, N], and [H, C, O]. For each crystal, we have 10 structures: the first one is geometry-optimized. The following 9 contain atoms that are slightly displaced from the geometry-optimized one. You can obtain the dataset from our website.

We will start by importing all the required packages: the classic numpy; chemfiles to load data, and featomic to compute representations. Afterward we will load the dataset using chemfiles.

import chemfiles
import numpy as np

from featomic import SphericalExpansion


with chemfiles.Trajectory("dataset.xyz") as trajectory:
    structures = [s for s in trajectory]

print(f"The dataset contains {len(structures)} structures.")
The dataset contains 40 structures.

We will not explain here how to use chemfiles in detail, as we only use a few functions. Briefly, chemfiles.Trajectory loads structure data in a format featomic can use. If you want to learn more about the possibilities take a look at the chemfiles documentation.

Let us now take a look at the first structure of the dataset.

Frame with 20 atoms

With structure_0.atoms we get a list of the atoms that make up the first structure. The name attribute gives us the name of the specified atom.

elements, counts = np.unique(
    [atom.name for atom in structure_0.atoms], return_counts=True
)

print(
    f"The first structure contains "
    f"{counts[0]} {elements[0]}-atoms, "
    f"{counts[1]} {elements[1]}-atoms, "
    f"{counts[2]} {elements[2]}-atoms and "
    f"{counts[3]} {elements[3]}-atoms."
)
The first structure contains 4 C-atoms, 8 H-atoms, 4 N-atoms and 4 O-atoms.

Calculate a descriptor

We will now calculate an atomic descriptor for this structure using the SOAP spherical expansion as introduced by Bartók, Kondor, and Csányi.

To do so we define below a set of parameters telling featomic how the spherical expansion should be calculated. These parameters are also called hyper parameters since they are parameters of the representation, in opposition to parameters of machine learning models. Hyper parameters are a crucial part of calculating descriptors. Poorly selected hyper parameters will lead to a poor description of your dataset as discussed in the literature.

cutoff = {
    "radius": 4.5,
    "smoothing": {"type": "ShiftedCosine", "width": 0.5},
}

density = {
    "type": "Gaussian",
    "width": 0.3,
    "radial_scaling": {
        "type": "Willatt2018",
        "scale": 2.0,
        "rate": 1.0,
        "exponent": 4,
    },
}

basis = {
    "type": "TensorProduct",
    "max_angular": 5,
    "radial": {"type": "Gto", "max_radial": 8},
}

After we set the hyper parameters we initialize a featomic.calculators.SphericalExpansion object with hyper parameters defined above and run the featomic.calculators.CalculatorBase.compute() method.

calculator = SphericalExpansion(cutoff=cutoff, density=density, basis=basis)
descriptor_0 = calculator.compute(structure_0)
print(type(descriptor_0))
<class 'metatensor.tensor.TensorMap'>

The descriptor format is a metatensor.TensorMap object. Metatensor is like numpy for storing representations of atomistic ML data. Extensive details on the metatensor are covered in the corresponding documentation.

We will now have a look at how the data is stored inside metatensor.TensorMap objects.

print(descriptor_0)
TensorMap with 96 blocks
keys: o3_lambda  o3_sigma  center_type  neighbor_type
          0         1           1             1
          1         1           1             1
          2         1           1             1
          3         1           1             1
          4         1           1             1
          5         1           1             1
          0         1           1             6
          1         1           1             6
          2         1           1             6
          3         1           1             6
          4         1           1             6
          5         1           1             6
          0         1           1             7
          1         1           1             7
          2         1           1             7
          3         1           1             7
          4         1           1             7
          5         1           1             7
          0         1           1             8
          1         1           1             8
          2         1           1             8
          3         1           1             8
          4         1           1             8
          5         1           1             8
          0         1           6             1
          1         1           6             1
          2         1           6             1
          3         1           6             1
          4         1           6             1
          5         1           6             1
          0         1           6             6
          1         1           6             6
          2         1           6             6
          3         1           6             6
          4         1           6             6
          5         1           6             6
          0         1           6             7
          1         1           6             7
          2         1           6             7
          3         1           6             7
          4         1           6             7
          5         1           6             7
          0         1           6             8
          1         1           6             8
          2         1           6             8
          3         1           6             8
          4         1           6             8
          5         1           6             8
          0         1           7             1
          1         1           7             1
          2         1           7             1
          3         1           7             1
          4         1           7             1
          5         1           7             1
          0         1           7             6
          1         1           7             6
          2         1           7             6
          3         1           7             6
          4         1           7             6
          5         1           7             6
          0         1           7             7
          1         1           7             7
          2         1           7             7
          3         1           7             7
          4         1           7             7
          5         1           7             7
          0         1           7             8
          1         1           7             8
          2         1           7             8
          3         1           7             8
          4         1           7             8
          5         1           7             8
          0         1           8             1
          1         1           8             1
          2         1           8             1
          3         1           8             1
          4         1           8             1
          5         1           8             1
          0         1           8             6
          1         1           8             6
          2         1           8             6
          3         1           8             6
          4         1           8             6
          5         1           8             6
          0         1           8             7
          1         1           8             7
          2         1           8             7
          3         1           8             7
          4         1           8             7
          5         1           8             7
          0         1           8             8
          1         1           8             8
          2         1           8             8
          3         1           8             8
          4         1           8             8
          5         1           8             8

The metatensor.TensorMap contains multiple metatensor.TensorBlock. To distinguish them each block is associated with a unique key. For this example, we have one block for each angular channel labeled by o3_lambda (o3_sigma is always equal to 1 since we only consider proper tensors, not pseudo-tensors), the central atom type center_type and neighbor atom type labeled by neighbor_type. Different atomic types are represented using their atomic number, e.g. 1 for hydrogen, 6 for carbon, etc. To summarize, this descriptor contains 96 blocks covering all combinations of the angular channels, central atom type, and neighbor atom types in our dataset.

Let us take a look at the second block (at index 1) in detail. This block contains the descriptor for the \(\lambda = 1\) angular channel for hydrogen-hydrogen pairs.

LabelsEntry(o3_lambda=1, o3_sigma=1, center_type=1, neighbor_type=1)

Metadata about the descriptor

The values of the representation are inside the blocks, in the block.values array. Each entry in this array is described by metadata carried by the block. For the spherical expansion calculator used in this tutorial the values have three dimensions which we can verify from the .shape attribute.

(8, 3, 9)

The first dimension is described by the block.samples, the intermediate dimension by block.components, and the last dimension by block.properties. The sample dimension has a length of eight because we have eight hydrogen atoms in the first structure. We can reveal more detailed metadata information about the sample dimension printing of the metatensor.TensorBlock.samples attribute of the block

Labels(
    system  atom
      0      12
      0      13
        ...
      0      18
      0      19
)

In these labels, the first column indicates the structure, which is 0 for all because we only computed the representation of a single structure. The second entry of each tuple refers to the index of the central atom.

We can do a similar investigation for the second dimension of the value array, descrbied by metatensor.TensorBlock.components.

[Labels(
    o3_mu
     -1
      0
      1
)]

Here, the components are associated with the angular channels of the representation. The size of o3_mu is \(2l + 1\), where \(l\) is the current o3_lambda of the block. Here, its dimension is three because we are looking at the o3_lambda=1 block. You may have noticed that the return value of the last call is a list of metatensor.Labels and not a single Labels instance. The reason is that a block can have several component dimensions as we will see below for the gradients.

The last dimension represents the radial channels of spherical expansion. These are described in the metatensor.TensorBlock.properties:

print(block.properties)

# There is only one column here, named **n**, and indicating which of the radial channel
# we are looking at, from 0 to ``max_radial``.
Labels(
    n
    0
    1
     ...
    7
    8
)

The descriptor values

After looking at the metadata we can investigate the actual data of the representation in more details

print(block.values[0, 0, :])
[-0.00237693  0.01533393 -0.03954346 -0.20278318 -0.05190685 -0.14738365
 -0.17762571 -0.05567855 -0.01701108]

By using [0, 0, :] we selected the first hydrogen and the first m channel. As you the output shows the values are floating point numbers, representing the coefficients of the spherical expansion.

Featomic is also able to process more than one structure within one function call. You can process a whole dataset with

(420, 1, 9)

Now, the 0th block of the metatensor.TensorMap contains not eight but 420 entries in the first dimensions. This reflects the fact that in total we have 420 hydrogen atoms in the whole dataset.

If you want to use another calculator instead of featomic.calculators.SphericalExpansion shown here check out the Reference guides section.

Computing gradients

Additionally, featomic is also able to calculate gradients on top of the values. Gradients are useful for constructing an ML potential and running simulations. For example gradients of the representation with respect to atomic positions can be calculated by setting the gradients parameter of the featomic.calculators.CalculatorBase.compute() method to ["positions"].

descriptor = calculator.compute(structure_0, gradients=["positions"])

block = descriptor.block(o3_lambda=1, center_type=1, neighbor_type=1)
gradient = block.gradient("positions")

print(gradient.values.shape)
(52, 3, 3, 9)

The calculated descriptor contains the values and in each block the associated position gradients as another metatensor.TensorBlock, containing both gradient data and associated metadata.

Compared to the features where we found three dimensions, gradients have four. Again the first is called samples and the properties. The dimensions between the sample and property dimensions are denoted by components.

Looking at the shape in more detail we find that we have 52 samples, which is much more compared to features where we only have eight samples. This arises from the fact that we compute positions gradient for each samples in the features with respect to the position of other atoms. Since we are looking at the block with neighbor_type = 1, only hydrogen neighbors will contribute to these gradients.

The samples shows this in detail:

print(gradient.samples.print(max_entries=10))
sample  system  atom
  0       0      12
  0       0      13
  0       0      14
  0       0      15
  0       0      16
        ...
  7       0      15
  7       0      16
  7       0      17
  7       0      18
  7       0      19

Here, the sample column indicate which of the sample of the features we are taking the gradients of; and (system, atom) indicate which atom’s positions we are taking the gradient with respect to. For example, looking at the gradient sample at index 3:

print("taking the gradient of", block.samples[gradient.samples[3]["sample"]])
print(
    "with respect to the position of system =",
    gradient.samples[3]["system"],
    "and atom =",
    gradient.samples[3]["atom"],
)
taking the gradient of LabelsEntry(system=0, atom=12)
with respect to the position of system = 0 and atom = 15

Now looking at the components:

[Labels(
    xyz
     0
     1
     2
), Labels(
    o3_mu
     -1
      0
      1
)]

we find two of them. Besides the o3_mu component that is also present in the features position gradients also have a component indicating the direction of the gradient vector.

Finally, the properties are the same as the features

Labels(
    n
    0
    1
     ...
    7
    8
)

Featomic can also calculate gradients with respect to the strain (i.e. the virial). For this, you have to add "strain" to the list parsed to the gradients parameter of the featomic.calculators.CalculatorBase.compute() method. Strain gradients/virial are useful when computing the stress and the pressure.

If you want to know about the effect of changing hypers take a look at the next tutorial. If you want to solve an explicit problem our How-to guides might help you.

Gallery generated by Sphinx-Gallery