.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/first-calculation.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_first-calculation.py: .. _userdoc-tutorials-get-started: First descriptor computation ============================ This is an introduction to the featomic interface using a molecular crystals dataset using the Python interface. If you are interested in another programming language we recommend you first follow this tutorial and afterward take a look at the how-to guide on :ref:`userdoc-how-to-computing-soap`. The dataset ----------- The atomic configurations used in our documentation are a small subset of the `ShiftML2 dataset `_ containing molecular crystals. There are four crystals - one with each of the elements [hydrogen, carbon], [hydrogen, carbon, nitrogen, oxygen], [hydrogen, carbon, nitrogen], or [hydrogen, carbon, oxygen]. Each crystal has 10 structures, also denoted by frames, attributed to it. The first frame of each crystal structure is the geometry-optimized frame. The following 9 frames contain atoms that are slightly displaced from the geometry-optimized frame. You can obtain the dataset from our :download:`website <../../static/dataset.xyz>`. .. GENERATED FROM PYTHON SOURCE LINES 27-30 We will start by importing all the required packages: the classic numpy; chemfiles to load data, and featomic to compute representations. Afterward we will load the dataset using chemfiles. .. GENERATED FROM PYTHON SOURCE LINES 31-43 .. code-block:: Python import chemfiles import numpy as np from featomic import SphericalExpansion with chemfiles.Trajectory("dataset.xyz") as trajectory: frames = [f for f in trajectory] print(f"The dataset contains {len(frames)} frames.") .. rst-class:: sphx-glr-script-out .. code-block:: none The dataset contains 40 frames. .. GENERATED FROM PYTHON SOURCE LINES 44-50 We will not explain here how to use chemfiles in detail, as we only use a few functions. Briefly, :class:`chemfiles.Trajectory` loads structure data in a format featomic can use. If you want to learn more about the possibilities take a look at the `chemfiles documentation `_. Let us now take a look at the first frame of the dataset. .. GENERATED FROM PYTHON SOURCE LINES 51-56 .. code-block:: Python frame0 = frames[0] print(frame0) .. rst-class:: sphx-glr-script-out .. code-block:: none Frame with 20 atoms .. GENERATED FROM PYTHON SOURCE LINES 57-59 With ``frame0.atoms`` we get a list of the atoms that make up frame zero. The ``name`` attribute gives us the name of the specified atom. .. GENERATED FROM PYTHON SOURCE LINES 60-71 .. code-block:: Python elements, counts = np.unique([atom.name for atom in frame0.atoms], return_counts=True) print( f"The first frame contains " f"{counts[0]} {elements[0]}-atoms, " f"{counts[1]} {elements[1]}-atoms, " f"{counts[2]} {elements[2]}-atoms and " f"{counts[3]} {elements[3]}-atoms." ) .. rst-class:: sphx-glr-script-out .. code-block:: none The first frame contains 4 C-atoms, 8 H-atoms, 4 N-atoms and 4 O-atoms. .. GENERATED FROM PYTHON SOURCE LINES 72-86 Calculate a descriptor ---------------------- We will now calculate an atomic descriptor for this structure using the SOAP spherical expansion as introduced by `Bartók, Kondor, and Csányi `_. To do so we define below a set of parameters telling featomic how the spherical expansion should be calculated. These parameters are also called hyper parameters since they are parameters of the representation, in opposition to parameters of machine learning models. Hyper parameters are a crucial part of calculating descriptors. Poorly selected hyper parameters will lead to a poor description of your dataset as discussed in the `literature `_. .. GENERATED FROM PYTHON SOURCE LINES 87-110 .. code-block:: Python cutoff = { "radius": 4.5, "smoothing": {"type": "ShiftedCosine", "width": 0.5}, } density = { "type": "Gaussian", "width": 0.3, "radial_scaling": { "type": "Willatt2018", "scale": 2.0, "rate": 1.0, "exponent": 4, }, } basis = { "type": "TensorProduct", "max_angular": 5, "radial": {"type": "Gto", "max_radial": 8}, } .. GENERATED FROM PYTHON SOURCE LINES 111-115 After we set the hyper parameters we initialize a :class:`featomic.calculators.SphericalExpansion` object with hyper parameters defined above and run the :py:func:`featomic.calculators.CalculatorBase.compute()` method. .. GENERATED FROM PYTHON SOURCE LINES 116-121 .. code-block:: Python calculator = SphericalExpansion(cutoff=cutoff, density=density, basis=basis) descriptor0 = calculator.compute(frame0) print(type(descriptor0)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 122-129 The descriptor format is a :class:`metatensor.TensorMap` object. Metatensor is like numpy for storing representations of atomistic ML data. Extensive details on the metatensor are covered in the `corresponding documentation `_. We will now have a look at how the data is stored inside :class:`metatensor.TensorMap` objects. .. GENERATED FROM PYTHON SOURCE LINES 130-134 .. code-block:: Python print(descriptor0) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 96 blocks keys: o3_lambda o3_sigma center_type neighbor_type 0 1 1 1 1 1 1 1 2 1 1 1 3 1 1 1 4 1 1 1 5 1 1 1 0 1 1 6 1 1 1 6 2 1 1 6 3 1 1 6 4 1 1 6 5 1 1 6 0 1 1 7 1 1 1 7 2 1 1 7 3 1 1 7 4 1 1 7 5 1 1 7 0 1 1 8 1 1 1 8 2 1 1 8 3 1 1 8 4 1 1 8 5 1 1 8 0 1 6 1 1 1 6 1 2 1 6 1 3 1 6 1 4 1 6 1 5 1 6 1 0 1 6 6 1 1 6 6 2 1 6 6 3 1 6 6 4 1 6 6 5 1 6 6 0 1 6 7 1 1 6 7 2 1 6 7 3 1 6 7 4 1 6 7 5 1 6 7 0 1 6 8 1 1 6 8 2 1 6 8 3 1 6 8 4 1 6 8 5 1 6 8 0 1 7 1 1 1 7 1 2 1 7 1 3 1 7 1 4 1 7 1 5 1 7 1 0 1 7 6 1 1 7 6 2 1 7 6 3 1 7 6 4 1 7 6 5 1 7 6 0 1 7 7 1 1 7 7 2 1 7 7 3 1 7 7 4 1 7 7 5 1 7 7 0 1 7 8 1 1 7 8 2 1 7 8 3 1 7 8 4 1 7 8 5 1 7 8 0 1 8 1 1 1 8 1 2 1 8 1 3 1 8 1 4 1 8 1 5 1 8 1 0 1 8 6 1 1 8 6 2 1 8 6 3 1 8 6 4 1 8 6 5 1 8 6 0 1 8 7 1 1 8 7 2 1 8 7 3 1 8 7 4 1 8 7 5 1 8 7 0 1 8 8 1 1 8 8 2 1 8 8 3 1 8 8 4 1 8 8 5 1 8 8 .. GENERATED FROM PYTHON SOURCE LINES 135-146 The :class:`metatensor.TensorMap` is structured in several instances of an :class:`metatensor.TensorBlock`. To distinguish the block each block is associated with a unique key. For the current example, we have one block for each angular channel labeled by ``o3_lambda``, the central atom type ``center_type`` and neighbor atom type labeled by ``neighbor_type``. Different atomic types are represented using their atomic number, e.g. 1 for hydrogen, 6 for carbon, etc. To summarize, this descriptor contains 112 blocks covering all combinations of the angular channels of the central and neighbor atom types in our dataset. Let us take a look at the second block (at index 1) in detail. This block contains the descriptor for the :math:`l=1` angular channel for hydrogen-hydrogen pairs. .. GENERATED FROM PYTHON SOURCE LINES 147-152 .. code-block:: Python block = descriptor0.block(1) print(descriptor0.keys[1]) .. rst-class:: sphx-glr-script-out .. code-block:: none LabelsEntry(o3_lambda=1, o3_sigma=1, center_type=1, neighbor_type=1) .. GENERATED FROM PYTHON SOURCE LINES 153-160 The descriptor values --------------------- The values of the representation are stored as an array. Each entry in this array also has associated unique metadata as each block. For the spherical expansion calculator used in this tutorial the values have three dimensions which we can verify from the ``.shape`` attribute. .. GENERATED FROM PYTHON SOURCE LINES 161-165 .. code-block:: Python print(block.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (8, 3, 9) .. GENERATED FROM PYTHON SOURCE LINES 166-175 The descriptor values --------------------- The first dimension is denoted by the `samples`, the intermediate dimension by `components`, and the last dimension by `properties`. The "sample dimension" has a length of eight because we have eight hydrogen atoms in the first frame. We can reveal more detailed metadata information about the sample-dimension printing of the :py:attr:`metatensor.TensorBlock.samples` attribute of the block .. GENERATED FROM PYTHON SOURCE LINES 176-179 .. code-block:: Python print(block.samples) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( system atom 0 12 0 13 ... 0 18 0 19 ) .. GENERATED FROM PYTHON SOURCE LINES 180-182 The result is an :class:`metatensor.TensorMap` instance. It contains in total eight tuples each with two values. The tuple values are named as follows .. GENERATED FROM PYTHON SOURCE LINES 183-186 .. code-block:: Python print(block.samples.names) .. rst-class:: sphx-glr-script-out .. code-block:: none ['system', 'atom'] .. GENERATED FROM PYTHON SOURCE LINES 187-193 Meaning that the first entry of each tuple indicates the _structure_, which is 0 for all because we only computed the representation of a single frame. The second entry of each tuple refers to the index of the _center_ atom. We can do a similar investigation for the second dimension: the :py:attr:`metatensor.TensorBlock.components`. .. GENERATED FROM PYTHON SOURCE LINES 194-197 .. code-block:: Python print(block.components) .. rst-class:: sphx-glr-script-out .. code-block:: none [Labels( o3_mu -1 0 1 )] .. GENERATED FROM PYTHON SOURCE LINES 198-209 Here, the components are associated with the angular channels of the representation. The size of ``o3_mu`` is :math:`2l + 1`, where :math:`l` is the current ``o3_lambda`` of the block. Here, its dimension is three because we are looking at the ``o3_lambda=1`` block. You may have noticed that the return value of the last call is a :class:`list` of :class:`metatensor.Labels` and not a single ``Labels`` instance. The reason is that a block can have several component dimensions as we will see below for the gradients. The last value represents the number of radial channels. For the :py:attr:`metatensor.TensorBlock.properties` dimension we find an object .. GENERATED FROM PYTHON SOURCE LINES 210-213 .. code-block:: Python print(block.properties) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( n 0 1 ... 7 8 ) .. GENERATED FROM PYTHON SOURCE LINES 214-215 containing a tuple of only one value ranging from 0 to 8. The name of this entry is .. GENERATED FROM PYTHON SOURCE LINES 216-219 .. code-block:: Python print(block.properties.names) .. rst-class:: sphx-glr-script-out .. code-block:: none ['n'] .. GENERATED FROM PYTHON SOURCE LINES 220-225 and denoting the radial channels. The range results from our choice of ``max_radial = 9`` in the hyper parameters above. After looking at the metadata we can investigate the actual data of the representation in more details .. GENERATED FROM PYTHON SOURCE LINES 226-229 .. code-block:: Python print(block.values[0, 0, :]) .. rst-class:: sphx-glr-script-out .. code-block:: none [-0.00237693 0.01533393 -0.03954346 -0.20278318 -0.05190685 -0.14738365 -0.17762571 -0.05567855 -0.01701108] .. GENERATED FROM PYTHON SOURCE LINES 230-237 By using ``[0, 0, :]`` we selected the first hydrogen and the first ``m`` channel. As you the output shows the values are floating point numbers between ``-1.0`` and ``1.0``. Values in this range are reasonable and can be directly used as input for a machine learning algorithm. Featomic is also able to process more than one structure within one function call. You can process a whole dataset with .. GENERATED FROM PYTHON SOURCE LINES 238-244 .. code-block:: Python descriptor_full = calculator.compute(frames) block_full = descriptor_full.block(0) print(block_full.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (420, 1, 9) .. GENERATED FROM PYTHON SOURCE LINES 245-262 Now, the 0th block of the :class:`metatensor.TensorMap` contains not eight but 420 entries in the first dimensions. This reflects the fact that in total we have 420 hydrogen atoms in the whole dataset. If you want to use another calculator instead of :class:`featomic.calculators.SphericalExpansion` shown here check out the :ref:`userdoc-references` section. Computing gradients ------------------- Additionally, featomic is also able to calculate gradients on top of the values. Gradients are useful for constructing an ML potential and running simulations. For example ``gradients`` of the representation with respect to atomic positions can be calculated by setting the ``gradients`` parameter of the :py:func:`featomic.calculators.CalculatorBase.compute()` method to ``["positions"]``. .. GENERATED FROM PYTHON SOURCE LINES 263-271 .. code-block:: Python descriptor_gradients = calculator.compute(frame0, gradients=["positions"]) block_gradients = descriptor_gradients.block(0) gradient_position = block_gradients.gradient("positions") print(gradient_position.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (52, 3, 1, 9) .. GENERATED FROM PYTHON SOURCE LINES 272-288 The calculated descriptor contains the values and in each block the associated position gradients as an :class:`metatensor.block.Gradient` instance. The actual values are stored in the ``data`` attribute. Similar to the features the gradient data also has associated metadata. But, compared to the values were we found three dimensions, and gradients have four. Again the first is called `samples` and the `properties`. The dimensions between the sample and property dimensions are denoted by `components`. Looking at the shape in more detail we find that we have 52 samples, which is much more compared to features where we only have eight samples. This arises from the fact that we calculate the position gradient for each pair in the structure. For our selected block these are all hydrogen-hydrogen pairs. Naively one would come up with ``8 * 8 = 64`` samples, but featomic already ignores pairs that are outside of the cutoff radius. Their position gradient is always zero. The :attr:`metatensor.block.Gradient.samples` attribute shows this in detail. .. GENERATED FROM PYTHON SOURCE LINES 289-292 .. code-block:: Python print(gradient_position.samples) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( sample system atom 0 0 12 0 0 13 ... 7 0 18 7 0 19 ) .. GENERATED FROM PYTHON SOURCE LINES 293-294 Note that we have a tuple of three with the names .. GENERATED FROM PYTHON SOURCE LINES 295-298 .. code-block:: Python print(gradient_position.samples.names) .. rst-class:: sphx-glr-script-out .. code-block:: none ['sample', 'system', 'atom'] .. GENERATED FROM PYTHON SOURCE LINES 299-303 In the above output of the Labels instance for example the `(2, 0, 17)` entry is missing indicating that this pair is outside of the cutoff. Now looking at the :attr:`metatensor.block.Gradient.components` .. GENERATED FROM PYTHON SOURCE LINES 304-307 .. code-block:: Python print(gradient_position.components) .. rst-class:: sphx-glr-script-out .. code-block:: none [Labels( xyz 0 1 2 ), Labels( o3_mu 0 )] .. GENERATED FROM PYTHON SOURCE LINES 308-314 we find two of them. Besides the `o3_mu` component that is also present in the features position gradients also have a component indicating the direction of the gradient vector. Finally, the :attr:`metatensor.block.Gradient.properties` dimension is the same as for the values .. GENERATED FROM PYTHON SOURCE LINES 315-318 .. code-block:: Python print(gradient_position.properties) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( n 0 1 ... 7 8 ) .. GENERATED FROM PYTHON SOURCE LINES 319-327 Featomic can also calculate gradients with respect to the strain (i.e. the virial). For this, you have to add ``"strain"`` to the list parsed to the ``gradients`` parameter of the :py:func:`featomic.calculators.CalculatorBase.compute()` method. Strain gradients/virial are useful when computing the stress and the pressure. If you want to know about the effect of changing hypers take a look at the next tutorial. If you want to solve an explicit problem our :ref:`userdoc-how-to` might help you. .. _sphx_glr_download_examples_first-calculation.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: first-calculation.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: first-calculation.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: first-calculation.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_