.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/first-calculation.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_first-calculation.py: .. _userdoc-tutorials-get-started: First descriptor computation ============================ This is an introduction to the featomic interface using a molecular crystals dataset using the Python interface. If you are interested in another programming language we recommend you first follow this tutorial and afterward take a look at the how-to guide on :ref:`userdoc-how-to-computing-soap`. The dataset ----------- The atomic configurations used in our documentation are a small subset of the `ShiftML2 dataset `_ containing molecular crystals. There are four crystals - one with each of the elements ``[H, C]``, ``[H, C, N, O]``, ``[H, C, N]``, and ``[H, C, O]``. For each crystal, we have 10 structures: the first one is geometry-optimized. The following 9 contain atoms that are slightly displaced from the geometry-optimized one. You can obtain the dataset from our :download:`website <../../static/dataset.xyz>`. .. GENERATED FROM PYTHON SOURCE LINES 25-28 We will start by importing all the required packages: the classic numpy; chemfiles to load data, and featomic to compute representations. Afterward we will load the dataset using chemfiles. .. GENERATED FROM PYTHON SOURCE LINES 29-41 .. code-block:: Python import chemfiles import numpy as np from featomic import SphericalExpansion with chemfiles.Trajectory("dataset.xyz") as trajectory: structures = [s for s in trajectory] print(f"The dataset contains {len(structures)} structures.") .. rst-class:: sphx-glr-script-out .. code-block:: none The dataset contains 40 structures. .. GENERATED FROM PYTHON SOURCE LINES 42-48 We will not explain here how to use chemfiles in detail, as we only use a few functions. Briefly, :class:`chemfiles.Trajectory` loads structure data in a format featomic can use. If you want to learn more about the possibilities take a look at the `chemfiles documentation `_. Let us now take a look at the first structure of the dataset. .. GENERATED FROM PYTHON SOURCE LINES 49-54 .. code-block:: Python structure_0 = structures[0] print(structure_0) .. rst-class:: sphx-glr-script-out .. code-block:: none Frame with 20 atoms .. GENERATED FROM PYTHON SOURCE LINES 55-57 With ``structure_0.atoms`` we get a list of the atoms that make up the first structure. The ``name`` attribute gives us the name of the specified atom. .. GENERATED FROM PYTHON SOURCE LINES 58-71 .. code-block:: Python elements, counts = np.unique( [atom.name for atom in structure_0.atoms], return_counts=True ) print( f"The first structure contains " f"{counts[0]} {elements[0]}-atoms, " f"{counts[1]} {elements[1]}-atoms, " f"{counts[2]} {elements[2]}-atoms and " f"{counts[3]} {elements[3]}-atoms." ) .. rst-class:: sphx-glr-script-out .. code-block:: none The first structure contains 4 C-atoms, 8 H-atoms, 4 N-atoms and 4 O-atoms. .. GENERATED FROM PYTHON SOURCE LINES 72-85 Calculate a descriptor ---------------------- We will now calculate an atomic descriptor for this structure using the SOAP spherical expansion as introduced by `Bartók, Kondor, and Csányi `_. To do so we define below a set of parameters telling featomic how the spherical expansion should be calculated. These parameters are also called hyper parameters since they are parameters of the representation, in opposition to parameters of machine learning models. Hyper parameters are a crucial part of calculating descriptors. Poorly selected hyper parameters will lead to a poor description of your dataset as discussed in the `literature `_. .. GENERATED FROM PYTHON SOURCE LINES 86-109 .. code-block:: Python cutoff = { "radius": 4.5, "smoothing": {"type": "ShiftedCosine", "width": 0.5}, } density = { "type": "Gaussian", "width": 0.3, "radial_scaling": { "type": "Willatt2018", "scale": 2.0, "rate": 1.0, "exponent": 4, }, } basis = { "type": "TensorProduct", "max_angular": 5, "radial": {"type": "Gto", "max_radial": 8}, } .. GENERATED FROM PYTHON SOURCE LINES 110-113 After we set the hyper parameters we initialize a :class:`featomic.calculators.SphericalExpansion` object with hyper parameters defined above and run the :py:func:`featomic.calculators.CalculatorBase.compute()` method. .. GENERATED FROM PYTHON SOURCE LINES 114-119 .. code-block:: Python calculator = SphericalExpansion(cutoff=cutoff, density=density, basis=basis) descriptor_0 = calculator.compute(structure_0) print(type(descriptor_0)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 120-127 The descriptor format is a :class:`metatensor.TensorMap` object. Metatensor is like numpy for storing representations of atomistic ML data. Extensive details on the metatensor are covered in the `corresponding documentation `_. We will now have a look at how the data is stored inside :class:`metatensor.TensorMap` objects. .. GENERATED FROM PYTHON SOURCE LINES 128-132 .. code-block:: Python print(descriptor_0) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 96 blocks keys: o3_lambda o3_sigma center_type neighbor_type 0 1 1 1 1 1 1 1 2 1 1 1 3 1 1 1 4 1 1 1 5 1 1 1 0 1 1 6 1 1 1 6 2 1 1 6 3 1 1 6 4 1 1 6 5 1 1 6 0 1 1 7 1 1 1 7 2 1 1 7 3 1 1 7 4 1 1 7 5 1 1 7 0 1 1 8 1 1 1 8 2 1 1 8 3 1 1 8 4 1 1 8 5 1 1 8 0 1 6 1 1 1 6 1 2 1 6 1 3 1 6 1 4 1 6 1 5 1 6 1 0 1 6 6 1 1 6 6 2 1 6 6 3 1 6 6 4 1 6 6 5 1 6 6 0 1 6 7 1 1 6 7 2 1 6 7 3 1 6 7 4 1 6 7 5 1 6 7 0 1 6 8 1 1 6 8 2 1 6 8 3 1 6 8 4 1 6 8 5 1 6 8 0 1 7 1 1 1 7 1 2 1 7 1 3 1 7 1 4 1 7 1 5 1 7 1 0 1 7 6 1 1 7 6 2 1 7 6 3 1 7 6 4 1 7 6 5 1 7 6 0 1 7 7 1 1 7 7 2 1 7 7 3 1 7 7 4 1 7 7 5 1 7 7 0 1 7 8 1 1 7 8 2 1 7 8 3 1 7 8 4 1 7 8 5 1 7 8 0 1 8 1 1 1 8 1 2 1 8 1 3 1 8 1 4 1 8 1 5 1 8 1 0 1 8 6 1 1 8 6 2 1 8 6 3 1 8 6 4 1 8 6 5 1 8 6 0 1 8 7 1 1 8 7 2 1 8 7 3 1 8 7 4 1 8 7 5 1 8 7 0 1 8 8 1 1 8 8 2 1 8 8 3 1 8 8 4 1 8 8 5 1 8 8 .. GENERATED FROM PYTHON SOURCE LINES 133-145 The :class:`metatensor.TensorMap` contains multiple :class:`metatensor.TensorBlock`. To distinguish them each block is associated with a unique key. For this example, we have one block for each angular channel labeled by ``o3_lambda`` (``o3_sigma`` is always equal to 1 since we only consider proper tensors, not pseudo-tensors), the central atom type ``center_type`` and neighbor atom type labeled by ``neighbor_type``. Different atomic types are represented using their atomic number, e.g. 1 for hydrogen, 6 for carbon, etc. To summarize, this descriptor contains 96 blocks covering all combinations of the angular channels, central atom type, and neighbor atom types in our dataset. Let us take a look at the second block (at index 1) in detail. This block contains the descriptor for the :math:`\lambda = 1` angular channel for hydrogen-hydrogen pairs. .. GENERATED FROM PYTHON SOURCE LINES 146-151 .. code-block:: Python block = descriptor_0.block(1) print(descriptor_0.keys[1]) .. rst-class:: sphx-glr-script-out .. code-block:: none LabelsEntry(o3_lambda=1, o3_sigma=1, center_type=1, neighbor_type=1) .. GENERATED FROM PYTHON SOURCE LINES 152-159 Metadata about the descriptor ----------------------------- The values of the representation are inside the blocks, in the ``block.values`` array. Each entry in this array is described by metadata carried by the block. For the spherical expansion calculator used in this tutorial the values have three dimensions which we can verify from the ``.shape`` attribute. .. GENERATED FROM PYTHON SOURCE LINES 160-164 .. code-block:: Python print(block.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (8, 3, 9) .. GENERATED FROM PYTHON SOURCE LINES 165-170 The first dimension is described by the ``block.samples``, the intermediate dimension by ``block.components``, and the last dimension by ``block.properties``. The sample dimension has a length of eight because we have eight hydrogen atoms in the first structure. We can reveal more detailed metadata information about the sample dimension printing of the :py:attr:`metatensor.TensorBlock.samples` attribute of the block .. GENERATED FROM PYTHON SOURCE LINES 171-174 .. code-block:: Python print(block.samples) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( system atom 0 12 0 13 ... 0 18 0 19 ) .. GENERATED FROM PYTHON SOURCE LINES 175-181 In these labels, the first column indicates the **structure**, which is 0 for all because we only computed the representation of a single structure. The second entry of each tuple refers to the index of the central **atom**. We can do a similar investigation for the second dimension of the value array, descrbied by :py:attr:`metatensor.TensorBlock.components`. .. GENERATED FROM PYTHON SOURCE LINES 182-185 .. code-block:: Python print(block.components) .. rst-class:: sphx-glr-script-out .. code-block:: none [Labels( o3_mu -1 0 1 )] .. GENERATED FROM PYTHON SOURCE LINES 186-196 Here, the components are associated with the angular channels of the representation. The size of ``o3_mu`` is :math:`2l + 1`, where :math:`l` is the current ``o3_lambda`` of the block. Here, its dimension is three because we are looking at the ``o3_lambda=1`` block. You may have noticed that the return value of the last call is a :class:`list` of :class:`metatensor.Labels` and not a single ``Labels`` instance. The reason is that a block can have several component dimensions as we will see below for the gradients. The last dimension represents the radial channels of spherical expansion. These are described in the :py:attr:`metatensor.TensorBlock.properties`: .. GENERATED FROM PYTHON SOURCE LINES 197-203 .. code-block:: Python print(block.properties) # There is only one column here, named **n**, and indicating which of the radial channel # we are looking at, from 0 to ``max_radial``. .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( n 0 1 ... 7 8 ) .. GENERATED FROM PYTHON SOURCE LINES 204-209 The descriptor values --------------------- After looking at the metadata we can investigate the actual data of the representation in more details .. GENERATED FROM PYTHON SOURCE LINES 210-213 .. code-block:: Python print(block.values[0, 0, :]) .. rst-class:: sphx-glr-script-out .. code-block:: none [-0.00237693 0.01533393 -0.03954346 -0.20278318 -0.05190685 -0.14738365 -0.17762571 -0.05567855 -0.01701108] .. GENERATED FROM PYTHON SOURCE LINES 214-220 By using ``[0, 0, :]`` we selected the first hydrogen and the first ``m`` channel. As you the output shows the values are floating point numbers, representing the coefficients of the spherical expansion. Featomic is also able to process more than one structure within one function call. You can process a whole dataset with .. GENERATED FROM PYTHON SOURCE LINES 221-227 .. code-block:: Python descriptor_full = calculator.compute(structures) block_full = descriptor_full.block(0) print(block_full.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (420, 1, 9) .. GENERATED FROM PYTHON SOURCE LINES 228-245 Now, the 0th block of the :class:`metatensor.TensorMap` contains not eight but 420 entries in the first dimensions. This reflects the fact that in total we have 420 hydrogen atoms in the whole dataset. If you want to use another calculator instead of :class:`featomic.calculators.SphericalExpansion` shown here check out the :ref:`userdoc-references` section. Computing gradients ------------------- Additionally, featomic is also able to calculate gradients on top of the values. Gradients are useful for constructing an ML potential and running simulations. For example ``gradients`` of the representation with respect to atomic positions can be calculated by setting the ``gradients`` parameter of the :py:func:`featomic.calculators.CalculatorBase.compute()` method to ``["positions"]``. .. GENERATED FROM PYTHON SOURCE LINES 246-254 .. code-block:: Python descriptor = calculator.compute(structure_0, gradients=["positions"]) block = descriptor.block(o3_lambda=1, center_type=1, neighbor_type=1) gradient = block.gradient("positions") print(gradient.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (52, 3, 3, 9) .. GENERATED FROM PYTHON SOURCE LINES 255-270 The calculated descriptor contains the values and in each block the associated position gradients as another :class:`metatensor.TensorBlock`, containing both gradient data and associated metadata. Compared to the features where we found three dimensions, gradients have four. Again the first is called ``samples`` and the ``properties``. The dimensions between the sample and property dimensions are denoted by ``components``. Looking at the shape in more detail we find that we have 52 samples, which is much more compared to features where we only have eight samples. This arises from the fact that we compute positions gradient for each samples in the features **with respect to the position of other atoms**. Since we are looking at the block with ``neighbor_type = 1``, only hydrogen neighbors will contribute to these gradients. The samples shows this in detail: .. GENERATED FROM PYTHON SOURCE LINES 271-274 .. code-block:: Python print(gradient.samples.print(max_entries=10)) .. rst-class:: sphx-glr-script-out .. code-block:: none sample system atom 0 0 12 0 0 13 0 0 14 0 0 15 0 0 16 ... 7 0 15 7 0 16 7 0 17 7 0 18 7 0 19 .. GENERATED FROM PYTHON SOURCE LINES 275-279 Here, the **sample** column indicate which of the sample of the features we are taking the gradients of; and (**system**, **atom**) indicate which atom's positions we are taking the gradient with respect to. For example, looking at the gradient sample at index ``3``: .. GENERATED FROM PYTHON SOURCE LINES 280-289 .. code-block:: Python print("taking the gradient of", block.samples[gradient.samples[3]["sample"]]) print( "with respect to the position of system =", gradient.samples[3]["system"], "and atom =", gradient.samples[3]["atom"], ) .. rst-class:: sphx-glr-script-out .. code-block:: none taking the gradient of LabelsEntry(system=0, atom=12) with respect to the position of system = 0 and atom = 15 .. GENERATED FROM PYTHON SOURCE LINES 290-291 Now looking at the components: .. GENERATED FROM PYTHON SOURCE LINES 292-295 .. code-block:: Python print(gradient.components) .. rst-class:: sphx-glr-script-out .. code-block:: none [Labels( xyz 0 1 2 ), Labels( o3_mu -1 0 1 )] .. GENERATED FROM PYTHON SOURCE LINES 296-301 we find two of them. Besides the ``o3_mu`` component that is also present in the features position gradients also have a component indicating the direction of the gradient vector. Finally, the properties are the same as the features .. GENERATED FROM PYTHON SOURCE LINES 302-305 .. code-block:: Python print(gradient.properties) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( n 0 1 ... 7 8 ) .. GENERATED FROM PYTHON SOURCE LINES 306-314 Featomic can also calculate gradients with respect to the strain (i.e. the virial). For this, you have to add ``"strain"`` to the list parsed to the ``gradients`` parameter of the :py:func:`featomic.calculators.CalculatorBase.compute()` method. Strain gradients/virial are useful when computing the stress and the pressure. If you want to know about the effect of changing hypers take a look at the next tutorial. If you want to solve an explicit problem our :ref:`userdoc-how-to` might help you. .. _sphx_glr_download_examples_first-calculation.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: first-calculation.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: first-calculation.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: first-calculation.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_