.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/first-calculation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_first-calculation.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_first-calculation.py:


.. _userdoc-tutorials-get-started:

First descriptor computation
============================

This is an introduction to the featomic interface using a molecular crystals dataset
using the Python interface. If you are interested in another programming language we
recommend you first follow this tutorial and afterward take a look at the how-to guide
on :ref:`userdoc-how-to-computing-soap`.

The dataset
-----------

The atomic configurations used in our documentation are a small subset of the `ShiftML2
dataset <https://pubs.acs.org/doi/pdf/10.1021/acs.jpcc.2c03854>`_ containing molecular
crystals. There are four crystals - one with each of the elements ``[H, C]``, ``[H, C,
N, O]``, ``[H, C, N]``, and ``[H, C, O]``. For each crystal, we have 10 structures: the
first one is geometry-optimized. The following 9 contain atoms that are slightly
displaced from the geometry-optimized one. You can obtain the dataset from our
:download:`website <../../static/dataset.xyz>`.

.. GENERATED FROM PYTHON SOURCE LINES 25-28

We will start by importing all the required packages: the classic numpy;
chemfiles to load data, and featomic to compute representations. Afterward
we will load the dataset using chemfiles.

.. GENERATED FROM PYTHON SOURCE LINES 29-41

.. code-block:: Python


    import chemfiles
    import numpy as np

    from featomic import SphericalExpansion


    with chemfiles.Trajectory("dataset.xyz") as trajectory:
        structures = [s for s in trajectory]

    print(f"The dataset contains {len(structures)} structures.")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The dataset contains 40 structures.


.. GENERATED FROM PYTHON SOURCE LINES 42-48

We will not explain here how to use chemfiles in detail, as we only use a few
functions. Briefly, :class:`chemfiles.Trajectory` loads structure data in a format
featomic can use. If you want to learn more about the possibilities take a look at the
`chemfiles documentation <https://chemfiles.org>`_.

Let us now take a look at the first structure of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 49-54

.. code-block:: Python


    structure_0 = structures[0]

    print(structure_0)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Frame with 20 atoms


.. GENERATED FROM PYTHON SOURCE LINES 55-57

With ``structure_0.atoms`` we get a list of the atoms that make up the first
structure. The ``name`` attribute gives us the name of the specified atom.

.. GENERATED FROM PYTHON SOURCE LINES 58-71

.. code-block:: Python


    elements, counts = np.unique(
        [atom.name for atom in structure_0.atoms], return_counts=True
    )

    print(
        f"The first structure contains "
        f"{counts[0]} {elements[0]}-atoms, "
        f"{counts[1]} {elements[1]}-atoms, "
        f"{counts[2]} {elements[2]}-atoms and "
        f"{counts[3]} {elements[3]}-atoms."
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The first structure contains 4 C-atoms, 8 H-atoms, 4 N-atoms and 4 O-atoms.


.. GENERATED FROM PYTHON SOURCE LINES 72-85

Calculate a descriptor
----------------------

We will now calculate an atomic descriptor for this structure using the SOAP spherical
expansion as introduced by `Bartók, Kondor, and Csányi
<http://dx.doi.org/10.1103/PhysRevB.87.184115>`_.

To do so we define below a set of parameters telling featomic how the spherical
expansion should be calculated. These parameters are also called hyper parameters
since they are parameters of the representation, in opposition to parameters of
machine learning models. Hyper parameters are a crucial part of calculating
descriptors. Poorly selected hyper parameters will lead to a poor description of your
dataset as discussed in the `literature <https://arxiv.org/abs/1502.02127>`_.

.. GENERATED FROM PYTHON SOURCE LINES 86-109

.. code-block:: Python


    cutoff = {
        "radius": 4.5,
        "smoothing": {"type": "ShiftedCosine", "width": 0.5},
    }

    density = {
        "type": "Gaussian",
        "width": 0.3,
        "radial_scaling": {
            "type": "Willatt2018",
            "scale": 2.0,
            "rate": 1.0,
            "exponent": 4,
        },
    }

    basis = {
        "type": "TensorProduct",
        "max_angular": 5,
        "radial": {"type": "Gto", "max_radial": 8},
    }


.. GENERATED FROM PYTHON SOURCE LINES 110-113

After we set the hyper parameters we initialize a
:class:`featomic.calculators.SphericalExpansion` object with hyper parameters defined
above and run the :py:func:`featomic.calculators.CalculatorBase.compute()` method.

.. GENERATED FROM PYTHON SOURCE LINES 114-119

.. code-block:: Python


    calculator = SphericalExpansion(cutoff=cutoff, density=density, basis=basis)
    descriptor_0 = calculator.compute(structure_0)
    print(type(descriptor_0))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'metatensor.tensor.TensorMap'>


.. GENERATED FROM PYTHON SOURCE LINES 120-127

The descriptor format is a :class:`metatensor.TensorMap` object. Metatensor is like
numpy for storing representations of atomistic ML data. Extensive details on the
metatensor are covered in the `corresponding documentation
<https://docs.metatensor.org>`_.

We will now have a look at how the data is stored inside :class:`metatensor.TensorMap`
objects.

.. GENERATED FROM PYTHON SOURCE LINES 128-132

.. code-block:: Python


    print(descriptor_0)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorMap with 96 blocks
    keys: o3_lambda  o3_sigma  center_type  neighbor_type
              0         1           1             1
              1         1           1             1
              2         1           1             1
              3         1           1             1
              4         1           1             1
              5         1           1             1
              0         1           1             6
              1         1           1             6
              2         1           1             6
              3         1           1             6
              4         1           1             6
              5         1           1             6
              0         1           1             7
              1         1           1             7
              2         1           1             7
              3         1           1             7
              4         1           1             7
              5         1           1             7
              0         1           1             8
              1         1           1             8
              2         1           1             8
              3         1           1             8
              4         1           1             8
              5         1           1             8
              0         1           6             1
              1         1           6             1
              2         1           6             1
              3         1           6             1
              4         1           6             1
              5         1           6             1
              0         1           6             6
              1         1           6             6
              2         1           6             6
              3         1           6             6
              4         1           6             6
              5         1           6             6
              0         1           6             7
              1         1           6             7
              2         1           6             7
              3         1           6             7
              4         1           6             7
              5         1           6             7
              0         1           6             8
              1         1           6             8
              2         1           6             8
              3         1           6             8
              4         1           6             8
              5         1           6             8
              0         1           7             1
              1         1           7             1
              2         1           7             1
              3         1           7             1
              4         1           7             1
              5         1           7             1
              0         1           7             6
              1         1           7             6
              2         1           7             6
              3         1           7             6
              4         1           7             6
              5         1           7             6
              0         1           7             7
              1         1           7             7
              2         1           7             7
              3         1           7             7
              4         1           7             7
              5         1           7             7
              0         1           7             8
              1         1           7             8
              2         1           7             8
              3         1           7             8
              4         1           7             8
              5         1           7             8
              0         1           8             1
              1         1           8             1
              2         1           8             1
              3         1           8             1
              4         1           8             1
              5         1           8             1
              0         1           8             6
              1         1           8             6
              2         1           8             6
              3         1           8             6
              4         1           8             6
              5         1           8             6
              0         1           8             7
              1         1           8             7
              2         1           8             7
              3         1           8             7
              4         1           8             7
              5         1           8             7
              0         1           8             8
              1         1           8             8
              2         1           8             8
              3         1           8             8
              4         1           8             8
              5         1           8             8


.. GENERATED FROM PYTHON SOURCE LINES 133-145

The :class:`metatensor.TensorMap` contains multiple :class:`metatensor.TensorBlock`.
To distinguish them each block is associated with a unique key. For this example, we
have one block for each angular channel labeled by ``o3_lambda`` (``o3_sigma`` is
always equal to 1 since we only consider proper tensors, not pseudo-tensors), the
central atom type ``center_type`` and neighbor atom type labeled by ``neighbor_type``.
Different atomic types are represented using their atomic number, e.g. 1 for hydrogen,
6 for carbon, etc. To summarize, this descriptor contains 96 blocks covering all
combinations of the angular channels, central atom type, and neighbor atom types in
our dataset.

Let us take a look at the second block (at index 1) in detail. This block contains the
descriptor for the :math:`\lambda = 1` angular channel for hydrogen-hydrogen pairs.

.. GENERATED FROM PYTHON SOURCE LINES 146-151

.. code-block:: Python


    block = descriptor_0.block(1)
    print(descriptor_0.keys[1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    LabelsEntry(o3_lambda=1, o3_sigma=1, center_type=1, neighbor_type=1)


.. GENERATED FROM PYTHON SOURCE LINES 152-159

Metadata about the descriptor
-----------------------------

The values of the representation are inside the blocks, in the ``block.values`` array.
Each entry in this array is described by metadata carried by the block. For the
spherical expansion calculator used in this tutorial the values have three dimensions
which we can verify from the ``.shape`` attribute.

.. GENERATED FROM PYTHON SOURCE LINES 160-164

.. code-block:: Python


    print(block.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (8, 3, 9)


.. GENERATED FROM PYTHON SOURCE LINES 165-170

The first dimension is described by the ``block.samples``, the intermediate dimension
by ``block.components``, and the last dimension by ``block.properties``. The sample
dimension has a length of eight because we have eight hydrogen atoms in the first
structure. We can reveal more detailed metadata information about the sample dimension
printing of the :py:attr:`metatensor.TensorBlock.samples` attribute of the block

.. GENERATED FROM PYTHON SOURCE LINES 171-174

.. code-block:: Python


    print(block.samples)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        system  atom
          0      12
          0      13
            ...
          0      18
          0      19
    )


.. GENERATED FROM PYTHON SOURCE LINES 175-181

In these labels, the first column indicates the **structure**, which is 0 for all
because we only computed the representation of a single structure. The second entry of
each tuple refers to the index of the central **atom**.

We can do a similar investigation for the second dimension of the value array,
descrbied by :py:attr:`metatensor.TensorBlock.components`.

.. GENERATED FROM PYTHON SOURCE LINES 182-185

.. code-block:: Python


    print(block.components)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Labels(
        o3_mu
         -1
          0
          1
    )]


.. GENERATED FROM PYTHON SOURCE LINES 186-196

Here, the components are associated with the angular channels of the representation.
The size of ``o3_mu`` is :math:`2l + 1`, where :math:`l` is the current ``o3_lambda``
of the block. Here, its dimension is three because we are looking at the
``o3_lambda=1`` block. You may have noticed that the return value of the last call is
a :class:`list` of :class:`metatensor.Labels` and not a single ``Labels`` instance.
The reason is that a block can have several component dimensions as we will see below
for the gradients.

The last dimension represents the radial channels of spherical expansion. These are
described in the :py:attr:`metatensor.TensorBlock.properties`:

.. GENERATED FROM PYTHON SOURCE LINES 197-203

.. code-block:: Python


    print(block.properties)

    # There is only one column here, named **n**, and indicating which of the radial channel
    # we are looking at, from 0 to ``max_radial``.


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        n
        0
        1
         ...
        7
        8
    )


.. GENERATED FROM PYTHON SOURCE LINES 204-209

The descriptor values
---------------------

After looking at the metadata we can investigate the actual data of the
representation in more details

.. GENERATED FROM PYTHON SOURCE LINES 210-213

.. code-block:: Python


    print(block.values[0, 0, :])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [-0.00237693  0.01533393 -0.03954346 -0.20278318 -0.05190685 -0.14738365
     -0.17762571 -0.05567855 -0.01701108]


.. GENERATED FROM PYTHON SOURCE LINES 214-220

By using ``[0, 0, :]`` we selected the first hydrogen and the first ``m`` channel. As
you the output shows the values are floating point numbers, representing the
coefficients of the spherical expansion.

Featomic is also able to process more than one structure within one function call. You
can process a whole dataset with

.. GENERATED FROM PYTHON SOURCE LINES 221-227

.. code-block:: Python


    descriptor_full = calculator.compute(structures)

    block_full = descriptor_full.block(0)
    print(block_full.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (420, 1, 9)


.. GENERATED FROM PYTHON SOURCE LINES 228-245

Now, the 0th block of the :class:`metatensor.TensorMap` contains not eight but
420 entries in the first dimensions. This reflects the fact that in total we
have 420 hydrogen atoms in the whole dataset.

If you want to use another calculator instead of
:class:`featomic.calculators.SphericalExpansion` shown here check out the
:ref:`userdoc-references` section.

Computing gradients
-------------------

Additionally, featomic is also able to calculate gradients on top of the
values. Gradients are useful for constructing an ML potential and running
simulations. For example ``gradients`` of the representation with respect to
atomic positions can be calculated by setting the ``gradients`` parameter of
the :py:func:`featomic.calculators.CalculatorBase.compute()` method to
``["positions"]``.

.. GENERATED FROM PYTHON SOURCE LINES 246-254

.. code-block:: Python


    descriptor = calculator.compute(structure_0, gradients=["positions"])

    block = descriptor.block(o3_lambda=1, center_type=1, neighbor_type=1)
    gradient = block.gradient("positions")

    print(gradient.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (52, 3, 3, 9)


.. GENERATED FROM PYTHON SOURCE LINES 255-270

The calculated descriptor contains the values and in each block the associated
position gradients as another :class:`metatensor.TensorBlock`, containing both
gradient data and associated metadata.

Compared to the features where we found three dimensions, gradients have four. Again
the first is called ``samples`` and the ``properties``. The dimensions between the
sample and property dimensions are denoted by ``components``.

Looking at the shape in more detail we find that we have 52 samples, which is much
more compared to features where we only have eight samples. This arises from the fact
that we compute positions gradient for each samples in the features **with respect to
the position of other atoms**. Since we are looking at the block with ``neighbor_type
= 1``, only hydrogen neighbors will contribute to these gradients.

The samples shows this in detail:

.. GENERATED FROM PYTHON SOURCE LINES 271-274

.. code-block:: Python


    print(gradient.samples.print(max_entries=10))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     sample  system  atom
       0       0      12
       0       0      13
       0       0      14
       0       0      15
       0       0      16
             ...
       7       0      15
       7       0      16
       7       0      17
       7       0      18
       7       0      19


.. GENERATED FROM PYTHON SOURCE LINES 275-279

Here, the **sample** column indicate which of the sample of the features we are taking
the gradients of; and (**system**,  **atom**) indicate which atom's positions we are
taking the gradient with respect to. For example, looking at the gradient sample at
index ``3``:

.. GENERATED FROM PYTHON SOURCE LINES 280-289

.. code-block:: Python


    print("taking the gradient of", block.samples[gradient.samples[3]["sample"]])
    print(
        "with respect to the position of system =",
        gradient.samples[3]["system"],
        "and atom =",
        gradient.samples[3]["atom"],
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    taking the gradient of LabelsEntry(system=0, atom=12)
    with respect to the position of system = 0 and atom = 15


.. GENERATED FROM PYTHON SOURCE LINES 290-291

Now looking at the components:

.. GENERATED FROM PYTHON SOURCE LINES 292-295

.. code-block:: Python


    print(gradient.components)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Labels(
        xyz
         0
         1
         2
    ), Labels(
        o3_mu
         -1
          0
          1
    )]


.. GENERATED FROM PYTHON SOURCE LINES 296-301

we find two of them. Besides the ``o3_mu`` component that is also present in the
features position gradients also have a component indicating the direction of the
gradient vector.

Finally, the properties are the same as the features

.. GENERATED FROM PYTHON SOURCE LINES 302-305

.. code-block:: Python


    print(gradient.properties)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        n
        0
        1
         ...
        7
        8
    )


.. GENERATED FROM PYTHON SOURCE LINES 306-314

Featomic can also calculate gradients with respect to the strain (i.e. the virial).
For this, you have to add ``"strain"`` to the list parsed to the ``gradients``
parameter of the :py:func:`featomic.calculators.CalculatorBase.compute()` method.
Strain gradients/virial are useful when computing the stress and the pressure.

If you want to know about the effect of changing hypers take a look at the next
tutorial. If you want to solve an explicit problem our :ref:`userdoc-how-to` might
help you.


.. _sphx_glr_download_examples_first-calculation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: first-calculation.ipynb <first-calculation.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: first-calculation.py <first-calculation.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: first-calculation.zip <first-calculation.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_