Computing density correlations

In a previous example, we computed the \(\lambda\)-SOAP equivariant descriptor. This tutorial focuses on the computation of arbitrary body-order descriptors by means of auto-correlations of a spherical expansion (or ‘density’).

We will explore both \(\lambda\)-SOAP — a version of auto-correlations with body-order 3 (taking into account one central atom and two neighbors) — and the bispectrum, which is a representation with body-order 4.

import ase.io
import metatensor as mts
import numpy as np

from featomic import SphericalExpansion
from featomic.clebsch_gordan import DensityCorrelations, EquivariantPowerSpectrum

Computing the spherical expansion

We can define the spherical expansion hyper parameters and compute the density. This will be used throughout the remainder of the tutorial.

HYPER_PARAMETERS = {
    "cutoff": {
        "radius": 5.0,
        "smoothing": {"type": "ShiftedCosine", "width": 0.5},
    },
    "density": {
        "type": "Gaussian",
        "width": 0.3,
    },
    "basis": {
        "type": "TensorProduct",
        "max_angular": 3,
        "radial": {"type": "Gto", "max_radial": 4},
    },
}

systems = ase.io.read("dataset.xyz", ":")

# Initialize the SphericalExpansion calculator and compute the density
spherical_expansion = SphericalExpansion(**HYPER_PARAMETERS)
density = spherical_expansion.compute(systems)

Move the “neighbor_type” key dimension to properties, i.e. remove the sparsity in this dimension.

Note: this method should be called with care when computing on a subset of systems from a larger dataset. If the systems being computed contain a subset of the global atom types, an inconsistent feature dimension will be created. In this case, the argument to keys_to_properties should be specified as a Labels object with all global atom types.

density = density.keys_to_properties("neighbor_type")
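
As a sketch of the pattern described in the note above, one could build a Labels object listing every atom type in the global dataset and pass it to keys_to_properties. The types used here (1 and 8, i.e. hydrogen and oxygen) are an assumption based on the center types appearing in the key printouts later in this tutorial.

# explicit list of global atom types (assumed to be 1 and 8 for this dataset)
global_types = mts.Labels("neighbor_type", np.array([[1], [8]]))

# even when computing on a subset of the systems, passing the explicit Labels
# yields a feature dimension consistent with the full dataset
density_subset = spherical_expansion.compute(systems[:2])
density_subset = density_subset.keys_to_properties(global_types)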

# total number of features summed over all blocks
print("total number of features:", np.sum([len(block.properties) for block in density]))
total number of features: 320

Density correlations to build a \(\lambda\)-SOAP

We can now use the DensityCorrelations calculator and specify that we want to take a single Clebsch-Gordan (CG) tensor product, i.e. n_correlations=1.

During initialization, the calculator computes and stores the CG coefficients. As the density expansion is up to o3_lambda=3 and we are doing a single contraction, we need CG coefficients computed up to o3_lambda=6 in order to do a full contraction. Hence, we set max_angular=6.
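
As a one-line sketch of that arithmetic (the variable names below are local to this snippet):

# a single CG product of the density with itself can reach angular orders up
# to twice the density's max_angular
max_angular_density = HYPER_PARAMETERS["basis"]["max_angular"]  # 3
required_max_angular = 2 * max_angular_density  # 6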

density_correlations = DensityCorrelations(
    n_correlations=1,
    max_angular=6,
)

This outputs an equivariant power spectrum descriptor of body-order 3, i.e. \(\lambda\)-SOAP features.

lambda_soap = density_correlations.compute(density)

This is not quite equivalent to the result seen in the previous tutorial on computing \(\lambda\)-SOAP. The keys contain the dimensions "l_1" and "l_2", which for a given output block track the angular orders of the input blocks that were combined to create it.

Keeping these dimensions in the keys is useful to allow for further CG products to be taken, building more complex descriptors. For now, we can move these key dimensions to properties. Inspect the metadata before and after moving these dimensions.

print("λ-SOAP before keys_to_properties:", lambda_soap.keys)
print("first block:", lambda_soap[0])
λ-SOAP before keys_to_properties: Labels(
    o3_lambda  o3_sigma  center_type  l_1  l_2
        0         1           1        0    0
        1         1           1        0    1
                       ...
        5         -1          8        3    3
        6         1           8        3    3
)
first block: TensorBlock
    samples (420): ['system', 'atom']
    components (1): ['o3_mu']
    properties (400): ['neighbor_1_type', 'n_1', 'neighbor_2_type', 'n_2']
    gradients: None
lambda_soap = lambda_soap.keys_to_properties(["l_1", "l_2"])

print("λ-SOAP after keys_to_properties:", lambda_soap.keys)
print("first block:", lambda_soap[0])
λ-SOAP after keys_to_properties: Labels(
    o3_lambda  o3_sigma  center_type
        0         1           1
        1         1           1
                  ...
        5         -1          8
        6         1           8
)
first block: TensorBlock
    samples (420): ['system', 'atom']
    components (1): ['o3_mu']
    properties (1600): ['l_1', 'l_2', 'neighbor_1_type', 'n_1', 'neighbor_2_type', 'n_2']
    gradients: None

This gives a result equivalent to the \(\lambda\)-SOAP seen in the previous tutorial. We can confirm this by computing an EquivariantPowerSpectrum and checking for consistency.

equivariant_ps_calculator = EquivariantPowerSpectrum(spherical_expansion)
equivariant_ps = equivariant_ps_calculator.compute(
    systems, neighbors_to_properties=True
)

assert mts.equal(lambda_soap, equivariant_ps)
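
If exact equality is too strict, for instance when comparing descriptors computed with slightly different numerical settings, metatensor's allclose operation can be used for an approximate comparison instead. A minimal sketch:

# approximate comparison with explicit tolerances
assert mts.allclose(lambda_soap, equivariant_ps, rtol=1e-10, atol=1e-12)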

Computing the bispectrum

Higher body-order descriptors can be computed by increasing the n_correlations parameter. The max_angular should also be increased to account for the larger angular orders reachable through repeated combinations.

With more iterations, the cost of the computation scales unfavorably. Let’s use a density with smaller hyper parameters to demonstrate the calculation of the bispectrum, a body-order 4 equivariant descriptor.

HYPER_PARAMETERS_SMALL = {
    "cutoff": {
        "radius": 5.0,
        "smoothing": {"type": "ShiftedCosine", "width": 0.5},
    },
    "density": {
        "type": "Gaussian",
        "width": 0.3,
    },
    "basis": {
        "type": "TensorProduct",
        "max_angular": 2,
        "radial": {"type": "Gto", "max_radial": 2},
    },
}

Taking two CG combinations of a density expanded up to o3_lambda=2 requires CG coefficients computed up to max_angular=6. This is given by (n_correlations + 1) * max_angular of the density.
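
As a small arithmetic sketch of this rule (the variable names below are local to this snippet):

# required CG max_angular for the full, untruncated product
n_correlations = 2
max_angular_density = HYPER_PARAMETERS_SMALL["basis"]["max_angular"]  # 2
required_max_angular = (n_correlations + 1) * max_angular_density  # 6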

# Initialize the SphericalExpansion calculator and compute the density
spherical_expansion = SphericalExpansion(**HYPER_PARAMETERS_SMALL)
density = spherical_expansion.compute(systems)
density = density.keys_to_properties("neighbor_type")

# Initialize DensityCorrelations calculator
density_correlations = DensityCorrelations(
    n_correlations=2,
    max_angular=6,
)

# Compute the bispectrum
bispectrum = density_correlations.compute(density)

The properties now contain "neighbor_x_type" and "n_x" dimensions (the latter tracking the radial channel indices), created by the product of the feature spaces of the three density blocks combined to make each bispectrum block.

For each block, the key contains dimensions tracking the angular orders of the blocks combined to create it, namely ["l_1", "l_2", "k_2", "l_3"]. The "l_" dimensions track the angular orders of the blocks from the original density, while the "k_" dimensions track the angular orders of intermediate blocks.
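
To check this, we can print the key dimension names directly:

# expected key dimensions: o3_lambda, o3_sigma, center_type, l_1, l_2, k_2, l_3
print("bispectrum key dimensions:", bispectrum.keys.names)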

print("bispectrum first block:", bispectrum[0])
bispectrum first block: TensorBlock
    samples (420): ['system', 'atom']
    components (1): ['o3_mu']
    properties (1728): ['neighbor_1_type', 'n_1', 'neighbor_2_type', 'n_2', 'neighbor_3_type', 'n_3']
    gradients: None

Let’s look at an example. Take the block at index 156:

print(bispectrum.keys[156])
LabelsEntry(o3_lambda=3, o3_sigma=1, center_type=7, l_1=1, l_2=2, k_2=1, l_3=2)

This was created in the following way.

First, a block from the density of angular order l_1=1 was combined with a block of order l_2=2. Angular momentum coupling rules state that non-zero combinations can only be created for output blocks with order |l_1 - l_2| <= k_2 <= l_1 + l_2, corresponding here to k_2 in [1, 2, 3]. In this case, a block of order k_2=1 was created.

Next, this intermediate block of order k_2=1 was combined with a block of order l_3=2 from the original density. This can again create the orders [1, 2, 3], and in this case the combination created the output angular order o3_lambda=3.
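
The selection rule at both steps can be sketched in plain Python (no featomic involved):

# allowed angular orders when coupling l_1=1 with l_2=2
l_1, l_2, l_3 = 1, 2, 2
possible_k_2 = list(range(abs(l_1 - l_2), l_1 + l_2 + 1))  # [1, 2, 3]

# allowed output orders when coupling the intermediate k_2=1 with l_3=2
k_2 = 1
possible_o3_lambda = list(range(abs(k_2 - l_3), k_2 + l_3 + 1))  # [1, 2, 3]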

As before, we can move these symmetry keys to properties and inspect the metadata and the total size of the features.

bispectrum = bispectrum.keys_to_properties(["l_1", "l_2", "k_2", "l_3"])

print("first block:", bispectrum[0])
first block: TensorBlock
    samples (420): ['system', 'atom']
    components (1): ['o3_mu']
    properties (8640): ['l_1', 'l_2', 'k_2', 'l_3', 'neighbor_1_type', 'n_1', 'neighbor_2_type', 'n_2', 'neighbor_3_type', 'n_3']
    gradients: None
print(
    "total number of features:",
    np.sum([len(block.properties) for block in bispectrum]),
)
total number of features: 428544

Instead of computing the full CG product, a threshold can be defined to limit the maximum angular order of the blocks computed at each iteration of the CG coupling.

This is controlled by the angular_cutoff parameter, and allows us to initialize the calculator with a lower max_angular.

Note that any truncation of the angular channels below the maximum allowed by the angular momentum coupling rules results in some loss of information.

Let’s truncate to an angular cutoff of 3:

angular_cutoff = 3
density_correlations = DensityCorrelations(
    n_correlations=2,
    max_angular=angular_cutoff,
)
bispectrum_truncated = density_correlations.compute(
    density, angular_cutoff=angular_cutoff
)

# Move the "l_" and "k_" keys to properties
bispectrum_truncated = bispectrum_truncated.keys_to_properties(
    ["l_1", "l_2", "k_2", "l_3"]
)

print("truncated bispectrum:", bispectrum_truncated.keys)
truncated bispectrum: Labels(
    o3_lambda  o3_sigma  center_type
        0         1           1
        1         1           1
                  ...
        3         -1          8
        0         -1          8
)
print("first block:", bispectrum_truncated[0])
first block: TensorBlock
    samples (420): ['system', 'atom']
    components (1): ['o3_mu']
    properties (8640): ['l_1', 'l_2', 'k_2', 'l_3', 'neighbor_1_type', 'n_1', 'neighbor_2_type', 'n_2', 'neighbor_3_type', 'n_3']
    gradients: None
print(
    "total number of features:",
    np.sum([len(block.properties) for block in bispectrum_truncated]),
)
total number of features: 338688

To compute a descriptor that matches the symmetry of a given target property, the selected_keys argument can be passed to the compute method. This was also seen in the previous tutorial on computing \(\lambda\)-SOAP.

Following that example, we can compute the truncated bispectrum with the symmetries of a polarizability tensor, i.e. keeping only blocks with o3_lambda=0 or o3_lambda=2 and o3_sigma=1:

bispectrum_truncated_key_select = density_correlations.compute(
    density,
    angular_cutoff=angular_cutoff,
    selected_keys=mts.Labels(
        ["o3_lambda", "o3_sigma"],
        np.array([[0, 1], [2, 1]]),
    ),
)

# Move the "l_" and "k_" keys to properties
bispectrum_truncated_key_select = bispectrum_truncated_key_select.keys_to_properties(
    ["l_1", "l_2", "k_2", "l_3"]
)

print("truncated bispectrum with selected keys:", bispectrum_truncated_key_select.keys)
truncated bispectrum with selected keys: Labels(
    o3_lambda  o3_sigma  center_type
        0         1           1
        2         1           1
                  ...
        0         1           8
        2         1           8
)
print(
    "total number of features:",
    np.sum([len(block.properties) for block in bispectrum_truncated_key_select]),
)
total number of features: 103680

Conclusion

DensityCorrelations can be used to build equivariants of arbitrary body order from a spherical expansion of decorated atomic point cloud data.

A key limitation of this approach is the rapid growth of the feature size with body order. To reduce the number of output blocks and features, the angular_cutoff parameter and the selected_keys argument can be used.