Dataset Information
When working with metatrain, you will most likely need to interact with a few core
classes that store information about datasets. All these classes belong to the
metatrain.utils.data module, which is described in the Data section of the developer
documentation.
These classes are:
- metatrain.utils.data.DatasetInfo: This class is responsible for storing information
  about a dataset. It contains the length unit used in the dataset, the atomic types
  present, as well as information about the dataset’s targets as a
  Dict[str, TargetInfo] object. The keys of this dictionary are the names of the
  targets in the dataset (e.g., energy, mtt::dipole).
- metatrain.utils.data.TargetInfo: This class is responsible for storing information
  about a target in a dataset. It contains the target’s physical quantity, the unit in
  which the target is expressed, and the layout of the target. The layout is a
  TensorMap object with zero samples, which is used to exemplify the metadata of each
  target.
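The relationship between the two classes can be sketched with plain dataclasses. This is only an illustration of the structure described above, not metatrain's actual classes: the names DatasetInfoSketch and TargetInfoSketch, their field names, and the example values ("angstrom", "eV", the atomic types) are assumptions made for the example, and the layout is left as a placeholder instead of a real TensorMap.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TargetInfoSketch:
    """Hypothetical stand-in for TargetInfo (not the metatrain class)."""

    quantity: str  # physical quantity of the target, e.g. "energy"
    unit: str  # unit in which the target is expressed
    layout: Any = None  # in metatrain this is a TensorMap with zero samples


@dataclass
class DatasetInfoSketch:
    """Hypothetical stand-in for DatasetInfo (not the metatrain class)."""

    length_unit: str  # length unit used in the dataset
    atomic_types: List[int]  # atomic types present in the dataset
    # Keys are target names; values describe each target.
    targets: Dict[str, TargetInfoSketch] = field(default_factory=dict)


# Illustrative values only.
info = DatasetInfoSketch(
    length_unit="angstrom",
    atomic_types=[1, 6, 8],
    targets={
        "energy": TargetInfoSketch(quantity="energy", unit="eV"),
        "mtt::dipole": TargetInfoSketch(quantity="dipole", unit=""),
    },
)

# The dictionary keys are the target names.
target_names = sorted(info.targets)
```

The point of the sketch is the nesting: one dataset-level object holds a dictionary keyed by target name, and each value carries the quantity, unit, and layout of that target.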
At the moment, only three types of layouts are supported:
- scalar: This type of layout is used when the target is a scalar quantity. The layout
  TensorMap object corresponding to a scalar must have one TensorBlock and no
  components.
- Cartesian tensor: This type of layout is used when the target is a Cartesian tensor.
  The layout TensorMap object corresponding to a Cartesian tensor must have one
  TensorBlock and as many components as the tensor’s rank. These components are named
  xyz for a tensor of rank 1 and xyz_1, xyz_2, and so on for higher ranks.
- Spherical tensor: This type of layout is used when the target is a spherical tensor.
  The layout TensorMap object corresponding to a spherical tensor can have multiple
  blocks corresponding to different irreps (irreducible representations) of the
  target. The keys of the TensorMap object must have the o3_lambda and o3_sigma names,
  and each TensorBlock must have a single component named o3_mu.
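The three layout rules above can be summarized as a small check. The following is a minimal sketch in plain Python, assuming the layout metadata has already been extracted into lists of names: classify_layout is a hypothetical helper, not part of metatrain or metatensor, and it does not operate on real TensorMap objects. Key names are only inspected for the spherical case, since the text above does not specify them for the other two.

```python
from typing import List, Sequence


def classify_layout(key_names: Sequence[str], blocks: Sequence[Sequence[str]]) -> str:
    """Classify a layout from its key names and per-block component names.

    `blocks` holds one list of component names per block. Hypothetical
    helper working on plain lists, not on TensorMap objects.
    """
    # Spherical: keys include o3_lambda and o3_sigma, and every block
    # has exactly one component, named o3_mu.
    if "o3_lambda" in key_names and "o3_sigma" in key_names:
        if len(blocks) >= 1 and all(list(c) == ["o3_mu"] for c in blocks):
            return "spherical"
        raise ValueError("spherical layout needs a single 'o3_mu' component per block")

    # Scalar and Cartesian layouts must have exactly one block.
    if len(blocks) != 1:
        raise ValueError("scalar and Cartesian layouts must have one block")

    components: List[str] = list(blocks[0])
    if not components:
        return "scalar"  # one block, no components

    # Cartesian: as many components as the rank, named xyz (rank 1)
    # or xyz_1, xyz_2, ... (higher ranks).
    rank = len(components)
    expected = ["xyz"] if rank == 1 else [f"xyz_{i}" for i in range(1, rank + 1)]
    if components == expected:
        return "cartesian"
    raise ValueError(f"unsupported component names: {components}")


# "_" is just a placeholder key name; it is not checked for these cases.
scalar = classify_layout(["_"], [[]])
rank2 = classify_layout(["_"], [["xyz_1", "xyz_2"]])
spherical = classify_layout(["o3_lambda", "o3_sigma"], [["o3_mu"], ["o3_mu"]])
```

A real implementation would read the key and component names from the layout TensorMap itself; the sketch only makes the naming rules explicit.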