Fine-tuning¶
Warning
This section of the documentation currently applies only to the PET model.
This section describes how to fine-tune a pre-trained model to adapt it to new tasks or datasets. Fine-tuning is a common technique in transfer learning, where a model is first trained on a large dataset and then fine-tuned on a smaller dataset to improve its performance on specific tasks. Fine-tuning is currently available only for the PET model.
Fine-Tuning PET Model with LoRA¶
Fine-tuning a PET model using LoRA (Low-Rank Adaptation) can improve the
model’s performance on specific tasks at a lower computational cost than
retraining all of its weights. Below are the steps to fine-tune a PET model
from metatrain.pet using LoRA.
What is LoRA?¶
LoRA (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) technique that adapts pre-trained models to new tasks by introducing low-rank matrices into the model’s architecture. This approach reduces the number of trainable parameters, making the fine-tuning process more efficient and less resource-intensive. LoRA is particularly useful when computational resources are limited or when quick adaptation to new tasks is required.
Given a pre-trained model with the weight matrix \(W_0\), LoRA introduces low-rank matrices \(A\) and \(B\) of rank \(r\) such that the new weight matrix \(W\) is computed as:
\[W = W_0 + \frac{\alpha}{r} A B\]
where \(\alpha\) is a regularization factor that controls the influence of the low-rank matrices on the model’s weights. A larger rank \(r\) gives the adapter more trainable parameters, while \(\alpha\) sets how strongly the low-rank update contributes to \(W\). Adjusting both lets you adapt the model to a specific task.
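As a rough illustration of the update described above, the following sketch builds the adapted weights in pure Python. It assumes the common LoRA convention \(W = W_0 + \frac{\alpha}{r} A B\), with \(A\) of shape (n, r) and \(B\) of shape (r, m). The helper names `matmul` and `lora_weights` and the toy matrices are hypothetical and are not part of metatrain.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    inner = len(Y)
    assert all(len(row) == inner for row in X), "inner dimensions must match"
    return [
        [sum(row[k] * Y[k][j] for k in range(inner)) for j in range(len(Y[0]))]
        for row in X
    ]


def lora_weights(W0, A, B, alpha):
    """Return W = W0 + (alpha / r) * A @ B (assumed LoRA convention)."""
    r = len(B)  # rank: number of columns of A, which equals the number of rows of B
    delta = matmul(A, B)  # low-rank update with the same shape as W0
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W0, delta)
    ]


# Toy example: 3x3 frozen weights with a rank-1 update (values are arbitrary).
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0], [2.0], [3.0]]  # shape (3, r) with r = 1
B = [[0.5, 0.0, -0.5]]     # shape (r, 3)

W = lora_weights(W0, A, B, alpha=2.0)
# W == [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0], [3.0, 0.0, -2.0]]
```

If \(B\) is all zeros, the update vanishes and `lora_weights` returns the original \(W_0\), so an adapter initialized that way starts from the pre-trained model's behavior.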
Prerequisites¶
1. Train the base model. You can train the base model using the command
mtt train options.yaml. Alternatively, you can use a pre-trained
foundation model if you have access to its checkpoint. After training,
you will find the checkpoint file, best_model.ckpt, in the training
directory.
2. Set the LoRA parameters in the architecture.training section of options.yaml:
architecture:
  training:
    LORA_RANK: <desired_rank>
    LORA_ALPHA: <desired_alpha>
    USE_LORA_PEFT: True
These parameters control whether to use LoRA for pre-trained model fine-tuning
(USE_LORA_PEFT), the rank of the low-rank matrices introduced by LoRA
(LORA_RANK), and the regularization factor for the low-rank matrices
(LORA_ALPHA).
3. Run mtt train options.yaml -c best_model.ckpt to fine-tune the model.
The -c flag specifies the path to the pre-trained model checkpoint.
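The configuration step above can be sketched as a small script that writes out the LoRA block with its nesting. This is only an illustration: `render_lora_options` is a hypothetical helper, not part of metatrain, and the rank and alpha values are arbitrary placeholders. The fragment covers only the keys named in this section, so it would need to be merged into a complete options.yaml.

```python
def render_lora_options(rank, alpha, use_lora=True):
    """Return the architecture.training LoRA settings as YAML text.

    Hypothetical helper for illustration; it only formats the three keys
    described in this section.
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    lines = [
        "architecture:",
        "  training:",
        f"    LORA_RANK: {rank}",
        f"    LORA_ALPHA: {alpha}",
        f"    USE_LORA_PEFT: {use_lora}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values; choose the rank and alpha that suit your dataset.
fragment = render_lora_options(rank=4, alpha=0.5)
print(fragment)
```

After merging such a fragment into options.yaml, fine-tuning is started with the command from the last step, mtt train options.yaml -c best_model.ckpt.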
Fine-Tuning Options¶
When USE_LORA_PEFT is set to True, the original model’s weights are
frozen, and only the adapter layers introduced by LoRA are trained. This
allows for efficient fine-tuning with fewer parameters. If USE_LORA_PEFT is
set to False, all the weights of the model are trained. This can lead to
better performance on the specific task but may require more computational
resources, and the model may overfit to the new data and lose accuracy on
the original training set.
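To get a feel for the difference between the two settings, the sketch below counts trainable parameters for a single weight matrix. It assumes the adapter consists of one matrix \(A\) of shape (d_out, r) and one matrix \(B\) of shape (r, d_in). The layer size is a made-up example and does not reflect PET's actual architecture or which of its layers receive adapters.

```python
def trainable_parameters(d_out, d_in, rank=None):
    """Count trainable entries for one d_out x d_in weight matrix.

    rank=None models full fine-tuning (every weight is trained).
    Otherwise only the LoRA factors A (d_out x rank) and B (rank x d_in)
    are trained and the original matrix stays frozen.
    """
    if rank is None:
        return d_out * d_in
    return rank * (d_out + d_in)


# Hypothetical 256 x 256 layer with a rank-4 adapter.
full = trainable_parameters(256, 256)          # 65536 weights
lora = trainable_parameters(256, 256, rank=4)  # 2048 weights
print(f"full fine-tuning: {full}, LoRA: {lora}, ratio: {full // lora}x")
```

In this toy setting the adapter trains 32 times fewer parameters than full fine-tuning. The saving shrinks as the rank grows, which is the trade-off controlled by LORA_RANK.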