Image by author | DALL-E 3
It has always been tedious to train large language models. Even with extensive support from public platforms like Hugging Face, the process involves setting up a different script for each stage of the pipeline. From preparing data for pre-training, fine-tuning, or RLHF to configuring the model for quantization and LoRA, training an LLM requires laborious manual effort and tuning.
The recent launch of LLaMA-Factory in 2024 aims to solve exactly this problem. The GitHub repository makes setting up model training for all stages of an LLM lifecycle extremely convenient. From pre-training to SFT and even RLHF, the repository provides built-in support for setting up and training the latest available LLMs.
Supported models and data formats
The repository supports all recent models, including Llama, LLaVA, Mixtral Mixture-of-Experts, Qwen, Phi, and Gemma, among others; the full list can be found in the repository. It supports pre-training, SFT, and major RL techniques including DPO, PPO, and ORPO, along with all the latest tuning methodologies from full and frozen tuning to LoRA, QLoRA, and agent tuning.
Furthermore, they also provide sample datasets for each training step. The sample datasets generally follow the Alpaca template, although the ShareGPT format is also supported. We highlight the Alpaca data format below to better understand how to set up your own data.
Please note that when using your own data, you must edit and add information about your data file in the dataset_info.json file in the LLaMA-Factory/data folder.
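For example, a minimal entry for a custom Alpaca-format file might look like the sketch below; the dataset name my_dataset, the file my_data.json, and the column mapping are hypothetical placeholders for your own data:

"my_dataset": {
  "file_name": "my_data.json",
  "formatting": "alpaca",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output"
  }
}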
Pre-training data
The provided data is saved in a JSON file and only the text column is used to train the LLM. The data must be in the format below to set up pre-training.
[
  {"text": "document"},
  {"text": "document"}
]
Supervised fine-tuning data
SFT data has three required parameters: instruction, input, and output. Additionally, system and history can be passed optionally and, if provided in the dataset, will be used to train the model accordingly.
The general Alpaca format for SFT data is as follows:
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["human instruction in the first round (optional)", "model response in the first round (optional)"],
      ["human instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]
Reward modeling data
Llama-Factory provides support for training an LLM for preference alignment using RLHF. The dataset should provide two different responses to the same instruction, highlighting which one is preferred.
The preferred response is passed to the chosen key and the less preferred response to the rejected key. The data format is as follows:
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "chosen": "chosen answer (required)",
    "rejected": "rejected answer (required)"
  }
]
Configuration and installation
The GitHub repository provides support for easy installation using a setup.py and requirements file. However, it is recommended to use a clean Python environment when setting up the repository to avoid dependency and package conflicts.
Although Python 3.8 is the minimum requirement, it is recommended to install Python 3.11 or above. Clone the GitHub repository using the following commands:
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
Now we can create a new Python environment using the following commands:
python3.11 -m venv venv
source venv/bin/activate
Now we need to install the required packages and dependencies using the setup.py file. We can install them using the following command:
pip install -e ".[torch,metrics]"
This will install all the necessary dependencies, including torch, trl, accelerate, and other packages. To confirm the installation was successful, we should now be able to use the llamafactory-cli command-line interface. Running the following command should print the usage help information in the terminal:
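llamafactory-cli help
If the command is not found, make sure the virtual environment is still activated.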
Fine-tuning an LLM
We can now start training an LLM! This is as easy as writing a configuration file and invoking a bash command.
Please note that a GPU is a must for training an LLM with Llama-Factory.
We chose a smaller model to save on GPU memory and training resources. In this example, we will perform LoRA-based SFT on Phi-3.5-mini-instruct. We chose to create a YAML configuration file, but you can also use a JSON file.
Create a new config.yaml file as follows. This configuration file is for SFT training and you can find more examples of various methods in the examples directory.
### model
model_name_or_path: microsoft/Phi-3.5-mini-instruct
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: alpaca_en_demo
template: phi  # the chat template should match the base model; phi is assumed for Phi-3.5
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/phi-3/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
Although the file is mostly self-explanatory, we need to focus on two important parts of the configuration.
Setting up the dataset for training
First, the name of the dataset is a key parameter. Details about the dataset must be added to the dataset_info.json file in the data directory before training, including the actual path of the data file, the data format it follows, and the columns to be used from the data.
For this tutorial, we use the alpaca_en_demo dataset, which contains English instruction-and-response pairs. You can view the full dataset in the repository's data folder.
The data will then be loaded automatically based on the information provided. Additionally, the dataset key accepts a comma-separated list of values; given a list, all the datasets will be loaded and used to train the LLM.
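For example, assuming both entries are defined in dataset_info.json, the following would train on the demo dataset together with the identity sample dataset that ships with the repository:

dataset: alpaca_en_demo,identity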
Setting up model training
Changing the type of training in Llama-Factory is as easy as changing a configuration parameter. As shown below, we only need the following parameters to set up LoRA-based SFT for the LLM.
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
We can replace SFT with pre-training or reward modeling; exact configuration files for each method are available in the examples directory. You can easily switch from SFT to reward modeling by changing these parameters, as shown below.
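Here, as a sketch, only the stage key changes; note that the dataset key must then point to preference-format data, such as the dpo_en_demo sample shipped with the repository.

### method
stage: rm
do_train: true
finetuning_type: lora
lora_target: all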
Start training an LLM
Now we have everything configured. All that's left is to invoke a bash command passing the configuration file as command line input.
Invoke the following command:
llamafactory-cli train config.yaml
The program will automatically configure all the datasets, models, and pipelines needed for training. It took me 10 minutes to train one epoch on a Tesla T4 GPU. The output model is saved in the output_dir provided in config.yaml.
Inference
Inference is even easier than training a model. We need a configuration file, similar to the training one, that provides the base model and the path to the trained LoRA adapter.
Create a new infer_config.yaml file with the following keys:
model_name_or_path: microsoft/Phi-3.5-mini-instruct
adapter_name_or_path: saves/phi-3/lora/sft  # path to the trained LoRA adapter
template: phi  # must match the template used during training
finetuning_type: lora
We can chat with the trained model directly on the command line with this command:
llamafactory-cli chat infer_config.yaml
This will load the model with the trained adapter, and you can chat with it directly from the command line, similar to other packages like Ollama.
The following image shows an example of a response in the terminal:
Inference result
Web user interface
If that wasn't simple enough, Llama-Factory provides a no-code training and inference option via LlamaBoard.
You can start the GUI using the bash command:
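llamafactory-cli webui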
This starts a web-based GUI on localhost, as shown in the image below. We can choose the model and training parameters, load and preview the dataset, set hyperparameters, train, and run inference, all from the GUI.
Screenshot of the LlamaBoard web interface
Conclusion
Llama-Factory is quickly becoming popular, with over 30K stars on GitHub. It makes it considerably easier to set up and train an LLM from scratch, eliminating the need to manually configure the training process for multiple methods.
It supports all the latest models and training methods, and it claims to be 3.7 times faster than ChatGLM's P-Tuning while using less GPU memory. This makes it easy for regular users and enthusiasts to train their own LLMs with minimal code.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She is the co-author of the eBook “Maximize Productivity with ChatGPT.” As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is a passionate advocate for change and founded FEMCodes to empower women in STEM fields.