In this tutorial, we demonstrate how to efficiently fine-tune the Llama-2-7b chat model for Python code generation using advanced techniques such as QLoRA, gradient checkpointing, and supervised fine-tuning with the SFTTrainer. Leveraging the Alpaca-14k dataset, we walk through setting up the environment, configuring LoRA parameters, and applying memory optimization strategies to train a model that excels at generating high-quality Python code. This step-by-step guide is designed for practitioners who want to harness the power of LLMs with minimal computational overhead.
!pip install -q accelerate
!pip install -q peft
!pip install -q transformers
!pip install -q trl
First, we install the libraries required for our project: accelerate, peft, transformers, and trl from the Python Package Index. The -q flag (quiet mode) keeps the output minimal.
import os
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
Import the essential modules for our training setup. These include utilities for dataset loading, the model/tokenizer classes, training arguments, logging, the LoRA configuration, and the SFTTrainer.
# The model to train from the Hugging Face hub
model_name = "NousResearch/llama-2-7b-chat-hf"
# The instruction dataset to use
dataset_name = "user/minipython-Alpaca-14k"
# Fine-tuned model name
new_model = "/kaggle/working/llama-2-7b-codeAlpaca"
We specify the base model from the Hugging Face Hub, the instruction dataset to use, and the name of the new model.
# QLoRA parameters
# LoRA attention dimension
lora_r = 64
# Alpha parameter for LoRA scaling
lora_alpha = 16
# Dropout probability for LoRA layers
lora_dropout = 0.1
Define the LoRA parameters for our model: `lora_r` sets the LoRA attention dimension (the rank of the adapter matrices), `lora_alpha` scales the LoRA updates, and `lora_dropout` controls the dropout probability applied to the LoRA layers.
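As a quick sanity check (not part of the original code), the standard LoRA convention scales the adapter update by alpha / r, so these values give 16 / 64 = 0.25:
# Effective LoRA scaling under the standard alpha / r convention (illustrative only)
lora_scaling = lora_alpha / lora_r
print(f"LoRA scaling factor: {lora_scaling}")  # 0.25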
# TrainingArguments parameters
# Output directory where the model predictions and checkpoints will be stored
output_dir = "/kaggle/working/llama-2-7b-codeAlpaca"
# Number of training epochs
num_train_epochs = 1
# Enable fp16 training (set to True for mixed precision training)
fp16 = True
# Batch size per GPU for training
per_device_train_batch_size = 8
# Batch size per GPU for evaluation
per_device_eval_batch_size = 8
# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 2
# Enable gradient checkpointing
gradient_checkpointing = True
# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3
# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4
# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001
# Optimizer to use
optim = "adamw_torch"
# Learning rate schedule
lr_scheduler_type = "constant"
# Group sequences into batches with the same length
# Saves memory and speeds up training considerably
group_by_length = True
# Ratio of steps for a linear warmup
warmup_ratio = 0.03
# Save checkpoint every x updates steps
save_steps = 100
# Log every x updates steps
logging_steps = 10
These parameters configure the training process. They include the output path, number of epochs, precision (fp16), batch sizes, gradient accumulation, and checkpointing. Additional settings such as the learning rate, optimizer, and scheduler help tune training behavior, while the warmup and logging settings control how training ramps up and how we monitor progress.
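For reference, the effective batch size per optimizer step is the per-device batch size times the gradient accumulation steps (times the number of GPUs); with the values above that is 8 × 2 = 16 on a single GPU. A small illustrative check, not part of the original code:
# Illustrative: effective batch size per optimizer update on a single GPU
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(f"Effective batch size: {effective_batch_size}")  # 16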
import torch
print("PyTorch Version:", torch.__version__)
print("CUDA Version:", torch.version.cuda)
We import PyTorch and print both the installed PyTorch version and the corresponding CUDA version.
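The GPU status cell referenced next is not captured in the extracted text; presumably it was `!nvidia-smi` (an assumption, included here as a minimal placeholder for a notebook environment with an NVIDIA GPU):
# Display driver version, CUDA version, and current GPU utilization
!nvidia-smi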
This command shows the GPU information, including the driver version, the CUDA version, and current GPU utilization.
# SFT parameters
# Maximum sequence length to use
max_seq_length = None
# Pack multiple short examples in the same input sequence to increase efficiency
packing = False
# Load the entire model on the GPU 0
device_map = {"": 0}
Define the SFT parameters: the maximum sequence length, whether to pack multiple short examples into a single input sequence, and a device map that loads the entire model onto GPU 0.
# Load dataset
dataset = load_dataset(dataset_name, split="train")
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
# Load base model in half precision (fp16)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Prepare model for training
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
We load our dataset and tokenizer, set the padding token to the end-of-sequence token with right-side padding, and load the base model in fp16. Finally, we enable gradient checkpointing and ensure the model's inputs require gradients for training.
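Note that the code above loads the base model in fp16 rather than in a quantized form. To follow the QLoRA recipe more literally, a `BitsAndBytesConfig` can be passed when loading the model; a minimal sketch, assuming the bitsandbytes package is installed (this is not part of the original code):
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training
# 4-bit NF4 quantization config in the style of the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
# Alternative load of the base model with quantization enabled
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
# Prepare the quantized model for k-bit training (handles gradient checkpointing and input grads)
quantized_model = prepare_model_for_kbit_training(quantized_model)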
from peft import get_peft_model
Import `get_peft_model`, which wraps our base model with parameter-efficient fine-tuning (PEFT) adapters.
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
# Apply LoRA to the model
model = get_peft_model(model, peft_config)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    max_grad_norm=max_grad_norm,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
We configure LoRA and apply it to our model using `LoraConfig` and `get_peft_model`. We then create `TrainingArguments` for model training, specifying the epoch count, batch sizes, and optimization settings. Finally, we set up the `SFTTrainer`, passing it the model, dataset, tokenizer, and training arguments.
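As an optional sanity check (not shown in the original post), peft models expose `print_trainable_parameters`, which confirms that only the small LoRA adapter weights are trainable while the base model stays frozen:
# Report how many parameters the LoRA adapters make trainable vs. the frozen base model
model.print_trainable_parameters()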
# Train model
trainer.train()
# Save trained model
trainer.model.save_pretrained(new_model)
We start the supervised fine-tuning process (`trainer.train()`) and then save the trained LoRA adapter to the specified directory.
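Optionally (not part of the original code), the tokenizer can be saved alongside the adapter so the output directory is self-contained:
# Save the tokenizer next to the LoRA adapter weights
tokenizer.save_pretrained(new_model)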
# Run text generation pipeline with the fine-tuned model
prompt = "How can I write a Python program that calculates the mean, standard deviation, and coefficient of variation of a dataset from a CSV file?"
pipe = pipeline(task="text-generation", model=trainer.model, tokenizer=tokenizer, max_length=400)
result = pipe(f"(INST) {prompt} (/INST)")
print(result(0)('generated_text'))
We create a text-generation pipeline with our fine-tuned model and tokenizer, then provide a prompt, generate text with the pipeline, and print the output.
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("HF_TOKEN")
We access Kaggle Secrets to retrieve a stored Hugging Face token (`HF_TOKEN`). This token is used for authentication with the Hugging Face Hub.
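One common way to use this token (not shown in the extracted post) is to authenticate the session with the Hub so that gated models can be downloaded or the fine-tuned model can be pushed later; a minimal sketch:
from huggingface_hub import login
# Authenticate with the Hugging Face Hub using the token retrieved from Kaggle Secrets
login(token=secret_value_0)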
# Empty VRAM
# del model
# del pipe
# del trainer
# del dataset
del tokenizer
import gc
gc.collect()
gc.collect()
torch.cuda.empty_cache()
The snippet above shows how to free GPU memory by deleting references and clearing caches. We delete the tokenizer, run garbage collection, and empty the CUDA cache to reduce VRAM usage.
import torch
# Check the number of GPUs available
num_gpus = torch.cuda.device_count()
print(f"Number of GPUs available: {num_gpus}")
# Check if CUDA device 1 is available
if num_gpus > 1:
print("cuda:1 is available.")
else:
print("cuda:1 is not available.")
We import PyTorch and check how many GPUs are detected, print the count, and conditionally report whether the GPU with ID 1 (`cuda:1`) is available.
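A slightly more defensive variant (an illustrative sketch, not in the original) would fall back to the first GPU when only one device is present:
# Pick cuda:1 when a second GPU exists, otherwise fall back to cuda:0
device_id = 1 if num_gpus > 1 else 0
device = f"cuda:{device_id}"
print(f"Using device: {device}")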
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Specify the device ID for your desired GPU (e.g., 0 for the first GPU, 1 for the second GPU)
device_id = 1 # Change this based on your available GPUs
device = f"cuda:{device_id}"
# Load the base model on the specified GPU
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",  # Use auto to load on the available device
)
# Load the LoRA weights
lora_model = PeftModel.from_pretrained(base_model, new_model)
# Move LoRA model to the specified GPU
lora_model.to(device)
# Merge the LoRA weights with the base model weights
model = lora_model.merge_and_unload()
# Ensure the merged model is on the correct device
model.to(device)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
We select a GPU device (device ID 1) and load the base model with the specified precision and memory optimizations. We then load the LoRA weights and merge them into the base model, ensuring the merged model is moved to the designated GPU. Finally, we load the tokenizer and configure it with the appropriate padding settings.
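A natural next step, not shown in the extracted text, is to save or publish the merged model, for example using the `HF_TOKEN` retrieved earlier (the repository name below is purely hypothetical):
# Hypothetical repository name; replace with your own Hub namespace
merged_repo = "your-username/llama-2-7b-codeAlpaca-merged"
model.push_to_hub(merged_repo, token=secret_value_0)
tokenizer.push_to_hub(merged_repo, token=secret_value_0)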
In conclusion, by following this tutorial we have successfully fine-tuned the Llama-2-7b chat model to specialize in generating Python code. The integration of QLoRA, gradient checkpointing, and the SFTTrainer demonstrates a practical approach to managing resource constraints while achieving strong performance.
Download the Colab notebook here. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and join our Telegram channel and LinkedIn group. Don't forget to join our 75k+ ML SubReddit.
Marktechpost is inviting AI companies, startups, and groups to partner for its upcoming AI magazines on 'Open Source AI in Production' and 'Agentic AI'.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a broad audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.