LLM Finetuning with Hugging Face
This repository contains scripts for finetuning language models using Hugging Face's transformers library.
Setup
- Install the required dependencies:
pip install -r requirements.txt
- Make sure you have enough GPU memory for finetuning. For smaller models like OPT-350M, 8GB should be sufficient. For larger models, you may need more.
Finetuning a Model
The finetune_model.py script allows you to finetune a language model using a JSON dataset containing prompts and completions.
Basic Usage
python finetune_model.py
This will use the default settings:
- Dataset: datasets/adriana_finetune_dataset.json
- Model: facebook/opt-350m (a more capable model than GPT-2)
- Output directory: finetuned_model
- Training epochs: 3
- Batch size: 4
- Learning rate: 5e-5
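To get a feel for how long a run with these defaults takes, the number of optimizer steps can be estimated from the dataset size. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept; `estimate_training_steps` and the 200-example dataset size are hypothetical and not taken from finetune_model.py.

```python
import math


def estimate_training_steps(num_examples: int, batch_size: int = 4, epochs: int = 3) -> int:
    """Estimate optimizer steps for a run (single device, no gradient accumulation)."""
    if num_examples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("num_examples, batch_size and epochs must be positive")
    # The last partial batch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical dataset of 200 prompt/completion pairs with the default settings.
print(estimate_training_steps(200))  # 50 steps per epoch * 3 epochs = 150
```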
Model Options
The script uses facebook/opt-350m by default. To try a different model, change model_name in the Args class. Some good options:
- Smaller models (faster training, less memory):
  - facebook/opt-125m (125M parameters)
  - EleutherAI/pythia-70m (70M parameters)
  - facebook/opt-350m (350M parameters)
- Medium models (better quality, more memory):
  - facebook/opt-1.3b (1.3B parameters)
  - EleutherAI/pythia-1.4b (1.4B parameters)
  - facebook/opt-2.7b (2.7B parameters)
- Large models (best quality, requires significant memory):
  - facebook/opt-6.7b (6.7B parameters)
  - EleutherAI/pythia-6.9b (6.9B parameters)
For very large models, it's recommended to use LoRA by setting use_lora = True in the Args class.
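As a rough guide to which tier fits your GPU, full finetuning in 32-bit precision with an Adam-style optimizer needs about 16 bytes per parameter (4 for weights, 4 for gradients, 8 for the two optimizer moment buffers), before counting activations. The sketch below applies that rule of thumb. The function name and the fp32-with-Adam assumption are illustrative, since this README does not state which optimizer or precision finetune_model.py uses.

```python
def full_finetune_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Rough memory for weights + gradients + Adam state in fp32, excluding activations."""
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the model names listed above.
for name, params in [
    ("facebook/opt-125m", 125e6),
    ("facebook/opt-350m", 350e6),
    ("facebook/opt-1.3b", 1.3e9),
    ("facebook/opt-6.7b", 6.7e9),
]:
    print(f"{name}: ~{full_finetune_memory_gb(params):.1f} GB")
```

Under these assumptions opt-350m comes to roughly 5.6 GB, which is consistent with the 8GB guidance in Setup, while opt-1.3b and larger go well beyond that unless LoRA or quantization is used.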
Advanced Usage
You can customize the finetuning process by modifying the Args class in the finetune_model.py file:
class Args:
    def __init__(self):
        self.dataset_path = "datasets/adriana_finetune_dataset.json"
        self.model_name = "facebook/opt-1.3b"  # Change to a different model
        self.output_dir = "finetuned_model"
        self.num_train_epochs = 5  # Increase epochs for better results
        self.per_device_train_batch_size = 2  # Adjust based on your GPU memory
        self.learning_rate = 3e-5  # Adjust learning rate
        self.use_lora = True  # Enable LoRA for efficient finetuning
Using LoRA for Efficient Finetuning
LoRA (Low-Rank Adaptation) is a technique that allows for efficient finetuning of large language models by only training a small number of parameters. This is especially useful when you have limited computational resources.
To use LoRA, simply set self.use_lora = True in the Args class:
class Args:
    def __init__(self):
        # ... other settings ...
        self.use_lora = True  # Enable LoRA
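To see why LoRA trains so few parameters: for a weight matrix of shape d x k, LoRA freezes the original weights and learns two low-rank factors of shapes d x r and r x k, so it trains r(d + k) values instead of d*k. The sketch below works through that arithmetic. The 2048 x 2048 shape and rank 8 are illustrative assumptions, not values read from finetune_model.py.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the two LoRA factors (d x r and r x k) for one d x k weight matrix."""
    return r * (d + k)


d, k, r = 2048, 2048, 8  # hypothetical layer shape and LoRA rank
full = d * k
lora = lora_trainable_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# full: 4,194,304  lora: 32,768  ratio: 0.78%
```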
Dataset Format
The script expects a JSON file with the following format:
[
  {
    "prompt": "Your prompt here",
    "completion": "Your completion here"
  },
  ...
]
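Before starting a long run it can help to check that a dataset file matches this format. The following is a minimal sketch using only the standard library; `load_prompt_completion_pairs` is a hypothetical helper, not part of finetune_model.py, and the requirement that both fields be non-empty strings is an assumption.

```python
import json
from pathlib import Path


def load_prompt_completion_pairs(path: str) -> list[dict]:
    """Load a JSON dataset and check each record has string 'prompt' and 'completion' fields."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("dataset must be a JSON array of objects")
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not an object")
        for key in ("prompt", "completion"):
            value = record.get(key)
            # Assumption: empty or whitespace-only strings are treated as invalid.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i} has a missing or empty '{key}'")
    return records
```

Calling `load_prompt_completion_pairs("datasets/adriana_finetune_dataset.json")` returns the list of records, or raises a ValueError naming the first record that does not fit.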
Using the Finetuned Model
After finetuning, you can use the model with the Hugging Face transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and tokenizer
model_path = "finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate text (max_length counts the prompt tokens plus the generated tokens)
prompt = "Create a welcome message for new clients"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, num_return_sequences=1)

# Decode the first (and only) returned sequence
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
Testing the Model
You can test the finetuned model using the test_model.py script:
python test_model.py
This will load the finetuned model from the finetuned_model directory and generate text for the default prompt. You can modify the prompt in the test_model.py file to test different inputs.
Troubleshooting
- If you encounter CUDA out of memory errors, try:
  - Using a smaller model
  - Reducing the batch size
  - Using LoRA by setting self.use_lora = True
- For very large models, consider using 8-bit quantization with load_in_8bit=True when loading the model (this requires the bitsandbytes package)
- If you're finetuning on a CPU, the process will be much slower. Consider using a smaller model or fewer epochs