
LLM Finetuning with Hugging Face

This repository contains scripts for finetuning language models using Hugging Face's transformers library.

Setup

  1. Install the required dependencies:
pip install -r requirements.txt
  2. Make sure you have enough GPU memory for finetuning. For smaller models like OPT-350M, 8GB should be sufficient. For larger models, you may need more.
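
To see what hardware is available before starting, you can query PyTorch directly. This is a minimal sketch, assuming PyTorch is among the dependencies in requirements.txt (this README does not list them); describe_device is a hypothetical helper name and is not part of the repository's scripts.

```python
def describe_device():
    """Return a short description of the device finetuning would run on."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        total_gib = props.total_memory / 1024**3
        return f"cuda: {props.name}, {total_gib:.1f} GiB"
    return "cpu"


print(describe_device())
```

If this reports cpu, training will still run but much more slowly.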

Finetuning a Model

The finetune_model.py script allows you to finetune a language model using a JSON dataset containing prompts and completions.

Basic Usage

python finetune_model.py

This will use the default settings:

  • Dataset: datasets/adriana_finetune_dataset.json
  • Model: facebook/opt-350m (a more capable model than GPT-2)
  • Output directory: finetuned_model
  • Training epochs: 3
  • Batch size: 4
  • Learning rate: 5e-5

Model Options

The script uses facebook/opt-350m by default. To use a different model, change model_name in the Args class. Some good options:

  • Smaller models (faster training, less memory):

    • facebook/opt-125m (125M parameters)
    • EleutherAI/pythia-70m (70M parameters)
    • facebook/opt-350m (350M parameters)
  • Medium models (better quality, more memory):

    • facebook/opt-1.3b (1.3B parameters)
    • EleutherAI/pythia-1.4b (1.4B parameters)
    • facebook/opt-2.7b (2.7B parameters)
  • Large models (best quality, requires significant memory):

    • facebook/opt-6.7b (6.7B parameters)
    • EleutherAI/pythia-6.9b (6.9B parameters)

For very large models, enabling LoRA by setting self.use_lora = True in the Args class is recommended.
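
The parameter counts above translate into a rough memory floor for full finetuning. The sketch below assumes fp32 weights, fp32 gradients, and an Adam-style optimizer with two state buffers per parameter (16 bytes per parameter in total); this README does not state which optimizer or precision finetune_model.py uses, and activation memory is not counted. full_finetune_memory_gb is a hypothetical helper.

```python
def full_finetune_memory_gb(num_params, bytes_per_param=16):
    """Rough lower bound in GB (10^9 bytes) for weights, gradients and optimizer state.

    16 bytes/param = 4 (fp32 weights) + 4 (fp32 gradients) + 8 (two Adam moment buffers).
    Activations and framework overhead are excluded, so real usage is higher.
    """
    return num_params * bytes_per_param / 1e9


models = [
    ("facebook/opt-125m", 125_000_000),
    ("facebook/opt-350m", 350_000_000),
    ("facebook/opt-1.3b", 1_300_000_000),
    ("facebook/opt-6.7b", 6_700_000_000),
]
for name, params in models:
    print(f"{name}: ~{full_finetune_memory_gb(params):.1f} GB")
```

Under these assumptions opt-350m needs about 5.6 GB before activations, while opt-6.7b needs over 100 GB, which is why LoRA is suggested for the larger models.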

Advanced Usage

You can customize the finetuning process by modifying the Args class in the finetune_model.py file:

class Args:
    def __init__(self):
        self.dataset_path = "datasets/adriana_finetune_dataset.json"
        self.model_name = "facebook/opt-1.3b"  # Change to a different model
        self.output_dir = "finetuned_model"
        self.num_train_epochs = 5  # Increase epochs for better results
        self.per_device_train_batch_size = 2  # Adjust based on your GPU memory
        self.learning_rate = 3e-5  # Adjust learning rate
        self.use_lora = True  # Enable LoRA for efficient finetuning
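
Changing the batch size and the number of epochs also changes how many optimizer steps a run takes. The arithmetic can be sketched as follows, assuming a single GPU, no gradient accumulation, and a final partial batch that is kept; the dataset size of 200 examples is a made-up figure for illustration.

```python
import math


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Number of optimizer updates for one run (single device, no accumulation)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


num_examples = 200  # hypothetical dataset size
print(total_optimizer_steps(num_examples, batch_size=4, epochs=3))  # default settings: 150
print(total_optimizer_steps(num_examples, batch_size=2, epochs=5))  # settings shown above: 500
```

Halving the batch size doubles the steps per epoch, so a lower learning rate such as 3e-5 is a reasonable companion change.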

Using LoRA for Efficient Finetuning

LoRA (Low-Rank Adaptation) is a technique for efficient finetuning of large language models: the original weights stay frozen, and only a small set of added low-rank matrices is trained. This is especially useful when you have limited computational resources.

To use LoRA, simply set self.use_lora = True in the Args class:

class Args:
    def __init__(self):
        # ... other settings ...
        self.use_lora = True  # Enable LoRA
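
To make the parameter savings concrete, the sketch below counts trainable parameters for a single weight matrix. It uses the standard LoRA formulation, where the update to a d_out x d_in weight is factored into two matrices of rank r. The 2048 x 2048 shape and rank 8 are illustrative assumptions, not values taken from finetune_model.py.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 2048, 2048, 8  # illustrative values only
full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full matrix: {full} parameters")
print(f"LoRA factors: {lora} parameters ({100 * lora / full:.2f}% of full)")
```

For this shape, the LoRA factors hold under 1% of the parameters of the full matrix, so gradients and optimizer state shrink accordingly.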

Dataset Format

The script expects a JSON file with the following format:

[
  {
    "prompt": "Your prompt here",
    "completion": "Your completion here"
  },
  ...
]
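
Before training, it can help to check that a dataset file matches this format. The following is a minimal sketch using only the standard library; validate_dataset is a hypothetical helper, and the requirement that both fields be non-empty strings is an assumption rather than something the script is documented to enforce.

```python
import json


def validate_dataset(path):
    """Return a list of problems found; an empty list means the file looks valid."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            problems.append(f"record {i}: expected an object")
            continue
        for key in ("prompt", "completion"):
            value = record.get(key)
            # Assumption: both fields must be non-empty strings.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: '{key}' must be a non-empty string")
    return problems
```

Calling validate_dataset("datasets/adriana_finetune_dataset.json") returns an empty list when every record has both fields.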

Using the Finetuned Model

After finetuning, you can use the model with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and tokenizer
model_path = "finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate text
prompt = "Create a welcome message for new clients"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, num_return_sequences=1)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Testing the Model

You can test the finetuned model using the test_model.py script:

python test_model.py

This will load the finetuned model from the finetuned_model directory and generate text for the default prompt. You can modify the prompt in the test_model.py file to test different inputs.

Troubleshooting

  • If you encounter CUDA out of memory errors, try:
    • Using a smaller model
    • Reducing the batch size
    • Using LoRA by setting self.use_lora = True
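  • For very large models, consider using 8-bit quantization with load_in_8bit=True when loading the model. This requires the bitsandbytes package, and recent transformers releases expect the option to be passed through a BitsAndBytesConfig rather than as a direct argument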
  • If you're finetuning on a CPU, the process will be much slower. Consider using a smaller model or fewer epochs
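
To see why 8-bit loading helps, compare the memory needed just to hold the weights at different precisions. This is a back-of-the-envelope sketch: it counts weights only and ignores quantization overhead, activations, and any optimizer state. weight_memory_gb is a hypothetical helper.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory in GB (10^9 bytes) to store the weights alone at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 6_700_000_000  # facebook/opt-6.7b
for dtype in BYTES_PER_PARAM:
    print(f"opt-6.7b in {dtype}: ~{weight_memory_gb(num_params, dtype):.1f} GB")
```

Going from fp32 to int8 cuts the weight footprint to a quarter, from roughly 26.8 GB to 6.7 GB for this model.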