# LLM Finetuning with Hugging Face

This repository contains scripts for finetuning language models with Hugging Face's `transformers` library.

## Setup

1. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Make sure you have enough GPU memory for finetuning. For smaller models such as OPT-350M, 8 GB should be sufficient. Larger models need more.

## Finetuning a Model

The `finetune_model.py` script finetunes a language model on a JSON dataset of prompts and completions.

### Basic Usage

```bash
python finetune_model.py
```

This uses the default settings:

- Dataset: `datasets/adriana_finetune_dataset.json`
- Model: `facebook/opt-350m`
- Output directory: `finetuned_model`
- Training epochs: 3
- Batch size: 4
- Learning rate: 5e-5

### Model Options

The script uses `facebook/opt-350m` by default, which is larger than the base GPT-2 model (124M parameters). To use a different model, change `model_name` in the `Args` class. Some other options:

- **Smaller models** (faster training, less memory):
  - `EleutherAI/pythia-70m` (70M parameters)
  - `facebook/opt-125m` (125M parameters)
  - `facebook/opt-350m` (350M parameters, the default)
- **Medium models** (better quality, more memory):
  - `facebook/opt-1.3b` (1.3B parameters)
  - `EleutherAI/pythia-1.4b` (1.4B parameters)
  - `facebook/opt-2.7b` (2.7B parameters)
- **Large models** (best quality, require significant memory):
  - `facebook/opt-6.7b` (6.7B parameters)
  - `EleutherAI/pythia-6.9b` (6.9B parameters)

For very large models, it is recommended to enable LoRA by setting `use_lora = True` in the `Args` class.
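LoRA helps with large models because it freezes a weight matrix `W` and trains only a low-rank update `B @ A` in its place. The sketch below counts trainable parameters for one matrix under both approaches. It is an illustration of the arithmetic only, not code from `finetune_model.py`: the helper names, the layer count, hidden size, and rank are all hypothetical values chosen for the example.

```python
"""Back-of-the-envelope count of trainable parameters: full finetuning vs. LoRA.

Illustrative sketch only; the layer count, hidden size, and rank below are
hypothetical values, not read from finetune_model.py or any model config.
"""


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Parameters updated when a d_out x d_in weight matrix is finetuned directly."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters updated by a rank-`rank` LoRA adapter on the same matrix.

    LoRA keeps W frozen and learns delta_W = B @ A, with B of shape
    (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical setup: 24 layers, 2 adapted 1024x1024 projections per layer, rank 8.
    num_matrices = 24 * 2
    hidden, rank = 1024, 8
    full = num_matrices * full_trainable_params(hidden, hidden)
    lora = num_matrices * lora_trainable_params(hidden, hidden, rank)
    print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes, the adapters train 786,432 parameters instead of 50,331,648, about 1.6% as many. Both counts cover only the adapted matrices; in full finetuning every other weight in the model would be trained as well.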
### Advanced Usage

You can customize finetuning by editing the `Args` class in `finetune_model.py`. For example:

```python
class Args:
    def __init__(self):
        self.dataset_path = "datasets/adriana_finetune_dataset.json"
        self.model_name = "facebook/opt-1.3b"  # Switch to a different model
        self.output_dir = "finetuned_model"
        self.num_train_epochs = 5  # More epochs can improve results but may overfit small datasets
        self.per_device_train_batch_size = 2  # Lower this if you run out of GPU memory
        self.learning_rate = 3e-5  # Adjust the learning rate
        self.use_lora = True  # Enable LoRA for parameter-efficient finetuning
```

## Using LoRA for Efficient Finetuning

LoRA (Low-Rank Adaptation) makes finetuning large language models cheaper by freezing the pretrained weights and training only small low-rank update matrices added to selected layers. Because far fewer parameters are updated, it is especially useful when computational resources are limited.

To enable LoRA, set `self.use_lora = True` in the `Args` class:

```python
class Args:
    def __init__(self):
        # ... other settings ...
        self.use_lora = True  # Enable LoRA
```

## Dataset Format

The script expects a JSON file containing a list of objects, each with a `prompt` and a `completion` field:

```json
[
  {
    "prompt": "Your prompt here",
    "completion": "Your completion here"
  },
  ...
]
```

Here `...` stands for further entries of the same shape; it is a placeholder and must not appear in the actual file.

## Using the Finetuned Model

After finetuning, load the model with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and tokenizer
model_path = "finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate text (max_length counts the prompt tokens as well as the generated ones)
prompt = "Create a welcome message for new clients"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, num_return_sequences=1)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Testing the Model

You can test the finetuned model with the `test_model.py` script:

```bash
python test_model.py
```

This loads the finetuned model from the `finetuned_model` directory and generates text for the default prompt. To try other inputs, edit the prompt in `test_model.py`.

## Troubleshooting

- If you encounter CUDA out-of-memory errors, try:
  - Using a smaller model
  - Reducing the batch size
  - Enabling LoRA by setting `self.use_lora = True`
  - For very large models, loading the model with 8-bit quantization (`load_in_8bit=True`; in recent `transformers` versions, pass `quantization_config=BitsAndBytesConfig(load_in_8bit=True)` instead, which requires the `bitsandbytes` package)
- If you are finetuning on a CPU, the process will be much slower. Consider using a smaller model or fewer epochs.
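To see roughly why the out-of-memory tips above help, the following sketch estimates memory from parameter count alone. It relies on common rules of thumb, not measurements of this repository's scripts: 4, 2, or 1 bytes per weight in fp32, fp16, or int8, and about 16 bytes per parameter for full fp32 finetuning with Adam (weights, gradients, and two optimizer buffers). Activations, which grow with batch size and sequence length, are ignored, and the helper names are hypothetical.

```python
"""Rough GPU-memory arithmetic behind the out-of-memory tips above.

Assumptions (rules of thumb, not measurements of this repository's scripts):
- weights take 4 bytes/param in fp32, 2 in fp16, 1 in int8;
- full finetuning with Adam in fp32 takes ~16 bytes/param
  (4 weights + 4 gradients + 8 for Adam's two moment buffers);
- activations, CUDA overhead, and quantization metadata are ignored.
"""

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
ADAM_FP32_BYTES_PER_PARAM = 16


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Approximate memory (decimal GB) needed just to hold the weights."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def full_finetune_memory_gb(n_params: int) -> float:
    """Approximate memory (decimal GB) for full fp32 finetuning with Adam, excluding activations."""
    return n_params * ADAM_FP32_BYTES_PER_PARAM / 1e9


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names; real counts differ slightly.
    models = {
        "facebook/opt-350m": 350_000_000,
        "facebook/opt-1.3b": 1_300_000_000,
        "facebook/opt-6.7b": 6_700_000_000,
    }
    for name, n in models.items():
        weights = ", ".join(
            f"{dtype} {weight_memory_gb(n, dtype):.1f} GB" for dtype in BYTES_PER_PARAM
        )
        print(f"{name}: weights ({weights}); full finetune ~{full_finetune_memory_gb(n):.1f} GB")
```

Under these assumptions, full finetuning of OPT-350M needs about 5.6 GB before activations, consistent with the 8 GB guidance in Setup, while OPT-6.7B would need about 107 GB. Holding OPT-6.7B's weights in int8 takes about 6.7 GB versus 13.4 GB in fp16, which is why 8-bit loading can make a large model fit.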