# LLM Finetuning with Hugging Face

This repository contains scripts for finetuning language models using Hugging Face's `transformers` library.

## Setup

1. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Make sure you have enough GPU memory for finetuning. For smaller models such as OPT-350M, 8 GB should be sufficient. Larger models need more.

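
As a rough check on the memory guidance above, the sketch below estimates the memory needed for full finetuning in 32-bit precision with an Adam-style optimizer: about 16 bytes per parameter (4 for weights, 4 for gradients, 8 for the two optimizer moments). The helper name `estimate_training_memory_gib` is hypothetical, the optimizer and precision used by `finetune_model.py` are assumptions, and activation memory (which grows with batch size and sequence length) is not included.

```python
# Rough lower bound on GPU memory for full fp32 finetuning with Adam/AdamW.
# Assumption: 4 bytes weights + 4 bytes gradients + 8 bytes optimizer state.
BYTES_PER_PARAM_FP32_ADAM = 4 + 4 + 8


def estimate_training_memory_gib(num_params: float) -> float:
    """Approximate memory (GiB) for weights, gradients, and optimizer state."""
    return num_params * BYTES_PER_PARAM_FP32_ADAM / 2**30


# Nominal parameter counts taken from the model names.
for name, params in [("opt-350m", 350e6), ("opt-1.3b", 1.3e9), ("opt-6.7b", 6.7e9)]:
    gib = estimate_training_memory_gib(params)
    print(f"{name}: ~{gib:.1f} GiB before activations")
```

Under these assumptions OPT-350M needs a little over 5 GiB before activations, which fits the 8 GB guidance. The larger models quickly outgrow a single consumer GPU, which is where LoRA or quantization can help.
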
## Finetuning a Model

The `finetune_model.py` script finetunes a language model on a JSON dataset of prompts and completions.

### Basic Usage

```bash
python finetune_model.py
```

This will use the default settings:
- Dataset: `datasets/adriana_finetune_dataset.json`
- Model: `facebook/opt-350m` (a more capable model than GPT-2)
- Output directory: `finetuned_model`
- Training epochs: 3
- Batch size: 4
- Learning rate: 5e-5

### Model Options

The script uses `facebook/opt-350m` by default. To try a different model, change `model_name` in the `Args` class. Some options:

- **Smaller models** (faster training, less memory):
  - `EleutherAI/pythia-70m` (70M parameters)
  - `facebook/opt-125m` (125M parameters)
  - `facebook/opt-350m` (350M parameters)

- **Medium models** (better quality, more memory):
  - `facebook/opt-1.3b` (1.3B parameters)
  - `EleutherAI/pythia-1.4b` (1.4B parameters)
  - `facebook/opt-2.7b` (2.7B parameters)

- **Large models** (best quality, requires significant memory):
  - `facebook/opt-6.7b` (6.7B parameters)
  - `EleutherAI/pythia-6.9b` (6.9B parameters)

For the large models, enabling LoRA with `self.use_lora = True` in the `Args` class is recommended.

### Advanced Usage

You can customize the finetuning process by editing the `Args` class in `finetune_model.py`:

```python
class Args:
    def __init__(self):
        self.dataset_path = "datasets/adriana_finetune_dataset.json"
        self.model_name = "facebook/opt-1.3b"  # Change to a different model
        self.output_dir = "finetuned_model"
        self.num_train_epochs = 5  # More passes over the data (may overfit small datasets)
        self.per_device_train_batch_size = 2  # Adjust based on your GPU memory
        self.learning_rate = 3e-5  # Adjust learning rate
        self.use_lora = True  # Enable LoRA for efficient finetuning
```

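
The attribute names in `Args` match parameters of `transformers.TrainingArguments`, so one plausible way the script could consume them is sketched below. This is an assumption about `finetune_model.py`, not a description of it. The helper `to_training_kwargs` and the trimmed-down `Args` are hypothetical.

```python
class Args:
    """Trimmed copy of the settings class shown above (illustration only)."""

    def __init__(self):
        self.output_dir = "finetuned_model"
        self.num_train_epochs = 5
        self.per_device_train_batch_size = 2
        self.learning_rate = 3e-5


def to_training_kwargs(args: Args) -> dict:
    """Collect the fields that share a name with TrainingArguments parameters."""
    names = (
        "output_dir",
        "num_train_epochs",
        "per_device_train_batch_size",
        "learning_rate",
    )
    return {name: getattr(args, name) for name in names}


kwargs = to_training_kwargs(Args())
print(kwargs)
# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   training_args = TrainingArguments(**kwargs)
```

Keeping the mapping in one small function makes it easy to see which settings reach the trainer and which (such as `use_lora`) would be handled elsewhere.
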
## Using LoRA for Efficient Finetuning

LoRA (Low-Rank Adaptation) finetunes a large language model efficiently by training only a small number of additional parameters while the pretrained weights stay frozen. This is especially useful when you have limited computational resources.

To use LoRA, set `self.use_lora = True` in the `Args` class:

```python
class Args:
    def __init__(self):
        # ... other settings ...
        self.use_lora = True  # Enable LoRA
```

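
To make the parameter saving concrete: for a weight matrix of shape d x k, LoRA trains two low-rank factors of shapes d x r and r x k instead of the full matrix, so the trainable count drops from d * k to r * (d + k). The sketch below works this through for a single 1024 x 1024 projection with rank 8. Both numbers are illustrative assumptions, not values read from `finetune_model.py`.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for the LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


d, k, r = 1024, 1024, 8  # illustrative sizes, assumed for this example
full = full_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.1%}")
# prints "full: 1,048,576  lora: 16,384  ratio: 1.6%"
```

The ratio shrinks further as the matrices get larger, which is why LoRA is most attractive for the bigger models listed earlier.
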
## Dataset Format

The script expects a JSON file with the following format:

```json
[
  {
    "prompt": "Your prompt here",
    "completion": "Your completion here"
  },
  ...
]
```

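
Before starting a long training run it can help to check that a dataset file matches this shape. The following is a minimal sketch using only the standard library. The function name `validate_dataset` is hypothetical, and the rule that both fields must be non-empty strings is an assumption about what the script accepts.

```python
import json
import os
import tempfile


def validate_dataset(path: str) -> int:
    """Check that the file is a JSON list of {"prompt", "completion"} objects.

    Returns the number of records; raises ValueError on the first problem found.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not an object")
        for key in ("prompt", "completion"):
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i}: {key!r} must be a non-empty string")
    return len(data)


# Demo on a throwaway file so the example runs without the real dataset.
sample = [{"prompt": "Your prompt here", "completion": "Your completion here"}]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp)
count = validate_dataset(tmp.name)
os.remove(tmp.name)
print(f"{count} valid record(s)")
```

To check the real file, call `validate_dataset("datasets/adriana_finetune_dataset.json")` from the repository root.
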
## Using the Finetuned Model

After finetuning, you can use the model with the Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and tokenizer
model_path = "finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate text
prompt = "Create a welcome message for new clients"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, num_return_sequences=1)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Testing the Model

You can test the finetuned model using the `test_model.py` script:

```bash
python test_model.py
```

This will load the finetuned model from the `finetuned_model` directory and generate text for the default prompt. You can modify the prompt in the `test_model.py` file to test different inputs.

## Troubleshooting

- If you encounter CUDA out of memory errors, try:
  - Using a smaller model
  - Reducing the batch size (`per_device_train_batch_size`)
  - Using LoRA by setting `self.use_lora = True`
- For very large models, consider using 8-bit quantization with `load_in_8bit=True` when loading the model. This requires the `bitsandbytes` package.
- If you're finetuning on a CPU, the process will be much slower. Consider using a smaller model or fewer epochs.

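
For the 8-bit option, a minimal loading sketch is shown below. It assumes the `bitsandbytes` and `accelerate` packages are installed alongside `transformers` and that a CUDA GPU is available. The helper names are hypothetical, and whether `finetune_model.py` exposes such a setting is not covered here. Storing weights in 8 bits takes about 1 byte per parameter, compared with 4 bytes in fp32.

```python
def weight_memory_gib(num_params: float, bits: int) -> float:
    """Approximate memory (GiB) for the model weights alone at a given precision."""
    return num_params * bits / 8 / 2**30


def load_model_8bit(model_name: str):
    """Load a causal LM with 8-bit weights (needs bitsandbytes, accelerate, a CUDA GPU)."""
    # Imported lazily so the memory helper above works without these packages.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on the available devices
    )


print(
    f"opt-6.7b weights: fp32 ~{weight_memory_gib(6.7e9, 32):.0f} GiB, "
    f"int8 ~{weight_memory_gib(6.7e9, 8):.0f} GiB"
)
```

A call such as `model = load_model_8bit("facebook/opt-6.7b")` would then return the quantized model. Quantized weights are usually kept frozen during finetuning, so this approach is typically combined with LoRA rather than used on its own.
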