FINETUNE_README.md

# LLM Finetuning with Hugging Face

This repository contains scripts for finetuning language models using Hugging Face's transformers library.

## Setup

1. Install the required dependencies:

```bash
pip install -r requirements.txt
```

2. Make sure your GPU has enough memory for finetuning. For smaller models such as OPT-350M, 8 GB should be sufficient; larger models need more.
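
As a rough guide to the memory requirement in step 2, the sketch below estimates the memory taken by weights, gradients, and optimizer state during full finetuning. It assumes 32-bit weights and an Adam-style optimizer (about 16 bytes per parameter) and ignores activations, so treat the result as a lower bound. The function name and the assumption about the optimizer are illustrative and not taken from `finetune_model.py`.

```python
def estimate_full_finetune_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Rough memory for weights, gradients, and Adam state, excluding activations.

    16 bytes/param assumes fp32: 4 (weights) + 4 (gradients) + 8 (two Adam moments).
    """
    return num_params * bytes_per_param / 1024**3


# Parameter counts are the nominal sizes from the model names.
for name, params in [("opt-350m", 350_000_000), ("opt-1.3b", 1_300_000_000)]:
    print(f"{name}: ~{estimate_full_finetune_gib(params):.1f} GiB before activations")
```

Under these assumptions OPT-350M needs a little over 5 GiB before activations, which is consistent with the 8 GB guidance above.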

## Finetuning a Model

The `finetune_model.py` script allows you to finetune a language model using a JSON dataset containing prompts and completions.

### Basic Usage

```bash
python finetune_model.py
```

This will use the default settings:
- Dataset: `datasets/adriana_finetune_dataset.json`
- Model: `facebook/opt-350m` (350M parameters, larger than the base GPT-2 checkpoint)
- Output directory: `finetuned_model`
- Training epochs: 3
- Batch size: 4
- Learning rate: 5e-5

### Model Options

The script uses `facebook/opt-350m` by default. To use a different model, change `model_name` in the `Args` class. Some options, grouped by size:

- **Smaller models** (faster training, less memory):
  - `EleutherAI/pythia-70m` (70M parameters)
  - `facebook/opt-125m` (125M parameters)
  - `facebook/opt-350m` (350M parameters)

- **Medium models** (better quality, more memory):
  - `facebook/opt-1.3b` (1.3B parameters)
  - `EleutherAI/pythia-1.4b` (1.4B parameters)
  - `facebook/opt-2.7b` (2.7B parameters)

- **Large models** (best quality, requires significant memory):
  - `facebook/opt-6.7b` (6.7B parameters)
  - `EleutherAI/pythia-6.9b` (6.9B parameters)

For the large models, enable LoRA by setting `self.use_lora = True` in the `Args` class.

### Advanced Usage

You can customize the finetuning process by modifying the `Args` class in the `finetune_model.py` file:

```python
class Args:
    def __init__(self):
        self.dataset_path = "datasets/adriana_finetune_dataset.json"
        self.model_name = "facebook/opt-1.3b"  # Change to a different model
        self.output_dir = "finetuned_model"
        self.num_train_epochs = 5  # More epochs can help but may overfit a small dataset
        self.per_device_train_batch_size = 2  # Adjust based on your GPU memory
        self.learning_rate = 3e-5  # Adjust learning rate
        self.use_lora = True  # Enable LoRA for efficient finetuning
```
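
To see how the epoch and batch-size settings interact, the sketch below counts optimizer steps for a run. It assumes a single GPU, no gradient accumulation, and that the last partial batch of each epoch is kept. The dataset size of 1,000 examples is hypothetical, and `total_optimizer_steps` is an illustrative helper, not part of `finetune_model.py`.

```python
import math


def total_optimizer_steps(num_examples: int, num_train_epochs: int,
                          per_device_train_batch_size: int) -> int:
    """Optimizer steps for a single-GPU run without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / per_device_train_batch_size)
    return steps_per_epoch * num_train_epochs


num_examples = 1000  # hypothetical dataset size
print("defaults (3 epochs, batch 4):", total_optimizer_steps(num_examples, 3, 4))
print("example above (5 epochs, batch 2):", total_optimizer_steps(num_examples, 5, 2))
```

With 1,000 examples, the defaults give 750 steps and the settings above give 2,500, so halving the batch size while adding epochs noticeably lengthens training.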

## Using LoRA for Efficient Finetuning

LoRA (Low-Rank Adaptation) freezes the pretrained weights and trains small low-rank matrices added to selected layers, so only a small fraction of the parameters is updated. This is especially useful when you have limited computational resources.

To use LoRA, set `self.use_lora = True` in the `Args` class:

```python
class Args:
    def __init__(self):
        # ... other settings ...
        self.use_lora = True  # Enable LoRA
```
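
The saving from LoRA can be illustrated with a little arithmetic. For one weight matrix of shape `d_out x d_in`, a rank-`r` adapter trains two factors with `r * (d_in + d_out)` parameters in total, instead of all `d_in * d_out` weights. The hidden size and rank below are hypothetical values chosen for illustration; the rank and target layers actually used by `finetune_model.py` are not stated here.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


hidden_size = 2048  # hypothetical layer width
rank = 8            # hypothetical LoRA rank

full = full_params(hidden_size, hidden_size)
lora = lora_trainable_params(hidden_size, hidden_size, rank)
print(f"full: {full:,} params, LoRA: {lora:,} params ({100 * lora / full:.2f}% of full)")
```

For these values the adapter trains 32,768 parameters against 4,194,304 in the full matrix, under 1% of the original.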

## Dataset Format

The script expects a JSON file with the following format:

```json
[
  {
    "prompt": "Your prompt here",
    "completion": "Your completion here"
  },
  ...
]
```
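
Before starting a long training run, it can help to check that the file matches this format. The sketch below is one way to do that using only the standard library. `validate_dataset` is a hypothetical helper, not part of this repository, and it assumes that both fields must be non-empty strings.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("prompt", "completion")


def validate_dataset(path):
    """Return a list of problems found; an empty list means the file looks valid."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        return ["top-level JSON value must be a list"]

    problems = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for key in REQUIRED_KEYS:
            value = item.get(key)
            # Assumption: each field should be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: '{key}' must be a non-empty string")
    return problems
```

Calling `validate_dataset("datasets/adriana_finetune_dataset.json")` returns an empty list when every entry has both fields.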

## Using the Finetuned Model

After finetuning, you can use the model with the Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and tokenizer
model_path = "finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate text
prompt = "Create a welcome message for new clients"
inputs = tokenizer(prompt, return_tensors="pt")
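# max_new_tokens limits only the generated continuation; max_length would also count the prompt tokens
outputs = model.generate(**inputs, max_new_tokens=100, num_return_sequences=1)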
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
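
For decoder-only models, `generate` returns the prompt tokens followed by the continuation, so `generated_text` above begins with the prompt. If only the completion is wanted, a small helper like the one below can remove the prompt. `strip_prompt` is a hypothetical name, and the sketch assumes the decoded text reproduces the prompt exactly; if tokenization changes the whitespace, it falls back to returning the full text.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the continuation if the text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Fallback: the decoded text did not reproduce the prompt verbatim.
    return generated_text


example = "Create a welcome message for new clients Welcome aboard!"
print(strip_prompt(example, "Create a welcome message for new clients"))
```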

## Testing the Model

You can test the finetuned model using the `test_model.py` script:

```bash
python test_model.py
```

This will load the finetuned model from the `finetuned_model` directory and generate text for the default prompt. You can modify the prompt in the `test_model.py` file to test different inputs.

## Troubleshooting

- If you encounter CUDA out of memory errors, try:
  - Using a smaller model
  - Reducing the batch size
  - Using LoRA by setting `self.use_lora = True`
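- For very large models, consider 8-bit quantization with `load_in_8bit=True` when loading the model. This requires the `bitsandbytes` package, and recent transformers releases expect the setting to be passed through `BitsAndBytesConfig`.
- If you're finetuning on a CPU, the process will be much slower. Consider using a smaller model or fewer epochs.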
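
When reducing the batch size to avoid out-of-memory errors, gradient accumulation can keep the effective batch size unchanged. `gradient_accumulation_steps` is a standard `TrainingArguments` parameter in transformers; whether `finetune_model.py` exposes it is an assumption, and `accumulation_steps_for` is a hypothetical helper. The sketch below computes how many accumulation steps are needed for a given target.

```python
import math


def accumulation_steps_for(target_batch_size: int, per_device_batch_size: int) -> int:
    """Accumulation steps so that per_device_batch_size * steps >= target_batch_size."""
    return math.ceil(target_batch_size / per_device_batch_size)


# Keep the default effective batch size of 4 while fitting only 1 example per step.
steps = accumulation_steps_for(target_batch_size=4, per_device_batch_size=1)
print(f"per-device batch 1 with {steps} accumulation steps gives an effective batch of {1 * steps}")
```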