feat: Implement Pinecone vector store integration

- Update config.py with Pinecone settings and model configurations
- Implement VectorStore class with Pinecone backend
- Add comprehensive vector operations (add, search, delete)
- Set up proper error handling and metadata management
- Add .gitignore for Python project
2025-04-16 23:09:52 +01:00
commit 859c17aad8
27 changed files with 2820 additions and 0 deletions
@@ -0,0 +1,133 @@
+# LLM Finetuning with Hugging Face
+
+This repository contains scripts for finetuning language models using Hugging Face's transformers library.
+
+## Setup
+
+1. Install the required dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+2. Make sure you have enough GPU memory for finetuning. For smaller models such as OPT-350M, 8 GB should be sufficient. Larger models need more memory or a memory-saving technique such as LoRA.
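As a rough sanity check on the 8 GB figure, the sketch below estimates a lower bound on the memory used by full finetuning. It assumes fp32 values (4 bytes each) and an Adam-style optimizer that keeps two moment buffers per parameter; whether `finetune_model.py` actually trains this way is an assumption. Activations and framework overhead are ignored, and `estimate_training_memory_gb` is a hypothetical helper that is not part of the repository.

```python
def estimate_training_memory_gb(num_params: float, bytes_per_value: int = 4) -> float:
    """Lower-bound memory (in GB) for full finetuning with an Adam-style optimizer.

    Counts four values per parameter: the weight, its gradient, and the two
    Adam moment estimates. Activations and framework overhead are not included.
    """
    values_per_param = 4  # weight + gradient + Adam first and second moments
    return num_params * values_per_param * bytes_per_value / 1e9


if __name__ == "__main__":
    # Parameter counts are the nominal sizes from the model names.
    for name, params in [("opt-125m", 125e6), ("opt-350m", 350e6), ("opt-1.3b", 1.3e9)]:
        print(f"{name}: at least {estimate_training_memory_gb(params):.1f} GB")
```

Under these assumptions OPT-350M needs at least about 5.6 GB before activations, which is consistent with the 8 GB guidance, while OPT-1.3B already exceeds it.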
+
+## Finetuning a Model
+
+The `finetune_model.py` script allows you to finetune a language model using a JSON dataset containing prompts and completions.
+
+### Basic Usage
+
+```bash
+python finetune_model.py
+```
+
+This will use the default settings:
+- Dataset: `datasets/adriana_finetune_dataset.json`
+- Model: `facebook/opt-350m` (a more capable model than GPT-2)
+- Output directory: `finetuned_model`
+- Training epochs: 3
+- Batch size: 4
+- Learning rate: 5e-5
+
+### Model Options
+
+The script uses `facebook/opt-350m` by default. To try a different model, change `model_name` in the `Args` class. Some options, grouped by size:
+
+- **Smaller models** (faster training, less memory):
+  - `EleutherAI/pythia-70m` (70M parameters)
+  - `facebook/opt-125m` (125M parameters)
+  - `facebook/opt-350m` (350M parameters, the default)
+
+- **Medium models** (better quality, more memory):
+  - `facebook/opt-1.3b` (1.3B parameters)
+  - `EleutherAI/pythia-1.4b` (1.4B parameters)
+  - `facebook/opt-2.7b` (2.7B parameters)
+
+- **Large models** (best quality, requires significant memory):
+  - `facebook/opt-6.7b` (6.7B parameters)
+  - `EleutherAI/pythia-6.9b` (6.9B parameters)
+
+For very large models, it's recommended to use LoRA by setting `use_lora = True` in the `Args` class.
+
+### Advanced Usage
+
+You can customize the finetuning process by modifying the `Args` class in the `finetune_model.py` file:
+
+```python
+class Args:
+    def __init__(self):
+        self.dataset_path = "datasets/adriana_finetune_dataset.json"
+        self.model_name = "facebook/opt-1.3b"  # Change to a different model
+        self.output_dir = "finetuned_model"
+        self.num_train_epochs = 5  # Increase epochs for better results
+        self.per_device_train_batch_size = 2  # Adjust based on your GPU memory
+        self.learning_rate = 3e-5  # Adjust learning rate
+        self.use_lora = True  # Enable LoRA for efficient finetuning
+```
+
+## Using LoRA for Efficient Finetuning
+
+LoRA (Low-Rank Adaptation) makes finetuning large language models cheaper by freezing the pretrained weights and training only a small number of added low-rank parameters. This is especially useful when you have limited computational resources.
+
+To use LoRA, simply set `self.use_lora = True` in the `Args` class:
+
+```python
+class Args:
+    def __init__(self):
+        # ... other settings ...
+        self.use_lora = True  # Enable LoRA
+```
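To see why LoRA trains so few parameters, the sketch below compares the trainable parameter count for a single weight matrix with and without LoRA. The layer width and rank are illustrative values, not settings read from `finetune_model.py`, and both helper functions are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters trained by LoRA for one d_out x d_in weight matrix.

    LoRA freezes the original matrix W and learns an update B @ A, where
    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters trained when the whole weight matrix is updated."""
    return d_in * d_out


if __name__ == "__main__":
    d_in = d_out = 2048  # illustrative layer width, not read from any model
    rank = 8             # illustrative LoRA rank
    lora = lora_trainable_params(d_in, d_out, rank)
    full = full_params(d_in, d_out)
    print(f"LoRA: {lora:,} trainable vs full: {full:,} ({lora / full:.2%})")
```

For these example values the adapter trains under 1% of the parameters in the matrix, which is where the memory savings for gradients and optimizer state come from.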
+
+## Dataset Format
+
+The script expects a JSON file with the following format:
+
+```json
+[
+  {
+    "prompt": "Your prompt here",
+    "completion": "Your completion here"
+  },
+  ...
+]
+```
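Before starting a training run, it can help to confirm that a dataset file matches this shape. The following is a minimal sketch of such a check; `load_prompt_completion_pairs` is a hypothetical helper, and how `finetune_model.py` itself reads the file is not shown in this README.

```python
import json
from pathlib import Path


def load_prompt_completion_pairs(path):
    """Load the dataset and check that every record has both string fields."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list")
    for i, record in enumerate(records):
        for key in ("prompt", "completion"):
            # Reject non-object records and missing or non-string fields.
            if not isinstance(record, dict) or not isinstance(record.get(key), str):
                raise ValueError(f"record {i} is missing a string '{key}' field")
    return records


if __name__ == "__main__":
    # Default dataset path taken from the README.
    pairs = load_prompt_completion_pairs("datasets/adriana_finetune_dataset.json")
    print(f"Loaded {len(pairs)} prompt/completion pairs")
```

Note that the `...` in the format example is a placeholder for further records. A real dataset file must not contain it, or `json.loads` will raise an error.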
+
+## Using the Finetuned Model
+
+After finetuning, you can use the model with the Hugging Face transformers library:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the finetuned model and tokenizer
+model_path = "finetuned_model"
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model = AutoModelForCausalLM.from_pretrained(model_path)
+
+# Generate text
+prompt = "Create a welcome message for new clients"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=100, num_return_sequences=1)
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(generated_text)
+```
+
+## Testing the Model
+
+You can test the finetuned model using the `test_model.py` script:
+
+```bash
+python test_model.py
+```
+
+This will load the finetuned model from the `finetuned_model` directory and generate text for the default prompt. You can modify the prompt in the `test_model.py` file to test different inputs.
+
+## Troubleshooting
+
+- If you encounter CUDA out of memory errors, try:
+  - Using a smaller model
+  - Reducing the batch size
+  - Using LoRA by setting `self.use_lora = True`
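+- For very large models, consider 8-bit quantization by passing `load_in_8bit=True` when loading the model. This requires the `bitsandbytes` package, and recent transformers releases expect the option through `quantization_config=BitsAndBytesConfig(load_in_8bit=True)` instead.
+- Finetuning on a CPU is much slower. Consider using a smaller model or fewer epochs.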