From d7441f40891e55a1f5a7cd7e1b7e550a399122b5 Mon Sep 17 00:00:00 2001
From: Your Name <you@example.com>
Date: Wed, 13 Aug 2025 23:59:28 +0000
Subject: [PATCH] updated readme

---
 .ipynb_checkpoints/README-checkpoint.md    | 1305 ++++++++++----------
 .ipynb_checkpoints/untitled-checkpoint.txt |    0
 README.md                                  | 1305 ++++++++++----------
 untitled.txt                               |    0
 4 files changed, 1268 insertions(+), 1342 deletions(-)
 create mode 100644 .ipynb_checkpoints/untitled-checkpoint.txt
 create mode 100644 untitled.txt

diff --git a/.ipynb_checkpoints/README-checkpoint.md b/.ipynb_checkpoints/README-checkpoint.md
index f1ba946..517945e 100644
--- a/.ipynb_checkpoints/README-checkpoint.md
+++ b/.ipynb_checkpoints/README-checkpoint.md
@@ -1,763 +1,726 @@
-# Fine-Tune Task: NLP Pipeline Framework
+# Fine-Tuning Task Framework
 
-A comprehensive framework for fine-tuning NLP models with organized YAML configurations, supporting multiple tasks (classification, completion, styling, matching).
+A comprehensive framework for fine-tuning Large Language Models (LLMs) across multiple task types, including classification, completion, styling, and matching.
 
-## Supported Tasks
+## Table of Contents
 
-This framework supports multiple NLP tasks with organized configurations:
+- [Overview](#overview)
+- [Architecture](#architecture)
+- [Task Types](#task-types)
+- [Quick Start](#quick-start)
+- [Configuration Guide](#configuration-guide)
+- [Scripts & Commands](#scripts--commands)
+- [Complete Workflows](#complete-workflows)
+- [API Reference](#api-reference)
+- [Troubleshooting](#troubleshooting)
+- [Contributing](#contributing)
 
-- **Classification**: Text classification, sentiment analysis, topic classification
-- **Completion**: Text generation, code completion, story generation
-- **Styling**: Style transfer, tone classification, writing style adaptation
-- **Matching**: Semantic matching, entity matching, similarity scoring
+## Overview
 
-### Current Implementation Status
+This framework provides a unified approach to fine-tuning LLMs for various NLP tasks. It's designed to be:
 
-- **Classification**: ✅ Fully implemented with emotion classification example
-- **Styling**: ✅ Fully implemented with style transfer and LoRA fine-tuning
-- **Completion**: Planned for future updates
-- **Matching**: Planned for future updates
+- **Task-Agnostic**: Same pipeline structure for different task types
+- **Configuration-Driven**: YAML-based configuration for all parameters
+- **Developer-Friendly**: Clear scripts and comprehensive logging
+- **Production-Ready**: Built-in validation, error handling, and optimization
 
-**Note**: Classification and styling tasks are fully supported. Other tasks (completion, matching) are planned for future updates.
+## Architecture
 
-## Project Structure
+The framework follows a **modular pipeline architecture**:
 
 ```
-fine-tune-task/
-├── configs/                    # YAML configuration files
-│   ├── classification/         # ✅ Implemented
-│   │   ├── emotion.yaml       # Emotion classification
-│   │   └── custom.yaml        # Custom dataset
-│   ├── styling/               # ✅ Implemented
-│   │   └── formal.yaml        # Formal style transfer
-│   ├── completion/             # Planned for future updates
-│   └── matching/              # Planned for future updates
-├── data/                       # Data directories
-│   ├── raw/                    # Raw input data
-│   │   ├── classification/     # ✅ Implemented
-│   │   ├── styling/           # ✅ Implemented
-│   │   ├── completion/         # Planned for future updates
-│   │   └── matching/          # Planned for future updates
-│   └── processed/              # Processed data
-│       ├── classification/     # ✅ Implemented
-│       ├── styling/           # ✅ Implemented
-│       ├── completion/         # Planned for future updates
-│       └── matching/          # Planned for future updates
-├── pipelines/                  # Core pipeline scripts
-│   ├── classification/         # ✅ Implemented
-│   │   ├── data_processor.py  # Data processing
-│   │   ├── train.py          # Training
-│   │   └── inference.py      # Inference
-│   ├── styling/               # ✅ Implemented
-│   │   ├── data_processor.py  # Style data processing
-│   │   ├── train.py          # LoRA fine-tuning
-│   │   └── inference.py      # Style transfer inference
-│   ├── completion/            # Planned for future updates
-│   └── matching/             # Planned for future updates
-├── scripts/                    # User-friendly scripts
-│   ├── classification/         # ✅ Implemented
-│   │   ├── data_processor.py  # Data processing script
-│   │   ├── trainer.py        # Training script
-│   │   └── inference.py      # Inference script
-│   ├── styling/               # ✅ Implemented
-│   │   ├── data_processor.py  # Style data processing script
-│   │   ├── train.py          # Training script
-│   │   └── inference.py      # Inference script
-│   ├── completion/            # Planned for future updates
-│   └── matching/             # Planned for future updates
-├── results/                    # Model outputs
-│   ├── classification/         # ✅ Implemented
-│   ├── styling/              # ✅ Implemented
-│   ├── completion/            # Planned for future updates
-│   └── matching/             # Planned for future updates
-└── utils/                      # Shared utility modules
+Raw Data → Data Processing → Model Training → Inference/Evaluation
+    ↓              ↓              ↓              ↓
+  JSONL/CSV    HuggingFace    Trained      Ready for
+  Files        Datasets       Models       Production
 ```
 
-## Quick Start (Classification Task)
+### Core Components
 
-### 1. Setup Environment
+1. **Data Processors**: Convert raw data to training-ready formats
+2. **Training Pipelines**: Task-specific training with optimization
+3. **Inference Engines**: Production-ready text generation/classification
+4. **Configuration Management**: YAML-based parameter control
+5. **Utility Scripts**: Command-line interfaces for all operations
+
+## Task Types
+
+### 1. Classification Task
+
+**Purpose**: Text classification, sentiment analysis, topic categorization
+
+**Data Format**:
+```jsonl
+{"text": "I love this product!", "label": "positive"}
+{"text": "This is terrible", "label": "negative"}
+```
+
+**Output**: Classification probabilities and predicted labels
+
+**Use Cases**: Sentiment analysis, spam detection, content moderation
+
+### 2. Completion Task
+
+**Purpose**: Text generation, story completion, code generation
+
+**Data Format**:
+```jsonl
+{"prompt": "Once upon a time", "completion": "there was a brave knight..."}
+{"prompt": "def calculate_sum", "completion": "(numbers): return sum(numbers)"}
+```
+
+**Output**: Generated text continuations
+
+**Use Cases**: Creative writing, code completion, content generation
+
+### 3. Styling Task
+
+**Purpose**: Style transfer, tone modification, writing style adaptation
+
+**Data Format**:
+```jsonl
+{"text": "Hey there!", "styled_text": "Hello, how are you?"}
+{"text": "I'm gonna go", "styled_text": "I will be going"}
+```
+
+**Output**: Text rewritten in target style
+
+**Use Cases**: Formalization, casualization, domain adaptation
+
+### 4. Matching Task
+
+**Purpose**: Semantic similarity, question-answer matching, paraphrase detection
+
+**Data Format**:
+```jsonl
+{"text1": "What is AI?", "text2": "Artificial Intelligence", "label": "similar"}
+{"text1": "Weather today", "text2": "Cooking recipes", "label": "different"}
+```
+
+**Output**: Similarity scores or binary classifications
+
+**Use Cases**: Search relevance, duplicate detection, semantic matching
+
+## Quick Start
+
+### Prerequisites
 
 ```bash
 # Install dependencies
 pip install -r requirements.txt
 
-# Set Python path
-export PYTHONPATH=.
+# Verify installation
+python -c "import torch, transformers, datasets; print('✅ All packages installed')"
 ```
 
-### 2. Data Processing
+### Basic Workflow
 
 ```bash
-# Process emotion dataset
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
+# 1. Process data
+python scripts/[task_type]/data_processor.py --config configs/[task_type]/[config].yaml
 
-# Process with custom parameters
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml --max-samples 1000
+# 2. Train model
+python scripts/[task_type]/train.py train --config configs/[task_type]/[config].yaml
 
-# Check output location
-ls -la ./data/processed/classification/emotion/classification/
+# 3. Run inference
+python scripts/[task_type]/inference.py infer --config configs/[task_type]/[config].yaml
 ```
 
-**Expected Output:**
-```
-Data processing completed successfully!
-  Data source: huggingface
-  Dataset: dair-ai/emotion
-  Total samples: 2999
-  Unique labels: 6
-  Split sizes: {'train': 1000, 'validation': 999, 'test': 1000}
-  Output directory: ./data/processed/classification/emotion
-```
+## Configuration Guide
 
-### 3. Model Training
+### YAML Structure
 
-```bash
-# Train using processed data
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-
-# Train with custom parameters
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --num-epochs 5 --batch-size 32
-
-# Check model output
-ls -la ./results/classification/emotion_model/
-```
-
-**Expected Output:**
-```
-Training completed successfully!
-  Model: bert-base-uncased
-  Data directory: ./data/processed/classification/emotion
-  Training for 3 epochs with batch size 16
-  Model saved to: ./results/classification/emotion_model
-```
-
-### 4. Model Inference
-
-```bash
-# Run inference
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-text "I love this product!"
-
-# File-based inference
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-file input.txt --output-file predictions.jsonl
-```
-
-**Expected Output:**
-```
-Inference completed successfully!
-  Loading model from: ./results/classification/emotion_model
-  Predicted label: joy
-  Confidence: 0.8542
-  Top 3 predictions:
-    - joy: 0.8542
-    - love: 0.1234
-    - surprise: 0.0224
-```
-
-## Quick Start (Styling Task)
-
-### 1. Setup Environment
-
-```bash
-# Install dependencies (including unsloth for styling)
-pip install -r requirements.txt
-
-# Set Python path
-export PYTHONPATH=.
-```
-
-### 2. Data Processing
-
-```bash
-# Process style transfer dataset
-python scripts/styling/data_processor.py --config configs/styling/formal.yaml
-
-# Create HuggingFace dataset
-python scripts/styling/data_processor.py --config configs/styling/formal.yaml --create-hf-dataset
-
-# Check output location
-ls -la ./data/processed/styling/formal/
-```
-
-**Expected Output:**
-```
-Styling data processing completed successfully!
-  Data source: custom
-  Data file: ./data/raw/styling/sample_formal.jsonl
-  Total samples: 5
-  Split sizes: {'train': 3, 'validation': 1, 'test': 1}
-  Output directory: ./data/processed/styling/formal
-  Style instruction: Rewrite the following text in a formal style
-```
-
-### 3. Model Training
-
-```bash
-# Train using processed data (automatically loads from YAML output_dir)
-python scripts/styling/train.py example
-
-# Custom training
-python scripts/styling/train.py train --config configs/styling/formal.yaml --epochs 3 --batch-size 4
-
-# Check model output
-ls -la ./models/styling/
-```
-
-**Expected Output:**
-```
-Training completed successfully!
-  Model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
-  Dataset: Loaded from ./data/processed/styling/formal
-  Training for 3 epochs with batch size 4
-  Model saved to: ./models/styling
-```
-
-### 4. Model Inference
-
-```bash
-# Single text style transfer
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml --text "Hey, what's up?"
-
-# Batch processing
-python scripts/styling/inference.py batch
-
-# Interactive mode
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml
-```
-
-**Expected Output:**
-```
-Inference completed successfully!
-  Input: Hey, what's up?
-  Output: Hello, how are you doing?
-  Style: Formal
-```
-
-## Adding New Tasks
-
-To add a new task (e.g., completion, styling, matching), follow these steps:
-
-### Example: Styling Task (Already Implemented)
-
-The styling task demonstrates a complete implementation:
-
-1. **Task Directory Structure** ✅
-```bash
-configs/styling/           # YAML configurations
-data/raw/styling/         # Raw style transfer data
-data/processed/styling/   # Processed data
-pipelines/styling/        # Core pipeline scripts
-scripts/styling/          # User-friendly scripts
-models/styling/           # Trained models
-```
-
-2. **Pipeline Components** ✅
-- **Data Processor**: Handles style transfer datasets with instruction/input/output format
-- **Trainer**: LoRA fine-tuning using Unsloth for efficiency
-- **Inference**: Style transfer with streaming and batch processing
-
-3. **Key Features** ✅
-- Automatic EOS token handling: `text + tokenizer.eos_token`
-- Dataset mapping: `dataset.map(formatting_prompts_func, batched=True)`
-- YAML integration: Uses `data.output_dir` for automatic dataset loading
-- HuggingFace dataset export and loading
-
-### For Other Tasks (completion, matching)
-
-1. **Create Task Directory Structure**
-```bash
-# Create task directories
-mkdir -p configs/completion
-mkdir -p data/raw/completion data/processed/completion
-mkdir -p pipelines/completion
-mkdir -p scripts/completion
-mkdir -p results/completion
-mkdir -p tasks/completion
-mkdir -p models/completion
-```
-
-2. **Create Task Configuration**
-
-```bash
-# Create YAML configuration for new task
-cat > configs/completion/text_generation.yaml << 'EOF'
-# Text Generation Task Configuration
-task:
-  name: "completion"
-  type: "text_generation"
-
-# Data Processing Configuration
-data:
-  source: "huggingface"
-  dataset_name: "your-dataset-name"
-  output_dir: "./data/processed/completion/text_generation"
-  max_samples: 1000
-  # ... other data parameters
-
-# Model Configuration
-model:
-  name: "gpt2"  # Different model for completion
-  max_length: 1024
-  # ... model parameters
-
-# Training Configuration
-training:
-  num_epochs: 3
-  batch_size: 8  # Smaller batch for generation
-  learning_rate: 5e-5
-  data_dir: "./data/processed/completion/text_generation"
-  output_dir: "./results/completion/text_generation_model"
-
-# Inference Configuration
-inference:
-  model_path: "./results/completion/text_generation_model"
-  device: "auto"
-  batch_size: 1  # Generation is typically one at a time
-  max_length: 100
-  temperature: 0.7
-EOF
-```
-
-3. **Create Pipeline Scripts**
-
-Copy and modify the classification pipeline scripts:
-
-```bash
-# Copy classification scripts as templates
-cp pipelines/classification/data_processor.py pipelines/completion/
-cp pipelines/classification/train.py pipelines/completion/
-cp pipelines/classification/inference.py pipelines/completion/
-
-# Copy task scripts
-cp scripts/classification/data_processor.py scripts/completion/
-cp scripts/classification/trainer.py scripts/completion/
-cp scripts/classification/inference.py scripts/completion/
-```
-
-4. **Modify Pipeline Code**
-
-Update the pipeline scripts for your specific task:
-
-1. **Data Processor** (`pipelines/completion/data_processor.py`):
-   - Update data loading logic for completion datasets
-   - Modify preprocessing for text generation
-   - Adjust output format for completion tasks
-
-2. **Trainer** (`pipelines/completion/train.py`):
-   - Change model type to generation models (GPT, T5, etc.)
-   - Update training loop for text generation
-   - Modify evaluation metrics
-
-3. **Inference** (`pipelines/completion/inference.py`):
-   - Update inference for text generation
-   - Add generation parameters (temperature, top-k, etc.)
-   - Modify output format
-
-5. **Update Task Scripts**
-
-Modify the task scripts to use your new pipeline:
-
-```python
-# scripts/completion/data_processor.py
-def run_with_yaml_config(config_path: str, **cli_overrides):
-    cmd = [
-        "python", "pipelines/completion/data_processor.py",  # Updated path
-        "--config", config_path
-    ]
-    # ... rest of the function
-```
-
-6. **Create Task-Specific Models**
-
-```bash
-# Create model directory
-mkdir -p models/completion
-
-# Add task-specific model classes
-cat > models/completion/text_generator.py << 'EOF'
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-class TextGenerator:
-    def __init__(self, model_name):
-        self.model = AutoModelForCausalLM.from_pretrained(model_name)
-        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-    
-    def generate(self, prompt, max_length=100, temperature=0.7):
-        # Implementation for text generation
-        pass
-EOF
-```
-
-7. **Test Your New Task**
-
-```bash
-# Test data processing
-python scripts/completion/data_processor.py --config configs/completion/text_generation.yaml
-
-# Test training
-python scripts/completion/trainer.py --config configs/completion/text_generation.yaml
-
-# Test inference
-python scripts/completion/inference.py --config configs/completion/text_generation.yaml --input-text "Once upon a time"
-```
-
-## YAML Configuration Guide
-
-### Configuration Structure
-
-Each YAML file is organized into clear sections:
+All configurations follow this hierarchical structure:
 
 ```yaml
 # Task Configuration
 task:
-  name: "classification"  # or "completion", "styling", "matching"
-  type: "sequence_classification"  # or "text_generation", "style_transfer", "semantic_matching"
+  name: "task_type"                    # classification, completion, styling, matching
+  type: "specific_type"                # e.g., "sentiment_analysis", "style_transfer"
 
-# Data Processing Configuration
+# Data Configuration
 data:
-  source: "huggingface"                    # "huggingface" or "custom"
-  dataset_name: "dair-ai/emotion"         # HuggingFace dataset name
-  output_dir: "./data/processed/classification/emotion"
-  max_samples: 1000                        # Limit dataset size
-  # ... other data parameters
+  source: "custom"                     # "custom" or "huggingface"
+  data_path: "./data/raw/..."          # Path to raw data
+  input_field: "text"                  # Field name for input
+  output_field: "label"                # Field name for output
+  instruction: "Task instruction"      # For instruction-following tasks
 
 # Model Configuration
 model:
-  name: "bert-base-uncased"                # Model from HuggingFace Hub
-  max_length: 512                          # Sequence length
-  num_labels: 6                            # Number of classes
+  name: "model_name"                   # HuggingFace model identifier
+  max_seq_length: 2048                 # Maximum sequence length
+  dtype: null                          # Data type (null = auto-detect)
+  load_in_4bit: true                   # 4-bit quantization
 
 # Training Configuration
 training:
-  num_epochs: 3                            # Training epochs
-  batch_size: 16                           # Batch size
-  learning_rate: 2e-5                      # Learning rate
-  data_dir: "./data/processed/classification/emotion"
-  output_dir: "./results/classification/emotion_model"
+  num_epochs: 3                        # Training epochs
+  batch_size: 4                        # Batch size
+  learning_rate: 2e-4                  # Learning rate
+  warmup_steps: 5                      # Warmup steps
+  max_steps: 60                        # Maximum training steps
 
 # Inference Configuration
 inference:
-  model_path: "./results/classification/emotion_model"
-  device: "auto"                           # "auto", "cuda", "cpu"
-  batch_size: 32                           # Inference batch size
-  return_top_k: 3                          # Top K predictions
+  batch_size: 32                       # Inference batch size
+  max_new_tokens: 128                  # Max tokens to generate
+  temperature: 0.8                     # Sampling temperature
 ```
 
-### Styling Configuration Example
+### Configuration Parameters
+
+#### Data Processing Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `source` | string | "custom" | Data source type |
+| `data_path` | string | required | Path to raw data file |
+| `input_field` | string | "text" | Input field name |
+| `output_field` | string | "label" | Output field name |
+| `instruction` | string | task-specific | Task instruction |
+| `data_format` | string | "jsonl" | Data file format |
+| `max_length` | int | 256 | Maximum text length |
+| `min_length` | int | 10 | Minimum text length |
+| `clean_text` | boolean | true | Enable text cleaning |
+| `lowercase` | boolean | false | Convert to lowercase |
+
+#### Model Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `name` | string | required | HuggingFace model name |
+| `max_seq_length` | int | 2048 | Maximum sequence length |
+| `dtype` | string | null | Data type (`null` = auto-detect) |
+| `load_in_4bit` | boolean | true | Enable 4-bit quantization |
+| `token` | string | null | HuggingFace access token |
+
+#### Training Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `num_epochs` | int | 1 | Number of training epochs |
+| `batch_size` | int | 2 | Training batch size |
+| `learning_rate` | float | 2e-4 | Learning rate |
+| `weight_decay` | float | 0.01 | Weight decay |
+| `warmup_steps` | int | 5 | Warmup steps |
+| `max_steps` | int | 60 | Maximum training steps |
+| `gradient_accumulation_steps` | int | 4 | Gradient accumulation |
+
+#### LoRA Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `lora_r` | int | 16 | LoRA rank |
+| `lora_alpha` | int | 16 | LoRA alpha |
+| `lora_dropout` | float | 0 | LoRA dropout |
+| `target_modules` | list | ["q_proj", "k_proj", "v_proj", "o_proj"] | Target modules for LoRA |
+
+### Environment Variables
+
+```bash
+# HuggingFace token for gated models
+export HF_TOKEN="hf_..."
+
+# CUDA device selection
+export CUDA_VISIBLE_DEVICES="0"
+
+# Logging level
+export LOG_LEVEL="INFO"
+```
+
+## Scripts & Commands
+
+### Data Processing Scripts
+
+#### Basic Usage
+
+```bash
+python scripts/[task_type]/data_processor.py --config configs/[task_type]/[config].yaml
+```
+
+#### Advanced Options
+
+```bash
+python scripts/[task_type]/data_processor.py \
+  --config configs/[task_type]/[config].yaml \
+  --max-samples 1000 \
+  --log-level DEBUG \
+  --create-hf-dataset \
+  --hf-dataset-path ./datasets/[task_name]
+```
+
+#### Command Line Arguments
+
+| Argument | Type | Default | Description |
+|----------|------|---------|-------------|
+| `--config` | string | required | YAML configuration file |
+| `--max-samples` | int | all | Maximum samples to process |
+| `--log-level` | string | "INFO" | Logging level |
+| `--create-hf-dataset` | flag | false | Create HuggingFace dataset |
+| `--hf-dataset-path` | string | auto | HuggingFace dataset path |
+
+### Training Scripts
+
+#### Basic Usage
+
+```bash
+python scripts/[task_type]/train.py train --config configs/[task_type]/[config].yaml
+```
+
+#### Advanced Options
+
+```bash
+python scripts/[task_type]/train.py train \
+  --config configs/[task_type]/[config].yaml \
+  --epochs 5 \
+  --batch-size 8 \
+  --learning-rate 1e-4 \
+  --max-steps 100
+```
+
+#### Command Line Arguments
+
+| Argument | Type | Default | Description |
+|----------|------|---------|-------------|
+| `--config` | string | required | YAML configuration file |
+| `--epochs` | int | YAML value | Override training epochs |
+| `--batch-size` | int | YAML value | Override batch size |
+| `--learning-rate` | float | YAML value | Override learning rate |
+| `--max-steps` | int | YAML value | Override max steps |
+| `--output-dir` | string | YAML value | Override output directory |
+
+### Inference Scripts
+
+#### Basic Usage
+
+```bash
+python scripts/[task_type]/inference.py infer \
+  --config configs/[task_type]/[config].yaml \
+  --input-text "Your input text here"
+```
+
+#### Advanced Options
+
+```bash
+python scripts/[task_type]/inference.py infer \
+  --config configs/[task_type]/[config].yaml \
+  --input-text "Your input text here" \
+  --max-tokens 256 \
+  --temperature 0.7 \
+  --stream
+```
+
+#### Command Line Arguments
+
+| Argument | Type | Default | Description |
+|----------|------|---------|-------------|
+| `--config` | string | required | YAML configuration file |
+| `--input-text` | string | required | Text to process |
+| `--max-tokens` | int | 128 | Maximum tokens to generate |
+| `--temperature` | float | 0.8 | Sampling temperature |
+| `--stream` | flag | false | Enable streaming generation |
+
+### Batch Processing
+
+```bash
+# Process multiple inputs from file
+python scripts/[task_type]/inference.py batch \
+  --config configs/[task_type]/[config].yaml \
+  --input-file input.txt \
+  --output-file output.txt
+```
+
+### Interactive Mode
+
+```bash
+# Enter interactive mode for testing
+python scripts/[task_type]/inference.py interactive \
+  --config configs/[task_type]/[config].yaml
+```
+
+## Complete Workflows
+
+### Classification Task Workflow
+
+#### 1. Data Preparation
+
+Contents of `data/raw/classification/sentiment.jsonl`:
+```jsonl
+{"text": "I love this movie!", "label": "positive"}
+{"text": "This is terrible", "label": "negative"}
+{"text": "It's okay", "label": "neutral"}
+```
+
+#### 2. Configuration
 
 ```yaml
-# Styling Task Configuration
+# configs/classification/sentiment.yaml
+task:
+  name: "classification"
+  type: "sentiment_analysis"
+
+data:
+  source: "custom"
+  data_path: "./data/raw/classification/sentiment.jsonl"
+  input_field: "text"
+  output_field: "label"
+  instruction: "Classify the sentiment of the following text"
+
+model:
+  name: "microsoft/DialoGPT-medium"
+  max_seq_length: 512
+
+training:
+  num_epochs: 3
+  batch_size: 8
+  learning_rate: 3e-5
+```
+
+#### 3. Execute Pipeline
+
+```bash
+# Process data
+python scripts/classification/data_processor.py --config configs/classification/sentiment.yaml
+
+# Train model
+python scripts/classification/train.py train --config configs/classification/sentiment.yaml
+
+# Run inference
+python scripts/classification/inference.py infer \
+  --config configs/classification/sentiment.yaml \
+  --input-text "This product exceeded my expectations!"
+```
+
+### Styling Task Workflow
+
+#### 1. Data Preparation
+
+Contents of `data/raw/styling/formal.jsonl`:
+```jsonl
+{"text": "Hey there!", "styled_text": "Hello, how are you?"}
+{"text": "I'm gonna go", "styled_text": "I will be going"}
+{"text": "This is cool", "styled_text": "This is quite impressive"}
+```
+
+#### 2. Configuration
+
+```yaml
+# configs/styling/formal.yaml
 task:
   name: "styling"
   type: "style_transfer"
 
-# Data Processing Configuration
 data:
   source: "custom"
-  data_path: "./data/raw/styling/sample_formal.jsonl"
+  data_path: "./data/raw/styling/formal.jsonl"
   input_field: "text"
   output_field: "styled_text"
   instruction: "Rewrite the following text in a formal style"
-  output_dir: "./data/processed/styling/formal"
-  output_format: "alpaca"
 
-# Model Configuration
 model:
-  training_model: "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
-  training_max_seq_length: 2048
-  training_load_in_4bit: true
+  name: "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
+  max_seq_length: 2048
 
-# Training Configuration
 training:
   num_epochs: 3
-  batch_size: 2
+  batch_size: 4
   learning_rate: 2e-4
-  weight_decay: 0.01
-
-# Inference Configuration
-inference:
-  batch_size: 1
-  max_new_tokens: 128
-  temperature: 0.8
+  model_output_dir: "./models/styling"
 ```
 
-### Available Configuration Files
-
-- `configs/classification/emotion.yaml` - Emotion classification with HuggingFace dataset
-- `configs/classification/custom.yaml` - Custom dataset processing
-- `configs/styling/formal.yaml` - Formal style transfer with LoRA fine-tuning
-
-## Usage Examples
-
-### Data Processing Examples
+#### 3. Execute Pipeline
 
 ```bash
-# 1. Use YAML config only
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
-
-# 2. Override YAML values
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml --max-samples 500
-
-# 3. Use CLI only (backward compatibility)
-python scripts/classification/data_processor.py --data-source huggingface --dataset-name dair-ai/emotion
-
-# 4. Run examples
-python scripts/classification/data_processor.py examples
-```
-
-### Training Examples
-
-```bash
-# 1. Use YAML config only
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-
-# 2. Override YAML values
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --num-epochs 5
-
-# 3. Use CLI only
-python scripts/classification/trainer.py --model-name bert-base-uncased --num-epochs 3
-
-# 4. Run examples
-python scripts/classification/trainer.py examples
-```
-
-### Inference Examples
-
-```bash
-# 1. Single text prediction
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-text "I love this product!"
-
-# 2. File-based prediction
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-file input.txt --output-file predictions.jsonl
-
-# 3. Interactive mode
-python scripts/classification/inference.py --config configs/classification/emotion.yaml
-
-# 4. Run examples
-python scripts/classification/inference.py examples
-```
-
-### Styling Examples
-
-```bash
-# 1. Data Processing
+# Process data
 python scripts/styling/data_processor.py --config configs/styling/formal.yaml
-python scripts/styling/data_processor.py --config configs/styling/formal.yaml --create-hf-dataset
 
-# 2. Training
-python scripts/styling/train.py example
-python scripts/styling/train.py train --config configs/styling/formal.yaml --epochs 2
+# Train model
+python scripts/styling/train.py train --config configs/styling/formal.yaml
 
-# 3. Inference
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml --text "Hey, what's up?"
-python scripts/styling/inference.py batch
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml
-
-# 4. Run examples
-python scripts/styling/data_processor.py examples
-python scripts/styling/train.py features
-python scripts/styling/inference.py features
+# Run inference
+python scripts/styling/inference.py infer \
+  --config configs/styling/formal.yaml \
+  --instruction "Rewrite in formal style" \
+  --input-text "Hey there! What's up?"
 ```
 
-## Troubleshooting Common Errors
+### Completion Task Workflow
 
-### 1. ModuleNotFoundError: No module named 'utils'
+#### 1. Data Preparation
 
-**Error:**
-```
-ModuleNotFoundError: No module named 'utils'
+Contents of `data/raw/completion/story.jsonl`:
+```jsonl
+{"prompt": "Once upon a time", "completion": "there was a brave knight who lived in a castle..."}
+{"prompt": "The dragon roared", "completion": "and the ground shook beneath its massive feet..."}
 ```
 
-**Solution:**
-```bash
-# Set Python path before running scripts
-export PYTHONPATH=.
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
-```
+#### 2. Configuration
 
-### 2. Model Path Not Found
-
-**Error:**
-```
-Model path not found: ./results/classification/emotion_model
-```
-
-**Solution:**
-```bash
-# Train the model first
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-
-# Then run inference
-python scripts/classification/inference.py --config configs/classification/emotion.yaml
-```
-
-### 3. Data Directory Not Found
-
-**Error:**
-```
-Data directory not found: ./data/processed/classification/emotion
-```
-
-**Solution:**
-```bash
-# Process data first
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
-
-# Then train
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-```
-
-### 4. YAML Configuration Errors
-
-**Error:**
-```
-data_processor.py: error: --data-source is required (either in YAML config or CLI)
-```
-
-**Solution:**
-Check your YAML file structure. It should have:
 ```yaml
-data:
-  source: "huggingface"  # Not data_source
-  dataset_name: "dair-ai/emotion"
-```
+# configs/completion/story.yaml
+task:
+  name: "completion"
+  type: "story_generation"
 
-### 5. HuggingFace Download Issues
-
-**Error:**
-```
-KeyboardInterrupt during model download
-```
-
-**Solution:**
-```bash
-# Use smaller dataset for testing
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml --max-samples 100
-
-# Or use cached models
-export HF_HOME=./cache
-```
-
-### 6. CUDA/GPU Issues
-
-**Error:**
-```
-RuntimeError: CUDA out of memory
-```
-
-**Solution:**
-```bash
-# Reduce batch size
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --batch-size 8
-
-# Or use CPU
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --device cpu
-```
-
-## Monitoring and Logs
-
-### Check Processing Status
-
-```bash
-# Check data processing output
-ls -la ./data/processed/classification/emotion/classification/
-
-# Check training output
-ls -la ./results/classification/emotion_model/
-
-# Check logs
-tail -f logs/training.log
-```
-
-### Expected File Structure After Processing
-
-```
-./data/processed/classification/emotion/classification/
-├── train.jsonl       # Training data
-├── validation.jsonl   # Validation data
-└── test.jsonl        # Test data
-
-./results/classification/emotion_model/
-├── config.json       # Model configuration
-├── pytorch_model.bin # Model weights
-├── tokenizer.json    # Tokenizer
-└── label_info.json   # Label mappings
-```
-
-## Workflow Summary
-
-### Classification Task
-1. **Setup**: Install dependencies and set PYTHONPATH
-2. **Data Processing**: Process raw data into organized splits
-3. **Training**: Train model using processed data
-4. **Inference**: Use trained model for predictions
-5. **Monitoring**: Check logs and outputs for errors
-
-### Styling Task
-1. **Setup**: Install dependencies (including unsloth) and set PYTHONPATH
-2. **Data Processing**: Process style transfer data with instruction/input/output format
-3. **Training**: LoRA fine-tuning using Unsloth for efficient style transfer
-4. **Inference**: Style transfer with streaming and batch processing
-5. **Monitoring**: Check training logs and model outputs
-
-## Creating Custom Configurations
-
-### For New Datasets
-
-1. Copy existing config:
-```bash
-cp configs/classification/emotion.yaml configs/classification/my_dataset.yaml
-```
-
-2. Modify parameters:
-```yaml
-data:
-  source: "huggingface"
-  dataset_name: "your-dataset-name"
-  output_dir: "./data/processed/classification/my_dataset"
-  # ... other parameters
-
-training:
-  data_dir: "./data/processed/classification/my_dataset"
-  output_dir: "./results/classification/my_dataset_model"
-```
-
-3. Run pipeline:
-```bash
-python scripts/classification/data_processor.py --config configs/classification/my_dataset.yaml
-```
-
-### For Custom Data
-
-1. Use custom config:
-```yaml
 data:
   source: "custom"
-  data_path: "./data/raw/my_data.jsonl"
-  output_dir: "./data/processed/classification/my_custom_dataset"
+  data_path: "./data/raw/completion/story.jsonl"
+  input_field: "prompt"
+  output_field: "completion"
+
+model:
+  name: "gpt2-medium"
+  max_seq_length: 1024
+
+training:
+  num_epochs: 2
+  batch_size: 16
+  learning_rate: 5e-5
 ```
 
-2. Run processing:
+#### 3. Execute Pipeline
+
 ```bash
-python scripts/classification/data_processor.py --config configs/classification/custom.yaml
+# Process data
+python scripts/completion/data_processor.py --config configs/completion/story.yaml
+
+# Train model
+python scripts/completion/train.py train --config configs/completion/story.yaml
+
+# Run inference
+python scripts/completion/inference.py infer \
+  --config configs/completion/story.yaml \
+  --input-text "The wizard cast a spell"
 ```
 
-## Best Practices
+## API Reference
 
-1. **Always check output directories** before running next step
-2. **Use small datasets for testing** before full runs
-3. **Monitor logs** for errors and warnings
-4. **Backup configurations** before major changes
-5. **Use version control** for YAML files
-6. **Test with CLI overrides** for quick experiments
+### Data Processing Classes
+
+#### BaseDataProcessor
+
+```python
+class BaseDataProcessor:
+    def __init__(self, config: Dict[str, Any]): ...
+    def load_and_preprocess(self) -> Tuple[Dict, Dict]: ...
+    def validate_data(self, data: Dict) -> Tuple[bool, List[str]]: ...
+    def save_data(self, data: Dict, output_path: str): ...
+```
+
+#### ClassificationDataProcessor
+
+```python
+class ClassificationDataProcessor(BaseDataProcessor):
+    def convert_to_classification_format(self, data: Dict) -> Dict: ...
+    def create_label_mapping(self, labels: List[str]) -> Dict[str, int]: ...
+```
+
+#### StylingDataProcessor
+
+```python
+class StylingDataProcessor(BaseDataProcessor):
+    def convert_to_alpaca_format(self, data: Dict) -> Dict: ...
+    def format_for_training(self, data: Dict) -> Dict: ...
+```
+
+### Training Classes
+
+#### BaseTrainer
+
+```python
+class BaseTrainer:
+    def __init__(self, config: Dict[str, Any]): ...
+    def load_model_and_tokenizer(self): ...
+    def setup_training(self, dataset: Dataset): ...
+    def train(self, dataset_path: str) -> Dict: ...
+    def save_model(self): ...
+```
+
+#### ClassificationTrainer
+
+```python
+class ClassificationTrainer(BaseTrainer):
+    def setup_classification_head(self): ...
+    def compute_metrics(self, eval_pred) -> Dict: ...
+```
+
+#### StylingTrainer
+
+```python
+class StylingTrainer(BaseTrainer):
+    def setup_lora(self): ...
+    def format_dataset(self, dataset: Dataset) -> Dataset: ...
+```
+
+### Inference Classes
+
+#### BaseInference
+
+```python
+class BaseInference:
+    def __init__(self, config: Dict[str, Any]): ...
+    def load_model_and_tokenizer(self): ...
+    def preprocess_input(self, input_text: str) -> torch.Tensor: ...
+    def postprocess_output(self, output: torch.Tensor) -> str: ...
+```
+
+#### ClassificationInference
+
+```python
+class ClassificationInference(BaseInference):
+    def classify(self, text: str) -> Dict[str, float]: ...
+    def batch_classify(self, texts: List[str]) -> List[Dict]: ...
+```
+
+#### StylingInference
+
+```python
+class StylingInference(BaseInference):
+    def style_transfer(self, text: str, instruction: str) -> str: ...
+    def generate_text(self, instruction: str, input_text: str) -> str: ...
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### 1. Model Loading Errors
+
+**Error**: `FileNotFoundError: ./models/[task_name]/*.json`
+
+**Solution**:
+- Verify model was trained successfully
+- Check `model_output_dir` in YAML config
+- Ensure model files exist in specified directory
+
+#### 2. Memory Issues
+
+**Error**: `CUDA out of memory`
+
+**Solution**:
+- Reduce `batch_size` in YAML config
+- Enable `load_in_4bit: true`
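+- Use gradient accumulation: raise `gradient_accumulation_steps` as you lower `batch_size` to keep the effective batch size unchanged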
+- Reduce `max_seq_length`
+
+#### 3. Data Format Errors
+
+**Error**: `KeyError: 'input_field'`
+
+**Solution**:
+- Verify field names in JSONL/CSV files
+- Check `input_field` and `output_field` in YAML
+- Ensure data format matches expected structure
+
+#### 4. Training Convergence Issues
+
+**Symptoms**: Loss not decreasing, poor model performance
+
+**Solution**:
+- Adjust learning rate (try 1e-5 to 5e-4)
+- Increase training epochs
+- Check data quality and quantity
+- Verify label distribution (for classification)
+
+### Debug Mode
+
+Enable detailed logging:
+
+```bash
+export LOG_LEVEL="DEBUG"
+python scripts/[task_type]/data_processor.py --config configs/[task_type]/[config].yaml --log-level DEBUG
+```
+
+### Performance Optimization
+
+#### Memory Optimization
+
+```yaml
+model:
+  load_in_4bit: true          # 4-bit quantization
+  dtype: "float16"            # Use float16 if supported
+
+training:
+  gradient_accumulation_steps: 4  # Effective batch size = batch_size * gradient_accumulation_steps
+  max_grad_norm: 1.0         # Gradient clipping
+```
+
+#### Speed Optimization
+
+```yaml
+training:
+  dataloader_num_workers: 4   # Parallel data loading
+  fp16: true                  # Mixed precision training
+  bf16: false                 # Disable bfloat16 if not supported
+```
+
+## Contributing
+
+### Adding New Task Types
+
+1. **Create task directory structure**:
+```
+pipelines/[new_task]/
+├── __init__.py
+├── data_processor.py
+├── train.py
+└── inference.py
+
+scripts/[new_task]/
+├── __init__.py
+├── data_processor.py
+├── train.py
+└── inference.py
+
+configs/[new_task]/
+└── example.yaml
+```
+
+2. **Subclass the base classes**:
+- Extend `BaseDataProcessor`
+- Extend `BaseTrainer`
+- Extend `BaseInference`
+
+3. **Add configuration templates**:
+- Define task-specific parameters
+- Document all configuration options
+
+4. **Update documentation**:
+- Add task description to README
+- Include usage examples
+- Document configuration parameters
+
+### Code Style
+
+- Follow PEP 8 guidelines
+- Use type hints for all functions
+- Include comprehensive docstrings
+- Add unit tests for new functionality
+
+### Testing
+
+```bash
+# Run all tests
+python -m pytest tests/
+
+# Run specific task tests
+python -m pytest tests/[task_type]/
+
+# Run with coverage
+python -m pytest --cov=pipelines tests/
+```
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 ## Support
 
-For issues and questions:
-1. Check the troubleshooting section above
-2. Review logs in the output directories
-3. Verify YAML configuration structure
-4. Test with smaller datasets first
+- **Issues**: [GitHub Issues](https://github.com/your-repo/issues)
+- **Discussions**: [GitHub Discussions](https://github.com/your-repo/discussions)
+- **Documentation**: [Wiki](https://github.com/your-repo/wiki)
 
 ---
 
-**Happy fine-tuning!**
+**Happy fine-tuning! 🚀**
diff --git a/.ipynb_checkpoints/untitled-checkpoint.txt b/.ipynb_checkpoints/untitled-checkpoint.txt
new file mode 100644
index 0000000..e69de29
diff --git a/README.md b/README.md
index f1ba946..517945e 100644
--- a/README.md
+++ b/README.md
@@ -1,763 +1,726 @@
-# Fine-Tune Task: NLP Pipeline Framework
+# Fine-Tuning Task Framework
 
-A comprehensive framework for fine-tuning NLP models with organized YAML configurations, supporting multiple tasks (classification, completion, styling, matching).
+A comprehensive framework for fine-tuning Large Language Models (LLMs) across multiple task types including classification, completion, styling, and matching.
 
-## Supported Tasks
+## Table of Contents
 
-This framework supports multiple NLP tasks with organized configurations:
+- [Overview](#overview)
+- [Architecture](#architecture)
+- [Task Types](#task-types)
+- [Quick Start](#quick-start)
+- [Configuration Guide](#configuration-guide)
+- [Scripts & Commands](#scripts--commands)
+- [Complete Workflows](#complete-workflows)
+- [API Reference](#api-reference)
+- [Troubleshooting](#troubleshooting)
+- [Contributing](#contributing)
 
-- **Classification**: Text classification, sentiment analysis, topic classification
-- **Completion**: Text generation, code completion, story generation
-- **Styling**: Style transfer, tone classification, writing style adaptation
-- **Matching**: Semantic matching, entity matching, similarity scoring
+## Overview
 
-### Current Implementation Status
+This framework provides a unified approach to fine-tuning LLMs for various NLP tasks. It's designed to be:
 
-- **Classification**: ✅ Fully implemented with emotion classification example
-- **Styling**: ✅ Fully implemented with style transfer and LoRA fine-tuning
-- **Completion**: Planned for future updates
-- **Matching**: Planned for future updates
+- **Task-Agnostic**: Same pipeline structure for different task types
+- **Configuration-Driven**: YAML-based configuration for all parameters
+- **Developer-Friendly**: Clear scripts and comprehensive logging
+- **Production-Ready**: Built-in validation, error handling, and optimization
 
-**Note**: Classification and styling tasks are fully supported. Other tasks (completion, matching) are planned for future updates.
+## Architecture
 
-## Project Structure
+The framework follows a **modular pipeline architecture**:
 
 ```
-fine-tune-task/
-├── configs/                    # YAML configuration files
-│   ├── classification/         # ✅ Implemented
-│   │   ├── emotion.yaml       # Emotion classification
-│   │   └── custom.yaml        # Custom dataset
-│   ├── styling/               # ✅ Implemented
-│   │   └── formal.yaml        # Formal style transfer
-│   ├── completion/             # Planned for future updates
-│   └── matching/              # Planned for future updates
-├── data/                       # Data directories
-│   ├── raw/                    # Raw input data
-│   │   ├── classification/     # ✅ Implemented
-│   │   ├── styling/           # ✅ Implemented
-│   │   ├── completion/         # Planned for future updates
-│   │   └── matching/          # Planned for future updates
-│   └── processed/              # Processed data
-│       ├── classification/     # ✅ Implemented
-│       ├── styling/           # ✅ Implemented
-│       ├── completion/         # Planned for future updates
-│       └── matching/          # Planned for future updates
-├── pipelines/                  # Core pipeline scripts
-│   ├── classification/         # ✅ Implemented
-│   │   ├── data_processor.py  # Data processing
-│   │   ├── train.py          # Training
-│   │   └── inference.py      # Inference
-│   ├── styling/               # ✅ Implemented
-│   │   ├── data_processor.py  # Style data processing
-│   │   ├── train.py          # LoRA fine-tuning
-│   │   └── inference.py      # Style transfer inference
-│   ├── completion/            # Planned for future updates
-│   └── matching/             # Planned for future updates
-├── scripts/                    # User-friendly scripts
-│   ├── classification/         # ✅ Implemented
-│   │   ├── data_processor.py  # Data processing script
-│   │   ├── trainer.py        # Training script
-│   │   └── inference.py      # Inference script
-│   ├── styling/               # ✅ Implemented
-│   │   ├── data_processor.py  # Style data processing script
-│   │   ├── train.py          # Training script
-│   │   └── inference.py      # Inference script
-│   ├── completion/            # Planned for future updates
-│   └── matching/             # Planned for future updates
-├── results/                    # Model outputs
-│   ├── classification/         # ✅ Implemented
-│   ├── styling/              # ✅ Implemented
-│   ├── completion/            # Planned for future updates
-│   └── matching/             # Planned for future updates
-└── utils/                      # Shared utility modules
+Raw Data → Data Processing → Model Training → Inference/Evaluation
+    ↓              ↓              ↓              ↓
+  JSONL/CSV    HuggingFace    Trained      Ready for
+  Files        Datasets       Models       Production
 ```
 
-## Quick Start (Classification Task)
+### Core Components
 
-### 1. Setup Environment
+1. **Data Processors**: Convert raw data to training-ready formats
+2. **Training Pipelines**: Task-specific training with optimization
+3. **Inference Engines**: Production-ready text generation/classification
+4. **Configuration Management**: YAML-based parameter control
+5. **Utility Scripts**: Command-line interfaces for all operations
+
+## Task Types
+
+### 1. Classification Task
+
+**Purpose**: Text classification, sentiment analysis, topic categorization
+
+**Data Format**:
+```jsonl
+{"text": "I love this product!", "label": "positive"}
+{"text": "This is terrible", "label": "negative"}
+```
+
+**Output**: Classification probabilities and predicted labels
+
+**Use Cases**: Sentiment analysis, spam detection, content moderation
+
+### 2. Completion Task
+
+**Purpose**: Text generation, story completion, code generation
+
+**Data Format**:
+```jsonl
+{"prompt": "Once upon a time", "completion": "there was a brave knight..."}
+{"prompt": "def calculate_sum", "completion": "(numbers): return sum(numbers)"}
+```
+
+**Output**: Generated text continuations
+
+**Use Cases**: Creative writing, code completion, content generation
+
+### 3. Styling Task
+
+**Purpose**: Style transfer, tone modification, writing style adaptation
+
+**Data Format**:
+```jsonl
+{"text": "Hey there!", "styled_text": "Hello, how are you?"}
+{"text": "I'm gonna go", "styled_text": "I will be going"}
+```
+
+**Output**: Text rewritten in target style
+
+**Use Cases**: Formalization, casualization, domain adaptation
+
+### 4. Matching Task
+
+**Purpose**: Semantic similarity, question-answer matching, paraphrase detection
+
+**Data Format**:
+```jsonl
+{"text1": "What is AI?", "text2": "Artificial Intelligence", "label": "similar"}
+{"text1": "Weather today", "text2": "Cooking recipes", "label": "different"}
+```
+
+**Output**: Similarity scores or binary classifications
+
+**Use Cases**: Search relevance, duplicate detection, semantic matching
+
+## Quick Start
+
+### Prerequisites
 
 ```bash
 # Install dependencies
 pip install -r requirements.txt
 
-# Set Python path
-export PYTHONPATH=.
+# Verify installation
+python -c "import torch, transformers, datasets; print('✅ All packages installed')"
 ```
 
-### 2. Data Processing
+### Basic Workflow
 
 ```bash
-# Process emotion dataset
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
+# 1. Process data
+python scripts/[task_type]/data_processor.py --config configs/[task_type]/[config].yaml
 
-# Process with custom parameters
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml --max-samples 1000
+# 2. Train model
+python scripts/[task_type]/train.py train --config configs/[task_type]/[config].yaml
 
-# Check output location
-ls -la ./data/processed/classification/emotion/classification/
+# 3. Run inference
+python scripts/[task_type]/inference.py infer --config configs/[task_type]/[config].yaml
 ```
 
-**Expected Output:**
-```
-Data processing completed successfully!
-  Data source: huggingface
-  Dataset: dair-ai/emotion
-  Total samples: 2999
-  Unique labels: 6
-  Split sizes: {'train': 1000, 'validation': 999, 'test': 1000}
-  Output directory: ./data/processed/classification/emotion
-```
+## Configuration Guide
 
-### 3. Model Training
+### YAML Structure
 
-```bash
-# Train using processed data
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-
-# Train with custom parameters
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --num-epochs 5 --batch-size 32
-
-# Check model output
-ls -la ./results/classification/emotion_model/
-```
-
-**Expected Output:**
-```
-Training completed successfully!
-  Model: bert-base-uncased
-  Data directory: ./data/processed/classification/emotion
-  Training for 3 epochs with batch size 16
-  Model saved to: ./results/classification/emotion_model
-```
-
-### 4. Model Inference
-
-```bash
-# Run inference
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-text "I love this product!"
-
-# File-based inference
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-file input.txt --output-file predictions.jsonl
-```
-
-**Expected Output:**
-```
-Inference completed successfully!
-  Loading model from: ./results/classification/emotion_model
-  Predicted label: joy
-  Confidence: 0.8542
-  Top 3 predictions:
-    - joy: 0.8542
-    - love: 0.1234
-    - surprise: 0.0224
-```
-
-## Quick Start (Styling Task)
-
-### 1. Setup Environment
-
-```bash
-# Install dependencies (including unsloth for styling)
-pip install -r requirements.txt
-
-# Set Python path
-export PYTHONPATH=.
-```
-
-### 2. Data Processing
-
-```bash
-# Process style transfer dataset
-python scripts/styling/data_processor.py --config configs/styling/formal.yaml
-
-# Create HuggingFace dataset
-python scripts/styling/data_processor.py --config configs/styling/formal.yaml --create-hf-dataset
-
-# Check output location
-ls -la ./data/processed/styling/formal/
-```
-
-**Expected Output:**
-```
-Styling data processing completed successfully!
-  Data source: custom
-  Data file: ./data/raw/styling/sample_formal.jsonl
-  Total samples: 5
-  Split sizes: {'train': 3, 'validation': 1, 'test': 1}
-  Output directory: ./data/processed/styling/formal
-  Style instruction: Rewrite the following text in a formal style
-```
-
-### 3. Model Training
-
-```bash
-# Train using processed data (automatically loads from YAML output_dir)
-python scripts/styling/train.py example
-
-# Custom training
-python scripts/styling/train.py train --config configs/styling/formal.yaml --epochs 3 --batch-size 4
-
-# Check model output
-ls -la ./models/styling/
-```
-
-**Expected Output:**
-```
-Training completed successfully!
-  Model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
-  Dataset: Loaded from ./data/processed/styling/formal
-  Training for 3 epochs with batch size 4
-  Model saved to: ./models/styling
-```
-
-### 4. Model Inference
-
-```bash
-# Single text style transfer
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml --text "Hey, what's up?"
-
-# Batch processing
-python scripts/styling/inference.py batch
-
-# Interactive mode
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml
-```
-
-**Expected Output:**
-```
-Inference completed successfully!
-  Input: Hey, what's up?
-  Output: Hello, how are you doing?
-  Style: Formal
-```
-
-## Adding New Tasks
-
-To add a new task (e.g., completion, styling, matching), follow these steps:
-
-### Example: Styling Task (Already Implemented)
-
-The styling task demonstrates a complete implementation:
-
-1. **Task Directory Structure** ✅
-```bash
-configs/styling/           # YAML configurations
-data/raw/styling/         # Raw style transfer data
-data/processed/styling/   # Processed data
-pipelines/styling/        # Core pipeline scripts
-scripts/styling/          # User-friendly scripts
-models/styling/           # Trained models
-```
-
-2. **Pipeline Components** ✅
-- **Data Processor**: Handles style transfer datasets with instruction/input/output format
-- **Trainer**: LoRA fine-tuning using Unsloth for efficiency
-- **Inference**: Style transfer with streaming and batch processing
-
-3. **Key Features** ✅
-- Automatic EOS token handling: `text + tokenizer.eos_token`
-- Dataset mapping: `dataset.map(formatting_prompts_func, batched=True)`
-- YAML integration: Uses `data.output_dir` for automatic dataset loading
-- HuggingFace dataset export and loading
-
-### For Other Tasks (completion, matching)
-
-1. **Create Task Directory Structure**
-```bash
-# Create task directories
-mkdir -p configs/completion
-mkdir -p data/raw/completion data/processed/completion
-mkdir -p pipelines/completion
-mkdir -p scripts/completion
-mkdir -p results/completion
-mkdir -p tasks/completion
-mkdir -p models/completion
-```
-
-2. **Create Task Configuration**
-
-```bash
-# Create YAML configuration for new task
-cat > configs/completion/text_generation.yaml << 'EOF'
-# Text Generation Task Configuration
-task:
-  name: "completion"
-  type: "text_generation"
-
-# Data Processing Configuration
-data:
-  source: "huggingface"
-  dataset_name: "your-dataset-name"
-  output_dir: "./data/processed/completion/text_generation"
-  max_samples: 1000
-  # ... other data parameters
-
-# Model Configuration
-model:
-  name: "gpt2"  # Different model for completion
-  max_length: 1024
-  # ... model parameters
-
-# Training Configuration
-training:
-  num_epochs: 3
-  batch_size: 8  # Smaller batch for generation
-  learning_rate: 5e-5
-  data_dir: "./data/processed/completion/text_generation"
-  output_dir: "./results/completion/text_generation_model"
-
-# Inference Configuration
-inference:
-  model_path: "./results/completion/text_generation_model"
-  device: "auto"
-  batch_size: 1  # Generation is typically one at a time
-  max_length: 100
-  temperature: 0.7
-EOF
-```
-
-3. **Create Pipeline Scripts**
-
-Copy and modify the classification pipeline scripts:
-
-```bash
-# Copy classification scripts as templates
-cp pipelines/classification/data_processor.py pipelines/completion/
-cp pipelines/classification/train.py pipelines/completion/
-cp pipelines/classification/inference.py pipelines/completion/
-
-# Copy task scripts
-cp scripts/classification/data_processor.py scripts/completion/
-cp scripts/classification/trainer.py scripts/completion/
-cp scripts/classification/inference.py scripts/completion/
-```
-
-4. **Modify Pipeline Code**
-
-Update the pipeline scripts for your specific task:
-
-1. **Data Processor** (`pipelines/completion/data_processor.py`):
-   - Update data loading logic for completion datasets
-   - Modify preprocessing for text generation
-   - Adjust output format for completion tasks
-
-2. **Trainer** (`pipelines/completion/train.py`):
-   - Change model type to generation models (GPT, T5, etc.)
-   - Update training loop for text generation
-   - Modify evaluation metrics
-
-3. **Inference** (`pipelines/completion/inference.py`):
-   - Update inference for text generation
-   - Add generation parameters (temperature, top-k, etc.)
-   - Modify output format
-
-5. **Update Task Scripts**
-
-Modify the task scripts to use your new pipeline:
-
-```python
-# scripts/completion/data_processor.py
-def run_with_yaml_config(config_path: str, **cli_overrides):
-    cmd = [
-        "python", "pipelines/completion/data_processor.py",  # Updated path
-        "--config", config_path
-    ]
-    # ... rest of the function
-```
-
-6. **Create Task-Specific Models**
-
-```bash
-# Create model directory
-mkdir -p models/completion
-
-# Add task-specific model classes
-cat > models/completion/text_generator.py << 'EOF'
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-class TextGenerator:
-    def __init__(self, model_name):
-        self.model = AutoModelForCausalLM.from_pretrained(model_name)
-        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-    
-    def generate(self, prompt, max_length=100, temperature=0.7):
-        # Implementation for text generation
-        pass
-EOF
-```
-
-7. **Test Your New Task**
-
-```bash
-# Test data processing
-python scripts/completion/data_processor.py --config configs/completion/text_generation.yaml
-
-# Test training
-python scripts/completion/trainer.py --config configs/completion/text_generation.yaml
-
-# Test inference
-python scripts/completion/inference.py --config configs/completion/text_generation.yaml --input-text "Once upon a time"
-```
-
-## YAML Configuration Guide
-
-### Configuration Structure
-
-Each YAML file is organized into clear sections:
+All configurations follow this hierarchical structure:
 
 ```yaml
 # Task Configuration
 task:
-  name: "classification"  # or "completion", "styling", "matching"
-  type: "sequence_classification"  # or "text_generation", "style_transfer", "semantic_matching"
+  name: "task_type"                    # classification, completion, styling, matching
+  type: "specific_type"                # e.g., "sentiment_analysis", "style_transfer"
 
-# Data Processing Configuration
+# Data Configuration
 data:
-  source: "huggingface"                    # "huggingface" or "custom"
-  dataset_name: "dair-ai/emotion"         # HuggingFace dataset name
-  output_dir: "./data/processed/classification/emotion"
-  max_samples: 1000                        # Limit dataset size
-  # ... other data parameters
+  source: "custom"                     # "custom" or "huggingface"
+  data_path: "./data/raw/..."          # Path to raw data
+  input_field: "text"                  # Field name for input
+  output_field: "label"                # Field name for output
+  instruction: "Task instruction"      # For instruction-following tasks
 
 # Model Configuration
 model:
-  name: "bert-base-uncased"                # Model from HuggingFace Hub
-  max_length: 512                          # Sequence length
-  num_labels: 6                            # Number of classes
+  name: "model_name"                   # HuggingFace model identifier
+  max_seq_length: 2048                 # Maximum sequence length
+  dtype: null                          # Data type (auto-detected)
+  load_in_4bit: true                   # 4-bit quantization
 
 # Training Configuration
 training:
-  num_epochs: 3                            # Training epochs
-  batch_size: 16                           # Batch size
-  learning_rate: 2e-5                      # Learning rate
-  data_dir: "./data/processed/classification/emotion"
-  output_dir: "./results/classification/emotion_model"
+  num_epochs: 3                        # Training epochs
+  batch_size: 4                        # Batch size
+  learning_rate: 2e-4                  # Learning rate
+  warmup_steps: 5                      # Warmup steps
+  max_steps: 60                        # Maximum training steps
 
 # Inference Configuration
 inference:
-  model_path: "./results/classification/emotion_model"
-  device: "auto"                           # "auto", "cuda", "cpu"
-  batch_size: 32                           # Inference batch size
-  return_top_k: 3                          # Top K predictions
+  batch_size: 32                       # Inference batch size
+  max_new_tokens: 128                  # Max tokens to generate
+  temperature: 0.8                     # Sampling temperature
 ```
 
-### Styling Configuration Example
+### Configuration Parameters
+
+#### Data Processing Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `source` | string | "custom" | Data source type (`"custom"` or `"huggingface"`) |
+| `data_path` | string | required | Path to raw data file |
+| `input_field` | string | "text" | Input field name |
+| `output_field` | string | "label" | Output field name |
+| `instruction` | string | task-specific | Task instruction |
+| `data_format` | string | "jsonl" | Data file format |
+| `max_length` | int | 256 | Maximum text length |
+| `min_length` | int | 10 | Minimum text length |
+| `clean_text` | boolean | true | Enable text cleaning |
+| `lowercase` | boolean | false | Convert to lowercase |
+
+#### Model Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `name` | string | required | HuggingFace model name |
+| `max_seq_length` | int | 2048 | Maximum sequence length |
+| `dtype` | string | null | Data type (auto-detected when `null`) |
+| `load_in_4bit` | boolean | true | Enable 4-bit quantization |
+| `token` | string | null | HuggingFace access token |
+
+#### Training Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `num_epochs` | int | 1 | Number of training epochs |
+| `batch_size` | int | 2 | Training batch size |
+| `learning_rate` | float | 2e-4 | Learning rate |
+| `weight_decay` | float | 0.01 | Weight decay |
+| `warmup_steps` | int | 5 | Warmup steps |
+| `max_steps` | int | 60 | Maximum training steps |
+| `gradient_accumulation_steps` | int | 4 | Number of batches whose gradients are accumulated before each optimizer step |
+
+#### LoRA Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `lora_r` | int | 16 | LoRA rank |
+| `lora_alpha` | int | 16 | LoRA alpha |
+| `lora_dropout` | float | 0 | LoRA dropout |
+| `target_modules` | list | ["q_proj", "k_proj", "v_proj", "o_proj"] | Target modules for LoRA |
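+
+A sketch of how these defaults could appear in a config file. The table does not say which section holds the LoRA keys, so they are listed here without a parent; check the task's config template for the actual location.
+
+```yaml
+# Parent section intentionally omitted (not specified in the table above)
+lora_r: 16
+lora_alpha: 16          # standard LoRA scales the adapter update by lora_alpha / lora_r, here 16 / 16 = 1.0
+lora_dropout: 0
+target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"]   # attention projection layers
+```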
+
+### Environment Variables
+
+```bash
+# HuggingFace token for gated models
+export HF_TOKEN="hf_..."
+
+# CUDA device selection
+export CUDA_VISIBLE_DEVICES="0"
+
+# Logging level
+export LOG_LEVEL="INFO"
+```
+
+## Scripts & Commands
+
+### Data Processing Scripts
+
+#### Basic Usage
+
+```bash
+python scripts/[task_type]/data_processor.py --config configs/[task_type]/[config].yaml
+```
+
+#### Advanced Options
+
+```bash
+python scripts/[task_type]/data_processor.py \
+  --config configs/[task_type]/[config].yaml \
+  --max-samples 1000 \
+  --log-level DEBUG \
+  --create-hf-dataset \
+  --hf-dataset-path ./datasets/[task_name]
+```
+
+#### Command Line Arguments
+
+| Argument | Type | Default | Description |
+|----------|------|---------|-------------|
+| `--config` | string | required | YAML configuration file |
+| `--max-samples` | int | all | Maximum samples to process |
+| `--log-level` | string | "INFO" | Logging level |
+| `--create-hf-dataset` | flag | false | Create HuggingFace dataset |
+| `--hf-dataset-path` | string | auto | HuggingFace dataset path |
+
+### Training Scripts
+
+#### Basic Usage
+
+```bash
+python scripts/[task_type]/train.py train --config configs/[task_type]/[config].yaml
+```
+
+#### Advanced Options
+
+```bash
+python scripts/[task_type]/train.py train \
+  --config configs/[task_type]/[config].yaml \
+  --epochs 5 \
+  --batch-size 8 \
+  --learning-rate 1e-4 \
+  --max-steps 100
+```
+
+#### Command Line Arguments
+
+| Argument | Type | Default | Description |
+|----------|------|---------|-------------|
+| `--config` | string | required | YAML configuration file |
+| `--epochs` | int | YAML value | Override training epochs |
+| `--batch-size` | int | YAML value | Override batch size |
+| `--learning-rate` | float | YAML value | Override learning rate |
+| `--max-steps` | int | YAML value | Override max steps |
+| `--output-dir` | string | YAML value | Override output directory |
+
+### Inference Scripts
+
+#### Basic Usage
+
+```bash
+python scripts/[task_type]/inference.py infer \
+  --config configs/[task_type]/[config].yaml \
+  --input-text "Your input text here"
+```
+
+#### Advanced Options
+
+```bash
+python scripts/[task_type]/inference.py infer \
+  --config configs/[task_type]/[config].yaml \
+  --input-text "Your input text here" \
+  --max-tokens 256 \
+  --temperature 0.7 \
+  --stream
+```
+
+#### Command Line Arguments
+
+| Argument | Type | Default | Description |
+|----------|------|---------|-------------|
+| `--config` | string | required | YAML configuration file |
+| `--input-text` | string | required | Text to process |
+| `--max-tokens` | int | 128 | Maximum tokens to generate |
+| `--temperature` | float | 0.8 | Sampling temperature |
+| `--stream` | flag | false | Enable streaming generation |
+
+### Batch Processing
+
+```bash
+# Process multiple inputs from file
+python scripts/[task_type]/inference.py batch \
+  --config configs/[task_type]/[config].yaml \
+  --input-file input.txt \
+  --output-file output.txt
+```
+
+### Interactive Mode
+
+```bash
+# Enter interactive mode for testing
+python scripts/[task_type]/inference.py interactive \
+  --config configs/[task_type]/[config].yaml
+```
+
+## Complete Workflows
+
+### Classification Task Workflow
+
+#### 1. Data Preparation
+
+Example file `data/raw/classification/sentiment.jsonl`:
+```jsonl
+{"text": "I love this movie!", "label": "positive"}
+{"text": "This is terrible", "label": "negative"}
+{"text": "It's okay", "label": "neutral"}
+```
+
+#### 2. Configuration
 
 ```yaml
-# Styling Task Configuration
+# configs/classification/sentiment.yaml
+task:
+  name: "classification"
+  type: "sentiment_analysis"
+
+data:
+  source: "custom"
+  data_path: "./data/raw/classification/sentiment.jsonl"
+  input_field: "text"
+  output_field: "label"
+  instruction: "Classify the sentiment of the following text"
+
+model:
+  name: "microsoft/DialoGPT-medium"
+  max_seq_length: 512
+
+training:
+  num_epochs: 3
+  batch_size: 8
+  learning_rate: 3e-5
+```
+
+#### 3. Execute Pipeline
+
+```bash
+# Process data
+python scripts/classification/data_processor.py --config configs/classification/sentiment.yaml
+
+# Train model
+python scripts/classification/train.py train --config configs/classification/sentiment.yaml
+
+# Run inference
+python scripts/classification/inference.py infer \
+  --config configs/classification/sentiment.yaml \
+  --input-text "This product exceeded my expectations!"
+```
+
+### Styling Task Workflow
+
+#### 1. Data Preparation
+
+Example file `data/raw/styling/formal.jsonl`:
+```jsonl
+{"text": "Hey there!", "styled_text": "Hello, how are you?"}
+{"text": "I'm gonna go", "styled_text": "I will be going"}
+{"text": "This is cool", "styled_text": "This is quite impressive"}
+```
+
+#### 2. Configuration
+
+```yaml
+# configs/styling/formal.yaml
 task:
   name: "styling"
   type: "style_transfer"
 
-# Data Processing Configuration
 data:
   source: "custom"
-  data_path: "./data/raw/styling/sample_formal.jsonl"
+  data_path: "./data/raw/styling/formal.jsonl"
   input_field: "text"
   output_field: "styled_text"
   instruction: "Rewrite the following text in a formal style"
-  output_dir: "./data/processed/styling/formal"
-  output_format: "alpaca"
 
-# Model Configuration
 model:
-  training_model: "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
-  training_max_seq_length: 2048
-  training_load_in_4bit: true
+  name: "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
+  max_seq_length: 2048
 
-# Training Configuration
 training:
   num_epochs: 3
-  batch_size: 2
+  batch_size: 4
   learning_rate: 2e-4
-  weight_decay: 0.01
-
-# Inference Configuration
-inference:
-  batch_size: 1
-  max_new_tokens: 128
-  temperature: 0.8
+  model_output_dir: "./models/styling"
 ```
 
-### Available Configuration Files
-
-- `configs/classification/emotion.yaml` - Emotion classification with HuggingFace dataset
-- `configs/classification/custom.yaml` - Custom dataset processing
-- `configs/styling/formal.yaml` - Formal style transfer with LoRA fine-tuning
-
-## Usage Examples
-
-### Data Processing Examples
+#### 3. Execute Pipeline
 
 ```bash
-# 1. Use YAML config only
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
-
-# 2. Override YAML values
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml --max-samples 500
-
-# 3. Use CLI only (backward compatibility)
-python scripts/classification/data_processor.py --data-source huggingface --dataset-name dair-ai/emotion
-
-# 4. Run examples
-python scripts/classification/data_processor.py examples
-```
-
-### Training Examples
-
-```bash
-# 1. Use YAML config only
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-
-# 2. Override YAML values
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --num-epochs 5
-
-# 3. Use CLI only
-python scripts/classification/trainer.py --model-name bert-base-uncased --num-epochs 3
-
-# 4. Run examples
-python scripts/classification/trainer.py examples
-```
-
-### Inference Examples
-
-```bash
-# 1. Single text prediction
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-text "I love this product!"
-
-# 2. File-based prediction
-python scripts/classification/inference.py --config configs/classification/emotion.yaml --input-file input.txt --output-file predictions.jsonl
-
-# 3. Interactive mode
-python scripts/classification/inference.py --config configs/classification/emotion.yaml
-
-# 4. Run examples
-python scripts/classification/inference.py examples
-```
-
-### Styling Examples
-
-```bash
-# 1. Data Processing
+# Process data
 python scripts/styling/data_processor.py --config configs/styling/formal.yaml
-python scripts/styling/data_processor.py --config configs/styling/formal.yaml --create-hf-dataset
 
-# 2. Training
-python scripts/styling/train.py example
-python scripts/styling/train.py train --config configs/styling/formal.yaml --epochs 2
+# Train model
+python scripts/styling/train.py train --config configs/styling/formal.yaml
 
-# 3. Inference
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml --text "Hey, what's up?"
-python scripts/styling/inference.py batch
-python scripts/styling/inference.py infer --config configs/styling/formal.yaml
-
-# 4. Run examples
-python scripts/styling/data_processor.py examples
-python scripts/styling/train.py features
-python scripts/styling/inference.py features
+# Run inference
+python scripts/styling/inference.py infer \
+  --config configs/styling/formal.yaml \
+  --instruction "Rewrite in formal style" \
+  --input-text "Hey there! What's up?"
 ```
 
-## Troubleshooting Common Errors
+### Completion Task Workflow
 
-### 1. ModuleNotFoundError: No module named 'utils'
+#### 1. Data Preparation
 
-**Error:**
-```
-ModuleNotFoundError: No module named 'utils'
+Example file `data/raw/completion/story.jsonl`:
+```jsonl
+{"prompt": "Once upon a time", "completion": "there was a brave knight who lived in a castle..."}
+{"prompt": "The dragon roared", "completion": "and the ground shook beneath its massive feet..."}
 ```
 
-**Solution:**
-```bash
-# Set Python path before running scripts
-export PYTHONPATH=.
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
-```
+#### 2. Configuration
 
-### 2. Model Path Not Found
-
-**Error:**
-```
-Model path not found: ./results/classification/emotion_model
-```
-
-**Solution:**
-```bash
-# Train the model first
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-
-# Then run inference
-python scripts/classification/inference.py --config configs/classification/emotion.yaml
-```
-
-### 3. Data Directory Not Found
-
-**Error:**
-```
-Data directory not found: ./data/processed/classification/emotion
-```
-
-**Solution:**
-```bash
-# Process data first
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml
-
-# Then train
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml
-```
-
-### 4. YAML Configuration Errors
-
-**Error:**
-```
-data_processor.py: error: --data-source is required (either in YAML config or CLI)
-```
-
-**Solution:**
-Check your YAML file structure. It should have:
 ```yaml
-data:
-  source: "huggingface"  # Not data_source
-  dataset_name: "dair-ai/emotion"
-```
+# configs/completion/story.yaml
+task:
+  name: "completion"
+  type: "story_generation"
 
-### 5. HuggingFace Download Issues
-
-**Error:**
-```
-KeyboardInterrupt during model download
-```
-
-**Solution:**
-```bash
-# Use smaller dataset for testing
-python scripts/classification/data_processor.py --config configs/classification/emotion.yaml --max-samples 100
-
-# Or use cached models
-export HF_HOME=./cache
-```
-
-### 6. CUDA/GPU Issues
-
-**Error:**
-```
-RuntimeError: CUDA out of memory
-```
-
-**Solution:**
-```bash
-# Reduce batch size
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --batch-size 8
-
-# Or use CPU
-python scripts/classification/trainer.py --config configs/classification/emotion.yaml --device cpu
-```
-
-## Monitoring and Logs
-
-### Check Processing Status
-
-```bash
-# Check data processing output
-ls -la ./data/processed/classification/emotion/classification/
-
-# Check training output
-ls -la ./results/classification/emotion_model/
-
-# Check logs
-tail -f logs/training.log
-```
-
-### Expected File Structure After Processing
-
-```
-./data/processed/classification/emotion/classification/
-├── train.jsonl       # Training data
-├── validation.jsonl   # Validation data
-└── test.jsonl        # Test data
-
-./results/classification/emotion_model/
-├── config.json       # Model configuration
-├── pytorch_model.bin # Model weights
-├── tokenizer.json    # Tokenizer
-└── label_info.json   # Label mappings
-```
-
-## Workflow Summary
-
-### Classification Task
-1. **Setup**: Install dependencies and set PYTHONPATH
-2. **Data Processing**: Process raw data into organized splits
-3. **Training**: Train model using processed data
-4. **Inference**: Use trained model for predictions
-5. **Monitoring**: Check logs and outputs for errors
-
-### Styling Task
-1. **Setup**: Install dependencies (including unsloth) and set PYTHONPATH
-2. **Data Processing**: Process style transfer data with instruction/input/output format
-3. **Training**: LoRA fine-tuning using Unsloth for efficient style transfer
-4. **Inference**: Style transfer with streaming and batch processing
-5. **Monitoring**: Check training logs and model outputs
-
-## Creating Custom Configurations
-
-### For New Datasets
-
-1. Copy existing config:
-```bash
-cp configs/classification/emotion.yaml configs/classification/my_dataset.yaml
-```
-
-2. Modify parameters:
-```yaml
-data:
-  source: "huggingface"
-  dataset_name: "your-dataset-name"
-  output_dir: "./data/processed/classification/my_dataset"
-  # ... other parameters
-
-training:
-  data_dir: "./data/processed/classification/my_dataset"
-  output_dir: "./results/classification/my_dataset_model"
-```
-
-3. Run pipeline:
-```bash
-python scripts/classification/data_processor.py --config configs/classification/my_dataset.yaml
-```
-
-### For Custom Data
-
-1. Use custom config:
-```yaml
 data:
   source: "custom"
-  data_path: "./data/raw/my_data.jsonl"
-  output_dir: "./data/processed/classification/my_custom_dataset"
+  data_path: "./data/raw/completion/story.jsonl"
+  input_field: "prompt"
+  output_field: "completion"
+
+model:
+  name: "gpt2-medium"
+  max_seq_length: 1024
+
+training:
+  num_epochs: 2
+  batch_size: 16
+  learning_rate: 5e-5
 ```
 
-2. Run processing:
+#### 3. Execute Pipeline
+
 ```bash
-python scripts/classification/data_processor.py --config configs/classification/custom.yaml
+# Process data
+python scripts/completion/data_processor.py --config configs/completion/story.yaml
+
+# Train model
+python scripts/completion/train.py train --config configs/completion/story.yaml
+
+# Run inference
+python scripts/completion/inference.py infer \
+  --config configs/completion/story.yaml \
+  --input-text "The wizard cast a spell"
 ```
 
-## Best Practices
+## API Reference
 
-1. **Always check output directories** before running next step
-2. **Use small datasets for testing** before full runs
-3. **Monitor logs** for errors and warnings
-4. **Backup configurations** before major changes
-5. **Use version control** for YAML files
-6. **Test with CLI overrides** for quick experiments
+### Data Processing Classes
+
+#### BaseDataProcessor
+
+```python
+# Interface summary: method bodies are elided with "..."
+from typing import Any, Dict, List, Tuple
+
+
+class BaseDataProcessor:
+    def __init__(self, config: Dict[str, Any]): ...
+    def load_and_preprocess(self) -> Tuple[Dict, Dict]: ...
+    def validate_data(self, data: Dict) -> Tuple[bool, List[str]]: ...
+    def save_data(self, data: Dict, output_path: str): ...
+```
+
+#### ClassificationDataProcessor
+
+```python
+class ClassificationDataProcessor(BaseDataProcessor):
+    def convert_to_classification_format(self, data: Dict) -> Dict: ...
+    def create_label_mapping(self, labels: List[str]) -> Dict[str, int]: ...
+```
+
+#### StylingDataProcessor
+
+```python
+class StylingDataProcessor(BaseDataProcessor):
+    def convert_to_alpaca_format(self, data: Dict) -> Dict: ...
+    def format_for_training(self, data: Dict) -> Dict: ...
+```
+
+### Training Classes
+
+#### BaseTrainer
+
+```python
+# Interface summary: method bodies are elided with "..."
+from typing import Any, Dict
+
+from datasets import Dataset  # assumed: Hugging Face Datasets
+
+
+class BaseTrainer:
+    def __init__(self, config: Dict[str, Any]): ...
+    def load_model_and_tokenizer(self): ...
+    def setup_training(self, dataset: Dataset): ...
+    def train(self, dataset_path: str) -> Dict: ...
+    def save_model(self): ...
+```
+
+#### ClassificationTrainer
+
+```python
+class ClassificationTrainer(BaseTrainer):
+    def setup_classification_head(self): ...
+    def compute_metrics(self, eval_pred) -> Dict: ...
+```
+
+#### StylingTrainer
+
+```python
+class StylingTrainer(BaseTrainer):
+    def setup_lora(self): ...
+    def format_dataset(self, dataset: Dataset) -> Dataset: ...
+```
+
+### Inference Classes
+
+#### BaseInference
+
+```python
+# Interface summary: method bodies are elided with "..."
+from typing import Any, Dict, List
+
+import torch
+
+
+class BaseInference:
+    def __init__(self, config: Dict[str, Any]): ...
+    def load_model_and_tokenizer(self): ...
+    def preprocess_input(self, input_text: str) -> torch.Tensor: ...
+    def postprocess_output(self, output: torch.Tensor) -> str: ...
+```
+
+#### ClassificationInference
+
+```python
+class ClassificationInference(BaseInference):
+    def classify(self, text: str) -> Dict[str, float]: ...
+    def batch_classify(self, texts: List[str]) -> List[Dict]: ...
+```
+
+#### StylingInference
+
+```python
+class StylingInference(BaseInference):
+    def style_transfer(self, text: str, instruction: str) -> str: ...
+    def generate_text(self, instruction: str, input_text: str) -> str: ...
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### 1. Model Loading Errors
+
+**Error**: `FileNotFoundError: ./models/[task_name]/*.json`
+
+**Solution**: 
+- Verify model was trained successfully
+- Check `model_output_dir` in YAML config
+- Ensure model files exist in specified directory
+
+#### 2. Memory Issues
+
+**Error**: `CUDA out of memory`
+
+**Solution**:
+- Reduce `batch_size` in YAML config
+- Enable `load_in_4bit: true`
+- Raise `gradient_accumulation_steps` so the effective batch size is preserved at a smaller `batch_size`
+- Reduce `max_seq_length`
+
+#### 3. Data Format Errors
+
+**Error**: `KeyError: 'input_field'`
+
+**Solution**:
+- Verify field names in JSONL/CSV files
+- Check `input_field` and `output_field` in YAML
+- Ensure data format matches expected structure
+
+#### 4. Training Convergence Issues
+
+**Symptoms**: Loss not decreasing, poor model performance
+
+**Solution**:
+- Adjust learning rate (try 1e-5 to 5e-4)
+- Increase training epochs
+- Check data quality and quantity
+- Verify label distribution (for classification)
+
+### Debug Mode
+
+Enable detailed logging:
+
+```bash
+export LOG_LEVEL="DEBUG"
+python scripts/[task_type]/data_processor.py --config configs/[task_type]/[config].yaml --log-level DEBUG
+```
+
+### Performance Optimization
+
+#### Memory Optimization
+
+```yaml
+model:
+  load_in_4bit: true          # 4-bit quantization
+  dtype: "float16"            # Use float16 if supported
+
+training:
+  gradient_accumulation_steps: 4  # Effective batch size = batch_size * gradient_accumulation_steps
+  max_grad_norm: 1.0         # Gradient clipping
+```
+
+#### Speed Optimization
+
+```yaml
+training:
+  dataloader_num_workers: 4   # Parallel data loading
+  fp16: true                  # Mixed precision training
+  bf16: false                 # Disable bfloat16 if not supported
+```
+
+## Contributing
+
+### Adding New Task Types
+
+1. **Create task directory structure**:
+```
+pipelines/[new_task]/
+├── __init__.py
+├── data_processor.py
+├── train.py
+└── inference.py
+
+scripts/[new_task]/
+├── __init__.py
+├── data_processor.py
+├── train.py
+└── inference.py
+
+configs/[new_task]/
+└── example.yaml
+```
+
+2. **Implement base classes**:
+- Extend `BaseDataProcessor`
+- Extend `BaseTrainer` 
+- Extend `BaseInference`
+
+3. **Add configuration templates**:
+- Define task-specific parameters
+- Document all configuration options
+
+4. **Update documentation**:
+- Add task description to README
+- Include usage examples
+- Document configuration parameters
+
+### Code Style
+
+- Follow PEP 8 guidelines
+- Use type hints for all functions
+- Include comprehensive docstrings
+- Add unit tests for new functionality
+
+### Testing
+
+```bash
+# Run all tests
+python -m pytest tests/
+
+# Run specific task tests
+python -m pytest tests/[task_type]/
+
+# Run with coverage
+python -m pytest --cov=pipelines tests/
+```
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 ## Support
 
-For issues and questions:
-1. Check the troubleshooting section above
-2. Review logs in the output directories
-3. Verify YAML configuration structure
-4. Test with smaller datasets first
+- **Issues**: [GitHub Issues](https://github.com/your-repo/issues)
+- **Discussions**: [GitHub Discussions](https://github.com/your-repo/discussions)
+- **Documentation**: [Wiki](https://github.com/your-repo/wiki)
 
 ---
 
-**Happy fine-tuning!**
+**Happy fine-tuning! 🚀**
diff --git a/untitled.txt b/untitled.txt
new file mode 100644
index 0000000..e69de29