added style mimicking pipelines

@@ -0,0 +1,191 @@
# Quick Reference Card

## Essential Parameters (Most Common)

### Data Source & Location

```yaml
data:
  source: "huggingface|custom"   # REQUIRED: Data source type
  dataset_name: "dataset/name"   # REQUIRED for huggingface
  data_path: "./path/to/file"    # REQUIRED for custom
  data_format: "jsonl|csv|json"  # REQUIRED for custom
```
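
The source-dependent requirements above can be checked before any data is loaded. The sketch below assumes the parsed `data:` section is available as a plain Python dict. `check_data_source` and its messages are hypothetical and are not part of the pipeline scripts.

```python
# Hypothetical validator: mirrors the REQUIRED comments in the data-source block.
REQUIRED_BY_SOURCE = {
    "huggingface": ["dataset_name"],
    "custom": ["data_path", "data_format"],
}
SUPPORTED_FORMATS = {"jsonl", "csv", "json"}


def check_data_source(data_cfg: dict) -> list:
    """Return a list of problems with the data-source keys (empty if valid)."""
    source = data_cfg.get("source")
    if source not in REQUIRED_BY_SOURCE:
        return [f"source must be 'huggingface' or 'custom', got {source!r}"]
    problems = [
        f"{key} is required when source is {source!r}"
        for key in REQUIRED_BY_SOURCE[source]
        if not data_cfg.get(key)
    ]
    fmt = data_cfg.get("data_format")
    if source == "custom" and fmt and fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported data_format {fmt!r}")
    return problems


# Usage: a custom source that forgot data_format.
problems = check_data_source({"source": "custom", "data_path": "./data.jsonl"})
```

The function returns a list of problems instead of raising on the first one. This lets a caller report every missing key in a single run.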

### Field Mapping

```yaml
data:
  input_field: "text"               # REQUIRED: Input text field
  label_field: "label"              # REQUIRED for classification
  output_field: "styled_text"       # REQUIRED for styling
  instruction: "Style instruction"  # REQUIRED for styling
```

### Basic Processing

```yaml
data:
  max_samples: 1000            # Limit total samples
  train_split: 0.8             # Training ratio (0.0-1.0)
  validation_split: 0.1        # Validation ratio (0.0-1.0)
  test_split: 0.1              # Test ratio (0.0-1.0)
  output_dir: "./output/path"  # Output directory
```
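
One way the split ratios and `max_samples` could be applied is sketched below. It makes three assumptions: records are shuffled with a fixed seed, `max_samples` is applied after shuffling, and sizes are rounded down, so a remainder is left unused when the ratios sum to less than 1.0. `split_records` and the seed are illustrative, not the pipeline's own code.

```python
import random


def split_records(records, train_split=0.8, validation_split=0.1,
                  test_split=0.1, max_samples=None, seed=42):
    """Shuffle records and cut them into train/validation/test lists (hypothetical)."""
    total = train_split + validation_split + test_split
    if total > 1.0 + 1e-9:                    # small tolerance for float rounding
        raise ValueError(f"split ratios sum to {total:.2f}; they must be <= 1.0")
    records = list(records)
    random.Random(seed).shuffle(records)      # reproducible shuffle
    if max_samples is not None:
        records = records[:max_samples]       # cap after shuffling
    n = len(records)
    n_train = int(n * train_split)            # sizes are rounded down
    n_val = int(n * validation_split)
    n_test = int(n * test_split)
    train = records[:n_train]
    validation = records[n_train:n_train + n_val]
    test = records[n_train + n_val:n_train + n_val + n_test]
    return train, validation, test


train_set, val_set, test_set = split_records(range(1000))
```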

### Text Preprocessing

```yaml
data:
  clean_text: true  # Clean/normalize text
  lowercase: true   # Convert to lowercase
  min_length: 10    # Minimum text length
  max_length: 512   # Maximum text length
```
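
The sketch below covers the four preprocessing switches. It assumes lengths are counted in characters, and that `min_length` drops short texts while `max_length` truncates long ones, as the comments in the classification config describe. This reference does not say what `clean_text` normalizes in the real pipeline, so this version only straightens curly quotes and collapses whitespace. `preprocess` is a hypothetical name.

```python
import re

# Curly quotes -> straight quotes (an assumed subset of "normalize quotes").
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def preprocess(text, clean_text=True, lowercase=True, min_length=10, max_length=512):
    """Return the processed text, or None if the sample should be filtered out."""
    if clean_text:
        text = text.translate(QUOTE_MAP)           # normalize curly quotes
        text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    if lowercase:
        text = text.lower()
    if len(text) < min_length:
        return None                                # too short: drop the sample
    return text[:max_length]                       # too long: truncate
```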

### Model & Training

```yaml
model:
  name: "bert-base-uncased"  # Model name
  max_length: 512            # Max sequence length

training:
  num_epochs: 3        # Training epochs
  batch_size: 16       # Batch size
  learning_rate: 2e-5  # Learning rate
```

## Common Configurations by Task

### Classification

```yaml
task:
  name: "classification"
  type: "sequence_classification"

data:
  source: "huggingface"
  dataset_name: "dair-ai/emotion"
  input_field: "text"
  label_field: "label"
  output_format: "classification"
```

### Styling

```yaml
task:
  name: "styling"
  type: "style_transfer"

data:
  source: "custom"
  data_path: "./data.jsonl"
  input_field: "text"
  output_field: "styled_text"
  instruction: "Rewrite in formal style"
  output_format: "alpaca"
```
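
With `output_format: "alpaca"`, each styling pair becomes an instruction-following record. The sketch below assumes the standard Alpaca keys `instruction`, `input`, and `output`. This card does not show the exact layout the pipeline writes, and `to_alpaca` and the sample record are illustrative.

```python
import json


def to_alpaca(record, instruction, input_field="text", output_field="styled_text"):
    """Map one raw styling record to an Alpaca-style dict (hypothetical helper)."""
    return {
        "instruction": instruction,           # same instruction for every record
        "input": record[input_field],         # source text
        "output": record[output_field],       # styled target text
    }


raw = {"text": "gonna be late, sorry", "styled_text": "I apologize; I will be arriving late."}
print(json.dumps(to_alpaca(raw, "Rewrite in formal style")))
```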

### Text Generation

```yaml
task:
  name: "completion"
  type: "text_generation"

data:
  source: "custom"
  data_path: "./prompts.jsonl"
  input_field: "prompt"
  output_field: "completion"
  output_format: "instruction"
```

## Quick Start Templates

### 1. HuggingFace Dataset

```yaml
task:
  name: "classification"
  type: "sequence_classification"

data:
  source: "huggingface"
  dataset_name: "your/dataset"
  input_field: "text"
  label_field: "label"
  max_samples: 1000
  output_dir: "./output"
```

### 2. Custom JSONL File

```yaml
task:
  name: "styling"
  type: "style_transfer"

data:
  source: "custom"
  data_path: "./your_data.jsonl"
  data_format: "jsonl"
  input_field: "source"
  output_field: "target"
  instruction: "Your style instruction"
  output_dir: "./output"
```
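
A JSONL file holds one JSON object per line. The sketch below reads the template's file and yields the `source`/`target` pairs, skipping blank lines and records that lack either field. `load_pairs` is hypothetical, and the pipeline may handle malformed records differently (for example, by failing instead of skipping).

```python
import json
import tempfile
from pathlib import Path


def load_pairs(path, input_field="source", output_field="target", encoding="utf-8"):
    """Yield (input, output) pairs from a JSONL file (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue                       # tolerate blank lines
            record = json.loads(line)          # raises ValueError on invalid JSON
            if input_field in record and output_field in record:
                yield record[input_field], record[output_field]


# Usage: write a small sample file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "your_data.jsonl"
    sample.write_text('{"source": "hey", "target": "Hello."}\n\n{"source": "thx"}\n',
                      encoding="utf-8")
    pairs = list(load_pairs(sample))
```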

### 3. CSV File

```yaml
task:
  name: "classification"
  type: "sequence_classification"

data:
  source: "custom"
  data_path: "./your_data.csv"
  data_format: "csv"
  input_field: "text"
  label_field: "label"
  delimiter: ","
  output_dir: "./output"
```

## Parameter Ranges & Recommendations

### Split Ratios

- **Total must be ≤ 1.0**
- **Common**: train=0.8, val=0.1, test=0.1
- **Small datasets**: train=0.7, val=0.15, test=0.15

### Learning Rates

- **Fine-tuning**: 1e-5 to 5e-5
- **Training from scratch**: 1e-4 to 1e-3
- **Start with**: 2e-5

### Batch Sizes

- **GPU**: 8, 16, 32, 64 (as GPU memory allows)
- **CPU**: 4, 8, 16
- **Start with**: 16

### Text Lengths

Model input limits, in tokens:

- **BERT**: 512 (max)
- **GPT-2**: 1024 (max)
- **T5**: 512 (pretraining length; its relative position embeddings accept longer inputs)
- **Start with**: 256

## Common Issues & Fixes

| Issue | Cause | Fix |
|-------|-------|-----|
| "File not found" | Wrong path | Check `data_path` and `output_dir` |
| "Memory error" | Batch too large | Reduce `batch_size` |
| "Split error" | Ratios sum to > 1.0 | Ensure splits sum to ≤ 1.0 |
| "Poor performance" | Wrong learning rate | Try 1e-5 to 5e-5 range |
| "Slow processing" | Text too long | Reduce `max_length` |

## Environment Variables

```bash
# Set the Hugging Face cache directory
export HF_HOME="./cache"

# Set output directory
export OUTPUT_DIR="./results"

# Set log level
export LOG_LEVEL="INFO"
```
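
Only `HF_HOME` is a standard Hugging Face variable. This card does not document how the scripts consume `OUTPUT_DIR` and `LOG_LEVEL`. The sketch below assumes both are optional overrides read at startup, and that `LOG_LEVEL` names a standard `logging` level. `settings_from_env` is hypothetical.

```python
import logging
import os


def settings_from_env(default_output_dir="./output"):
    """Read optional overrides from the environment (hypothetical helper)."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)      # e.g. logging.INFO == 20
    if not isinstance(level, int):
        raise ValueError(f"unknown LOG_LEVEL {level_name!r}")
    return {
        "output_dir": os.environ.get("OUTPUT_DIR", default_output_dir),
        "log_level": level,
        "hf_home": os.environ.get("HF_HOME"),       # None: library default cache
    }


settings = settings_from_env()
logging.basicConfig(level=settings["log_level"])
```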

@@ -0,0 +1,207 @@

# Configuration Files Documentation

This directory contains YAML configuration files for different machine learning tasks. Each configuration file is organized into logical sections and includes comprehensive documentation for all parameters.

## Configuration Structure

All configuration files follow a consistent structure organized into these main sections:

### 1. Task Configuration

```yaml
task:
  name: "task_type"      # Task type: classification, completion, styling, matching
  type: "specific_type"  # Specific model/task type
```

**Available Task Types:**

- **classification**: Text classification tasks (emotion, sentiment, topic, etc.)
- **completion**: Text generation and completion tasks
- **styling**: Style transfer and text transformation tasks
- **matching**: Semantic matching and similarity tasks

### 2. Data Processing Configuration

```yaml
data:
  # Data Source
  source: "huggingface|custom"   # Where to get data from

  # Data Location
  dataset_name: "dataset/name"   # HuggingFace dataset name (for huggingface source)
  data_path: "./path/to/file"    # Path to custom data file (for custom source)
  data_format: "jsonl|csv|json"  # File format for custom data

  # Field Mapping
  input_field: "text"            # Field containing input text
  output_field: "styled_text"    # Field containing output (for styling)
  label_field: "label"           # Field containing labels (for classification)
  id_field: "id"                 # Optional ID field for tracking

  # Processing Parameters
  max_samples: 1000              # Maximum samples to process
  train_split: 0.8               # Training split ratio
  validation_split: 0.1          # Validation split ratio
  test_split: 0.1                # Test split ratio

  # Text Preprocessing
  clean_text: true               # Clean and normalize text
  remove_special_chars: false    # Remove special characters
  lowercase: true                # Convert to lowercase
  min_length: 10                 # Minimum text length
  max_length: 1000               # Maximum text length

  # Output Configuration
  output_format: "format_type"   # Output format
  output_dir: "./output/path"    # Output directory
```

**Data Source Types:**

- **huggingface**: Use datasets from HuggingFace Hub
- **custom**: Use local files (JSONL, CSV, JSON)

**Output Formats:**

- **classification**: Raw classification format
- **instruction**: Instruction-following format
- **conversation**: Conversational format
- **qa**: Question-answer format
- **styling**: Raw styling format
- **alpaca**: Alpaca instruction format

### 3. Model Configuration

```yaml
model:
  name: "model_name"  # Model from HuggingFace Hub
  max_length: 512     # Maximum sequence length
  num_labels: 6       # Number of labels (for classification)
```

**Recommended Models by Task:**

- **Classification**: `bert-base-uncased`, `distilbert-base-uncased`
- **Styling**: `t5-base`, `gpt2-medium`
- **Completion**: `gpt2-medium`, `gpt2-large`
- **Matching**: `sentence-transformers/all-MiniLM-L6-v2`

### 4. Training Configuration

```yaml
training:
  num_epochs: 3                 # Number of training epochs
  batch_size: 16                # Training batch size
  learning_rate: 2e-5           # Learning rate
  weight_decay: 0.01            # Weight decay
  lr_scheduler_type: "linear"   # Learning rate scheduler
  warmup_ratio: 0.1             # Fraction of total steps spent warming up the learning rate
  data_dir: "./data/path"       # Training data directory
  output_dir: "./model/output"  # Model output directory
```

**Learning Rate Guidelines:**

- **Fine-tuning**: 1e-5 to 5e-5
- **Training from scratch**: 1e-4 to 1e-3

**Scheduler Types:**

- **linear**: Linear decay
- **cosine**: Cosine annealing
- **polynomial**: Polynomial decay
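
The sketch below shows how `warmup_ratio` combines with the scheduler type by computing the learning rate at a given step. The rate ramps linearly from 0 to `learning_rate` over the first `warmup_ratio` of steps, then decays to 0 with a linear or cosine shape. These are the common warmup-then-decay shapes. The pipeline's actual scheduler comes from its training code, and `lr_at_step` is hypothetical. Polynomial decay is left out of the sketch.

```python
import math


def lr_at_step(step, total_steps, learning_rate=2e-5, warmup_ratio=0.1,
               lr_scheduler_type="linear"):
    """Learning rate for a 0-based optimizer step (hypothetical helper)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / warmup_steps       # linear warmup from 0
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    if lr_scheduler_type == "linear":
        return learning_rate * (1.0 - progress)
    if lr_scheduler_type == "cosine":
        return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"{lr_scheduler_type!r} is not covered by this sketch")


schedule = [lr_at_step(s, 1000) for s in range(1001)]
```

With 1,000 total steps and `warmup_ratio: 0.1`, the rate peaks at step 100 and reaches 0 at step 1,000.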

### 5. Inference Configuration

```yaml
inference:
  model_path: "./model/path"  # Path to saved model
  device: "auto"              # Device to use
  batch_size: 32              # Inference batch size
  return_probabilities: true  # Return probabilities
  return_top_k: 3             # Return top K predictions
  max_new_tokens: 128         # Max tokens to generate
  temperature: 0.8            # Sampling temperature
```
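
For a classifier, `return_probabilities` and `return_top_k` can be read as post-processing on the model's logits: a softmax over all classes, then the `k` most likely labels. The sketch below uses only the standard library. `postprocess` is hypothetical, and the six label names assume the dair-ai/emotion label order (sadness, joy, love, anger, fear, surprise).

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, labels, return_probabilities=True, return_top_k=3):
    """Hypothetical post-processing for one text's classifier logits."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    result = {"label": ranked[0][0], "top_k": ranked[:return_top_k]}
    if return_probabilities:
        result["probabilities"] = dict(zip(labels, probs))   # every class, not just the top one
    return result


labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
out = postprocess([0.1, 2.5, 1.0, -0.3, 0.0, 0.4], labels)
```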

**Device Options:**

- **auto**: Automatically detect the best available device
- **cuda**: Use the GPU (requires a CUDA-capable device)
- **cpu**: Force CPU usage

**Temperature Guidelines:**

- **0.0**: Deterministic (greedy decoding; always the same output)
- **0.7-0.9**: Balanced creativity
- **1.0+**: More random/creative
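
Temperature divides the logits before the softmax. Values below 1.0 sharpen the distribution toward the top token, and values above 1.0 flatten it. Dividing by 0.0 is undefined, so 0.0 is conventionally handled as greedy decoding. The stdlib sketch below shows one sampling step; it is not the pipeline's generation code, and `sample_token` is hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.8, rng=random):
    """Pick a token index from logits at the given temperature (hypothetical helper)."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy: highest logit
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]   # unnormalized softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```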

## Task-Specific Parameters

### Classification Tasks

```yaml
data:
  label_encoding: "auto|numeric|string"  # How to encode labels
  multilabel: false                      # Multi-label vs single-label
  label_separator: ","                   # Separator for multi-label
```
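
The sketch below shows one way the three label options could interact. It assumes `"auto"` keeps integer labels unchanged and maps string labels to ids in sorted order. It also assumes multi-label values arrive as strings joined by `label_separator`. The pipeline's actual rules are not documented here, and `encode_labels` is hypothetical.

```python
def encode_labels(raw_labels, multilabel=False, label_separator=","):
    """Return (encoded labels, label -> id mapping); hypothetical helper."""
    if multilabel:
        # "joy,love" -> ["joy", "love"]
        parsed = [[part.strip() for part in str(raw).split(label_separator) if part.strip()]
                  for raw in raw_labels]
    else:
        parsed = [[raw] for raw in raw_labels]
    names = sorted({label for labels in parsed for label in labels}, key=str)
    if all(isinstance(label, int) for label in names):
        mapping = {label: label for label in names}          # already numeric: keep as-is
    else:
        mapping = {label: idx for idx, label in enumerate(names)}
    encoded = [[mapping[label] for label in labels] for labels in parsed]
    if not multilabel:
        encoded = [labels[0] for labels in encoded]          # one label per text
    return encoded, mapping
```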

### Styling Tasks

```yaml
data:
  instruction: "Style instruction text"  # The style instruction
```

### Completion Tasks

```yaml
data:
  prompt_template: "template"  # Prompt template
  completion_length: 100       # Target completion length
```
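
This reference does not specify the placeholder syntax of `prompt_template`. The sketch below assumes Python `str.format` fields with a hypothetical `{input}` placeholder and shows how a template could be filled from a record. `build_prompt` is likewise hypothetical.

```python
def build_prompt(prompt_template, record, input_field="prompt"):
    """Fill a str.format-style template with the record's input text (hypothetical)."""
    # Only the template is parsed for {fields}; braces inside the record's text are left alone.
    return prompt_template.format(input=record[input_field])


prompt = build_prompt("Continue the story:\n{input}\n", {"prompt": "The door creaked open"})
```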

## Advanced Configuration

### HuggingFace Specific

```yaml
data:
  hf_split: "train"         # Dataset split to use
  hf_cache_dir: "./cache"   # Cache directory
  test_split_from: "train"  # Source for test split
  val_split_from: "train"   # Source for validation split
```
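
The classification config lists `"train"`, `"use_test_if_available"`, and `"use_val_if_available"` as values for the `*_split_from` keys. The sketch below assumes one plausible reading: reuse the dataset's own split when it exists, and otherwise carve the slice out of `hf_split`. `resolve_split_source` is hypothetical.

```python
# Hypothetical mapping from option value to the dataset split it prefers.
PREFERRED_SPLIT = {
    "use_test_if_available": "test",
    "use_val_if_available": "validation",
}


def resolve_split_source(option, available_splits, hf_split="train"):
    """Name the dataset split a test or validation set would be drawn from."""
    if option == "train":
        return hf_split                          # carve it out of the base split
    if option not in PREFERRED_SPLIT:
        raise ValueError(f"unknown split option {option!r}")
    preferred = PREFERRED_SPLIT[option]
    return preferred if preferred in available_splits else hf_split   # fall back
```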

### Custom Data Specific

```yaml
data:
  encoding: "utf-8"  # File encoding
  delimiter: ","     # CSV delimiter
```
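
`encoding` and `delimiter` map directly onto Python's `open()` and `csv` module. The sketch below reads a CSV with a header row and checks that the configured columns exist. `read_csv_rows` is hypothetical. The semicolon-delimited sample shows why a non-comma delimiter helps when the text itself contains commas.

```python
import csv
import os
import tempfile


def read_csv_rows(path, input_field="text", label_field="label",
                  encoding="utf-8", delimiter=","):
    """Yield (text, label) pairs from a CSV file with a header row (hypothetical)."""
    with open(path, encoding=encoding, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        missing = {input_field, label_field} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"CSV is missing column(s): {sorted(missing)}")
        for row in reader:
            yield row[input_field], row[label_field]


# Usage: a semicolon-delimited file whose text contains a comma.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "your_data.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("text;label\nCafé au lait, s'il vous plaît;joy\n")
    rows = list(read_csv_rows(path, delimiter=";"))
```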

## Usage Examples

### Basic Usage

```bash
# Use YAML configuration
python scripts/task_type/data_processor.py --config configs/task_type/config.yaml

# Override specific parameters
python scripts/task_type/data_processor.py \
  --config configs/task_type/config.yaml \
  --max-samples 1000 \
  --learning-rate 3e-5
```
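
The override flags suggest that command-line values take precedence over the YAML. The sketch below shows that layering. It assumes the YAML is already parsed into a dict and that each flag maps to one `section.key`. The `OVERRIDES` table and helper names are hypothetical.

```python
import argparse
import copy

# Hypothetical mapping from argparse destination to (section, key) in the config.
OVERRIDES = {
    "max_samples": ("data", "max_samples"),
    "learning_rate": ("training", "learning_rate"),
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Process data from a YAML config")
    parser.add_argument("--config", required=True, help="Path to the YAML config file")
    parser.add_argument("--max-samples", type=int, default=None)      # dest: max_samples
    parser.add_argument("--learning-rate", type=float, default=None)  # dest: learning_rate
    return parser.parse_args(argv)


def apply_overrides(config, args):
    """Return a copy of config with every flag that was actually passed applied on top."""
    merged = copy.deepcopy(config)
    for dest, (section, key) in OVERRIDES.items():
        value = getattr(args, dest)
        if value is not None:
            merged.setdefault(section, {})[key] = value
    return merged


config = {"data": {"max_samples": 5000}, "training": {"learning_rate": 2e-5}}
args = parse_args(["--config", "configs/task_type/config.yaml",
                   "--max-samples", "1000", "--learning-rate", "3e-5"])
merged = apply_overrides(config, args)
```

A related pitfall applies if the scripts parse YAML with PyYAML (`yaml.safe_load`). A value written as `2e-5` loads as the string `"2e-5"`, because PyYAML follows YAML 1.1 float syntax, which requires a decimal point and a signed exponent (e.g. `2.0e-5`). Casting numeric fields with `float()` after loading avoids this.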

### Creating Custom Configurations

1. Copy an existing config file
2. Modify parameters for your specific use case
3. Update paths and model names
4. Test with a small dataset first

## Best Practices

1. **Start with Defaults**: Use default values and adjust based on results
2. **Validate Paths**: Ensure all file paths are correct and accessible
3. **Monitor Resources**: Adjust batch sizes based on available GPU memory
4. **Test Incrementally**: Test with small datasets before full processing
5. **Version Control**: Keep configurations in version control for reproducibility

## Troubleshooting

### Common Issues:

- **File Not Found**: Check `data_path` and `output_dir` paths
- **Memory Errors**: Reduce `batch_size` or `max_length`
- **Poor Performance**: Adjust `learning_rate` or `num_epochs`
- **Split Errors**: Ensure split ratios sum to ≤ 1.0

### Getting Help:

- Check the script help: `python script.py --help`
- Review the pipeline logs for detailed error messages
- Verify YAML syntax and parameter values

```diff
@@ -1,6 +1,6 @@
 # Comprehensive Classification Configuration
 # This file defines all parameters for emotion classification using the dair-ai/emotion dataset
-# Organized by level: data processing, model, training, and inference
+# Organized by level: task, data processing, model, training, and inference
 
 # Task Configuration
 task:
@@ -15,9 +15,9 @@ data:
   data_format: "jsonl"  # Data format: "jsonl", "csv", "json" (for custom data)
 
   # Field Mapping
-  input_field: "text"  # Field name containing input text
-  label_field: "label"  # Field name containing labels
-  id_field: null  # Optional ID field name
+  input_field: "text"  # Field name containing input text to be classified
+  label_field: "label"  # Field name containing classification labels
+  id_field: null  # Optional ID field name for tracking individual samples
 
   # Processing Parameters
   max_samples: 1000  # Maximum samples to process (null for all samples)
@@ -26,54 +26,54 @@ data:
   test_split: 0.1  # Test split ratio (0.0 to 1.0)
 
   # Text Preprocessing
-  clean_text: true  # Clean and normalize text
-  remove_special_chars: false  # Remove special characters from text
-  lowercase: true  # Convert text to lowercase
+  clean_text: true  # Clean and normalize text (remove extra spaces, normalize quotes, etc.)
+  remove_special_chars: false  # Remove special characters from text (keep for emotion analysis)
+  lowercase: true  # Convert text to lowercase (standard for BERT models)
   min_length: 10  # Minimum text length (filter out shorter texts)
   max_length: 1000  # Maximum text length (truncate longer texts)
 
   # Label Processing
   label_encoding: "auto"  # Label encoding: "auto", "numeric", "string"
-  multilabel: false  # Enable multilabel classification
-  label_separator: ","  # Separator for multilabel datasets
+  multilabel: false  # Enable multilabel classification (false for single emotion per text)
+  label_separator: ","  # Separator for multilabel datasets (comma-separated labels)
 
   # Output Configuration
   output_format: "classification"  # Output format: "classification", "instruction", "conversation", "qa"
-  output_dir: "./data/processed/classification/emotion"  # Specific output directory for this dataset
+  output_dir: "./data/processed/classification/emotion"  # Output directory for processed data and splits
 
   # HuggingFace Specific
-  hf_split: "train"  # HuggingFace dataset split to use
-  hf_cache_dir: null  # HuggingFace cache directory (null for default)
+  hf_split: "train"  # HuggingFace dataset split to use as base
+  hf_cache_dir: null  # HuggingFace cache directory (null for default ~/.cache/huggingface)
 
   # Split Configuration (Advanced)
   test_split_from: "train"  # Source for test split: "train", "use_test_if_available", "use_val_if_available"
   val_split_from: "train"  # Source for validation split: "train", "use_val_if_available"
 
   # Custom Data Specific
-  encoding: "utf-8"  # File encoding for custom data
-  delimiter: ","  # Delimiter for CSV files
+  encoding: "utf-8"  # File encoding for custom data files
+  delimiter: ","  # Delimiter for CSV files (comma for standard CSV)
 
 # Model Configuration
 model:
-  name: "bert-base-uncased"  # Model name from HuggingFace Hub
-  max_length: 512  # Maximum sequence length for tokenization
-  num_labels: 6  # Number of classification labels
+  name: "bert-base-uncased"  # Model name from HuggingFace Hub (good for text classification)
+  max_length: 512  # Maximum sequence length for tokenization (BERT limit)
+  num_labels: 6  # Number of classification labels (emotion categories)
 
 # Training Configuration
 training:
-  num_epochs: 3  # Number of training epochs
-  batch_size: 16  # Training batch size
-  learning_rate: 2e-5  # Learning rate (typical range: 1e-5 to 5e-5)
-  weight_decay: 0.01  # Weight decay for optimizer (typical range: 0.01 to 0.1)
+  num_epochs: 3  # Number of training epochs (adjust based on dataset size)
+  batch_size: 16  # Training batch size (adjust based on GPU memory)
+  learning_rate: 2e-5  # Learning rate (typical range: 1e-5 to 5e-5 for fine-tuning)
+  weight_decay: 0.01  # Weight decay for optimizer (prevents overfitting)
   lr_scheduler_type: "linear"  # Scheduler type: "linear", "cosine", "polynomial"
   warmup_ratio: 0.1  # Warmup ratio for scheduler (0.0 to 1.0)
   data_dir: "./data/processed/classification/emotion"  # Directory containing train/validation/test JSONL files
-  output_dir: "./results/classification/emotion_model"  # Output directory for saved model
+  output_dir: "./results/classification/emotion_model"  # Output directory for saved model and checkpoints
 
 # Inference Configuration
 inference:
   model_path: "./results/classification/emotion_model"  # Path to saved model directory
-  device: "auto"  # Device: "auto", "cuda", "cpu"
-  batch_size: 32  # Batch size for inference
-  return_probabilities: true  # Return all class probabilities
-  return_top_k: 3  # Return top K predictions
+  device: "auto"  # Device: "auto", "cuda", "cpu" (auto detects best available)
+  batch_size: 32  # Batch size for inference (can be larger than training)
+  return_probabilities: true  # Return all class probabilities (not just top prediction)
+  return_top_k: 3  # Return top K predictions (useful for confidence analysis)
```

+60 -20

```diff
@@ -1,29 +1,69 @@
+# Comprehensive Styling Configuration
+# This file defines all parameters for formal style transfer tasks
+# Organized by level: task, data processing, model, training, and inference
+
+# Task Configuration
 task:
-  name: "styling"
-  type: "style_transfer"
+  name: "styling"  # Task type: classification, completion, styling, matching
+  type: "style_transfer"  # Model type: style_transfer, text_generation, etc.
 
+# Data Processing Configuration
 data:
-  source: "custom"
-  input_field: "text"
-  style_field: "style"
-  max_length: 256
-  train_split: 0.8
-  validation_split: 0.1
-  test_split: 0.1
+  source: "custom"  # Data source: "huggingface" or "custom"
+  data_path: "./data/raw/styling/sample_formal.jsonl"  # Path to custom data file (required for custom source)
+  dataset_name: null  # HuggingFace dataset name (required for huggingface source)
+
+  # Field Mapping
+  input_field: "text"  # Field name containing source text to be styled
+  output_field: "styled_text"  # Field name containing the styled/transformed text
+
+  # Style Instruction
+  instruction: "Rewrite the following text in a formal style"  # The style instruction that guides the transformation
+
+  # Data Format & Processing
+  data_format: "jsonl"  # Data format: "jsonl", "csv", "json" (for custom data)
+  max_length: 256  # Maximum text length (truncate longer texts)
+  min_length: 10  # Minimum text length (filter out shorter texts)
+
+  # Text Preprocessing
+  clean_text: true  # Clean and normalize text (remove extra spaces, normalize quotes, etc.)
+  lowercase: false  # Convert text to lowercase (false for formal style to preserve case)
+
+  # Data Splitting
+  train_split: 0.8  # Training split ratio (0.0 to 1.0)
+  validation_split: 0.1  # Validation split ratio (0.0 to 1.0)
+  test_split: 0.1  # Test split ratio (0.0 to 1.0)
+
+  # Output Configuration
+  output_format: "alpaca"  # Output format: "styling" (raw), "alpaca" (instruction format)
+  output_dir: "./data/processed/styling/formal"  # Output directory for processed data and HuggingFace datasets
 
+# Model Configuration
 model:
-  name: "t5-base"
-  max_length: 256
+  name: "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # Model name from HuggingFace Hub
+  max_length: 2048  # Maximum sequence length for tokenization
+  max_seq_length: 2048  # Maximum sequence length for training (RoPE scaling supported)
+  dtype: null  # Data type: null for auto detection, float16 for Tesla T4/V100, bfloat16 for Ampere+
+  load_in_4bit: true  # Use 4bit quantization to reduce memory usage
+  token: null  # HuggingFace token for gated models (e.g., "hf_...")
+
+  # Training Model Parameters
+  training_model: "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # Model to use for training
+  training_max_seq_length: 2048  # Max sequence length for training
+  training_dtype: null  # Data type for training
+  training_load_in_4bit: true  # 4bit quantization for training
 
+# Training Configuration
 training:
-  num_epochs: 3
-  batch_size: 16
-  learning_rate: 3e-5
-  weight_decay: 0.01
-  warmup_ratio: 0.1
-  lr_scheduler_type: "linear"
+  num_epochs: 3  # Number of training epochs
+  batch_size: 16  # Training batch size (adjust based on GPU memory)
+  learning_rate: 3e-5  # Learning rate (typical range: 1e-5 to 5e-5 for fine-tuning)
+  weight_decay: 0.01  # Weight decay for optimizer (prevents overfitting)
+  warmup_ratio: 0.1  # Warmup ratio for scheduler (0.0 to 1.0)
+  lr_scheduler_type: "linear"  # Scheduler type: "linear", "cosine", "polynomial"
 
+# Inference Configuration
 inference:
-  batch_size: 32
-  max_new_tokens: 128
-  temperature: 0.8
+  batch_size: 32  # Batch size for inference (can be larger than training)
+  max_new_tokens: 128  # Maximum new tokens to generate during inference
+  temperature: 0.8  # Sampling temperature (0.0 = deterministic, 1.0 = random)
```