DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothPRMTrainer.cpython-310.pyc
237 lines
26 KiB
2025-08-28 17:57:59 +00:00
o
=—°hAŸã@dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZm
Z
mZmZmZmZmZmZmZmZmZm Z m Z m!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(mZm)Z)m*Z*m+Z+mZm,Z,m
Z
mZmZmZm'Z'm)Z)mZddl)Z)ddlTddl-m.Z.m/Z/dd l0m1Z1ddlZddl2Z3dd
l4m5Z5ddlmZdd l6m7Z7m8Z9d d
d d
d
dœZ:ej;d d e:dddƒZ<e.GdddeƒƒZ= GdddeƒZ>Gddde>ƒZ?dS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)(ÚBaseImageProcessorr Ú DataCollatorÚ"DataCollatorForTokenClassificationÚDatasetÚEvalPredictionÚFeatureExtractionMixinrÚ PRMConfigÚ
PRMTrainerÚ PartialStateÚPathÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚTrainerÚTrainerCallbackrÚchainÚcompute_accuracyÚdisable_dropout_in_modelÚfeaturesÚgenerate_model_cardÚinspectÚis_peft_availableÚis_wandb_availableÚnnÚosÚprepare_model_for_kbit_trainingÚtextwrapÚtorchÚwarningsrrrrr"r%r()Ú*)Ú dataclassÚfield)ÚVersion)Ú nullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionsc
Ctj| d|jd¡ddd}tj| d¡ddd}g}t||ƒD](\}}| tj¡}tj|d| d¡d  d¡}tj
|dd}||} |  | ¡q! t  |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)r:Úindex)r:é)
r(ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
Úlogitsr;Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rRúQ/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothPRMTrainer.pyÚchunked_selective_log_softmax"s  
rTceZdZUdZedddidZeeed<edddidZ ee
ed <eddd
idZ ee
ed <  
                            ! ! " #     $           $      % &  '         (      #    $   ) *       + ,   $    d/‡fd-d.„ Z Z
S)0ÚUnslothPRMConfiga:
Configuration class for the [`PRMTrainer`].
This class includes only the parameters that are specific to PRM training. For a full list of training arguments,
please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
differ from those in [`~transformers.TrainingArguments`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
Parameters:
max_length (`int` or `None`, *optional*, defaults to `1024`):
Maximum length of the sequences (prompt + completion) used for truncation.
max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
Maximum length of the prompt used for truncation.
max_completion_length (`int` or `None`, *optional*, defaults to `None`):
Maximum length of the completion used for truncation. The completion is the concatenation of the steps.
disable_dropout (`bool`, *optional*, defaults to `True`):
Whether to disable dropout in the model.
step_separator (`str`, *optional*, defaults to `"\n"`):
Separator used to separate each step of the reasoning process.
train_on_last_step_only (`bool`, *optional*, defaults to `False`):
Whether to train only on the last step.
dataset_num_proc (`int`, *optional*, defaults to `None`):
Number of processes to use for processing the dataset.
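The three length parameters above interact, so a small sketch may help. It assumes the prompt keeps its trailing tokens, the completion keeps its leading tokens, and `max_length` is applied last to the concatenation, as upstream TRL's `tokenize_row` is understood to do. The helper name `truncate_ids` is hypothetical and is not part of TRL or Unsloth.

```python
from typing import List, Optional


def truncate_ids(
    prompt_ids: List[int],
    completion_ids: List[int],
    max_length: Optional[int] = None,
    max_prompt_length: Optional[int] = None,
    max_completion_length: Optional[int] = None,
) -> List[int]:
    """Hypothetical helper illustrating the assumed truncation order."""
    # Assumption: the prompt is cut from the left, keeping its last tokens.
    if max_prompt_length is not None:
        prompt_ids = prompt_ids[-max_prompt_length:]
    # Assumption: the completion is cut from the right, keeping its first tokens.
    if max_completion_length is not None:
        completion_ids = completion_ids[:max_completion_length]
    input_ids = prompt_ids + completion_ids
    # The overall limit is applied to the concatenated sequence.
    if max_length is not None:
        input_ids = input_ids[:max_length]
    return input_ids


truncated = truncate_ids(
    [1, 2, 3, 4], [5, 6, 7, 8],
    max_length=4, max_prompt_length=2, max_completion_length=3,
)
```

With these toy values, the prompt is reduced to its last two tokens before the completion is appended, and the final `max_length` cut then trims the completion further.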
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr7z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksz'Maximum sequence length to truncate to.Úmax_seq_lengthFÚnor8éréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsr<éôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéééc |dkr td|dƒ|dkrtd|dƒ|dur(|#dkr(|$dkr(d}d }#|‡dur:d
d lmt|Œƒd d
ƒ}‡tƒjd•id|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,|d-| “d.|!“d/|"“d0|#“d1|$“d2|%“d3|&“d4|'“d5|(“d6|)“d7|*“d8|+“d9|,“d:|-“d;|.“d<|/“d=|0“d>|1“d?|2“d@|3“dA|4“dB|5“dC|6“dD|7“dE|8“dF|9“dG|:“dH|;“dI|<“dJ|=“dK|>“dL|?“dM|@“dN|A“dO|B“dP|C“dQ|D“dR|E“dS|F“dT|G“dU|H“dV|I“dW|J“dX|K“dY|L“dZ|M“d[|N“d\|O“d]|P“d^|Q“d_|R“d`|S“da|T“db|U“dc|V“dd|W“de|X“df|Y“dg|Z“dh|[“di|\“dj|]“dk|^“dl|_“dm|`“dn|a“do|b“dp|c“dq|d“dr|e“ds|f“dt|g“du|h“dv|i“dw|j“dx|k“dy|l“dz|m“d{|n“d||o“d}|p“d~|q“d|r“d€|s“d|t“d|u“dƒ|v“d„|w“d…|x“d†|y“d‡|z“dˆ|{“d‰||“dŠ|}“d|~“dŒ|d|€“dŽ|d|‚“d|ƒ“d‘|„“d’|…“d“|†“d”|‡“|‹¤Ž|ˆ|_|‰|_|Š|_ dS)NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!r<za` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rjrkÚunsloth_training_checkpointsr\r)Ú cpu_countr8r]Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚdisable_dropoutÚstep_separatorÚtrain_on_last_step_onlyÚdataset_num_procrR)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingryÚmaxÚsuperÚ__init__rYrZr[)Úselfrzr{r|r}r~rr€rrr„r…r†r‡r‰rrrrrrr“r”r•r–r—r™rrr r­r¿rÿrrYrZr[Úkwargsry©Ú __class__rRrSr^sX  ÿþýüûúùø ÷
ö õ ô
óòñðïîíìëêéèçæåäãâá à!ß"Þ#Ý$Ü%Û&Ú'Ù(Ø)×*Ö+Õ,Ô-Ó.Ò/Ñ0Ð1Ï2Î3Í4Ì5Ë6Ê7É8È9Ç:Æ;Å<Ä=Ã>Â?Á@ÀA¿B¾C½D¼E»FºG¹H¸I·JKµL´M³N²O±P°Q¯R®S­T¬U«VªW©X¨Y§Z¦[¥\¤]£^¢_¡` aŸbžcdœefšgh˜ijklmnopqrŽstŒuvŠwxˆyz{|}ƒ~ÿþýüûúùø 
zUnslothPRMConfig.__init__)ŠNNFFFr\Fr8r8NNr]r]rr^r_r`rarbrcrdrer7rfrgrrhriTNrjFr<FrjrkNTFFFFFFrlrlFFFFrmrnFFNr7NNFroFNrNr7NNTNFNNFrorNNNNrprqNFFrrNNNNTFTFFNNrsNNFNFNFTrnNNNroTFNrtruFNNFFNNFFFNFTrvrwNTroFNNr7N)Ú__name__Ú
__module__Ú __qualname__Ú__doc__r,rYrrÚ__annotations__rZÚintr[rÚ
__classcell__rRrRr rSrU3s4
þþþórUcsLeZdZdZddgZ            ddeeeej fdee
dee d ee d
eee e
ee ffd eeeeeefd eegefd
eeege
fdeeedeejjejjjfdeeejejgejfdee
ffdd
ZeddƒZ ‡fddZ!   ddeedeedeeeedffddZ"‡Z#S)Ú_UnslothPRMTrainerroÚtrlÚprmN©NNÚmodelÚargsÚ
data_collatorÚ
train_datasetÚ eval_datasetÚprocessing_classÚ
model_initÚcompute_metricsÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚ peft_configc
s(tƒs | dur tdƒtƒrU| durUt|tƒsUt|ddƒs#t|ddƒrSdtt t¡j ƒv}
d|j
i}|
s?|j dur?t  
d¡n |
rK|j durK|j |d<t|fi|¤Ž}|}|jr\t|ƒ|durbt}|duru|durntdƒt||jd }d
|jvrñtƒ ¡j||j|j|j|j|jd œ}i|¥d di¥}|j|j||j|jd
t t t  d¡¡t t  d¡¡dœ¡d}i|¥d di¥}|durâ|j|j||j|jdt t t  d¡¡t t  d¡¡dœ¡d}Wdƒn1sìwYt!ƒj"||||||||| |
| d t#|j$dƒr|j$ %|j&¡dSdS)NzvPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it to use the PEFT modelsÚis_loaded_in_8bitFÚ is_quantizedràÚuse_gradient_checkpointingzÂYou passed `gradient_checkpointing_kwargs` in the trainer's kwargs, but your peft version does not support it. please update to the latest version of peft to use `gradient_checkpointing_kwargs`.z^A processing_class must be specified when using the default DataCollatorForTokenClassification)Ú input_ids)Ú tokenizerrþrÿÚis_evalzTokenizing train datasetÚint64)Úlabelsr%)Ú fn_kwargsÚnum_procÚremove_columnsÚdescrTzTokenizing eval dataset) rrrrrrrrrrr Úadd_model_tags)'r"Ú
ValueErrorÚ
isinstancerÚgetattrÚlistr!Ú signaturer&Ú
parametersrßr)ÚwarnrýrrrÚ column_namesrÚmain_process_firstrþrÿÚmapÚ tokenize_rowrrÚFeaturesÚSequenceÚValuerrÚhasattrrr.Ú
_tag_names)rrrrrrrrrrrr r!Ú_supports_gc_kwargsÚprepare_model_kwargsr*Útrain_fn_kwargsÚeval_fn_kwargsr rRrSrˆÿ

ÿ
ÿ
ÿ
 úþÿúþÿúæ(õÿz_UnslothPRMTrainer.__init__c
sJˆ|dddd}fdd|dDƒ} |r.|s.dgt|d ƒd
t|d d ƒg}
n d d|d Dƒ}
ˆj|ddfd
d| Dƒ} ddt| |
ƒDƒ}
tt| Žƒ} tt|
Žƒ}
ˆjdurhˆjg|}|durs|| d}|durƒ| d|} |
d|}
|| } dgt|ƒ|
}
|dur | d|} |
d|}
| |
dœS)a/
Tokenize a row of the dataset.
Args:
features (`dict[str, str]`):
Row of the dataset, should contain the keys `"prompt"`, `"completions"`, and `"labels"`.
tokenizer (`PreTrainedTokenizerBase`):
Tokenizer used to process the data.
step_separator (`str`):
Separator between steps in the completion.
max_length (`int` or `None`):
Maximum length of the sequences (prompt + completion). If `None`, the sequences are not truncated.
max_prompt_length (`int` or `None`):
Maximum length of the prompt. If `None`, the prompt is not truncated.
max_completion_length (`int` or `None`):
Maximum length of the completion sequences. If `None`, the completion sequences are not truncated.
train_on_last_step_only (`bool`):
Whether to train only on the last step. If `True`, the labels are `-100` for all tokens except the last
token of the completion.
is_eval (`bool`):
Whether the function is used to tokenize samples from a training or an evaluation dataset. Used only if
`train_on_last_step_only` is set to `True`.
Returns:
`dict[str, list[int]]`:
Tokenized sequences with the keys `"input_ids"` and `"labels"`.
Example:
```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
>>> features = {
... "prompt": "Which number is larger, 9.8 or 9.11?",
... "completions": ["11 is greater than 8.", "Hence, 9.11 > 9.8."],
... "labels": [True, False],
... }
>>> PRMTrainer.tokenize_row(
... features, tokenizer, "\n", max_completion_length=None, train_on_last_step_only=False, is_eval=False
... )
{'input_ids': [23085, 1372, 374, 8131, 11, 220, 24, 13, 23, 476, 220, 24, 13, 16, 16, 30, 16, 16, 374, 7046, 1091, 220, 23, 13, 198, 39, 763, 11, 220, 24, 13, 16, 16, 861, 220, 24, 13, 23, 13, 198],
'labels': [-100, -100, -100, -100, -100, -100, -100, -100, 1, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 0]}
```
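The labelling scheme described in this docstring can be sketched without a tokenizer by working on token IDs directly. The sketch below is a simplified illustration, not the trainer's actual code: `build_step_labels` is a hypothetical name, the token IDs are made up, and truncation and BOS handling are left out.

```python
from itertools import chain
from typing import Dict, List

IGNORE_INDEX = -100  # label value marking tokens that carry no step label


def build_step_labels(
    prompt_ids: List[int],
    completions_ids: List[List[int]],
    step_labels: List[bool],
    separator_ids: List[int],
    train_on_last_step_only: bool = False,
    is_eval: bool = False,
) -> Dict[str, List[int]]:
    """Hypothetical helper: one label per step, placed on the step's last token."""
    if train_on_last_step_only and not is_eval:
        # Only the final step keeps its label during training.
        labels = [IGNORE_INDEX] * (len(step_labels) - 1) + [int(step_labels[-1])]
    else:
        labels = [int(label) for label in step_labels]
    # Append the separator to every step; the label sits on that last token.
    steps = [ids + separator_ids for ids in completions_ids]
    per_step = [[IGNORE_INDEX] * (len(ids) - 1) + [label] for ids, label in zip(steps, labels)]
    completion_ids = list(chain(*steps))
    completion_labels = list(chain(*per_step))
    return {
        "input_ids": prompt_ids + completion_ids,
        # Prompt tokens never carry a label.
        "labels": [IGNORE_INDEX] * len(prompt_ids) + completion_labels,
    }


row = build_step_labels([1, 2], [[10, 11], [20]], [True, False], separator_ids=[99])
last_only = build_step_labels(
    [1, 2], [[10, 11], [20]], [True, False], separator_ids=[99], train_on_last_step_only=True
)
```

In this sketch `input_ids` and `labels` always have the same length, and each step contributes exactly one non-ignored label, located on its separator token.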
ÚpromptF©Úadd_special_tokensr%csg|]
}ˆ|dddqS)FrDr%rR©Ú.0Ú
completion)r&rRrSÚ
<listcomp>7sÿz3_UnslothPRMTrainer.tokenize_row.<locals>.<listcomp>Ú completionséœÿÿÿr)r<r7cSsg|]}t|ƒqSrR)r)rGÚlabelrRrRrSrI=ócsg|]}|ˆqSrRrRrF)Ú
separator_idsrRrSrIArMcSs(g|]\}}dgt|ƒd|gqS)rKr<)Úlen)rGrHrLrRrRrSrIDs(N)r%r))rOrÚencoder@r2rÚ bos_token_id)
rr&rÿr'Ú
prompt_idsÚcompletions_idsr)Úcompletion_idsr%rR)rNr&rSr9þs28
ÿ*  
     
z_UnslothPRMTrainer.tokenize_rowcsL|jjdurt|jjƒj}n |jj d¡d}|j|dtƒ ||¡dS)/r7)Ú
model_name) rrrzÚnameÚsplitÚcreate_model_cardrÚ_save_checkpoint)rrÚtrialrVr rRrSrZ^s
 z#_UnslothPRMTrainer._save_checkpointrVÚ dataset_nameÚtagsc C| ¡sdSt|jjdƒrtj |jjj¡s|jjj}nd}|dur&tƒ}n
t |t
ƒr/|h}nt|ƒ}t|jjdƒr?|  d¡|  |j
¡t d¡}t|||j||tƒr]tjdur]tjjndd|dd }| tj |jjd ¡¡dS)

Creates a draft of a model card using the information available to the `Trainer`.
Args:
model_name (`str` or `None`, *optional*, defaults to `None`):
Name of the model.
dataset_name (`str` or `None`, *optional*, defaults to `None`):
Name of the dataset used for training.
tags (`str`, `list[str]` or `None`, *optional*, defaults to `None`):
Tags to be associated with the model card.
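Because `tags` accepts `None`, a single string, or a list of strings, the value has to be normalised before the trainer's own tags are merged in. A minimal sketch of that normalisation follows. The function name `normalize_tags` and the `is_unsloth_model` flag are hypothetical; the default tags `"trl"` and `"prm"` and the extra `"unsloth"` tag are read from the names that appear in this file.

```python
from typing import Iterable, Optional, Set, Union


def normalize_tags(
    tags: Optional[Union[str, Iterable[str]]],
    trainer_tags: Iterable[str] = ("trl", "prm"),
    is_unsloth_model: bool = False,
) -> Set[str]:
    """Hypothetical helper: turn any accepted `tags` value into a set of strings."""
    if tags is None:
        result: Set[str] = set()
    elif isinstance(tags, str):
        # A bare string is one tag, not an iterable of characters.
        result = {tags}
    else:
        result = set(tags)
    if is_unsloth_model:
        result.add("unsloth")
    result.update(trainer_tags)
    return result


single = normalize_tags("math")
merged = normalize_tags(["math", "trl"], is_unsloth_model=True)
```

Treating the string case separately matters: calling `set("math")` would produce the four characters as individual tags.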
_name_or_pathÚunsloth_versionÚunslotha² @article{uesato2022solving,
title = {{Solving Math Word Problems With Process- and Outcome-Based Feedback}},
author = {Uesato, Jonathan and Kushman, Nate and Kumar, Ramana and Song, Francis and Siegel, Noah and Wang, Lisa and Creswell, Antonia and Irving, Geoffrey and Higgins, Irina},
year = 2022,
journal = {arXiv preprint arXiv:2211.14275}
PRMzBSolving math word problems with process-and outcome-based feedback) Ú
base_modelrVr\r]Ú wandb_urlÚ trainer_nameÚtrainer_citationÚ paper_titlez README.md)Úis_world_process_zeror=rÚconfigr%ÚpathÚisdirr^Úsetr0ÚstrÚaddÚupdater>r'Údedentr r#ÚwandbÚrunÚurlÚsaveÚjoinrrz)rrVr\r]rbÚcitationÚ
model_cardrRrRrSrYfs4  

 
÷ z$_UnslothPRMTrainer.create_model_card) NNNNNNNNNrNN)NNN)$r r r
rr>rrrr$ÚModulerr
rÚdictrlrr rrr rr2rÚtupler(Ú OptimizerÚ lr_schedulerÚLambdaLRrrÚ staticmethodr9rZrYrrRrRr rSrƒsnîþýüûúÿù
ö õ
ô
óïîv
 _
üþýürcs8eZdZdZ           dfdd„ ZZS)ÚUnslothPRMTrainera@
Initialize PRMTrainer.
Args:
model (`transformers.PreTrainedModel`):