DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothKTOTrainer.cpython-310.pyc
475 lines
55 KiB

2025-08-28 17:57:59 +00:00
o
2025-08-28 22:41:56 +00:00
ö×°hB‡ã@sBdZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m Z m&Z&m'Z'm(Z(m)Z)m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;mZm<Z<m=Z=m>Z>m?Z?m@Z@mAZAmBZBmCZCmDZDmEZEmFZFmZmGZGmHZHmZm
Z
mZmZm#Z#m5Z5m>Z>mZddl>Z>ddlTddlImJZJmKZKdd lLmMZMddlZddlNZ<dd
lOm=Z=ddlmZdd lPmQZQmRZSd d
d d
d
dœZTejUd d eTdddƒZVeJGdddeƒƒZW Gddde#ƒZXGdddeXƒZYdS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)GrÚAutoModelForCausalLMÚBaseImageProcessorr ÚDPODataCollatorWithPaddingÚ DataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFeatureExtractionMixinÚ KTOConfigÚ
KTOTrainerÚLiteralrÚ PartialStateÚPathÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚSequentialSamplerÚTrainerÚTrainerCallbackÚTrainingArgumentsrÚ_get_kl_datasetÚ_process_tokensÚ _tokenizeÚautocastÚconcatenate_datasetsÚcontextmanagerÚcreate_reference_modelÚ defaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚ
has_lengthÚinspectÚis_comet_availableÚis_liger_kernel_availableÚis_peft_availableÚis_wandb_availableÚ
itemgetterÚlog_table_to_comet_experimentÚmaybe_apply_chat_templateÚmaybe_extract_promptÚmaybe_unpair_preference_datasetÚnnÚnpÚ nullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_deepspeedÚprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚtqdmÚwarningsrrrrrr1r;rD)Ú*)Ú dataclassÚfield)ÚVersion)r:)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionsc
Ctj| d|jd¡ddd}tj| d¡ddd}g}t||ƒD](\}}| tj¡}tj|d| d¡d  d¡}tj
|dd}||} |  | ¡q! t  |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)rVÚindex)rVé)
rDÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
ÚlogitsrWÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rnúQ/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothKTOTrainer.pyÚchunked_selective_log_softmax"s  
rpceZdZUdZedddidZeeed<edddidZ ee
ed <eddd
idZ ee
ed <  
                            ! ! " #     $           $      % &  '         (      #    $   ) *       + ,   -   .  /      0   d3‡fd1d2„ Z Z
S)4ÚUnslothKTOConfiguÐ
Configuration class for the [`KTOTrainer`].
This class includes only the parameters that are specific to KTO training. For a full list of training arguments,
please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
differ from those in [`~transformers.TrainingArguments`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
Parameters:
max_length (`int` or `None`, *optional*, defaults to `1024`):
Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
to use the default data collator.
max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
Maximum length of the prompt. This argument is required if you want to use the default data collator.
max_completion_length (`int` or `None`, *optional*, defaults to `None`):
Maximum length of the completion. This argument is required if you want to use the default data collator
and your model is an encoder-decoder.
beta (`float`, *optional*, defaults to `0.1`):
Parameter controlling the deviation from the reference model. Higher β means less deviation from the
reference model.
loss_type (`str`, *optional*, defaults to `"kto"`):
Type of loss to use. Possible values are:
- `"kto"`: KTO loss from the [KTO](https://huggingface.co/papers/2402.01306) paper.
- `"apo_zero_unpaired"`: Unpaired variant of APO-zero loss from the
[APO](https://huggingface.co/papers/2408.06266) paper.
desirable_weight (`float`, *optional*, defaults to `1.0`):
Desirable losses are weighted by this factor to counter unequal number of desirable and undesirable pairs.
undesirable_weight (`float`, *optional*, defaults to `1.0`):
Undesirable losses are weighted by this factor to counter unequal number of desirable and undesirable pairs.
label_pad_token_id (`int`, *optional*, defaults to `-100`):
Label pad token id. This argument is required if you want to use the default data collator.
padding_value (`int` or `None`, *optional*, defaults to `None`):
Padding value to use. If `None`, the padding value of the tokenizer is used.
truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
This argument is required if you want to use the default data collator.
generate_during_eval (`bool`, *optional*, defaults to `False`):
If `True`, generates and logs completions from both the model and the reference model to W&B or Comet
during evaluation.
is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
you need to specify if the model returned by the callable is an encoder-decoder model.
precompute_ref_log_probs (`bool`, *optional*, defaults to `False`):
Whether to precompute reference model log probabilities for training and evaluation datasets. This is
useful when training without the reference model to reduce the total GPU memory needed.
model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
string.
ref_model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model
from a string.
dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
Number of processes to use for processing the dataset.
disable_dropout (`bool`, *optional*, defaults to `True`):
Whether to disable dropout in the model and reference model.
use_liger_loss (`bool`, *optional*, defaults to `False`):
Whether to use Liger loss. It requires liger-kernel to be installed.
base_model_attribute_name (`str`, *optional*, defaults to `"model"`):
Name of the attribute in the model that contains the base model. This is used to get the base model from
the model when the model does not have a `get_decoder` method in the case when `use_liger_loss` is `True`.
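
The `desirable_weight` and `undesirable_weight` parameters above exist to compensate for an imbalance between desirable and undesirable examples. The following is a minimal sketch of how a recommended range for either weight could be derived from the label counts. The function name `recommended_weight_ranges` is hypothetical, and the `1.33` factor is an assumption based on the weight-range warning the trainer emits; treat it as an illustration rather than the trainer's exact logic.

```python
def recommended_weight_ranges(num_desirable, num_undesirable,
                              desirable_weight=1.0, undesirable_weight=1.0):
    """Return ((des_low, des_high), (und_low, und_high)).

    Hypothetical helper: adjust EITHER weight into its range, not both.
    """
    # Guard against empty classes so the ratios below stay finite.
    num_desirable = max(num_desirable, 1)
    num_undesirable = max(num_undesirable, 1)

    # Ratio that would make weighted desirable and undesirable totals comparable.
    des_base = num_undesirable * undesirable_weight / num_desirable
    und_base = num_desirable * desirable_weight / num_undesirable

    # Assumed tolerance factor of 1.33 (an assumption, see the lead-in).
    des_range = (round(des_base * 1.0, 2), round(des_base * 1.33, 2))
    und_range = (round(und_base / 1.33, 2), round(und_base / 1.0, 2))
    return des_range, und_range


# Example: 300 desirable and 100 undesirable examples, both weights left at 1.0.
des_range, und_range = recommended_weight_ranges(300, 100)
print("desirable_weight range:", des_range)
print("undesirable_weight range:", und_range)
```

With three times as many desirable examples, the sketch suggests either lowering `desirable_weight` or raising `undesirable_weight`, but not both.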
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrSz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksz'Maximum sequence length to truncate to.Úmax_seq_lengthFÚnorTéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrXéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéééÚktoéœÿÿÿÚkeep_endÚmodelc— s|dkr td|dƒ|dkrtd|dƒ|dur(|#dkr(|$dkr(d}d }#|dur:d
d lm}˜t|˜ƒd d
ƒ}tƒjd¡id|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,|d-| “d.|!“d/|"“d0|#“d1|$“d2|%“d3|&“d4|'“d5|(“d6|)“d7|*“d8|+“d9|,“d:|-“d;|.“d<|/“d=|0“d>|1“d?|2“d@|3“dA|4“dB|5“dC|6“dD|7“dE|8“dF|9“dG|:“dH|;“dI|<“dJ|=“dK|>“dL|?“dM|@“dN|A“dO|B“dP|C“dQ|D“dR|E“dS|F“dT|G“dU|H“dV|I“dW|J“dX|K“dY|L“dZ|M“d[|N“d\|O“d]|P“d^|Q“d_|R“d`|S“da|T“db|U“dc|V“dd|W“de|X“df|Y“dg|Z“dh|[“di|\“dj|]“dk|^“dl|_“dm|`“dn|a“do|b“dp|c“dq|d“dr|e“ds|f“dt|g“du|h“dv|i“dw|j“dx|k“dy|l“dz|m“d{|n“d||o“d}|p“d~|q“d|r“d€|s“d|t“d|u“dƒ|v“d„|w“d…|x“d†|y“d‡|z“dˆ|{“d‰||“dŠ|}“d|~“dŒ|d|€“dŽ|d|‚“d|ƒ“d‘|„“d’|…“d“|†“d”|‡“d•|ˆ“d–|‰“d—|Š“d˜|‹“d™|Œ“dš|d›|Ž“dœ|d|dž|‘“dŸ|’“d |““|—¤Ž|”|_|•|_||_ dS)¢NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rXza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!r†r‡Úunsloth_training_checkpointsrxr)Ú cpu_countrTryÚ
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚ loss_typeÚdesirable_weightÚundesirable_weightÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚgenerate_during_evalÚis_encoder_decoderÚdisable_dropoutÚprecompute_ref_log_probsÚmodel_init_kwargsÚref_model_init_kwargsÚdataset_num_procÚuse_liger_lossÚbase_model_attribute_namern)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingr™ÚmaxÚsuperÚ__init__rurvrw)™Úselfršrrr r­r¿rÿrrrrrrrrrr r
r r r
rrrrrrrrrrrrrrrrrrr r!r"r#r$r%r&r'r(r)r*r+r,rurvrwÚkwargsr™©Ú __class__rnror2s¸  ÿþýüûúùø ÷
ö õ ô
óòñðïîíìëêéèçæåäãâá à!ß"Þ#Ý$Ü%Û&Ú'Ù(Ø)×*Ö+Õ,Ô-Ó.Ò/Ñ0Ð1Ï2Î3Í4Ì5Ë6Ê7É8È9Ç:Æ;Å<Ä=Ã>Â?Á@ÀA¿B¾C½D¼E»FºG¹H¸I·JKµL´M³N²O±P°Q¯R®S­T¬U«VªW©X¨Y§Z¦[¥\¤]£^¢_¡` aŸbžcdœefšgh˜ijklmnopqrŽstŒuvŠwxˆyz{|}ƒ~ÿþýüûúùø ÷
ö õ ô
óòñðïîíì
zUnslothKTOConfig.__init__)NNFFFrxFrTrTNNryryrrzr{r|r}r~rr€rrSrrr„r…TNr†FrXFr†r‡NTFFFFFFrˆrˆFFFFr‰FFNrSNNFrFNrNrSNNTNFNNFrrNNNNrŒrNFFrŽNNNNTFTFFNNrNNFNFNFTrŠNNNrTFNrrFNNFFNNFFFNFTrr“Nrƒr”r€r€r•NrFNTFNNNFr—NrSN)Ú__name__Ú
__module__Ú __qualname__Ú__doc__rIrurrÚ__annotations__rvÚintrwr2Ú
__classcell__rnrnr5rorq3sL
Dþþþèrqc eZdZdZddgZ               d^deeeje fde
eeeje fde d e
e d
e
ee e
e e ffd e
eeeeefd e
ed
e
egefde
eedeejjejjjfde
eejejgejfde
e
de
eege
fde
e de
e ffdd
ZeddƒZ de!ffdd Z"d_d
e
e de!ffdd
Z#de
de
fdd „Z$e% ! " !d`d#ej&d$ej'd%e(d&e)d'e(dej&f d(d)„ƒZ*dejd*e
e eeej'ffdeej&ej&ej&ej&ffd+d,„Z+d-ej&d.ej&d/ej&d0ej&d1ej&d2ej&deej&ej&ej&ej&ffd3d4„Z,d5d6„Z-d7d8„Z.d*e
e eeej'fffd9d:„Z/ ! dadeeejfd;e
e eeje0ffdeejeeje
e ejffffd<d=„Z1dbd?e
e e2fd@e3dAddfdBdC„Z4d_dDe
e de
ej5j6j7fdEdF„Z8d*e
e ej'fdee e ffdGdH„Z9 d_deeejfd;e
e eeje0ffdIe(dJe
ee fdKdL„Z:   MdcdNe!dOe dIe
e(dJe
ee dPe def fdQdR„
Z;d_dSe
e e2fdTe
e2ddffdUdV„
Z<‡fdWdX„Z=   dddYe
e dZe
e d[ee ee dffd\d]„Z>‡Z?S)eÚ_UnslothKTOTrainerrÚtrlr”NNr—Ú ref_modelÚargsÚ
train_datasetÚ eval_datasetÚprocessing_classÚ
data_collatorÚ
model_initÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚ peft_configÚcompute_metricsÚmodel_adapter_nameÚref_adapter_namec$
t|ƒtur
tdƒt|tƒs||urtdƒ|jduri}n9t|tƒs(tdƒ|j}| d¡}|durXt|tƒrB|dkrBtt|ƒ}|dkrTt|tj ƒsTtd|dƒ||d<|j
dur`i}n9t|tƒsitdƒ|j
}| d¡}|dur™t|tƒrƒ|dkrƒtt|ƒ}|dkr•t|tj ƒs•td|dƒ||d<t|tƒr§t j |fi|¤Ž}t|tƒrµt j |fi|¤Ž}d |_
tƒsÃ| durÃtd
ƒtƒr5| dur5t|tƒrÕ| ¡}t|d d ƒsât|d d ƒrt|d
ƒoðd
tt t¡jƒv}d|ji}|rý|j|d
<t|fi|¤Ž}n|jr t|dƒr| ¡n dd}| ¡ |¡|}|jr4t|d d ƒr4t|ƒd|_
n|jrOt|dƒrD| ¡n dd}| ¡ |¡|jr_tƒs_t ƒs_tdƒ|durj|j!j"|_"n|j"durttdƒ|j"|_"tƒo€t|tƒ|_#||_$||_%|r||_&n|j#s—|j'rd|_&nt(|ƒ|_&|dur©tdƒ|j)dur·t* +dt,¡d}|j)durÀ|j)}|j-durÎt* +dt,¡d}|j-dur×|j-}d}|j.durë|j"rët* +dt,¡d}|j.durø|j"rø|j.}|durt/|j0|j1|j"d}|j2rd |_2t* +dt,¡d|_3nd |_3|j4r.t5|ƒ|j&dur.t5|j&ƒ|j6|_6||_)|j|_|j1|_1|j7durF|j7n|j0|_7||_-|j8|_8||_.||_9|j'|_'d|_:|j6dvrgd |_:d |_;d |_<t=dd „ƒ|_>|j?|_?|j@|_@|jA|_At|j!d!d ƒ|_Bt|j!d"d#ƒ|_C|jBr |jCd#kr t* +d$t,¡d|jDd%<tEƒ qˆjGtH|jId&d'tJˆ|jId(d)ˆjGtKd*|i|jId+d,ˆdurëˆjGtH|jId-d'tJˆ|jId.d)ˆjGtKd*|i|jId/d,ˆjGtLdd*|j9i|jId0d1d2|j"|j9|j)|j8|j1|j-|j.d3œ}ˆjGtM||jId4d,ˆdur2ˆjGtLd*|j9id|jId5d6ˆjGtM||jId7d,|j:r—|jNd8kr@td9ƒˆjGtOd|jN|jId:d;}d<|d=<|jGtM||jI‡fd>d?„|jPDƒd@dA}tQˆ|gd8dBˆdur—ˆjGtOd|jN|jIdCd;}|jGtM||jI‡fdDd?„|jPDƒdEdA}tQˆ|gd8dBtRtSˆdFƒd8ƒ}tRtTˆdFƒ|d8ƒ}||krtU||jA|d8dGƒ}tU||jA|dHdGƒ}tU||j@|dHdGƒ} tU||j@|d8dGƒ}!||j@koë|kn}"| |jAkoù|!kn}#|"s|#st* +dI|dJ|dK| dJ|!dL t,¡Wdƒn 1s wYtVƒjW|||ˆˆ|||
| |
| dM d |_Xt|jYdNƒrG|jY Z|j[¡t|dOƒsQt\dPƒ|j]rf|j^j_j`jadQkrf|j'rftdRƒ|j&dury|j#sx|j'sxtdSƒn|j]r†tb|j&|j^ƒ|_&n
|j^jc|j&ddT|_&|jdjerÌtfƒstgdUƒ|j6dvr§tdVƒ|j'r¯tdWƒ|j#s¹|j%dur½tdXƒth|j1|j?|j&dudY|_idSdS)ZNz1Please use `KTOConfig` instead TrainingArguments.zœ`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the same as `model`, you must mass a copy of it, or `None` if you use peft.zRYou passed model_kwargs to the KTOTrainer. But your model is already instantiated.Ú torch_dtyperŠznInvalid `torch_dtype` passed to the KTOConfig. Expected a string with either `torch.dtype` or 'auto', but got Ú.zZYou passed ref_model_kwargs to the KTOTrainer. But your ref_model is already instantiated.FzŽPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it with `pip install peft` to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrÚuse_gradient_checkpointingÚenable_input_require_gradscSó| d¡dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputrnrnroÚmake_inputs_require_grad-óz=_UnslothKTOTrainer.__init__.<locals>.make_inputs_require_gradTcSrUrVrWrYrnrnror]Br^z`generate_during_eval=True` requires Weights and Biases or Comet to be installed. 
Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.zdmax_length or a processing_class must be specified when using the default DPODataCollatorWithPaddingz¬When using DPODataCollatorWithPadding, you should set `max_length` in the KTOTrainer's init it will be set to `512` by default, but you should do it yourself in the future.r“z³When using DPODataCollatorWithPadding, you should set `max_prompt_length` in the KTOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.é€zÜWhen using DPODataCollatorWithPadding with an encoder decoder architecture, you should set `max_completion_length` in the KTOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.)Ú pad_token_idr!r%zªWhen using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your KTOConfig we have set it for you, but you should do it yourself in the future.)Úapo_zero_unpairedcSsttƒS©N)r)ÚlistrnrnrnroÚ<lambda>³sz-_UnslothKTOTrainer.__init__.<locals>.<lambda>Úoutput_router_logitsÚrouter_aux_loss_coefrŒa-You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to `0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary loss.Úestimate_tokensz$Extracting prompt from train dataset)Únum_procÚdesczUnpairing train dataset)riÚ tokenizerz'Applying chat template to train dataset)Ú fn_kwargsrhriz#Extracting prompt from eval datasetzUnpairing eval datasetz&Applying chat template to eval datasetzTokenizing train dataset)Úbatchedrkrhrir)Úprefixr%rjrr#r!rrz"Processing tokenized train datasetzTokenizing eval dataset)rkrlrhriz!Processing tokenized eval datasetrXz‡Actual (not effective) batch size must be > 1. 
KTO will not work properly because the KL term will be equivalent to the implied reward.zExtracting KL train dataset)rlÚ
batch_sizerhriÚKL_rmcóg|] }|ˆjvr|qSrn©Ú column_names©Ú.0Úc)rCrnroÚ
<listcomp>/óz/_UnslothKTOTrainer.__init__.<locals>.<listcomp>z%Processing tokenized train KL dataset)rkrhÚremove_columnsri)ÚaxiszExtracting eval KL datasetcrprnrqrs©rDrnrorvDrwz$Processing tokenized eval KL datasetÚlabelrygHáz®Gõ?zìYou have different amounts of desirable/positive and undesirable/negative examples but the weights on the desirable and undesirable losses don't seem to be in an ideal range. Based on your data, we recommend EITHER desirable_weight in [z, z] or undesirable_weight in [zN] (but NOT BOTH). See the documentation on how to optimally set these weights.) r—rBrFrCrDrErGrLrHrIrJÚadd_model_tagsÚ acceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.ézrYou cannot use `precompute_ref_log_probs=True` with Deepspeed ZeRO-3. Please set `precompute_ref_log_probs=False`.z]No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`)Úevaluation_modezYou set `use_liger_loss=True` but the liger kernel is not available. Please install liger-kernel first: `pip install liger-kernel`znYou cannot set `loss_type='apo_zero_unpaired'` with liger-kernel.Only KTO loss is supported with liger-kernel.znYou cannot use `precompute_ref_log_probs=True` with liger kernel. Please set `precompute_ref_log_probs=False`.zYYou cannot use `use_liger_loss=True` with Peft models. Please set `use_liger_loss=False`.)Ú ignore_indexrÚ
use_ref_model)jÚtyper!Ú
ValueErrorÚ
isinstanceÚstrr(ÚgetÚgetattrrDÚdtyper)r Úfrom_pretrainedÚ_peft_has_been_casted_to_bf16r1rÚmerge_and_unloadÚhasattrrcr.Ú signaturer@Ú
parametersrÿrrTÚget_input_embeddingsÚregister_forward_hookrÊr>r$r2r/Úconfigr%Ú
is_peft_modelrMrNrAr'r(rrFÚwarnÚ UserWarningrrrr`r!Úuse_dpo_data_collatorr&r*rr"r#rEÚ calculate_KLÚ _precomputed_train_ref_log_probsÚ_precomputed_eval_ref_log_probsr)Ú_stored_metricsrrr Úaux_loss_enabledÚ
aux_loss_coefÚwarnings_issuedrÚmain_process_firstÚmapr6r*r7r5r$r#r"rrr&r0ÚsumÚlenÚroundr1r2Úmodel_accepts_loss_kwargsr—r|Ú
_tag_namesÚAttributeErrorÚis_deepspeed_enabledr}ÚstateÚdeepspeed_pluginÚ
zero_stager?Ú
prepare_modelrBr+r0Ú ImportErrorÚLigerFusedLinearKTOLossÚ kto_loss_fn)$r3r—rArBrCrDrErFrGrHrIrJrKrLrMrNr(rOr)Ú_support_gc_kwargsÚprepare_model_kwargsr]rrrrkÚtrain_kl_datasetÚeval_kl_datasetÚ
num_desirableÚnum_undesirableÚdes_weight_lower_boundÚdes_weight_upper_boundÚund_weight_lower_boundÚund_weight_upper_boundÚdes_weight_in_rangeÚund_weight_in_ranger5)rDrCror2Æs´ ÿ




ÿ

ÿ


ÿ

ÿ
ÿ
ÿþ

 
 
ÿ
  

ÿ ý  ý ý
ýý 
 û
ÿÿü
ÿÿüû ø ü
ûü ÿûû
ûû 
 ýýüüùôõ ÿÿ ÿ
ÿ ÿÿÿ ÿìz_UnslothKTOTrainer.__init__cc|jr|js|j |j¡ ¡ntƒ*|jr|j |j¡dV|jr5|j |jp+d¡WdƒdSWdƒdS1s@wYdS)zWContext manager for handling null reference model (that is, peft adapter manipulation).Nrs) rrNr}Ú unwrap_modelr—Údisable_adapterr:Ú set_adapterrM©r3rnrnroÚnull_ref_context«sÿÿý÷"øz#_UnslothKTOTrainer.null_ref_contextÚreturnc|jry|jsy|jj|j|jj|jjddœ}|j t |j
fi|¤Ž¡}g}g}t |ddD]&}|  |¡\}}|j 
|¡}| | ¡¡|jrR|j 
|¡}| | ¡¡q,|j
jdt |¡ ¡ ¡d|_
|jrv|j
jdt |¡ ¡ ¡d|_
d|_tƒ ¡S) z·
Returns the training [`~torch.utils.data.DataLoader`].
Overrides `transformers.Trainer.get_train_dataloader` to precompute `ref_log_probs`.
rnÚ
collate_fnÚ num_workersÚ
pin_memoryÚshufflez!Train dataset reference log probs©ÚiterableriÚreference_logps©ÚnameÚcolumnÚreference_KL_logpsT)r'r—rBrFr}ÚpreparerrCrEÚcompute_reference_log_probsÚgather_for_metricsrcÚcpurÚ
add_columnrDÚcatÚfloatÚnumpyr1Úget_train_dataloader)r3Údataloader_paramsÚ data_loaderÚreference_completion_logpsrÊÚ padded_batchÚreference_completion_logpÚreference_KL_logpr5rnro¹s6 û   ÿÿ
z'_UnslothKTOTrainer.get_train_dataloaderc s2|dur
|jdur
tdƒ|dur|n|j}|jr|js|jj|j|jj|jjddœ}|j  
t |fi|¤Ž¡}g}g}t |ddD]&}| 
|¡\}}|j  |¡}| | ¡¡|jrg|j  |¡}| | ¡¡qA|jdt |¡ ¡ ¡d}|jr‡|jd t |¡ ¡ ¡d}|jdur||_d
|_tƒj|d S) 
Returns the evaluation [`~torch.utils.data.DataLoader`].
Overrides `transformers.Trainer.get_eval_dataloader` to precompute `ref_log_probs`.
Args:
eval_dataset (`torch.utils.data.Dataset`, *optional*):
If provided, will override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns not accepted
by the `model.forward()` method are automatically removed. It must implement `__len__`.
Nz-Trainer: evaluation requires an eval_dataset.Fr¿z Eval dataset reference log probsrÄTrz)rDr'r˜rBrFr}rrErcrrDr1Úget_eval_dataloader) r3rDr×r5rnroås@  û   ÿÿ
z&_UnslothKTOTrainer.get_eval_dataloaderr×c Ct ¡²|jdurg| ¡P|jr<|j|d|d| d¡|ddj}|jr;|j|d|d| d ¡|d
dj}n|j|d |d d
j}|jrW|j|d|dd
j}Wdƒn1sawYnH|jr”|j|d|d| d¡|ddj}|jr“|j|d|d| d ¡|d
dj}n|j|d |d d
j}|jr¯|j|d|dd
j}Wdƒn1s¹wY|j ||dd|j|j
d}|jrá|j ||d
d|j|j
d}||fSd}||fS)zfComputes log probabilities of the reference model for a single padded batch of a KTO specific dataset.NÚprompt_input_idsÚprompt_attention_maskÚcompletion_decoder_input_idsÚcompletion_labels)Úattention_maskÚdecoder_input_idsÚlabelsÚKL_prompt_input_idsÚKL_prompt_attention_maskÚKL_completion_decoder_input_idsÚKL_completion_labelsÚcompletion_input_idsÚcompletion_attention_mask)ÚKL_completion_input_idsÚKL_completion_attention_maskF©Úaverage_log_probr%r!) rDÚno_gradrAr%r—r†rerÚget_batch_logpsr!)r3r×Úcompletion_logitsÚ KL_logitsÚcompletion_logpsÚKL_logpsrnrnro


üûüûþýþýéüûüû ÿþþýÍ8ûû
þz._UnslothKTOTrainer.compute_reference_log_probsFr•rer!r%cC|jdd|jkrtdƒ|s*|ddddf ¡}|ddddddf}n| ¡}||k}d|||k<t||ƒ}|rK|| d¡| d¡S|| d¡S)aCompute the log probabilities of the given labels under the given logits.
Args:
logits:
Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
labels:
Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
ignored. Shape: (batch_size, sequence_length)
average_log_prob:
If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
log probabilities of the (non-masked) tokens.
Returns:
A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
given logits.
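
The computation described in this docstring can be sketched in plain Python, without tensors. This is an illustrative approximation only: the name `batch_logps_sketch` is hypothetical, nested lists stand in for tensors, and the one-position shift for decoder-only models is an assumption about how labels are aligned with logits.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def batch_logps_sketch(logits, labels, average_log_prob=False,
                       label_pad_token_id=-100, is_encoder_decoder=False):
    """logits: [batch][seq][vocab] floats; labels: [batch][seq] ints."""
    results = []
    for seq_logits, seq_labels in zip(logits, labels):
        if len(seq_logits) != len(seq_labels):
            raise ValueError("Logits and labels must have the same sequence length.")
        if not is_encoder_decoder:
            # Assumed alignment: the logits at position t predict the token at t + 1.
            seq_labels = seq_labels[1:]
            seq_logits = seq_logits[:-1]
        total, count = 0.0, 0
        for step_logits, label in zip(seq_logits, seq_labels):
            if label == label_pad_token_id:
                continue  # masked positions contribute nothing
            total += log_softmax(step_logits)[label]
            count += 1
        if average_log_prob:
            results.append(total / count if count else float("nan"))
        else:
            results.append(total)
    return results


# One sequence, vocabulary of two tokens, uniform logits; the first label is masked.
uniform = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
example_labels = [[-100, 1, 0]]
summed = batch_logps_sketch(uniform, example_labels)
averaged = batch_logps_sketch(uniform, example_labels, average_log_prob=True)
print(summed, averaged)
```

Each unmasked token here has probability 0.5, so the summed result is twice the averaged one.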
NrSzKLogits (batch and sequence length dim) and labels must have the same shape.rXr)r[ÚclonerB)rer!r%Ú loss_maskrmrnrnroks 
z"_UnslothKTOTrainer.get_batch_logpsÚbatchcs"| |ˆ¡}|jrˆdˆ d¡dœni}|jrd|d<|ˆdfdˆdi|¤Ž}|j}|j|ˆdd |j|jd
}|jd tˆd ƒkrJt d
ƒfddt
|jd ƒDƒ}fddt
|jd ƒDƒ} ||df}
|| df} ||df} || df}
|jrŠ|
| | |
||j fS|
| | |
|fS)NrÞ©TreFrêrr{z‡There is a mismatch between the number of examples in this batch and the number of examples for which an output sequence was predicted.có g|] }ˆd|dur|qS©r{Trn©rtÚrnrorv¾ó z._UnslothKTOTrainer.forward.<locals>.<listcomp>c©r{Frnrnrorv¿.) Ú_compute_kl_logpsr%r†rer!r[r ÚrangeÚaux_loss)r3r—Ú model_kwargsÚoutputsrîÚ
chosen_idxÚ rejected_idxÚ chosen_logpsÚrejected_logpsÚ
chosen_logitsÚrejected_logitsrnroÚforward™sL üþúÿþýûÿ    z_UnslothKTOTrainer.forwardÚpolicy_chosen_logpsÚpolicy_rejected_logpsÚpolicy_KL_logpsÚreference_chosen_logpsÚreference_rejected_logpsrÊcC|jr|| ¡ ¡}|j |¡ ¡jdd}n t d¡ |j ¡}|j
ddks/|j
ddkr\||}|j dkrEdt  
|j||¡} n|j dkrTdt  
|j|¡} |j| ¡}
nt g¡ |jj ¡} t g¡ |jj ¡}
|j
ddks~|j
ddkr©||} |j dkr”dt  
|j|| ¡} n
|j dkr¡t  
|j| ¡} |j|  ¡}
nt g¡ |jj ¡} t g¡ |jj ¡}
t |j| |j| fd¡}||
|
|fS)avCompute the KTO loss for a batch of policy and reference model log probabilities.
Args:
policy_chosen_logps:
Log probabilities of the policy model for the chosen responses. Shape: (num(chosen) in batch_size,)
policy_rejected_logps:
Log probabilities of the policy model for the rejected responses. Shape: (num(rejected) in batch_size,)
policy_KL_logps: Log probabilities of the policy model for the KL responses. Shape: (batch_size,)
reference_chosen_logps:
Log probabilities of the reference model for the chosen responses. Shape: (num(chosen) in batch_size,)
reference_rejected_logps:
Log probabilities of the reference model for the rejected responses. Shape: (num(rejected) in
batch_size,)
reference_KL_logps: Log probabilities of the reference model for the KL responses. Shape: (batch_size,)
Returns:
A tuple of four tensors: (losses, chosen_rewards, rejected_rewards, KL). The losses tensor contains the KTO
loss for each example in the batch. The chosen_rewards and rejected_rewards tensors contain the rewards for
the chosen and rejected responses, respectively. The KL tensor contains the detached KL divergence estimate
between the policy and reference models.
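
The loss described above can be sketched for the `"kto"` loss type in plain Python. This is a simplified single-process illustration under stated assumptions: `kto_loss_sketch` is a hypothetical name, lists of floats stand in for tensors, and the cross-device gathering and gradient detaching done by the real trainer are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss_sketch(policy_chosen, policy_rejected, policy_kl,
                    ref_chosen, ref_rejected, ref_kl,
                    beta=0.1, desirable_weight=1.0, undesirable_weight=1.0):
    """Return (losses, chosen_rewards, rejected_rewards, kl) for the "kto" loss type."""
    # KL estimate: mean policy/reference log-ratio on the KL samples, clamped at zero.
    if policy_kl:
        mean_ratio = sum(p - r for p, r in zip(policy_kl, ref_kl)) / len(policy_kl)
        kl = max(mean_ratio, 0.0)
    else:
        kl = 0.0

    chosen_logratios = [p - r for p, r in zip(policy_chosen, ref_chosen)]
    rejected_logratios = [p - r for p, r in zip(policy_rejected, ref_rejected)]

    # Desirable examples are rewarded for exceeding the KL baseline,
    # undesirable examples for falling below it.
    chosen_losses = [1.0 - sigmoid(beta * (lr - kl)) for lr in chosen_logratios]
    rejected_losses = [1.0 - sigmoid(beta * (kl - lr)) for lr in rejected_logratios]

    chosen_rewards = [beta * lr for lr in chosen_logratios]
    rejected_rewards = [beta * lr for lr in rejected_logratios]

    losses = ([desirable_weight * l for l in chosen_losses]
              + [undesirable_weight * l for l in rejected_losses])
    return losses, chosen_rewards, rejected_rewards, kl


# Policy identical to the reference: every log-ratio is zero.
losses, chosen_rewards, rejected_rewards, kl = kto_loss_sketch(
    [-1.0], [-2.0], [-1.5], [-1.0], [-2.0], [-1.5]
)
print(losses, chosen_rewards, rejected_rewards, kl)
```

When the policy matches the reference model, every log-ratio is zero, so each per-example loss sits at the sigmoid midpoint scaled by its weight.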
r©ÚminrXr”ra)rÚmeanÚdetachr}ÚclamprDÚzerosr]Údevicer[rrÚsigmoidrrrr )r3r r
r r r
ÚklÚchosen_logratiosÚ
chosen_lossesÚchosen_rewardsÚrejected_logratiosÚrejected_lossesÚrejected_rewardsÚlossesrnrnroÚkto_lossÌs6



þ z_UnslothKTOTrainer.kto_losscCd}|jrL|jr|d|d|d| d¡dœ}n |d|dd œ}t ¡|d i|¤Žj}Wdƒn1s9wY|j||dd
|j|jd }|S)
z/Compute KL log probabilities for a given batch.Nrâ)Ú input_idsrß)rFrêrn)rr%r†rDrer!)r3r—ÚKL_model_kwargsrïrnrnros,üþ
ÿûz$_UnslothKTOTrainer._compute_kl_logpsc C| ||¡}| |j|¡}|jr%|| ¡ ¡}|j |¡ ¡jdd}n
t  d¡ 
|jj ¡}|j r<|d| 
d¡dœni}|jrEd|d<|j r| ¡|d f|d
dd œ|¤Ž}| ¡d|d |jd
dœ|¤Ž}|j ¡|d f|d
dd œ|¤Ž} |j ¡d|d | jd
dœ|¤Ž}
nCt|dƒr—| ¡} nt||jjƒ} | |d f|d
d
dœ|¤Ž}t|jdƒr¹|j ¡} nt|j|jjƒ} | |d f|d
d
dœ|¤Ž}
| ¡}
|j ¡}|j|j sé|jddddfn|j|
j|dddddft|
dƒr|
jndtj|dtjd 
|jj ¡|j s|
jddddfn|j|jt|
dƒr,|jnd|d \}\}}}}}}||||||||dœ}|jrM|j|d<|S)a!
Compute the KTO loss using the Liger-Kernel's LigerFusedLinearKTOLoss.
Args:
model:
The policy model used for generating log probabilities and outputs. It could be an encoder-decoder
model or a regular language model.
batch: A dictionary containing the input data and labels for the batch.
Returns:
A dictionary containing the following keys:
- "loss": The computed KTO loss for the batch.
- "chosen_logits_sum": Sum of the logits for the chosen responses from the policy model.
- "rejected_logits_sum": Sum of the logits for the rejected responses from the policy model.
- "chosen_logps": Log probabilities of the chosen responses from the policy model.
- "rejected_logps": Log probabilities of the rejected responses from the policy model.
- "chosen_rewards": Rewards for the chosen responses.
- "rejected_rewards": Rewards for the rejected responses.
- "kl": The KL divergence between the policy and reference models (detached).
If auxiliary loss is enabled, the dictionary will also include:
- "aux_loss": The auxiliary loss from the model outputs.
rrrXTre)Ú return_dictràF)rÚencoder_hidden_statesÚ use_cacheÚ get_decoder)r#NrSÚbiasr{)rˆ) Ú_inputÚ
lin_weightÚtargetr%Úpreference_labelsÚ ref_inputÚ
ref_weightÚref_biasr)ÚlossÚchosen_logits_sumÚrejected_logits_sumÚchosen_logps_sumÚrejected_logps_sumÚchosen_rewards_sumÚrejected_rewards_sumrrÿrn)rArrrr}rrDrr]rr%r†Ú get_encoderr$Úlast_hidden_staterŒr‡rBr,Úget_output_embeddingsr¬Úweightr%ÚtensorÚboolrÿ)r3r—r rrÚencoder_outputsrÚref_encoder_outputsÚ ref_outputsÚ
base_modelÚref_base_modelÚlm_headÚ ref_lm_headr-r0r1r.r/r2r3r\rnrnroÚ_compute_loss_liger6 üþúÿýüýüÿýü
ýü

ÿýü  ÿýü
 ÿõöø
z&_UnslothKTOTrainer._compute_loss_ligerc s@i}fddˆ ¡Dƒt ˆd¡}| ¡ ˆjj¡}t|ƒ| ˆjj¡}ˆjj rZˆ 
|ˆ¡}|d}|d} |d}
|d} |d} |d }
|d
}|d }ˆj rY|d }n³ˆ  |ˆ¡}|d
d\} } } }
}ˆj rr|d}dˆvr±‡fddt
ˆdjdƒDƒ}fddt
ˆdjdƒDƒ}ˆd|df}ˆd|df}ˆjr®ˆd}nQd
}nNt ¡Bˆjd
uràˆ ¡ˆ  ˆjˆ¡d
d\}}}}}Wd
ƒn1sÚwYnˆ  ˆjˆ¡d
d\}}}}}Wd
ƒn1súwYˆ | | ||||¡\}}
}}| ¡|d <ˆj |¡ ¡ ¡}ˆj |¡ ¡ ¡}|dkrZˆj |
 ¡¡ ¡ ¡|d<ˆj |  ¡¡ ¡ ¡|d<ˆj |  ¡¡ ¡ ¡|d<||d<|dkrˆj | ¡¡ ¡ ¡|d<ˆj |  ¡¡ ¡ ¡|d<ˆj |
 ¡¡ ¡ ¡|d<||d<| ¡}ˆj rœ|ˆj|7}||fS)zWCompute the KTO loss and other metrics for the given batch of inputs for train or test.cs0i|]\}}|t|tjƒr| ˆjj¡n|qSrn)r„rDrr]r}r©rtÚvr¼rnroÚ
<dictcomp>Çs0z=_UnslothKTOTrainer.get_batch_loss_metrics.<locals>.<dictcomp>r{r-r.r/r0r1r2r3rrÿcrnrnrorvçz=_UnslothKTOTrainer.get_batch_loss_metrics.<locals>.<listcomp>rcrnrnrorvè.rÊzrewards/chosen_sumzlogps/chosen_sumúlogits/chosen_sumz count/chosenzrewards/rejected_sumzlogps/rejected_sumúlogits/rejected_sumzcount/rejected)ÚitemsrDr8r]r}rr rBr+rArr[rrAr—rÚitemrÍÚnansumÚnanmeanr)r3r—ÚmetricsráÚ
num_chosenÚ num_rejectedÚ model_outputrÚpolicy_chosen_logitsÚpolicy_rejected_logitsr r
rrrrÿÚforward_outputr rrr r
Úall_num_chosenÚall_num_rejectedr-rn)r3roÚget_batch_loss_metricsÀ  
ú  



úúúð ú 
ÿÿÿ
ÿÿÿz)_UnslothKTOTrainer.get_batch_loss_metricsÚinputscCs|jr
t|jjjƒntƒ}|| ||¡\}}Wdƒn1s"wY| |jj¡}|jj r9|j
|dd|r?||fS|S)train©Ú
train_eval) r%r}rrr:rWr]rBÚis_main_processÚ
store_metrics)r3r—rXÚreturn_outputsÚnum_items_in_batchÚcompute_loss_context_managerr-rMrnrnroÚ compute_loss0sÿÿz_UnslothKTOTrainer.compute_lossrYrMr[)rYÚevalcCs*| ¡D]\}}|j|| |¡qdSrb)rIr™rc)r3rMr[ÚkeyÚvaluernrnror]Hsÿz _UnslothKTOTrainer.store_metricsÚdatasetcCs*|dur|j}|dust|ƒsdSt|ƒSrb)rCr-r)r3rernrnroÚ_get_train_samplerLs
z%_UnslothKTOTrainer._get_train_samplerc Cs:|jr
t|jjjƒntƒ}|`|j|d|d|jd|jj d}d|vr*|d}n>|j
durV|  ¡|j j|d|d|jd|jj d}Wdƒn1sPwYn|j
j|d|d|jd|jj d}Wdƒn1srwYt
||j|jj ƒ}|jj|dd}t
||j|jj ƒ}|jj|dd}||fS)zRGenerate samples from the model and reference model for the given batch of inputs.rÛT)rrÚ do_sampler`Úreference_outputN)Úskip_special_tokens)r%r}rrr:ÚgeneraterrEr`rAr—r<Ú batch_decode)r3r—Úgenerate_context_managerÚ
policy_outputrhÚpolicy_output_decodedÚreference_output_decodedrnrnroÚgenerate_from_model_and_refSsJÿû


ûÿ ûéz._UnslothKTOTrainer.generate_from_model_and_refr Ú ignore_keysc s>ˆdurt|dƒrt|jdgƒng|jrt|jjjƒntƒ}t  
¡"||  ||¡\}}Wdƒn1s:wYWdƒn1sIwY|jj rY|j
|dd|rb| ¡ddfSi}d|vrn|d|d<d|vrx|d|d<fd d
| ¡Dƒ} t j| |jjd } t j| jd |jjd }
| ¡| |
fS)
NrÚkeys_to_ignore_at_inferencerbrZrGzeval_logits/chosenrHzeval_logits/rejectedcsg|]
\}}|ˆvr|qSrnrnrB©rqrnrorv£sz6_UnslothKTOTrainer.prediction_step.<locals>.<listcomp>)rr)r‡rr%r}rrr:rDrWr\r]rrIr8rr[) r3r—rXr rqÚprediction_context_managerr-rMÚ logits_dictrernrsroÚprediction_stepƒs0
ÿÿ  z"_UnslothKTOTrainer.prediction_steprbÚ
dataloaderÚ descriptionÚmetric_key_prefixcs$|jr†t|jƒ}tjt|ƒ|jjd}|j |¡}|  |¡} | 
| ¡} t j | dt j
|jjd}
t  |
¡d} | d| | d| t| Ž| dƒdœ} | |j| ¡\}
}tjgd ¢d
d t| d|
|ƒDƒd }d
|jjvrzt dtj|di¡d|jjvr†td|dtƒ |||||¡}|S)
Overriding built-in evaluation loop to store metrics for each batch. Prediction/evaluation loop, shared by
`Trainer.evaluate()` and `Trainer.predict()`.