DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothKTOTrainer.cpython-311.pyc
470 lines
94 KiB

2025-08-13 23:50:20 +00:00
§
4$h߃ãó”dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m Z m&Z&m'Z'm(Z(m)Z)m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;mZm<Z<m=Z=m>Z>m?Z?m@Z@mAZAmBZBmCZCmDZDmEZEmFZFmZmGZGmHZHmZm
Z
mZmZm#Z#m5Z5m>Z>mZddl>Z>ddlTddlImJZJmKZKdd lLmMZMddlZddlNZ<dd
lOm=Z=ddlmZdd lPmQZQmRZSd d
d d
d
dœZTejUd d eT¬¦«d¦«ZVeJGdde¦«¦«ZW Gdde#¦«ZXGddeX¦«ZYdS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)GrÚAutoModelForCausalLMÚBaseImageProcessorr ÚDPODataCollatorWithPaddingÚ DataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFeatureExtractionMixinÚ KTOConfigÚ
KTOTrainerÚLiteralrÚ PartialStateÚPathÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚSequentialSamplerÚTrainerÚTrainerCallbackÚTrainingArgumentsr Ú_get_kl_datasetÚ_process_tokensÚ _tokenizeÚautocastÚconcatenate_datasetsÚcontextmanagerÚcreate_reference_modelÚ defaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚ
has_lengthÚinspectÚis_comet_availableÚis_liger_kernel_availableÚis_peft_availableÚis_wandb_availableÚ
itemgetterÚlog_table_to_comet_experimentÚmaybe_apply_chat_templateÚmaybe_extract_promptÚmaybe_unpair_preference_datasetÚnnÚnpÚ nullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_deepspeedÚprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚtqdmÚwarningsrrrrr r2r<rE)Ú*)Ú dataclassÚfield)ÚVersion)r;)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionscó’tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t ||¦«D]\}}| tj¦«}tj|d| d¦«¬¦«  d¦«}tj
|d¬¦«}||z
} |  | ¦«Œ’ tj |¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)rXÚindex)rXé)
rEÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
ÚlogitsrYÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
ú]/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothKTOTrainer.pyÚchunked_selective_log_softmaxrq"s5õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐH€MØÐå%(¨¸Ñ%GÔ%Gð #—¥u¤}Ñ Ýœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`Ða×iÐjlÑmˆÝ œ?¨<¸Ø)Ð,<Ñ<ˆØ×" Ýœ,Ð':ÑØ-×5°v´|ÀA´ÈÌ ÐUVÌÐ6XÑØ Ðócó¸eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z ee
ed < d0ˆfd/„ Z ˆxZ S)1ÚUnslothKTOConfiguÐ
Configuration class for the [`KTOTrainer`].
This class includes only the parameters that are specific to KTO training. For a full list of training arguments,
please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
differ from those in [`~transformers.TrainingArguments`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
Parameters:
max_length (`int` or `None`, *optional*, defaults to `1024`):
Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
to use the default data collator.
max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
Maximum length of the prompt. This argument is required if you want to use the default data collator.
max_completion_length (`int` or `None`, *optional*, defaults to `None`):
Maximum length of the completion. This argument is required if you want to use the default data collator
and your model is an encoder-decoder.
beta (`float`, *optional*, defaults to `0.1`):
Parameter controlling the deviation from the reference model. Higher β means less deviation from the
reference model.
loss_type (`str`, *optional*, defaults to `"kto"`):
Type of loss to use. Possible values are:
- `"kto"`: KTO loss from the [KTO](https://huggingface.co/papers/2402.01306) paper.
- `"apo_zero_unpaired"`: Unpaired variant of APO-zero loss from the
[APO](https://huggingface.co/papers/2408.06266) paper.
desirable_weight (`float`, *optional*, defaults to `1.0`):
Desirable losses are weighted by this factor to counter an unequal number of desirable and undesirable pairs.
undesirable_weight (`float`, *optional*, defaults to `1.0`):
Undesirable losses are weighted by this factor to counter an unequal number of desirable and undesirable pairs.
label_pad_token_id (`int`, *optional*, defaults to `-100`):
Label pad token id. This argument is required if you want to use the default data collator.
padding_value (`int` or `None`, *optional*, defaults to `None`):
Padding value to use. If `None`, the padding value of the tokenizer is used.
truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
This argument is required if you want to use the default data collator.
generate_during_eval (`bool`, *optional*, defaults to `False`):
If `True`, generates and logs completions from both the model and the reference model to W&B or Comet
during evaluation.
is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
you need to specify if the model returned by the callable is an encoder-decoder model.
precompute_ref_log_probs (`bool`, *optional*, defaults to `False`):
Whether to precompute reference model log probabilities for training and evaluation datasets. This is
useful when training without the reference model to reduce the total GPU memory needed.
model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
string.
ref_model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model
from a string.
dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
Number of processes to use for processing the dataset.
disable_dropout (`bool`, *optional*, defaults to `True`):
Whether to disable dropout in the model and reference model.
use_liger_loss (`bool`, *optional*, defaults to `False`):
Whether to use Liger loss. It requires liger-kernel to be installed.
base_model_attribute_name (`str`, *optional*, defaults to `"model"`):
Name of the attribute in the model that contains the base model. This is used to get the base model from
the model when the model does not have a `get_decoder` method in the case when `use_liger_loss` is `True`.
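The `desirable_weight` and `undesirable_weight` parameters described above exist to rebalance unequal counts of desirable and undesirable examples. A minimal sketch of how suitable ranges could be derived is given here, assuming the commonly cited heuristic that the ratio of (weight × count) between the two classes should fall between 1:1 and 4:3 (approximated as 1.33). The helper name `recommended_weight_ranges` is hypothetical and is not part of TRL or Unsloth.

```python
def recommended_weight_ranges(num_desirable, num_undesirable,
                              desirable_weight=1.0, undesirable_weight=1.0):
    """Hypothetical helper: suggest ranges for one weight given the other.

    Assumes the 1:1 to 4:3 balance heuristic; 1.33 approximates 4/3.
    Returns ((des_low, des_high), (und_low, und_high)), rounded to 2 places.
    """
    if num_desirable <= 0 or num_undesirable <= 0:
        raise ValueError("both example counts must be positive")

    # Range for desirable_weight when undesirable_weight is held fixed.
    des_base = num_undesirable * undesirable_weight / num_desirable
    des_range = (round(des_base * 1.0, 2), round(des_base * 1.33, 2))

    # Range for undesirable_weight when desirable_weight is held fixed.
    und_base = num_desirable * desirable_weight / num_undesirable
    und_range = (round(und_base / 1.33, 2), round(und_base / 1.0, 2))

    return des_range, und_range


if __name__ == "__main__":
    # 100 desirable vs. 200 undesirable examples, both weights at 1.0.
    print(recommended_weight_ranges(100, 200))
```

Only one of the two weights should be adjusted into its range; changing both at once would shift the balance point again.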
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrUz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnorVéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrZéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéééÚktoéœÿÿÿÚkeep_endÚmodelc ó–|dkrtd|d¦«|dkrtd|d¦«||#dkr
|$dkrd}d }#|‘€!d
d lm}—t |—¦«d zd ¦«}‘t ¦«jd id
|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,| “d-|!“d.|"“d/|#“d0|$“d1|%“d2|&“d3|'“d4|(“d5|)“d6|*“d7|+“d8|,“d9|-“d:|.“d;|/“d<|0“d=|1“d>|2“d?|3“d@|4“dA|5“dB|6“dC|7“dD|8“dE|9“dF|:“dG|;“dH|<“dI|=“dJ|>“dK|?“dL|@“dM|A“dN|B“dO|C“dP|D“dQ|E“dR|F“dS|G“dT|H“dU|I“dV|J“dW|K“dX|L“dY|M“dZ|N“d[|O“d\|P“d]|Q“d^|R“d_|S“d`|T“da|U“db|V“dc|W“dd|X“de|Y“df|Z“dg|[“dh|\“di|]“dj|^“dk|_“dl|`“dm|a“dn|b“do|c“dp|d“dq|e“dr|f“ds|g“dt|h“du|i“dv|j“dw|k“dx|l“dy|m“dz|n“d{|o“d||p“d}|q“d~|r“d|s“d€|t“d|u“d|v“dƒ|w“d„|x“d…|y“d†|z“d‡|{“dˆ||“d‰|}“dŠ|~“d|dŒ|€“d|dŽ|‚“d|ƒ“d|„“d‘|…“d’|†“d“|‡“d”|ˆ“d•|‰“d–|Š“d—|‹“d˜|Œ“d™|dš|Ž“d›|dœ|d|‘“dž|’“dŸ|““|–¤Ž|”|_|•|_dS)¡NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rZza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rˆr‰Úunsloth_training_checkpointsrzr)Ú cpu_countr{Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚ loss_typeÚdesirable_weightÚundesirable_weightÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚgenerate_during_evalÚis_encoder_decoderÚdisable_dropoutÚprecompute_ref_log_probsÚmodel_init_kwargsÚref_model_init_kwargsÚdataset_num_procÚuse_liger_lossÚbase_model_attribute_name©) ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingrœÚminÚsuperÚ__init__rxry)™Úselfrr r­r¿rÿrrrrrrrrrr r
r r r
rrrrrrrrrrrrrrrrrrr r!r"r#r$r%r&r'r(r)r*r+r,r-r.r/rxryÚkwargsrœÚ __class__s™ €rpr6zUnslothKTOConfig.__init__sg
ø€ðr ˜ Ð Õ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ ˜ Ð ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØ Ð  -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆ ˆ Ð " 9 9¡;¤;¨q¡=°!Ñ àŒÔðS LðS LðS LØ#˜ðS Là#7Ð#7ðS Lð S Lðgð S Lð
$˜ð S Lð *˜
S Lð$8Ð#7ðS Lð+FÐ*EðS Lð*DÐ)CðS Lð(@Ð'?ðS Lð'>Ð&=ðS Lð+FÐ*EðS Lð'>Ð&=ðS Lð$˜ðS Lð'>Ð&=ðS Lð *˜Mð!S Lð"(˜<ð#S Lð$$˜ð%S Lð&$˜ð'S Lð((˜<ð)S Lð**˜Mð+S Lð,/ð-S Lð."˜ ð/S Lð0!2Ð 1ð1S Lð2(˜<ð3S Lð4(˜<ð5S Lð6"˜ ð7S Lð8!2Ð 1ð9S Lð:/ð;S Lð<&˜+ð=S Lð>/ð?S Lð@"4Ð!3ðAS LðB*˜MðCS LðD&<Ð%;ðES LðF*˜MðGS LðH$˜ðIS LðJ/ðKS LðL/ðMS LðN!2Ð 1ðOS LðP.˜oðQS LðR7^Ð6]ðSS LðTgðUS LðVgðWS LðX,˜^ðYS LðZ4ð[S Lð\"˜ ð]S Lð^*˜Mð_S Lð` xðaS Lðb4ðcS Lðd4ðeS Lðf,˜^ðgS Lðh&<Ð%;ðiS Lðj,˜^ðkS Lðl,˜^ðmS Lðn4ðoS Lðp$˜ðqS Lðr&˜+ðsS Lðt*˜MðuS Lðv!2Ð 1ðwS LðxEðyS Lðz$8Ð#7ð{S Lð|$˜ð}S Lð~&<Ð%;ðS Lð@*DÐ)CðAS LðB$˜ðCS LðD xðES LðF(˜<ðGS LðH%:Ð$9ðIS LðJ&˜+ðKS LðL&<Ð%;ðMS LðN%:Ð$9ðOS LðP!2Ð 1ðQS LðR/ðSS LðT4ðUS LðV#6Ð"5ðWS LðX&˜+ðYS LðZ2TÐ1Sð[S Lð\"4Ð!3ð]S Lð^"˜ ð_S Lð`&<Ð%;ðaS LðbEðcS Lðd$˜ðeS Lðf"˜ ðgS Lðh.˜oðiS Lðj"4Ð!3ðkS Lðl"˜ ðmS Lðn*DÐ)CðoS Lðp!2Ð 1ðqS Lðr%:Ð$9ðsS Lðt%:Ð$9ðuS Lðv-JÐ,IðwS Lðx#6Ð"5ðyS Lðz*DÐ)Cð{S Lð|&˜+ð}S Lð~&<Ð%;ðS Lð@(˜<ðAS LðB(˜<ðCS LðD"˜ ðES LðF/ðGS LðH.˜oðIS LðJ(˜<ðKS LðL&<Ð%;ðMS LðN-JÐ,IðOS LðP*DÐ)CðQS LðR&<Ð%;ðSS LðT(˜<ðUS LðV$8Ð#7ðWS LðX(@Ð'?ðYS LðZ!2Ð 1ð[S Lð\*˜Mð]S Lð^$8Ð#7ð_S Lð`/ðaS Lðb&˜+ðcS Lðd"˜ ðeS Lðf&˜+ðgS Lðh*˜MðiS Lðj%:Ð$9ðkS Lðl"4Ð!3ðmS Lðn)BÐ(AðoS Lðp-JÐ,IðqS Lðr#6Ð"5ðsS Lðt$8Ð#7ðuS Lðv"4Ð!3ðwS Lðx*˜MðyS Lðz/ð{S Lð|#6Ð"5ð}S Lð~&<Ð%;ðS Lð@-JÐ,IðAS LðB$˜ðCS LðD!2Ð 1ðES LðF%:Ð$9ðGS LðH4ðIS LðJ"˜ ðKS LðL/ðMS LðN"4Ð!3ðOS LðP"4Ð!3ðQS LðR*˜MðSS LðT.˜oðUS LðV$8Ð#7ðWS LðX"4Ð!3ðYS LðZ.˜oð[S Lð\(@Ð'?ð]S Lð^!2Ð 1ð_S Lð`%:Ð$9ðaS Lðb/ðcS Lðd,˜^ðeS Lðf)BÐ(AÀFðgS LðS LðS Lðh%9ˆÔ!Ø"4ˆÔÐÐrr)•NNFFFrzFrVrVNNr{r{rr|r}r~rr€rrrUr„r…rr†r‡TNrˆFrZFrˆr‰NTFFFFFFrŠFFFFrFFNrUNNFrFNrNrUNNTNFNNFrrNNNNrŽrNFFrNNNNTFTFFNNrNNFNFNFTrŒNNNrTFNrr“FNNFFNNFFFNFTr”r•Nr…rrrr—Nr˜FNTFNNNFr™NrU)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__rJrxrrÚ__annotations__ryÚintr6Ú
__classcell__©r9s@rprtrt3ø€ðCðCðH+0¨%ØØÐ+ñ+ô+И( 3œ-ððñð*/¨ØØÐ*ñ*ô*И #œððñð ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØØØØØØØ!&ØØØØØØ27ØØØØØØØØØØØ!'ØØØØØØØØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1ØØ!$ØØØØØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ ØØØØØ $Ø!Ø$)Ø(-ØØ Ø"Ø!&Ø(,ØØØ $ØØØØ ØØØ#(Ø Ø $ØØØ$+Øðmwwwwwwwwww5rrrtc óðeZdZdZddgZ dJdeeeje fde
eeeje fde d e
e d
e
ee e
e e ffd e
eeeeefd e
ed
e
egefde
eedeejjejjjfde
eejejgejfde
e
de
eege
fde
e de
e fˆfd
Zed¦«Z de!fˆfd Z"dKd
e
e de!fˆfd
Z#de
de
fdZ$e% dLdej&dej'd e(d!e)d"e(dej&f d#„¦«Z*dejd$e
e eeej'ffdeej&ej&ej&ej&ffd%„Z+d&ej&d'ej&d(ej&d)ej&d*ej&d+ej&deej&ej&ej&ej&ffd,„Z,d-„Z-d.„Z.d$e
e eeej'fffd/„Z/ dMdeeejfd0e
e eeje0ffdeejeeje
e ejffffd1„Z1dNd3e
e e2fd4e3d5ddfd6„Z4dKd7e
e de
ej5j6j7fd8„Z8d$e
e ej'fdee e ffd9„Z9 dKdeeejfd0e
e eeje0ffd:e(d;e
ee fd<„Z: dOd>e!d?e d:e
e(d;e
ee d@e def ˆfdA„
Z;dKdBe
e e2fdCe
e2ddfˆfdD„
Z<ˆfdE„Z= dPdFe
e dGe
e dHee ee dffdI„Z>ˆxZ?S)QÚ_UnslothKTOTrainerrÚtrlrNNr™Ú ref_modelÚargsÚ
train_datasetÚ eval_datasetÚprocessing_classÚ
data_collatorÚ
model_initÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚ peft_configÚcompute_metricsÚmodel_adapter_nameÚref_adapter_namec
ó˜t|¦«turtd¦«t|t¦«s||urtd¦«|ji}nªt|t¦«std¦«|j}| d¦«}|ht|t¦«r|dkrtt|¦«}|dkr-t|tj ¦«std|d¦«||d<|j
i}nªt|t¦«std¦«|j
}| d¦«}|ht|t¦«r|dkrtt|¦«}|dkr-t|tj ¦«std|d¦«||d<t|t¦«rtj |fi|¤Ž}t|t¦«rtj |fi|¤Ž}d |_
t¦«s| td
¦«t¦«r5| 2t|t¦«r| ¦«}t|d d ¦«st|d d ¦«r`t#|d
¦«o,d
t%t'jt*¦«j¦«v}d|ji}|r
|j|d
<t+|fi|¤Ž}nV|jrOt#|d¦«r| ¦«n*d}| ¦« |¦«|}|jr't|d d ¦«rt;|¦«d|_
nV|jrOt#|d¦«r| ¦«n*d}| ¦« |¦«|jr+t?¦«stA¦«std¦«||j!j"|_"n"|j"td¦«|j"|_"t¦«ot|t¦«|_#||_$||_%|r||_&n*|j#s|j'rd|_&ntQ|¦«|_&|td¦«|j)tUj+dtX¦«d}|j)|j)}|j-tUj+dtX¦«d}|j-|j-}d}|j.€#|j"rtUj+dtX¦«d}|j.|j"r|j.}|€Qt_|j0|j1|j"¬¦«}|j2r!d |_2tUj+dtX¦«d|_3nd |_3|j4r*tk|¦«|j&tk|j&¦«|j6|_6||_)|j|_|j1|_1|j7|j7n|j0|_7||_-|j8|_8||_.||_9|j'|_'d|_:|j6dvrd |_:d |_;d |_<t{d¦«|_>|j?|_?|j@|_@|jA|_At|j!dd ¦«|_Bt|j!d d!¦«|_C|jBr%|jCd!krtUj+d"tX¦«d|jDd#<t¦« F¦«5 Gt|jId$¬%¦«Št•|jId&¬'¦«Š Gtd(|i|jId)¬*¦«Š^‰ Gt|jId+¬%¦«Št•|jId,¬'¦«Š Gtd(|i|jId-¬*¦«Š Gt˜dd(|j9i|jId.¬/¦«Šd0|j"|j9|j)|j8|j1|j-|j.d1œ} G||jId2¬*¦«ŠN‰ Gt˜d(|j9id|jId3¬4¦«Š G||jId5¬*¦«Š|j:r|jNd6krtd7¦« Gd|jN|jId8¬9¦«}d:|d;<| G||jIˆfd<„|jPD¦«d=¬>¦«}|gd6¬?¦«Šq‰ Gd|jN|jId@¬9¦«}| G||jIˆfdA„|jPD¦«dB¬>¦«}|gd6¬?¦«ŠdC¦«d6¦«}dC¦«|z
d6¦«}||krÍt«||jAz|z d6zdD¦«}||jAz|z dEzdD¦«}||j@z|z dEz dD¦«} t«||j@z|z d6z dD¦«}!||j@cxko|knc}"| |jAcxko|!knc}#|"s)|#s'tUj+dF|dG|dH| dG|!dI tX¦«ddd¦«n #1swxYwYt­¦« W||||||
| |
| ¬J¦ « d |_Xt#|jYdK¦«r|jY Z|j[¦«t#|dL¦«sdM¦«|j]r0|j^j_j`jadNkr|j'rtdO¦«|j&|j#s|j'stdP¦«nM|j]r tÅ|j&|j^¦«|_&n&|j^ c|j&d¬Q¦«|_&|jdjer¦«sdR¦«|j6dvrtdS¦«|j'rtdT¦«|j#s|j%tdU¦«|j1|j?|j&du¬V¦«|_idSdS)WNz1Please use `KTOConfig` instead TrainingArguments.zœ`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the same as `model`, you must mass a copy of it, or `None` if you use peft.zRYou passed model_kwargs to the KTOTrainer. But your model is already instantiated.Ú torch_dtyperŒznInvalid `torch_dtype` passed to the KTOConfig. Expected a string with either `torch.dtype` or 'auto', but got ú.zZYou passed ref_model_kwargs to the KTOTrainer. But your ref_model is already instantiated.FzŽPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it with `pip install peft` to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrÚuse_gradient_checkpointingÚenable_input_require_gradscó0| d¦«dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputs rpÚmake_inputs_require_gradz=_UnslothKTOTrainer.__init__.<locals>.make_inputs_require_grad'sØ×-¨dÑ3rrTcó0| d¦«dSr\r]r_s rprcz=_UnslothKTOTrainer.__init__.<locals>.make_inputs_require_grad<sØ×)¨$Ñ/rrz`generate_during_eval=True` requires Weights and Biases or Comet to be installed. 
Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.zdmax_length or a processing_class must be specified when using the default DPODataCollatorWithPaddingz¬When using DPODataCollatorWithPadding, you should set `max_length` in the KTOTrainer's init it will be set to `512` by default, but you should do it yourself in the future.r•z³When using DPODataCollatorWithPadding, you should set `max_prompt_length` in the KTOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.é€zÜWhen using DPODataCollatorWithPadding with an encoder decoder architecture, you should set `max_completion_length` in the KTOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.)Ú pad_token_idr$r(zªWhen using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your KTOConfig we have set it for you, but you should do it yourself in the future.)Úapo_zero_unpairedcó*tt¦«S©N)r*Úlistr0rrrpú<lambda>z-_UnslothKTOTrainer.__init__.<locals>.<lambda>­sµ;½tÑ3DÔ3D€rrÚoutput_router_logitsÚrouter_aux_loss_coefrŽa-You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to `0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary loss.Úestimate_tokensz$Extracting prompt from train dataset)Únum_procÚdesczUnpairing train dataset)rpÚ tokenizerz'Applying chat template to train dataset)Ú fn_kwargsrorpz#Extracting prompt from eval datasetzUnpairing eval datasetz&Applying chat template to eval datasetzTokenizing train dataset)Úbatchedrrrorpr
batch_sizerorpÚKL_rtcó&g|]
}|jv¯ |ŒSr0©Ú column_names)Ú.0ÚcrHs €rpú
<listcomp>z/_UnslothKTOTrainer.__init__.<locals>.<listcomp>)s(ø€Ð#pÐ#pÐ#p¨!ÐPQÐUbÔUoÐPoÐPo AÐPoÐPoÐPorrz%Processing tokenized train KL dataset)rrroÚremove_columnsrp)ÚaxiszExtracting eval KL datasetcó&g|]
}|jv¯ |ŒSr0rx)rzr{rIs €rpr|z/_UnslothKTOTrainer.__init__.<locals>.<listcomp>>s(ø€Ð'rÐ'rÐ'r¨aÐSTÐXdÔXqÐSqÐSq¨ÐSqÐSqÐSqrrz$Processing tokenized eval KL datasetÚlabelr{gHáz®Gõ?zìYou have different amounts of desirable/positive and undesirable/negative examples but the weights on the desirable and undesirable losses don't seem to be in an ideal range. Based on your data, we recommend EITHER desirable_weight in [z, z] or undesirable_weight in [zN] (but NOT BOTH). See the documentation on how to optimally set these weights.) r™rGrKrHrIrJrLrQrMrNrOÚadd_model_tagsÚ acceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.ézrYou cannot use `precompute_ref_log_probs=True` with Deepspeed ZeRO-3. Please set `precompute_ref_log_probs=False`.z]No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`)Úevaluation_modezYou set `use_liger_loss=True` but the liger kernel is not available. Please install liger-kernel first: `pip install liger-kernel`znYou cannot set `loss_type='apo_zero_unpaired'` with liger-kernel.Only KTO loss is supported with liger-kernel.znYou cannot use `precompute_ref_log_probs=True` with liger kernel. Please set `precompute_ref_log_probs=False`.zYYou cannot use `use_liger_loss=True` with Peft models. Please set `use_liger_loss=False`.)Ú ignore_indexr Ú
use_ref_model)jÚtyper"Ú
ValueErrorÚ
isinstanceÚstrr+ÚgetÚgetattrrEÚdtyper,r
Úfrom_pretrainedÚ_peft_has_been_casted_to_bf16r2rÚmerge_and_unloadÚhasattrrjr/Ú signaturerAÚ
parametersrrrZÚget_input_embeddingsÚregister_forward_hookrÍr?r'r3r0Úconfigr(Ú
is_peft_modelrRrSrFr*r)rrGÚwarnÚ UserWarningrrrrfr$Úuse_dpo_data_collatorr)r+r!r%r&rJÚ calculate_KLÚ _precomputed_train_ref_log_probsÚ_precomputed_eval_ref_log_probsr*Ú_stored_metricsr r"r#Úaux_loss_enabledÚ
aux_loss_coefÚwarnings_issuedrÚmain_process_firstÚmapr7r-r8r6r%r$r#ryr'ÚmaxÚsumÚlenÚroundr5r6Úmodel_accepts_loss_kwargsr™rÚ
_tag_namesÚAttributeErrorÚis_deepspeed_enabledrÚstateÚdeepspeed_pluginÚ
zero_stager@Ú
prepare_modelrGr.r1Ú ImportErrorÚLigerFusedLinearKTOLossÚ kto_loss_fn)%r7r™rFrGrHrIrJrKrLrMrNrOrPrQrRrSr+rUr,Ú_support_gc_kwargsÚprepare_model_kwargsrcrrrrrÚtrain_kl_datasetÚeval_kl_datasetÚ
num_desirableÚnum_undesirableÚdes_weight_lower_boundÚdes_weight_upper_boundÚund_weight_lower_boundÚund_weight_upper_boundÚdes_weight_in_rangeÚund_weight_in_ranger9s% `` €rpr6z_UnslothKTOTrainer.__init__Às9øøø€õ( ‰:Œ:Õ Ð ˜%¥Ñ ¨)°uÐ*<Ð*<ÝðZñôð
ð
Ô )Ø "РРݘE¥3Ñ
Ð rà $Ô 6Ð Ø
Ñ>ˆÐ˜k­>°KÀ6Ò4IÐ4IÝ")­Ñ"=Ô"= ¸ÌÑ1UÔ1UÐXðJUðXðXðXñôðð4?Ð!  Ô -Ø$&Ð ˜I¥sÑ CÝØôð
ð%)Ô$>Ð /×3°MÑBˆÐ˜k­>°KÀ6Ò4IÐ4IÝ")­Ñ"=Ô"= ¸ÌÑ1UÔ1UÐXðJUðXðXðXñôðð8CÐ% mÑ e  UÝ8¸ÐTÐBSÐTˆEå Ñ  aÝ<¸`ÐJ_Ð`ˆ.3ˆÔ Ñ4 ] {Ð'>Ýðañôð
õÑ
Ô
ñ0 ] [Ñ%<å˜Ñ
×0å1°5Ñ
a½WÀUÐL_ÐafÑ=gÔ=gð
aÝ%,ØÐ&ô&ð&àÝÔ%Õ&EÑ:ô:ðð )EÀdÔFaÐ'bÐoØLPÔLnÐ(Ð)HÑ7¸ÐVÐAUÐVØÔ
aå˜5Ð">ÑaØ××FÐG_шEØŒyð
:W UÐ,?ÀÑ
+¨EÑ2à59Ô2øð
Ô
]å
]Ø××BÐC[Ñ Ô  Õ.@Ñ.BÔ.Bð ÕFXÑFZÔFZð ÝðDñôð
ð
Ð Ø&+¤lÔ&Eˆ
Ô
Ð mà&*Ô&=ˆ QµZÀÅyÑ5QÔ5QˆÔØ"4ˆÔØ 0ˆÔà ð&ˆDŒNˆ
Ô
ð ; 4Ô#@ð!ˆDŒNˆ3°EÑ:ˆDŒNà Ð Øôð
ð Œ?Ð ŒMðdåñ
ô
ð
ð
ˆJØ Œ?Ð œˆJà Ô ŒMðdåñ
ô
ð
ð
!$Ð Ø Ô -Ø $Ô 6Ð à $ÐØ Ô -°$Ô2IÐ ŒMðdåñ
ô
ð
ð
%(Ð Ô 1°dÔ6MÐ 1Ø$(Ô$>Ð Ð Ý:Ø#'Ô#:Ø#'Ô#:ðñôˆMð Ô
Ø-2Ô
ð\åñôðð *.ˆ &à).ˆ  Ô ð $  Œ~ЬќˆŒØŒØ$(Ô$=ˆÔ!Ø"&Ô"9ˆÔØ37Ô3EÐ3Q˜/ÐWgÔWtˆÔØ!2ˆÔØÔØ%:ˆÔ"Ø 0ˆÔØ(,Ô(EˆÔÔØ Œ>Ð 2Ø %ˆ ð16ˆÔ-Ø/4ˆÔ +Ð+DÐ+DÑÔð”IˆŒ Ø $Ô 5ˆÔØ"&Ô"9ˆÔÝ '¨¬ Ð6LÈeÑ TÔ TˆÔÝ$ U¤\Ð3IÈ3ÑÔØ Ô ð  TÔ%7¸3Ò%>Ð%>Ý ŒMðõñ 
ô
ð
ð48ˆÔЉ^Œ^×
S ðS à)×$¨tÔ/DÐKqðôˆ˜tÔ4Ð;TðñôˆMð*×&Ð(8ÐÔôˆ Ð+×(°4Ô3HÐOtð ô  õ  $Ô"7Ð>Vð ñ ô  ð ,×*Ð,<Ð ô  ðØØÔ(=ÐÔ ôˆØ&*Ô&=Ø"œoØ#'Ô#7Ø&*Ô&=Ø%)Ô%;Ø)-Ô)Cð ð ˆØÔôˆÐ+ר*¨DÔ,AÐ Ø  ô  ð ,× ô  ðÔ ñ/
aØÔ3°qÒbñôðð $1×#4Ò#4Ý Ø $5ñ$ô$Ð ð', ˜#Ø#3×#7Ò#7Ý2Ø#pÐ#pÐ#pÐ#pÐ/?Ô/LÐ#pÑ#pÔ#pØ $8ñ$ô$Ð õ!5°mÐEUÐ5VÐ]^Ð _Ñ _Ô _
àÐ+à&2×&6Ò&6Ý'Ø $Ø#'Ô#CØ!%Ô!6Ø '7ñ'ô''6×&9Ò&9Ý'Ø"+Ø!%Ô!6Ø'rÐ'rÐ'rÐ'r°?Ô3OÐ'rÑ'rÔ'rØ ':ñ'ô'$8¸ÀÐ8WÐ^_Ð#`Ñ#`Ô#` ¥ M°'Ô$:Ñ ;Ô ;¸?ˆ!¥# m°GÔ&<Ñ"=Ô"=À
Ñ"MÈqÑQˆ Ò/å).°À$ÔBYÑ0YÐ\iÑ0iÐmnÑ/nÐpqÑ)rÔ)rÐ&Ý).°À$ÔBYÑ0YÐ\iÑ0iÐmqÑ/qÐstÑ)uÔ)uÐ&Ý).°
ÀÔ@UÑ0UÐXgÑ0gÐkoÑ/oÐqrÑ)sÔ)sÐ&Ý).°
ÀÔ@UÑ0UÐXgÑ0gÐklÑ/lÐnoÑ)pÔ)pÐ&à&<ÀÔ@UÐ&oÐ&oÒ&oÐ&oÐYoÒ&oÐ&oÐ&oÐ&oÐ#Ø&<ÀÔ@WÐ&qÐ&qÒ&qÐ&qÐ[qÒ&qÐ&qÐ&qÐ&qÐ Ð/Bð Ý”MðWð1GðWðWðKaðWðWð3Ið WðWðMcð WðWðWõ ôððWS ðS ðS ñS ôS ðS ðS ðS ðS ðS ðS øøøðS ðS ðS ðS õj Œ×ÒØØØØ!Ø*Gð ñ
ô
ð
ð"*/ˆÔ 4”:Ð  ŒJ× % d¤oÑ t˜]Ñ Ý Øôð
ð
Ô  ØÔÔAÀQÒFÈ4ÔKhÐ ðIñôðð Œ>Ð Ô
¨$Ô*Gð
Ý ØôðøðÔ
fÝ!2°4´>À4ÔCSÑ!TÔ!Tà!%Ô!1×!?Ò!?ÀÄÐ`dÐ!?Ñ!eÔ!eð Œ9Ô  Ý
ÝTñôððŒ~Ð!6Ð ðDñôððÔ
Ý ðôððÔ
 TÔ%:Ð%FÝ Øôðõ4¸4¼9ÐUYÔUcÐkoÐUoð ñ ô ˆDÔ Ð Ð ð) ð sÜ4N2k2ë2k6ë9k6c#ózK|jr8|js1|j |j¦« ¦«n
t
¦«5|jr|j |j¦«dV|jr!|j |jpd¦«ddd¦«dS#1swxYwYdS)zWContext manager for handling null reference model (that is, peft adapter manipulation).Nrv) r—rSrÚ unwrap_modelr™Údisable_adapterr;Ú set_adapterrR)r7s rpÚnull_ref_contextz#_UnslothKTOTrainer.null_ref_context¥sèèð
Ô
Ø*.Ô*?ð
ˆDÔ × )¨$¬*Ñ ð Mð Mð
Ô

×& tÔ'<Ñ ˆEˆEˆÔ
MØ
×& tÔ'>Ð'KÀ)Ñ Mð Mð Mñ Mô Mð Mð Mð Mð Mð Mð Mð Møøøð Mð Mð Mð Mð Mð MsÁAB0Â0B4Â7B4Úreturncóø|jrÒ|jsÊ|jj|j|jj|jjddœ}|j t|j
fi|¤Ž¦«}g}g}t|d¬¦«D]£}|  |¦«\}}|j 
|¦«}| | ¦«¦«|jrA|j 
|¦«}| | ¦«¦«Œ¤|j
 dt%j|¦« ¦« ¦«¬¦«|_
|jrW|j
 dt%j|¦« ¦« ¦«¬¦«|_
d|_t-¦« ¦«S) z·
Returns the training [`~torch.utils.data.DataLoader`].
Subclass of transformers.src.transformers.trainer.get_train_dataloader to precompute `ref_log_probs`.
ruÚ
collate_fnÚ num_workersÚ
pin_memoryÚshufflez!Train dataset reference log probs©ÚiterablerpÚreference_logps©ÚnameÚcolumnÚreference_KL_logpsT)r*rGrKrÚpreparerrHrFÚcompute_reference_log_probsÚgather_for_metricsreÚcpurÚ
add_columnrEÚcatÚfloatÚnumpyr5Úget_train_dataloader) r7Údataloader_paramsÚ data_loaderÚreference_completion_logpsrÑÚ padded_batchÚreference_completion_logpÚreference_KL_logpr9s €rpz'_UnslothKTOTrainer.get_train_dataloader³ø€ð Ô ! 9°Ô1Vñ! 9à"œiÔ#œyÔ"œiÔ ð !ð!Ð ðÔ*×2µ:¸dÔ>PÐ3fÐ3fÐTeÐ3fÐ3fÑgˆKØ)+Ð &Ø!#Ð å $¨kÐ@cÐ dÑ dÔ dð
Gð
G Ø?C×?_Ò?_Ð`lÑ?mÔ?mÑ)Ð+<à,0Ô,<×,OÒ,OÐPiÑ,jÔ,jÐ*×1Ð2K×2OÒ2OÑ2QÔ2QÑÔGØ(,Ô(8×(KÒ(KÐL]Ñ(^Ô(^Ð&×-Ð.?×.CÒ.CÑ.EÔ.EÑFøà!%Ô!3×!>Ò!>Ø&­u¬yÐ9SÑ/TÔ/T×/ZÒ/ZÑ/\Ô/\×/bÒ/bÑ/dÔ/dð"?ñ"ô"ˆ ðÔ ð
Ø%)Ô%7×%BÒ%BØ-µe´iÐ@RÑ6SÔ6S×6YÒ6YÑ6[Ô6[×6aÒ6aÑ6cÔ6cð&Cñ&ô&Ô59ˆ ‰wŒw×-rrcó,||jtd¦«||n|j}|jrÇ|js¿|jj|j|jj|jjddœ}|j  
t|fi|¤Ž¦«}g}g}t|d¬¦«D]£}| 
|¦«\}}|j  |¦«}| | ¦«¦«|jrA|j  |¦«}| | ¦«¦«Œ¤| dt'j|¦« ¦« ¦«¬¦«}|jrM| d t'j|¦« ¦« ¦«¬¦«}|j||_d
|_t/¦« |¬ ¦«S) 
Returns the evaluation [`~torch.utils.data.DataLoader`].
Subclass of transformers.src.transformers.trainer.get_eval_dataloader to precompute `ref_log_probs`.
Args:
eval_dataset (`torch.utils.data.Dataset`, *optional*):
If provided, will override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns not accepted
by the `model.forward()` method are automatically removed. It must implement `__len__`.
Nz-Trainer: evaluation requires an eval_dataset.FrÆz Eval dataset reference log probsrËT)rI)rIrˆr*rrGrKrrrFrerrEr×r5Úget_eval_dataloader)
r7rIr9s
€rpz&_UnslothKTOTrainer.get_eval_dataloaderßs+ø€ð Ð  DÔ$5Ð$=ÝÐ MØ'3Ð'?||ÀTÔEVˆ à Ô $ 8°Ô1Uñ$ 8à"œiÔ#œyÔ"œiÔ ð !ð!Ð ðÔ*×2µ:¸lÐ3`Ð3`ÐN_Ð3`Ð3`ÑaˆKà)+Ð &Ø!#Ð å $¨kÐ@bÐ cÑ cÔ cð
Gð
G Ø?C×?_Ò?_Ð`lÑ?mÔ?mÑ)Ð+<à,0Ô,<×,OÒ,OÐPiÑ,jÔ,jÐ*×1Ð2K×2OÒ2OÑ2QÔ2QÑÔGØ(,Ô(8×(KÒ(KÐL]Ñ(^Ô(^Ð&×-Ð.?×.CÒ.CÑ.EÔ.EÑFøà'×&­u¬yÐ9SÑ/TÔ/T×/ZÒ/ZÑ/\Ô/\×/bÒ/bÑ/dÔ/dðôˆLðÔ ð
Ø+×-µe´iÐ@RÑ6SÔ6S×6YÒ6YÑ6[Ô6[×6aÒ6aÑ6cÔ6cð ô  ð
Ô Ð,Ø$0Ô!Ø37ˆDÔ ‰wŒw×*¸ ÐErrc ó6tj¦«5|j| ¦«5|jrŽ| |d|d| d¦«|d¬¦«j}|jrC| |d|d| d ¦«|d
¬¦«j}nW| |d |d ¬
¦«j}|jr(| |d|d¬
¦«j}ddd¦«n #1swxYwYnì|jrŽ| |d|d| d¦«|d¬¦«j}|jrC| |d|d| d ¦«|d
¬¦«j}nW| |d |d ¬
¦«j}|jr(| |d|d¬
¦«j}ddd¦«n #1swxYwY|  ||dd|j|j
¬¦«}|jr+|  ||d
d|j|j
¬¦«}nd}||fS)zfComputes log probabilities of the reference model for a single padded batch of a KTO specific dataset.NÚprompt_input_idsÚprompt_attention_maskÚcompletion_decoder_input_idsÚcompletion_labels)Úattention_maskÚdecoder_input_idsÚlabelsÚKL_prompt_input_idsÚKL_prompt_attention_maskÚKL_completion_decoder_input_idsÚKL_completion_labelsÚcompletion_input_idsÚcompletion_attention_mask)ÚKL_completion_input_idsÚKL_completion_attention_maskF©Úaverage_log_probr(r$) rEÚno_gradrFr(r™rrgrÚget_batch_logpsr$)r7Úcompletion_logitsÚ KL_logitsÚcompletion_logpsÚKL_logpss rpz._UnslothKTOTrainer.compute_reference_log_probså
Œ]‰_Œ_ð6 !ð6 !ØŒ~Ñ×Ô%Ø,0¯JªJØ(Ð);Ô<Ø+7Ð8OÔ+PØ.:×.>Ò.>Ð?]Ñ.^Ô.^Ø#/Ð0CÔ#Dð -7ñ-ô-ô
  Ô%Ø(,¯
ª
Ø ,Ð-BÔ CØ/;Ð<VÔ/WØ2>×2BÒ2BÐCdÑ2eÔ2eØ'3Ð4JÔ'Kð )3ñ)ô)ô
 &øð-1¯JªJØ(Ð)?Ô@Ø+7Ð8SÔ+Tð-7ñ-ô-ô
 Ô%Ø(,¯
ª
Ø ,Ð-FÔ GØ/;Ð<ZÔ/[ð)3ñ)ô)ô&ð/%øøøð%øð8Ô!Ø(,¯ªØ$Ð%7Ô8Ø'3Ð4KÔ'LØ*6×*:Ò*:Ð;YÑ*ZÔ*ZØ+Ð,?Ô)7ñ)ô)ô
ð Ô!Ø$(§N¢NØ(Ð)>Ô?Ø+7Ð8RÔ+SØ.:×.>Ò.>Ð?`Ñ.aÔ.aØ#/Ð0FÔ#Gð %3ñ%ô%ô
 "øð)-¯ªØ$Ð%;Ô<È\ÐZuÔMvð)7ñ)ô)äðÔ!Ø$(§N¢NØ(Ð)BÔCØ+7Ð8VÔ+Wð%3ñ%ô%ô"ðg6 !ð6 !ð6 !ñ6 !ô6 !ð6 !ð6 !ð6 !ð6 !ð6 !ð6 !øøøð6 !ð6 !ð6 !ð6 !ðp × Ø Ð  
ô
Ðð Ô ð Ø×ØÐ4Ø!&Ø#'Ô#:Ø#'Ô#:ð ôˆHˆHðˆHà Ð)s6H.±C-D*Ä H.Ä*D. Ä.H.Ä1D. Ä2C0H.È.H2È5H2Fr—rgr$r(có®|jdd|jkrtd¦«|s2|ddddf ¦«}|ddddddf}n| ¦«}||k}d|||k<t||¦«}|r.||z d¦«| d¦«z S||z d¦«S)aCompute the log probabilities of the given labels under the given logits.
Args:
logits:
Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
labels:
Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
ignored. Shape: (batch_size, sequence_length)
average_log_prob:
If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
log probabilities of the (non-masked) tokens.
Returns:
A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
given logits.
NrUzKLogits (batch and sequence length dim) and labels must have the same shape.rZr)r]rˆÚclonerC)rgr$r(Ú loss_maskros rpz"_UnslothKTOTrainer.get_batch_logpseð0 Œ<˜˜˜Ô  ¤ Ò Ð ˜A˜A˜A˜q˜r˜r˜E”]×*ˆFؘA˜A˜A˜s ˜s A A A˜&ˆFˆ—\\^”^ˆÐ0ˆ ð01ˆˆvиÑà ð# iÑ/×4°RÑ8¸9¿=º=ÈÑ;LÔ;LÑ # /×4°RÑ 8rrÚbatchcóª| |¦«}|jrd d¦«dœni}|jrd|d<|dfddi|¤Ž}|j}| |dd |j|j¬
¦«}|jd td ¦«krtd
¦«ˆfdt|jd ¦«D¦«}ˆfdt|jd ¦«D¦«} ||df}
|| df} ||df} || df}
|jr
|
| | |
||j fS|
| | |
|fS)Nrç©TrlFrórr€z‡There is a mismatch between the number of examples in this batch and the number of examples for which an output sequence was predicted.có4g|]}d|du¯|ŒS©r€Tr0©rzÚirþs €rpr|z._UnslothKTOTrainer.forward.<locals>.<listcomp>¸s.ø€Ð_˜AÀUÈ7Ä^ÐTUÔEVÐZ^ÐE^ÐE^aÐE^ÐE^ÐE^rrcó4g|]}d|du¯|ŒS©r€Fr0rs €rpr|z._UnslothKTOTrainer.forward.<locals>.<listcomp>¹s.ø€Ðb˜aÀuÈWÄ~ÐVWÔGXÐ\aÐGaÐGa˜ÐGaÐGaÐGarr.) Ú_compute_kl_logpsr(rrgr$r]rˆÚrangeÚaux_loss)r7r™Ú model_kwargsÚoutputsr÷Ú
chosen_idxÚ rejected_idxÚ chosen_logpsÚrejected_logpsÚ
chosen_logitsÚrejected_logitss ` rpÚforwardz_UnslothKTOTrainer.forward“ø€ð×)¨%°Ñ7ˆðÔ
ØÐ 3Ô4Ø%*§Y¢YÐ/MÑ%NÔ%Nð
ð
ð
ð
ð
ð Ô ð 8Ø37ˆLÐ  Ð 
ð
à Ð!<Ô
ðð
ð
ˆð
$œNÐà× Ø Ð  
ô
Ðð Ô !  $­¨E°'¬NÑ(;Ô(;Ò ðGñôð
ð
`ÐÐ'7Ô'=¸aÔ'@Ñ!AÔ!AÐ_ˆ
Øb¥5Ð)9Ô)?ÀÔ)BÑ#CÔ#CÐbˆ à
°C¨Ô Ø)¨,¸Ð*;Ô<ˆà)¨*°c¨/Ô:ˆ
Ø+¨L¸#Ð,=Ô>ˆà Ô ð \Ø  .°-ÀÐRZÐ\cÔ\lÐ   .°-ÀÐRZÐ [rrÚpolicy_chosen_logpsÚpolicy_rejected_logpsÚpolicy_KL_logpsÚreference_chosen_logpsÚreference_rejected_logpsrÑcóˆ|jrj||z
 ¦« ¦«}|j |¦« ¦« d¬¦«}n,t
jd¦« |j ¦«}|j
ddks|j
ddkrz||z
}|j dkr#dtj
|j||z
z¦«z
} n*|j dkrdtj
|j|z¦«z
} |j| ¦«z}
nbt
jg¦« |jj ¦«} t
jg¦« |jj ¦«}
|j
ddks|j
ddkrw||z
} |j dkr#dtj
|j|| z
z¦«z
} n'|j dkrtj
|j| z¦«} |j|  ¦«z}
nbt
jg¦« |jj ¦«} t
jg¦« |jj ¦«}
t
j|j| z|j| zfd¦«}||
|
|fS)avCompute the KTO loss for a batch of policy and reference model log probabilities.
Args:
policy_chosen_logps:
Log probabilities of the policy model for the chosen responses. Shape: (num(chosen) in batch_size,)
policy_rejected_logps:
Log probabilities of the policy model for the rejected responses. Shape: (num(rejected) in batch_size,)
policy_KL_logps: Log probabilities of the policy model for the KL responses. Shape: (batch_size,)
reference_chosen_logps:
Log probabilities of the reference model for the chosen responses. Shape: (num(chosen) in batch_size,)
reference_rejected_logps:
Log probabilities of the reference model for the rejected responses. Shape: (num(rejected) in
batch_size,)
reference_KL_logps: Log probabilities of the reference model for the KL responses. Shape: (batch_size,)
Returns:
A tuple of four tensors: (losses, chosen_rewards, rejected_rewards, KL). The losses tensor contains the KTO
loss for each example in the batch. The chosen_rewards and rejected_rewards tensors contain the rewards for
the chosen and rejected responses, respectively. The KL tensor contains the detached KL divergence estimate
between the policy and reference models.
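As a rough illustration of the return values listed in this docstring, the sketch below computes per-example losses and rewards with plain Python floats. It assumes the loss form from the KTO paper (`1 - sigmoid(beta * (logratio - KL))` for chosen examples and `1 - sigmoid(beta * (KL - logratio))` for rejected ones) and the unpaired APO-zero variant; the function name `kto_losses` and its list-based signature are hypothetical and do not mirror the trainer's tensor code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def kto_losses(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
               policy_kl=(), ref_kl=(), beta=0.1,
               desirable_weight=1.0, undesirable_weight=1.0, loss_type="kto"):
    """Return (losses, chosen_rewards, rejected_rewards, kl) as Python values."""
    if loss_type not in ("kto", "apo_zero_unpaired"):
        raise ValueError(f"unknown loss_type: {loss_type}")

    # KL estimate: batch mean of the KL log-ratios, clamped at zero.
    if policy_kl:
        diffs = [p - r for p, r in zip(policy_kl, ref_kl)]
        kl = max(sum(diffs) / len(diffs), 0.0)
    else:
        kl = 0.0

    losses, chosen_rewards, rejected_rewards = [], [], []
    for p, r in zip(policy_chosen, ref_chosen):
        logratio = p - r
        if loss_type == "kto":
            loss = 1.0 - sigmoid(beta * (logratio - kl))
        else:  # apo_zero_unpaired: no KL term
            loss = 1.0 - sigmoid(beta * logratio)
        losses.append(desirable_weight * loss)
        chosen_rewards.append(beta * logratio)

    for p, r in zip(policy_rejected, ref_rejected):
        logratio = p - r
        if loss_type == "kto":
            loss = 1.0 - sigmoid(beta * (kl - logratio))
        else:
            loss = sigmoid(beta * logratio)
        losses.append(undesirable_weight * loss)
        rejected_rewards.append(beta * logratio)

    return losses, chosen_rewards, rejected_rewards, kl
```

When the policy and reference assign identical log probabilities, every log-ratio is zero, so each unweighted loss is 0.5 and all rewards are zero; this is a convenient sanity check before training starts.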
r©r4rZrrg)rÚmeanÚdetachrÚclamprEÚzerosr_Údevicer]r!rÚsigmoidr rr×r"r#)r7rrrrrÚklÚchosen_logratiosÚ
chosen_lossesÚchosen_rewardsÚrejected_logratiosÚrejected_lossesÚrejected_rewardsÚlossess rpÚkto_lossz_UnslothKTOTrainer.kto_lossÆð< Ô ð!Ð$6Ñ6×>×GˆÔ4°RÑEÈ!ÐLˆBˆ˜Q×"Ð#6Ô#=Ñ>ˆ Ô $  '¨1Ò ,Ð0FÔ0LÈQÔ0OÐSTÒ0TÐ0TØ2Ð5KÑ àŒ~ Ò&à !¥A¤I¨d¬iÐ;KÈbÑ;PÑ.QÑ$RÔ$RÑ R
ØÐ#6Ò!"¥A¤I¨d¬iÐ:JÑ.JÑ$KÔ$KÑ K
à!œYÐ)9×)@Ò)@Ñ)BÔ)BÑBˆNˆ"œL¨Ñ,×Ô0@Ô0GÑHˆ"œ\¨"Ñ-×Ô1AÔ1HÑIˆ &  )¨QÒ .Ð2JÔ2PÐQRÔ2SÐWXÒ2XÐ2XØ!6Ð9QÑ!QÐ àŒ~ Ò&Ø"#¥a¤i°´ ¸RÐBTÑ=TÑ0UÑ&VÔ&VÑ"VØÐ#6Ò6Ý"#¤)¨D¬IÐ8JÑ,JÑ"KÔ"Kà#œyÐ+=×+DÒ+DÑ+FÔ+FÑ Ð õ$œl¨2Ñ.×1°$Ô2BÔ2IÑJˆ$œ|¨BÑ/×2°4Ô3CÔ3JÑ åØ
Ô
" 
2°DÔ4KÈoÑ4]Ð
ñ
ô
ˆð
~Ð'7¸Ð;rrcófd}|jr§|jr-|d|d|d| d¦«dœ}n|d|dd œ}tj¦«5|d i|¤Žj}ddd¦«n #1swxYwY| ||dd
|j|j¬ ¦«}|S)
z/Compute KL log probabilities for a given batch.Nrë)Ú input_idsrè)r*Frór0)rr(rrErgr$)r7r™ÚKL_model_kwargsrøs rprz$_UnslothKTOTrainer._compute_kl_logpss/àˆØ Ô ð ØÔ
à!&Ð'<Ô!=Ø&+Ð,FÔ&GØ#Ð$:Ô;Ø).¯ªÐ3TÑ)UÔ)Uð #ð#ð"'Ð'@Ô!AØ&+Ð,JÔ&Kð#ð#õ
ð

!˜4 ; ð










<øøøð



רÐ-Ø!&Ø#'Ô#:Ø#'Ô#:ð ôˆˆsÁ"A<Á<BÂBc
óp| ||¦«}| |j|¦«}|jrj||z
 ¦« ¦«}|j |¦« ¦« d¬¦«}n1tj d¦«