unsloth_compiled_cache/__pycache__/UnslothKTOTrainer.cpython-311.pyc

§
4$�hßƒãó”—dZddlmZddlZddlmZddlmZddlmZm	Z	m
Z
mZmZm
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%mZm&Z&m'Z'm(Z(m)Z)m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;mZm<Z<m=Z=m>Z>m?Z?m@Z@mAZAmBZBmCZCmDZDmEZEmFZFmZmGZGmHZHmZm
Z
mZmZm#Z#m5Z5m>Z>mZddl>Z>ddlTddlImJZJmKZKdd	lLmMZMddlZddlNZ<dd
lOm=Z=ddlmZddlPmQZQmRZSdd
dd
d
dœZTejUddeT¬¦«d„¦«ZVeJGd„de¦«¦«ZW	Gd„de#¦«ZXGd„deX¦«ZYdS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)GrÚAutoModelForCausalLMÚBaseImageProcessorrÚDPODataCollatorWithPaddingÚDataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFÚFeatureExtractionMixinÚ	KTOConfigÚ
KTOTrainerÚLiteralrÚPartialStateÚPathÚ	PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚSequentialSamplerÚTrainerÚTrainerCallbackÚTrainingArgumentsr	Ú_get_kl_datasetÚ_process_tokensÚ	_tokenizeÚautocastÚconcatenate_datasetsÚcontextmanagerÚcreate_reference_modelÚdefaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚ
has_lengthÚinspectÚis_comet_availableÚis_liger_kernel_availableÚis_peft_availableÚis_wandb_availableÚ
itemgetterÚlog_table_to_comet_experimentÚmaybe_apply_chat_templateÚmaybe_extract_promptÚmaybe_unpair_preference_datasetÚnnÚnpÚnullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_deepspeedÚprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚtqdmÚwarningsrrrrr r2r<rE)Ú*)Ú	dataclassÚfield)ÚVersion)r;)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚmax_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ	fullgraphÚoptionscó’—tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t	||¦«D]‘\}}| tj¦«}tj|d| d¦«¬¦« 	d¦«}tj
|d¬¦«}||z
}	| |	¦«Œ’	tj|¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)rXÚindex)rXé)
rEÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ	unsqueezeÚsqueezeÚ	logsumexpÚappendÚconcat)
ÚlogitsrYÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚchunk_logitsÚchunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
          ú]/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothKTOTrainer.pyÚchunked_selective_log_softmaxrq"s5€õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[Ñ[Ô[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐHÑHÔH€MØÐå%(¨¸Ñ%GÔ%Gð4ð4Ñ!ˆ�kØ#—’¥u¤}Ñ5Ô5ˆÝœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`ÐaÑaÔa×iÒiÐjlÑmÔmˆÝ œ?¨<¸rÐBÑBÔBÐØ)Ð,<Ñ<ˆØ×"Ò" ?Ñ3Ô3Ð3Ð3ØÝœ,Ð':Ñ;Ô;ÐØ-×5Ò5°v´|ÀA´ÈÌÐUVÌÐ6XÑYÔYÐØÐócó¸‡—eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z	ee
ed	<																																																																																																																																																					d0ˆfd/„	ZˆxZS)1ÚUnslothKTOConfiguÐ
    
    Configuration class for the [`KTOTrainer`].

    This class includes only the parameters that are specific to KTO training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
            to use the default data collator.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt. This argument is required if you want to use the default data collator.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion. This argument is required if you want to use the default data collator
            and your model is an encoder-decoder.
        beta (`float`, *optional*, defaults to `0.1`):
            Parameter controlling the deviation from the reference model. Higher β means less deviation from the
            reference model.
        loss_type (`str`, *optional*, defaults to `"kto"`):
            Type of loss to use. Possible values are:

                - `"kto"`: KTO loss from the [KTO](https://huggingface.co/papers/2402.01306) paper.
                - `"apo_zero_unpaired"`: Unpaired variant of APO-zero loss from the
                  [APO](https://huggingface.co/papers/2408.06266) paper.

        desirable_weight (`float`, *optional*, defaults to `1.0`):
            Desirable losses are weighted by this factor to counter an unequal number of desirable and undesirable pairs.
        undesirable_weight (`float`, *optional*, defaults to `1.0`):
            Undesirable losses are weighted by this factor to counter an unequal number of desirable and undesirable pairs.
        label_pad_token_id (`int`, *optional*, defaults to `-100`):
            Label pad token id. This argument is required if you want to use the default data collator.
        padding_value (`int` or `None`, *optional*, defaults to `None`):
            Padding value to use. If `None`, the padding value of the tokenizer is used.
        truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
            Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
            This argument is required if you want to use the default data collator.
        generate_during_eval (`bool`, *optional*, defaults to `False`):
            If `True`, generates and logs completions from both the model and the reference model to W&B or Comet
            during evaluation.
        is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
            When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
            you need to specify if the model returned by the callable is an encoder-decoder model.
        precompute_ref_log_probs (`bool`, *optional*, defaults to `False`):
            Whether to precompute reference model log probabilities for training and evaluation datasets. This is
            useful when training without the reference model to reduce the total GPU memory needed.
        model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
            string.
        ref_model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model
            from a string.
        dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model and reference model.
        use_liger_loss (`bool`, *optional*, defaults to `False`):
            Whether to use Liger loss. It requires liger-kernel to be installed.
        base_model_attribute_name (`str`, *optional*, defaults to `"model"`):
            Name of the attribute in the model that contains the base model. This is used to get the base model from
            the model when the model does not have a `get_decoder` method in the case when `use_liger_loss` is `True`.
    
    NÚhelpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrUz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnorVéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?ç@Úlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrZéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéééÚktoéœÿÿÿÚkeep_endÚmodelc–ó–•—|dkrtd|›d�¦«‚|dkrtd|›d�¦«‚|€|#dkr
|$dkrd}d	}#|‘€!d
dlm}—t	|—¦«dzd¦«}‘t¦«jd id
|“d|“d|“d|“d|“d|“d|“d|“d|	“d|
“d|“d|“d|
“d|“d|“d|“d|“d|“d|“d |“d!|“d"|“d#|“d$|“d%|“d&|“d'|“d(|“d)|“d*|“d+|“d,| “d-|!“d.|"“d/|#“d0|$“d1|%“d2|&“d3|'“d4|(“d5|)“d6|*“d7|+“d8|,“d9|-“d:|.“d;|/“d<|0“d=|1“d>|2“d?|3“d@|4“dA|5“dB|6“dC|7“dD|8“dE|9“dF|:“dG|;“dH|<“dI|=“dJ|>“dK|?“dL|@“dM|A“dN|B“dO|C“dP|D“dQ|E“dR|F“dS|G“dT|H“dU|I“dV|J“dW|K“dX|L“dY|M“dZ|N“d[|O“d\|P“d]|Q“d^|R“d_|S“d`|T“da|U“db|V“dc|W“dd|X“de|Y“df|Z“dg|[“dh|\“di|]“dj|^“dk|_“dl|`“dm|a“dn|b“do|c“dp|d“dq|e“dr|f“ds|g“dt|h“du|i“dv|j“dw|k“dx|l“dy|m“dz|n“d{|o“d||p“d}|q“d~|r“d|s“d€|t“d�|u“d‚|v“dƒ|w“d„|x“d…|y“d†|z“d‡|{“dˆ||“d‰|}“dŠ|~“d‹|“dŒ|€“d�|�“dŽ|‚“d�|ƒ“d�|„“d‘|…“d’|†“d“|‡“d”|ˆ“d•|‰“d–|Š“d—|‹“d˜|Œ“d™|�“dš|Ž“d›|�“dœ|�“d�|‘“dž|’“dŸ|““|–¤Ž|”|_|•|_dS)¡NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rZza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rˆr‰Úunsloth_training_checkpointsrzr)Ú	cpu_countr{Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚweight_decayÚ
adam_beta1Ú
adam_beta2Úadam_epsilonÚ
max_grad_normÚnum_train_epochsÚ	max_stepsÚlr_scheduler_typeÚwarmup_ratioÚwarmup_stepsÚ	log_levelÚlog_level_replicaÚlog_on_each_nodeÚlogging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ	data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚdisable_tqdmÚremove_unused_columnsÚlabel_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚfsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ	deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ	adafactorÚgroup_by_lengthÚlength_column_nameÚ	report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚpush_to_hubÚresume_from_checkpointÚhub_model_idÚhub_strategyÚ	hub_tokenÚhub_private_repoÚhub_always_pushÚhub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚfp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚtorchdynamoÚ	ray_scopeÚddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚ	loss_typeÚdesirable_weightÚundesirable_weightÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚgenerate_during_evalÚis_encoder_decoderÚdisable_dropoutÚprecompute_ref_log_probsÚmodel_init_kwargsÚref_model_init_kwargsÚdataset_num_procÚuse_liger_lossÚbase_model_attribute_name©)	ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingrœÚminÚsuperÚ__init__rxry)™Úselfr�ržrŸr r¡r¢r£r¤r¥r¦r§r¨r©rªr«r¬rr®r¯r°r±r²r³r´rµr¶r·r¸r¹rºr»r¼r½r¾r¿rÀrÁrÂrÃrÄrÅrÆrÇrÈrÉrÊrËrÌrÍrÎrÏrÐrÑrÒrÓrÔrÕrÖr×rØrÙrÚrÛrÜrÝrÞrßràrárârãrärårærçrèrérêrërìrírîrïrðrñròrórôrõrör÷rørùrúrûrürýrþrÿrrrrrrrrrr	r
rrr
rrrrrrrrrrrrrrrrrrr r!r"r#r$r%r&r'r(r)r*r+r,r-r.r/rxryÚkwargsrœÚ	__class__s™                                                                                                                                                        €rpr6zUnslothKTOConfig.__init__�sg
ø€ðr˜4ÒÐÕ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ˜1ÒÐ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØÐ -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆJØ ˆMØÐ#Ø1Ð1Ð1Ð1Ð1Ð1Ý" 9 9¡;¤;¨q¡=°!Ñ4Ô4Ðà�‰ŒÔðS	LðS	LðS	LØ#˜ðS	Là#7Ð#7ðS	Lð �xðS	Lð�gð	S	Lð
$˜ðS	Lð*˜Mð
S	Lð$8Ð#7ðS	Lð+FÐ*EðS	Lð*DÐ)CðS	Lð(@Ð'?ðS	Lð'>Ð&=ðS	Lð+FÐ*EðS	Lð'>Ð&=ðS	Lð$˜ðS	Lð'>Ð&=ðS	Lð *˜Mð!S	Lð"(˜<ð#S	Lð$$˜ð%S	Lð&$˜ð'S	Lð((˜<ð)S	Lð**˜Mð+S	Lð, 0Ð/ð-S	Lð."˜	ð/S	Lð0!2Ð 1ð1S	Lð2(˜<ð3S	Lð4(˜<ð5S	Lð6"˜	ð7S	Lð8!2Ð 1ð9S	Lð: 0Ð/ð;S	Lð<&˜+ð=S	Lð> 0Ð/ð?S	Lð@"4Ð!3ðAS	LðB*˜MðCS	LðD&<Ð%;ðES	LðF*˜MðGS	LðH$˜ðIS	LðJ 0Ð/ðKS	LðL 0Ð/ðMS	LðN!2Ð 1ðOS	LðP.˜oðQS	LðR7^Ð6]ðSS	LðT�gðUS	LðV�gðWS	LðX,˜^ðYS	LðZ�4ð[S	Lð\"˜	ð]S	Lð^*˜Mð_S	Lð` �xðaS	Lðb�4ðcS	Lðd�4ðeS	Lðf,˜^ðgS	Lðh&<Ð%;ðiS	Lðj,˜^ðkS	Lðl,˜^ðmS	Lðn�4ðoS	Lðp$˜ðqS	Lðr&˜+ðsS	Lðt*˜MðuS	Lðv!2Ð 1ðwS	Lðx�EðyS	Lðz$8Ð#7ð{S	Lð|$˜ð}S	Lð~&<Ð%;ðS	Lð@*DÐ)CðAS	LðB$˜ðCS	LðD �xðES	LðF(˜<ðGS	LðH%:Ð$9ðIS	LðJ&˜+ðKS	LðL&<Ð%;ðMS	LðN%:Ð$9ðOS	LðP!2Ð 1ðQS	LðR 0Ð/ðSS	LðT�4ðUS	LðV#6Ð"5ðWS	LðX&˜+ðYS	LðZ2TÐ1Sð[S	Lð\"4Ð!3ð]S	Lð^"˜	ð_S	Lð`&<Ð%;ðaS	Lðb�EðcS	Lðd$˜ðeS	Lðf"˜	ðgS	Lðh.˜oðiS	Lðj"4Ð!3ðkS	Lðl"˜	ðmS	Lðn*DÐ)CðoS	Lðp!2Ð 1ðqS	Lðr%:Ð$9ðsS	Lðt%:Ð$9ðuS	Lðv-JÐ,IðwS	Lðx#6Ð"5ðyS	Lðz*DÐ)Cð{S	Lð|&˜+ð}S	Lð~&<Ð%;ðS	Lð@(˜<ðAS	LðB(˜<ðCS	LðD"˜	ðES	LðF 0Ð/ðGS	LðH.˜oðIS	LðJ(˜<ðKS	LðL&<Ð%;ðMS	LðN-JÐ,IðOS	LðP*DÐ)CðQS	LðR&<Ð%;ðSS	LðT(˜<ðUS	LðV$8Ð#7ðWS	LðX(@Ð'?ðYS	LðZ!2Ð 1ð[S	Lð\*˜Mð]S	Lð^$8Ð#7ð_S	Lð` 0Ð/ðaS	Lðb&˜+ðcS	Lðd"˜	ðeS	Lðf&˜+ðgS	Lðh*˜MðiS	Lðj%:Ð$9ðkS	Lðl"4Ð!3ðmS	Lðn)BÐ(AðoS	Lðp-JÐ,IðqS	Lðr#6Ð"5ðsS	Lðt$8Ð#7ðuS	Lðv"4Ð!3ðwS	Lðx*˜MðyS	Lðz 0Ð/ð{S	Lð|#6Ð"5ð}S	Lð~&<Ð%;ðS	Lð@-JÐ,IðAS	LðB$˜ðCS	LðD!2Ð 1ðES	LðF%:Ð$9ðGS	LðH�4ðIS	LðJ"˜	ðKS	LðL 0Ð/ðMS	LðN"4Ð!3ðOS	LðP"4Ð!3ðQS	LðR*˜MðSS	LðT.˜oðUS	LðV$8Ð#7ðWS	LðX"4Ð!3ðYS	LðZ.˜oð[S	Lð\(@Ð'?ð]S	Lð^!2Ð 1ð_S	Lð`%:Ð$9ðaS	Lðb 0Ð/ðcS	Lðd,˜^ðeS	Lðf)BÐ(AÀFðgS	LðS	LðS	Lðh%9ˆÔ!Ø"4ˆÔÐÐrr)•NNFFFrzFrVrVNNr{r{rr|r}r~rr€r�r‚rƒrUr„r…rr†r‡TNrˆFrZFrˆr‰NTFFFFFFrŠrŠFFFFr‹rŒFFNrUNNFr�FNrNrUNNTNFNNFr�rNNNNrŽr�NFFr�NNNNTFTFFNNr‘NNFNFNFTrŒNNNr�TFNr’r“FNNFFNNFFFNFTr”r•Nr…r–r‚r‚r—Nr˜FNTFNNNFr™NrU)
Ú__name__Ú
__module__Ú__qualname__Ú__doc__rJrxrrÚ__annotations__ryÚintr6Ú
__classcell__©r9s@rprtrt3sŒø€€€€€€ðCðCðH+0¨%ØØÐ1Ð2ð+ñ+ô+Ð˜( 3œ-ððñð*/¨ØØÐVÐWð*ñ*ô*Ð˜ #œððñðØ#ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØ$ØØØØ%ØØØ"Ø"ØØ!&ØØØØØ!ØØ27ØØØØØØØØØØØ!'ØØØØØØØ!ØØ$ØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1Ø!ØØ!$ØØØØØ%ØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØ#ØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ#Ø ØØØØØ $Ø!Ø$)Ø(-Ø"Ø#Ø"ØØ Ø"Ø!&Ø(,ØØØ $ØØØØ Ø!ØØ$Ø$Ø!ØØ#(Ø Ø $ØØØ$+Ø#Øðmw5ðw5ðw5ðw5ðw5ðw5ðw5ðw5ðw5ðw5rrrtc óð‡—eZdZdZddgZ															dJdeeeje	fde
eeeje	fded	e
ed
e
eee
e	effde
eeeeefde
ed
e
egefde
eedeejjejjjfde
eejejgejfde
e
de
eege
fde
e	de
e	fˆfd„
Zed„¦«Z de!fˆfd„Z"dKd
e
ede!fˆfd„
Z#de
de
fd„Z$e%			dLdej&dej'd e(d!e)d"e(dej&fd#„¦«Z*dejd$e
e	eeej'ffdeej&ej&ej&ej&ffd%„Z+d&ej&d'ej&d(ej&d)ej&d*ej&d+ej&deej&ej&ej&ej&ffd,„Z,d-„Z-d.„Z.d$e
e	eeej'fffd/„Z/		dMdeeejfd0e
e	eeje0ffdeejeeje
e	ejffffd1„Z1dNd3e
e	e2fd4e3d5ddfd6„Z4dKd7e
ede
ej5j6j7fd8„Z8d$e
e	ej'fdee	e	ffd9„Z9	dKdeeejfd0e
e	eeje0ffd:e(d;e
ee	fd<„Z:			dOd>e!d?e	d:e
e(d;e
ee	d@e	defˆfdA„
Z;dKdBe
e	e2fdCe
e2ddfˆfdD„
Z<ˆfdE„Z=			dPdFe
e	dGe
e	dHee	ee	dffdI„Z>ˆxZ?S)QÚ_UnslothKTOTrainerr�Útrlr–N©NNr™Ú	ref_modelÚargsÚ
train_datasetÚeval_datasetÚprocessing_classÚ
data_collatorÚ
model_initÚ	callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚpeft_configÚcompute_metricsÚmodel_adapter_nameÚref_adapter_namec
ó˜•‡‡—t|¦«turtd¦«‚t|t¦«s||urtd¦«‚|j€i}nªt|t¦«std¦«‚|j}| d¦«}|�ht|t¦«r|dkrtt|¦«}|dkr-t|tj	¦«std|›d�¦«‚||d<|j
€i}nªt|t¦«std¦«‚|j
}| d¦«}|�ht|t¦«r|dkrtt|¦«}|dkr-t|tj	¦«std|›d�¦«‚||d<t|t¦«rtj|fi|¤Ž}t|t¦«rtj|fi|¤Ž}d	|_
t¦«s|�td
¦«‚t¦«�r5|��2t|t¦«r| ¦«}t|dd	¦«st|dd	¦«r`t#|d
¦«o,d
t%t'jt*¦«j¦«v}d|ji}|r
|j|d
<t+|fi|¤Ž}nV|jrOt#|d¦«r| ¦«n*d„}| ¦« |¦«|}|jr't|dd	¦«rt;|¦«d|_
nV|jrOt#|d¦«r| ¦«n*d„}| ¦« |¦«|jr+t?¦«stA¦«std¦«‚|�|j!j"|_"n"|j"€td¦«‚|j"|_"t¦«ot|t¦«|_#||_$||_%|r||_&n*|j#s|j'rd|_&ntQ|¦«|_&|€td¦«‚|j)€tUj+dtX¦«d}|j)�|j)}|j-€tUj+dtX¦«d}|j-�|j-}d}|j.€#|j"rtUj+dtX¦«d}|j.�|j"r|j.}|€Qt_|j0|j1|j"¬¦«}|j2r!d	|_2tUj+dtX¦«d|_3nd	|_3|j4r*tk|¦«|j&�tk|j&¦«|j6|_6||_)|j|_|j1|_1|j7�|j7n|j0|_7||_-|j8|_8||_.||_9|j'|_'d|_:|j6dvrd	|_:d	|_;d	|_<t{d„¦«|_>|j?|_?|j@|_@|jA|_At|j!dd	¦«|_Bt|j!d d!¦«|_C|jBr%|jCd!krtUj+d"tX¦«d|jDd#<t‹¦« F¦«5‰ Gt�|jId$¬%¦«Št•‰|jId&¬'¦«Š‰ Gt–d(|i|jId)¬*¦«Š‰�^‰ Gt�|jId+¬%¦«Št•‰|jId,¬'¦«Š‰ Gt–d(|i|jId-¬*¦«Š‰ Gt˜dd(|j9i|jId.¬/¦«Šd0|j"|j9|j)|j8|j1|j-|j.d1œ}‰ Gtš||jId2¬*¦«Š‰�N‰ Gt˜d(|j9id|jId3¬4¦«Š‰ Gtš||jId5¬*¦«Š|j:�r|jNd6krtd7¦«‚‰ Gtžd|jN|jId8¬9¦«}d:|d;<| Gtš||jIˆfd<„|jPD¦«d=¬>¦«}t£‰|gd6¬?¦«Š‰�q‰ Gtžd|jN|jId@¬9¦«}| Gtš||jIˆfdA„|jPD¦«dB¬>¦«}t£‰|gd6¬?¦«Št¥t§‰dC¦«d6¦«}t¥t©‰dC¦«|z
d6¦«}||krÍt«||jAz|zd6zdD¦«}t«||jAz|zdEzdD¦«}t«||j@z|zdEzdD¦«} t«||j@z|zd6zdD¦«}!||j@cxko|knc}"| |jAcxko|!knc}#|"s)|#s'tUj+dF|›dG|›dH| ›dG|!›dI�	tX¦«ddd¦«n#1swxYwYt¦« W|||‰‰|||
|	|
|¬J¦«d	|_Xt#|jYdK¦«r|jY Z|j[¦«t#|dL¦«st¹dM¦«‚|j]r0|j^j_j`jadNkr|j'rtdO¦«‚|j&€|j#s|j'stdP¦«‚nM|j]r tÅ|j&|j^¦«|_&n&|j^ c|j&d¬Q¦«|_&|jdjer’tÍ¦«stÏdR¦«‚|j6dvrtdS¦«‚|j'rtdT¦«‚|j#s|j%�tdU¦«‚tÑ|j1|j?|j&du¬V¦«|_idSdS)WNz1Please use `KTOConfig` instead TrainingArguments.zœ`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the same as `model`, you must mass a copy of it, or `None` if you use peft.zRYou passed model_kwargs to the KTOTrainer. But your model is already instantiated.Útorch_dtyperŒznInvalid `torch_dtype` passed to the KTOConfig. Expected a string with either `torch.dtype` or 'auto', but got ú.zZYou passed ref_model_kwargs to the KTOTrainer. But your ref_model is already instantiated.FzŽPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it with `pip install peft` to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrÚuse_gradient_checkpointingÚenable_input_require_gradscó0—| d¦«dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputs   rpÚmake_inputs_require_gradz=_UnslothKTOTrainer.__init__.<locals>.make_inputs_require_grad's€Ø×-Ò-¨dÑ3Ô3Ð3Ð3Ð3rrTcó0—| d¦«dSr\r]r_s   rprcz=_UnslothKTOTrainer.__init__.<locals>.make_inputs_require_grad<s€Ø×)Ò)¨$Ñ/Ô/Ð/Ð/Ð/rrz‚`generate_during_eval=True` requires Weights and Biases or Comet to be installed. 
Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.zdmax_length or a processing_class must be specified when using the default DPODataCollatorWithPaddingz¬When using DPODataCollatorWithPadding, you should set `max_length` in the KTOTrainer's init it will be set to `512` by default, but you should do it yourself in the future.r•z³When using DPODataCollatorWithPadding, you should set `max_prompt_length` in the KTOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.é€zÜWhen using DPODataCollatorWithPadding with an encoder decoder architecture, you should set `max_completion_length` in the KTOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.)Úpad_token_idr$r(zªWhen using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your KTOConfig we have set it for you, but you should do it yourself in the future.)Úapo_zero_unpairedcó*—tt¦«S©N)r*Úlistr0rrrpú<lambda>z-_UnslothKTOTrainer.__init__.<locals>.<lambda>s€µ;½tÑ3DÔ3D€rrÚoutput_router_logitsÚrouter_aux_loss_coefrŽa-You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to `0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary loss.Úestimate_tokensz$Extracting prompt from train dataset)Únum_procÚdesczUnpairing train dataset)rpÚ	tokenizerz'Applying chat template to train dataset)Ú	fn_kwargsrorpz#Extracting prompt from eval datasetzUnpairing eval datasetz&Applying chat template to eval datasetzTokenizing train dataset)Úbatchedrrrorpr�
batch_sizerorpÚKL_rtcó&•—g|]
}|‰jv¯|‘ŒSr0©Úcolumn_names)Ú.0ÚcrHs  €rpú
<listcomp>z/_UnslothKTOTrainer.__init__.<locals>.<listcomp>)s(ø€Ð#pÐ#pÐ#p¨!ÐPQÐUbÔUoÐPoÐPo AÐPoÐPoÐPorrz%Processing tokenized train KL dataset)rrroÚremove_columnsrp)ÚaxiszExtracting eval KL datasetcó&•—g|]
}|‰jv¯|‘ŒSr0rx)rzr{rIs  €rpr|z/_UnslothKTOTrainer.__init__.<locals>.<listcomp>>s(ø€Ð'rÐ'rÐ'r¨aÐSTÐXdÔXqÐSqÐSq¨ÐSqÐSqÐSqrrz$Processing tokenized eval KL datasetÚlabelr{gHáz®Gõ?zìYou have different amounts of desirable/positive and undesirable/negative examples but the weights on the desirable and undesirable losses don't seem to be in an ideal range. Based on your data, we recommend EITHER desirable_weight in [z, z] or undesirable_weight in [zN] (but NOT BOTH). See the documentation on how to optimally set these weights.)r™rGrKrHrIrJrLrQrMrNrOÚadd_model_tagsÚacceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.ézrYou cannot use `precompute_ref_log_probs=True` with Deepspeed ZeRO-3. Please set `precompute_ref_log_probs=False`.z]No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`)Úevaluation_modez‚You set `use_liger_loss=True` but the liger kernel is not available. Please install liger-kernel first: `pip install liger-kernel`znYou cannot set `loss_type='apo_zero_unpaired'` with liger-kernel.Only KTO loss is supported with liger-kernel.znYou cannot use `precompute_ref_log_probs=True` with liger kernel. Please set `precompute_ref_log_probs=False`.zYYou cannot use `use_liger_loss=True` with Peft models. Please set `use_liger_loss=False`.)Úignore_indexr Ú
use_ref_model)jÚtyper"Ú
ValueErrorÚ
isinstanceÚstrr+ÚgetÚgetattrrEÚdtyper,r
Úfrom_pretrainedÚ_peft_has_been_casted_to_bf16r2rÚmerge_and_unloadÚhasattrrjr/Ú	signaturerAÚ
parametersrrrZÚget_input_embeddingsÚregister_forward_hookrÍr?r'r3r0Úconfigr(Ú
is_peft_modelrRrSrFr*r)rrGÚwarnÚUserWarningrrrrfr$ràÚuse_dpo_data_collatorr)r+r!r%r&rJÚcalculate_KLÚ _precomputed_train_ref_log_probsÚ_precomputed_eval_ref_log_probsr*Ú_stored_metricsr r"r#Úaux_loss_enabledÚ
aux_loss_coefÚwarnings_issuedrÚmain_process_firstÚmapr7r-r8r6r%r$r¤r#ryr'ÚmaxÚsumÚlenÚroundr5r6Úmodel_accepts_loss_kwargsr™r�Ú
_tag_namesÚAttributeErrorÚis_deepspeed_enabledr‚ÚstateÚdeepspeed_pluginÚ
zero_stager@Ú
prepare_modelrGr.r1ÚImportErrorÚLigerFusedLinearKTOLossÚkto_loss_fn)%r7r™rFrGrHrIrJrKrLrMrNrOrPrQrRrSr+rUr,Ú_support_gc_kwargsÚprepare_model_kwargsrcrrrrrÚtrain_kl_datasetÚeval_kl_datasetÚ
num_desirableÚnum_undesirableÚdes_weight_lower_boundÚdes_weight_upper_boundÚund_weight_lower_boundÚund_weight_upper_boundÚdes_weight_in_rangeÚund_weight_in_ranger9s%    ``                              €rpr6z_UnslothKTOTrainer.__init__Às9øøø€õ(�‰:Œ:Õ*Ð*Ð*ÝÐPÑQÔQÐQå˜%¥Ñ%Ô%ð	¨)°uÐ*<Ð*<ÝðZñôð
ð
Ô!Ð)Ø "ÐÐÝ˜E¥3Ñ'Ô'ð
	?ÝÐqÑrÔrÐrà $Ô 6ÐØ+×/Ò/°
Ñ>Ô>ˆKØÐ&å˜k3Ñ/Ô/ð>°KÀ6Ò4IÐ4IÝ")%°Ñ"=Ô"=�KØ &Ò(Ð(µ¸KÍÌÑ1UÔ1UÐ(Ý$ðXðJUðXðXðXñôðð4?Ð! -Ñ0àÔ%Ð-Ø$&Ð!Ð!Ý˜I¥sÑ+Ô+ð	CÝØlñôð
ð%)Ô$>Ð!Ø/×3Ò3°MÑBÔBˆKØÐ&å˜k3Ñ/Ô/ð>°KÀ6Ò4IÐ4IÝ")%°Ñ"=Ô"=�KØ &Ò(Ð(µ¸KÍÌÑ1UÔ1UÐ(Ý$ðXðJUðXðXðXñôðð8CÐ% mÑ4å�e�SÑ!Ô!ð	UÝ(Ô8¸ÐTÐTÐBSÐTÐTˆEå�i¥Ñ%Ô%ð	aÝ,Ô<¸YÐ`Ð`ÐJ_Ð`Ð`ˆIð.3ˆÔ*å Ñ"Ô"ð4	] {Ð'>Ýðañôð
õÑ
 Ô
 ñ0	] [Ñ%<å˜%¥Ñ+Ô+ð
1Ø×.Ò.Ñ0Ô0�å�uÐ1°5Ñ9Ô9ð
a½WÀUÐL_ÐafÑ=gÔ=gð
aÝ%,ØÐ9ñ&ô&ð&à5½ÝÔ%Õ&EÑFÔFÔQñ:ô:ðð#ð)EÀdÔFaÐ'bÐ$à%ðoØLPÔLnÐ(Ð)HÑIå7¸ÐVÐVÐAUÐVÐV��ØÔ,ð	
aå˜5Ð">Ñ?Ô?ðaØ×4Ò4Ñ6Ô6Ð6Ð6ð4ð4ð4ð×.Ò.Ñ0Ô0×FÒFÐG_Ñ`Ô`Ð`ðˆEØŒyð
:�W UÐ,?ÀÑGÔGð
:Ý+¨EÑ2Ô2Ð2à59�Ô2øð
Ô
(ð		]å�uÐ:Ñ;Ô;ð
]Ø×0Ò0Ñ2Ô2Ð2Ð2ð0ð0ð0ð×*Ò*Ñ,Ô,×BÒBÐC[Ñ\Ô\Ð\àÔ$ð	Õ.@Ñ.BÔ.Bð	ÕFXÑFZÔFZð	ÝðDñôð
ð
ÐØ&+¤lÔ&EˆDÔ#Ð#Ø
Ô
$Ð
,ÝÐlÑmÔmÐmà&*Ô&=ˆDÔ#å.Ñ0Ô0ÐQµZÀÅyÑ5QÔ5QˆÔØ"4ˆÔØ 0ˆÔàð	;Ø&ˆDŒNˆNØ
Ô
ð	; 4Ô#@ð	;à!ˆDŒNˆNå3°EÑ:Ô:ˆDŒNàÐ#ÝØvñôð
ðŒ?Ð"ÝŒMðdåñ
ô
ð
ð
ˆJØŒ?Ð&ØœˆJàÔ!Ð)ÝŒMðdåñ
ô
ð
ð
!$ÐØÔ!Ð-Ø $Ô 6Ðà $ÐØÔ%Ð-°$Ô2IÐ-ÝŒMðdåñ
ô
ð
ð
%(Ð!ØÔ%Ð1°dÔ6MÐ1Ø$(Ô$>Ð!àÐ Ý6Ø-Ô:Ø#'Ô#:Ø#'Ô#:ðñôˆMðÔ)ð
Ø-2�Ô*å”
ð\åñôðð*.ˆDÔ&Ð&à).ˆDÔ&ðÔð	9Ý$ UÑ+Ô+Ð+ØŒ~Ð)Ý(¨¬Ñ8Ô8Ð8àœˆŒØ$ˆŒØ$(Ô$=ˆÔ!Ø"&Ô"9ˆÔØ37Ô3EÐ3Q˜TÔ/Ð/ÐWgÔWtˆÔØ!2ˆÔØ#Ô3ˆÔØ%:ˆÔ"Ø 0ˆÔØ(,Ô(EˆÔ%ð!ˆÔØŒ>Ð2Ð2Ð2Ø %ˆDÔð16ˆÔ-Ø/4ˆÔ,õ +Ð+DÐ+DÑEÔEˆÔð”IˆŒ	Ø $Ô 5ˆÔØ"&Ô"9ˆÔÝ '¨¬Ð6LÈeÑ TÔ TˆÔÝ$ U¤\Ð3IÈ3ÑOÔOˆÔØÔ ð	 TÔ%7¸3Ò%>Ð%>ÝŒMðõñ
ô
ð
ð48ˆÔÐ/Ñ0õ‰^Œ^×
.Ò
.Ñ
0Ô
0ðS	ðS	à)×-Ò-Ý$¨tÔ/DÐKqð.ñôˆMõ<Ø˜tÔ4Ð;TðñôˆMð*×-Ò-Ý)Ø&Ð(8Ð9ØÔ.Ø>ð	.ñôˆMðÐ'Ø+×/Ò/Ý(°4Ô3HÐOtð 0ñ ô �õ ?Ø  $Ô"7Ð>Vð ñ ô �ð ,×/Ò/Ý-Ø*Ð,<Ð=Ø!Ô2ØAð	 0ñ ô �ð*×-Ò-ÝØØ&¨Ô(=Ð>ØÔ.Ø/ð.ñôˆMðØ&*Ô&=Ø!Ô2Ø"œoØ#'Ô#7Ø&*Ô&=Ø%)Ô%;Ø)-Ô)Cð	ð	ˆIð*×-Ò-ÝØ#ØÔ.Ø9ð	.ñôˆMðÐ'Ø+×/Ò/ÝØ*¨DÔ,AÐBØ Ø!Ô2Ø2ð 0ñ ô �ð ,×/Ò/Ý#Ø'Ø!Ô2Ø<ð	 0ñ ô �ðÔ ñ/
aØÔ3°qÒ8Ð8Ý$ðbñôðð$1×#4Ò#4Ý#Ø Ø#Ô?Ø!Ô2Ø6ð$5ñ$ô$Ð ð',�	˜(Ñ#Ø#3×#7Ò#7Ý#Ø'Ø!Ô2Ø#pÐ#pÐ#pÐ#pÐ/?Ô/LÐ#pÑ#pÔ#pØ@ð$8ñ$ô$Ð õ!5°mÐEUÐ5VÐ]^Ð _Ñ _Ô _�
àÐ+à&2×&6Ò&6Ý'Ø $Ø#'Ô#CØ!%Ô!6Ø9ð'7ñ'ô'�Oð'6×&9Ò&9Ý'Ø"+Ø!%Ô!6Ø'rÐ'rÐ'rÐ'r°?Ô3OÐ'rÑ'rÔ'rØCð':ñ'ô'�Oõ$8¸ÀÐ8WÐ^_Ð#`Ñ#`Ô#`�Lõ ¥ M°'Ô$:Ñ ;Ô ;¸QÑ?Ô?ˆMÝ!¥# m°GÔ&<Ñ"=Ô"=À
Ñ"MÈqÑQÔQˆOà Ò/Ð/å).°À$ÔBYÑ0YÐ\iÑ0iÐmnÑ/nÐpqÑ)rÔ)rÐ&Ý).°À$ÔBYÑ0YÐ\iÑ0iÐmqÑ/qÐstÑ)uÔ)uÐ&Ý).°
ÀÔ@UÑ0UÐXgÑ0gÐkoÑ/oÐqrÑ)sÔ)sÐ&Ý).°
ÀÔ@UÑ0UÐXgÑ0gÐklÑ/lÐnoÑ)pÔ)pÐ&à&<ÀÔ@UÐ&oÐ&oÒ&oÐ&oÐYoÒ&oÐ&oÐ&oÐ&oÐ#Ø&<ÀÔ@WÐ&qÐ&qÒ&qÐ&qÐ[qÒ&qÐ&qÐ&qÐ&qÐ#à+ð	Ð/Bð	Ý”MðWð1GðWðWðKaðWðWð3Ið	WðWðMcð	WðWðWõ$ñôððWS	ðS	ðS	ñS	ôS	ðS	ðS	ðS	ðS	ðS	ðS	øøøðS	ðS	ðS	ðS	õj	‰Œ×ÒØØØ'Ø'Ø%Ø-Ø!Ø+ØØ!Ø*Gð	ñ	
ô	
ð	
ð"*/ˆÔ&õ�4”:Ð/Ñ0Ô0ð	7ØŒJ×%Ò% d¤oÑ6Ô6Ð6å�t˜]Ñ+Ô+ð	Ý Øjñôð
ð
Ô$ð	ØÔÔ%Ô6ÔAÀQÒFÐFÈ4ÔKhÐFÝ ðIñôððŒ>Ð!ØÔ&ð
¨$Ô*Gð
Ý ØsñôðøðÔ(ð
fÝ!2°4´>À4ÔCSÑ!TÔ!T�”�à!%Ô!1×!?Ò!?ÀÄÐ`dÐ!?Ñ!eÔ!e�”ðŒ9Ô#ð	Ý,Ñ.Ô.ð
Ý!ðTñôððŒ~Ð!6Ð6Ð6Ý ðDñôððÔ,ð
Ý ð8ñôððÔ!ð
 TÔ%:Ð%FÝ Øoñôðõ 7Ø!Ô4¸4¼9ÐUYÔUcÐkoÐUoð ñ ô ˆDÔÐÐð)	ð	sÜ4N2k2ë2k6ë9k6c#ózK—|jr8|js1|j |j¦« ¦«n
t
¦«5|jr|j |j¦«dV—|jr!|j |jpd¦«ddd¦«dS#1swxYwYdS)zWContext manager for handling null reference model (that is, peft adapter manipulation).Nrv)	r—rSr‚Úunwrap_modelr™Údisable_adapterr;Úset_adapterrR)r7s rpÚnull_ref_contextz#_UnslothKTOTrainer.null_ref_context¥sèè€ð
Ô!ð
Ø*.Ô*?ð
ˆDÔ×)Ò)¨$¬*Ñ5Ô5×EÒEÑGÔGÐGå‘”ð		Mð		Mð
Ô$ð
>Ø”
×&Ò& tÔ'<Ñ=Ô=Ð=ØˆEˆEˆEØÔ$ð
MØ”
×&Ò& tÔ'>Ð'KÀ)ÑLÔLÐLð		Mð		Mð		Mñ		Mô		Mð		Mð		Mð		Mð		Mð		Mð		Mð		Møøøð		Mð		Mð		Mð		Mð		Mð		MsÁAB0Â0B4Â7B4Úreturncóø•—|j�rÒ|j�sÊ|jj|j|jj|jjddœ}|j t|j
fi|¤Ž¦«}g}g}t|d¬¦«D]£}| |¦«\}}|j 
|¦«}| | ¦«¦«|jrA|j 
|¦«}| | ¦«¦«Œ¤|j
 dt%j|¦« ¦« ¦«¬¦«|_
|jrW|j
 dt%j|¦« ¦« ¦«¬¦«|_
d|_t-¦« ¦«S)	z·
        Returns the training [`~torch.utils.data.DataLoader`].

        Subclass of transformers.src.transformers.trainer.get_train_dataloader to precompute `ref_log_probs`.
        F©ruÚ
collate_fnÚnum_workersÚ
pin_memoryÚshufflez!Train dataset reference log probs©ÚiterablerpÚreference_logps©ÚnameÚcolumnÚreference_KL_logpsT)r*rœrGr¤rKrÛrör‚ÚpreparerrHrFÚcompute_reference_log_probsÚgather_for_metricsreÚcpur›Ú
add_columnrEÚcatÚfloatÚnumpyr5Úget_train_dataloader)	r7Údataloader_paramsÚdata_loaderÚreference_completion_logpsrÑÚpadded_batchÚreference_completion_logpÚreference_KL_logpr9s	        €rprÚz'_UnslothKTOTrainer.get_train_dataloader³s÷ø€ðÔ(ñ!	9°Ô1Vñ!	9à"œiÔCØ"Ô0Ø#œyÔ?Ø"œiÔ=Ø ð!ð!ÐðÔ*×2Ò2µ:¸dÔ>PÐ3fÐ3fÐTeÐ3fÐ3fÑgÔgˆKØ)+Ð&Ø!#Ðå $¨kÐ@cÐ dÑ dÔ dð
Gð
G�Ø?C×?_Ò?_Ð`lÑ?mÔ?mÑ<Ð)Ð+<à,0Ô,<×,OÒ,OÐPiÑ,jÔ,jÐ)Ø*×1Ò1Ð2K×2OÒ2OÑ2QÔ2QÑRÔRÐRàÔ$ðGØ(,Ô(8×(KÒ(KÐL]Ñ(^Ô(^Ð%Ø&×-Ò-Ð.?×.CÒ.CÑ.EÔ.EÑFÔFÐFøà!%Ô!3×!>Ò!>Ø&u¬yÐ9SÑ/TÔ/T×/ZÒ/ZÑ/\Ô/\×/bÒ/bÑ/dÔ/dð"?ñ"ô"ˆDÔðÔ ð
Ø%)Ô%7×%BÒ%BØ-µe´iÐ@RÑ6SÔ6S×6YÒ6YÑ6[Ô6[×6aÒ6aÑ6cÔ6cð&Cñ&ô&�Ô"ð59ˆDÔ1å‰wŒw×+Ò+Ñ-Ô-Ð-rrcó,•—|€|j€td¦«‚|�|n|j}|j�rÇ|j�s¿|jj|j|jj|jjddœ}|j	 
t|fi|¤Ž¦«}g}g}t|d¬¦«D]£}| 
|¦«\}}|j	 |¦«}| | ¦«¦«|jrA|j	 |¦«}| | ¦«¦«Œ¤| dt'j|¦« ¦« ¦«¬¦«}|jrM| d	t'j|¦« ¦« ¦«¬¦«}|j�||_d
|_t/¦« |¬¦«S)aé
        Returns the evaluation [`~torch.utils.data.DataLoader`].

        Subclass of transformers.src.transformers.trainer.get_eval_dataloader to precompute `ref_log_probs`.

        Args:
            eval_dataset (`torch.utils.data.Dataset`, *optional*):
                If provided, will override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns not accepted
                by the `model.forward()` method are automatically removed. It must implement `__len__`.
        Nz-Trainer: evaluation requires an eval_dataset.FrÆz Eval dataset reference log probsrËrÍrÎrÑT)rI)rIrˆr*r�rGr¥rKrÛrör‚rÒrrFrÓrÔrerÕr›rÖrEr×rØrÙr5Úget_eval_dataloader)
r7rIrÛrÜrÝrÑrÞrßràr9s
         €rprâz&_UnslothKTOTrainer.get_eval_dataloaderßs+ø€ðÐ DÔ$5Ð$=ÝÐLÑMÔMÐMØ'3Ð'?�|�|ÀTÔEVˆàÔ(ñ$	8°Ô1Uñ$	8à"œiÔBØ"Ô0Ø#œyÔ?Ø"œiÔ=Ø ð!ð!ÐðÔ*×2Ò2µ:¸lÐ3`Ð3`ÐN_Ð3`Ð3`ÑaÔaˆKà)+Ð&Ø!#Ðå $¨kÐ@bÐ cÑ cÔ cð
Gð
G�Ø?C×?_Ò?_Ð`lÑ?mÔ?mÑ<Ð)Ð+<à,0Ô,<×,OÒ,OÐPiÑ,jÔ,jÐ)Ø*×1Ò1Ð2K×2OÒ2OÑ2QÔ2QÑRÔRÐRàÔ$ðGØ(,Ô(8×(KÒ(KÐL]Ñ(^Ô(^Ð%Ø&×-Ò-Ð.?×.CÒ.CÑ.EÔ.EÑFÔFÐFøà'×2Ò2Ø&u¬yÐ9SÑ/TÔ/T×/ZÒ/ZÑ/\Ô/\×/bÒ/bÑ/dÔ/dð3ñôˆLðÔ ð
Ø+×6Ò6Ø-µe´iÐ@RÑ6SÔ6S×6YÒ6YÑ6[Ô6[×6aÒ6aÑ6cÔ6cð 7ñ ô �ð
Ô Ð,Ø$0�Ô!Ø37ˆDÔ0å‰wŒw×*Ò*¸Ð*ÑEÔEÐErrrÞc	ó6—tj¦«5|j�€| ¦«5|jrŽ| |d|d| d¦«|d¬¦«j}|jrC| |d|d| d	¦«|d
¬¦«j}nW| |d|d¬
¦«j}|jr(| |d|d¬
¦«j}ddd¦«n#1swxYwYnì|jrŽ| |d|d| d¦«|d¬¦«j}|jrC| |d|d| d	¦«|d
¬¦«j}nW| |d|d¬
¦«j}|jr(| |d|d¬
¦«j}ddd¦«n#1swxYwY| 	||dd|j|j
¬¦«}|jr+| 	||d
d|j|j
¬¦«}nd}||fS)zfComputes log probabilities of the reference model for a single padded batch of a KTO specific dataset.NÚprompt_input_idsÚprompt_attention_maskÚcompletion_decoder_input_idsÚcompletion_labels)Úattention_maskÚdecoder_input_idsÚlabelsÚKL_prompt_input_idsÚKL_prompt_attention_maskÚKL_completion_decoder_input_idsÚKL_completion_labelsÚcompletion_input_idsÚcompletion_attention_mask)rèÚKL_completion_input_idsÚKL_completion_attention_maskF©Úaverage_log_probr(r$)rEÚno_gradrFrÃr(r™r‹rgr›Úget_batch_logpsr$)r7rÞÚcompletion_logitsÚ	KL_logitsÚcompletion_logpsÚKL_logpss      rprÓz._UnslothKTOTrainer.compute_reference_log_probss®€å
Œ]‰_Œ_ð6	!ð6	!ØŒ~Ñ%Ø×*Ò*Ñ,Ô,ð%ð%ØÔ.ð%Ø,0¯JªJØ(Ð);Ô<Ø+7Ð8OÔ+PØ.:×.>Ò.>Ð?]Ñ.^Ô.^Ø#/Ð0CÔ#Dð	-7ñ-ô-ô
!ð*ð Ô,ð%Ø(,¯
ª
Ø ,Ð-BÔ CØ/;Ð<VÔ/WØ2>×2BÒ2BÐCdÑ2eÔ2eØ'3Ð4JÔ'Kð	)3ñ)ô)ô
%ð&øð-1¯JªJØ(Ð)?Ô@Ø+7Ð8SÔ+Tð-7ñ-ô-ô!ð*ð
 Ô,ð%Ø(,¯
ª
Ø ,Ð-FÔ GØ/;Ð<ZÔ/[ð)3ñ)ô)ô%ð&ð/%ð%ð%ñ%ô%ð%ð%ð%ð%ð%ð%øøøð%ð%ð%ð%øð8Ô*ð!Ø(,¯ªØ$Ð%7Ô8Ø'3Ð4KÔ'LØ*6×*:Ò*:Ð;YÑ*ZÔ*ZØ+Ð,?Ô@ð	)7ñ)ô)ô
ð&ðÔ(ð!Ø$(§N¢NØ(Ð)>Ô?Ø+7Ð8RÔ+SØ.:×.>Ò.>Ð?`Ñ.aÔ.aØ#/Ð0FÔ#Gð	%3ñ%ô%ô
!ð"øð)-¯ªØ$Ð%;Ô<È\ÐZuÔMvð)7ñ)ô)äð&ðÔ(ð!Ø$(§N¢NØ(Ð)BÔCØ+7Ð8VÔ+Wð%3ñ%ô%ô!ð"ðg6	!ð6	!ð6	!ñ6	!ô6	!ð6	!ð6	!ð6	!ð6	!ð6	!ð6	!øøøð6	!ð6	!ð6	!ð6	!ðp ×/Ò/ØØÐ,Ô-Ø"Ø#Ô6Ø#Ô6ð0ñ
ô
ÐðÔð		Ø×+Ò+ØØÐ3Ô4Ø!&Ø#'Ô#:Ø#'Ô#:ð,ñôˆHˆHðˆHà Ð)Ð)s6”H.±C-D*ÄH.Ä*D.	Ä.H.Ä1D.	Ä2C0H.È.H2È5H2Fr—rgrêrôr$r(có®—|jdd…|jkrtd¦«‚|s2|dd…dd…f ¦«}|dd…dd…dd…f}n| ¦«}||k}d|||k<t||¦«}|r.||z d¦«| d¦«zS||z d¦«S)aCompute the log probabilities of the given labels under the given logits.

        Args:
            logits:
                Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
            labels:
                Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
                ignored. Shape: (batch_size, sequence_length)
            average_log_prob:
                If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
                log probabilities of the (non-masked) tokens.

        Returns:
            A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
            given logits.
        NrUzKLogits (batch and sequence length dim) and labels must have the same shape.rZr)r]rˆÚclonerCr¥)rgrêrôr$r(Ú	loss_maskros       rpröz"_UnslothKTOTrainer.get_batch_logpsesø€ð0Œ<˜˜˜Ô ¤Ò,Ð,ÝÐjÑkÔkÐkà!ð	$Ø˜A˜A˜A˜q˜r˜r˜E”]×(Ò(Ñ*Ô*ˆFØ˜A˜A˜A˜s ˜s A A A˜IÔ&ˆFˆFð—\’\‘^”^ˆFàÐ0Ò0ˆ	ð01ˆˆvÐ+Ò+Ñ,å/°¸Ñ?Ô?ˆàð	9Ø# iÑ/×4Ò4°RÑ8Ô8¸9¿=º=ÈÑ;LÔ;LÑLÐLà# iÑ/×4Ò4°RÑ8Ô8Ð8rrÚbatchcóª‡—| |‰¦«}|jr‰d‰ d¦«dœni}|jrd|d<|‰dfd‰di|¤Ž}|j}| |‰dd	|j|j¬
¦«}|jdt‰d¦«krtd
¦«‚ˆfd„t|jd¦«D¦«}ˆfd„t|jd¦«D¦«}	||df}
||	df}||df}||	df}
|jr
|
|||
||jfS|
|||
|fS)Nrçræ©rêréTrlrïrèrðFrórr€z‡There is a mismatch between the number of examples in this batch and the number of examples for which an output sequence was predicted.có4•—g|]}‰d|du¯|‘ŒS©r€Tr0©rzÚirþs  €rpr|z._UnslothKTOTrainer.forward.<locals>.<listcomp>¸s.ø€Ð_Ð_Ð_˜AÀUÈ7Ä^ÐTUÔEVÐZ^ÐE^ÐE^�aÐE^ÐE^ÐE^rrcó4•—g|]}‰d|du¯|‘ŒS©r€Fr0rs  €rpr|z._UnslothKTOTrainer.forward.<locals>.<listcomp>¹s.ø€ÐbÐbÐb˜aÀuÈWÄ~ÐVWÔGXÐ\aÐGaÐGa˜ÐGaÐGaÐGarr.)Ú_compute_kl_logpsr(r‹rŸrgrör$r]r¦rˆÚrangeÚaux_loss)r7r™rþrúÚmodel_kwargsÚoutputsr÷rùÚ
chosen_idxÚrejected_idxÚchosen_logpsÚrejected_logpsÚ
chosen_logitsÚrejected_logitss  `           rpÚforwardz_UnslothKTOTrainer.forward“sèø€ð×)Ò)¨%°Ñ7Ô7ˆðÔ&ð	
ØÐ 3Ô4Ø%*§Y¢YÐ/MÑ%NÔ%Nð
ð
ð
ð
ð
	ðÔ ð	8Ø37ˆLÐ/Ñ0à�%ØÐ(Ô)ð
ð
à Ð!<Ô=ð
ðð
ð
ˆð
$œNÐà×/Ò/ØØÐ%Ô&Ø"Ø#Ô6Ø#Ô6ð0ñ
ô
ÐðÔ! !Ô$¨E°'¬NÑ(;Ô(;Ò;Ð;ÝðGñôð
ð
`Ð_Ð_Ð_¥Ð'7Ô'=¸aÔ'@Ñ!AÔ!AÐ_Ñ_Ô_ˆ
ØbÐbÐbÐb¥5Ð)9Ô)?ÀÔ)BÑ#CÔ#CÐbÑbÔbˆà'¨
°C¨Ô8ˆØ)¨,¸Ð*;Ô<ˆà)¨*°c¨/Ô:ˆ
Ø+¨L¸#Ð,=Ô>ˆàÔ ð	\Ø  .°-ÀÐRZÐ\cÔ\lÐmÐmà  .°-ÀÐRZÐ[Ð[rrÚpolicy_chosen_logpsÚpolicy_rejected_logpsÚpolicy_KL_logpsÚreference_chosen_logpsÚreference_rejected_logpsrÑcóˆ—|jrj||z
 ¦« ¦«}|j |¦« ¦« d¬¦«}n,t
jd¦« |j	¦«}|j
ddks|j
ddkrz||z
}|jdkr#dtj
|j||z
z¦«z
}	n*|jdkrdtj
|j|z¦«z
}	|j| ¦«z}
nbt
jg¦« |jj	¦«}	t
jg¦« |jj	¦«}
|j
ddks|j
ddkrw||z
}|jdkr#dtj
|j||z
z¦«z
}n'|jdkrtj
|j|z¦«}|j| ¦«z}
nbt
jg¦« |jj	¦«}t
jg¦« |jj	¦«}
t
j|j|	z|j|zfd¦«}||
|
|fS)avCompute the KTO loss for a batch of policy and reference model log probabilities.

        Args:
            policy_chosen_logps:
                Log probabilities of the policy model for the chosen responses. Shape: (num(chosen) in batch_size,)
            policy_rejected_logps:
                Log probabilities of the policy model for the rejected responses. Shape: (num(rejected) in batch_size,)
            policy_KL_logps: Log probabilities of the policy model for the KL responses. Shape: (batch_size,)
            reference_chosen_logps:
                Log probabilities of the reference model for the chosen responses. Shape: (num(chosen) in batch_size,)
            reference_rejected_logps:
                Log probabilities of the reference model for the rejected responses. Shape: (num(rejected) in
                batch_size,)
            reference_KL_logps: Log probabilities of the reference model for the KL responses. Shape: (batch_size,)

        Returns:
            A tuple of four tensors: (losses, chosen_rewards, rejected_rewards, KL). The losses tensor contains the KTO
            loss for each example in the batch. The chosen_rewards and rejected_rewards tensors contain the rewards for
            the chosen and rejected responses, respectively. The KL tensor contains the detached KL divergence estimate
            between the policy and reference models.
        r©r4rZr–rg)r›ÚmeanÚdetachr‚rÔÚclamprEÚzerosr_Údevicer]r!rÚsigmoidr rr×r"r#)r7rrrrrrÑÚklÚchosen_logratiosÚ
chosen_lossesÚchosen_rewardsÚrejected_logratiosÚrejected_lossesÚrejected_rewardsÚlossess               rpÚkto_lossz_UnslothKTOTrainer.kto_lossÆs©€ð<Ôð	?Ø!Ð$6Ñ6×<Ò<Ñ>Ô>×EÒEÑGÔGˆBØÔ!×4Ò4°RÑ8Ô8×=Ò=Ñ?Ô?×EÒEÈ!ÐEÑLÔLˆBˆBå”˜Q‘”×"Ò"Ð#6Ô#=Ñ>Ô>ˆBðÔ$ QÔ'¨1Ò,Ð,Ð0FÔ0LÈQÔ0OÐSTÒ0TÐ0TØ2Ð5KÑKÐàŒ~ Ò&Ð&à !¥A¤I¨d¬iÐ;KÈbÑ;PÑ.QÑ$RÔ$RÑ R�
�
Ø”Ð#6Ò6Ð6ð!"¥A¤I¨d¬iÐ:JÑ.JÑ$KÔ$KÑ K�
à!œYÐ)9×)@Ò)@Ñ)BÔ)BÑBˆNˆNõ"œL¨Ñ,Ô,×/Ò/°Ô0@Ô0GÑHÔHˆMÝ"œ\¨"Ñ-Ô-×0Ò0°Ô1AÔ1HÑIÔIˆNð!Ô& qÔ)¨QÒ.Ð.Ð2JÔ2PÐQRÔ2SÐWXÒ2XÐ2XØ!6Ð9QÑ!QÐàŒ~ Ò&Ð&Ø"#¥a¤i°´	¸RÐBTÑ=TÑ0UÑ&VÔ&VÑ"V��Ø”Ð#6Ò6Ð6Ý"#¤)¨D¬IÐ8JÑ,JÑ"KÔ"K�à#œyÐ+=×+DÒ+DÑ+FÔ+FÑFÐÐõ$œl¨2Ñ.Ô.×1Ò1°$Ô2BÔ2IÑJÔJˆOÝ$œ|¨BÑ/Ô/×2Ò2°4Ô3CÔ3JÑKÔKÐå”Ø
Ô
" ]Ñ
2°DÔ4KÈoÑ4]Ð^Ø
ñ
ô
ˆð
�~Ð'7¸Ð;Ð;rrcóf—d}|jr§|jr-|d|d|d| d¦«dœ}n|d|dd	œ}tj¦«5|di|¤Žj}ddd¦«n#1swxYwY| ||dd
|j|j¬¦«}|S)
z/Compute KL log probabilities for a given batch.Nrërìrîrí)Ú	input_idsrèrêrérñrò)r*rèFrór0)r›r(r‹rErõrgrör$)r7r™rþrúÚKL_model_kwargsrøs      rprz$_UnslothKTOTrainer._compute_kl_logpss/€àˆØÔð	ØÔ&ð
à!&Ð'<Ô!=Ø&+Ð,FÔ&GØ#Ð$:Ô;Ø).¯ªÐ3TÑ)UÔ)Uð	#ð#��ð"'Ð'@Ô!AØ&+Ð,JÔ&Kð#ð#�õ
”‘”ð
<ð
<Ø!˜EÐ4Ð4 OÐ4Ð4Ô;�	ð
<ð
<ð
<ñ
<ô
<ð
<ð
<ð
<ð
<ð
<ð
<øøøð
<ð
<ð
<ð
<ð×+Ò+ØØÐ,Ô-Ø!&Ø#'Ô#:Ø#'Ô#:ð,ñôˆHðˆsÁ"A<Á<BÂBc
óp—| ||¦«}| |j|¦«}|jrj||z
 ¦« ¦«}|j |¦« ¦« d¬¦«}n1tj	d¦«