unsloth_compiled_cache/__pycache__/UnslothCPOTrainer.cpython-311.pyc

§
3$�hÚ/ãóh—dZddlmZddlZddlmZddlmZddlmZm	Z	m
Z
mZmZm
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZm Z m!Z!m"Z"m#Z#mZm$Z$m%Z%m&Z&m'Z'm(Z(m)Z)m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2mZm3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;m<Z<mZm=Z=mZm
Z
mZmZm"Z"m-Z-m5Z5mZddl5Z5ddlTddl>m?Z?m@Z@dd	lAmBZBddlZddlCZ3dd
lDm4Z4ddlmZddlEmFZFmGZHdd
dd
d
dœZIejJddeI¬¦«d„¦«ZKe?Gd„de¦«¦«ZL	Gd„de"¦«ZMGd„deM¦«ZNdS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)<rÚAutoModelForCausalLMÚBaseImageProcessorÚ	CPOConfigÚ
CPOTrainerrÚDPODataCollatorWithPaddingÚDataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFÚFeatureExtractionMixinÚLiteralrÚPartialStateÚPathÚ	PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚTrainerÚTrainerCallbackr	Úadd_bos_token_if_neededÚadd_eos_token_if_neededÚautocastÚdefaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚinspectÚis_comet_availableÚis_peft_availableÚis_torch_fx_proxyÚis_wandb_availableÚlog_table_to_comet_experimentÚmaybe_apply_chat_templateÚmaybe_extract_promptÚnnÚnpÚnullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚwarningsrrrrrr*r3r;)Ú*)Ú	dataclassÚfield)ÚVersion)r2)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚmax_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ	fullgraphÚoptionscó’—tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t	||¦«D]‘\}}| tj¦«}tj|d| d¦«¬¦« 	d¦«}tj
|d¬¦«}||z
}	| |	¦«Œ’	tj|¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)rMÚindex©rMé)
r;ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ	unsqueezeÚsqueezeÚ	logsumexpÚappendÚconcat)
ÚlogitsrNÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚchunk_logitsÚchunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
          ú]/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothCPOTrainer.pyÚchunked_selective_log_softmaxrg"s5€õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[Ñ[Ô[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐHÑHÔH€MØÐå%(¨¸Ñ%GÔ%Gð4ð4Ñ!ˆ�kØ#—’¥u¤}Ñ5Ô5ˆÝœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`ÐaÑaÔa×iÒiÐjlÑmÔmˆÝ œ?¨<¸rÐBÑBÔBÐØ)Ð,<Ñ<ˆØ×"Ò" ?Ñ3Ô3Ð3Ð3ØÝœ,Ð':Ñ;Ô;ÐØ-×5Ò5°v´|ÀA´ÈÌÐUVÌÐ6XÑYÔYÐØÐócó²‡—eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z	ee
ed	<																																																																																																																																																		d0ˆfd/„	ZˆxZS)1ÚUnslothCPOConfiguy
    
    Configuration class for the [`CPOTrainer`].

    This class includes only the parameters that are specific to CPO training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
            to use the default data collator.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt. This argument is required if you want to use the default data collator.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion. This argument is required if you want to use the default data collator
            and your model is an encoder-decoder.
        beta (`float`, *optional*, defaults to `0.1`):
            Parameter controlling the deviation from the reference model. Higher β means less deviation from the
            reference model. For the IPO loss (`loss_type="ipo"`), β is the regularization parameter denoted by τ in
            the [paper](https://huggingface.co/papers/2310.12036).
        label_smoothing (`float`, *optional*, defaults to `0.0`):
            Label smoothing factor. This argument is required if you want to use the default data collator.
        loss_type (`str`, *optional*, defaults to `"sigmoid"`):
            Type of loss to use. Possible values are:

                - `"sigmoid"`: sigmoid loss from the original [DPO](https://huggingface.co/papers/2305.18290) paper.
                - `"hinge"`: hinge loss on the normalized likelihood from the
                  [SLiC](https://huggingface.co/papers/2305.10425) paper.
                - `"ipo"`: IPO loss from the [IPO](https://huggingface.co/papers/2310.12036) paper.
                - `"simpo"`: SimPO loss from the [SimPO](https://huggingface.co/papers/2405.14734) paper.

        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model.
        cpo_alpha (`float`, *optional*, defaults to `1.0`):
            Weight of the BC regularizer in CPO training.
        simpo_gamma (`float`, *optional*, defaults to `0.5`):
            Target reward margin for the SimPO loss, used only when `loss_type="simpo"`.
        label_pad_token_id (`int`, *optional*, defaults to `-100`):
            Label pad token id. This argument is required if you want to use the default data collator.
        padding_value (`int` or `None`, *optional*, defaults to `None`):
            Padding value to use. If `None`, the padding value of the tokenizer is used.
        truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
            Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
            This argument is required if you want to use the default data collator.
        generate_during_eval (`bool`, *optional*, defaults to `False`):
            If `True`, generates and logs completions from the model to W&B or Comet during evaluation.
        is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
            When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
            you need to specify if the model returned by the callable is an encoder-decoder model.
        model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
            string.
        dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
    
    NÚhelpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrJz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnorKéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?ç@Úlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrPéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéééÚsigmoidçà?éœÿÿÿÚkeep_endc“ó„•—|dkrtd|›d�¦«‚|dkrtd|›d�¦«‚|€|#dkr
|$dkrd}d	}#|�€!d
dlm}”t	|”¦«dzd¦«}�t¦«jd�id
|“d|“d|“d|“d|“d|“d|“d|“d|	“d|
“d|“d|“d|
“d|“d|“d|“d|“d|“d|“d |“d!|“d"|“d#|“d$|“d%|“d&|“d'|“d(|“d)|“d*|“d+|“d,| “d-|!“d.|"“d/|#“d0|$“d1|%“d2|&“d3|'“d4|(“d5|)“d6|*“d7|+“d8|,“d9|-“d:|.“d;|/“d<|0“d=|1“d>|2“d?|3“d@|4“dA|5“dB|6“dC|7“dD|8“dE|9“dF|:“dG|;“dH|<“dI|=“dJ|>“dK|?“dL|@“dM|A“dN|B“dO|C“dP|D“dQ|E“dR|F“dS|G“dT|H“dU|I“dV|J“dW|K“dX|L“dY|M“dZ|N“d[|O“d\|P“d]|Q“d^|R“d_|S“d`|T“da|U“db|V“dc|W“dd|X“de|Y“df|Z“dg|[“dh|\“di|]“dj|^“dk|_“dl|`“dm|a“dn|b“do|c“dp|d“dq|e“dr|f“ds|g“dt|h“du|i“dv|j“dw|k“dx|l“dy|m“dz|n“d{|o“d||p“d}|q“d~|r“d|s“d€|t“d�|u“d‚|v“dƒ|w“d„|x“d…|y“d†|z“d‡|{“dˆ||“d‰|}“dŠ|~“d‹|“dŒ|€“d�|�“dŽ|‚“d�|ƒ“d�|„“d‘|…“d’|†“d“|‡“d”|ˆ“d•|‰“d–|Š“d—|‹“d˜|Œ“d™|�“dš|Ž“d›|�“dœ|�“|“¤Ž|‘|_|’|_dS)žNgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rPza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!r~rÚunsloth_training_checkpointsrpr)Ú	cpu_countrqÚ
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚweight_decayÚ
adam_beta1Ú
adam_beta2Úadam_epsilonÚ
max_grad_normÚnum_train_epochsÚ	max_stepsÚlr_scheduler_typeÚwarmup_ratioÚwarmup_stepsÚ	log_levelÚlog_level_replicaÚlog_on_each_nodeÚlogging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ	data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚdisable_tqdmÚremove_unused_columnsÚlabel_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚfsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ	deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ	adafactorÚgroup_by_lengthÚlength_column_nameÚ	report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚpush_to_hubÚresume_from_checkpointÚhub_model_idÚhub_strategyÚ	hub_tokenÚhub_private_repoÚhub_always_pushÚhub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚfp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚtorchdynamoÚ	ray_scopeÚddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚlabel_smoothingÚ	loss_typeÚdisable_dropoutÚ	cpo_alphaÚsimpo_gammaÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚgenerate_during_evalÚis_encoder_decoderÚmodel_init_kwargsÚdataset_num_proc©)	ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingr’ÚminÚsuperÚ__init__rnro)–Úselfr“r”r•r–r—r˜r™ršr›rœr�ržrŸr r¡r¢r£r¤r¥r¦r§r¨r©rªr«r¬rr®r¯r°r±r²r³r´rµr¶r·r¸r¹rºr»r¼r½r¾r¿rÀrÁrÂrÃrÄrÅrÆrÇrÈrÉrÊrËrÌrÍrÎrÏrÐrÑrÒrÓrÔrÕrÖr×rØrÙrÚrÛrÜrÝrÞrßràrárârãrärårærçrèrérêrërìrírîrïrðrñròrórôrõrör÷rørùrúrûrürýrþrÿrrrrrrrrrr	r
rrr
rrrrrrrrrrrrrrrrrrr r!r"rnroÚkwargsr’Ú	__class__s–                                                                                                                                                     €rfr)zUnslothCPOConfig.__init__zs›	ø€ðl˜4ÒÐÕ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ˜1ÒÐ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØÐ -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆJØ ˆMØÐ#Ø1Ð1Ð1Ð1Ð1Ð1Ý" 9 9¡;¤;¨q¡=°!Ñ4Ô4Ðà�‰ŒÔðP	:ðP	:ðP	:Ø#˜ðP	:à#7Ð#7ðP	:ð �xðP	:ð�gð	P	:ð
$˜ðP	:ð*˜Mð
P	:ð$8Ð#7ðP	:ð+FÐ*EðP	:ð*DÐ)CðP	:ð(@Ð'?ðP	:ð'>Ð&=ðP	:ð+FÐ*EðP	:ð'>Ð&=ðP	:ð$˜ðP	:ð'>Ð&=ðP	:ð *˜Mð!P	:ð"(˜<ð#P	:ð$$˜ð%P	:ð&$˜ð'P	:ð((˜<ð)P	:ð**˜Mð+P	:ð, 0Ð/ð-P	:ð."˜	ð/P	:ð0!2Ð 1ð1P	:ð2(˜<ð3P	:ð4(˜<ð5P	:ð6"˜	ð7P	:ð8!2Ð 1ð9P	:ð: 0Ð/ð;P	:ð<&˜+ð=P	:ð> 0Ð/ð?P	:ð@"4Ð!3ðAP	:ðB*˜MðCP	:ðD&<Ð%;ðEP	:ðF*˜MðGP	:ðH$˜ðIP	:ðJ 0Ð/ðKP	:ðL 0Ð/ðMP	:ðN!2Ð 1ðOP	:ðP.˜oðQP	:ðR7^Ð6]ðSP	:ðT�gðUP	:ðV�gðWP	:ðX,˜^ðYP	:ðZ�4ð[P	:ð\"˜	ð]P	:ð^*˜Mð_P	:ð` �xðaP	:ðb�4ðcP	:ðd�4ðeP	:ðf,˜^ðgP	:ðh&<Ð%;ðiP	:ðj,˜^ðkP	:ðl,˜^ðmP	:ðn�4ðoP	:ðp$˜ðqP	:ðr&˜+ðsP	:ðt*˜MðuP	:ðv!2Ð 1ðwP	:ðx�EðyP	:ðz$8Ð#7ð{P	:ð|$˜ð}P	:ð~&<Ð%;ðP	:ð@*DÐ)CðAP	:ðB$˜ðCP	:ðD �xðEP	:ðF(˜<ðGP	:ðH%:Ð$9ðIP	:ðJ&˜+ðKP	:ðL&<Ð%;ðMP	:ðN%:Ð$9ðOP	:ðP!2Ð 1ðQP	:ðR 0Ð/ðSP	:ðT�4ðUP	:ðV#6Ð"5ðWP	:ðX&˜+ðYP	:ðZ2TÐ1Sð[P	:ð\"4Ð!3ð]P	:ð^"˜	ð_P	:ð`&<Ð%;ðaP	:ðb�EðcP	:ðd$˜ðeP	:ðf"˜	ðgP	:ðh.˜oðiP	:ðj"4Ð!3ðkP	:ðl"˜	ðmP	:ðn*DÐ)CðoP	:ðp!2Ð 1ðqP	:ðr%:Ð$9ðsP	:ðt%:Ð$9ðuP	:ðv-JÐ,IðwP	:ðx#6Ð"5ðyP	:ðz*DÐ)Cð{P	:ð|&˜+ð}P	:ð~&<Ð%;ðP	:ð@(˜<ðAP	:ðB(˜<ðCP	:ðD"˜	ðEP	:ðF 0Ð/ðGP	:ðH.˜oðIP	:ðJ(˜<ðKP	:ðL&<Ð%;ðMP	:ðN-JÐ,IðOP	:ðP*DÐ)CðQP	:ðR&<Ð%;ðSP	:ðT(˜<ðUP	:ðV$8Ð#7ðWP	:ðX(@Ð'?ðYP	:ðZ!2Ð 1ð[P	:ð\*˜Mð]P	:ð^$8Ð#7ð_P	:ð` 0Ð/ðaP	:ðb&˜+ðcP	:ðd"˜	ðeP	:ðf&˜+ðgP	:ðh*˜MðiP	:ðj%:Ð$9ðkP	:ðl"4Ð!3ðmP	:ðn)BÐ(AðoP	:ðp-JÐ,IðqP	:ðr#6Ð"5ðsP	:ðt$8Ð#7ðuP	:ðv"4Ð!3ðwP	:ðx*˜MðyP	:ðz 0Ð/ð{P	:ð|#6Ð"5ð}P	:ð~&<Ð%;ðP	:ð@-JÐ,IðAP	:ðB$˜ðCP	:ðD!2Ð 1ðEP	:ðF%:Ð$9ðGP	:ðH�4ðIP	:ðJ.˜oðKP	:ðL"˜	ðMP	:ðN.˜oðOP	:ðP"˜	ðQP	:ðR&˜+ðSP	:ðT"4Ð!3ðUP	:ðV*˜MðWP	:ðX.˜oðYP	:ðZ$8Ð#7ð[P	:ð\"4Ð!3ð]P	:ð^!2Ð 1ð_P	:ð` 0Ð/°&ðaP	:ðP	:ðP	:ðb%9ˆÔ!Ø"4ˆÔÐÐrh)’NNFFFrpFrKrKNNrqrqrrrrsrtrurvrwrxryrJrzr{rr|r}TNr~FrPFr~rNTFFFFFFr€r€FFFFr�r‚FFNrJNNFrƒFNrNrJNNTNFNNFrƒrNNNNr„r…NFFr†NNNNTFTFFNNr‡NNFNFNFTr‚NNNrƒTFNrˆr‰FNNFFNNFFFNFTrŠr‹Nr{r„rŒTrxr�rŽNr�FNNNNrJ)
Ú__name__Ú
__module__Ú__qualname__Ú__doc__r?rnrrÚ__annotations__roÚintr)Ú
__classcell__©r,s@rfrjrj3s�ø€€€€€€ð<ð<ðz+0¨%ØØÐ1Ð2ð+ñ+ô+Ð˜( 3œ-ððñð*/¨ØØÐVÐWð*ñ*ô*Ð˜ #œððñðØ#ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØ$ØØØØ%ØØØ"Ø"ØØ!&ØØØØØ!ØØ27ØØØØØØØØØØØ!'ØØØØØØØ!ØØ$ØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1Ø!ØØ!$ØØØØØ%ØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØ#ØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ#Ø ØØØØØ $Ø!Ø$)Ø(-Ø"Ø#Ø"ØØ Ø"Ø!&Ø(,ØØØ $ØØØØØØØ!ØØ$Ø$Ø!Ø ØØ#Øðgq5ðq5ðq5ðq5ðq5ðq5ðq5ðq5ðq5ðq5rhrjcó¢‡—eZdZdZddgZ												dAdeeeej	e
fdeedeed	ee
d
eee
ee
e
ffdeeeeeefdeegefd
eeedeejjejjjfdeeejejgejfdeedeeegeffˆfd„
Zd„ZdBdeeeej	fdefd„Z e!				dCdee
eeej"ffde#de$de$deej%dee
ej"ffd„¦«Z&dej'd ej'deej'ej'ej'ffd!„Z(e!			dDd"ej'd#ej"d$e#de$de#dej'fd%„¦«Z)dej	dee
eeej"ffdeej'ej'ej'ej'ffd&„Z*	dEdee
eeej"ffd(e+d)fd*„Z,		dFdeeej	fd+ee
eeje-ffdeejeejee
ejffffd,„Z.dee
ej"fde
fd-„Z/	dBdeeej	fd+ee
eeje-ffd.e#d/eee
fd0„Z0dEd1ee
e1fd(e+d)ddfd2„Z2			dGd4e3d5e
d.ee#d/eee
d6e
defˆfd7„
Z4dBd8ee
e1fd9ee1ddfˆfd:„
Z5d;„Z6ˆfd<„Z7			dHd=ee
d>ee
d?ee
ee
dffd@„Z8ˆxZ9S)IÚ_UnslothCPOTrainerrƒÚtrlÚcpoN©NNÚmodelÚargsÚ
data_collatorÚ
train_datasetÚeval_datasetÚprocessing_classÚ
model_initÚ	callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚpeft_configÚcompute_metricsc

ó¬•—|j€i}
nªt|t¦«std¦«‚|j}
|
 d¦«}|�ht|t¦«r|dkrtt|¦«}|dkr-t|tj¦«std|›d�¦«‚||
d<t|t¦«rtj	|fi|
¤Ž}d|_
t¦«s|�td¦«‚t¦«�r5|��2t|t¦«r| 
¦«}t|dd¦«st|d	d¦«r`t|d
¦«o,d
tt!jt$¦«j¦«v}d|ji}|r
|j|d
<t%|fi|¤Ž}nV|jrOt|d¦«r| ¦«n*d
„}| ¦« |¦«|}|jr't|d	d¦«rt5|¦«d|_
nV|jrOt|d¦«r| ¦«n*d„}| ¦« |¦«|jr+t9¦«st;¦«std¦«‚|�|jj|_n"|j€td¦«‚|j|_|jr"|jj |_ |jj!|_!|€td¦«‚|j"€tGj$dtJ¦«d}n|j"}|j&€tGj$dtJ¦«d}n|j&}||kstd|›d|›d�¦«‚|j'€$|jrtGj$dtJ¦«d}n|j'}|€QtQ|j!|j)|j¬¦«}|j*r!d|_*tGj$dtJ¦«d|_+nd|_+|j,rt[|¦«||_"|j|_|j)|_)|j.�|j.n|j!|_.||_&|j/|_/||_'||_0|j1dvr.|j2dkr#tGj$d|j1›d �tJ¦«|j1d!krtd"¦«‚|j3|_3|j2|_2|j1|_1|j4|_4t|jd#d¦«|_5t|jd$d%¦«|_6|j5r%|j6d%krtGj$d&tJ¦«|j1d'kr|j7|_7tqd(„¦«|_9d|j:d)<tw¦« <¦«5| =t||j?¬*¦«}| =t€d+|i|j?¬,¦«}|�E| =t||j?¬*¦«}| =t€d+|i|j?¬,¦«}| =|jA|j?¬*¦«}|�!| =|jA|j?¬*¦«}ddd¦«n#1swxYwYt…¦« C||||||||||	|
¬-¦«d|_Dt|jEd.¦«r|jE F|jG¦«t|d/¦«st‘d0¦«‚dS)1NzRYou passed model_kwargs to the CPOTrainer. But your model is already instantiated.Útorch_dtyper‚znInvalid `torch_dtype` passed to the CPOConfig. Expected a string with either `torch.dtype` or 'auto', but got ú.FzvPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrùÚuse_gradient_checkpointingÚenable_input_require_gradscó0—| d¦«dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputs   rfÚmake_inputs_require_gradz=_UnslothCPOTrainer.__init__.<locals>.make_inputs_require_gradøs€Ø×-Ò-¨dÑ3Ô3Ð3Ð3Ð3rhTcó0—| d¦«dSrNrOrQs   rfrUz=_UnslothCPOTrainer.__init__.<locals>.make_inputs_require_grad
s€Ø×)Ò)¨$Ñ/Ô/Ð/Ð/Ð/rhz‚`generate_during_eval=True` requires Weights and Biases or Comet to be installed. Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.z=processing_class must be specified to tokenize a CPO dataset.z�`max_length` is not set in the CPOConfig's init it will default to `512` by default, but you should do it yourself in the future.r‹zˆ`max_prompt_length` is not set in the CPOConfig's init it will default to `128` by default, but you should do it yourself in the future.é€zmax_prompt_length (z+) should be strictly less than max_length (z).z¼When using an encoder decoder architecture, you should set `max_completion_length` in the CPOConfig's init it will default to `128` by default, but you should do it yourself in the future.)Úpad_token_idrr z²When using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your TrainingArguments we have set it for you, but you should do it yourself in the future.)ÚhingeÚiporzYou are using the z™ loss type that does not support label smoothing. The `label_smoothing` parameter will be ignored. Set `label_smoothing` to `0.0` to remove this warning.Úkto_pairzKSupport for kto_pair has been removed in CPOTrainer. Please use KTOTrainer.Úoutput_router_logitsÚrouter_aux_loss_coefr„a-You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to `0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary loss.Úsimpocó*—tt¦«S©N)r$Úlistr#rhrfú<lambda>z-_UnslothCPOTrainer.__init__.<locals>.<lambda>ƒs€µ;½tÑ3DÔ3D€rhÚestimate_tokens)Únum_procÚ	tokenizer)Ú	fn_kwargsrd)r:r;r<r=r>r?r@rErArBrCÚadd_model_tagsÚacceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.)Ir!Ú
isinstanceÚstrÚ
ValueErrorÚgetÚgetattrr;Údtyper
Úfrom_pretrainedÚ_peft_has_been_casted_to_bf16r*rÚmerge_and_unloadÚhasattrrar(Ú	signaturer7Ú
parametersrørùrLÚget_input_embeddingsÚregister_forward_hookrÃr6rr,r)Úconfigr Údecoder_start_token_idrXrr<ÚwarnÚUserWarningrrrrrÖÚuse_dpo_data_collatorrr%rrr?rrrrÚaux_loss_enabledÚ
aux_loss_coefrr$Ú_stored_metricsÚwarnings_issuedrÚmain_process_firstÚmapr/r"r.Útokenize_rowr(r)Úmodel_accepts_loss_kwargsr:rgÚ
_tag_namesÚAttributeError)r*r:r;r<r=r>r?r@rArBrCrDrEr!rGÚ_support_gc_kwargsÚprepare_model_kwargsrUrrrr,s                     €rfr)z_UnslothCPOTrainer.__init__³s÷ø€ð"Ô!Ð)Ø "ÐÐÝ˜E¥3Ñ'Ô'ð
	?ÝÐqÑrÔrÐrà $Ô 6ÐØ+×/Ò/°
Ñ>Ô>ˆKØÐ&å˜k3Ñ/Ô/ð>°KÀ6Ò4IÐ4IÝ")%°Ñ"=Ô"=�KØ &Ò(Ð(µ¸KÍÌÑ1UÔ1UÐ(Ý$ðXðJUðXðXðXñôðð4?Ð! -Ñ0å�e�SÑ!Ô!ð	UÝ(Ô8¸ÐTÐTÐBSÐTÐTˆEð.3ˆÔ*å Ñ"Ô"ð4	] {Ð'>ÝðIñôð
õÑ
 Ô
 ñ0	] [Ñ%<å˜%¥Ñ+Ô+ð
1Ø×.Ò.Ñ0Ô0�å�uÐ1°5Ñ9Ô9ð
a½WÀUÐL_ÐafÑ=gÔ=gð
aÝ%,ØÐ9ñ&ô&ð&à5½ÝÔ%Õ&EÑFÔFÔQñ:ô:ðð#ð)EÀdÔFaÐ'bÐ$à%ðoØLPÔLnÐ(Ð)HÑIå7¸ÐVÐVÐAUÐVÐV��ØÔ,ð	
aå˜5Ð">Ñ?Ô?ðaØ×4Ò4Ñ6Ô6Ð6Ð6ð4ð4ð4ð×.Ò.Ñ0Ô0×FÒFÐG_Ñ`Ô`Ð`ðˆEØŒyð
:�W UÐ,?ÀÑGÔGð
:Ý+¨EÑ2Ô2Ð2à59�Ô2øð
Ô
(ð		]å�uÐ:Ñ;Ô;ð
]Ø×0Ò0Ñ2Ô2Ð2Ð2ð0ð0ð0ð×*Ò*Ñ,Ô,×BÒBÐC[Ñ\Ô\Ð\àÔ$ð	Õ.@Ñ.BÔ.Bð	ÕFXÑFZÔFZð	ÝðDñôð
ð
ÐØ&+¤lÔ&EˆDÔ#Ð#Ø
Ô
$Ð
,ÝÐlÑmÔmÐmà&*Ô&=ˆDÔ#àÔ"ð	:Ø*/¬,Ô*MˆDÔ'Ø %¤Ô 9ˆDÔàÐ#ÝÐ\Ñ]Ô]Ð]ØŒ?Ð"ÝŒMðeåñ
ô
ð
ð
ˆJˆJàœˆJØÔ!Ð)ÝŒMðeåñ
ô
ð
ð
!$ÐÐà $Ô 6Ðà  :Ò-Ð-ÝØrÐ&7ÐrÐrÐdnÐrÐrÐrñôð
ðÔ%Ð-°$Ô2IÐ-ÝŒMðeåñ
ô
ð
ð
%(Ð!Ð!à$(Ô$>Ð!àÐ Ý6Ø-Ô:Ø#'Ô#:Ø#'Ô#:ðñôˆMðÔ)ð
Ø-2�Ô*å”
ð\åñôðð*.ˆDÔ&Ð&à).ˆDÔ&ðÔð	,Ý$ UÑ+Ô+Ð+à$ˆŒØ$(Ô$=ˆÔ!Ø"&Ô"9ˆÔØ37Ô3EÐ3Q˜TÔ/Ð/ÐWgÔWtˆÔØ!2ˆÔØ#Ô3ˆÔØ%:ˆÔ"Ø 0ˆÔàŒ>Ð-Ð-Ð-°$Ô2FÈÒ2JÐ2JÝŒMðv T¤^ðvðvðvåñ
ô
ð
ð
Œ>˜ZÒ'Ð'ÝÐjÑkÔkÐkà”IˆŒ	Ø#Ô3ˆÔØœˆŒØœˆŒÝ '¨¬Ð6LÈeÑ TÔ TˆÔÝ$ U¤\Ð3IÈ3ÑOÔOˆÔØÔ ð	 TÔ%7¸3Ò%>Ð%>ÝŒMðõñ
ô
ð
ðŒ>˜WÒ$Ð$Ø#Ô/ˆDÔå*Ð+DÐ+DÑEÔEˆÔð48ˆÔÐ/Ñ0õ‰^Œ^×
.Ò
.Ñ
0Ô
0ð	cð	cà)×-Ò-Õ.BÈTÔMbÐ-ÑcÔcˆMØ)×-Ò-Ý)°kÐCSÐ5TÐ_cÔ_tð.ñôˆMðÐ'Ø+×/Ò/Õ0DÈtÔOdÐ/ÑeÔe�Ø+×/Ò/Ý-Ø*Ð,<Ð=Ø!Ô2ð 0ñ ô �ð*×-Ò-¨dÔ.?È$ÔJ_Ð-Ñ`Ô`ˆMØÐ'Ø+×/Ò/°Ô0AÈDÔLaÐ/ÑbÔb�ð#	cð	cð	cñ	cô	cð	cð	cð	cð	cð	cð	cøøøð	cð	cð	cð	cõ&	‰Œ×ÒØØØ'Ø'Ø%Ø-Ø!Ø+ØØ!Ø*Gð	ñ	
ô	
ð	
ð"*/ˆÔ&õ�4”:Ð/Ñ0Ô0ð	7ØŒJ×%Ò% d¤oÑ6Ô6Ð6å�t˜]Ñ+Ô+ð	Ý Øjñôð
ð	ð	s×&C[Û[Û
[cóö—| ||zd¬¦«}| |d¬¦«d}|dt|¦«d…}|dt|¦«d…}tj||g¦«}tj|d¦«}t|¦«t|¦«krtd¦«‚t|¦«}	||dd|	…kr|	dz}	|dd|	…}|dd|	…}
t|¦«t|
¦«krtd¦«‚|d|	d…}|d|	d…}t
||
||¬	¦«S)
a
        Llama tokenizer does not satisfy `enc(a + b) = enc(a) + enc(b)`. It does ensure `enc(a + b) = enc(a) + enc(a +
        b)[len(enc(a)):]`. Reference:
            https://github.com/EleutherAI/lm-evaluation-harness/pull/531#issuecomment-1595586257
        F©Úadd_special_tokensÚ	input_idsNÚattention_maskzBPrompt input ids and answer input ids should have the same length.rPz@Prompt input ids and attention mask should have the same length.)Úprompt_input_idsÚprompt_attention_maskr‹rŒ)r?Úlenr1ÚconcatenateÚarrayrkÚdict)r*ÚpromptÚanswerÚfull_tokenizedr�Úanswer_input_idsÚanswer_attention_maskÚfull_concat_input_idsÚfull_input_idsÚresponse_token_ids_start_idxrŽs           rfÚbuild_tokenized_answerz)_UnslothCPOTrainer.build_tokenized_answer¿sÀ€ð×.Ò.¨v¸©ÐSXÐ.ÑYÔYˆØ×0Ò0°ÈEÐ0ÑRÔRÐS^Ô_Ðà)¨+Ô6µsÐ;KÑ7LÔ7LÐ7NÐ7NÔOÐØ .Ð/?Ô @ÅÐEUÑAVÔAVÐAXÐAXÔ YÐõ!#¤Ð0@ÐBRÐ/SÑ TÔ TÐõœ .°Ô"=Ñ>Ô>ˆåˆ~ÑÔ¥#Ð&;Ñ"<Ô"<Ò<Ð<ÝÐaÑbÔbÐbõ(+Ð+;Ñ'<Ô'<Ð$ð˜~¨kÔ:Ð;XÐ<XÐ;XÔYÒYÐYØ(¨AÑ-Ð(à)¨+Ô6Ð7TÐ8TÐ7TÔUÐØ .Ð/?Ô @ÐA^ÐB^ÐA^Ô _ÐåÐÑ Ô ¥CÐ(=Ñ$>Ô$>Ò>Ð>ÝÐ_Ñ`Ô`Ð`à)¨+Ô6Ð7SÐ7TÐ7TÔUÐØ .Ð/?Ô @ÐA]ÐA^ÐA^Ô _ÐåØ-Ø"7Ø&Ø0ð	
ñ
ô
ð	
rhÚreturnc	óœ
‡‡—i}|d}|d}|d}|j�s-t|t¦«stdt	|¦«›�¦«‚| |d¬¦«}d„| ¦«D¦«}t|t¦«stdt	|¦«›�¦«‚| ||¦«Št|t¦«std	t	|¦«›�¦«‚| ||¦«Št|d
¦«}t‰d
¦«}	t‰d
¦«}
t|	|
¦«}| ¦«D]\}}|d|…||<Œtd„t‰d
‰d
¦«D¦«¦«}
t|	|
z
¦«}|
d
ks|d
krtd¦«‚t|jj|||	‰|
‰¦«\}ŠŠt|jj‰‰¦«\ŠŠt#t‰d¦«t‰d¦«¦«}‰‰|fD]�}t|d
¦«|z|jkrj|jdkrdD]}||d|j…||<ŒŒL|jdkrdD]}|||jd…||<ŒŒvtd|j›�¦«‚ŒŽ‰‰fD]H}t|d
¦«|z|jkr%dD]"}||d|j|jz
…||<Œ#ŒIˆfd„dD¦«}ˆfd„dD¦«}|ddd…|d<|jgt‰d
¦«z|ddt‰d
¦«…<|ddd…|d<|jgt‰d
¦«z|ddt‰d
¦«…<|||dœ ¦«D]/\}}| ¦«D]\}}|dkrŒ|||›|›�<ŒŒ0nú| |d|jd¬¦«Š| |d|jd¬¦«Š| |d|jd¬¦«}‰d|d<‰d|d<|d|d
<|d|d<|�rt/|d ¦«rb| t3j|d¦«¬!¦«|d"<| t3j|d¦«¬!¦«|d#<|S)$a.Tokenize a single row from a CPO specific dataset.

        At this stage, we don't convert to PyTorch tensors yet; we just handle the truncation in case the prompt +
        chosen or prompt + rejected responses is/are too long. First we truncate the prompt; if we're still too long,
        we truncate the chosen/rejected.

        We also create the labels for the chosen/rejected responses, which are of length equal to the sum of the length
        of the prompt and the chosen/rejected response, with label_pad_token_id for the prompt tokens.
        r“ÚchosenÚrejectedz prompt should be an str but got Fr‰có —i|]\}}d|›�|“ŒS©Úprompt_r#)Ú.0ÚkÚvs   rfú
<dictcomp>z3_UnslothCPOTrainer.tokenize_row.<locals>.<dictcomp>s$€ÐPÐPÐP±$°!°Q˜] q˜]˜]¨AÐPÐPÐPrhz chosen should be an str but got z"rejected should be an str but got r�Ncó —g|]\}}||k‘ŒSr#r#)r£ÚaÚbs   rfú
<listcomp>z3_UnslothCPOTrainer.tokenize_row.<locals>.<listcomp> s €ÐpÐpÐp™D˜A˜q��a’ÐpÐpÐprhrPzdChosen and rejected prompt_input_ids might only differ on the last token due to tokenizer merge ops.r‹Ú
keep_start)r�rŽr�zUnknown truncation mode: )r‹rŒcó:•—i|]}|‰d|›�‰|z“ŒSr¡r#)r£r¤Ú
chosen_tokenss  €rfr¦z3_UnslothCPOTrainer.tokenize_row.<locals>.<dictcomp>Ns=ø€ð&ð&ð&ØGH��= ¨1  Ô/°-ÀÔ2BÑBð&ð&ð&rhcó:•—i|]}|‰d|›�‰|z“ŒSr¡r#)r£r¤Úrejected_tokenss  €rfr¦z3_UnslothCPOTrainer.tokenize_row.<locals>.<dictcomp>Qs=ø€ð(ð(ð(ØKL��? =¨Q = =Ô1°OÀAÔ4FÑFð(ð(ð(rhÚlabels)Úchosen_Ú	rejected_rƒÚtoken_type_idsT)Ú
truncationrrŠÚ
chosen_labelsÚrejected_labelsrŒrŽÚ%prepare_decoder_input_ids_from_labels)r°Úrejected_decoder_input_idsÚchosen_decoder_input_ids)r rirjrkÚtyper?Úitemsr›r�r'ÚsumrTÚabsr!Úbos_token_idr"Úeos_token_idÚmaxrrrrrrrr·r;Útensor)r*Úfeaturer:Úbatchr“ržrŸÚ
prompt_tokensÚprompt_len_input_idsÚchosen_prompt_len_input_idsÚrejected_prompt_len_input_idsr¤r¥Únum_diff_tokensÚnum_diff_lenÚlonger_response_lengthÚ
answer_tokensÚchosen_sequence_tokensÚrejected_sequence_tokensÚtoksÚtype_keyÚtokensrr¯s                      @@rfr‚z_UnslothCPOTrainer.tokenize_rowðsêøø€ðˆØ˜Ô"ˆØ˜Ô"ˆØ˜:Ô&ˆàÔ&ñ~	õ˜f¥cÑ*Ô*ð
TÝ Ð!RÅDÈÁLÄLÐ!RÐ!RÑSÔSÐSØ ×1Ò1°&ÈUÐ1ÑSÔSˆMØPÐP¸-×:MÒ:MÑ:OÔ:OÐPÑPÔPˆMå˜f¥cÑ*Ô*ð
TÝ Ð!RÅDÈÁLÄLÐ!RÐ!RÑSÔSÐSØ ×7Ò7¸ÀÑGÔGˆMå˜hÑ,Ô,ð
XÝ Ð!VÅdÈ8ÁnÄnÐ!VÐ!VÑWÔWÐWØ"×9Ò9¸&À(ÑKÔKˆOõ$' }Ð5GÔ'HÑ#IÔ#IÐ å*-¨mÐ<NÔ.OÑ*PÔ*PÐ'Ý,/°Ð@RÔ0SÑ,TÔ,TÐ)Ý#&Ð'BÐDaÑ#bÔ#bÐ à%×+Ò+Ñ-Ô-ð
<ð
<‘��1Ø#$Ð%:Ð&:Ð%:Ô#;�
˜aÑ Ð õ"ØpÐp¥C¨
Ð6HÔ(IÈ?Ð[mÔKnÑ$oÔ$oÐpÑpÔpñôˆOõÐ:Ð=ZÑZÑ[Ô[ˆLØ Ò"Ð" l°QÒ&6Ð&6Ý ð=ñôðõ=TØÔ%Ô2Ø$ØØ+ØØ-Øñ=ô=Ñ9ˆM˜=¨/õ.EØÔ%Ô2°MÀ?ñ.ô.Ñ*ˆM˜?õ&)¨]¸;Ô-GÑ)HÔ)HÍ#ÈoÐ^iÔNjÑJkÔJkÑ%lÔ%lÐ"ð#0°À-Ð!Pð	
]ð	
]�
Ý�}Ð%7Ô8Ñ9Ô9Ð<RÑRÐUYÔUdÒdÐdØÔ+¨|Ò;Ð;Ø!NðZðZ˜AØ/<¸QÔ/?Ð@XÀ$ÔBXÐ@XÔ/Y˜M¨!Ñ,Ð,ðZàÔ-°Ò;Ð;Ø!Nð[ð[˜AØ/<¸QÔ/?ÀÔAWÐ@WÐ@YÐ@YÔ/Z˜M¨!Ñ,Ð,ð[õ)Ð)[ÀTÔEYÐ)[Ð)[Ñ\Ô\Ð\ðeð#0°Ð!Að
hð
h�
Ý�}Ð%7Ô8Ñ9Ô9Ð<RÑRÐUYÔUdÒdÐdØ<ðhðh˜Ø+8¸Ô+;Ð<f¸d¼oÐPTÔPfÑ>fÐ<fÔ+g˜
 aÑ(Ð(øð&ð&ð&ð&ØLkð&ñ&ô&Ð"ð(ð(ð(ð(ØPoð(ñ(ô(Ð$ð0FÀkÔ/RÐSTÐSTÐSTÔ/UÐ" 8Ñ,àÔ'ðZå�MÐ"4Ô5Ñ6Ô6ñZ7Ð" 8Ô,Ð-Us°=ÐASÔ3TÑ/UÔ/UÐ-UÑVð2JÈ+Ô1VÐWXÐWXÐWXÔ1YÐ$ XÑ.àÔ'ð^å�OÐ$6Ô7Ñ8Ô8ñ^9Ð$ XÔ.Ð/Yµ°_ÐEWÔ5XÑ1YÔ1YÐ/YÑZð
2Ø5Ø!ðð÷Še‰gŒgð	
5ð
5‘��4ð
)-¯
ª
©¬ð5ð5Ñ$�H˜fØÐ#3Ò3Ð3Ø Ø.4�E˜QÐ* Ð*Ð*Ñ+Ð+ð5ð
5ð!×1Ò1Ø 4°DÔ4NÐcgð2ñôˆMð#×3Ò3Ø T°dÔ6PÐeið4ñôˆOð!×1Ò1Ø 4°DÔ4JÐ_cð2ñôˆMð&3°;Ô%?ˆE�/Ñ"Ø'6°{Ô'CˆEÐ#Ñ$Ø(5°kÔ(BˆEÐ$Ñ%Ø-:Ð;KÔ-LˆEÐ)Ñ*àÐ ¥W¨UÐ4[Ñ%\Ô%\Ð Ø6;×6aÒ6aÝ œ<¨Ð.?Ô(@ÑAÔAð7bñ7ô7�Ð2Ñ3ð5:×4_Ò4_Ý œ<¨¨oÔ(>Ñ?Ô?ð5`ñ5ô5�Ð0Ñ1ðˆrhFrŽrrÃr rrÚdevicec	ó”—i}|r3t|djd|djd¦«}n2t|djd|djd¦«}|D] }| d¦«r‰t||tj¦«rid|vs|r|}n/| d¦«r|}n| d	¦«rd
}| dd¦«}	t||||¬¦«||	<Œ¡|D]Ð}| d
¦«r¹t||tj¦«r™d|vs|r|}n/| d¦«r|}n| d	¦«rd
}| d
d¦«}	t	j	||	t||||¬¦«fd
¬¦« 
|¬¦«||	<ŒÑ|rf|d dd¦« 
|¬¦«|d<|d dd¦« 
|¬¦«|d<|S)açConcatenate the chosen and rejected inputs into a single tensor.

        Args:
            batch:
                A batch of data. Must contain the keys 'chosen_input_ids' and 'rejected_input_ids', which are tensors
                of shape (batch_size, sequence_length).
            is_encoder_decoder:
                Whether the model is an encoder-decoder model.
            label_pad_token_id:
                The label pad token id.
            padding_value:
                The padding value to use for the concatenated inputs_ids.
            device:
                The device for the concatenated inputs.

        Returns:
            A dictionary containing the concatenated inputs under the key 'concatenated_input_ids'.
        rµrPr¶Úchosen_input_idsÚrejected_input_idsržr°Ú
_input_idsÚ_attention_maskrÚconcatenated)Ú	pad_valuerŸrO©rÑr�rqÚconcatenated_input_idsrŽÚconcatenated_attention_mask)rÀrSÚ
startswithrir;rÚendswithÚreplacer4ÚcatrUÚrepeat)
rÃr rrrÑÚconcatenated_batchrr¤rØÚconcatenated_keys
          rfÚconcatenated_inputsz&_UnslothCPOTrainer.concatenated_inputs�s˜€ð4 Ðàð	gÝ˜U ?Ô3Ô9¸!Ô<¸eÐDUÔ>VÔ>\Ð]^Ô>_Ñ`Ô`ˆJˆJå˜UÐ#5Ô6Ô<¸QÔ?ÀÐG[ÔA\ÔAbÐcdÔAeÑfÔfˆJàð		pð		pˆAØ�|Š|˜HÑ%Ô%ð
p*°U¸1´X½u¼|Ñ*LÔ*Lð
pØ˜q�=�=Ð$6�=Ø 2�I�IØ—Z’Z Ñ-Ô-ð"Ø -�I�IØ—Z’ZÐ 1Ñ2Ô2ð"Ø !�IØ#$§9¢9¨X°~Ñ#FÔ#FÐ Ý7DÀUÈ1ÄXÈzÐenÐ7oÑ7oÔ7oÐ"Ð#3Ñ4øØð	$ð	$ˆAØ�|Š|˜JÑ'Ô'ð
$J°u¸Q´xÅÄÑ,NÔ,Nð
$Ø˜q�=�=Ð$6�=Ø 2�I�IØ—Z’Z Ñ-Ô-ð"Ø -�I�IØ—Z’ZÐ 1Ñ2Ô2ð"Ø !�IØ#$§9¢9¨Z¸Ñ#HÔ#HÐ Ý7<´yà*Ð+;Ô<Ý% e¨A¤h°
ÀiÐPÑPÔPððð8ñ8ô8÷’"˜F�"Ñ#Ô#ð
#Ð#3Ñ4øðð	Ø;@ÐASÔ;T×;[Ò;[Ð\]Ð_`Ñ;aÔ;a×;dÒ;dÐlrÐ;dÑ;sÔ;sÐÐ7Ñ8àÐ-Ô.×5Ò5°a¸Ñ;Ô;×>Ò>ÀfÐ>ÑMÔMð
Ð<Ñ=ð"Ð!rhÚpolicy_chosen_logpsÚpolicy_rejected_logpscóˆ—||z
 |jj¦«}|jdkrc|j|jz}||z
}t
j|j|z¦«d|jz
zt
j|j|z¦«|jzz
}n¼|jdkrOt
j|j|z¦«d|jz
zt
j|j|z¦«|jzz
}nb|jdkr tj
d|j|zz
¦«}n7|jdkr|dd|jzzz
dz}ntd|j›d�¦«‚|j| |jj¦« ¦«z}|j| |jj¦« ¦«z}|||fS)	aµCompute the CPO loss for a batch of policy and reference model log probabilities.

        Args:
            policy_chosen_logps:
                Log probabilities of the policy model for the chosen responses. Shape: (batch_size,)
            policy_rejected_logps:
                Log probabilities of the policy model for the rejected responses. Shape: (batch_size,)

        Returns:
            A tuple of three tensors: (losses, chosen_rewards, rejected_rewards). The losses tensor contains the CPO
            loss for each example in the batch. The chosen_rewards and rejected_rewards tensors contain the rewards for
            the chosen and rejected responses, respectively.
        r^rPrŒrYrZrqzUnknown loss type: z7. Should be one of ['sigmoid', 'hinge', 'ipo', 'simpo'])
rUrhrÑrrrrÚ
logsigmoidrr;ÚrelurkÚdetach)r*rärår]Úgamma_logratiosÚlossesÚchosen_rewardsÚrejected_rewardss        rfÚcpo_lossz_UnslothCPOTrainer.cpo_lossÅsÞ€ð$&Ð(=Ñ=×AÒAÀ$ÔBRÔBYÑZÔZˆðŒ>˜WÒ$Ð$Ø"Ô.°´Ñ:ˆOØ˜oÑ-ˆFõ”˜dœi¨&Ñ0Ñ1Ô1Ð1°Q¸Ô9MÑ5MÑNÝ”, ¤	˜z¨FÑ2Ñ3Ô3°dÔ6JÑJñKð
ˆFðŒ^˜yÒ
(Ð
(õ”˜dœi¨&Ñ0Ñ1Ô1Ð1°Q¸Ô9MÑ5MÑNÝ”, ¤	˜z¨FÑ2Ñ3Ô3°dÔ6JÑJñKð
ˆFðŒ^˜wÒ
&Ð
&Ý”Z  D¤I°Ñ$6Ñ 6Ñ7Ô7ˆFˆFØ
Œ^˜uÒ
$Ð
$à˜q A¨¬	¡MÑ2Ñ2°qÑ8ˆFˆFåØm d¤nÐmÐmÐmñôð
ðœÐ&9×&<Ò&<¸TÔ=MÔ=TÑ&UÔ&U×%]Ò%]Ñ%_Ô%_Ñ_ˆØœ9Ð(=×(@Ò(@ÀÔAQÔAXÑ(YÔ(Y×'aÒ'aÑ'cÔ'cÑcÐà�~Ð'7Ð7Ð7rhr]r°Úaverage_log_probcó„—|jdd…|jkrtd¦«‚|s1|dd…dd…f ¦«}|dd…dd…dd…f}||k}d|||k<t||¦«}|r.||z d¦«| d¦«zS||z d¦«S)aŽCompute the log probabilities of the given labels under the given logits.

        Args:
            logits: Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
            labels:
                Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
                ignored. Shape: (batch_size, sequence_length)
            average_log_prob:
                If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
                log probabilities of the (non-masked) tokens.
            label_pad_token_id: The label pad token id.
            is_encoder_decoder: Whether the model is an encoder-decoder model.

        Returns:
            A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
            given logits.
        NrJzKLogits (batch and sequence length dim) and labels must have the same shape.rPr)rSrkÚcloner9r¼)r]r°rïrr Ú	loss_maskres       rfÚget_batch_logpsz"_UnslothCPOTrainer.get_batch_logpsúsç€ð2Œ<˜˜˜Ô ¤Ò,Ð,ÝÐjÑkÔkÐkà!ð	'Ø˜A˜A˜A˜q˜r˜r˜E”]×(Ò(Ñ*Ô*ˆFØ˜A˜A˜A˜s ˜s A A A˜IÔ&ˆFØÐ0Ò0ˆ	ð01ˆˆvÐ+Ò+Ñ,å/°¸Ñ?Ô?ˆàð	9Ø# iÑ/×4Ò4°RÑ8Ô8¸9¿=º=ÈÑ;LÔ;LÑLÐLà# iÑ/×4Ò4°RÑ8Ô8Ð8rhcó
‡—‰ |‰j‰j‰j‰jj¬¦«}|djd}‰jrd‰ |d¦«ini}‰jrd|d<||df|d	d
dœ|¤Ž}|j	}ˆfd„}|d 
¦«}	‰jdkr2tj
d
¦« ‰jj¦«}
n||d|…|	d|…¦«}
‰ ||d‰jdv‰j‰j¬¦«}|d|…}||d…}
|d|…}||d…}‰jr
||
|||
|jfS||
|||
fS)zÆRun the given model on the given batch of inputs, concatenating the chosen and rejected inputs together.

        We do this to avoid doing two forward passes, because it's faster for FSDP.
        )r rrrÑrµrÚdecoder_input_idsÚconcatenated_labelsTr\rÚrÛF)rŒÚ	use_cachecór•—‰js?|ddd…dd…f ¦«}|ddd…f ¦«}tj¦«}| d|jd¦«}| d¦«}| |j¦«}|||¦«}|S)N.rJrP)r Ú
contiguousr0ÚCrossEntropyLossÚviewrSrUrÑ)r]r°Úloss_fctÚlossr*s    €rfÚcross_entropy_losszC_UnslothCPOTrainer.concatenated_forward.<locals>.cross_entropy_lossHs¯ø€ØÔ*ð
6à  S b S¨!¨!¨! Ô,×7Ò7Ñ9Ô9�Ø  Q R R œ×3Ò3Ñ5Ô5�åÔ*Ñ,Ô,ˆHØ—[’[  V¤\°"Ô%5Ñ6Ô6ˆFØ—[’[ ‘_”_ˆFà—Y’Y˜vœ}Ñ-Ô-ˆFØ�8˜F FÑ+Ô+ˆDØˆKrhr„N)rZr^)rïr r)rãr rrrhrÑrSÚ_shift_rightr|r]rñrr;rÁrUrórÚaux_loss)r*r:rÃráÚ