DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothXPOTrainer.cpython-311.pyc

242 lines
51 KiB

2025-08-13 23:50:20 +00:00
§
5$hÀãó dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZmZmZm Z mZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(m)Z)mZm*Z*m+Z+m,Z,mZm-Z-m.Z.ddl*Z*ddlTddl/m0Z0m1Z1dd l2m3Z3ddlZddl4Z5dd
l6m7Z7ddlmZdd l8m9Z9m:Z;d d
d d
d
dœZ<ej=d d e<¬¦«d¦«Z>e0Gdde¦«¦«Z? Gdde¦«Z@Gdde@¦«ZAdS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)%rÚBaseImageProcessorÚBasePairwiseJudger ÚDatasetÚEvalPredictionÚFeatureExtractionMixinÚIterableDatasetÚOnlineDPOTrainerÚOptimizerNamesrÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚSIMPLE_CHAT_TEMPLATEÚTrainerCallbackr Ú XPOConfigÚ
XPOTrainerÚ empty_cacheÚgenerate_model_cardÚget_comet_experiment_urlÚ
get_rewardÚis_conversationalÚis_peft_availableÚis_wandb_availableÚjinja2Úmaybe_apply_chat_templateÚnnÚosÚselective_log_softmaxÚtextwrapÚtorchÚtruncate_rightÚunwrap_model_for_generation)Ú*)Ú dataclassÚfield)ÚVersion)Ú nullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionscó’tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t ||¦«D]\}}| tj¦«}tj|d| d¦«¬¦«  d¦«}tj
|d¬¦«}||z
} |  | ¦«Œ’ tj |¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)r?Úindex©r?é)
r+ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
Úlogitsr@Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
ú]/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothXPOTrainer.pyÚchunked_selective_log_softmaxrY"s5õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐH€MØÐå%(¨¸Ñ%GÔ%Gð #—¥u¤}Ñ Ýœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`Ða×iÐjlÑmˆÝ œ?¨<¸Ø)Ð,<Ñ<ˆØ×" Ýœ,Ð':ÑØ-×5°v´|ÀA´ÈÌ ÐUVÌÐ6XÑØ Ðócó®eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z ee
ed < d/ˆfd.„ Z ˆxZ S)0ÚUnslothXPOConfiga­
Configuration class for the [`XPOTrainer`].
As a subclass of [`OnlineDPOConfig`], it accepts all of that class's arguments and adds the following:
Parameters:
alpha (`float` or `list[float]`, *optional*, defaults to `1e-5`):
Weight of the XPO loss term. If a list of floats is provided, one alpha is used per epoch in order,
and the last alpha is reused for all remaining epochs.
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr<z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnor=éréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrBéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéé@éÚsigmoidÚvllmçš™™™™™á?c‘ óÌ|dkrtd|d¦«|dkrtd|d¦«||#dkr
|$dkrd}d }#|ˆ€!d
d lm}t |’¦«d zd ¦«}ˆ|…d
krt d
¦«|…dkrt d¦«t
¦«jdžid|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,|d-|d.|d/| “d0|!“d1|"“d2|#“d3|$“d4|%“d5|&“d6|'“d7|(“d8|)“d9|*“d:|+“d;|,“d<|-“d=|.“d>|/“d?|0“d@|1“dA|2“dB|3“dC|4“dD|5“dE|6“dF|7“dG|8“dH|9“dI|:“dJ|;“dK|<“dL|=“dM|>“dN|?“dO|@“dP|A“dQ|B“dR|C“dS|D“dT|E“dU|F“dV|G“dW|H“dX|I“dY|J“dZ|K“d[|L“d\|M“d]|N“d^|O“d_|P“d`|Q“da|R“db|S“dc|T“dd|U“de|V“df|W“dg|X“dh|Y“di|Z“dj|[“dk|\“dl|]“dm|^“dn|_“do|`“dp|a“dq|b“dr|c“ds|d“dt|e“du|f“dv|g“dw|h“dx|i“dy|j“dz|k“d{|l“d||m“d}|n“d~|o“d|p“d€|q“d|r“d|s“dƒ|t“d„|u“d…|v“d†|w“d‡|x“dˆ|y“d‰|z“dŠ|{“d||“dŒ|}“d|~“dŽ|d|€“d|d‘|‚“d’|ƒ“d“|„“d”|…“d•|†“d–|‡“d—|ˆ“d˜|‰“d™|Š“dš|‹“d›|Œ“dœ|d|Ž“|‘¤Ž||_||_ dS)ŸNgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rBza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rprqÚunsloth_training_checkpointsrbr)Ú cpu_countrczUUnsloth: Please set a positive non-zero temperature since your results will be wrong.é
zgUnsloth: Please set a positive non-zero temperature less than 10, since sampling will be quite erratic.Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚreward_model_pathÚjudgeÚmax_new_tokensÚ
max_lengthÚ temperatureÚmissing_eos_penaltyÚ loss_typeÚdataset_num_procÚdisable_dropoutÚuse_vllmÚvllm_model_implÚgpu_memory_utilizationÚds3_gather_for_generationÚmodel_init_kwargs©)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingrƒÚminÚ MathErrorÚsuperÚ__init__r`ra)”Úselfr…r†r‡rˆr‰rrrrrrr“r”r•r–r—r™rrr r­r¿rÿrrrrrrrrrr r
r r r
rrrrrr`raÚkwargsrƒÚ __class__s” €rXrzUnslothXPOConfig.__init__Is¿ ø€ðh ˜ Ð Õ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ ˜ Ð ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØ Ð  -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆ ˆ Ð " 9 9¡;¤;¨q¡=°!Ñ Ø ˜!Ò Ð ÝÐ
˜
Ð
ÝðFñGôGð
Gð ŒÔðNNN#˜ðN <à#7Ð#7ðN Ngð N
$˜ð N *˜
N$8Ð#7ðN+FÐ*EðN*DÐ)CðN(@Ð'?ðN'>Ð&=ðN+FÐ*EðN'>Ð&=ðN$˜ðN'>Ð&=ðN*˜Mð!N <ð"(˜<ð#N <ð$$˜ð%N <ð&$˜ð'N <ð((˜<ð)N <ð**˜Mð+N <ð,/ð-N <ð."˜ ð/N <ð0!2Ð 1ð1N <ð2(˜<ð3N <ð4(˜<ð5N <ð6"˜ ð7N <ð8!2Ð 1ð9N <ð:/ð;N <ð<&˜+ð=N <ð>/ð?N <ð@"4Ð!3ðAN <ðB*˜MðCN <ðD&<Ð%;ðEN <ðF*˜MðGN <ðH$˜ðIN <ðJ/ðKN <ðL/ðMN <ðN!2Ð 1ðON <ðP.˜oðQN <ðR7^Ð6]ðSN <ðTgðUN <ðVgðWN <ðX,˜^ðYN <ðZ4ð[N <ð\"˜ ð]N <ð^*˜Mð_N <ð` xðaN <ðb4ðcN <ðd4ðeN <ðf,˜^ðgN <ðh&<Ð%;ðiN <ðj,˜^ðkN <ðl,˜^ðmN <ðn4ðoN <ðp$˜ðqN <ðr&˜+ðsN <ðt*˜MðuN <ðv!2Ð 1ðwN <ðxEðyN <ðz$8Ð#7ð{N <ð|$˜ð}N <ð~&<Ð%;ðN <ð@*DÐ)CðAN <ðB$˜ðCN <ðD xðEN <ðF(˜<ðGN <ðH%:Ð$9ðIN <ðJ&˜+ðKN <ðL&<Ð%;ðMN <ðN%:Ð$9ðON <ðP!2Ð 1ðQN <ðR/ðSN <ðT4ðUN <ðV#6Ð"5ðWN <ðX&˜+ðYN <ðZ2TÐ1Sð[N <ð\"4Ð!3ð]N <ð^"˜ ð_N <ð`&<Ð%;ðaN <ðbEðcN <ðd$˜ðeN <ðf"˜ ðgN <ðh.˜oðiN <ðj"4Ð!3ðkN <ðl"˜ ðmN <ðn*DÐ)CðoN <ðp!2Ð 1ðqN <ðr%:Ð$9ðsN <ðt%:Ð$9ðuN <ðv-JÐ,IðwN <ðx#6Ð"5ðyN <ðz*DÐ)Cð{N <ð|&˜+ð}N <ð~&<Ð%;ðN <ð@(˜<ðAN <ðB(˜<ðCN <ðD"˜ ðEN <ðF/ðGN <ðH.˜oðIN <ðJ(˜<ðKN <ðL&<Ð%;ðMN <ðN-JÐ,IðON <ðP*DÐ)CðQN <ðR&<Ð%;ðSN <ðT(˜<ðUN <ðV$8Ð#7ðWN <ðX(@Ð'?ðYN <ðZ!2Ð 1ð[N <ð\*˜Mð]N <ð^$8Ð#7ð_N <ð`/ðaN <ðb&˜+ðcN <ðd"˜ ðeN <ðf&˜+ðgN <ðh*˜MðiN <ðj%:Ð$9ðkN <ðl"4Ð!3ðmN <ðn)BÐ(AðoN <ðp-JÐ,IðqN <ðr#6Ð"5ðsN <ðt$8Ð#7ðuN <ðv"4Ð!3ðwN <ðx*˜MðyN <ðz/ð{N <ð|#6Ð"5ð}N <ð~&<Ð%;ðN <ð@-JÐ,IðAN <ðB!2Ð 1ðCN <ðDEðEN <ðF,˜^ðGN <ðH$˜ðIN <ðJ&˜+ðKN <ðL#6Ð"5ðMN <ðN"˜ ðON <ðP/ðQN <ðR.˜oðSN <ðT xðUN <ðV.˜oðWN <ðX&<Ð%;ðYN <ðZ)BÐ(Að[N <ð\!2Ð 1°Fð]NNN <ð^%9ˆÔ!Ø"4ˆÔÐÐrZ)NNFFFrbFr=r=NNrcrcrrdrerfrgrhrirjrkr<rlrmrrnroTNrpFrBFrprqNTFFFFFFrrrrFFFFrsrtFFNr<NNFruFNrNr<NNTNFNNFrurNNNNrvrwNFFrxNNNNTFTFFNNryNNFNFNFTrtNNNruTFNrzr{FNNFFNNFFFNFTNNr|r}rgNr~NTFrr€TNNr<)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__r0r`rrÚ__annotations__raÚintrÚ
__classcell__©rs@rXr\r\3szø€ð ð ð+0¨%ØØÐ+ñ+ô+И( 3œ-ððñð*/¨ØØÐ*ñ*ô*И #œððñð ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØØØØØØØ!&ØØØØØØ27ØØØØØØØØØØØ!'ØØØØØØØØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1ØØ!$ØØØØØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ ØØØØØ $Ø!Ø$)Ø(-ØØ Ø"Ø!&Ø(,Ø ØØØØØØØØØ Ø!%Ø$(Ø Øðcrrrrrrrrrr5rZr\c óeZdZdZddgZ d%deeejfdeeejfde ejd e e
d
e e d e e d e ee
efd
e ee
eee
ffde eeeeefde ede e egefde eedeejjejjjfde e ejejgejfddfˆfd
Ze d¦«Z!dZ"dZ#dZ$dZ%dZ&dZ' ddZ( d&dejdeeeeje)ffde e*dejfd „Z+ d'd!e ed"e ed#eeeedffd$„Z,ˆxZ-S)(Ú_UnslothXPOTrainerruÚtrlÚxpoN©NNÚmodelÚ ref_modelÚ reward_modelrÚargsÚ
data_collatorÚ
train_datasetÚ eval_datasetÚprocessing_classÚ peft_configÚcompute_metricsÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚreturncót¦« ||||||||| | |
| | |
|¬¦«|jj|_ggggggggggggggdœ|_|j g|jd<g|jd<g|jd<dSdS)N)r+r,rr-r.r/r0r1r2Úreward_processing_classr3r4r5r6r7)úloss/dpoúloss/xpoú objective/klúobjective/entropyúrewards/chosenúrewards/rejectedúrewards/accuraciesúrewards/marginsú logps/chosenúlogps/rejectedúval/model_contain_eos_tokenúval/ref_contain_eos_tokenÚalphaÚbetaúobjective/model_scoresúobjective/ref_scoresúobjective/scores_margin)rrr.rGÚ_alphaÚstatsr-)rr+r,r-rr.r/r0r1r2r3r4r5r6r7rs €rXrz_UnslothXPOTrainer.__init__ƒø€õ& Œ×ÒØØØØØ-Ø$4ØØ!Ø*Gð ñ
ô
ð
ð$”i”oˆŒ ð ØØØ!#Ø Ø "Ø"$ØØ à+-Ø)+ØØð#
ð
ˆŒ
ð& Ô Ð (à35ˆDŒJÐ 0Ø13ˆDŒJÐ .Ø46ˆDŒJÐ (rZcóÀt|jt¦«r>|jj}|t |j¦«kr
|j|n |jdS|jS)Nr<)Ú
isinstancerLÚlistÚstateÚepochÚlen)rrRs rXrGz_UnslothXPOTrainer.alphaÄsUå d”k¥4Ñ  Ø”JÔ$ˆEØ).µ°T´[Ñ1AÔ1AÒ)AÐ)A4”;˜%ÀtÄ{ÐSUÄÐ ”;Ð rZcóXt||j¦«5}| |d|d|j¬¦«}ddd¦«n #1swxYwY|j€U|j |¦«}t
¦«r*t|t¦«r|  ¦«}n"|}n|j |j¦«}t||j¦«5}| |d|d|j¬¦«}ddd¦«n #1swxYwY||fS)NÚ input_idsÚattention_mask)rUrVÚgeneration_config)
r-Ú acceleratorÚgeneraterWr,Ú unwrap_modelr#rOrÚget_base_model) rÚpromptsr+Úunwrapped_policy_model_for_genÚ model_outputÚ"unwrapped_main_model_for_ref_logicÚactual_model_for_ref_generationÚfinal_ref_model_for_genÚ
ref_outputs rXÚ_generate_completionsz(_UnslothXPOTrainer._generate_completionsÌÝ
°Ô0@Ñ
 ÐEcØ9×! &Ð'7Ô8Ø"&Ô"8ðCñôˆ ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð Œ>Ð !Ø15Ô1A×1NÒ1NÈuÑ1UÔ1UÐ  Ñ
U¥zÐ2TÕV_Ñ'`Ô'`ð
UØ2T×2cÒ2cÑ2eÔ2eÐ/à2TÐ/à.2Ô.>×.KÒ.KÈDÌNÑ.[Ô.[Ð
(Ð)HÈ$ÔJZÑ
 Ð_vØ0×! &Ð'7Ô8Ø"&Ô"8ðôˆ ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð˜'s#*A Á AÁAÃ'*DÄD!Ä$D!có|djd}|dd|df}t||jj|jj¦«\}}t j|d|fd¬¦«t j|d|fd¬¦«|ddœ}|dd|df}t||jj|jj¦«\}} t j|d|fd¬¦«t j|d| fd¬¦«|ddœ}
||
fS)NrUrBrArVÚraw©rUrVre)rEr,r2Ú eos_token_idÚ pad_token_idr+Úcat) rr^rbr\Úcontext_lengthÚmodel_completion_idsÚmodel_completion_maskÚ
model_dataÚref_completion_idsÚref_completion_maskÚref_datas rXÚ_process_completionsz'_UnslothXPOTrainer._process_completionsèsKØ  Ô3°AÔð ,¨A¨A¨A¨~¨¨Ð,>ÔÝ6DØ  $Ô"7Ô"DÀdÔF[ÔFhñ7
ô7
ÑМ G¨KÔ$8Ð:NÐ#OÐUVÐ#œi¨Ð1AÔ)BÐDYÐ(ZÐ`aИ5”>ð
ð
ˆ
𨨨>¨?¨?Ð(:ÔÝ2@Ø  Ô 5Ô BÀDÔDYÔDfñ3
ô3
ÑМ G¨KÔ$8Ð:LÐ#MÐSTÐ#œi¨Ð1AÔ)BÐDWÐ(XÐ^_И5”>ð
ð
ˆð ˜#rZcó8tj¦«5t|j|d|jj|¦«\}}}t|j|d|jj|¦«\}}}ddd¦«n #1swxYwY|jjŠtj|d|jj kd¬¦«}tj|d|jj kd¬¦«}||xx|jjzcc<||xx|jjzcc<||fS)NrUr<rA)
r+Úno_gradr!r-r2rhr.r
Úanyrg) rrmrprjÚ model_scoresÚ
ref_scoresÚmodel_contain_eosÚref_contain_eoss rXÚ_compute_rewardsz#_UnslothXPOTrainer._compute_rewardssmÝ
Œ]‰_Œ_ð ð Ý!+ØÔ! :¨kÔ#:¸DÔ<QÔ<^Ð`nñ"ô"Ñ ˆAˆ|˜Ô! 8¨KÔ#8¸$Ô:OÔ:\Ð^lñ ô Ñ ˆAˆz˜1ð  ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð Œ9Ô 4Ý %¤ ¨*°[Ô*AÀTÔEZÔEgÒ*gÐmoÐ pÑ pÔ pÐ Ý#œi¨°Ô(=ÀÔAVÔAcÒ(cÐikÐlˆ Ð ´ Ô0MÑ ˜Ð (¨D¬IÔ,IÑ ˜'s”AA7Á7A;Á>A;c óʇ —|d}|j |ddd|dfd¬¦«}d|D¦«}|j |ddd|dfd¬¦«}d|D¦«}td|di¦«rod „|D¦«}tj¦«}| t ¦«Š ˆ fd
|D¦«}ˆ fd |D¦«}d |D¦«}ˆ fd
|D¦«}|j |tt||¦«¦«¦«}tj d|D¦«|dj ¬¦«S)NrerUT)Úskip_special_tokenscó6g|]}| ¦«ŒSr©Ústrip©Ú.0Ú
completions rXú
<listcomp>z5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>s$Ð!^Ð!^Ð!^¸ *×"2Ò"2Ñ"4Ô"4Ð!^Ð!^Ð!^rZcó6g|]}| ¦«ŒSrr~r€s rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>s$ÐZ°z 
× 0Ò 0Ñ 2Ô 2ÐZrZÚpromptrcóg|]}d|dœgŒ S©Ú assistant)ÚroleÚcontentrr€s rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>"s0ð&ð&ð&ØCM˜+°*Ð&ð&ð&rZcó<g|]} |¬¦«ŒS©)Úmessages©Úrender)rÚmessageÚtemplates €rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>'s'ø€ÐP¸Wx—°ÑPrZcó<g|]} |¬¦«ŒS©rrrs €rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>(s'ø€Ð%tÐ%tÐ%tÈz h§o¢o¸z oÑ&JÔ&JÐ%tÐ%tÐ%trZcóg|]}d|dœgŒ Sr‡rr€s rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>*s0ð$ð$ð$ØCM˜+°*Ð$ð$ð$rZcó<g|]} |¬¦«ŒSr“s €rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>-s'ø€Ð#pÐ#pÐ#pÈZ H§O¢O¸Z OÑ$HÔ$HÐ#pÐ#pÐ#prZcóg|]}|dkŒ S)rr)rÚranks rXz5_UnslothXPOTrainer._compute_judge.<locals>.<listcomp>6sÐM¨4˜T QšYÐMrZ)Údevice)
r2Ú batch_decoder"r%Ú EnvironmentÚ from_stringrrrPrFr+Útensorr˜)
rrmrprjr\Úmodel_data_completionsÚref_data_completionsÚ environmentÚranks_of_first_completionrs
@rXÚ_compute_judgez!_UnslothXPOTrainer._compute_judgeø€Ø˜Ø!%Ô!6×!CÒ!CØ  # A A A ~  Ð$6Ô 7ÈTð"Dñ"
ô"
Ðð"_Ð!^ÐG]Ð!^Ñ!^Ô!^Ðà4×  ! ! ! ! ^ _ _Ð"4Ô 5È4ð Bñ
ô
Ðð [ÐZÐEYÐå ˜h¨°¬
Ð qð&ð&ØQgð&ñ&ô&Ð .ˆ"×.Õ/CÑDˆÐPˆGØ%tÐ%tÐ%tÐ%tÐ]sÐ%tÑ%tÔ%tÐ $ð$ØQeð$ñ$ô$Ð ð$qÐ#pÐ#pÐ#pÐ[oÐ#pÑ#pÔ#pÐ à$(¤J×$4Ò$4Ø Ý Ð+Ð-AÑ %
ô%
ÐŒ|ÐMÐ3LÐMÐV`ÐalÔVmÔVtÐurZcóŒˆfd}|||¦«}|||¦«}tj¦«5|j€E| ¦«5|||¦«}|||¦«} ddd¦«n #1swxYwYn"||j|¦«}||j|¦«} ddd¦«n #1swxYwY|ddddfdk}
|ddddfdk} | |
d¦«}| | d¦«}|  | d¦«} | |
d¦«}||| |fS)Ncóª||d|d¬¦«}|jdddz
df}t||ddddf¦«}|S)NrUrV)rVrBr<)rOr))ÚdataÚoutputrOÚtoken_logprobsrjs €rXÚcompute_logprobs_for_datazG_UnslothXPOTrainer._compute_logprobs.<locals>.compute_logprobs_for_data9spø€ØQt˜(¸Ð>NÔ9OÐPˆ”] 1 1 1 n°qÑ&8¸2Ð&=Ð#=Ô>ˆ2°6¸ Ô;LÈQÈQÈQÐP^ÐP_ÐP_ÐM_Ô;`Ñaˆ !rZrVrrv)r+rsr,Údisable_adapterÚ masked_fill) rr+rmrprjÚmodel_logprobs_model_dataÚmodel_logprobs_ref_dataÚref_logprobs_model_dataÚref_logprobs_ref_dataÚmodel_padding_maskÚref_padding_masks ` rXÚ_compute_logprobsz$_UnslothXPOTrainer._compute_logprobs8sIø€ð%>Ð$=¸eÀZÑ$PÔ$PÐ!à";Ð";¸EÀ8Ñ"LÔ"LÐõŒ]‰_Œ_ð \ð \ØŒ~Ð×WðWØ.GÐ.GÈÈzÑ.ZÔ.ZÐ+Ø,EÐ,EÀeÈXÑ,VÔ,VÐWðWðWñWôWðWðWðWðWðWðWøøøðWðWðWðWøð+DÐ*CÀDÄNÐT^Ñ*_Ô*_Ð'Ø(AÐ(AÀ$Ä.ÐRZÑ([Ô([Ð \ð \ð \ñ \ô \ð \ð \ð \ð \ð \ð \øøøð \ð \ð \ð \ð(Ð(8Ô9¸!¸!¸!¸^¸_¸_Ð:LÔMÐQRÒØ#Ð$4Ô5°a°a°a¸¸¸Ð6HÔIÈQÒØ$=×$IÒ$IÐJ\Ð^aÑ$bÔ$bÐ!Ø"9×"EÒ"EÐFVÐX[Ñ"\Ô"\ÐØ 5× AÒ AÐBRÐTWÑ XÔ XÐØ"9×"EÒ"EÐFXÐZ]Ñ"^Ô"^Ðà(Ð*AÐCXÐZqÐqs5²B-ÁA3Á' B-Á3A7 Á7B-Á:A7 Á;&B-Â-B1Â4B1có¤| d¦«}| d¦«}| d¦«}| d¦«} tj|||¦«}
tj|| |¦«} |
| z
} tj|||¦«}
tj|| |¦«}|
|z
}| |z
}|jjdkrt j|j|z¦« }n@|jjdkr|dd|jzz z
dz}ntd|jj¦«|j |z}||z 
¦«}|||fS)NrBr~Úiporczinvalid loss type ) Úsumr+Úwherer.r rÚ
logsigmoidrHÚNotImplementedErrorrGÚmean)rr­Ú chosen_maskÚmodel_logprobs_model_data_sumÚmodel_logprobs_ref_data_sumÚref_logprobs_ref_data_sumÚref_logprobs_model_data_sumÚchosen_model_logprobsÚchosen_ref_logprobsÚchosen_log_ratiosÚrejected_model_logprobsÚrejected_ref_logprobsÚrejected_log_ratiosrOÚ
dpo_lossesÚ
xpo_lossesÚlosss rXÚ_compute_lossesz"_UnslothXPOTrainer._compute_lossesXs}ð)B×(EÒ(EÀaÑ(HÔ(HÐ%Ø&=×&AÒ&AÀ!Ñ&DÔ&DÐ#Ø$9×$=Ò$=¸aÑ$@Ô$@Ð!Ø&=×&AÒ&AÀ!Ñ&DÔ&DÐ#å %¤ ¨KÐ9VÐXsÑ tÔ tÐÝ#œk¨+Ð7RÐTmÑØ1Ð4GÑå"'¤+¨{¨lÐ<YÐ[vÑ"wÔ"wÐÝ %¤ ¨[¨LÐ:UÐWpÑ qÔ qÐØ5Ð8MÑð#Ð%8Ñà Œ9Ô  )Ò œ, t¤y°6Ñ'9Ñ:ˆJˆ
ŒYÔ
 
  ¨D¬I©
Ñ#6Ñ6¸<ˆJˆ%Ð&P¸4¼9Ô;NÐ&PÐ&PÑ ”ZÐ"=Ñ=ˆ
ð˜'×/ˆàZ Ð+rZc
óê
ˆfd}
jd |
|¦«¦«jd |
| ¦«¦«j~‰jd |
| ¦«¦«jd |
| ¦«¦«jd |
| | z
¦«¦«| d¦«}| d¦«}| d¦«}| d¦«}t j|||¦«}t j|||¦«}||z
}t j|||¦«}t j|||¦«}||z
}jd |
| ¦«| ¦«z¦«¦«jd  |
| ¦«| ¦«z¦«¦«|jz}|jz}jd
 |
| ¦«¦«¦«jd  |
| ¦«¦«¦«||z
}||z
}| d¦«| d¦«z ¦«d z }jd
 |
|¦«¦«| d¦« }| d¦« }| ¦«| ¦«zd z }jd |
|¦«¦«||z
} ‰jd |
|  ¦«¦«¦«| dk ¦«}!‰jd |
|! ¦«¦«¦«|ddd|
dfj j
k  d¬¦«}"|ddd|
dfj j
k  d¬¦«}#‰jd |
|" ¦«¦«¦«jd |
|# ¦«¦«¦«jd j ¦«jd j¦«dS)Ncó€j |¦« ¦« ¦«S©N)rXÚgather_for_metricsr¸Úitem)rs €rXÚ gather_meanz7_UnslothXPOTrainer._log_statistics.<locals>.gather_means2ø€ØÔ#×6°vÑ>×E× LrZr;r<rIrJrKrBrCrDr?r@rcr=r>rBrrArUrArErFrGrH)
rMrMr-r´r+r¸rHÚfloatr2rgrtrG)$rrmrpr­rjrvrwr¿Úchosen_rewardsÚrejected_rewardsÚ
kl_model_dataÚ kl_ref_dataÚmean_klÚentropy_model_dataÚentropy_ref_dataÚ mean_entropyÚmarginÚaccuracyÚ model_eosÚref_eoss$` rXÚ_log_statisticsz"_UnslothXPOTrainer._log_statistics€sDø€ð  Mð Mð Mð Mð Mð
Œ
×% k k°*Ñ&=Ô&=Ñ Œ
×% k k°*Ñ&=Ô&=Ñ Ô Ð ŒJÐ ¸ ÀLÑ8QÔ8QÑ ŒJÐ 5°k°kÀ*Ñ6MÔ6MÑ ŒJÐ ¸À\ÐT^ÑE^Ñ9_Ô9_Ñ )B×(EÒ(EÀaÑ(HÔ(HÐ%Ø&=×&AÒ&AÀ!Ñ&DÔ&DÐ#Ø$9×$=Ò$=¸aÑ$@Ô$@Ð!Ø&=×&AÒ&AÀ!Ñ&DÔ&DÐ#å %¤ ¨KÐ9VÐXsÑ tÔ tÐÝ#œk¨+Ð7RÐTmÑØ1Ð4GÑå"'¤+¨{¨lÐ<YÐ[vÑ"wÔ"wÐÝ %¤ ¨[¨LÐ:UÐWpÑ qÔ qÐØ5Ð8MÑà Œ
"×)¨+¨+Ð6K×6PÒ6PÑ6RÔ6RÐUh×UmÒUmÑUoÔUoÑ6oÑ*pÔ*pÑ Œ
Ð$×+¨K¨KÐ8O×8TÒ8TÑ8VÔ8VÐYn×YsÒYsÑYuÔYuÑ8uÑ,vÔ,vÑ+¨T¬YÑØ´ÑØ Œ
Ð$×+¨K¨K¸×8KÒ8KÑ8MÔ8MÑ,NÔ,NÑ Œ
Ð&×-¨k¨kÐ:J×:OÒ:OÑ:QÔ:QÑ.RÔ.RÑ2Ð4KÑKˆ
Ø-Ð0EÑEˆ Ø ×$ '¨+¯/ª/¸!Ñ*<Ô*<Ñ<×DÀqÑHˆØ Œ
"×)¨+¨+°gÑ*>Ô*>Ñ8×;¸Ø3×7¸ÑØ1Ð4D×4IÒ4IÑ4KÔ4KÑKÈqÑPˆ Ø Œ
Ð'×.¨{¨{¸<Ñ/HÔ/HÑ Ð"2Ñ2ˆØ Œ
Ð,¨[¨[¸¿º¹¼Ñ-GÔ-GјQJ×'ˆØ Œ
Ð(× ° ¸H¿MºM¹O¼OÑ0LÔ0LÑ   Ô,¨Q¨Q¨Q°°°Ð-?Ô@ÀDÔDYÔDfÒf×kÐpqÐrˆ ؘKÔ¨¨¨N¨O¨OÐ);ÔÔ@UÔ@bÒb×gÐlmÐØ Œ
иÀYÇ_Â_ÑEVÔEVÑ9WÔ9WÑ Œ
Ð6°{°{À7Ç=Â=Á?Ä?Ñ7SÔ7SÑ
Œ
×" 4¤:Ñ Œ
×! $¤)Ñ,rZÚinputsÚnum_items_in_batchcó4| ¦«ttt ¦«¦«¦«¦«}d}ˆfdt |¦«D¦«ŠˆfdD¦«ŠˆfdD¦«Š ¦«Š ¦«Šdjd}dd|dœ}  ||¦«\}} 
|||¦«\} }
j !‰  | |
|¦«\} } | | k}
nd \} }  
| |
|¦«}
 || |
|¦«\}}}} |||||
¦«\}}} | |
| ¦«| ¦«|||
| ¦«| ¦«|| | ¦ « jj+‰jjjjzd
krt-¦«i}jjt0jt0jfvr ¦«|d <jjdkr| ¦«}jrMt>  |j!¦«5}| "¦«ddd¦«n #1swxYwYnj#j"|fi|¤Ž| ¦«jj$z S) Nr…cóRg|]"Šˆfd ¦«D¦«Œ#S)có(i|]\}}||ŒSrr)rÚis €rXú
<dictcomp>z?_UnslothXPOTrainer.training_step.<locals>.<listcomp>.<dictcomp>Ûs#ø€Ð6™t˜q !1a˜”dÐ6rZ)Úitems)rs @€rXz4_UnslothXPOTrainer.training_step.<locals>.<listcomp>Ûs7øø€ÐR¸1Ð6 v§|¢|¡~¤~ÐRrZcó:g|]}t|j¦«ŒSr)r&r2©rÚxrs €rXz4_UnslothXPOTrainer.training_step.<locals>.<listcomp>Üs'ø€ÐVÈ!Õ+¨A¨tÔ/DÑVrZcófg|]-} |jjjj¦«Œ.Sr)Ú tokenize_rowr+ÚconfigÚis_encoder_decoderr2s €rXz4_UnslothXPOTrainer.training_step.<locals>.<listcomp>Ýs7ø€ÐtÐhi$×# A t¤zÔ'8Ô'KÈTÔMbÑtrZÚprompt_input_idsrBÚprompt_attention_maskrfr*rr”)%ÚtrainrSÚnextÚiterÚvaluesÚranger/Ú_prepare_inputsrErcrqr-rzÚdetachr.r“rQÚ global_steprrÚLOMOÚADALOMOÚ_get_learning_rateÚn_gpur¸Úuse_apexÚampÚ
scale_lossÚ optimizerÚbackwardrXr)rr+Ú
batch_sizer\rjr^rbrmrprvrwr­rÚ scaled_losss` ` rXÚ
training_stepz _UnslothXPOTrainer.training_stepÓsøø€ð  Š
Œ
ˆ
õd 6§=¢=¡?¤?Ñ
Ø˜ÔØÀjÑ@QÔ@QÐØVÈvÐØtÐmsÐØ×# FÑð×% fÑØÐ 2Ô9¸!Ô<ˆàÐ 2Ô$Ð%<Ôð
ð
ˆð