DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothNashMDTrainer.cpython-311.pyc
224 lines
49 KiB

2025-08-13 23:50:20 +00:00
§
4$há¶ãódZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZm Z m!Z!m Z m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(m)Z)m*Z*mZm+Z+m,Z,m-Z-mZm.Z.m/Z/ddl+Z+ddlTddl0m1Z1m2Z2dd l3m4Z4ddlZddl5Z6dd
l7m8Z8ddlmZdd l9m:Z:m;Z<d d
d d
d
dœZ=ej>d d e=¬¦«d¦«Z?e1Gdde¦«¦«Z@ Gdde¦«ZAGddeA¦«ZBdS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)&rÚBaseImageProcessorÚBasePairwiseJudger ÚDatasetÚEvalPredictionÚFeatureExtractionMixinÚGeometricMixtureWrapperÚIterableDatasetÚ NashMDConfigÚ
NashMDTrainerÚOnlineDPOTrainerÚOptimizerNamesrÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚSIMPLE_CHAT_TEMPLATEÚTrainerCallbackr Ú empty_cacheÚgenerate_model_cardÚget_comet_experiment_urlÚ
get_rewardÚis_conversationalÚis_peft_availableÚis_wandb_availableÚjinja2Úmaybe_apply_chat_templateÚnnÚosÚselective_log_softmaxÚtextwrapÚtorchÚtruncate_rightÚunwrap_model_for_generation)Ú*)Ú dataclassÚfield)ÚVersion)Ú nullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionscó’tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t ||¦«D]\}}| tj¦«}tj|d| d¦«¬¦«  d¦«}tj
|d¬¦«}||z
} |  | ¦«Œ’ tj |¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)r@Úindex©r@é)
r,ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
ÚlogitsrAÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
ú`/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothNashMDTrainer.pyÚchunked_selective_log_softmaxrZ"s5õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐH€MØÐå%(¨¸Ñ%GÔ%Gð #—¥u¤}Ñ Ýœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`Ða×iÐjlÑmˆÝ œ?¨<¸Ø)Ð,<Ñ<ˆØ×" Ýœ,Ð':ÑØ-×5°v´|ÀA´ÈÌ ÐUVÌÐ6XÑØ Ðócó®eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z ee
ed < d/ˆfd.„ Z ˆxZ S)0ÚUnslothNashMDConfigaö
Configuration class for the [`NashMDTrainer`].
As a subclass of [`OnlineDPOConfig`], it accepts all of that class's arguments and adds the following:
Parameters:
mixture_coef (`float` or `list[float]`, *optional*, defaults to `0.5`):
Logit mixture coefficient for the model and reference model. If a list of floats is provided then the
mixture coefficient is selected for each new epoch and the last coefficient is used for the rest of the
epochs.
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr=z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnor>éréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrCéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéé@éÚsigmoidÚvllmçš™™™™™á?c‘ óÌ|dkrtd|d¦«|dkrtd|d¦«||#dkr
|$dkrd}d }#|ˆ€!d
d lm}t |’¦«d zd ¦«}ˆ|…d
krt d
¦«|…dkrt d¦«t
¦«jdžid|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,|d-|d.|d/| “d0|!“d1|"“d2|#“d3|$“d4|%“d5|&“d6|'“d7|(“d8|)“d9|*“d:|+“d;|,“d<|-“d=|.“d>|/“d?|0“d@|1“dA|2“dB|3“dC|4“dD|5“dE|6“dF|7“dG|8“dH|9“dI|:“dJ|;“dK|<“dL|=“dM|>“dN|?“dO|@“dP|A“dQ|B“dR|C“dS|D“dT|E“dU|F“dV|G“dW|H“dX|I“dY|J“dZ|K“d[|L“d\|M“d]|N“d^|O“d_|P“d`|Q“da|R“db|S“dc|T“dd|U“de|V“df|W“dg|X“dh|Y“di|Z“dj|[“dk|\“dl|]“dm|^“dn|_“do|`“dp|a“dq|b“dr|c“ds|d“dt|e“du|f“dv|g“dw|h“dx|i“dy|j“dz|k“d{|l“d||m“d}|n“d~|o“d|p“d€|q“d|r“d|s“dƒ|t“d„|u“d…|v“d†|w“d‡|x“dˆ|y“d‰|z“dŠ|{“d||“dŒ|}“d|~“dŽ|d|€“d|d‘|‚“d’|ƒ“d“|„“d”|…“d•|†“d–|‡“d—|ˆ“d˜|‰“d™|Š“dš|‹“d›|Œ“dœ|d|Ž“|‘¤Ž||_||_ dS)ŸNgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rCza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rqrrÚunsloth_training_checkpointsrcr)Ú cpu_countrdzUUnsloth: Please set a positive non-zero temperature since your results will be wrong.é
zgUnsloth: Please set a positive non-zero temperature less than 10, since sampling will be quite erratic.Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚreward_model_pathÚjudgeÚmax_new_tokensÚ
max_lengthÚ temperatureÚmissing_eos_penaltyÚ loss_typeÚdataset_num_procÚdisable_dropoutÚuse_vllmÚvllm_model_implÚgpu_memory_utilizationÚds3_gather_for_generationÚmodel_init_kwargs©)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingr„ÚminÚ MathErrorÚsuperÚ__init__rarb)”Úselfr†r‡rˆr‰rrrrrrr“r”r•r–r—r™rrr r­r¿rÿrrrrrrrrrr r
r r r
rrrrrrrarbÚkwargsr„Ú __class__s” €rYrzUnslothNashMDConfig.__init__Js¿ ø€ðh ˜ Ð Õ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ ˜ Ð ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØ Ð  -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆ ˆ Ð " 9 9¡;¤;¨q¡=°!Ñ Ø ˜!Ò Ð ÝÐ
˜
Ð
ÝðFñGôGð
Gð ŒÔðNNN#˜ðN <à#7Ð#7ðN Ngð N
$˜ð N *˜
N$8Ð#7ðN+FÐ*EðN*DÐ)CðN(@Ð'?ðN'>Ð&=ðN+FÐ*EðN'>Ð&=ðN$˜ðN'>Ð&=ðN*˜Mð!N <ð"(˜<ð#N <ð$$˜ð%N <ð&$˜ð'N <ð((˜<ð)N <ð**˜Mð+N <ð,/ð-N <ð."˜ ð/N <ð0!2Ð 1ð1N <ð2(˜<ð3N <ð4(˜<ð5N <ð6"˜ ð7N <ð8!2Ð 1ð9N <ð:/ð;N <ð<&˜+ð=N <ð>/ð?N <ð@"4Ð!3ðAN <ðB*˜MðCN <ðD&<Ð%;ðEN <ðF*˜MðGN <ðH$˜ðIN <ðJ/ðKN <ðL/ðMN <ðN!2Ð 1ðON <ðP.˜oðQN <ðR7^Ð6]ðSN <ðTgðUN <ðVgðWN <ðX,˜^ðYN <ðZ4ð[N <ð\"˜ ð]N <ð^*˜Mð_N <ð` xðaN <ðb4ðcN <ðd4ðeN <ðf,˜^ðgN <ðh&<Ð%;ðiN <ðj,˜^ðkN <ðl,˜^ðmN <ðn4ðoN <ðp$˜ðqN <ðr&˜+ðsN <ðt*˜MðuN <ðv!2Ð 1ðwN <ðxEðyN <ðz$8Ð#7ð{N <ð|$˜ð}N <ð~&<Ð%;ðN <ð@*DÐ)CðAN <ðB$˜ðCN <ðD xðEN <ðF(˜<ðGN <ðH%:Ð$9ðIN <ðJ&˜+ðKN <ðL&<Ð%;ðMN <ðN%:Ð$9ðON <ðP!2Ð 1ðQN <ðR/ðSN <ðT4ðUN <ðV#6Ð"5ðWN <ðX&˜+ðYN <ðZ2TÐ1Sð[N <ð\"4Ð!3ð]N <ð^"˜ ð_N <ð`&<Ð%;ðaN <ðbEðcN <ðd$˜ðeN <ðf"˜ ðgN <ðh.˜oðiN <ðj"4Ð!3ðkN <ðl"˜ ðmN <ðn*DÐ)CðoN <ðp!2Ð 1ðqN <ðr%:Ð$9ðsN <ðt%:Ð$9ðuN <ðv-JÐ,IðwN <ðx#6Ð"5ðyN <ðz*DÐ)Cð{N <ð|&˜+ð}N <ð~&<Ð%;ðN <ð@(˜<ðAN <ðB(˜<ðCN <ðD"˜ ðEN <ðF/ðGN <ðH.˜oðIN <ðJ(˜<ðKN <ðL&<Ð%;ðMN <ðN-JÐ,IðON <ðP*DÐ)CðQN <ðR&<Ð%;ðSN <ðT(˜<ðUN <ðV$8Ð#7ðWN <ðX(@Ð'?ðYN <ðZ!2Ð 1ð[N <ð\*˜Mð]N <ð^$8Ð#7ð_N <ð`/ðaN <ðb&˜+ðcN <ðd"˜ ðeN <ðf&˜+ðgN <ðh*˜MðiN <ðj%:Ð$9ðkN <ðl"4Ð!3ðmN <ðn)BÐ(AðoN <ðp-JÐ,IðqN <ðr#6Ð"5ðsN <ðt$8Ð#7ðuN <ðv"4Ð!3ðwN <ðx*˜MðyN <ðz/ð{N <ð|#6Ð"5ð}N <ð~&<Ð%;ðN <ð@-JÐ,IðAN <ðB!2Ð 1ðCN <ðDEðEN <ðF,˜^ðGN <ðH$˜ðIN <ðJ&˜+ðKN <ðL#6Ð"5ðMN <ðN"˜ ðON <ðP/ðQN <ðR.˜oðSN <ðT xðUN <ðV.˜oðWN <ðX&<Ð%;ðYN <ðZ)BÐ(Að[N <ð\!2Ð 1°Fð]NNN <ð^%9ˆÔ!Ø"4ˆÔÐÐr[)NNFFFrcFr>r>NNrdrdrrerfrgrhrirjrkrlr=rmrnrrorpTNrqFrCFrqrrNTFFFFFFrsrsFFFFrtruFFNr=NNFrvFNrNr=NNTNFNNFrvrNNNNrwrxNFFryNNNNTFTFFNNrzNNFNFNFTruNNNrvTFNr{r|FNNFFNNFFFNFTNNr}r~rhNrNTFr€rTNNr=)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__r1rarrÚ__annotations__rbÚintrÚ
__classcell__©rs@rYr]r]3szø€ð ð ð+0¨%ØØÐ+ñ+ô+И( 3œ-ððñð*/¨ØØÐ*ñ*ô*И #œððñð ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØØØØØØØ!&ØØØØØØ27ØØØØØØØØØØØ!'ØØØØØØØØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1ØØ!$ØØØØØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ ØØØØØ $Ø!Ø$)Ø(-ØØ Ø"Ø!&Ø(,Ø ØØØØØØØØØ Ø!%Ø$(Ø Øðcrrrrrrrrrr5r[r]c ó"eZdZdZddgZ d%deeejfdeeejfdeeejdfd e e
d
e e d e e d e ee
efd
e ee
eee
ffde eeeeefde ede e egefde eedeejjejjjfde e ejejgejfddfˆfd
Ze d¦«Z!dZ"dZ#dZ$dZ%dZ&dZ' ddZ( d&dejdeeeeje)ffde e*dejfd „Z+ d'd!e ed"e ed#eeeedffd$„Z,ˆxZ-S)(Ú_UnslothNashMDTrainerrvÚtrlznash-mdN©NNÚmodelÚ ref_modelÚ reward_modelrÚargsÚ
data_collatorÚ
train_datasetÚ eval_datasetÚprocessing_classÚ peft_configÚcompute_metricsÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚreturncóìt¦« ||||||||| | |
| | |
|¬¦«|jj|_ggggggggggggdœ |_|jg|jd<g|jd<dSdS)N)r+r,r-rr.r/r0r1r2Úreward_processing_classr3r4r5r6r7) úloss/klúobjective/entropyú
loss/scoreúrewards/probabilitiesúrewards/accuraciesúrewards/marginsú logps/chosenúlogps/rejectedúval/model_contain_eos_tokenúval/ref_contain_eos_tokenÚbetaÚ mixture_coefúrewards/chosenúrewards/rejected)rrr.rFÚ
_mixture_coefÚstatsr-)rr+r,r-rr.r/r0r1r2r3r4r5r6r7rs €rYrz_UnslothNashMDTrainer.__init__„ø€õ& Œ×ÒØØØØØ-Ø$4ØØ!Ø*Gð ñ
ô
ð
ð$"œYÔ3ˆÔð Ø!#ØØ%'Ø"$ØØ Ø+-Ø)+ØØð
ð
ˆŒ
ð Ô Ð (Ø+-ˆDŒJÐ (Ø-/ˆDŒJÐ  (r[cóÀt|jt¦«r>|jj}|t |j¦«kr
|j|n |jdS|jS)Nr=)Ú
isinstancerIÚlistÚstateÚepochÚlen)rrOs rYrFz"_UnslothNashMDTrainer.mixture_coefÀs\å (­ ”JÔ$ˆEØ05½¸DÔ<NÑ8OÔ8OÒ0OÐ0O% ,ÐUYÔUgÐhjÔUkÐ Ô %r[có¤t||j¦«5}| |d|d|j¬¦«}ddd¦«n #1swxYwY|j |¦«}|j€;t
¦«r*t|t¦«r|  ¦«}n"|}n|j |j¦«}tj ¦«5t|||j|j
|jj¬¦«}| |d|d|j¬¦«}ddd¦«n #1swxYwY||fS)NÚ input_idsÚattention_mask)rRrSÚgeneration_config)r+r,rTrFÚdevice)r.Ú acceleratorÚgeneraterTÚ unwrap_modelr,r$rLrÚget_base_modelr,Úno_gradrrFrU) rr+ÚpromptsÚunwrapped_policy_for_gen_ctxÚ model_outputÚpolicy_model_for_gmwÚref_model_for_gmwÚ
mixture_modelÚmixture_outputs rYÚ_generate_completionsz+_UnslothNashMDTrainer._generate_completionsÈå
°Ô0@Ñ
 ÐEaØ7×! &Ð'7Ô8Ø"&Ô"8ðAñôˆ ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð/×<¸ð
Œ>Ð 
9¥zÐ2FÍ Ñ'RÔ'Rð
9Ø$8×$GÒ$GÑ$IÔ$IÐ%9Ð!%Ô 0× =Ò =¸d¼nÑ MÔ MÐ õŒ]‰_Œ_ð
ð
Ý+Ø"&Ô"8ØÔ ñôˆMð+×! &Ð'7Ô8Ø"&Ô"8ðôˆ
ð
ð
ñ
ô
ð
ð
ð
ð
ð
ð
øøøð
ð
ð
ð
ð˜+s$*A Á AÁAÃ%AEÅEÅ
Ecó|djd}|dd|df}t||jj|jj¦«\}}t j|d|fd¬¦«t j|d|fd¬¦«|ddœ}|dd|df}t||jj|jj¦«\}} t j|d|fd¬¦«t j|d| fd¬¦«|ddœ}
||
fS)NrRrCrBrSÚraw©rRrSrd)rFr-r2Ú eos_token_idÚ pad_token_idr,Úcat) rr]rar[Úcontext_lengthÚmodel_completion_idsÚmodel_completion_maskÚ
model_dataÚmixture_completion_idsÚmixture_completion_maskÚ mixture_datas rYÚ_process_completionsz*_UnslothNashMDTrainer._process_completionsøsKØ  Ô3°AÔð ,¨A¨A¨A¨~¨¨Ð,>ÔÝ6DØ  $Ô"7Ô"DÀdÔF[ÔFhñ7
ô7
ÑМ G¨KÔ$8Ð:NÐ#OÐUVÐ#œi¨Ð1AÔ)BÐDYÐ(ZÐ`aИ5”>ð
ð
ˆ
ð"0°°°°>°?°?Ð0BÔ!CÐÝ:HØ " DÔ$9Ô$FÈÔH]ÔHjñ;
ô;
ÑÐ 7õœ G¨KÔ$8Ð:PÐ#QÐWXÐ#œi¨Ð1AÔ)BÐD[Ð(\ÐbcИ5”>ð
ð
ˆ ð ˜'r[có8tj¦«5t|j|d|jj|¦«\}}}t|j|d|jj|¦«\}}}ddd¦«n #1swxYwY|jjŠtj|d|jj kd¬¦«}tj|d|jj kd¬¦«}||xx|jjzcc<||xx|jjzcc<||fS)NrRr=rB)
r,rZr"r-r2rgr.r Úanyrf) rrlroriÚ model_scoresÚmixture_scoresÚmodel_contain_eosÚmixture_contain_eoss rYÚ_compute_rewardsz&_UnslothNashMDTrainer._compute_rewardssoÝ
Œ]‰_Œ_ð ð Ý!+ØÔ! :¨kÔ#:¸DÔ<QÔ<^Ð`nñ"ô"Ñ ˆAˆ|˜$.ØÔ!  Ô#<¸dÔ>SÔ>`Ðbpñ$ô$Ñ ˆAˆ~˜qð  ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð Œ9Ô 4Ý %¤ ¨*°[Ô*AÀTÔEZÔEgÒ*gÐmoÐ pÑ pÔ pÐ Ý"'¤)¨L¸Ô,EÈÔI^ÔIkÒ,kÐqsÐ"tÑ"tÔ"tÐ Ø Ð ´ Ô0MÑ Ð 0°D´IÔ4QÑ ˜+s”AA7Á7A;Á>A;c óº‡ —|d}|j |ddd|dfd¬¦«}d|D¦«}|j |ddd|dfd¬¦«}d|D¦«}td|di¦«rod „|D¦«}tj¦«}| t ¦«Š ˆ fd
|D¦«}ˆ fd |D¦«}d |D¦«}ˆ fd
|D¦«}|j |tt||¦«¦«d¬¦«}tj ||dj ¬¦«S)NrdrRT)Úskip_special_tokenscó6g|]}| ¦«ŒSr©Ústrip©Ú.0Ú
completions rYú
<listcomp>z8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp>*s$Ð!^Ð!^Ð!^¸ *×"2Ò"2Ñ"4Ô"4Ð!^Ð!^Ð!^r[có6g|]}| ¦«ŒSrr|r~s rYrz8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp>/s$Ð#bÐ#bÐ#b¸: J×$4Ò$4Ñ$6Ô$6Ð#bÐ#bÐ#br[Úpromptrcóg|]}d|dœgŒ S©Ú assistant)ÚroleÚcontentrr~s rYrz8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp>1s0ð&ð&ð&ØCM˜+°*Ð&ð&ð&r[có<g|]} |¬¦«ŒS©)Úmessages©Úrender)rÚmessageÚtemplates €rYrz8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp>6s'ø€ÐP¸Wx—°ÑPr[có<g|]} |¬¦«ŒS©rr€rs €rYrz8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp>7s'ø€Ð%tÐ%tÐ%tÈz h§o¢o¸z oÑ&JÔ&JÐ%tÐ%tÐ%tr[cóg|]}d|dœgŒ Sr…rr~s rYrz8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp>9s0ð(ð(ð(ØCM˜+°*Ð(ð(ð(r[có<g|]} |¬¦«ŒSrs €rYrz8_UnslothNashMDTrainer._compute_judge.<locals>.<listcomp><s4ø€ð(ð(ð(Ø9C¨Ñ(ð(ð(r[)Ú
return_scores)rU)
r2Ú batch_decoder#r&Ú EnvironmentÚ from_stringrrrMrGr,ÚtensorrU)
rrlrorir[Úmodel_data_completionsÚmixture_data_completionsÚ environmentÚ probabilityrs
@rYÚ_compute_judgez$_UnslothNashMDTrainer._compute_judge%ø€Ø˜Ø!%Ô!6×!CÒ!CØ  # A A A ~  Ð$6Ô 7ÈTð"Dñ"
ô"
Ðð"_Ð!^ÐG]Ð!^Ñ!^Ô!^Ðà#'Ô#8×#EÒ#EØ ˜Ô % a a ¨¨Ð&8Ô 9Ètð$Fñ$
ô$
Ð ð$cÐ#bÐIaÐ#bÑ#bÔ#bÐ Ý ˜°¬
Ð  ð&ð&ØQgð&ñ&ô&Ð .ˆ"×.Õ/CÑDˆÐPˆGØ%tÐ%tÐ%tÐ%tÐ]sÐ%tÑ%tÔ%tÐ (ð(ØQið(ñ(ô(Ð (ð(ð(ð(ØG_ð(ñ(ô(Ð ”j× Ý Ð+Ð-EÑ ð
ô
ˆ õ
Œ|˜
¸;Ô0GÔ0NÐOr[có®ˆfd}|||¦«}tj¦«5|j€9| ¦«5|||¦«}ddd¦«n #1swxYwYn||j|¦«}ddd¦«n #1swxYwY|ddddfdk}| |d¦«}| |d¦«}||fS)Ncóª||d|d¬¦«}|jdddz
df}t||ddddf¦«}|S)NrRrS)rSrCr=)rPr*)ÚdataÚoutputrPÚtoken_logprobsris €rYÚcompute_logprobs_for_datazJ_UnslothNashMDTrainer._compute_logprobs.<locals>.compute_logprobs_for_dataHspø€ØQt˜(¸Ð>NÔ9OÐPˆ”] 1 1 1 n°qÑ&8¸2Ð&=Ð#=Ô>ˆ2°6¸ Ô;LÈQÈQÈQÐP^ÐP_ÐP_ÐM_Ô;`Ñaˆ !r[rSrrw)r,rZr,Údisable_adapterÚ masked_fill)rr+rlriÚmodel_logprobs_model_dataÚref_logprobs_model_dataÚmodel_padding_masks ` rYÚ_compute_logprobsz'_UnslothNashMDTrainer._compute_logprobsGs´ø€ð%>Ð$=¸eÀZÑ$PÔ$PÐŒ]‰_Œ_ð `ð `ØŒ~Ð×[ð[Ø.GÐ.GÈÈzÑ.ZÔ.ZÐ[ð[ð[ñ[ô[ð[ð[ð[ð[ð[ð[øøøð[ð[ð[ð[øð+DÐ*CÀDÄNÐT^Ñ*_Ô*_Ð  `ð `ð `ñ `ô `ð `ð `ð `ð `ð `ð `øøøð `ð `ð `ð `ð(Ð(8Ô9¸!¸!¸!¸^¸_¸_Ð:LÔMÐQRÒØ$=×$IÒ$IÐJ\Ð^aÑ$bÔ$bÐ!Ø"9×"EÒ"EÐFXÐZ]Ñ"^Ô"^Ðà)Ð+BÐCs5¦BÁ
AÁ BÁA ÁBÁ"A Á#BÂBÂ Bcó:|dz
| d¦«z}tj¦«5||z
}| d¦«}ddd¦«n #1swxYwY||z d¦«}|j|z|z
}| ¦«||fS)Ngà?rC)Úsumr,rZrEÚmean) rÚscoreÚ log_ratioÚ
kl_div_logÚ kl_div_lossÚlosss rYÚ_compute_lossesz%_UnslothNashMDTrainer._compute_losses`ð˜"Ð&?×&CÒ&CÀAÑ&FÔ&FÑFˆõŒ]‰_Œ_ð1Ð4KÑKˆš )ˆ *øøøð!Ð#<Ñ<×AÀ!ÑDˆ ðŒy˜;ÑÑ.ˆàyŠy‰{Œ{˜E :Ð-s¯AÁAÁAc ófˆfd} jd | |¦«¦«jd | |¦«¦«| d¦«} | d¦«}
jd | | ¦«¦«jd | |
¦«¦«jR‰jd | | ¦«¦«jd | |
¦«¦«jd  | |¦«¦«| d¦« }jd
 | |¦«¦«| |
z
}jd  | |¦«¦«|d k ¦«}jd
 | |¦«¦«|ddd|dfjjk d¬¦«}|ddd|dfjjk d¬¦«}jd | | ¦«¦«¦«jd | | ¦«¦«¦«jd j¦«jd j ¦«dS)Ncó€j |¦« ¦« ¦«S©N)rVÚgather_for_metricsr­Úitem)r˜rs €rYÚ gather_meanz:_UnslothNashMDTrainer._log_statistics.<locals>.gather_means2ø€ØÔ#×6°vÑ>×E× Lr[r=r;rCrArBrGrHr>r<r@rr?rRrBrCrDrErF)
rJrNr-Úfloatr2rfrrrErF)rrlroÚkl_divrirtruÚmodel_logprobs_model_data_sumÚref_logprobs_model_data_sumÚentropy_model_dataÚmarginÚaccuracyÚ model_eosÚ mixture_eoss` rYÚ_log_statisticsz%_UnslothNashMDTrainer._log_statisticsts-ø€ð Mð Mð Mð Mð Mð
Œ
 × ¨ °EÑ(:Ô(:Ñ Œ
×$ [ Ñ%8Ô%8Ñ)B×(EÒ(EÀaÑ(HÔ(HÐ%Ø&=×&AÒ&AÀ!Ñ&DÔ&DÐ Œ
"×)¨+¨+Ð6SÑ*TÔ*TÑ Œ
Ð$×+¨K¨KÐ8SÑ,TÔ,TÑ Ô Ð ŒJÐ (× ° ¸LÑ0IÔ0IÑ ŒJÐ 1°+°+¸nÑ2MÔ2MÑ 
Œ
Ð+×2°;°;¸{Ñ3KÔ3KÑ8×;¸Ø Œ
Ð'×.¨{¨{Ð;MÑ/NÔ/NÑ/Ð1LÑØ Œ
Ð,¨[¨[¸Ñ-@Ô-@јQJ×'ˆØ Œ
Ð(× ° ¸HÑ0EÔ0EÑ   Ô,¨Q¨Q¨Q°°°Ð-?Ô@ÀDÔDYÔDfÒf×kÐpqÐrˆ Ø# KÔ°°°N°O°OÐ1CÔÔH]ÔHjÒj×oÐtuÐ Ø Œ
иÀYÇ_Â_ÑEVÔEVÑ9WÔ9WÑ Œ
Ð/×6°{°{À;×CTÒCTÑCVÔCVÑ7WÔ7WÑ
Œ
×! $¤)Ñ Œ
"×)¨$Ô*;Ñ<r[ÚinputsÚnum_items_in_batchc ó$| ¦«ttt ¦«¦«¦«¦«}d}ˆfdt |¦«D¦«ŠˆfdD¦«ŠˆfdD¦«Š ¦«Š ¦«Šdjd}dd|dœ}  ||¦«\}} 
|||¦«\} }
j 2‰  | |
|¦«\} } tj| | z
¦«}
nd \} }  | |
|¦«}
 || |¦«\}} |||
¦«\}}} | |
| ¦«||
| ¦«| ¦«|| | ¦
«
jj+‰jjjjzd
krt1¦«i}jjt4jt4jfvr ¦«|d <jjdkr| ¦«}j rMtB "|j#¦«5}| $¦«ddd¦«n #1swxYwYnj%j$|fi|¤Ž| ¦«jj&z S) NrƒcóRg|]"Šˆfd ¦«D¦«Œ#S)có(i|]\}}||ŒSrr)rÚis €rYú
<dictcomp>zB_UnslothNashMDTrainer.training_step.<locals>.<listcomp>.<dictcomp>·s#ø€Ð6™t˜q !1a˜”dÐ6r[)Úitems)rs @€rYrz7_UnslothNashMDTrainer.training_step.<locals>.<listcomp>·s7øø€ÐR¸1Ð6 v§|¢|¡~¤~ÐRr[có:g|]}t|j¦«ŒSr)r'r2©rÚxrs €rYrz7_UnslothNashMDTrainer.training_step.<locals>.<listcomp>¸s'ø€ÐVÈ!Õ+¨A¨tÔ/DÑVr[cófg|]-} |jjjj¦«Œ.Sr)Ú tokenize_rowr+ÚconfigÚis_encoder_decoderr2s €rYrz7_UnslothNashMDTrainer.training_step.<locals>.<listcomp>¹s7ø€ÐtÐhi$×# A t¤zÔ'8Ô'KÈTÔMbÑtr[Úprompt_input_idsrCÚprompt_attention_maskrer*rr•)'ÚtrainrPÚnextÚiterÚvaluesÚranger/Ú_prepare_inputsrFrbrpr-rxrrrÚdetachr.r”rNÚ global_steprrÚLOMOÚADALOMOÚ_get_learning_rateÚn_gpur­Úuse_apexÚampÚ
scale_lossÚ optimizerÚbackwardrVr)rr+Ú
batch_sizer[rir]rarlrortrurÚ scaled_losss` ` rYÚ
training_stepz#_UnslothNashMDTrainer.training_step¯sløø€ð  Š
Œ
ˆ
õd 6§=¢=¡?¤?Ñ
Ø˜ÔØÀjÑ@QÔ@QÐØVÈvÐØtÐmsÐØ×# FÑð×% fÑØÐ 2Ô9¸!Ô<ˆàÐ 2Ô$Ð%<Ôð
ð
ˆð
ð(,×'AÒ'AÀ%ÈÑ'QÔ'QÑ$ˆ $(×#<Ò#<¸\È>Ð[bÑ#cÔ#cÑ ˆ
 Ô Ð (Ø+/×+@Ò+@ÀÈ\Ð[iÑ+jÔ+jÑ (ˆL˜œ) L°>Ñ$AÑBˆKˆKà+5Ñ (ˆL˜×-¨j¸ÑWˆ>B×=SÒ=SÐTYÐ[eÐguÑ=vÔ=vÑ!Ð#:ð#×2Ð3LÐNeÐgrшˆe
×ÒØ Ø Ø Ø LŠL‰NŒNØ MŠM‰OŒOØ Ø Ø ñ