DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothBCOTrainer.cpython-311.pyc
447 lines
90 KiB
2025-08-13 23:50:20 +00:00
§
3$hÉWãó¤dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(m)Z)m Z m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;m<Z<m=Z=m>Z>mZm?Z?m@Z@mAZAmBZBmCZCmDZDmEZEmFZFmGZGmHZHmIZImZmJZJmKZKmZm
Z
m Z m!Z!m'Z'm7Z7m=Z=mAZAmZddlAZAddlTddlLmMZMmNZNdd lOmPZPddlZddlQZ?dd
lRm@Z@ddlmZdd lSmTZTmUZVd d
d d
d
dœZWejXd d eW¬¦«d¦«ZYeMGdde¦«¦«ZZ Gdde'¦«Z[Gdde[¦«Z\dS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)KrÚAutoModelForCausalLMÚ BCOConfigÚ
BCOTrainerÚBaseImageProcessorÚCLF_NAMEr ÚDPODataCollatorWithPaddingÚ DataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFeatureExtractionMixinÚLiteralÚLogisticRegressionrÚ PartialStateÚPathÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚ RUNNING_NAMEÚRunningMomentsÚSequentialSamplerÚTrainerÚTrainerCallbackÚTrainingArgumentsr Ú_process_tokensÚ _tokenizeÚautocastÚcontextmanagerÚcreate_reference_modelÚ defaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚ
has_lengthÚinspectÚis_comet_availableÚis_joblib_availableÚis_peft_availableÚis_sklearn_availableÚis_wandb_availableÚ
itemgetterÚjoblibÚlog_table_to_comet_experimentÚloggerÚmaybe_apply_chat_templateÚnnÚnpÚ nullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_deepspeedÚprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚtqdmÚwarningsrrrrr$r4r:r?rH)Ú*)Ú dataclassÚfield)ÚVersion)r>)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionscó’tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t ||¦«D]\}}| tj¦«}tj|d| d¦«¬¦«  d¦«}tj
|d¬¦«}||z
} |  | ¦«Œ’ tj |¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)r[Úindex©r[é)
rHÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
Úlogitsr\Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
ú]/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothBCOTrainer.pyÚchunked_selective_log_softmaxru"s5õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐH€MØÐå%(¨¸Ñ%GÔ%Gð #—¥u¤}Ñ Ýœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`Ða×iÐjlÑmˆÝ œ?¨<¸Ø)Ð,<Ñ<ˆØ×" Ýœ,Ð':ÑØ-×5°v´|ÀA´ÈÌ ÐUVÌÐ6XÑØ Ðócó´eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z ee
ed < d0ˆfd/„ Z ˆxZ S)1ÚUnslothBCOConfiguù
    Configuration class for the [`BCOTrainer`].

    This class includes only the parameters that are specific to BCO training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
            to use the default data collator.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt. This argument is required if you want to use the default data collator.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion. This argument is required if you want to use the default data collator
            and your model is an encoder-decoder.
        beta (`float`, *optional*, defaults to `0.1`):
            Parameter controlling the deviation from the reference model. Higher β means less deviation from the
            reference model.
        label_pad_token_id (`int`, *optional*, defaults to `-100`):
            Label pad token id. This argument is required if you want to use the default data collator.
        padding_value (`int` or `None`, *optional*, defaults to `None`):
            Padding value to use. If `None`, the padding value of the tokenizer is used.
        truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
            Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
            This argument is required if you want to use the default data collator.
        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model and reference model.
        generate_during_eval (`bool`, *optional*, defaults to `False`):
            If `True`, generates and logs completions from both the model and the reference model to W&B or Comet
            during evaluation.
        is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
            When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
            you need to specify if the model returned by the callable is an encoder-decoder model.
        precompute_ref_log_probs (`bool`, *optional*, defaults to `False`):
            Whether to precompute reference model log probabilities for training and evaluation datasets. This is
            useful when training without the reference model to reduce the total GPU memory needed.
        model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
            string.
        ref_model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model
            from a string.
        dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
        prompt_sample_size (`int`, *optional*, defaults to `1024`):
            Number of prompts that are fed to density ratio classifier.
        min_density_ratio (`float`, *optional*, defaults to `0.5`):
            Minimum value of the density ratio. The estimated density ratio is clamped to this value.
        max_density_ratio (`float`, *optional*, defaults to `10.0`):
            Maximum value of the density ratio. The estimated density ratio is clamped to this value.
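
The `min_density_ratio` and `max_density_ratio` parameters described above bound an estimated density ratio. The sketch below is only an illustration of that clamping, not the code stored in this compiled file: the function name `density_ratio_weight`, the `eps` term, and the odds formula `p / (1 - p)` are assumptions made for the example. It takes a classifier probability that a prompt belongs to the desirable set and converts it to a bounded ratio.

```python
def density_ratio_weight(prob_desirable, min_ratio=0.5, max_ratio=10.0, eps=1e-8):
    """Convert a classifier probability into a clamped density ratio.

    prob_desirable is a hypothetical probability in [0, 1] that a prompt
    comes from the desirable dataset. The defaults mirror the documented
    defaults of min_density_ratio and max_density_ratio.
    """
    # Odds of "desirable" versus "undesirable"; eps avoids division by zero.
    ratio = prob_desirable / (1.0 - prob_desirable + eps)
    # Clamp the estimate into [min_ratio, max_ratio].
    return max(min_ratio, min(max_ratio, ratio))


weights = [density_ratio_weight(p) for p in (0.1, 0.5, 0.99)]
```

With the default bounds, a very low probability is raised to the lower bound and a very high one is capped at the upper bound, so a single extreme classifier output cannot dominate a weighted loss.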
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrXz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnorYéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsr^éôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastééééœÿÿÿÚkeep_endçà?ç$@c” óŠ|dkrtd|d¦«|dkrtd|d¦«||#dkr
|$dkrd}d }#|Ž€!d
d lm}•t |•¦«d zd ¦«t ¦«jdžid
|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,| “d-|!“d.|"“d/|#“d0|$“d1|%“d2|&“d3|'“d4|(“d5|)“d6|*“d7|+“d8|,“d9|-“d:|.“d;|/“d<|0“d=|1“d>|2“d?|3“d@|4“dA|5“dB|6“dC|7“dD|8“dE|9“dF|:“dG|;“dH|<“dI|=“dJ|>“dK|?“dL|@“dM|A“dN|B“dO|C“dP|D“dQ|E“dR|F“dS|G“dT|H“dU|I“dV|J“dW|K“dX|L“dY|M“dZ|N“d[|O“d\|P“d]|Q“d^|R“d_|S“d`|T“da|U“db|V“dc|W“dd|X“de|Y“df|Z“dg|[“dh|\“di|]“dj|^“dk|_“dl|`“dm|a“dn|b“do|c“dp|d“dq|e“dr|f“ds|g“dt|h“du|i“dv|j“dw|k“dx|l“dy|m“dz|n“d{|o“d||p“d}|q“d~|r“d|s“d€|t“d|u“d|v“dƒ|w“d„|x“d…|y“d†|z“d‡|{“dˆ||“d‰|}“dŠ|~“d|dŒ|€“d|dŽ|‚“d|ƒ“d|„“d‘|…“d’|†“d“|‡“d”|ˆ“d•|‰“d–|Š“d—|‹“d˜|Œ“d™|dš|Ž“d›|dœ|d|‘“|”¤Ž||_|“|_dS)ŸNgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!r^za` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rŒrÚunsloth_training_checkpointsr~r)Ú cpu_countrÚ
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚdisable_dropoutÚgenerate_during_evalÚis_encoder_decoderÚprecompute_ref_log_probsÚmodel_init_kwargsÚref_model_init_kwargsÚdataset_num_procÚprompt_sample_sizeÚmin_density_ratioÚmax_density_ratio©) ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingr ÚminÚsuperÚ__init__r|r})—Úselfr¡r­r¿rÿrrrrrrrrrr r
r r r
rrrrrrrrrrrrrrrrrrr r!r"r#r$r%r&r'r(r)r*r+r,r-r.r/r0r1r|r}Úkwargsr Ú __class__s— €rtr8zUnslothBCOConfig.__init__wø€ðn ˜ Ð Õ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ ˜ Ð ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØ Ð  -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆ ˆ Ð " 9 9¡;¤;¨q¡=°!Ñ àŒÔðQQQ#˜ðQ <à#7Ð#7ðQ Qgð Q
$˜ð Q *˜
Q$8Ð#7ðQ+FÐ*EðQ*DÐ)CðQ(@Ð'?ðQ'>Ð&=ðQ+FÐ*EðQ'>Ð&=ðQ$˜ðQ'>Ð&=ðQ*˜Mð!Q <ð"(˜<ð#Q <ð$$˜ð%Q <ð&$˜ð'Q <ð((˜<ð)Q <ð**˜Mð+Q <ð,/ð-Q <ð."˜ ð/Q <ð0!2Ð 1ð1Q <ð2(˜<ð3Q <ð4(˜<ð5Q <ð6"˜ ð7Q <ð8!2Ð 1ð9Q <ð:/ð;Q <ð<&˜+ð=Q <ð>/ð?Q <ð@"4Ð!3ðAQ <ðB*˜MðCQ <ðD&<Ð%;ðEQ <ðF*˜MðGQ <ðH$˜ðIQ <ðJ/ðKQ <ðL/ðMQ <ðN!2Ð 1ðOQ <ðP.˜oðQQ <ðR7^Ð6]ðSQ <ðTgðUQ <ðVgðWQ <ðX,˜^ðYQ <ðZ4ð[Q <ð\"˜ ð]Q <ð^*˜Mð_Q <ð` xðaQ <ðb4ðcQ <ðd4ðeQ <ðf,˜^ðgQ <ðh&<Ð%;ðiQ <ðj,˜^ðkQ <ðl,˜^ðmQ <ðn4ðoQ <ðp$˜ðqQ <ðr&˜+ðsQ <ðt*˜MðuQ <ðv!2Ð 1ðwQ <ðxEðyQ <ðz$8Ð#7ð{Q <ð|$˜ð}Q <ð~&<Ð%;ðQ <ð@*DÐ)CðAQ <ðB$˜ðCQ <ðD xðEQ <ðF(˜<ðGQ <ðH%:Ð$9ðIQ <ðJ&˜+ðKQ <ðL&<Ð%;ðMQ <ðN%:Ð$9ðOQ <ðP!2Ð 1ðQQ <ðR/ðSQ <ðT4ðUQ <ðV#6Ð"5ðWQ <ðX&˜+ðYQ <ðZ2TÐ1Sð[Q <ð\"4Ð!3ð]Q <ð^"˜ ð_Q <ð`&<Ð%;ðaQ <ðbEðcQ <ðd$˜ðeQ <ðf"˜ ðgQ <ðh.˜oðiQ <ðj"4Ð!3ðkQ <ðl"˜ ðmQ <ðn*DÐ)CðoQ <ðp!2Ð 1ðqQ <ðr%:Ð$9ðsQ <ðt%:Ð$9ðuQ <ðv-JÐ,IðwQ <ðx#6Ð"5ðyQ <ðz*DÐ)Cð{Q <ð|&˜+ð}Q <ð~&<Ð%;ðQ <ð@(˜<ðAQ <ðB(˜<ðCQ <ðD"˜ ðEQ <ðF/ðGQ <ðH.˜oðIQ <ðJ(˜<ðKQ <ðL&<Ð%;ðMQ <ðN-JÐ,IðOQ <ðP*DÐ)CðQQ <ðR&<Ð%;ðSQ <ðT(˜<ðUQ <ðV$8Ð#7ðWQ <ðX(@Ð'?ðYQ <ðZ!2Ð 1ð[Q <ð\*˜Mð]Q <ð^$8Ð#7ð_Q <ð`/ðaQ <ðb&˜+ðcQ <ðd"˜ ðeQ <ðf&˜+ðgQ <ðh*˜MðiQ <ðj%:Ð$9ðkQ <ðl"4Ð!3ðmQ <ðn)BÐ(AðoQ <ðp-JÐ,IðqQ <ðr#6Ð"5ðsQ <ðt$8Ð#7ðuQ <ðv"4Ð!3ðwQ <ðx*˜MðyQ <ðz/ð{Q <ð|#6Ð"5ð}Q <ð~&<Ð%;ðQ <ð@-JÐ,IðAQ <ðB$˜ðCQ <ðD!2Ð 1ðEQ <ðF%:Ð$9ðGQ <ðH4ðIQ <ðJ"4Ð!3ðKQ <ðL*˜MðMQ <ðN.˜oðOQ <ðP.˜oðQQ <ðR$8Ð#7ðSQ <ðT"4Ð!3ðUQ <ðV(@Ð'?ðWQ <ðX!2Ð 1ðYQ <ðZ%:Ð$9ð[Q <ð\/ð]Q <ð^"4Ð!3ð_Q <ð`!2Ð 1ðaQ <ðb!2Ð 1°FðcQQQ <ðd%9ˆÔ!Ø"4ˆÔÐÐrv)“NNFFFr~FrYrYNNrrrr€rrr„r…r†r‡rXrˆr‰rrTNrŒFr^FrŒrNTFFFFFFrŽFFFFrrFFNrXNNFrFNrNrXNNTNFNNFrrNNNNrr“NFFr”NNNNTFTFFNNr•NNFNFNFTrNNNrTFNrr—FNNFFNNFFFNFTr˜r™Nr‰NrTFNFNNNr˜rNrX)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__rMr|rrÚ__annotations__r}Úintr8Ú
__classcell__©r;s@rtrxrx3s„ø€ð9ð9ðt+0¨%ØØÐ+ñ+ô+И( 3œ-ððñð*/¨ØØÐ*ñ*ô*И #œððñð ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØØØØØØØ!&ØØØØØØ27ØØØØØØØØØØØ!'ØØØØØØØØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1ØØ!$ØØØØØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ ØØØØØ $Ø!Ø$)Ø(-ØØ Ø"Ø!&Ø(,ØØØ $ØØØØ!Ø#(Ø Ø $ØØØ Øðissssssssss5rvrxc$óReZdZdZddgZ dYdeeeje fde
eeeje fde d e
e d
e
ee e
e e ffd e
eeeeefd e
ed
e
egefde
eedeejjejjjfde
eejejgejfde
e
de
eege
fde
e de
e de
ede
ef"ˆfd
Zed¦«Z dej!dej!fdZ"dej#dej#dej!fdZ$de
e eeej#ffdeej!ej!ffd „Z%dZd"e d#e&dej!fd$„Z'ˆfd%„Z(ˆfd&„Z)e*d'„¦«Z+de,fˆfd(„ Z-d[d
e
e de,fˆfd)„
Z.d*e
de
fd+„Z/e0 d\d.ej!d/ej#d0e1d1e&d2e1dej!f d3„¦«Z2dejde
e eeej#ffdeej!ej!ej!ej!ffd4„Z3d5ej!dej!fd6„Z4 d]d8ej!d9ej!d:ej!d;ej!d<e
ej!d5e
ej!d=e1deej!ej!ej!ej!ffd>„Z5 d]de
e eeej#ffd=e1fd?„Z6 d^deeejfd@e
e eeje7ffdeejeeje
e ejffffdA„Z8d_dCe
e e9fdDe:dEddfdF„Z;d[d"e
e de
ej<j=j>fdG„Z?de
e ej#fdee e ffdH„Z@ d[deeejfd@e
e eeje7ffdIe1dJe
ee fdK„ZA d`dMe,dNe dIe
e1dJe
ee dOe def ˆfdP„
ZBd[dQe
e e9fdRe
e9ddfˆfdS„
ZCˆfdT„ZD dadUe
e dVe
e dWee ee dffdX„ZEˆxZFS)bÚ_UnslothBCOTrainerrÚtrlÚbcoN©NNÚmodelÚ ref_modelÚargsÚ
train_datasetÚ eval_datasetÚprocessing_classÚ
data_collatorÚ
model_initÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚ peft_configÚcompute_metricsÚmodel_adapter_nameÚref_adapter_nameÚembedding_funcÚembedding_tokenizerc
ó|+t¦«rt¦«std¦«t|¦«turt d¦«t
|t¦«s|||urt d¦«|ji}nªt
|t¦«st d¦«|j}|  d¦«}|ht
|t¦«r|dkrtt|¦«}|dkr-t
|tj ¦«st d|d¦«||d<|j
i}nªt
|t¦«st d ¦«|j
}|  d¦«}|ht
|t¦«r|dkrtt|¦«}|dkr-t
|tj ¦«st d|d¦«||d<t
|t¦«rtj|fi|¤Ž}t
|t¦«rtj|fi|¤Ž}d
|_t#¦«s| t d ¦«t#¦«r5| 2t
|t$¦«r| ¦«}t|d d
¦«st|d
d
¦«r`t)|d¦«o,dt+t-jt0¦«j¦«v}d|ji}|r
|j|d<t1|fi|¤Ž}nV|jrOt)|d¦«r| ¦«n*d}| ¦« |¦«|}|jr't|d
d
¦«rtA|¦«d|_nV|jrOt)|d¦«r| ¦«n*d}| ¦« |¦«|j!r+tE¦«stG¦«st d¦«||j$j%|_%n"|j%t d¦«|j%|_%t#¦«ot
|t$¦«|_&||_'||_(|r||_)n*|j&s|j*rd|_)ntW|¦«|_)|t d¦«|j,t[j.dt^¦«d}|j,|j,}|j0t[j.dt^¦«d}|j0|j0}d}|j1€#|j%rt[j.dt^¦«d}|j1|j%r|j1}|€Qte|j3|j4|j%¬¦«}|j5r!d
|_5t[j.dt^¦«d|_6nd
|_6|j7r*tq|¦«|j)tq|j)¦«||_,|j!|_!|j4|_4|j9|j9n|j3|_9||_0|j:|_:||_1|j*|_*d
|_;d
|_<t{d¦«|_>|j?|_?t|j$dd
¦«|_@t|j$d d!¦«|_A|j@r%|jAd!krt[j.d"t^¦«||_B||_Cd|jDd#<t¦« F¦«5| Gtd$|i|jI¬%¦«}|$| Gtd$|i|jI¬%¦«}| Gt”d||jCd&œ|jId'¬(¦«}d)|j%||j,|j:|j4|j0|j1d*œ}| Gt||jId+¬,¦«}|x| Gt”||jCd&œd|jId-¬.¦«}d)|j%||j,|j:|j4|j0|j1d*œ}| Gt||jId/¬,¦«}| Ld0„|jId1¬2¦«}| Ld3„|jId4¬2¦«}ddd¦«n #1swxYwYt¦« N||||||||
| |
| ¬5¦ « d
|_Ot)|jPd6¦«r|jP Q|jR¦«t)|d7¦«sd8¦«|jTr0|jUjVjWjXd9kr|j*rt d:¦«|j)|j&s|j*st d;¦«nM|jTr t³|j)|jU¦«|_)n&|jU Z|j)d¬<¦«|_)|jU¬=¦«|_\|jB|j]rdS| ^||j_j`¬>¦«}| ^||j_j`¬>¦«}tja||fd?¬@¦«} tjatjb|ddd?f¦«tjc|ddd?f¦«fd?¬@¦«}!tÉdA¬B¦« e|  f¦« g¦« h¦«|! f¦« h¦«¦«|_i|ji j| f¦« g¦« h¦«tjb|ddd?f¦« f¦« h¦«¦«}"|ji j| f¦« g¦« h¦«tjc|ddd?f¦« f¦« h¦«¦«}#t×jldC|"dD|#¦«dS)ENz}BCOTrainer with UDM requires the scikit-learn and joblib libraries. Please install it with `pip install scikit-learn joblib`.z3Please use `BCOConfig` instead `TrainingArguments`.zœ`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the same as `model`, you must mass a copy of it, or `None` if you use peft.zRYou passed model_kwargs to the BCOTrainer. But your model is already instantiated.Ú torch_dtyperznInvalid `torch_dtype` passed to the BCOConfig. Expected a string with either `torch.dtype` or 'auto', but got ú.zZYou passed ref_model_kwargs to the BCOTrainer. But your ref_model is already instantiated.FzŽPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it with `pip install peft` to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrÚuse_gradient_checkpointingÚenable_input_require_gradscó0| d¦«dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputs rtÚmake_inputs_require_gradz=_UnslothBCOTrainer.__init__.<locals>.make_inputs_require_grad sØ×-¨dÑ3rvTcó0| d¦«dSrbrcres rtriz=_UnslothBCOTrainer.__init__.<locals>.make_inputs_require_grad5sØ×)¨$Ñ/rvz`generate_during_eval=True` requires Weights and Biases or Comet to be installed. Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.zdmax_length or a processing_class must be specified when using the default DPODataCollatorWithPaddingz§When using DPODataCollatorWithPadding, you should set `max_length` in the `BCOConfig`. 
It will be set to `512` by default, but you should do it yourself in the future.r™z®When using DPODataCollatorWithPadding, you should set `max_prompt_length` in the `BCOConfig`. It will be set to `128` by default, but you should do it yourself in the future.é€zÜWhen using DPODataCollatorWithPadding with an encoder decoder architecture, you should set `max_completion_length` in the BCOTrainer's init it will be set to `128` by default, but you should do it yourself in the fu
˜!˜Gœ*€rvzFiltering desirable examples)rvrxcó|d Sr{r2r}s rtrpz-_UnslothBCOTrainer.__init__.<locals>.<lambda>s ˜a œj˜.€rvzFiltering undesirable examples) rIrKrOrLrMrNrPrUrQrRrSÚadd_model_tagsÚ acceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.ézrYou cannot use `precompute_ref_log_probs=True` with Deepspeed ZeRO-3. Please set `precompute_ref_log_probs=False`.z]No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`)Úevaluation_mode)r)Ú sample_sizerr]Úbalanced)Ú class_weightz(UDM classifier training scores: chosen: z , rejected: )mr5r3Ú ImportErrorÚtyper&Ú
ValueErrorÚ
isinstanceÚstrr,ÚgetÚgetattrrHÚdtyper-r
Úfrom_pretrainedÚ_peft_has_been_casted_to_bf16r4rÚmerge_and_unloadÚhasattrror1Ú signaturerDÚ
parametersrrr`Úget_input_embeddingsÚregister_forward_hookrÑrBr)r6r2Úconfigr*Ú
is_peft_modelrVrWrJr+r+r!rJÚwarnÚ UserWarningr"r#rrlr%Úuse_dpo_data_collatorr(r-r&r'Ú _precomputed_train_ref_log_probsÚ_precomputed_eval_ref_log_probsr,Ú_stored_metricsr$Úaux_loss_enabledÚ
aux_loss_coefrXrYÚwarnings_issuedrÚmain_process_firstÚmapr;r.r(r'Úfilterr7r8Úmodel_accepts_loss_kwargsrIr€Ú
_tag_namesÚAttributeErrorÚis_deepspeed_enabledrÚstateÚdeepspeed_pluginÚ
zero_stagerCÚ
prepare_modelr"ÚrunningrÿÚ_get_sample_prompt_embeddingsrKr/ÚcatÚ ones_likeÚ
zeros_likerÚfitÚcpuÚfloatÚnumpyÚclfÚscorer:Úinfo)%r9rIrJrKrLrMrNrOrPrQrRrSrTrUrVrWrXrYr,r[r-Ú_support_gc_kwargsÚprepare_model_kwargsrir!r"r#ruÚ desirableÚ undesirableÚchosen_embeddingsÚrejected_embeddingsÚ
embeddingsÚlabelsÚ chosen_meanÚ
rejected_meanr;s% €rtr8z_UnslothBCOTrainer.__init__²ø€ð, Ð %Õ/CÑ/EÔ/EÐ %ÕJ]ÑJ_ÔJ_Ð ðPñôð
õ ‰:Œ:Õ Ð ˜%¥Ñ ¨%Ð*;À ÈUÐ@RÐ@RÝðZñôð
ð
Ô )Ø "РРݘE¥3Ñ
Ð rà $Ô 6Ð Ø
Ñ>ˆÐ˜k­>°KÀ6Ò4IÐ4IÝ")­Ñ"=Ô"= ¸ÌÑ1UÔ1UÐXðJUðXðXðXñôðð4?Ð!  Ô -Ø$&Ð ˜I¥sÑ CÝØôð
ð%)Ô$>Ð /×3°MÑBˆÐ˜k­>°KÀ6Ò4IÐ4IÝ")­Ñ"=Ô"= ¸ÌÑ1UÔ1UÐXðJUðXðXðXñôðð8CÐ% mÑ e  UÝ8¸ÐTÐBSÐTˆEå Ñ  aÝ<¸`ÐJ_Ð`ˆ.3ˆÔ Ñ4 ] {Ð'>Ýðañôð
õÑ
Ô
ñ0 ] [Ñ%<å˜Ñ
×0å1°5Ñ
a½WÀUÐL_ÐafÑ=gÔ=gð
aÝ%,ØÐ&ô&ð&àÝÔ%Õ&EÑ:ô:ðð )EÀdÔFaÐ'bÐoØLPÔLnÐ(Ð)HÑ7¸ÐVÐAUÐVØÔ
aå˜5Ð">ÑaØ××FÐG_шEØŒyð
:W UÐ,?ÀÑ
+¨EÑ2à59Ô2øð
Ô
]å
]Ø××BÐC[Ñ Ô  Õ.@Ñ.BÔ.Bð ÕFXÑFZÔFZð ÝðDñôð
ð
Ð Ø&+¤lÔ&Eˆ
Ô
Ð mà&*Ô&=ˆ QµZÀÅyÑ5QÔ5QˆÔØ"4ˆÔØ 0ˆÔà ð&ˆDŒNˆ
Ô
ð ; 4Ô#@ð!ˆDŒNˆ3°EÑ:ˆDŒNà Ð Øôð
ð Œ?Ð ŒMðcåñ
ô
ð
ð
ˆJØ Œ?Ð œˆJà Ô ŒMðcåñ
ô
ð
ð
!$Ð Ø Ô -Ø $Ô 6Ð à $ÐØ Ô -°$Ô2IÐ ŒMðdåñ
ô
ð
ð
%(Ð Ô 1°dÔ6MÐ 1Ø$(Ô$>Ð Ð Ý:Ø#'Ô#:Ø#'Ô#:ðñôˆMð Ô
Ø-2Ô
ð\åñôðð *.ˆ &à).ˆ  Ô ð $  Œ~ЬÑ$ˆŒØ$(Ô$=ˆÔ!Ø"&Ô"9ˆÔØ37Ô3EÐ3Q˜/ÐWgÔWtˆÔØ!2ˆÔØ3ˆÔØ%:ˆÔ"Ø(,Ô(EˆÔ16ˆÔ-Ø/4ˆÔ +Ð+DÐ+DÑÔð”IˆŒ Ý '¨¬ Ð6LÈeÑ TÔ TˆÔÝ$ U¤\Ð3IÈ3ÑÔØ Ô ð  TÔ%7¸3Ò%>Ð%>Ý ŒMðõñ 
ô
ð
ðÔØ#6ˆÔ ð48ˆÔÐ
‰^Œ^×
H ðH à)×)°kÐCSÐ5TÐ_cÔ_tðôˆÐ+×*Ð,<Ð ô  ðØØ(8ÐQUÔQiÐÔ ôˆØ&*Ô&=Ø"œoØ#'Ô#7Ø&*Ô&=Ø%)Ô%;Ø)-Ô)Cð ð ˆØÔôˆÐ+ר,<ÐUYÔUmÐ Ø  ô  ð!Ø*.Ô*AØ!1Ø"&¤/Ø'+Ô';Ø*.Ô*AØ)-Ô)?Ø-1Ô-Gð ð  ð ,× ô  ð&×$¨tÔ/DÐKiðôˆIð(×(°4Ô3HÐOoðôˆKðMH ðH ðH ñH ôH ðH ðH ðH ðH ðH ðH øøøðH ðH ðH ðH õT Œ×ÒØØØØ!Ø*Gð ñ
ô
ð
ð"*/ˆÔ 4”:Ð  ŒJ× % d¤oÑ t˜]Ñ Ý Øôð
ð
Ô  ØÔÔAÀQÒFÈ4ÔKhÐ ðIñôðð Œ>Ð Ô
¨$Ô*Gð
Ý ØôðøðÔ
fÝ!2°4´>À4ÔCSÑ!TÔ!Tà!%Ô!1×!?Ò!?ÀÄÐ`dÐ!?Ñ!eÔ!eå%°$Ô2BÐŒ à Ô Ð &¨$Ô*EÐ ˆ ×>¸yÐVZÔV_ÔVrÐØ"×ÐZ^ÔZcÔZvÐå”YÐ 1Ð3FÐGÈQÐOˆ
ÝÝ
Œ_Ð.¨q¨q¨q°!¨tÔ
5µuÔ7GÐH[Ð\]Ð\]Ð\]Ð_`Ð\`ÔHaÑ7bÔ7bÐ cÐijð
ñ
ô
ˆõ&°:Ð>× NŠNÑ Ô × $× ,¨f¯jªj©l¬l×.@Ò.@Ñ.BÔ.Bñ
ô
ˆŒð”h—n’nØ × +× 3µU´_ÐEVÐWXÐWXÐWXÐZ[ÐW[ÔE\Ñ5]Ô5]×5aÒ5aÑ5cÔ5c×5iÒ5iÑ5kÔ5kñ
ô
ˆ ðœŸšØ × -× 5µuÔ7GÐH[Ð\]Ð\]Ð\]Ð_`Ð\`ÔHaÑ7bÔ7b×7fÒ7fÑ7hÔ7h×7nÒ7nÑ7pÔ7pñ
ô
ˆ
õ Œ Ðg¸{ÐgÐXeÐhsÜ-E9b2â2b6â9b6có&|jduo|jduSrn)rXrY©r9s rtÚmatch_underlying_distributionz0_UnslothBCOTrainer.match_underlying_distributionKsàÔ"¨$ÐW°4Ô3KÐSWÐ3WÐWrvÚprompt_embeddingsÚreturncó¼|j}|j}|jj}|j ||jj¬¦«}|jd}| d¬¦«|jjk}|j  |¦«}|jddkrtj g||¬¦«S|j  
| ¦« ¦« ¦«¦«dddf}tj|||¬¦«}|j |d¬ ¦«}|||z||dzz}||}|S)
Calculates the probability that the given prompt embedding comes from the desirable dataset. This function computes
the probability on each process and ensembles the results across processes.
)Ú pad_indexrr^r])ÚdevicerŽÚmean)Ú reduction)rÚ
process_indexÚpad_across_processesrYrlrarerHÚtensorr¶Ú
predict_probar³r´Ú as_tensorÚreduce) r9ÚrankÚpadded_prompt_embeddingsr„ÚnonzeroÚprobs rtÚ_get_chosen_probz#_UnslothBCOTrainer._get_chosen_probOsað
'ˆØ)ˆØÔÔ-ˆà#'Ô#3×#HÒ#HØ ¨Ô)AÔ)Nð$Iñ$
ô$
Ð ð4°QÔ Ø/°AÐ6¸$Ô:RÔ:_Ò_ˆØ Ô3Ð4LÑð Ô "  Ò ”< ¨6¸Ð Œx×%Ð&7×&;Ò&;Ñ&=Ô&=×&CÒ&CÑ&EÔ&E×&KÒ&KÑ&MÔ&MÑNÈqÈqÈqÐRSÈtÔÝŒ˜t¨5¸ÐØÔ×& t°vÐ>ˆàK ¸¸Ñ)AÐØGŒ}ˆàˆ rvÚ input_idsÚattention_maskcóætj||jjk|jj|¦«}tj¦«5| ||¬¦«}ddd¦«n #1swxYwY|S)z|
Replaces `processing_class.pad_token_id` with `embedding_tokenizer.pad_token_id` and applies `self.embedding_func`.
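
The pad-token replacement named in this docstring can be sketched in plain Python. Everything here is hypothetical and for illustration only: `swap_pad_token`, `vectorize_prompt`, and the toy `mean_token_embedding` function are not names from the compiled file, and a real embedding function would be a model call on tensors rather than nested lists.

```python
def swap_pad_token(input_ids, source_pad_id, target_pad_id):
    """Replace every source pad id with the target pad id, row by row."""
    return [
        [target_pad_id if tok == source_pad_id else tok for tok in row]
        for row in input_ids
    ]


def vectorize_prompt(input_ids, attention_mask, source_pad_id, target_pad_id, embedding_func):
    """Remap padding to the embedding tokenizer's pad id, then embed."""
    remapped = swap_pad_token(input_ids, source_pad_id, target_pad_id)
    return embedding_func(remapped, attention_mask)


def mean_token_embedding(input_ids, attention_mask):
    """Toy stand-in for an embedding model: mean of the unmasked token ids."""
    return [
        sum(t * m for t, m in zip(row, mask)) / max(1, sum(mask))
        for row, mask in zip(input_ids, attention_mask)
    ]


remapped = swap_pad_token([[5, 6, 0, 0]], source_pad_id=0, target_pad_id=9)
embedding = vectorize_prompt([[5, 6, 0, 0]], [[1, 1, 0, 0]], 0, 9, mean_token_embedding)
```

Because the attention mask already hides the padded positions, the swap matters mainly when the embedding model treats its own pad id specially.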
©N)rHÚwhererNrlrYÚno_gradrX)r9r¿s rtÚ_vectorize_promptz$_UnslothBCOTrainer._vectorize_promptlõ”KØ ˜Ô Ô ñ
ô
ˆ õ Œ]‰_Œ_ð ð Ø×ôˆJð ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð ÐsÁA&Á&A*Á-A*ÚbatchcóL|jsdS| |d|d¬¦«}tj|dtj|j¬¦«}tj|¦«d}tj|¦«d}||df}||df}||fS) z.Extract embeddings from frozen embedding modelrHÚembedding_input_idsÚembedding_attention_maskrÜr|r.)rHÚboolrÊ)r9r¿Ú
chosen_idxÚ rejected_idxr½s rtÚ_get_prompt_embeddingsz)_UnslothBCOTrainer._get_prompt_embeddings~ð
Ô Ø×Ð Ð!;Ô
ô
ˆ
õ
˜e GœnµE´JÀzÔGXÐÝ”[ ÑÔ
Ý”{ F +¨AÔ.ˆ à& z°3 ÔذsÐ):Ôà!Ð#6Ð7rvr™Údatasetr„cóÚtt|¦«|¦«}tj t|¦«|f¬¦«}| |¦«}|jj|j|jj |jj
ddœ}|j   t|fi|¤Ž¦«}tj¦«5tjd¦«}t#|d¬¦«D]g} | | d| d¬ ¦«}
|j  |
¦«}
tj||
 ¦«f¦«}Œh d
d
d
¦«n #1swxYwY|S) zv
Sample instances from dataset and get prompt embeddings. Used for density ratio classifier training.
)ÚsizeF©Ú
batch_sizeÚ
collate_fnÚ num_workersÚ
pin_memoryÚshufflerz!Building sample prompt embeddings©ÚiterablerxN)r6Úlenr=rEÚchoiceÚselectrKrOrÚpreparerrHÚemptyrIÚgather_for_metricsr¯) r9r„Ú n_samplesÚ rand_indicesÚembedding_datasetÚdataloader_paramsÚ data_loaderÚall_embeddingsÚ padded_batchr¿s rtz0_UnslothBCOTrainer._get_sample_prompt_embeddings”õ˜G™ œ  kÑ2ˆ Ý”y×'­¨G© ¬ ¸I¸Hˆ à#ŸNšN¨<Ñðœ)ÔÔœ9Ôœ)Ôð 
ð
ÐðÔ&×.­zÐ:KÐ/aÐ/aÐO`Ð/aÐ/aÑbˆ å
Œ]‰_Œ_ð Oð OÝ"œ[¨™^œ^ˆNÝ $¨kÐ@cÐ dÑ dÔ dð
Oð
O Ø!×*Ð+@ÔAØ#/Ð0JÔ#Kðô
ð-×ÑL
Ý!&¤¨N¸J¿NºNÑ<LÔ<LÐ+MÑ!NÔ!Nð

Oð Oð Oð Oñ Oô Oð Oð Oð Oð Oð Oð Oøøøð Oð Oð Oð OðÐsÃBE Å E$Å'E$có||n |jj}t¦« |¦«|jjr…|j tj  
|t¦«¦«|j rCtj|jtj  
|t ¦«d¬¦«dSdSdS)NT)Úcompress)rKr7Ú_save_optimizer_and_schedulerrÚis_main_processr­Ú save_to_jsonr?ÚpathÚjoinr!r8Údumpr¶r)r9r;s €rtrz0_UnslothBCOTrainer._save_optimizer_and_scheduler´ø€Ø#-Ð#9ZZ¸t¼yÔ?Sˆ
Ý
Œ×-¨jÑ Ô Ô  Yà ŒL× %¥b¤g§l¢l°:½|Ñ&LÔ&LÑ Ô
YÝ ˜DœH¥b¤g§l¢l°:½xÑ&HÔ&HÐSWÐ  Yð Yð
Yð
Yrvcó|tjd|¦«dSt¦« |¦«tj |t¦«}tj |¦«rtj
|j |¦«|_ |j
r_tj |t¦«}tj |¦«rtj|¦«|_dSdSdS)NzMissing Checkpoint )r:Ú warning_oncer7Ú_load_optimizer_and_schedulerr?rrr!Úisfiler"Úload_from_jsonrr­rr8Úloadr¶)r9Ú
checkpointÚ running_fileÚclf_filer;s €rtr
z0_UnslothBCOTrainer._load_optimizer_and_scheduler¿ø€Ø Ð Ý Ô Ð B°jÐ BÐ BÑ ˆFå
Œ×-¨jÑ”w—|’| Jµ Ñ Ý
Œ7>Š>˜  YÝ8¸Ô9IÈ<ÑXˆDŒLà Ô ”w—|’| JµÑ9ˆŒw~Š~˜
!œ; 0ð

1rvc#ózK|jr8|js1|j |j¦« ¦«n
t
¦«5|jr|j |j¦«dV|jr!|j |jpd¦«ddd¦«dS#1swxYwYdS)zWContext manager for handling null reference model (that is, peft adapter manipulation).Nrz) r˜rWrÚ unwrap_modelrIÚdisable_adapterr>Ú set_adapterrVs rtÚnull_ref_contextz#_UnslothBCOTrainer.null_ref_contextÐsèèð
Ô
Ø*.Ô*?ð
ˆDÔ × )¨$¬*Ñ ð Mð Mð
Ô

×& tÔ'<Ñ ˆEˆEˆÔ
MØ
×& tÔ'>Ð'KÀ)Ñ Mð Mð Mñ Mô Mð Mð Mð Mð Mð Mð Mð Møøøð Mð Mð Mð Mð Mð MsÁAB0Â0B4Â7B4có¢|jr'|js|jj|j|jj|jjddœ}|j t|j
fi|¤Ž¦«}g}t|d¬¦«D]X}|  |¦«}|j 
|¦«}| | ¦«¦«ŒY|j
 dt#j|¦« ¦« ¦«¬¦«|_
d|_t+¦« ¦«S)
Returns the training [`~torch.utils.data.DataLoader`].

Overrides `transformers.Trainer.get_train_dataloader` to precompute `ref_log_probs`.
Frëz!Train dataset reference log probsrñÚreference_logps©ÚnameÚcolumnT)r+rKrOrrrLrIÚcompute_reference_log_probsrøriÚ
add_columnrHr´r7Úget_train_dataloader)r9Úreference_completion_logpsrÿÚreference_completion_logpr;s €rtrz'_UnslothBCOTrainer.get_train_dataloaderÞsQø€ð Ô Ô1Vñ"œiÔ#œyÔ"œiÔ ð !ð!Ð ðÔ*×2µ:¸dÔ>PÐ3fÐ3fÐTeÐ3fÐ3fÑgˆKØ)+Ð &å $¨kÐ@cÐ dÑ dÔ dð
Sð
S Ø,0×,LÒ,LÈ\Ñ,ZÔ,ZÐ)à,0Ô,<×,OÒ,OÐPiÑ,jÔ,jÐ*×1Ð2K×2OÒ2OÑ2QÔ2QÑRà!%Ô!3×!>Ò!>Ø&­u¬yÐ9SÑ/TÔ/T×/ZÒ/ZÑ/\Ô/\×/bÒ/bÑ/dÔ/dð"?ñ"ô"ˆ ð59ˆ ‰wŒw×-rvcóê||jtd¦«||n|j}|jr&|js|jj|j|jj|jjddœ}|j  
t|fi|¤Ž¦«}g}t|d¬¦«D]X}| 
|¦«}|j  |¦«}| | ¦«¦«ŒY| dt%j|¦« ¦« ¦«¬¦«}|j||_d |_t-¦« |¬
¦«S) 
Returns the evaluation [`~torch.utils.data.DataLoader`].

Overrides `transformers.Trainer.get_eval_dataloader` to precompute `ref_log_probs`.

Args:
    eval_dataset (`torch.utils.data.Dataset`, *optional*):
        If provided, will override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns not accepted
        by the `model.forward()` method are automatically removed. It must implement `__len__`.
Nz-Trainer: evaluation requires an eval_dataset.Frëz Eval dataset reference log probsrñrrT)rM)rMr‰r+rrKrOrrrIrrirrHr´r7Úget_eval_dataloader)r9rMrrÿrr;s €rtr!z&_UnslothBCOTrainer.get_eval_dataloaderø€ð Ð  DÔ$5Ð$=ÝÐ MØ'3Ð'?||ÀTÔEVˆ à Ô Ô1Uñ"œiÔ#œyÔ"œiÔ ð !ð!Ð ðÔ*×2µ:¸lÐ3`Ð3`ÐN_Ð3`Ð3`ÑaˆKà)+Ð &å $¨kÐ@bÐ cÑ cÔ cð
Sð
S Ø,0×,LÒ,LÈ\Ñ,ZÔ,ZÐ)à,0Ô,<×,OÒ,OÐPiÑ,jÔ,jÐ*×1Ð2K×2OÒ2OÑ2QÔ2QÑ'×&­u¬yÐ9SÑ/TÔ/T×/ZÒ/ZÑ/\Ô/\×/bÒ/bÑ/dÔ/dðôˆ
Ô Ð,Ø$0Ô!Ø37ˆDÔ ‰wŒw×*¸ ÐErvrÿc óätj¦«5|j€ | ¦«5|jrD| |d|d| d¦«|d¬¦«j}n(| |d|d¬ ¦«j}ddd¦«n #1swxYwYns|jrD| |d|d| d¦«|d¬¦«j}n(| |d|d¬ ¦«j}ddd¦«n #1swxYwY| ||dd
|j|j ¬ ¦«}|S) zfComputes log probabilities of the reference model for a single padded batch of a BCO specific dataset.NÚprompt_input_idsÚprompt_attention_maskÚcompletion_decoder_input_idsÚcompletion_labels)Údecoder_input_idsrÀÚcompletion_input_idsÚcompletion_attention_mask)Úaverage_log_probr*r%)
rHrJrr*rIrkÚget_batch_logpsr%)r9rÿÚcompletion_logitsÚcompletion_logpss rtrz._UnslothBCOTrainer.compute_reference_log_probs.s0å
Œ]‰_Œ_ð ð ØŒ~Ð×

Ô !Ø,0¯JªJØ(Ð);Ô<Ø+7Ð8OÔ+PØ.:×.>Ò.>Ð?]Ñ.^Ô.^Ø#/Ð0CÔ#Dð -7ñ-ô-ô
 -1¯JªJØ(Ð)?Ô@Ø+7Ð8SÔ+Tð-7ñ-ô-ô










!øøøð



!øð Ô Ø(,¯ªØ$Ð%7Ô8Ø'3Ð4KÔ'LØ*6×*:Ò*:Ð;YÑ*ZÔ*ZØ+Ð,?Ô)7ñ)ô)ô
ð )-¯ªØ$Ð%;Ô<È\ÐZuÔMvð)7ñ)ô)äð&ð7 ð ð ñ ô ð ð ð ð ð ð øøøð ð ð ð ð> × Ø Ð  
ô
Ðð Ðs6D;°A4B0Â$ D;Â0B4 Â4D;Â7B4 Â8A7D;Ä;D?ÅD?Fršrkr+r%r*có®|jdd|jkrtd¦«|s2|ddddf ¦«}|ddddddf}n| ¦«}||k}d|||k<t||¦«}|r.||z d¦«| d¦«z S||z d¦«S)aCompute the log probabilities of the given labels under the given logits.

Args:
    logits: Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
    labels:
        Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
        ignored. Shape: (batch_size, sequence_length)
    average_log_prob:
        If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
        log probabilities of the (non-masked) tokens.

Returns:
    A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
    given logits.
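
The computation this docstring describes can be sketched without any tensor library. The following is a minimal pure-Python illustration under stated assumptions, not the compiled implementation: the name `batch_logps` is hypothetical, nested lists stand in for tensors, and the one-position shift for decoder-only models is assumed from the usual next-token convention.

```python
import math


def batch_logps(logits, labels, label_pad_token_id=-100,
                average_log_prob=False, is_encoder_decoder=False):
    """Sum (or average) the log-probabilities of the label tokens.

    logits: nested lists shaped (batch, sequence, vocab)
    labels: nested lists shaped (batch, sequence)
    """
    results = []
    for seq_logits, seq_labels in zip(logits, labels):
        if len(seq_logits) != len(seq_labels):
            raise ValueError("Logits and labels must have the same sequence length.")
        if not is_encoder_decoder:
            # Decoder-only: position t predicts token t + 1, so shift by one.
            seq_labels = seq_labels[1:]
            seq_logits = seq_logits[:-1]
        total, count = 0.0, 0
        for token_logits, label in zip(seq_logits, seq_labels):
            if label == label_pad_token_id:
                continue  # masked positions contribute nothing
            # Numerically stable log-softmax for the selected label.
            peak = max(token_logits)
            log_norm = peak + math.log(sum(math.exp(x - peak) for x in token_logits))
            total += token_logits[label] - log_norm
            count += 1
        results.append(total / count if average_log_prob and count else total)
    return results


uniform = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]  # uniform over 2 tokens
summed = batch_logps(uniform, [[0, 1, -100]])
summed_enc_dec = batch_logps(uniform, [[0, 1, -100]], is_encoder_decoder=True)
```

With uniform logits over a two-token vocabulary, each unmasked label contributes `-log(2)`. The decoder-only call keeps one unmasked label after the shift, and the encoder-decoder call keeps two.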
NrXzKLogits (batch and sequence length dim) and labels must have the same shape.r^r)rar‰ÚclonerFÚsum)rkr+r%r*Ú loss_maskrss rtr,z"_UnslothBCOTrainer.get_batch_logpsYð. Œ<˜˜˜Ô  ¤ Ò Ð ˜A˜A˜A˜q˜r˜r˜E”]×*ˆFؘA˜A˜A˜s ˜s A A A˜&ˆFˆ—\\^”^ˆÐ0ˆ ð01ˆˆvиÑà ð# iÑ/×4°RÑ8¸9¿=º=ÈÑ;LÔ;LÑ # /×4°RÑ 8rvcóz|jrd d¦«dœni}|jrd|d<|dfddi|¤Ž}|j}| |dd |j|j¬
¦«}|jd td ¦«krtd
¦«ˆfdt|jd ¦«D¦«}ˆfdt|jd ¦«D¦«}||df} ||df}
||df} ||df} |jr | |
| | |j
fS| |
| | fS)Nr&r%)r'Trqr(r)Fr*rr|z‡There is a mismatch between the number of examples in this batch and the number of examples for which an output sequence was predicted.có4g|]}d|du¯|ŒS©r|Tr2©Ú.0Úiràs €rtú
<listcomp>z._UnslothBCOTrainer.forward.<locals>.<listcomp>©s.ø€Ð_˜AÀUÈ7Ä^ÐTUÔEVÐZ^ÐE^ÐE^aÐE^ÐE^ÐE^rvcó4g|]}d|du¯|ŒS©r|Fr2r6s €rtr9z._UnslothBCOTrainer.forward.<locals>.<listcomp>ªs.ø€Ðb˜aÀuÈWÄ~ÐVWÔGXÐ\aÐGaÐGa˜ÐGaÐGaÐGarv.) r*rkr,r%rar‰ÚrangeÚaux_loss)
r9rIÚ model_kwargsÚoutputsr-r.Ú chosen_logpsÚrejected_logpsÚ
chosen_logitsÚrejected_logitss
` rtÚforwardz_UnslothBCOTrainer.forward†ø€ðÔ
ØÐ 3Ô4Ø%*§Y¢YÐ/MÑ%NÔ%Nð
ð
ð
ð
ð
ð Ô ð 8Ø37ˆLÐ  Ð 
ð
à Ð!<Ô
ðð
ð
ˆð