DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothBCOTrainer.cpython-310.pyc
475 lines
53 KiB
o
õ×°hP\ã@dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(m)Z)m Z m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;m<Z<m=Z=m>Z>mZm?Z?m@Z@mAZAmBZBmCZCmDZDmEZEmFZFmGZGmHZHmIZImZmJZJmKZKmZm
Z
m Z m!Z!m'Z'm7Z7m=Z=mAZAmZddlAZAddlTddlLmMZMmNZNdd lOmPZPddlZddlQZ?dd
lRm@Z@ddlmZdd lSmTZTmUZVd d
d d
d
dœZWejXd d eWdddƒZYeMGdddeƒƒZZ Gddde'ƒZ[Gddde[ƒZ\ e]e=dƒrEddl^Z^Gddde^j_ƒZ` e= ae`dƒ¡dSdS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)KrÚAutoModelForCausalLMÚ BCOConfigÚ
BCOTrainerÚBaseImageProcessorÚCLF_NAMEr ÚDPODataCollatorWithPaddingÚ DataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFeatureExtractionMixinÚLiteralÚLogisticRegressionrÚ PartialStateÚPathÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚ RUNNING_NAMEÚRunningMomentsÚSequentialSamplerÚTrainerÚTrainerCallbackÚTrainingArgumentsrÚ_process_tokensÚ _tokenizeÚautocastÚcontextmanagerÚcreate_reference_modelÚ defaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚ
has_lengthÚinspectÚis_comet_availableÚis_joblib_availableÚis_peft_availableÚis_sklearn_availableÚis_wandb_availableÚ
itemgetterÚjoblibÚlog_table_to_comet_experimentÚloggerÚmaybe_apply_chat_templateÚnnÚnpÚ nullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_deepspeedÚprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚtqdmÚwarningsrrrrr#r3r9r>rG)Ú*)Ú dataclassÚfield)ÚVersion)r=)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionsc
Ctj| d|jd¡ddd}tj| d¡ddd}g}t||ƒD](\}}| tj¡}tj|d| d¡d  d¡}tj
|dd}||} |  | ¡q! t  |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)rYÚindex©rYé)
rGÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
ÚlogitsrZÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rrúQ/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothBCOTrainer.pyÚchunked_selective_log_softmax"s  
rtcs eZdZUdZedddidZeeed<edddidZ ee
ed <eddd
idZ ee
ed <  
                            ! ! " #     $           $      % &  '         (      #    $   ) *       + ,   -  .      + / 0   d3‡fd1d2„ Z Z
S)4ÚUnslothBCOConfiguù
Configuration class for the [`BCOTrainer`].
This class includes only the parameters that are specific to BCO training. For a full list of training arguments,
please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
differ from those in [`~transformers.TrainingArguments`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
Parameters:
    max_length (`int` or `None`, *optional*, defaults to `1024`):
        Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
        to use the default data collator.
    max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
        Maximum length of the prompt. This argument is required if you want to use the default data collator.
    max_completion_length (`int` or `None`, *optional*, defaults to `None`):
        Maximum length of the completion. This argument is required if you want to use the default data collator
        and your model is an encoder-decoder.
    beta (`float`, *optional*, defaults to `0.1`):
        Parameter controlling the deviation from the reference model. Higher β means less deviation from the
        reference model.
    label_pad_token_id (`int`, *optional*, defaults to `-100`):
        Label pad token id. This argument is required if you want to use the default data collator.
    padding_value (`int` or `None`, *optional*, defaults to `None`):
        Padding value to use. If `None`, the padding value of the tokenizer is used.
    truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
        Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
        This argument is required if you want to use the default data collator.
    disable_dropout (`bool`, *optional*, defaults to `True`):
        Whether to disable dropout in the model and reference model.
    generate_during_eval (`bool`, *optional*, defaults to `False`):
        If `True`, generates and logs completions from both the model and the reference model to W&B or Comet
        during evaluation.
    is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
        When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
        you need to specify if the model returned by the callable is an encoder-decoder model.
    precompute_ref_log_probs (`bool`, *optional*, defaults to `False`):
        Whether to precompute reference model log probabilities for training and evaluation datasets. This is
        useful when training without the reference model to reduce the total GPU memory needed.
    model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
        Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
        string.
    ref_model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
        Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model
        from a string.
    dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
        Number of processes to use for processing the dataset.
    prompt_sample_size (`int`, *optional*, defaults to `1024`):
        Number of prompts that are fed to density ratio classifier.
    min_density_ratio (`float`, *optional*, defaults to `0.5`):
        Minimum value of the density ratio. The estimated density ratio is clamped to this value.
    max_density_ratio (`float`, *optional*, defaults to `10.0`):
        Maximum value of the density ratio. The estimated density ratio is clamped to this value.
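The clamping described for `min_density_ratio` and `max_density_ratio` can be illustrated with a minimal sketch. It assumes the density ratio is estimated as p / (1 - p), where p is a classifier's probability that a prompt belongs to the desirable set; the function name `udm_weight` and the `eps` stabilizer are hypothetical and chosen for illustration, not read from the compiled module.

```python
def udm_weight(prob_desirable, min_ratio=0.5, max_ratio=10.0, eps=1e-8):
    """Clamp an estimated density ratio to [min_ratio, max_ratio].

    Assumption: the ratio is p / (1 - p) for a classifier probability p.
    `eps` (hypothetical) avoids division by zero when p is exactly 1.
    """
    ratio = prob_desirable / (1.0 - prob_desirable + eps)
    # Values below min_ratio are raised to it; values above max_ratio are cut.
    return max(min_ratio, min(ratio, max_ratio))


# Low, balanced, and very high classifier probabilities.
weights = [udm_weight(p) for p in (0.1, 0.5, 0.99)]
```

With the documented defaults, a probability of 0.1 is raised to the lower bound and a probability of 0.99 is cut to the upper bound, so a single over-confident classifier output cannot dominate the loss.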
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrVz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksz'Maximum sequence length to truncate to.Úmax_seq_lengthFÚnorWéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsr\éôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastééééœÿÿÿÚkeep_endçà?ç$@c• s|dkr td|dƒ|dkrtd|dƒ|dur(|#dkr(|$dkr(d}d }#|Ždur:d
d lm}t|–ƒd d
ƒ}ŽtƒjdŸid|d|d|d|d|d|d|d|d| “d|
d| d| d|
d|d|d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)|d*|d+|d,|d-| “d.|!“d/|"“d0|#“d1|$“d2|%“d3|&“d4|'“d5|(“d6|)“d7|*“d8|+“d9|,“d:|-“d;|.“d<|/“d=|0“d>|1“d?|2“d@|3“dA|4“dB|5“dC|6“dD|7“dE|8“dF|9“dG|:“dH|;“dI|<“dJ|=“dK|>“dL|?“dM|@“dN|A“dO|B“dP|C“dQ|D“dR|E“dS|F“dT|G“dU|H“dV|I“dW|J“dX|K“dY|L“dZ|M“d[|N“d\|O“d]|P“d^|Q“d_|R“d`|S“da|T“db|U“dc|V“dd|W“de|X“df|Y“dg|Z“dh|[“di|\“dj|]“dk|^“dl|_“dm|`“dn|a“do|b“dp|c“dq|d“dr|e“ds|f“dt|g“du|h“dv|i“dw|j“dx|k“dy|l“dz|m“d{|n“d||o“d}|p“d~|q“d|r“d€|s“d|t“d|u“dƒ|v“d„|w“d…|x“d†|y“d‡|z“dˆ|{“d‰||“dŠ|}“d|~“dŒ|d|€“dŽ|d|‚“d|ƒ“d‘|„“d’|…“d“|†“d”|‡“d•|ˆ“d–|‰“d—|Š“d˜|‹“d™|Œ“dš|d›|Ž“dœ|d|dž|‘“|•¤Ž||_|“|_|”|_ dS) NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!r\za` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rŠrZunsloth_training_checkpointsr|r)Ú cpu_countrWr}Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚdisable_dropoutÚgenerate_during_evalÚis_encoder_decoderÚprecompute_ref_log_probsÚmodel_init_kwargsÚref_model_init_kwargsÚdataset_num_procÚprompt_sample_sizeÚmin_density_ratioÚmax_density_ratiorr)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingrœÚmaxÚsuperÚ__init__ryrzr{)—Úselfrr r­r¿rÿrrrrrrrrrr r
r r r
rrrrrrrrrrrrrrrrrrr r!r"r#r$r%r&r'r(r)r*r+r,r-ryrzr{Úkwargsrœ©Ú __class__rrrsr3{  ÿþýüûúùø ÷
ö õ ô
óòñðïîíìëêéèçæåäãâá à!ß"Þ#Ý$Ü%Û&Ú'Ù(Ø)×*Ö+Õ,Ô-Ó.Ò/Ñ0Ð1Ï2Î3Í4Ì5Ë6Ê7É8È9Ç:Æ;Å<Ä=Ã>Â?Á@ÀA¿B¾C½D¼E»FºG¹H¸I·JKµL´M³N²O±P°Q¯R®S­T¬U«VªW©X¨Y§Z¦[¥\¤]£^¢_¡` aŸbžcdœefšgh˜ijklmnopqrŽstŒuvŠwxˆyz{|}ƒ~ÿþýüûúùø ÷
ö õ ô
óòñðïî
zUnslothBCOConfig.__init__)”NNFFFr|FrWrWNNr}r}rr~rr€rrr„r…rVr†r‡rrˆr‰TNrŠFr\FrŠrNTFFFFFFrŒFFFFrFFNrVNNFrFNrNrVNNTNFNNFrrNNNNrrNFFrNNNNTFTFFNNr“NNFNFNFTrŽNNNrTFNr”r•FNNFFNNFFFNFTrr—Nr‡r˜Nr™TFNFNNNrrNrVN)Ú__name__Ú
__module__Ú __qualname__Ú__doc__rLryrrÚ__annotations__rzÚintr{r3Ú
__classcell__rrrrr6rsru3sH
:þþþêruc$eZdZdZddgZ                 dsdeeeje fde
eeeje fde d e
e d
e
ee e
e e ffd e
eeeeefd e
ed
e
egefde
eedeejjejjjfde
eejejgejfde
e
de
eege
fde
e de
e de
ede
ef"‡fdd
ZeddƒZ dej!dej!fddZ"dej#d ej#dej!fd!d"„Z$d#e
e eeej#ffdeej!ej!ffd$d%„Z%dtd'e d(e&dej!fd)d*„Z'‡fd+d,„Z(‡fd-d.„Z)e*d/d0„ƒZ+de,ffd1d2„ Z-dud
e
e de,ffd3d4„
Z.d5e
de
fd6d7„Z/e0 8 9 8dvd:ej!d;ej#d<e1d=e&d>e1dej!f d?d@„ƒZ2dejd#e
e eeej#ffdeej!ej!ej!ej!ffdAdB„Z3dCej!dej!fdDdE„Z4 FdwdGej!dHej!dIej!dJej!dKe
ej!dCe
ej!dLe1deej!ej!ej!ej!ffdMdN„Z5 Fdwd#e
e eeej#ffdLe1fdOdP„Z6 8 dxdeeejfdQe
e eeje7ffdeejeeje
e ejffffdRdS„Z8dydUe
e e9fdVe:dWddfdXdY„Z;dud'e
e de
ej<j=j>fdZd[„Z?d#e
e ej#fdee e ffd\d]„Z@ dudeeejfdQe
e eeje7ffd^e1d_e
ee fd`da„ZA   bdzdce,dde d^e
e1d_e
ee dee def fdfdg„
ZBdudhe
e e9fdie
e9ddffdjdk„
ZC‡fdldm„ZD   d{dne
e doe
e dpee ee dffdqdr„ZE‡ZFS)|Ú_UnslothBCOTrainerrÚtrlÚbcoN©NNÚmodelÚ ref_modelÚargsÚ
train_datasetÚ eval_datasetÚprocessing_classÚ
data_collatorÚ
model_initÚ callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚ peft_configÚcompute_metricsÚmodel_adapter_nameÚref_adapter_nameÚembedding_funcÚembedding_tokenizerc$
|durtƒr
tƒstdƒt|ƒturtdƒt|tƒs)|dur)||ur)tdƒ|jdur1i}n9t|tƒs:tdƒ|j}|  d¡}|durjt|tƒrT|dkrTt
t |ƒ}|dkrft|t j ƒsftd|dƒ||d<|j
durri}n9t|tƒs{td ƒ|j
}|  d¡}|dur«t|tƒr•|dkr•t
t |ƒ}|dkr§t|t j ƒs§td|dƒ||d<t|tƒr¹tj|fi|¤Ž}t|tƒrÇtj|fi|¤Ž}d
|_tƒsÕ| durÕtd ƒtƒrI| durIt|tƒrç| ¡}t
|d d
ƒsôt
|d
d
ƒrt|dƒodtt t¡jƒv}d|ji}|r|j|d<t|fi|¤Ž}n|jr4t|dƒr)| ¡n dd}| ¡ |¡|}|jrHt
|d
d
ƒrHt |ƒd|_n|jrct|dƒrX| ¡n dd}| ¡ |¡|j!rst"ƒsst#ƒsstdƒ|dur~|j$j%|_%n|j%durˆtdƒ|j%|_%tƒo”t|tƒ|_&||_'||_(|r£||_)n|j&s«|j*r¯d|_)nt+|ƒ|_)|dur½tdƒ|j,durËt- .dt/¡d}|j,durÔ|j,}|j0durât- .dt/¡d}|j0durë|j0}d}|j1durÿ|j%rÿt- .dt/¡d}|j1dur |j%r |j1}|dur,t2|j3|j4|j%d}|j5r(d
|_5t- .dt/¡d|_6nd
|_6|j7rBt8|ƒ|j)durBt8|j)ƒ||_,|j!|_!|j4|_4|j9durV|j9n|j3|_9||_0|j:|_:||_1|j*|_*d
|_;d
|_<t=dd „ƒ|_>|j?|_?t
|j$d!d
ƒ|_@t
|j$d"d#ƒ|_A|j@r™|jAd#kr™t- .d$t/¡||_B||_Cd|jDd%<tEƒ |jGtHd&|i|jId'}|durÅ|jGtHd&|i|jId'}|jGtJd||jCd(œ|jId)d*}d+|j%||j,|j:|j4|j0|j1d,œ}|jGtK||jId-d.}|dur|jGtJ||jCd(œd|jId/d0}d+|j%||j,|j:|j4|j0|j1d,œ}|jGtK||jId1d.}|jLd2d „|jId3d4}|jLd5d „|jId6d4}Wdƒn 1s?wYtMƒjN||||||||
| |
| d7 d
|_Ot|jPd8ƒrf|jP Q|jR¡t|d9ƒsptSd:ƒ|jTr…|jUjVjWjXd;kr…|j*r…td<ƒ|j)dur˜|j&s—|j*s—td=ƒn|jTr¥tY|j)|jUƒ|_)n
|jUjZ|j)dd>|_)t[|jUd?|_\|jBdusÀ|j]rÂdS|j^||j_j`d@}|j^||j_j`d@}t ja||fdAdB} t jat  b|dddAf¡t  c|dddAf¡fdAdB}!tddCdD e|    |!  ¡|_i|ji j|   t  b|dddAf¡  ¡}"|ji j|   t  c|dddAf¡  ¡}#tk ldE|"dF|#¡dS)GNz}BCOTrainer with UDM requires the scikit-learn and joblib libraries. Please install it with `pip install scikit-learn joblib`.z3Please use `BCOConfig` instead `TrainingArguments`.zœ`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the same as `model`, you must mass a copy of it, or `None` if you use peft.zRYou passed model_kwargs to the BCOTrainer. But your model is already instantiated.Ú torch_dtyperŽznInvalid `torch_dtype` passed to the BCOConfig. Expected a string with either `torch.dtype` or 'auto', but got Ú.zZYou passed ref_model_kwargs to the BCOTrainer. But your ref_model is already instantiated.FzŽPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it with `pip install peft` to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrÚuse_gradient_checkpointingÚenable_input_require_gradscSó| d¡dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputrrrrrsÚmake_inputs_require_grad&óz=_UnslothBCOTrainer.__init__.<locals>.make_inputs_require_gradTcSrZr[r\r^rrrrrsrb;rcz`generate_during_eval=True` requires Weights and Biases or Comet to be installed. Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.zdmax_length or a processing_class must be specified when using the default DPODataCollatorWithPaddingz§When using DPODataCollatorWithPadding, you should set `max_length` in the `BCOConfig`. It will be set to `512` by default, but you should do it yourself in the future.r—z®When using DPODataCollatorWithPadding, you should set `max_prompt_length` in the `BCOConfig`. 
It will be set to `128` by default, but you should do it yourself in the future.é€zÜWhen using DPODataCollatorWithPadding with an encoder decoder architecture, you should set `max_completion_length` in the BCOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.)Ú pad_token_idr!r&zªWhen using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your BCOConfig we have set it for you, but you should do it yourself in the future.cSsttƒS©N)r+ÚlistrrrrrrrsÚ<lambda>¥óz-_UnslothBCOTrainer.__init__.<locals>.<lambda>Úoutput_router_logitsÚrouter_aux_loss_coefra-You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to `0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary loss.Úestimate_tokensÚ tokenizer)Ú fn_kwargsÚnum_proc)rmrSzTokenizing train dataset)ÚbatchedrnroÚdescr)Úprefixr&rmrr#r!rrz"Processing tokenized train dataset)rnrorqzTokenizing eval dataset)rnrprorqz!Processing tokenized eval datasetcSs|dS©labelrr©ÚxrrrrrsrhrizFiltering desirable examples)rorqcSs
|d Srsrrrurrrrrsrhó
zFiltering undesirable examples) rCrErIrFrGrHrJrOrKrLrMÚadd_model_tagsÚ acceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.ézrYou cannot use `precompute_ref_log_probs=True` with Deepspeed ZeRO-3. Please set `precompute_ref_log_probs=False`.z]No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`)Úevaluation_mode)ry)Ú sample_sizerr[Úbalanced)Ú class_weightz(UDM classifier training scores: chosen: z , rejected: )mr4r2Ú ImportErrorÚtyper%Ú
ValueErrorÚ
isinstanceÚstrr(ÚgetÚgetattrrGÚdtyper)r Úfrom_pretrainedÚ_peft_has_been_casted_to_bf16r3rÚmerge_and_unloadÚhasattrrgr0Ú signaturerCÚ
parametersrrrYÚget_input_embeddingsÚregister_forward_hookrÍrAr%r5r1Úconfigr&Ú
is_peft_modelrPrQrDr'r*rrIÚwarnÚ UserWarningrrrrer!Úuse_dpo_data_collatorr$r,r"r#Ú _precomputed_train_ref_log_probsÚ_precomputed_eval_ref_log_probsr+Ú_stored_metricsr Úaux_loss_enabledÚ
aux_loss_coefrRrSÚwarnings_issuedrÚmain_process_firstÚmapr:r*r'r&Úfilterr2r3Úmodel_accepts_loss_kwargsrCrxÚ
_tag_namesÚAttributeErrorÚis_deepspeed_enabledryÚstateÚdeepspeed_pluginÚ
zero_stagerBÚ
prepare_modelr!ÚrunningrûÚ_get_sample_prompt_embeddingsrEr+ÚcatÚ ones_likeÚ
zeros_likerÚfitÚcpuÚfloatÚnumpyÚclfÚscorer9Úinfo)$r4rCrDrErFrGrHrIrJrKrLrMrNrOrPrQrRrSr(rTr)Ú_support_gc_kwargsÚprepare_model_kwargsrbrrrrnÚ desirableÚ undesirableÚchosen_embeddingsÚrejected_embeddingsÚ
embeddingsÚlabelsÚ chosen_meanÚ
rejected_meanr6rrrsr3¸s0ÿ ÿ




ÿ

ÿ


ÿ

ÿ
ÿ
ÿþ

 
 
ÿ
  

ÿ ý  ý ý
ýý 
û 
 ÿ
ý
û
ø
ü

û
ø
ü ÿ ÿºJõ ÿÿ ÿ,ÿ
ÿ*ÿ*ÿz_UnslothBCOTrainer.__init__cCs|jduo |jduSrf)rRrS©r4rrrrrsÚmatch_underlying_distributionQsz0_UnslothBCOTrainer.match_underlying_distributionÚprompt_embeddingsÚreturnc C|j}|j}|jj}|jj||jjd}|jd}|jdd|jjk}|j  |¡}|jddkr8t
j g||dS|j  
| ¡ ¡ ¡¡dddf}t
j|||d}|jj|dd }|||||d}||}|S)
Calculates the probability that the given prompt embedding comes from the desirable dataset. The probability
is computed on each process and then ensembled (averaged) across processes.
)Ú pad_indexrr\r[)Údevicer†r†Úmean)Ú reduction)r†ryÚ
process_indexÚpad_across_processesrSrer_rcrGÚtensorr®Ú
predict_probar«r­Ú as_tensorÚreduce) r4r†ÚrankÚpadded_prompt_embeddingsr|ÚnonzeroÚprobrrrrrsÚ_get_chosen_probUs"ÿ
 $z#_UnslothBCOTrainer._get_chosen_probÚ input_idsÚattention_maskcCsVt ||jjk|jj|¡}t ¡|j||d}Wdƒ|S1s$wY|S)z|
Replaces `processing_class.pad_token_id` with `embedding_tokenizer.pad_token_id` and applies `self.embedding_func`.
©N)rGÚwhererHrerSÚno_gradrR)r4rrrrrsÚ_vectorize_promptrs
ý
þ
ÿúz$_UnslothBCOTrainer._vectorize_promptÚbatchcCsv|jsdS|j|d|dd}tj|dtj|jd}t |¡d}t |¡d}||df}||df}||fS) z.Extract embeddings from frozen embedding modelrBÚembedding_input_idsÚembedding_attention_maskrÑrtr.)rGÚboolrÀ)r4r¸Ú
chosen_idxÚ rejected_idxrµrrrrrsÚ_get_prompt_embeddings„sþ  z)_UnslothBCOTrainer._get_prompt_embeddingsr—Údatasetr|c Ctt|ƒ|ƒ}tjjt|ƒ|fd}| |¡}|jj|j|jj |jj
ddœ}|j   t
|fi|¤Ž¡}t ¡1t d¡}t|ddD]} |j| d| dd }
|j  |
¡}
t ||
 ¡f¡}qBWd
ƒ|S1sjwY|S) zv
Sample instances from dataset and get prompt embeddings. Used for density ratio classifier training.
)ÚsizeF©Ú
batch_sizeÚ
collate_fnÚ num_workersÚ
pin_memoryÚshufflerz!Building sample prompt embeddings©Úiterablerqr×N)ÚminÚlenr<rDÚchoiceÚselectrErIryÚpreparerrGÚemptyrHÚgather_for_metricsr§) r4r|Ú n_samplesÚ rand_indicesÚembedding_datasetÚdataloader_paramsÚ data_loaderÚall_embeddingsÚ padded_batchr·rrrrrsšs0
û

þ ú
þ
ö
z0_UnslothBCOTrainer._get_sample_prompt_embeddingscsl|dur|n|jj}tƒ |¡|jjr2|j tj  
|t ¡¡|j r4t
j|jtj  
|t¡dddSdSdS)NT)Úcompress)rErr2Ú_save_optimizer_and_schedulerryÚis_main_processr¥Ú save_to_jsonr>ÚpathÚjoinr r7Údumpr®r)r4rr6rrrsºs  ûz0_UnslothBCOTrainer._save_optimizer_and_schedulerc|durt d|¡dStƒ |¡tj |t¡}tj |¡r)t  
|j |¡|_ |j
rAtj |t¡}tj |¡rCt |¡|_dSdSdS)NzMissing Checkpoint )r9Ú warning_oncer2Ú_load_optimizer_and_schedulerr>r Úisfiler!Úload_from_jsonryrr7Úloadr®)r4Ú
checkpointÚ running_fileÚclf_filer6rrrsÅs   ýz0_UnslothBCOTrainer._load_optimizer_and_schedulercc|jr|js|j |j¡ ¡ntƒ*|jr|j |j¡dV|jr5|j |jp+d¡WdƒdSWdƒdS1s@wYdS)zWContext manager for handling null reference model (that is, peft adapter manipulation).Nrw) rrQryÚ unwrap_modelrCÚdisable_adapterr=Ú set_adapterrPrrrrrsÚnull_ref_contextÖsÿÿý÷"øz#_UnslothBCOTrainer.null_ref_contextc|jrR|jsR|jj|j|jj|jjddœ}|j t |j
fi|¤Ž¡}g}t |ddD]}|  |¡}|j 
|¡}| | ¡¡q*|j
jdt |¡ ¡ ¡d|_
d|_tƒ ¡S)
Returns the training [`~torch.utils.data.DataLoader`].
Subclass of transformers.src.transformers.trainer.get_train_dataloader to precompute `ref_log_probs`.
FrÞz!Train dataset reference log probsräÚreference_logps©ÚnameÚcolumnT)r'r”rErIryrrFrHÚcompute_reference_log_probsrìrgÚ
add_columnrGr­r2Úget_train_dataloader)r4Úreference_completion_logpsróÚreference_completion_logpr6rrrsr
äs$ û 
 ÿ
z'_UnslothBCOTrainer.get_train_dataloaderc|dur
|jdur
tdƒ|dur|n|j}|jrm|jsm|jj|j|jj|jjddœ}|j  
t |fi|¤Ž¡}g}t |ddD]}| 
|¡}|j  |¡}| | ¡¡q?|jdt |¡ ¡ ¡d}|jdurj||_d |_tƒj|d
S) 
Returns the evaluation [`~torch.utils.data.DataLoader`].
Subclass of transformers.src.transformers.trainer.get_eval_dataloader to precompute `ref_log_probs`.
Args:
eval_dataset (`torch.utils.data.Dataset`, *optional*):
If provided, will override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns not accepted
by the `model.forward()` method are automatically removed. It must implement `__len__`.
Nz-Trainer: evaluation requires an eval_dataset.FrÞz Eval dataset reference log probsrärrT)rG)rGrr'r•rErIryrrHr rgr rGr­r2Úget_eval_dataloader)r4rGrrr6rrrsrs.  û 
 ÿ
z&_UnslothBCOTrainer.get_eval_dataloaderróc Cst ¡h|jdurB| ¡+|jr&|j|d|d| d¡|ddj}n |j|d|dd j}Wdƒn1s<wYn#|jrY|j|d|d| d¡|ddj}n |j|d|dd j}Wdƒn1sowY|j||dd
|j|j d }|S) zfComputes log probabilities of the reference model for a single padded batch of a BCO specific dataset.NÚprompt_input_idsÚprompt_attention_maskÚcompletion_decoder_input_idsÚcompletion_labels)Údecoder_input_idsr¸Úcompletion_input_idsÚcompletion_attention_mask)Úaverage_log_probr&r!)
rGrDrr&rCr„riÚget_batch_logpsr!)r4Úcompletion_logitsÚcompletion_logpsrrrrrsr 4sZ


üûþýöüû ÿþåûz._UnslothBCOTrainer.compute_reference_log_probsFr˜rir¸rr!r&cC|jdd|jkrtdƒ|s*|ddddf ¡}|ddddddf}n| ¡}||k}d|||k<t||ƒ}|rK|| d¡| d¡S|| d¡S)aCompute the log probabilities of the given labels under the given logits.
Args:
    logits: Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
    labels:
        Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
        ignored. Shape: (batch_size, sequence_length)
    average_log_prob:
        If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
        log probabilities of the (non-masked) tokens.
Returns:
    A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
    given logits.
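The masking and sum-versus-average behaviour described in this docstring can be sketched in plain Python. This is an illustration under stated assumptions, not the compiled implementation: it works on nested lists instead of tensors, it assumes logits and labels are already aligned position by position (no shifting for decoder-only models), and the helper names `log_softmax` and `batch_logps` are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def batch_logps(logits, labels, average_log_prob=False, label_pad_token_id=-100):
    """Per-sequence log-probability of `labels` under `logits`.

    logits: nested lists shaped [batch][seq][vocab]
    labels: nested lists shaped [batch][seq]; pad positions are ignored.
    """
    results = []
    for seq_logits, seq_labels in zip(logits, labels):
        if len(seq_logits) != len(seq_labels):
            raise ValueError("Logits and labels must have the same sequence length.")
        # Keep only the log-probabilities of non-padded label tokens.
        token_logps = [
            log_softmax(row)[label]
            for row, label in zip(seq_logits, seq_labels)
            if label != label_pad_token_id
        ]
        total = sum(token_logps)
        if average_log_prob:
            # Guard against fully masked sequences (a choice made for this sketch).
            total = total / max(len(token_logps), 1)
        results.append(total)
    return results


# Two positions over a two-token vocabulary with uniform logits.
uniform = [[[0.0, 0.0], [0.0, 0.0]]]
summed = batch_logps(uniform, [[1, 0]])
masked = batch_logps(uniform, [[1, -100]])
averaged = batch_logps(uniform, [[1, 0]], average_log_prob=True)
```

With uniform logits every token has probability 0.5, so the summed value over two tokens is twice the single-token value, while masking one position or averaging both returns the single-token value.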
NrVzKLogits (batch and sequence length dim) and labels must have the same shape.r\r)r_rÚclonerEÚsum)rir¸rr!r&Ú loss_maskrqrrrrrsr_s 
z"_UnslothBCOTrainer.get_batch_logpsc
s|jr
ˆdˆ d¡dœni}|jrd|d<|ˆdfdˆdi|¤Ž}|j}|j|ˆdd |j|jd
}|jd tˆd ƒkrDtd
ƒfddt |jd ƒDƒ}fddt |jd ƒDƒ}||df} ||df}
||df} ||df} |jrƒ| |
| | |j
fS| |
| | fS)Nrr)r¸rTrjrrFrrrtz‡There is a mismatch between the number of examples in this batch and the number of examples for which an output sequence was predicted.có g|] }ˆd|dur|qS©rtTrr©Ú.0ÚrrrsÚ
<listcomp>¯ó z._UnslothBCOTrainer.forward.<locals>.<listcomp>cr ©rtFrrr"r%rrrsr&°r'.) r&r„r—rirr!r_rÚrangeÚaux_loss)
r4rCÚ model_kwargsÚoutputsrrÚ chosen_logpsÚrejected_logpsÚ
chosen_logitsÚrejected_logitsrrr%rsÚforwardŒsJüþúÿþýûÿ     z_UnslothBCOTrainer.forwardr¶cCs8| |¡}|jj}|jj}|d|dj||d}|S)Nr\)r1)rEr,r-Úclamp)r4Úprob_desirableÚ min_ratioÚ max_ratioÚweightrrrrrsÚ_get_udm_weight½s
z"_UnslothBCOTrainer._get_udm_weightTÚpolicy_chosen_logpsÚpolicy_rejected_logpsÚreference_chosen_logpsÚreference_rejected_logpsrµcC||}|j|} ||}
|j|
} |r"|j t | | fd¡ ¡¡tj|jj| jd} t  
| | ¡ }
t  
| |  ¡ }|j rXt  |
¡}| 
|¡}tj||
||fdd}n tj|
|fdd}|| | | fS)Compute the BCO loss for a batch of policy and reference model log probabilities.
Args:
    policy_chosen_logps:
        Log probabilities of the policy model for the chosen responses. Shape: (num(chosen) in batch_size,)
    policy_rejected_logps:
        Log probabilities of the policy model for the rejected responses. Shape: (num(rejected) in batch_size,)
    reference_chosen_logps:
        Log probabilities of the reference model for the chosen responses. Shape: (num(chosen) in batch_size,)
    reference_rejected_logps:
        Log probabilities of the reference model for the rejected responses. Shape: (num(rejected) in
        batch_size,)
    chosen_embeddings: embeddings of desirable prompts
    rejected_embeddings: embeddings of undesirable prompts
Returns:
    A tuple of four tensors: (losses, chosen_rewards, rejected_rewards, delta). The losses tensor contains the
    BCO loss for each example in the batch. The chosen_rewards and rejected_rewards tensors contain the rewards
    for the chosen and rejected responses, respectively. The delta value contains the moving average of all
    implicit rewards.
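The return values listed above can be made concrete with a small pure-Python sketch. It assumes the implicit reward is `beta` times the policy-minus-reference log-ratio and that each loss is a negative log-sigmoid of the reward relative to `delta`. As a simplification, `delta` defaults to the mean reward of the current batch instead of a running average across steps. The names `logsigmoid`, `bco_loss_sketch`, and `rejected_weights` are hypothetical.

```python
import math


def logsigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def bco_loss_sketch(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
                    beta=0.1, delta=None, rejected_weights=None):
    """Return (losses, chosen_rewards, rejected_rewards, delta) for one batch."""
    # Implicit rewards: scaled log-ratio between policy and reference.
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]

    if delta is None:
        # Assumption for this sketch: batch mean stands in for the moving average.
        rewards = chosen_rewards + rejected_rewards
        delta = sum(rewards) / len(rewards) if rewards else 0.0

    # Chosen responses are pushed above delta, rejected ones below it.
    chosen_losses = [-logsigmoid(r - delta) for r in chosen_rewards]
    rejected_losses = [-logsigmoid(-(r - delta)) for r in rejected_rewards]

    if rejected_weights is not None:
        # Optional per-example weights, e.g. clamped density ratios.
        rejected_losses = [w * l for w, l in zip(rejected_weights, rejected_losses)]

    return chosen_losses + rejected_losses, chosen_rewards, rejected_rewards, delta


losses, chosen_r, rejected_r, delta = bco_loss_sketch(
    policy_chosen=[-1.0], policy_rejected=[-3.0],
    ref_chosen=[-2.0], ref_rejected=[-2.0],
)
```

In this example the chosen response gains one nat over the reference and the rejected response loses one, so the rewards are symmetric around zero and both losses fall below log 2, the value obtained when policy and reference agree.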
r©r[)r ÚupdaterGÚdetachrÈrÚ
logsigmoidr¼r7)r4r8r9r:r;Úchosen_logratiosÚchosen_rewardsÚrejected_logratiosÚrejected_rewardsÚdeltaÚ
chosen_lossesÚrejected_lossesÚ
chosen_weightÚrejected_weightÚlossesrrrrrsÚbco_lossÆs



 z_UnslothBCOTrainer.bco_lossc i}fddˆ ¡Dƒˆ |ˆ¡}|dd\}}}} ˆjr$|d}
dˆvrY‡fddtˆdjdƒDƒ} fd dtˆdjdƒDƒ} ˆd| d
f}
ˆd| d
f}nLt ¡@ˆjdur‡ˆ ¡ˆ ˆj ˆ¡dd\}
}}}Wdƒn1swYnˆ ˆjˆ¡dd\}
}}}Wdƒn1s wYˆ 
ˆ¡\}}ˆj |||
||||d \}}}}ˆj  
|¡ ¡ ¡|d <t t|ƒg¡ ˆj j¡}t t|ƒg¡ ˆj j¡}ˆj  
|¡ ¡ ¡}ˆj  
|¡ ¡ ¡}|dkr)ˆj  
| ¡¡ ¡ ¡|d
<ˆj  
| ¡¡ ¡ ¡|d<ˆj  
| ¡¡ ¡ ¡|d<||d<|dkr\ˆj  
| ¡¡ ¡ ¡|d<ˆj  
| ¡¡ ¡ ¡|d<ˆj  
|  ¡¡ ¡ ¡|d<||d<| ¡}ˆjrk|ˆj|
7}||fS)zWCompute the BCO loss and other metrics for the given batch of inputs for train or test.cs0i|]\}}|t|tjƒr| ˆjj¡n|qSrr)rrGrrary©r#Úvr»rrrsÚ
<dictcomp>s0z=_UnslothBCOTrainer.get_batch_loss_metrics.<locals>.<dictcomp>NrWrcr r!rrr"r%rrrsr&r'z=_UnslothBCOTrainer.get_batch_loss_metrics.<locals>.<listcomp>rcr r(rrr"r%rrrsr&r'rDzrewards/chosen_sumzlogps/chosen_sumúlogits/chosen_sumz count/chosenzrewards/rejected_sumzlogps/rejected_sumúlogits/rejected_sumzcount/rejected)Úitemsr1r—r)r_rGrDrrCrJryÚitemrrarÚnansumÚnanmeanr˜)r4rCÚmetricsÚforward_outputr8r9Úpolicy_chosen_logitsÚpolicy_rejected_logitsr*r:r;Ú_rµrIrArCrDÚ
num_chosenÚ num_rejectedÚall_num_chosenÚall_num_rejectedÚlossrr)r4rsÚget_batch_loss_metricsý 
û  


ûû
ûòù 
ÿÿÿ
ÿÿÿz)_UnslothBCOTrainer.get_batch_loss_metricsÚinputscCs|jr
t|jjjƒntƒ}|| ||¡\}}Wdƒn1s"wY| |jj¡}|jj r9|j
|dd|r?||fS|S)train©Ú
train_eval) rˆr(ryr€r=r`rarEÚ
store_metrics)r4rCraÚreturn_outputsÚnum_items_in_batchÚcompute_loss_context_managerr_rVrrrrrsÚ compute_loss[sÿÿz_UnslothBCOTrainer.compute_lossrbrVrd)rbÚevalcCs*| ¡D]\}}|j|| |¡qdSrf)rRrrg)r4rVrdÚkeyÚvaluerrrrrsressÿz _UnslothBCOTrainer.store_metricscCs*|dur|j}|dust|ƒsdSt|ƒSrf)rFr/r")r4rrrrrsÚ_get_train_samplerws
z%_UnslothBCOTrainer._get_train_samplerc Cs:|jr
t|jjjƒntƒ}|`|j|d|d|jd|jj d}d|vr*|d}n>|j
durV|  ¡|j j|d|d|jd|jj d}Wdƒn1sPwYn|j
j|d|d|jd|jj d}Wdƒn1srwYt
||j|jj ƒ}|jj|dd}t
||j|jj ƒ}|jj|dd}||fS)zRGenerate samples from the model and reference model for the given batch of inputs.rrT)rÚ do_samplereÚreference_outputN)Úskip_special_tokens)rˆr(ryr€r=ÚgeneraterrHrerDrrCr?Ú batch_decode)r4rCÚgenerate_context_managerÚ
policy_outputroÚpolicy_output_decodedÚreference_output_decodedrrrrrsÚgenerate_from_model_and_ref~sJÿû


ûÿ ûéz._UnslothBCOTrainer.generate_from_model_and_refr£Ú ignore_keysc sBˆdurt|dƒrt|jdgƒng|jrt|jjjƒntƒ}t  
¡$||j ||dd\}}Wdƒn1s<wYWdƒn1sKwY|jj r[|j
|dd|rd| ¡ddfSi}d|vrp|d|d<d |vrz|d |d
<fd d | ¡Dƒ} t j| |jjd
} t j| jd|jjd
}
| ¡| |
fS)NrÚkeys_to_ignore_at_inferenceFrOrjrcrPzeval_logits/chosenrQzeval_logits/rejectedcsg|]
\}}|ˆvr|qSrrrrrK©rxrrrsr&Ísz6_UnslothBCOTrainer.prediction_step.<locals>.<listcomp>r<r)r…rrˆr(ryr€r=rGr`rer>rRÚzerosr_) r4rCrarxÚprediction_context_managerr_rVÚ logits_dictrir¸rrrzrsÚprediction_step­s0
ÿÿ  z"_UnslothBCOTrainer.prediction_steprjÚ
dataloaderÚ descriptionÚmetric_key_prefixcs$|jr†t|jƒ}tjt|ƒ|jjd}|j |¡}|  |¡} | 
| ¡} t j | dt j
|jjd}
t  |
¡d} | d| | d| t| Ž| dƒdœ} | |j| ¡\}
}tjgd ¢d
d t| d|
|ƒDƒd }d
|jjvrzt dtj|di¡d|jjvr†td|dtƒ |||||¡}|S)
Overriding built-in evaluation loop to store metrics for each batch. Prediction/evaluation loop, shared by
`Trainer.evaluate()` and `Trainer.predict()`.
Works with or without labels.
)rLrtrrrÚprompt)rrr)ÚPromptÚPolicyz Ref ModelcSs4g|]\}}}||t|ƒd|t|ƒdgqSrf))r#rÚpolÚrefrrrrrsr&øs ÿÿz6_UnslothBCOTrainer.evaluation_loop.<locals>.<listcomp>)ÚcolumnsÚdataÚwandbÚgame_log)rˆÚcomet_mlz game_log.csv)r Útable)r%rDÚsampler)rEÚeval_batch_sizerérIÚ_prepare_inputsrGryr6rwrCr@Ú DataFramer`r‰ÚlogÚTabler8r2Úevaluation_loop)r4rr€rxrÚ num_samplesÚrandom_indicesÚrandom_batch_datasetÚ random_batchÚ
target_labelsÚtarget_indiciesÚ target_batchruÚref_output_decodedrŒÚinitial_outputr6rrrsr“Ós<
 



ýþþ  þ
ÿz"_UnslothBCOTrainer.evaluation_loopÚlogsÚ
start_timec
s`d|vrdnd}|dkrdnd}dD]V}d||j|vrht |j|d|¡ ¡ ¡}dD]-}t |j||d |d
¡ ¡ ¡||||d |<|j||d |d
=q1|j|d|=q|d |vrŠ|d |vrŠ||d ||d ||d
<|j| ¡D]\}} t | ¡ ¡ ¡|||<q|j|=tƒ ||¡S)a1
Log `logs` on the various objects watching training, including stored metrics.