unsloth_compiled_cache/__pycache__/UnslothBCOTrainer.cpython-310.pyc

o
õ×°hP\ã@sŽdZddlmZddlZddlmZddlmZddlmZm	Z	m
Z
mZmZm
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(m)Z)mZm*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;m<Z<m=Z=m>Z>mZm?Z?m@Z@mAZAmBZBmCZCmDZDmEZEmFZFmGZGmHZHmIZImZmJZJmKZKmZm
Z
m Z m!Z!m'Z'm7Z7m=Z=mAZAmZddlAZAddlTddlLmMZMmNZNdd	lOmPZPddlZddlQZ?dd
lRm@Z@ddlmZddlSmTZTmUZVdd
dd
d
dœZWejXddeWd�dd„ƒZYeMGdd„deƒƒZZ	Gdd„de'ƒZ[Gdd„de[ƒZ\	e]e=dƒ�rEddl^Z^Gdd„de^j_ƒZ`	e= ae`dƒ¡dSdS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)KrÚAutoModelForCausalLMÚ	BCOConfigÚ
BCOTrainerÚBaseImageProcessorÚCLF_NAMErÚDPODataCollatorWithPaddingÚDataCollatorÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFÚFeatureExtractionMixinÚLiteralÚLogisticRegressionrÚPartialStateÚPathÚ	PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚRUNNING_NAMEÚRunningMomentsÚSequentialSamplerÚTrainerÚTrainerCallbackÚTrainingArgumentsrÚ_process_tokensÚ	_tokenizeÚautocastÚcontextmanagerÚcreate_reference_modelÚdefaultdictÚdisable_dropout_in_modelÚgenerate_model_cardÚget_comet_experiment_urlÚ
has_lengthÚinspectÚis_comet_availableÚis_joblib_availableÚis_peft_availableÚis_sklearn_availableÚis_wandb_availableÚ
itemgetterÚjoblibÚlog_table_to_comet_experimentÚloggerÚmaybe_apply_chat_templateÚnnÚnpÚnullcontextÚosÚ
pad_to_lengthÚpdÚpeft_module_casting_to_bf16Úprepare_deepspeedÚprepare_model_for_kbit_trainingÚrandomÚselective_log_softmaxÚtextwrapÚtorchÚtqdmÚwarningsrrrrr#r3r9r>rG)Ú*)Ú	dataclassÚfield)ÚVersion)r=)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚmax_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ	fullgraphÚoptionsc
Cs¾tj| d|jd¡ddd�}tj| d¡ddd�}g}t||ƒD](\}}| tj¡}tj|d| d¡d� 	d¡}tj
|dd�}||}	| |	¡q!	t |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)rYÚindex©rYé)
rGÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ	unsqueezeÚsqueezeÚ	logsumexpÚappendÚconcat)
ÚlogitsrZÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚchunk_logitsÚchunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rrúQ/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothBCOTrainer.pyÚchunked_selective_log_softmax"s
rtcs eZdZUdZedddid�Zeeed<edddid�Z	ee
ed	<eddd
id�Zee
ed<						
																														 									!	!					"	#								$														$						%	&				'												(									#				$				)	*														+	,			-		.								+	/	0			d3‡fd1d2„	Z‡Z
S)4ÚUnslothBCOConfiguù
    
    Configuration class for the [`BCOTrainer`].

    This class includes only the parameters that are specific to BCO training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want
            to use the default data collator.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt. This argument is required if you want to use the default data collator.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion. This argument is required if you want to use the default data collator
            and your model is an encoder-decoder.
        beta (`float`, *optional*, defaults to `0.1`):
            Parameter controlling the deviation from the reference model. Higher β means less deviation from the
            reference model.
        label_pad_token_id (`int`,  *optional*, defaults to `-100`):
            Label pad token id. This argument is required if you want to use the default data collator.
        padding_value (`int` or `None`, *optional*, defaults to `None`):
            Padding value to use. If `None`, the padding value of the tokenizer is used.
        truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
            Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`.
            This argument is required if you want to use the default data collator.
        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model and reference model.
        generate_during_eval (`bool`, *optional*, defaults to `False`):
            If `True`, generates and logs completions from both the model and the reference model to W&B or Comet
            during evaluation.
        is_encoder_decoder (`bool` or `None`, *optional*, defaults to `None`):
            When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument,
            you need to specify if the model returned by the callable is an encoder-decoder model.
        precompute_ref_log_probs (`bool`, *optional*, defaults to `False`):
            Whether to precompute reference model log probabilities for training and evaluation datasets. This is
            useful when training without the reference model to reduce the total GPU memory needed.
        model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a
            string.
        ref_model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model
            from a string.
        dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
        prompt_sample_size (`int`, *optional*, defaults to `1024`):
            Number of prompts that are fed to the density ratio classifier.
        min_density_ratio (`float`, *optional*, defaults to `0.5`):
            Minimum value of the density ratio. The estimated density ratio is clamped to this value.
        max_density_ratio (`float`, *optional*, defaults to `10.0`):
            Maximum value of the density ratio. The estimated density ratio is clamped to this value.
    
    NÚhelpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrVz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksz'Maximum sequence length to truncate to.Úmax_seq_lengthFÚnorWéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?ç@Úlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsr\éôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastééééœÿÿÿÚkeep_endçà?ç$@c•—s|dkrtd|›d�ƒ‚|dkrtd|›d�ƒ‚|dur(|#dkr(|$dkr(d}d	}#|Ždur:d
dlm}–t|–ƒdd
ƒ}ŽtƒjdŸid|“d|“d|“d|“d|“d|“d|“d|“d|	“d|
“d|“d|“d|
“d|“d|“d|“d|“d|“d |“d!|“d"|“d#|“d$|“d%|“d&|“d'|“d(|“d)|“d*|“d+|“d,|“d-| “d.|!“d/|"“d0|#“d1|$“d2|%“d3|&“d4|'“d5|(“d6|)“d7|*“d8|+“d9|,“d:|-“d;|.“d<|/“d=|0“d>|1“d?|2“d@|3“dA|4“dB|5“dC|6“dD|7“dE|8“dF|9“dG|:“dH|;“dI|<“dJ|=“dK|>“dL|?“dM|@“dN|A“dO|B“dP|C“dQ|D“dR|E“dS|F“dT|G“dU|H“dV|I“dW|J“dX|K“dY|L“dZ|M“d[|N“d\|O“d]|P“d^|Q“d_|R“d`|S“da|T“db|U“dc|V“dd|W“de|X“df|Y“dg|Z“dh|[“di|\“dj|]“dk|^“dl|_“dm|`“dn|a“do|b“dp|c“dq|d“dr|e“ds|f“dt|g“du|h“dv|i“dw|j“dx|k“dy|l“dz|m“d{|n“d||o“d}|p“d~|q“d|r“d€|s“d�|t“d‚|u“dƒ|v“d„|w“d…|x“d†|y“d‡|z“dˆ|{“d‰||“dŠ|}“d‹|~“dŒ|“d�|€“dŽ|�“d�|‚“d�|ƒ“d‘|„“d’|…“d“|†“d”|‡“d•|ˆ“d–|‰“d—|Š“d˜|‹“d™|Œ“dš|�“d›|Ž“dœ|�“d�|�“dž|‘“|•¤Ž|’|_|“|_|”|_	dS) NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!r\za` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rŠr‹Zunsloth_training_checkpointsr|r)Ú	cpu_countrWr}Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚweight_decayÚ
adam_beta1Ú
adam_beta2Úadam_epsilonÚ
max_grad_normÚnum_train_epochsÚ	max_stepsÚlr_scheduler_typeÚwarmup_ratioÚwarmup_stepsÚ	log_levelÚlog_level_replicaÚlog_on_each_nodeÚlogging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ	data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚdisable_tqdmÚremove_unused_columnsÚlabel_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚfsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ	deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ	adafactorÚgroup_by_lengthÚlength_column_nameÚ	report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚpush_to_hubÚresume_from_checkpointÚhub_model_idÚhub_strategyÚ	hub_tokenÚhub_private_repoÚhub_always_pushÚhub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚfp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚtorchdynamoÚ	ray_scopeÚddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚbetaÚlabel_pad_token_idÚ
padding_valueÚtruncation_modeÚdisable_dropoutÚgenerate_during_evalÚis_encoder_decoderÚprecompute_ref_log_probsÚmodel_init_kwargsÚref_model_init_kwargsÚdataset_num_procÚprompt_sample_sizeÚmin_density_ratioÚmax_density_ratiorr)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingrœÚmaxÚsuperÚ__init__ryrzr{)—Úselfr�ržrŸr r¡r¢r£r¤r¥r¦r§r¨r©rªr«r¬rr®r¯r°r±r²r³r´rµr¶r·r¸r¹rºr»r¼r½r¾r¿rÀrÁrÂrÃrÄrÅrÆrÇrÈrÉrÊrËrÌrÍrÎrÏrÐrÑrÒrÓrÔrÕrÖr×rØrÙrÚrÛrÜrÝrÞrßràrárârãrärårærçrèrérêrërìrírîrïrðrñròrórôrõrör÷rørùrúrûrürýrþrÿrrrrrrrrrr	r
rrr
rrrrrrrrrrrrrrrrrrr r!r"r#r$r%r&r'r(r)r*r+r,r-ryrzr{Úkwargsrœ©Ú	__class__rrrsr3{s¨ÿþýüûúùø	÷
öõô
óòñðïîíìëêéèçæåäãâá à!ß"Þ#Ý$Ü%Û&Ú'Ù(Ø)×*Ö+Õ,Ô-Ó.Ò/Ñ0Ð1Ï2Î3Í4Ì5Ë6Ê7É8È9Ç:Æ;Å<Ä=Ã>Â?Á@ÀA¿B¾C½D¼E»FºG¹H¸I·J¶KµL´M³N²O±P°Q¯R®ST¬U«VªW©X¨Y§Z¦[¥\¤]£^¢_¡` aŸbžc�dœe›fšg™h˜i—j–k•l”m“n’o‘p�q�rŽs�tŒu‹vŠw‰xˆy‡z†{…|„}ƒ~‚��ÿ�þ�ý�ü�û�ú�ù�ø	�÷
�ö�õ�ô
�ó�ò�ñ�ð�ï�î
zUnslothBCOConfig.__init__)”NNFFFr|FrWrWNNr}r}rr~rr€r�r‚rƒr„r…rVr†r‡rrˆr‰TNrŠFr\FrŠr‹NTFFFFFFrŒrŒFFFFr�rŽFFNrVNNFr�FNrNrVNNTNFNNFr�rNNNNr�r‘NFFr’NNNNTFTFFNNr“NNFNFNFTrŽNNNr�TFNr”r•FNNFFNNFFFNFTr–r—Nr‡r˜Nr™TFNFNNNr–ršr›NrVN)Ú__name__Ú
__module__Ú__qualname__Ú__doc__rLryrrÚ__annotations__rzÚintr{r3Ú
__classcell__rrrrr6rsru3sH
:þþþ�êruc$sÎeZdZdZddgZ																	dsdeeeje	fde
eeeje	fded	e
ed
e
eee
e	effde
eeeeefde
ed
e
egefde
eedeejjejjjfde
eejejgejfde
e
de
eege
fde
e	de
e	de
ede
ef"‡fdd„
Zedd„ƒZ dej!dej!fdd„Z"dej#d ej#dej!fd!d"„Z$d#e
e	eeej#ffdeej!ej!ffd$d%„Z%dtd'ed(e&dej!fd)d*„Z'‡fd+d,„Z(‡fd-d.„Z)e*d/d0„ƒZ+de,f‡fd1d2„Z-dud
e
ede,f‡fd3d4„
Z.d5e
de
fd6d7„Z/e0	8	9	8dvd:ej!d;ej#d<e1d=e&d>e1dej!fd?d@„ƒZ2dejd#e
e	eeej#ffdeej!ej!ej!ej!ffdAdB„Z3dCej!dej!fdDdE„Z4	FdwdGej!dHej!dIej!dJej!dKe
ej!dCe
ej!dLe1deej!ej!ej!ej!ffdMdN„Z5	Fdwd#e
e	eeej#ffdLe1fdOdP„Z6	8	dxdeeejfdQe
e	eeje7ffdeejeeje
e	ejffffdRdS„Z8dydUe
e	e9fdVe:dWddfdXdY„Z;dud'e
ede
ej<j=j>fdZd[„Z?d#e
e	ej#fdee	e	ffd\d]„Z@	dudeeejfdQe
e	eeje7ffd^e1d_e
ee	fd`da„ZA			bdzdce,dde	d^e
e1d_e
ee	dee	def‡fdfdg„
ZBdudhe
e	e9fdie
e9ddf‡fdjdk„
ZC‡fdldm„ZD			d{dne
e	doe
e	dpee	ee	dffdqdr„ZE‡ZFS)|Ú_UnslothBCOTrainerr�ÚtrlÚbcoN©NNÚmodelÚ	ref_modelÚargsÚ
train_datasetÚeval_datasetÚprocessing_classÚ
data_collatorÚ
model_initÚ	callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚpeft_configÚcompute_metricsÚmodel_adapter_nameÚref_adapter_nameÚembedding_funcÚembedding_tokenizerc$
sš|durtƒr
tƒstdƒ‚t|ƒturtdƒ‚t|tƒs)|dur)||ur)tdƒ‚|jdur1i}n9t|tƒs:tdƒ‚|j}| 	d¡}|durjt|tƒrT|dkrTt
t|ƒ}|dkrft|tjƒsftd|›d�ƒ‚||d<|j
durri}n9t|tƒs{td	ƒ‚|j
}| 	d¡}|dur«t|tƒr•|dkr•t
t|ƒ}|dkr§t|tjƒs§td|›d�ƒ‚||d<t|tƒr¹tj|fi|¤Ž}t|tƒrÇtj|fi|¤Ž}d
|_tƒsÕ|durÕtdƒ‚tƒ�rI|du�rIt|tƒrç| ¡}t
|dd
ƒsôt
|d
d
ƒ�rt|dƒ�odtt t¡jƒv}d|ji}|�r|j|d<t|fi|¤Ž}n|j�r4t|dƒ�r)| ¡ndd„}| ¡ |¡|}|j�rHt
|d
d
ƒ�rHt |ƒd|_n|j�rct|dƒ�rX| ¡ndd„}| ¡ |¡|j!�rst"ƒ�sst#ƒ�sstdƒ‚|du�r~|j$j%|_%n|j%du�rˆtdƒ‚|j%|_%tƒ�o”t|tƒ|_&||_'||_(|�r£||_)n|j&�s«|j*�r¯d|_)nt+|ƒ|_)|du�r½tdƒ‚|j,du�rËt- .dt/¡d}|j,du�rÔ|j,}|j0du�rât- .dt/¡d}|j0du�rë|j0}d}|j1du�rÿ|j%�rÿt- .dt/¡d}|j1du�r|j%�r|j1}|du�r,t2|j3|j4|j%d�}|j5�r(d
|_5t- .dt/¡d|_6nd
|_6|j7�rBt8|ƒ|j)du�rBt8|j)ƒ||_,|j!|_!|j4|_4|j9du�rV|j9n|j3|_9||_0|j:|_:||_1|j*|_*d
|_;d
|_<t=dd „ƒ|_>|j?|_?t
|j$d!d
ƒ|_@t
|j$d"d#ƒ|_A|j@�r™|jAd#k�r™t- .d$t/¡||_B||_Cd|jDd%<tEƒ F¡�’|jGtHd&|i|jId'�}|du�rÅ|jGtHd&|i|jId'�}|jGtJd||jCd(œ|jId)d*�}d+|j%||j,|j:|j4|j0|j1d,œ}|jGtK||jId-d.�}|du�r|jGtJ||jCd(œd|jId/d0�}d+|j%||j,|j:|j4|j0|j1d,œ}|jGtK||jId1d.�}|jLd2d „|jId3d4�}|jLd5d „|jId6d4�}Wdƒn	1�s?wYtMƒjN||||||||
|	|
|d7�d
|_Ot|jPd8ƒ�rf|jP Q|jR¡t|d9ƒ�sptSd:ƒ‚|jT�r…|jUjVjWjXd;k�r…|j*�r…td<ƒ‚|j)du�r˜|j&�s—|j*�s—td=ƒ‚n|jT�r¥tY|j)|jUƒ|_)n
|jUjZ|j)dd>�|_)t[|jUd?�|_\|jBdu�sÀ|j]�rÂdS|j^||j_j`d@�}|j^||j_j`d@�}tja||fdAdB�} tjat b|dd…dAf¡t c|dd…dAf¡fdAdB�}!tddCdD� e|  f¡ g¡ h¡|! f¡ h¡¡|_i|ji j| f¡ g¡ h¡t b|dd…dAf¡ f¡ h¡¡}"|ji j| f¡ g¡ h¡t c|dd…dAf¡ f¡ h¡¡}#tk ldE|"›dF|#›�¡dS)GNz}BCOTrainer with UDM requires the scikit-learn and joblib libraries. Please install it with `pip install scikit-learn joblib`.z3Please use `BCOConfig` instead `TrainingArguments`.zœ`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the same as `model`, you must mass a copy of it, or `None` if you use peft.zRYou passed model_kwargs to the BCOTrainer. But your model is already instantiated.Útorch_dtyperŽznInvalid `torch_dtype` passed to the BCOConfig. Expected a string with either `torch.dtype` or 'auto', but got Ú.zZYou passed ref_model_kwargs to the BCOTrainer. But your ref_model is already instantiated.FzŽPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it with `pip install peft` to use the PEFT modelsÚis_loaded_in_8bitÚis_loaded_in_4bitrÚuse_gradient_checkpointingÚenable_input_require_gradscSó| d¡dS©NT©Úrequires_grad_©ÚmoduleÚinputÚoutputrrrrrsÚmake_inputs_require_grad&óz=_UnslothBCOTrainer.__init__.<locals>.make_inputs_require_gradTcSrZr[r\r^rrrrrsrb;rcz‚`generate_during_eval=True` requires Weights and Biases or Comet to be installed. Please install `wandb` or `comet-ml` to resolve.zMWhen no model is provided, you need to pass the parameter is_encoder_decoder.zdmax_length or a processing_class must be specified when using the default DPODataCollatorWithPaddingz§When using DPODataCollatorWithPadding, you should set `max_length` in the `BCOConfig`. It will be set to `512` by default, but you should do it yourself in the future.r—z®When using DPODataCollatorWithPadding, you should set `max_prompt_length` in the `BCOConfig`. 
It will be set to `128` by default, but you should do it yourself in the future.é€zÜWhen using DPODataCollatorWithPadding with an encoder decoder architecture, you should set `max_completion_length` in the BCOTrainer's init it will be set to `128` by default, but you should do it yourself in the future.)Úpad_token_idr!r&zªWhen using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your BCOConfig we have set it for you, but you should do it yourself in the future.cSsttƒS©N)r+ÚlistrrrrrrrsÚ<lambda>¥óz-_UnslothBCOTrainer.__init__.<locals>.<lambda>Úoutput_router_logitsÚrouter_aux_loss_coefr�a-You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to `0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary loss.Úestimate_tokensÚ	tokenizer)Ú	fn_kwargsÚnum_proc)rmrSzTokenizing train dataset)ÚbatchedrnroÚdescr�)Úprefixr&rmrr#r!rrz"Processing tokenized train dataset)rnrorqzTokenizing eval dataset)rnrprorqz!Processing tokenized eval datasetcSs|dS©NÚlabelrr©ÚxrrrrrsrhrizFiltering desirable examples)rorqcSs
|dSrsrrrurrrrrsrhó
zFiltering undesirable examples)rCrErIrFrGrHrJrOrKrLrMÚadd_model_tagsÚacceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.ézrYou cannot use `precompute_ref_log_probs=True` with Deepspeed ZeRO-3. Please set `precompute_ref_log_probs=False`.z]No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`)Úevaluation_mode)ry)Úsample_sizerr[Úbalanced)Úclass_weightz(UDM classifier training scores: chosen: z, rejected: )mr4r2ÚImportErrorÚtyper%Ú
ValueErrorÚ
isinstanceÚstrr(ÚgetÚgetattrrGÚdtyper)rÚfrom_pretrainedÚ_peft_has_been_casted_to_bf16r3rÚmerge_and_unloadÚhasattrrgr0Ú	signaturerCÚ
parametersrrrYÚget_input_embeddingsÚregister_forward_hookrÍrAr%r5r1Úconfigr&Ú
is_peft_modelrPrQrDr'r*rrIÚwarnÚUserWarningrrrrer!ràÚuse_dpo_data_collatorr$r,r"r#Ú _precomputed_train_ref_log_probsÚ_precomputed_eval_ref_log_probsr+Ú_stored_metricsr Úaux_loss_enabledÚ
aux_loss_coefrRrSÚwarnings_issuedrÚmain_process_firstÚmapr:r*r'r&Úfilterr2r3Úmodel_accepts_loss_kwargsrCrxÚ
_tag_namesÚAttributeErrorÚis_deepspeed_enabledryÚstateÚdeepspeed_pluginÚ
zero_stagerBÚ
prepare_modelr!ÚrunningrûÚ_get_sample_prompt_embeddingsrEr+ÚcatÚ	ones_likeÚ
zeros_likerÚfitÚcpuÚfloatÚnumpyÚclfÚscorer9Úinfo)$r4rCrDrErFrGrHrIrJrKrLrMrNrOrPrQrRrSr(rTr)Ú_support_gc_kwargsÚprepare_model_kwargsrbrrrrnÚ	desirableÚundesirableÚchosen_embeddingsÚrejected_embeddingsÚ
embeddingsÚlabelsÚchosen_meanÚ
rejected_meanr6rrrsr3¸s0ÿÿ


ÿ

ÿ


ÿ

ÿ
ÿ
ÿþ


€
ÿ


ÿýýý
ýý
û	
	ÿ
ý
û
ø
ü

û
ø
üÿÿºJõÿÿÿ€,ÿ
ÿ*ÿ*ÿz_UnslothBCOTrainer.__init__cCs|jduo	|jduSrf)rRrS©r4rrrrrsÚmatch_underlying_distributionQsz0_UnslothBCOTrainer.match_underlying_distributionÚprompt_embeddingsÚreturnc	CsØ|j}|j}|jj}|jj||jjd�}|jd}|jdd�|jjk}|j 	|¡}|jddkr8t
jg||d�S|j 
| ¡ ¡ ¡¡dd…df}t
j|||d�}|jj|dd	�}|||||d…}||}|S)
zÄ
        Calculates the probability that the given prompt embedding comes from the desirable dataset. The probability
        is computed on each process and then averaged across processes.
        )Ú	pad_indexrr\r[)Údevicer†N©r†rÀÚmean)Ú	reduction)r†rÀryÚ
process_indexÚpad_across_processesrSrer_rÂrcrGÚtensorr®Ú
predict_probar«r¬rÚ	as_tensorÚreduce)	r4r½r†rÀÚrankÚpadded_prompt_embeddingsr|ÚnonzeroÚprobrrrrrsÚ_get_chosen_probUs"ÿ
$z#_UnslothBCOTrainer._get_chosen_probÚ	input_idsÚattention_maskcCsVt ||jjk|jj|¡}t ¡�|j||d�}Wdƒ|S1s$wY|S)z|
        Replaces processing_class.pad_token_id with embedding_tokenizer.pad_token_id and applies self.embedding_func
        ©rÏrÐN)rGÚwhererHrerSÚno_gradrR)r4rÏrÐr·rrrrrsÚ_vectorize_promptrs
ý
þ
ÿúz$_UnslothBCOTrainer._vectorize_promptÚbatchcCsv|jsdS|j|d|dd�}tj|dtj|jd�}t |¡d}t |¡d}||df}||df}||fS)	z.Extract embeddings from frozen embedding modelrBÚembedding_input_idsÚembedding_attention_maskrÑrtrÁr.)r¼rÔrGrÆÚboolrÀrÒ)r4rÕr·r¸Ú
chosen_idxÚrejected_idxrµr¶rrrrrsÚ_get_prompt_embeddings„sþz)_UnslothBCOTrainer._get_prompt_embeddingsr—Údatasetr|cCsâtt|ƒ|ƒ}tjjt|ƒ|fd�}| |¡}|jj|j|jj	|jj
ddœ}|j t
|fi|¤Ž¡}t ¡�1t d¡}t|dd�D]}	|j|	d|	dd	�}
|j |
¡}
t ||
 ¡f¡}qBWd
ƒ|S1sjwY|S)zv
        Sample instances from dataset and get prompt embeddings. Used for density ratio classifier training.
        )ÚsizeF©Ú
batch_sizeÚ
collate_fnÚnum_workersÚ
pin_memoryÚshufflerz!Building sample prompt embeddings©ÚiterablerqrÖr×rÑN)ÚminÚlenr<rDÚchoiceÚselectrEr¤rIrÛröryÚpreparerrGrÓÚemptyrHrÔÚgather_for_metricsr§r«)r4rÜr|Ú	n_samplesÚrand_indicesÚembedding_datasetÚdataloader_paramsÚdata_loaderÚall_embeddingsÚpadded_batchr·rrrrrsr¦šs0
û	

þú
þ
ö
z0_UnslothBCOTrainer._get_sample_prompt_embeddingscsl|dur|n|jj}tƒ |¡|jjr2|j tj	 
|t¡¡|jr4t
j|jtj	 
|t¡dd�dSdSdS)NT)Úcompress)rEr�r2Ú_save_optimizer_and_schedulerryÚis_main_processr¥Úsave_to_jsonr>ÚpathÚjoinr r¼r7Údumpr®r)r4r�r6rrrsrõºs ûz0_UnslothBCOTrainer._save_optimizer_and_schedulercsŠ|durt d|›�¡dStƒ |¡tj |t¡}tj |¡r)t	 
|j|¡|_|j
rAtj |t¡}tj |¡rCt |¡|_dSdSdS)NzMissing Checkpoint )r9Úwarning_oncer2Ú_load_optimizer_and_schedulerr>rørùr Úisfiler!Úload_from_jsonryr¥r¼rr7Úloadr®)r4Ú
checkpointÚrunning_fileÚclf_filer6rrrsrüÅsýz0_UnslothBCOTrainer._load_optimizer_and_schedulerccsŽ�|jr|js|j |j¡ ¡ntƒ�*|jr|j |j¡dV|jr5|j |jp+d¡WdƒdSWdƒdS1s@wYdS)zWContext manager for handling null reference model (that is, peft adapter manipulation).Nrw)	r�rQryÚunwrap_modelrCÚdisable_adapterr=Úset_adapterrPr»rrrrrsÚnull_ref_contextÖs€ÿÿý÷"øz#_UnslothBCOTrainer.null_ref_contextcs®|jrR|jsR|jj|j|jj|jjddœ}|j t	|j
fi|¤Ž¡}g}t|dd�D]}| |¡}|j 
|¡}| | ¡¡q*|j
jdt |¡ ¡ ¡d�|_
d|_tƒ ¡S)z·
        Returns the training [`~torch.utils.data.DataLoader`].

        Subclass of transformers.src.transformers.trainer.get_train_dataloader to precompute `ref_log_probs`.
        FrÞz!Train dataset reference log probsräÚreference_logps©ÚnameÚcolumnT)r'r”rEr¤rIrÛröryrêrrFrHÚcompute_reference_log_probsrìrgr«Ú
add_columnrGr§r¬rr2Úget_train_dataloader)r4rðrñÚreference_completion_logpsróÚreference_completion_logpr6rrrsr
äs$û	
ÿ
z'_UnslothBCOTrainer.get_train_dataloadercsè|dur
|jdur
tdƒ‚|dur|n|j}|jrm|jsm|jj|j|jj|jjddœ}|j	 
t|fi|¤Ž¡}g}t|dd�D]}| 
|¡}|j	 |¡}| | ¡¡q?|jdt |¡ ¡ ¡d�}|jdurj||_d	|_tƒj|d
�S)aé
        Returns the evaluation [`~torch.utils.data.DataLoader`].

        Subclass of transformers.src.transformers.trainer.get_eval_dataloader to precompute `ref_log_probs`.

        Args:
            eval_dataset (`torch.utils.data.Dataset`, *optional*):
                If provided, will override `self.eval_dataset`. If it is a [`~datasets.Dataset`], columns not accepted
                by the `model.forward()` method are automatically removed. It must implement `__len__`.
        Nz-Trainer: evaluation requires an eval_dataset.FrÞz Eval dataset reference log probsrärrT)rG)rGr�r'r•rEr¥rIrÛröryrêrrHrrìrgr«rrGr§r¬rr2Úget_eval_dataloader)r4rGrðrñrrórr6rrrsrs.û	
ÿ
z&_UnslothBCOTrainer.get_eval_dataloaderróc	Cst ¡�h|jdurB| ¡�+|jr&|j|d|d| d¡|dd�j}n|j|d|dd	�j}Wdƒn1s<wYn#|jrY|j|d|d| d¡|dd�j}n|j|d|dd	�j}Wdƒn1sowY|j||dd
|j|j	d�}|S)zfComputes log probabilities of the reference model for a single padded batch of a BCO specific dataset.NÚprompt_input_idsÚprompt_attention_maskÚcompletion_decoder_input_idsÚcompletion_labels)rÐÚdecoder_input_idsr¸Úcompletion_input_idsÚcompletion_attention_mask)rÐF©Úaverage_log_probr&r!)
rGrÓrDrr&rCr„riÚget_batch_logpsr!)r4róÚcompletion_logitsÚcompletion_logpsrrrrrsr4sZ


üûþý€ö€üûÿþ€åûz._UnslothBCOTrainer.compute_reference_log_probsFr˜rir¸rr!r&cCs¤|jdd…|jkrtdƒ‚|s*|dd…dd…f ¡}|dd…dd…dd…f}n| ¡}||k}d|||k<t||ƒ}|rK|| d¡| d¡S|| d¡S)aCompute the log probabilities of the given labels under the given logits.

        Args:
            logits: Logits of the model (unnormalized). Shape: (batch_size, sequence_length, vocab_size)
            labels:
                Labels for which to compute the log probabilities. Label tokens with a value of label_pad_token_id are
                ignored. Shape: (batch_size, sequence_length)
            average_log_prob:
                If True, return the average log probability per (non-masked) token. Otherwise, return the sum of the
                log probabilities of the (non-masked) tokens.

        Returns:
            A tensor of shape (batch_size,) containing the average/sum log probabilities of the given labels under the
            given logits.
        NrVzKLogits (batch and sequence length dim) and labels must have the same shape.r\r)r_r�ÚclonerEÚsum)rir¸rr!r&Ú	loss_maskrqrrrrrsr_s
z"_UnslothBCOTrainer.get_batch_logpsc
s|jr
ˆdˆ d¡dœni}|jrd|d<|ˆdfdˆdi|¤Ž}|j}|j|ˆdd	|j|jd
�}|jdtˆdƒkrDtd
ƒ‚‡fdd„t	|jdƒDƒ}‡fdd„t	|jdƒDƒ}||df}	||df}
||df}||df}|jrƒ|	|
|||j
fS|	|
||fS)Nrr)r¸rTrjrrÐrFrrrtz‡There is a mismatch between the number of examples in this batch and the number of examples for which an output sequence was predicted.có g|]}ˆd|dur|‘qS©rtTrr©Ú.0Úi©rÕrrrsÚ
<listcomp>¯ó z._UnslothBCOTrainer.forward.<locals>.<listcomp>cr ©rtFrrr"r%rrrsr&°r'.)r&r„r—rirr!r_rçr�ÚrangeÚaux_loss)
r4rCrÕÚmodel_kwargsÚoutputsrrrÙrÚÚchosen_logpsÚrejected_logpsÚ
chosen_logitsÚrejected_logitsrrr%rsÚforwardŒsJüþúÿþýûÿz_UnslothBCOTrainer.forwardr¶cCs8| |¡}|jj}|jj}|d|dj||d�}|S)Nr\rƒ)rær1)rÎrEr,r-Úclamp)r4r¶Úprob_desirableÚ	min_ratioÚ	max_ratioÚweightrrrrrsÚ_get_udm_weight½s

z"_UnslothBCOTrainer._get_udm_weightTÚpolicy_chosen_logpsÚpolicy_rejected_logpsÚreference_chosen_logpsÚreference_rejected_logpsrµrŸcCsÎ||}|j|}	||}
|j|
}|r"|j t |	|fd¡ ¡¡tj|jj|	jd�}t	 
|	|¡}
t	 
||¡}|jrXt |
¡}| 
|¡}tj||
||fdd�}n	tj|
|fdd�}||	||fS)aúCompute the BCO loss for a batch of policy and reference model log probabilities.

        Args:
            policy_chosen_logps:
                Log probabilities of the policy model for the chosen responses. Shape: (num(chosen) in batch_size,)
            policy_rejected_logps:
                Log probabilities of the policy model for the rejected responses. Shape: (num(rejected) in batch_size,)
            reference_chosen_logps:
                Log probabilities of the reference model for the chosen responses. Shape: (num(chosen) in batch_size,)
            reference_rejected_logps:
                Log probabilities of the reference model for the rejected responses. Shape: (num(rejected) in
                batch_size,)
            chosen_embeddings: embeddings of desirable prompts
            rejected_embeddings: embeddings of undesirable prompts

        Returns:
            A tuple of four tensors: (losses, chosen_rewards, rejected_rewards, delta). The losses tensor contains the
            BCO loss for each example in the batch. The chosen_rewards and rejected_rewards tensors contain the rewards
            for the chosen and rejected responses, respectively. The delta value contains the moving average of all
            implicit rewards.
        r©rÀr[)r r¥ÚupdaterGr§ÚdetachrÈrÂrÀrÚ
logsigmoidr¼r¨r7)r4r8r9r:r;rµr¶rŸÚchosen_logratiosÚchosen_rewardsÚrejected_logratiosÚrejected_rewardsÚdeltaÚ
chosen_lossesÚrejected_lossesÚ
chosen_weightÚrejected_weightÚlossesrrrrrsÚbco_lossÆs 


z_UnslothBCOTrainer.bco_lossc	sÞi}‡fdd„ˆ ¡Dƒ‰ˆ |ˆ¡}|dd…\}}}}	ˆjr$|d}
dˆvrY‡fdd„tˆdjdƒDƒ}‡fd	d„tˆdjdƒDƒ}ˆd|d
f}
ˆd|d
f}nLt ¡�@ˆjdur‡ˆ ¡�ˆ ˆj	ˆ¡dd…\}
}}}Wdƒn1s�wYnˆ ˆjˆ¡dd…\}
}}}Wdƒn1s wYˆ 
ˆ¡\}}ˆj|||
||||d�\}}}}ˆj 
|¡ ¡ ¡|d<t t|ƒg¡ ˆjj¡}t t|ƒg¡ ˆjj¡}ˆj 
|¡ ¡ ¡}ˆj 
|¡ ¡ ¡}|dk�r)ˆj 
| ¡¡ ¡ ¡|d
<ˆj 
| ¡¡ ¡ ¡|d<ˆj 
| ¡¡ ¡ ¡|d<||d<|dk�r\ˆj 
| ¡¡ ¡ ¡|d<ˆj 
| ¡¡ ¡ ¡|d<ˆj 
|	 ¡¡ ¡ ¡|d<||d<| ¡}ˆj�rk|ˆj|
7}||fS)zWCompute the BCO loss and other metrics for the given batch of inputs for train or test.cs0i|]\}}|t|tjƒr| ˆjj¡n|“qSrr)r‚rGrraryrÀ©r#ÚkÚvr»rrrsÚ
<dictcomp>s0z=_UnslothBCOTrainer.get_batch_loss_metrics.<locals>.<dictcomp>NrWrcr r!rrr"r%rrrsr&r'z=_UnslothBCOTrainer.get_batch_loss_metrics.<locals>.<listcomp>rcr r(rrr"r%rrrsr&r'.©rŸrDzrewards/chosen_sumzlogps/chosen_sumúlogits/chosen_sumzcount/chosenzrewards/rejected_sumzlogps/rejected_sumúlogits/rejected_sumzcount/rejected)Úitemsr1r—r)r_rGrÓrDrrCrÛrJryrìrÂÚitemrrçrarÀrÚnansumÚnanmeanr˜)r4rCrÕrŸÚmetricsÚforward_outputr8r9Úpolicy_chosen_logitsÚpolicy_rejected_logitsr*rÙrÚr:r;Ú_rµr¶rIrArCrDÚ
num_chosenÚnum_rejectedÚall_num_chosenÚall_num_rejectedÚlossrr)rÕr4rsÚget_batch_loss_metricsýsŒ
û  


ûû€
û€òù	
ÿÿÿ
ÿÿÿz)_UnslothBCOTrainer.get_batch_loss_metricsÚinputscCs‚|jr
t|jjjƒntƒ}|�| ||¡\}}Wdƒn1s"wY| |jj¡}|jj	r9|j
|dd�|r?||fS|S)NÚtrain©Ú
train_eval)rˆr(ryrÀr€r=r`rarEröÚ
store_metrics)r4rCraÚreturn_outputsÚnum_items_in_batchÚcompute_loss_context_managerr_rVrrrrrsÚcompute_loss[sÿÿz_UnslothBCOTrainer.compute_lossrbrVrd)rbÚevalcCs*| ¡D]\}}|j|| |¡qdSrf)rRr–rg)r4rVrdÚkeyÚvaluerrrrrsressÿz _UnslothBCOTrainer.store_metricscCs*|dur|j}|dust|ƒsdSt|ƒSrf)rFr/r")r4rÜrrrrrsÚ_get_train_samplerws
z%_UnslothBCOTrainer._get_train_samplerc	Cs:|jr
t|jjjƒntƒ}|�`|j|d|d|jd|jj	d�}d|vr*|d}n>|j
durV| ¡�|jj|d|d|jd|jj	d�}Wdƒn1sPwYn|j
j|d|d|jd|jj	d�}Wdƒn1srwYt
||j|jj	ƒ}|jj|dd�}t
||j|jj	ƒ}|jj|dd�}||fS)zRGenerate samples from the model and reference model for the given batch of inputs.rrT)rÏrÐrÚ	do_samplereÚreference_outputN)Úskip_special_tokens)rˆr(ryrÀr€r=ÚgeneraterrHrerDrrCr?Úbatch_decode)r4rCrÕÚgenerate_context_managerÚ
policy_outputroÚpolicy_output_decodedÚreference_output_decodedrrrrrsÚgenerate_from_model_and_ref~sJÿû	


ûÿ€	û€éz._UnslothBCOTrainer.generate_from_model_and_refr£Úignore_keysc	sBˆdurt|dƒrt|jdgƒ‰ng‰|jrt|jjjƒntƒ}t	 
¡�$|�|j||dd�\}}Wdƒn1s<wYWdƒn1sKwY|jjr[|j
|dd�|rd| ¡ddfSi}d|vrp|d|d<d	|vrz|d	|d
<‡fdd„| ¡Dƒ}	t	j|	|jjd
�}	t	j|	jd|jjd
�}
| ¡|	|
fS)Nr�Úkeys_to_ignore_at_inferenceFrOrjrcrPzeval_logits/chosenrQzeval_logits/rejectedcsg|]
\}}|ˆvr|‘qSrrrrrK©rxrrrsr&Ísz6_UnslothBCOTrainer.prediction_step.<locals>.<listcomp>r<r)rŠr…r�rˆr(ryrÀr€r=rGrÓr`rörer>rRrÆÚzerosr_)r4rCrar£rxÚprediction_context_managerr_rVÚlogits_dictrir¸rrrzrsÚprediction_steps0
ÿÿ€z"_UnslothBCOTrainer.prediction_steprjÚ
dataloaderÚdescriptionÚmetric_key_prefixcs$|jr†t|jƒ}tjt|ƒ|jjd�}|j |¡}| 	|¡}	| 
|	¡}	tj|	dtj
|jjd�}
t |
¡d}|	d||	d|t|Ž|	dƒdœ}| |j|¡\}
}tjgd	¢d
d„t|d|
|ƒDƒd�}d
|jjvrzt dtj|d�i¡d|jjvr†td|d�tƒ |||||¡}|S)zÞ
        Overriding built-in evaluation loop to store metrics for each batch. Prediction/evaluation loop, shared by
        `Trainer.evaluate()` and `Trainer.predict()`.

        Works both with or without labels.
        )rLrtrÁrrrÚprompt)rrr‚)ÚPromptÚPolicyz	Ref ModelcSs4g|]\}}}||t|ƒd…|t|ƒd…g‘qSrf)rç)r#r‚ÚpolÚrefrrrrrsr&øs ÿÿz6_UnslothBCOTrainer.evaluation_loop.<locals>.<listcomp>)ÚcolumnsÚdataÚwandbÚgame_log)rˆÚcomet_mlzgame_log.csv)r	Útable)r%rçrÜrDÚsampler)rEÚeval_batch_sizerérIÚ_prepare_inputsrGrÆrØryrÀrÒr6rwrCr@Ú	DataFramer`ròr‰ÚlogÚTabler8r2Úevaluation_loop)r4rr€r£rxr�Únum_samplesÚrandom_indicesÚrandom_batch_datasetÚrandom_batchÚ
target_labelsÚtarget_indiciesÚtarget_batchruÚref_output_decodedrŒÚinitial_outputr6rrrsr“Ós<


ýþþþ
ÿz"_UnslothBCOTrainer.evaluation_loopÚlogsÚ
start_timec
	s`d|vrdnd}|dkrdnd}dD]V}d|›�|j|vrht |j|d|›�¡ ¡ ¡}dD]-}t |j||›d	|›d
�¡ ¡ ¡|||›|›d	|›�<|j||›d	|›d
�=q1|j|d|›�=q|›d�|vrŠ|›d�|vrŠ||›d�||›d�||›d
�<|j| ¡D]\}}	t |	¡ ¡ ¡||›|›�<q‘|j|=tƒ ||¡S)a1
        Log `logs` on the various objects watching training, including stored metrics.