unsloth_compiled_cache/__pycache__/UnslothSFTTrainer.cpython-311.pyc

§
5$�hd÷ãót—dZddlmZddlZddlmZddlmZddlmZm	Z	m
Z
mZmZm
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZmZmZmZm Z m!Z!m"Z"m#Z#m$Z$mZm%Z%m&Z&m'Z'm(Z(m)Z)m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0mZm1Z1m2Z2m3Z3m4Z4m5Z5mZm6Z6m7Z7mZmZmZmZmZm
Z
mZm1Z1m2Z2m
Z
mZmZm"Z"m/Z/m1Z1m3Z3mZm1Z1ddl1Z1ddlTddl(m'Z'm8Z8dd	l9m:Z:ddlZddl;Z<dd
l&m=Z=ddlmZddl>m?Z?mZ@dd
dd
d
dœZAejBddeA¬¦«d„¦«ZCe'Gd„de ¦«¦«ZD	Gd„de"¦«ZEGd„deE¦«ZFdS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)?rÚAutoModelForCausalLMÚ
AutoTokenizerÚBaseImageProcessorrÚDataCollatorÚDataCollatorForLanguageModelingÚDatasetÚEvalPredictionÚFeatureExtractionMixinÚIterableDatasetrÚPathÚ
PeftConfigÚ	PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚ	SFTConfigÚ
SFTTrainerÚTrainerÚTrainerCallbackÚTrainingArgumentsr	Úclone_chat_templateÚ
contextlibÚ	dataclassÚdataclassesÚdefaultdictÚgenerate_model_cardÚget_act_offloading_ctx_managerÚget_comet_experiment_urlÚget_peft_modelÚis_conversationalÚis_peft_availableÚis_wandb_availableÚnnÚosÚpadÚpeftÚpeft_module_casting_to_bf16Úprepare_model_for_kbit_trainingÚtorchÚversionÚwarningsrrrrrrr	r.r/rrrrr+r.r0r3r.)Ú*)r#Úfield)ÚVersion)Únullcontext©ÚDataCollatorForSeq2SeqrTF)Úepilogue_fusionÚmax_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ	fullgraphÚoptionscó’—tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t	||¦«D]‘\}}| tj¦«}tj|d| d¦«¬¦« 	d¦«}tj
|d¬¦«}||z
}	| |	¦«Œ’	tj|¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)rFÚindex)rFé)
r3ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ	unsqueezeÚsqueezeÚ	logsumexpÚappendÚconcat)
ÚlogitsrGÚchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚchunk_logitsÚchunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
          ú]/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothSFTTrainer.pyÚchunked_selective_log_softmaxr_"s5€õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[Ñ[Ô[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐHÑHÔH€MØÐå%(¨¸Ñ%GÔ%Gð4ð4Ñ!ˆ�kØ#—’¥u¤}Ñ5Ô5ˆÝœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`ÐaÑaÔa×iÒiÐjlÑmÔmˆÝ œ?¨<¸rÐBÑBÔBÐØ)Ð,<Ñ<ˆØ×"Ò" ?Ñ3Ô3Ð3Ð3ØÝœ,Ð':Ñ;Ô;ÐØ-×5Ò5°v´|ÀA´ÈÌÐUVÌÐ6XÑYÔYÐØÐócó²‡—eZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z	ee
ed	<																																																																																																																																																		d-ˆfd,„	ZˆxZS).ÚUnslothSFTConfiga5
    
    Configuration class for the [`SFTTrainer`].

    This class includes only the parameters that are specific to SFT training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        > Parameters that control the model

        model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments for [`~transformers.AutoModelForCausalLM.from_pretrained`], used when the `model`
            argument of the [`SFTTrainer`] is provided as a string.
        chat_template_path (`str` or `None`, *optional*, defaults to `None`):
            If specified, sets the model's chat template. This can either be the path to a tokenizer (local directory
            or Hugging Face Hub model) or a direct path to a Jinja template file. When using a Jinja file, you must
            ensure that any special tokens referenced in the template are added to the tokenizer and that the model's
            embedding layer is resized accordingly.

        > Parameters that control the data preprocessing

        dataset_text_field (`str`, *optional*, defaults to `"text"`):
            Name of the column that contains text data in the dataset.
        dataset_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Dictionary of optional keyword arguments for the dataset preparation. The only supported key is
            `skip_prepare_dataset`.
        dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
        eos_token (`str` or `None`, *optional*, defaults to `None`):
            Token used to indicate the end of a turn or sequence. If `None`, it defaults to
            `processing_class.eos_token`.
        pad_token (`int` or `None`, *optional*, defaults to `None`):
            Token used for padding. If `None`, it defaults to `processing_class.pad_token`, or if that is also `None`,
            it falls back to `processing_class.eos_token`.
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the tokenized sequence. Sequences longer than `max_length` are truncated from the right.
            If `None`, no truncation is applied. When packing is enabled, this value sets the sequence length.
        packing (`bool`, *optional*, defaults to `False`):
            Whether to group multiple sequences into fixed-length blocks to improve computational efficiency and reduce
            padding. Uses `max_length` to define sequence length.
        packing_strategy (`str`, *optional*, defaults to `"bfd"`):
            Strategy for packing sequences. Can be either `"bfd"` (best-fit decreasing, default), or `"wrapped"`.
        padding_free (`bool`, *optional*, defaults to `False`):
            Whether to perform forward passes without padding by flattening all sequences in the batch into a single
            continuous sequence. This reduces memory usage by eliminating padding overhead. Currently, this is only
            supported with the FlashAttention 2 or 3, which can efficiently handle the flattened batch structure. When
            packing is enabled with strategy `"bfd"`, padding-free is enabled, regardless of the value of this
            parameter.
        pad_to_multiple_of (`int` or `None`, *optional*, defaults to `None`):
            If set, the sequences will be padded to a multiple of this value.
        eval_packing (`bool` or `None`, *optional*, defaults to `None`):
            Whether to pack the eval dataset. If `None`, uses the same value as `packing`.

        > Parameters that control the training

        completion_only_loss (`bool` or `None`, *optional*, defaults to `None`):
            Whether to compute loss only on the completion part of the sequence. If set to `True`, loss is computed
            only on the completion, which is supported only for [prompt-completion](#prompt-completion) datasets. If
            `False`, loss is computed on the entire sequence. If `None` (default), the behavior depends on the dataset:
            loss is computed on the completion for [prompt-completion](#prompt-completion) datasets, and on the full
            sequence for [language modeling](#language-modeling) datasets.
        assistant_only_loss (`bool`, *optional*, defaults to `False`):
            Whether to compute loss only on the assistant part of the sequence. If set to `True`, loss is computed
            only on the assistant responses, which is supported only for [conversational](#conversational) datasets. If `False`,
            loss is computed on the entire sequence.
        activation_offloading (`bool`, *optional*, defaults to `False`):
            Whether to offload the activations to the CPU.
    
    NÚhelpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsrCz8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksFÚnorDéréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?ç@Úlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsrHéôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéÚtextéÚbfdc“ó„•—|dkrtd|›d�¦«‚|dkrtd|›d�¦«‚|€|#dkr
|$dkrd}d	}#|…€!d
dlm}”t	|”¦«dzd¦«}…t¦«jd�id
|“d|“d|“d|“d|“d|“d|“d|“d|	“d|
“d|“d|“d|
“d|“d|“d|“d|“d|“d|“d |“d!|“d"|“d#|“d$|“d%|“d&|“d'|“d(|“d)|“d*|“d+|“d,| “d-|!“d.|"“d/|#“d0|$“d1|%“d2|&“d3|'“d4|(“d5|)“d6|*“d7|+“d8|,“d9|-“d:|.“d;|/“d<|0“d=|1“d>|2“d?|3“d@|4“dA|5“dB|6“dC|7“dD|8“dE|9“dF|:“dG|;“dH|<“dI|=“dJ|>“dK|?“dL|@“dM|A“dN|B“dO|C“dP|D“dQ|E“dR|F“dS|G“dT|H“dU|I“dV|J“dW|K“dX|L“dY|M“dZ|N“d[|O“d\|P“d]|Q“d^|R“d_|S“d`|T“da|U“db|V“dc|W“dd|X“de|Y“df|Z“dg|[“dh|\“di|]“dj|^“dk|_“dl|`“dm|a“dn|b“do|c“dp|d“dq|e“dr|f“ds|g“dt|h“du|i“dv|j“dw|k“dx|l“dy|m“dz|n“d{|o“d||p“d}|q“d~|r“d|s“d€|t“d�|u“d‚|v“dƒ|w“d„|x“d…|y“d†|z“d‡|{“dˆ||“d‰|}“dŠ|~“d‹|“dŒ|€“d�|�“dŽ|‚“d�|ƒ“d�|„“d‘|…“d’|†“d“|‡“d”|ˆ“d•|‰“d–|Š“d—|‹“d˜|Œ“d™|�“dš|Ž“d›|�“dœ|�“|“¤Ž|‘|_|’|_dS)žNgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!rHza` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rvrwÚunsloth_training_checkpointsrhr)Ú	cpu_countriÚ
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚweight_decayÚ
adam_beta1Ú
adam_beta2Úadam_epsilonÚ
max_grad_normÚnum_train_epochsÚ	max_stepsÚlr_scheduler_typeÚwarmup_ratioÚwarmup_stepsÚ	log_levelÚlog_level_replicaÚlog_on_each_nodeÚlogging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ	data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚdisable_tqdmÚremove_unused_columnsÚlabel_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚfsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ	deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ	adafactorÚgroup_by_lengthÚlength_column_nameÚ	report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚpush_to_hubÚresume_from_checkpointÚhub_model_idÚhub_strategyÚ	hub_tokenÚhub_private_repoÚhub_always_pushÚhub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚfp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚtorchdynamoÚ	ray_scopeÚddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚmodel_init_kwargsÚchat_template_pathÚdataset_text_fieldÚdataset_kwargsÚdataset_num_procÚ	eos_tokenÚ	pad_tokenÚ
max_lengthÚpackingÚpacking_strategyÚpadding_freeÚpad_to_multiple_ofÚeval_packingÚcompletion_only_lossÚassistant_only_lossÚactivation_offloading©)	ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingr‡ÚminÚsuperÚ__init__rfrg)–Úselfrˆr‰rŠr‹rŒr�rŽr�r�r‘r’r“r”r•r–r—r˜r™ršr›rœr�ržrŸr r¡r¢r£r¤r¥r¦r§r¨r©rªr«r¬rr®r¯r°r±r²r³r´rµr¶r·r¸r¹rºr»r¼r½r¾r¿rÀrÁrÂrÃrÄrÅrÆrÇrÈrÉrÊrËrÌrÍrÎrÏrÐrÑrÒrÓrÔrÕrÖr×rØrÙrÚrÛrÜrÝrÞrßràrárârãrärårærçrèrérêrërìrírîrïrðrñròrórôrõrör÷rørùrúrûrürýrþrÿrrrrrrrrrr	r
rrr
rrrrrrrrrrrfrgÚkwargsr‡Ú	__class__s–                                                                                                                                                     €r^rzUnslothSFTConfig.__init__‡s2
ø€ðl˜4ÒÐÕ'9ð;VÐ]jð;Vð;Vð;Vñ(Wô(Wð"WØ˜1ÒÐ¥Mð3FÐUbð3Fð3Fð3Fñ%Gô%GðGØÐ -°7Ò":Ð":¸zÈSÒ?PÐ?PØ7ˆJØ ˆMØÐ#Ø1Ð1Ð1Ð1Ð1Ð1Ý" 9 9¡;¤;¨q¡=°!Ñ4Ô4Ðà�‰ŒÔðP	DðP	DðP	DØ#˜ðP	Dà#7Ð#7ðP	Dð �xðP	Dð�gð	P	Dð
$˜ðP	Dð*˜Mð
P	Dð$8Ð#7ðP	Dð+FÐ*EðP	Dð*DÐ)CðP	Dð(@Ð'?ðP	Dð'>Ð&=ðP	Dð+FÐ*EðP	Dð'>Ð&=ðP	Dð$˜ðP	Dð'>Ð&=ðP	Dð *˜Mð!P	Dð"(˜<ð#P	Dð$$˜ð%P	Dð&$˜ð'P	Dð((˜<ð)P	Dð**˜Mð+P	Dð, 0Ð/ð-P	Dð."˜	ð/P	Dð0!2Ð 1ð1P	Dð2(˜<ð3P	Dð4(˜<ð5P	Dð6"˜	ð7P	Dð8!2Ð 1ð9P	Dð: 0Ð/ð;P	Dð<&˜+ð=P	Dð> 0Ð/ð?P	Dð@"4Ð!3ðAP	DðB*˜MðCP	DðD&<Ð%;ðEP	DðF*˜MðGP	DðH$˜ðIP	DðJ 0Ð/ðKP	DðL 0Ð/ðMP	DðN!2Ð 1ðOP	DðP.˜oðQP	DðR7^Ð6]ðSP	DðT�gðUP	DðV�gðWP	DðX,˜^ðYP	DðZ�4ð[P	Dð\"˜	ð]P	Dð^*˜Mð_P	Dð` �xðaP	Dðb�4ðcP	Dðd�4ðeP	Dðf,˜^ðgP	Dðh&<Ð%;ðiP	Dðj,˜^ðkP	Dðl,˜^ðmP	Dðn�4ðoP	Dðp$˜ðqP	Dðr&˜+ðsP	Dðt*˜MðuP	Dðv!2Ð 1ðwP	Dðx�EðyP	Dðz$8Ð#7ð{P	Dð|$˜ð}P	Dð~&<Ð%;ðP	Dð@*DÐ)CðAP	DðB$˜ðCP	DðD �xðEP	DðF(˜<ðGP	DðH%:Ð$9ðIP	DðJ&˜+ðKP	DðL&<Ð%;ðMP	DðN%:Ð$9ðOP	DðP!2Ð 1ðQP	DðR 0Ð/ðSP	DðT�4ðUP	DðV#6Ð"5ðWP	DðX&˜+ðYP	DðZ2TÐ1Sð[P	Dð\"4Ð!3ð]P	Dð^"˜	ð_P	Dð`&<Ð%;ðaP	Dðb�EðcP	Dðd$˜ðeP	Dðf"˜	ðgP	Dðh.˜oðiP	Dðj"4Ð!3ðkP	Dðl"˜	ðmP	Dðn*DÐ)CðoP	Dðp!2Ð 1ðqP	Dðr%:Ð$9ðsP	Dðt%:Ð$9ðuP	Dðv-JÐ,IðwP	Dðx#6Ð"5ðyP	Dðz*DÐ)Cð{P	Dð|&˜+ð}P	Dð~&<Ð%;ðP	Dð@(˜<ðAP	DðB(˜<ðCP	DðD"˜	ðEP	DðF 0Ð/ðGP	DðH.˜oðIP	DðJ(˜<ðKP	DðL&<Ð%;ðMP	DðN-JÐ,IðOP	DðP*DÐ)CðQP	DðR&<Ð%;ðSP	DðT(˜<ðUP	DðV$8Ð#7ðWP	DðX(@Ð'?ðYP	DðZ!2Ð 1ð[P	Dð\*˜Mð]P	Dð^$8Ð#7ð_P	Dð` 0Ð/ðaP	Dðb&˜+ðcP	Dðd"˜	ðeP	Dðf&˜+ðgP	Dðh*˜MðiP	Dðj%:Ð$9ðkP	Dðl"4Ð!3ðmP	Dðn)BÐ(AðoP	Dðp-JÐ,IðqP	Dðr#6Ð"5ðsP	Dðt$8Ð#7ðuP	Dðv"4Ð!3ðwP	Dðx*˜MðyP	Dðz 0Ð/ð{P	Dð|#6Ð"5ð}P	Dð~&<Ð%;ðP	Dð@-JÐ,IðAP	DðB!2Ð 1ðCP	DðD"4Ð!3ðEP	DðF"4Ð!3ðGP	DðH,˜^ðIP	DðJ 0Ð/ðKP	DðL"˜	ðMP	DðN"˜	ðOP	DðP$˜ðQP	DðR�gðSP	DðT 0Ð/ðUP	DðV(˜<ðWP	DðX"4Ð!3ðYP	DðZ(˜<ð[P	Dð\$8Ð#7ð]P	Dð^#6Ð"5ð_P	Dð`%:Ð$9¸FðaP	DðP	DðP	Dðb%9ˆÔ!Ø"4ˆÔÐÐr`)’NNFFFrhFrDrDNNririrrjrkrlrmrnrorprqrCrrrsrrtruTNrvFrHFrvrwNTFFFFFFrxrxFFFFryrzFFNrCNNFr{FNrNrCNNTNFNNFr{rNNNNr|r}NFFr~NNNNTFTFFNNrNNFNFNFTrzNNNr{TFNr€r�FNNFFNNFFFNFTNNr‚NNNNrƒFr„FNNNFFNrC)
Ú__name__Ú
__module__Ú__qualname__Ú__doc__r7rfrrÚ__annotations__rgÚintrÚ
__classcell__©r!s@r^rbrb3sƒø€€€€€€ðIðIðT+0¨%ØØÐ1Ð2ð+ñ+ô+Ð˜( 3œ-ððñð*/¨ØØÐVÐWð*ñ*ô*Ð˜ #œððñðØ#ØØØØØ$Ø&'Ø%&Ø#'Ø"&Ø&'Ø"#ØØ"%ØØØØØØØØØ$ØØØØ%ØØØ"Ø"ØØ!&ØØØØØ!ØØ27ØØØØØØØØØØØ!'ØØØØØØØ!ØØ$ØØ!"Ø%)ØØØØ $ØØ!&Ø $Ø Ø ØØØØ-1Ø!ØØ!$ØØØØØ%ØØ%)Ø Ø $Ø $Ø(-Ø"Ø%*ØØ!%ØØ#ØØØØØ!&Ø(,Ø%*Ø!%ØØ#Ø#'Ø ØØ#Ø ØØØØØ $Ø!Ø$)Ø(-Ø"Ø#Ø"ØØ Ø"Ø!&Ø(,Ø Ø!Ø#ØØØØØØØ ØØ!ØØ#Ø#Ø %Ø#Øðgq5ðq5ðq5ðq5ðq5ðq5ðq5ðq5ðq5ðq5r`rbc óØ‡—eZdZdZddgZ													d+deeeje	fde
eeefde
e
d	e
eeefd
e
eeeeeffde
eeeeefde
ed
e
eegefde
eedee
ejje
ejjjfde
eeejjeee ffde
eej!ej!gej!fde
dde
eegeffˆfd„
Z"dedede	fd„Z#de	de dede	fd„Z$de	dede	fd„Z%de	dede	fd„Z&deeefde'de
eegefdedeeeff
d„Z(d „Z)d,ˆfd"„	Z*ˆfd#„Z+d-d$eee,fd%e
e,ddfˆfd&„
Z-ˆfd'„Z.			d.d(e
ede
ed)eeeedffd*„Z/ˆxZ0S)/Ú_UnslothSFTTrainerr{ÚtrlÚsftN©NNÚmodelÚargsÚ
data_collatorÚ
train_datasetÚeval_datasetÚprocessing_classÚcompute_loss_funcÚcompute_metricsÚ	callbacksÚ
optimizersÚoptimizer_cls_and_kwargsÚpreprocess_logits_for_metricsÚpeft_configrÚformatting_funccóÚ
•‡‡‡‡‡—t|t¦«r|n|jj}‰€.| d¦«d}t|›d�¦«Šnit‰t¦«rTt‰t
¦«s?‰ ¦«}‰j|d<| 	d¦«td)i|¤ŽŠ‰€tj|¦«Š‰j�E‰j}‰ 
|¦«}|€ td|›d‰jj›d�¦«‚|‰_‰j�)t|t¦«st'jd	¦«t|t¦«r‰ |‰¦«}‰j�£t.j ‰j¦«rd‰j d
¦«rJt7‰jd¬¦«5}| ¦«‰_ddd¦«n#1swxYwYg}nt=|‰‰j¦«\}Š}ng}		‰j#p‰j$o
‰j%dk‰_#|jj&dv}‰j#rs|�td¦«‚‰j$r‰j%dkrt'jd¦«|st'jd¦«‰j'dkr‰j$st'jd¦«tQtS|¦«¦«}‰j*€
d|v‰_*n‰j*‰_*|€o‰j+p
‰j+p‰j}‰ 
|¦«}|€ td|›d‰jj›d�¦«‚tY|‰j*‰j#|‰j-¬¦«}‰j$r!‰j%dkr|st'jd¦«‰j.rt_|¦«std¦«‚‰j0dup‰j0 1dd
¦«}|r¢‰j*r‰rtd ¦«‚‰ 2|‰‰‰j$‰d!¦«}|�i‰j3€‰j$n‰j3Št|th¦«r%ˆˆˆˆˆfd"„| 5¦«D¦«}n‰ 2|‰‰‰‰d#¦«}tmtn¦«tmtn¦«d$œ‰_8d%‰_9tu¦« ;|‰|||‰|||	|
||¬&¦«‰j<j=rt}‰j?¬'¦«‰_@ntƒjB¦«‰_@t‡‰j?d(¦«r!‰j? D‰jE¦«dSdS)*Nú/rCz-SFTrérôzThe specified `eos_token` ('zC') is not found in the vocabulary of the given `processing_class` (zX). Ensure that the `eos_token` exists in the vocabulary before using it as an EOS token.z�You passed model_init_kwargs to the `SFTConfig`, but your model is already instantiated. The `model_init_kwargs` will be ignored.)z.jinjaz.j2zutf-8)ÚencodingFÚembed_tokensÚlm_heada-Cloning chat template added new tokens to the tokenizer, but 'lm_head' is not in PEFT's `modules_to_save`. As a result, the model may not learn to generate outputs with these new tokens, leading to degraded generation quality. To fix this, add `modules_to_save=['lm_head']` to your PEFT configuration.r„)Úflash_attention_2z"kernels-community/vllm-flash-attn3zHPassing a custom data collator is not supported when using padding-free.Úwrappedz¯You are passing `padding_free=True` with the 'wrapped' packing strategy, which is not recommended. Please refer to the documentation to understand why this is not recommended.açPadding-free training is enabled, but the attention implementation is not set to 'flash_attention_2'. Padding-free training flattens batches into a single sequence, and 'flash_attention_2' is the only known attention mechanism that reliably supports this. Using other implementations may lead to unexpected behavior. To ensure compatibility, set `attn_implementation='flash_attention_2'` in the model configuration, or verify that your attention mechanism can handle flattened sequences.rHzÎYou are using a per_device_train_batch_size of 1 with padding-free training. Using a batch size of 1 anihilate the benefits of padding-free training. Please consider increasing the batch size to at least 2.ÚpromptzThe specified `pad_token` ('z[). Ensure that the `pad_token` exists in the vocabulary before using it as a padding token.)Úpad_token_idrrÚreturn_position_idsra$You are using packing, but the attention implementation is not set to 'flash_attention_2' or 'kernels-community/vllm-flash-attn3'. Packing flattens batches into a single sequence, and Flash Attention is the only known attention mechanisms that reliably support this. Using other implementations may lead to cross-contamination between batches. To avoid this, either disable packing by setting `packing=False`, or set `attn_implementation='flash_attention_2'` or `attn_implementation='kernels-community/vllm-flash-attn3'` in the model configuration.z…You set `assistant_only_loss=True`, but the dataset is not conversational. This option is only supported for conversational datasets.Úskip_prepare_datasetaEA formatting function was provided while `completion_only_loss=True`, which is incompatible. Using a formatter converts the dataset to a language modeling type, conflicting with completion-only loss. To resolve this, apply your formatting function before passing the dataset, or disable `completion_only_loss` in `SFTConfig`.ÚtraincóL•—i|] \}}|‰ |‰‰‰‰|¦«“Œ!Sr)Ú_prepare_dataset)Ú.0ÚkeyÚdatasetr0r<rr4rs   €€€€€r^ú
<dictcomp>z/_UnslothSFTTrainer.__init__.<locals>.<dictcomp>wsKø€ð$ð$ð$á(˜C ð˜T×2Ò2°7Ð<LÈdÐT[Ð]lÐnqÑrÔrð$ð$ð$r`Úeval)rHrOr)r/r0r1r2r3r4r5r6r7r8r9r:)r/Úadd_model_tagsr)FÚ
isinstanceÚstrÚconfigÚ
_name_or_pathÚsplitrr Úto_dictréÚpoprÚfrom_pretrainedr
Úconvert_tokens_to_idsÚ
ValueErrorr!r"Úeos_token_idrr5ÚwarnÚ_create_model_from_pathr	r.ÚpathÚisfileÚendswithÚopenÚreadÚ
chat_templater!Útrainable_token_indicesÚextendÚmodules_to_saverSrrrÚ_attn_implementationr�ÚnextÚiterrrrrrr*rÚgetrJrÚdictÚitemsr%ÚlistÚ_metricsÚ_total_train_tokensrrr0rr'r/Ú maybe_activation_offload_contextr"r9ÚhasattrrPÚ
_tag_names)rr/r0r1r2r3r4r5r6r7r8r9r:r;r<Úmodel_idÚ
model_nameÚ	dict_argsr
r[Úchat_template_fileÚadded_tokensÚuse_flash_attentionÚdataset_samplerrEÚpreprocess_datasetrr!s` `   `       `            @€r^rz_UnslothSFTTrainer.__init__Àsíøøøøøø€õ(' ucÑ2Ô2ÐR�5�5¸¼Ô8RˆØˆ<Ø!Ÿš¨Ñ,Ô,¨RÔ0ˆJÝ 
Ð0Ð0Ð0Ñ1Ô1ˆDˆDÝ
˜Õ/Ñ
0Ô
0ð	*½ÀDÍ)Ñ9TÔ9Tð	*ØŸš™œˆIØ%)¤^ˆI�kÑ"Ø�MŠMÐ-Ñ.Ô.Ð.ÝÐ)Ð)˜yÐ)Ð)ˆDðÐ#Ý,Ô<¸XÑFÔFÐàŒ>Ð%ØœˆIØ+×AÒAÀ)ÑLÔLˆLØÐ#Ý ðI°9ðIðIØ+;Ô+EÔ+NðIðIðIñôðð
-9ÐÔ)ðÔ!Ð-µjÀÍÑ6LÔ6LÐ-ÝŒMð;ñ
ô
ð
õ�e�SÑ!Ô!ð	>Ø×0Ò0°¸Ñ=Ô=ˆEàÔ"Ð.ÝŒw�~Š~˜dÔ5Ñ6Ô6ð
¸4Ô;R×;[Ò;[Ð\mÑ;nÔ;nð
Ý˜$Ô1¸GÐDÑDÔDðOÐHZØ5G×5LÒ5LÑ5NÔ5NÐ$Ô2ðOðOðOñOôOðOðOðOðOðOðOøøøðOðOðOðOà!��å8KØÐ+¨TÔ-Dñ9ô9Ñ5�Ð'¨¨ðˆLð	Fð0	ð!Ô-Ðb°$´,Ð2aÀ4ÔCXÐ\aÒCaˆÔØ#œlÔ?ðD
ð
ÐðÔð	ØÐ(Ý Ð!kÑlÔlÐlØŒ|ð
 Ô 5¸Ò BÐ BÝ”
ðpñôðð'ð
Ý”
ðJñôððÔ/°1Ò4Ð4¸T¼\Ð4Ý”
ð%ñôðõ�d =Ñ1Ô1Ñ2Ô2ˆØÔ$Ð,Ø(0°NÐ(BˆDÔ%Ð%à(,Ô(AˆDÔ%àÐ ðœÐbÐ*:Ô*DÐbÐHXÔHbˆIØ+×AÒAÀ)ÑLÔLˆLØÐ#Ý ðL°9ðLðLØ+;Ô+EÔ+NðLðLðLñôðõ
<Ø)Ø%)Ô%>Ø!Ô.à$7Ø#'Ô#:ð
ñôˆMðŒ<ð	˜DÔ1°UÒ:Ð:ÐCVÐ:ÝŒMðiñ
ô
ð
ðÔ#ð	Õ,=¸nÑ,MÔ,Mð	Ýð9ñôð
ð"Ô0°DÐ8ÐvÀÔ@S×@WÒ@WÐXnÐpuÑ@vÔ@vÐ<vÐØð	ØÔ(ð
¨_ð
Ý ðQñôðð!×1Ò1ØÐ/°°t´|À_ÐV]ñôˆMðÐ'Ø*.Ô*;Ð*C˜$œ,˜,ÈÔIZ�Ý˜lDÑ1Ô1ðð$ð$ð$ð$ð$ð$ð$ð$à,8×,>Ò,>Ñ,@Ô,@ð$ñ$ô$�L�Lð
$(×#8Ò#8Ø$Ð&6¸¸gÀÐX^ñ$ô$�Lõ
#.dÑ"3Ô"3½[ÍÑ=NÔ=NÐOÐOˆŒ
Ø#$ˆÔ õ	‰Œ×ÒØØØ'Ø'Ø%Ø-Ø/Ø+ØØ!Ø%=Ø*Gð	ñ
	
ô
	
ð
	
ð Œ9Ô*ð	MÝ4RÐY]ÔYcÐ4dÑ4dÔ4dˆDÔ1Ð1å4>Ô4JÑ4LÔ4LˆDÔ1õ�4”:Ð/Ñ0Ô0ð	7ØŒJ×%Ò% d¤oÑ6Ô6Ð6Ð6Ð6ð	7ð	7sÇHÈHÈHÚ
model_pathÚreturncó2—|jpi}| d¦«}t|tj¦«s|dks|€nCt|t
¦«rt
t|¦«}||d<ntd|›d�¦«‚tj	|fi|¤Ž}|S)z0Creates a model from a path or model identifier.Útorch_dtyperzNzˆInvalid `torch_dtype` passed to `SFTConfig`. Expected either 'auto' or a string representing a `torch.dtype` (e.g., 'float32'), but got ú.)
rrjrQr3ÚdtyperRÚgetattrrZr
rX)rr{r0rr~r/s      r^r]z*_UnslothSFTTrainer._create_model_from_path£sÅ€à Ô2Ð8°bÐà'×+Ò+¨MÑ:Ô:ˆÝ�k¥5¤;Ñ/Ô/ð		°;À&Ò3HÐ3HÈKÐL_ØÝ
˜¥SÑ
)Ô
)ð	Ý!¥%¨Ñ5Ô5ˆKØ/:Ð˜mÑ,Ð,åðMØ>IðMðMðMñôð
õ%Ô4°ZÐUÐUÐCTÐUÐUˆØˆr`cóî—t¦«std¦«‚t|dd¦«pt|dd¦«}d}t|dd¦«r?| ¦«D]*\}}|jjdkr|jjjdv}nŒ+|r/|s-| 	||¦«}tj|d¬¦«}n|jr| 
||¦«}|�jtjt j¦«tjd	¦«kr&t|dd¦«r|rt%||d¬
¦«}nt%||¦«}|jr"t|dd¦«r|st)|¦«|S)z#Prepares a model for PEFT training.z9To use PeftModel, you need to install the `peft` library.Úis_loaded_in_4bitFÚis_loaded_in_8bitÚ
Params4bit>ÚcpuÚmeta)ríNz0.12)Úautocast_adapter_dtype)r+ÚImportErrorr�Únamed_parametersr!r"ÚdataÚdeviceÚtypeÚ _prepare_model_for_kbit_trainingr$ÚreplaceríÚ_enable_gradient_checkpointingr4Úparser0Ú__version__r)r¸r1)rr/r;r0Úis_qloraÚis_sharded_qloraÚ_Úparams        r^Ú_prepare_peft_modelz&_UnslothSFTTrainer._prepare_peft_modelºs»€å Ñ"Ô"ð	[ÝÐYÑZÔZÐZõ˜5Ð"5°uÑ=Ô=ÐkÅÈÐPcÐejÑAkÔAkˆà ÐÝ�5Ð-¨uÑ5Ô5ð	à!×2Ò2Ñ4Ô4ð
ð
‘��5Ø”?Ô+¨|Ò;Ð;Ø',¤zÔ'8Ô'=ÀÐ'PÐ$Ø�Eð<ð
ð	EÐ,ð	EØ×9Ò9¸%ÀÑFÔFˆEåÔ& tÀEÐJÑJÔJˆDˆDØ
Ô
(ð	EØ×7Ò7¸¸tÑDÔDˆEðÐ"å”
�dÔ.Ñ/Ô/µ7´=ÀÑ3HÔ3HÒHÐHÝ˜EÐ#6¸Ñ>Ô>ðIà$ðIõ' u¨kÐRWÐXÑXÔX��å& u¨kÑ:Ô:�ðŒ9ð	/� Ð(;¸UÑCÔCð	/ÐL\ð	/Ý'¨Ñ.Ô.Ð.àˆr`có>—|j|jpidœ}t|fi|¤ŽS)z-Prepares a quantized model for kbit training.)Úuse_gradient_checkpointingrî)rírîr2)rr/r0Úprepare_model_kwargss    r^rŽz3_UnslothSFTTrainer._prepare_model_for_kbit_trainingãs<€ð+/Ô*EØ-1Ô-OÐ-UÐSUð 
ð 
Ðõ
/¨uÐMÐMÐ8LÐMÐMÐMr`cóÒ—|jpi}d|vp|d}|rOt|d¦«r| ¦«n*d„}| ¦« |¦«|S)z-Enables gradient checkpointing for the model.Ú
use_reentrantÚenable_input_require_gradscó0—| d¦«dS)NT)Úrequires_grad_)ÚmoduleÚinputÚoutputs   r^Úmake_inputs_require_gradzS_UnslothSFTTrainer._enable_gradient_checkpointing.<locals>.make_inputs_require_gradøs€Ø×)Ò)¨$Ñ/Ô/Ð/Ð/Ð/r`)rîrqr�Úget_input_embeddingsÚregister_forward_hook)rr/r0rîrœr£s      r^r�z1_UnslothSFTTrainer._enable_gradient_checkpointingìs•€à(,Ô(JÐ(PÈbÐ%àÐ#@Ð@ÐrÐDaÐbqÔDrð	ðð	]Ý�uÐ:Ñ;Ô;ð
]Ø×0Ò0Ñ2Ô2Ð2Ð2ð0ð0ð0ð×*Ò*Ñ,Ô,×BÒBÐC[Ñ\Ô\Ð\àˆr`rMrÚdataset_namecó ‡‡‡‡‡‡‡—	t|t¦«r|Sn#YnxYwi}t|t¦«}t|d¦«}	|Š|	r|jŠt|dd¦«Š‰dkrt|dd¦«Š‰dkrt|dd¦«Š‰dkrt|dd¦«Š‰dkrt
d¦«‚t|dd¦«Š‰dkŠd	Šd
}
ttt|¦«¦« 
¦«¦«}dg}d|vr| d¦«dd
lm
}
m}d|vrR|	r(t‰d¦«st
d|j›d�¦«‚|
‰¦«|_| d¦«d	}
nZd|vr?|	r(t‰d¦«st
d|j›d�¦«‚|‰d	¬¦«|_d	}
n‰|vrd
Š‰€t
d¦«‚	|
�r†‰rR‰tt|¦«¦«¦«}t|t"¦«st%d¦«‚|d}n(tt|¦«¦«‰d}t|dd¦«}|dkr|	rt‰dd¦«}|€d}d
Št|dd¦«}t‰dd¦«}|p|}|�*| |¦«s||vrd	Št)d¦«	ˆˆˆˆˆˆˆfd„}	t|t*¦«st|dd¦«|d<n|jj|d<|r	d‰›d�|d <|j|fd!d
i|¤Ž}|	r$t|d¦«s|‰d	¬¦«}||_		|rt)d"¦«|S	|S)#NÚ	tokenizerrrÚmax_seq_lengthÚmax_seqz1Unsloth: max_seq_length is 0! Please specify one!r
r‚FTÚ	input_idsÚattention_maskr:Úlabelsr/z	Unsloth: z does not have .pad!)Úmlmz-Unsloth: You must specify a `formatting_func`zIUnsloth: The `formatting_func` should return a list of processed strings.rcr{Ú	bos_tokenzHUnsloth: We found double BOS tokens - we shall remove one automatically.cóJ•—‰‰s|‰n
‰|¦«‰‰d‰¬¦«S)NF)Ú
truncationrÚreturn_token_type_idsÚadd_special_tokensr)Úexampler³r
Údo_formatting_funcÚ
do_truncationr<r©r¨s €€€€€€€r^Ú	_tokenizez6_UnslothSFTTrainer._prepare_dataset.<locals>._tokenize\sFø€Ø �yØ7IÐg�GÐ.Ô/Ð/ÈÈÐ_fÑOgÔOgØ!.Ø!/Ø,1Ø);ðñôðr`rriÚnum_procÚ
batch_sizezUnsloth: Tokenizing ["z"]ÚdescÚbatchedzPUnsloth: Hugging Face's packing is currently buggy - we're disabling it for now!)rQÚConstantLengthDatasetrrqr¨r�ÚRuntimeErrorÚsetrhriÚkeysrSÚtransformersr;rr!r1rmrZÚ
startswithÚprintrÚ_ex_iterabler¹ÚmapÚselect_columnsÚ
pack_examples)rrMr4r0rr<r¦Ú
map_kwargsÚuse_descÚis_vlmÚdo_tokenizeÚcolumn_namesÚused_column_namesr;rÚ	test_textrcÚbos_token_1Úbos_token_2r¯r·r1r³r
rµr¶r©r¨s     `                @@@@@@r^rJz#_UnslothSFTTrainer._prepare_datasetÿsÌøøøøøøø€ð	Ý˜'Õ#8Ñ9Ô9ÐIÀ'¸>ÐIøð	ØˆDøøøàˆ
Ý˜g¥wÑ/Ô/ˆÝÐ)¨;Ñ7Ô7ˆØ$ˆ	ØÐ9Ð/Ô9�9õ!  |°QÑ7Ô7ˆØ˜QÒÐµ¸Ð?OÐQRÑ1SÔ1S Ø˜QÒÐµ¸Ð?OÐQRÑ1SÔ1S Ø˜QÒÐµ¸¸yÈ!Ñ1LÔ1L Ø˜QÒÐ¥lÐ3fÑ&gÔ&gÐ gÝ$ TÐ+?ÀÑHÔHÐØ&¨!Ò+ˆ
Ø"ÐØˆõ�4¥ W¡
¤
Ñ.Ô.×3Ò3Ñ5Ô5Ñ6Ô6ˆØ(˜MÐØ˜|Ð+Ð+Ø×$Ò$Ð%5Ñ6Ô6Ð6ð	YÐXÐXÐXÐXÐXÐXÐXØ�|Ð#Ð#àð
a�g i°Ñ7Ô7ð
aå"Ð#_Ð/?Ô/IÐ#_Ð#_Ð#_Ñ`Ô`Ð`Ø!7Ð!7¸	Ñ!BÔ!BˆDÔØ×$Ò$ XÑ.Ô.Ð.ØˆKˆKØ
˜LÐ
(Ð
(àð
a�g i°Ñ7Ô7ð
aå"Ð#_Ð/?Ô/IÐ#_Ð#_Ð#_Ñ`Ô`Ð`Ø!@Ð!@ÀÐRWÐ!XÑ!XÔ!XˆDÔØˆKˆKØ
 |Ð
3Ð
3Ø!%ÐØÐ&Ý"Ð#RÑSÔSÐSØàñ6	à!ð
GØ+˜ODµ°g±´Ñ,?Ô,?Ñ@Ô@�	Ý! )TÑ2Ô2ðÝ$Øcñôðð& aœL�	�	å ¥ g¡¤Ñ/Ô/Ð0BÔCÀAÔF�	õ$Ð$4°oÀrÑJÔJˆMØ Ò"Ð" vÐ"Ý '¨	°?ÀBÑ GÔ G�
ØÐ$Ø "�
ð"&ÐÝ!Ð"2°KÀÑFÔFˆKÝ! )¨[¸$Ñ?Ô?ˆKØ#Ð2 {ˆIàÐ$Ø×'Ò'¨	Ñ2Ô2ðf°iÀ=Ð6PÐ6PØ).Ð&ÝÐdÑeÔeÐeØð
ð
ð
ð
ð
ð
ð
ð
ð
ð
ð
ð