DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothSFTTrainer.cpython-311.pyc
2025-08-13 23:50:20 +00:00
"""
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
"""
# Names imported at module level (from the bytecode name table): Tensor, torch, nn,
# functional (as F), Any, List, Optional, Tuple, Union, Dict, Set, Callable,
# AutoModelForCausalLM, AutoTokenizer, BaseImageProcessor, DataCollator,
# DataCollatorForLanguageModeling, Dataset, EvalPrediction, FeatureExtractionMixin,
# IterableDataset, Path, PeftConfig, PeftModel, PreTrainedModel, PreTrainedTokenizerBase,
# ProcessorMixin, SFTConfig, SFTTrainer, Trainer, TrainerCallback, TrainingArguments,
# clone_chat_template, contextlib, dataclass, dataclasses, defaultdict, generate_model_card,
# get_act_offloading_ctx_manager, get_comet_experiment_url, get_peft_model, is_conversational,
# is_peft_available, is_wandb_available, os, pad, peft, peft_module_casting_to_bf16,
# prepare_model_for_kbit_training, version, warnings, field, Version, nullcontext,
# DataCollatorForSeq2Seq.

torch_compile_options = {
    "epilogue_fusion"   : True,
    "max_autotune"      : False,
    "shape_padding"     : True,
    "trace.enabled"     : False,
    "triton.cudagraphs" : False,
}

# Source file: /workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothSFTTrainer.py
@torch.compile(dynamic = True, fullgraph = True, options = torch_compile_options,)
def chunked_selective_log_softmax(logits, index):
    # Flatten (batch, seq, vocab) to rows and split logits and target ids into 4 chunks
    chunked_logits = torch.chunk(logits.reshape(-1, logits.shape[-1]), chunks = 4, dim = 0)
    chunked_index  = torch.chunk(index.reshape(-1), chunks = 4, dim = 0)
    all_per_token_logps = []
    for chunk_logits, chunk_index in zip(chunked_logits, chunked_index):
        chunk_logits = chunk_logits.to(torch.float32)
        # log_softmax(x)[i] = x[i] - logsumexp(x): gather the target logit, subtract the row log-sum-exp
        selected_logits = torch.gather(chunk_logits, dim = -1, index = chunk_index.unsqueeze(-1)).squeeze(-1)
        logsumexp_values = torch.logsumexp(chunk_logits, dim = -1)
        per_token_logps = selected_logits - logsumexp_values
        all_per_token_logps.append(per_token_logps)
    all_per_token_logps = torch.concat(all_per_token_logps)
    # Restore the (batch, seq) shape
    all_per_token_logps = all_per_token_logps.reshape((logits.shape[0], logits.shape[1]))
    return all_per_token_logps
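The compiled `chunked_selective_log_softmax` relies on the identity log_softmax(x)[i] = x[i] - logsumexp(x). It gathers each target token's logit and subtracts the row's log-sum-exp, working one chunk of rows at a time so that each float32 upcast covers only part of the logits.

The sketch below mirrors that computation in plain Python on already-flattened rows and checks that chunking leaves the values unchanged. It uses lists instead of tensors so it runs without torch; that substitution is an assumption of the sketch. `selective_log_probs` and `chunked_log_probs` are hypothetical names.

```python
import math
from typing import List


def logsumexp(row: List[float]) -> float:
    # Numerically stable log(sum(exp(x))): shift by the row maximum first.
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def selective_log_probs(logits: List[List[float]], index: List[int]) -> List[float]:
    # log_softmax(row)[t] == row[t] - logsumexp(row), without building the full softmax.
    return [row[t] - logsumexp(row) for row, t in zip(logits, index)]


def chunked_log_probs(logits: List[List[float]], index: List[int],
                      num_chunks: int = 4) -> List[float]:
    # Split rows into pieces of ceil(n / num_chunks) rows (as torch.chunk does),
    # process each piece independently, then concatenate the results.
    size = max(1, math.ceil(len(logits) / num_chunks))
    out: List[float] = []
    for start in range(0, len(logits), size):
        out.extend(selective_log_probs(logits[start:start + size],
                                       index[start:start + size]))
    return out


# Five flattened (batch * seq) rows over a 3-token vocabulary.
logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0], [1.0, -1.0, 0.0],
          [0.0, 0.0, 0.0], [4.0, 2.0, 1.0]]
targets = [0, 2, 1, 2, 0]
full = selective_log_probs(logits, targets)
chunked = chunked_log_probs(logits, targets, num_chunks=4)
print([round(x, 4) for x in chunked])
```

Chunking only bounds peak memory. Each row's log-probability depends on that row alone, so splitting rows across chunks cannot change the result.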
@dataclass
class UnslothSFTConfig(SFTConfig):
Configuration class for the [`SFTTrainer`].
This class includes only the parameters that are specific to SFT training. For a full list of training arguments,
please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
differ from those in [`~transformers.TrainingArguments`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
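The sketch below illustrates the idea of turning a config dataclass into command-line options. It is not `HfArgumentParser` itself: it uses only the standard library and a small, hypothetical `TinySFTArgs` dataclass, and its type-to-converter mapping is an assumption of the sketch.

```python
import argparse
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TinySFTArgs:
    # Hypothetical subset of fields, for illustration only.
    dataset_text_field: str = 'text'
    max_length: Optional[int] = 1024
    packing: bool = False


def str2bool(value: str) -> bool:
    return value.lower() in ('1', 'true', 'yes')


def build_parser(cls) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # Map each annotated field type to an argparse converter.
        if f.type is bool:
            parser.add_argument('--' + f.name, type=str2bool, default=f.default)
        elif f.type == Optional[int]:
            parser.add_argument('--' + f.name, type=int, default=f.default)
        else:
            parser.add_argument('--' + f.name, type=str, default=f.default)
    return parser


ns = build_parser(TinySFTArgs).parse_args(['--max_length', '2048', '--packing', 'true'])
args = TinySFTArgs(**vars(ns))
print(args)
```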
Parameters:
> Parameters that control the model
model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
Keyword arguments for [`~transformers.AutoModelForCausalLM.from_pretrained`], used when the `model`
argument of the [`SFTTrainer`] is provided as a string.
chat_template_path (`str` or `None`, *optional*, defaults to `None`):
If specified, sets the model's chat template. This can either be the path to a tokenizer (local directory
or Hugging Face Hub model) or a direct path to a Jinja template file. When using a Jinja file, you must
ensure that any special tokens referenced in the template are added to the tokenizer and that the model's
embedding layer is resized accordingly.
> Parameters that control the data preprocessing
dataset_text_field (`str`, *optional*, defaults to `"text"`):
Name of the column that contains text data in the dataset.
dataset_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
Dictionary of optional keyword arguments for the dataset preparation. The only supported key is
`skip_prepare_dataset`.
dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
Number of processes to use for processing the dataset.
eos_token (`str` or `None`, *optional*, defaults to `None`):
Token used to indicate the end of a turn or sequence. If `None`, it defaults to
`processing_class.eos_token`.
pad_token (`str` or `None`, *optional*, defaults to `None`):
Token used for padding. If `None`, it defaults to `processing_class.pad_token`, or if that is also `None`,
it falls back to `processing_class.eos_token`.
max_length (`int` or `None`, *optional*, defaults to `1024`):
Maximum length of the tokenized sequence. Sequences longer than `max_length` are truncated from the right.
If `None`, no truncation is applied. When packing is enabled, this value sets the sequence length.
packing (`bool`, *optional*, defaults to `False`):
Whether to group multiple sequences into fixed-length blocks to improve computational efficiency and reduce
padding. Uses `max_length` to define sequence length.
packing_strategy (`str`, *optional*, defaults to `"bfd"`):
Strategy for packing sequences. Can be either `"bfd"` (best-fit decreasing, default) or `"wrapped"`.
padding_free (`bool`, *optional*, defaults to `False`):
Whether to perform forward passes without padding by flattening all sequences in the batch into a single
continuous sequence. This reduces memory usage by eliminating padding overhead. Currently, this is only
supported with FlashAttention 2 or 3, which can efficiently handle the flattened batch structure. When
packing is enabled with strategy `"bfd"`, padding-free is enabled, regardless of the value of this
parameter.
pad_to_multiple_of (`int` or `None`, *optional*, defaults to `None`):
If set, the sequences will be padded to a multiple of this value.
eval_packing (`bool` or `None`, *optional*, defaults to `None`):
Whether to pack the eval dataset. If `None`, uses the same value as `packing`.
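The `"bfd"` strategy refers to best-fit decreasing bin packing. The sketch below shows the textbook heuristic, not TRL's implementation. It assumes each sequence is represented only by its token length and has already been truncated to `max_length`. Lengths are sorted in decreasing order, and each sequence goes into the open block with the least free space that still fits it; if none fits, a new block is opened. `pack_bfd` is a hypothetical helper name.

```python
from typing import List


def pack_bfd(lengths: List[int], max_length: int) -> List[List[int]]:
    # Returns blocks as lists of sequence indices; each block's total length <= max_length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    blocks: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each block
    for i in order:
        size = lengths[i]
        if size > max_length:
            raise ValueError(f'sequence {i} is longer than max_length')
        # Best fit: among blocks that can hold this sequence, pick the fullest one.
        best = None
        for b, space in enumerate(free):
            if size <= space and (best is None or space < free[best]):
                best = b
        if best is None:
            blocks.append([i])
            free.append(max_length - size)
        else:
            blocks[best].append(i)
            free[best] -= size
    return blocks


lengths = [700, 300, 600, 200, 400, 100]
bins = pack_bfd(lengths, max_length=1024)
print(bins, [sum(lengths[i] for i in b) for b in bins])
```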
> Parameters that control the training
completion_only_loss (`bool` or `None`, *optional*, defaults to `None`):
Whether to compute loss only on the completion part of the sequence. If set to `True`, loss is computed
only on the completion, which is supported only for [prompt-completion](#prompt-completion) datasets. If
`False`, loss is computed on the entire sequence. If `None` (default), the behavior depends on the dataset:
loss is computed on the completion for [prompt-completion](#prompt-completion) datasets, and on the full
sequence for [language modeling](#language-modeling) datasets.
assistant_only_loss (`bool`, *optional*, defaults to `False`):
Whether to compute loss only on the assistant part of the sequence. If set to `True`, loss is computed
only on the assistant responses, which is supported only for [conversational](#conversational) datasets. If `False`,
loss is computed on the entire sequence.
activation_offloading (`bool`, *optional*, defaults to `False`):
Whether to offload the activations to the CPU.
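Both `completion_only_loss` and `assistant_only_loss` come down to masking label positions so that the loss skips them. The sketch below covers the prompt-completion case. It assumes the PyTorch convention that label `-100` is ignored by cross-entropy, and a hypothetical per-token `completion_mask` (1 for completion tokens, 0 for prompt tokens). For `assistant_only_loss`, the same mask would mark assistant turns instead. `build_labels` is a hypothetical helper name.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids: List[int], completion_mask: List[int],
                 completion_only_loss: bool = True) -> List[int]:
    # Copy input_ids as labels; hide prompt positions when completion_only_loss is on.
    if len(input_ids) != len(completion_mask):
        raise ValueError('input_ids and completion_mask must have the same length')
    if not completion_only_loss:
        return list(input_ids)
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, completion_mask)]


prompt, completion = [101, 7, 8], [42, 43, 102]
ids = prompt + completion
mask = [0] * len(prompt) + [1] * len(completion)
print(build_labels(ids, mask))
print(build_labels(ids, mask, completion_only_loss=False))
```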
    vllm_sampling_params : Optional[Any] = field(
        default = None,
        metadata = {'help': 'vLLM SamplingParams'},
    )
    unsloth_num_chunks : Optional[int] = field(
        default = -1,
        metadata = {'help': 'Chunk size to reduce memory usage. -1 is most efficient.'},
    )

    def __init__(
        self,
        output_dir = None,
        overwrite_output_dir = None,
        ...,
        vllm_sampling_params = None,
        unsloth_num_chunks = -1,
        **kwargs,
    ):
        # [Signature: explicit keyword arguments for every TrainingArguments field from
        #  `output_dir` through `average_tokens_across_devices`, then the SFTConfig fields from
        #  `model_init_kwargs` through `activation_offloading`. Individual defaults are not
        #  recoverable from this view.]
        if learning_rate < 1e-7: raise FloatingPointError(f'Unsloth: Your learning rate of `{learning_rate}` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!')
        if learning_rate > 1: raise OverflowError(f'Unsloth: Your learning rate of `{learning_rate}` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!')
        if output_dir is None and save_strategy == 'steps' and save_steps == 500:
            output_dir = 'unsloth_training_checkpoints'
            save_strategy = 'no'
        if dataset_num_proc is None:
            from multiprocessing import cpu_count
            dataset_num_proc = min(cpu_count()*2, 2)
        # Forwards every keyword argument above, by name, to SFTConfig
        super().__init__(output_dir = output_dir, ..., **kwargs)
        self.vllm_sampling_params = vllm_sampling_params
        self.unsloth_num_chunks = unsloth_num_chunks
class _UnslothSFTTrainer(Trainer):
    _tag_names = ["trl", "sft"]

    def __init__(
        self,
        model: Union[str, nn.Module, PreTrainedModel],
        args: Optional[Union[SFTConfig, TrainingArguments]] = None,
        data_collator: Optional[DataCollator] = None,
        train_dataset: Optional[Union[Dataset, IterableDataset]] = None,
        eval_dataset: Optional[Union[Dataset, dict[str, Dataset]]] = None,
        processing_class: Optional[Union[PreTrainedTokenizerBase, ProcessorMixin]] = None,
        compute_loss_func: Optional[Callable] = None,
        compute_metrics: Optional[Callable[[EvalPrediction], dict]] = None,
        callbacks: Optional[list[TrainerCallback]] = None,
        optimizers: tuple[Optional[torch.optim.Optimizer], Optional[torch.optim.lr_scheduler.LambdaLR]] = (None, None),
        optimizer_cls_and_kwargs: Optional[tuple[type[torch.optim.Optimizer], dict[str, Any]]] = None,
        preprocess_logits_for_metrics: Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None,
        peft_config: Optional["PeftConfig"] = None,
        formatting_func: Optional[Callable[[dict], str]] = None,
    ):
        # [Body summarized from its readable names and messages:]
        # - If `args` is None, a default SFTConfig is created from the model name; a plain
        #   TrainingArguments object is converted to SFTConfig.
        # - A string `model` is loaded with `_create_model_from_path`. For an already instantiated
        #   model, a non-empty `model_init_kwargs` only triggers the warning "You passed
        #   model_init_kwargs to the `SFTConfig`, but your model is already instantiated. The
        #   `model_init_kwargs` will be ignored."
        # - If `processing_class` is None, it is loaded with `AutoTokenizer.from_pretrained`. A custom
        #   `eos_token`, and the resolved `pad_token` when a collator is built, must exist in its
        #   vocabulary; otherwise a ValueError is raised.
        # - `chat_template_path`: a ".jinja" or ".j2" file is read directly into the chat template;
        #   anything else goes through `clone_chat_template`. If cloning adds tokens while PEFT's
        #   `modules_to_save` lacks 'lm_head', a warning recommends `modules_to_save=['lm_head']`.
        # - Padding-free mode is on when `padding_free=True` or when packing uses the "bfd" strategy.
        #   It then rejects a custom `data_collator` (ValueError) and warns about the "wrapped"
        #   strategy, non-FlashAttention attention implementations, and
        #   `per_device_train_batch_size=1` without packing.
        # - If `completion_only_loss` is None, it is set from whether the first training example
        #   has a "prompt" field.
        # - With no `data_collator`, a DataCollatorForLanguageModeling is built from the pad token id,
        #   `completion_only_loss` and the padding-free settings.
        # - Packing with the "bfd" strategy on an attention implementation other than
        #   'flash_attention_2' or 'kernels-community/vllm-flash-attn3' warns about
        #   cross-contamination between packed sequences.
        # - `assistant_only_loss=True` on a non-conversational dataset raises ValueError, as does a
        #   `formatting_func` combined with `completion_only_loss=True`.
        # - Unless `dataset_kwargs` sets `skip_prepare_dataset`, the train dataset (and each eval
        #   dataset) goes through `_prepare_dataset`.
        # - Finally `Trainer.__init__` runs. An activation-offloading context is created when
        #   `activation_offloading` is on (otherwise a null context), and the model is tagged with
        #   `_tag_names`.
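Padding-free training concatenates the batch into a single sequence. That explains the FlashAttention requirement in `_UnslothSFTTrainer.__init__`. It also explains the batch-size-1 warning: a single sequence has no padding to remove.

The sketch below shows the flattening, assuming plain lists of token ids. `position_ids` restart at 0 for every sequence, and cumulative lengths mark the sequence boundaries. A variable-length attention kernel needs exactly this information to keep sequences from attending to each other. `flatten_padding_free` and the `cu_seqlens` key are hypothetical names, not TRL's collator API.

```python
from itertools import accumulate
from typing import Dict, List


def flatten_padding_free(batch: List[List[int]]) -> Dict[str, List[int]]:
    # Concatenate every sequence into one row instead of padding to a common length.
    input_ids: List[int] = []
    position_ids: List[int] = []
    for seq in batch:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))  # positions restart at 0 for each sequence
    # Cumulative sequence lengths [0, l1, l1 + l2, ...] mark where each sequence starts and ends.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in batch))
    return {'input_ids': input_ids, 'position_ids': position_ids, 'cu_seqlens': cu_seqlens}


out = flatten_padding_free([[5, 6, 7], [8, 9], [10]])
print(out)
```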
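    def _create_model_from_path(self, model_path: str, args: SFTConfig) -> PreTrainedModel:
        """Creates a model from a path or model identifier."""
        model_init_kwargs = args.model_init_kwargs or {}
        # Resolve torch_dtype: keep torch.dtype, "auto" or None; map other strings to torch attributes
        torch_dtype = model_init_kwargs.get("torch_dtype")
        if isinstance(torch_dtype, torch.dtype) or torch_dtype == "auto" or torch_dtype is None:
            pass
        elif isinstance(torch_dtype, str):
            torch_dtype = getattr(torch, torch_dtype)
            model_init_kwargs["torch_dtype"] = torch_dtype
        else:
            raise ValueError(
                "Invalid `torch_dtype` passed to `SFTConfig`. Expected either 'auto' or a string representing "
                f"a `torch.dtype` (e.g., 'float32'), but got {torch_dtype}."
            )
        model = AutoModelForCausalLM.from_pretrained(model_path, **model_init_kwargs)
        return model

    def _prepare_peft_model(self, model: PreTrainedModel, peft_config: Any, args: SFTConfig) -> PreTrainedModel:
        """Prepares a model for PEFT training."""
        # [Body summarized from its readable names and constants:]
        # - Raises ImportError("To use PeftModel, you need to install the `peft` library.") if peft
        #   is missing.
        # - The model counts as QLoRA if `is_loaded_in_4bit` or `is_loaded_in_8bit` is set, and as
        #   sharded QLoRA if a `Params4bit` parameter lives on the "cpu" or "meta" device.
        # - Non-sharded QLoRA models go through `_prepare_model_for_kbit_training`, after which
        #   gradient checkpointing is disabled in a copy of `args` (`dataclasses.replace`);
        #   otherwise, if `args.gradient_checkpointing` is set, `_enable_gradient_checkpointing` runs.
        # - `get_peft_model` is called with `autocast_adapter_dtype=False` for sharded 4-bit models
        #   on peft >= 0.12, and without it otherwise.
        # - With `args.bf16`, non-sharded 4-bit models are passed to `peft_module_casting_to_bf16`.

    def _prepare_model_for_kbit_training(self, model: PreTrainedModel, args: SFTConfig) -> PreTrainedModel:
        """Prepares a quantized model for kbit training."""
        prepare_model_kwargs = {
            "use_gradient_checkpointing": args.gradient_checkpointing,
            "gradient_checkpointing_kwargs": args.gradient_checkpointing_kwargs or {},
        }
        return prepare_model_for_kbit_training(model, **prepare_model_kwargs)

    def _enable_gradient_checkpointing(self, model: PreTrainedModel, args: SFTConfig) -> PreTrainedModel:
        """Enables gradient checkpointing for the model."""
        gradient_checkpointing_kwargs = args.gradient_checkpointing_kwargs or {}
        # Reentrant checkpointing is the default when `use_reentrant` is not specified
        use_reentrant = (
            "use_reentrant" not in gradient_checkpointing_kwargs or gradient_checkpointing_kwargs["use_reentrant"]
        )
        if use_reentrant:
            # Reentrant checkpointing needs inputs that require grad, so force it on the embeddings
            if hasattr(model, "enable_input_require_grads"):
                model.enable_input_require_grads()
            else:
                def make_inputs_require_grad(module, input, output):
                    output.requires_grad_(True)
                model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
        return model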
    def _prepare_dataset(self, dataset, processing_class, args, packing, formatting_func, dataset_name):
        # [Unsloth's replacement for TRL's dataset preparation; body summarized from its readable
        #  names and messages:]
        # - A `ConstantLengthDataset` is returned unchanged.
        # - A processor exposing a `tokenizer` attribute is treated as a vision-language model, and
        #   its inner tokenizer is used for text.
        # - The maximum sequence length is looked up from `max_length`, then `max_seq_length` or
        #   `max_seq`; if none is set, it fails with "Unsloth: max_seq_length is 0! Please specify one!"
        # - Datasets that already contain "labels" get a transformers `DataCollatorForSeq2Seq` and are
        #   not re-tokenized (a processor without `.pad` fails with "Unsloth: ... does not have .pad!").
        #   Datasets that contain "input_ids" get a transformers
        #   `DataCollatorForLanguageModeling(mlm=False)` and are not re-tokenized either.
        # - If the `dataset_text_field` column is missing, `formatting_func` is required
        #   ("Unsloth: You must specify a `formatting_func`") and must return a list of strings
        #   ("Unsloth: The `formatting_func` should return a list of processed strings.").
        # - If the first formatted text already starts with the BOS token, tokenization runs with
        #   `add_special_tokens=False` and prints "Unsloth: We found double BOS tokens - we shall
        #   remove one automatically."
        # - Tokenization uses a batched `dataset.map` whose progress description starts with
        #   "Unsloth: Tokenizing".
        # - When packing is requested, it prints "Unsloth: Hugging Face's packing is currently
        #   buggy - we're disabling it for now!"
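`_prepare_dataset` guards against double BOS tokens. If the formatted text already begins with the BOS string (for example because a chat template inserted it), tokenizing with `add_special_tokens=True` would prepend a second one.

The sketch below shows that check using two hypothetical stand-ins: a whitespace `ToyTokenizer` in place of `processing_class`, and a helper named `choose_add_special_tokens`. It is not Unsloth's code.

```python
from typing import List


class ToyTokenizer:
    # Hypothetical whitespace tokenizer that prepends a BOS id when asked.
    bos_token = '<s>'
    bos_token_id = 1

    def __init__(self) -> None:
        self.vocab = {'<s>': 1}

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        # Assign new ids on first sight; the BOS string maps to bos_token_id.
        ids = [self.vocab.setdefault(word, len(self.vocab) + 1) for word in text.split()]
        return ([self.bos_token_id] if add_special_tokens else []) + ids


def choose_add_special_tokens(tokenizer, sample_text: str) -> bool:
    bos = getattr(tokenizer, 'bos_token', None)
    # If the text already carries BOS, adding special tokens would duplicate it.
    return not (bos is not None and sample_text.startswith(bos))


tok = ToyTokenizer()
text = '<s> hello world'
add = choose_add_special_tokens(tok, text)
print(add, tok.encode(text, add_special_tokens=add))
```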