DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothIterativeSFTTrainer.cpython-310.pyc
261 lines
27 KiB
2025-08-28 17:57:59 +00:00
o
9—°h$¤ã@dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZmZmZmZmZmZmZmZm
Z
mZmZmZm Z m!Z!m"Z"m#Z#m$Z$m Z m%Z%m&Z&m'Z'm(Z(m)Z)mZm*Z*m
Z
mZm Z m#Z#m'Z'm)Z)mZddl)Z)ddlTddl+m,Z,m-Z-dd l.m/Z/ddlZddl0Z1dd
l2m3Z3ddlmZdd l4mZmZ5d d
d d
d
dœZ6ej7d d e6dddƒZ8e,GdddeƒƒZ9 Gddde#ƒZ:Gddde:ƒZ;dS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)%ÚAutoModelForCausalLMÚ
AutoTokenizerÚBaseImageProcessorr Ú DataCollatorÚDataCollatorForLanguageModelingÚDataCollatorForSeq2SeqÚ
DataLoaderÚDatasetÚEvalLoopOutputÚFeatureExtractionMixinÚIterativeSFTConfigÚIterativeSFTTrainerrÚ
PPODecoratorsÚPathÚ PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚTrainerÚTrainingArgumentsrÚgenerate_model_cardÚget_comet_experiment_urlÚis_peft_availableÚis_wandb_availableÚosÚtorchÚwarningsrrrrr"r$r%)Ú*)Ú dataclassÚfield)ÚVersion)Ú nullcontext)rrTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionsc
Ctj| d|jd¡ddd}tj| d¡ddd}g}t||ƒD](\}}| tj¡}tj|d| d¡d  d¡}tj
|dd}||} |  | ¡q! t  |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)r5Úindex)r5é)
r%ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
Úlogitsr6Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rMúZ/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothIterativeSFTTrainer.pyÚchunked_selective_log_softmax"s  
rOcs†eZdZUdZedddidZeeed<edddidZ ee
ed <eddd
idZ ee
ed <  
                            ! ! " #     $           $      % &  '         (      #    $   ) *         +   d.‡fd,d-„ Z Z
S)/ÚUnslothIterativeSFTConfigaÅ
    Configuration class for the [`IterativeSFTTrainer`].

    This class includes only the parameters that are specific to Iterative SFT training. For a full list of training
    arguments, please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this
    class may differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        > Parameters that control the model

        model_init_kwargs (`dict[str, Any]` or `None`, *optional*, defaults to `None`):
            Keyword arguments for [`~transformers.AutoModelForCausalLM.from_pretrained`], used when the `model`
            argument of the [`IterativeSFTTrainer`] is provided as a string.

        > Parameters that control the data preprocessing

        max_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the tokenized sequence. Sequences longer than `max_length` are truncated.
        truncation_mode (`str`, *optional*, defaults to `"keep_end"`):
            The truncation mode to use, either `"keep_end"` or `"keep_start"`.
        optimize_device_cache (`bool`, *optional*, defaults to `False`):
            Whether to optimize accelerator cache for slightly more memory-efficient training.
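
The two `truncation_mode` values in the docstring above can be illustrated with a minimal sketch. It assumes that `"keep_start"` keeps the first `max_length` tokens and `"keep_end"` keeps the last `max_length` tokens, which is what the mode names suggest. The helper `truncate_tokens` is hypothetical (it is not part of TRL or Unsloth) and works on a plain list of token ids rather than on the trainer's tensors.

```python
from typing import List, Optional


def truncate_tokens(
    token_ids: List[int],
    max_length: Optional[int] = None,
    truncation_mode: str = "keep_end",
) -> List[int]:
    """Hypothetical helper illustrating the two truncation modes."""
    # No limit configured: return the sequence unchanged.
    if max_length is None:
        return list(token_ids)
    # Guard against non-positive limits, which would make the slices misleading.
    if max_length <= 0:
        raise ValueError("max_length must be a positive integer")
    if truncation_mode == "keep_start":
        # Keep the first `max_length` tokens and drop the tail.
        return list(token_ids[:max_length])
    if truncation_mode == "keep_end":
        # Keep the last `max_length` tokens and drop the head.
        return list(token_ids[-max_length:])
    raise ValueError(f"Unknown truncation mode: {truncation_mode}")


if __name__ == "__main__":
    tokens = [101, 7, 8, 9, 102]
    print(truncate_tokens(tokens, max_length=3, truncation_mode="keep_start"))
    print(truncate_tokens(tokens, max_length=3, truncation_mode="keep_end"))
```

Keeping the end of a sequence is a plausible default for supervised fine-tuning data where the target completion sits at the end of each example, though that rationale is an assumption and not something the docstring states.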
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr2z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksz'Maximum sequence length to truncate to.Úmax_seq_lengthFÚnor3éréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?çlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsr7éôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéÚkeep_endcˆ s|dkr td|dƒ|dkrtd|dƒ|dur(|#dkr(|$dkr(d}d }#tƒjdŽid
|d |d |d
|d|d|d|d|d| “d|
d| d| d|
d|d|d|d|d|d|d|d|d|d |d!|d"|d#|d$|d%|d&|d'|d(|d)| “d*|!“d+|"“d,|#“d-|$“d.|%“d/|&“d0|'“d1|(“d2|)“d3|*“d4|+“d5|,“d6|-“d7|.“d8|/“d9|0“d:|1“d;|2“d<|3“d=|4“d>|5“d?|6“d@|7“dA|8“dB|9“dC|:“dD|;“dE|<“dF|=“dG|>“dH|?“dI|@“dJ|A“dK|B“dL|C“dM|D“dN|E“dO|F“dP|G“dQ|H“dR|I“dS|J“dT|K“dU|L“dV|M“dW|N“dX|O“dY|P“dZ|Q“d[|R“d\|S“d]|T“d^|U“d_|V“d`|W“da|X“db|Y“dc|Z“dd|[“de|\“df|]“dg|^“dh|_“di|`“dj|a“dk|b“dl|c“dm|d“dn|e“do|f“dp|g“dq|h“dr|i“ds|j“dt|k“du|l“dv|m“dw|n“dx|o“dy|p“dz|q“d{|r“d||s“d}|t“d~|u“d|v“d€|w“d|x“d|y“dƒ|z“d„|{“d…||“d†|}“d‡|~“dˆ|d‰|€“dŠ|d‹|‚“dŒ|ƒ“d|„“|ˆ¤Ž|…|_|†|_|‡|_dS)NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!r7za` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rerfÚunsloth_training_checkpointsrWÚ
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚ weight_decayÚ
adam_beta1Ú
adam_beta2Ú adam_epsilonÚ
max_grad_normÚnum_train_epochsÚ max_stepsÚlr_scheduler_typeÚ warmup_ratioÚ warmup_stepsÚ log_levelÚlog_level_replicaÚlog_on_each_nodeÚ logging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚ ddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚ disable_tqdmÚremove_unused_columnsÚ label_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚ fsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ adafactorÚgroup_by_lengthÚlength_column_nameÚ report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚ push_to_hubÚresume_from_checkpointÚ hub_model_idÚ hub_strategyÚ hub_tokenÚhub_private_repoÚhub_always_pushÚ hub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚ fp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚ torchdynamoÚ ray_scopeÚ ddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚmodel_init_kwargsÚ
max_lengthÚtruncation_modeÚoptimize_device_cacherM)ÚFloatingPointErrorÚ
OverflowErrorÚsuperÚ__init__rTrUrV)‰Úselfrsrtrurvrwrxryrzr{r|r}r~rr€rrr„r…r†r‡r‰rrrrrrr“r”r•r–r—r™rrr r­r¿rTrUrVÚkwargs©Ú __class__rMrN^s:  ÿþýüûúùø ÷
ö õ ô
óòñðïîíìëêéèçæåäãâá à!ß"Þ#Ý$Ü%Û&Ú'Ù(Ø)×*Ö+Õ,Ô-Ó.Ò/Ñ0Ð1Ï2Î3Í4Ì5Ë6Ê7É8È9Ç:Æ;Å<Ä=Ã>Â?Á@ÀA¿B¾C½D¼E»FºG¹H¸I·JKµL´M³N²O±P°Q¯R®S­T¬U«VªW©X¨Y§Z¦[¥\¤]£^¢_¡` aŸbžcdœefšgh˜ijklmnopqrŽstŒuvŠwxˆyz{|}ƒ~ÿþýüû
z"UnslothIterativeSFTConfig.__init__)‡NNFFFrWFr3r3NNrXrXrrYrZr[r\r]r^r_r`r2rarbrrcrdTNreFr7FrerfNTFFFFFFrgrgFFFFrhriFFNr2NNFrjFNrNr2NNTNFNNFrjrNNNNrkrlNFFrmNNNNTFTFFNNrnNNFNFNFTriNNNrjTFNrorpFNNFFNNFFFNFTNNrqFNr2N)Ú__name__Ú
__module__Ú __qualname__Ú__doc__r)rTrrÚ__annotations__rUÚintrVÚ
__classcell__rMrMrNrP3s.
þþþ÷rPceZdZdZddgZ       d(deeefdeee e
fdee d eee e
ee ffd
eeeeeefd eejjejjjfd eeejejgejfd
eeege
fffdd
Zdede defddZdejdejdejfddZedeej deej deej deedeef
ddƒZ!e"      d)deeej deeej deeej deeedeeef
ddƒZ$dd „Z%‡fd!d"„Z&   d*d#eed$eed%eeeedffd&d'„Z'‡Z(S)+Ú_UnslothIterativeSFTTrainerrjÚtrlz
iterative-sftN©NNÚmodelÚargsÚ
data_collatorÚ eval_datasetÚprocessing_classÚ
optimizersÚpreprocess_logits_for_metricsÚcompute_metricsc
t|tƒr|n|jj} |dur|  d¡d}
t|
dƒ}nt|tƒr=t|tƒs=| ¡} |j| d<|   d¡tdi| ¤Ž}|durFt
  | ¡}|j durUt|tƒsUt
 d¡t|tƒr`| ||¡}tƒrlt|tƒrld|_nd|_||_t|jd dƒ|_|dur“|jrŠt|d
d d |_n t|jdd
|_n||_|j|_|j|_|j|_tƒj|||j|||||dt|jdƒr¾|j  |j!¡| "|j#j$¡|j% &|j|j'|j(¡\|_|_'|_(|jdkrÝdnd|j_)t|dƒsêt*dƒ|jt+_dS)/r2z
-IterativeSFTrÔzŠYou passed model_init_kwargs to the `IterativeSFTConfig`, but your model is already instantiated. The `model_init_kwargs` will be ignored.TFÚis_encoder_decoderéœÿÿÿé)Úlabel_pad_token_idÚpad_to_multiple_of)Úmlm)r r
r r r
rrrÚadd_model_tagsrqÚleftÚrightÚ acceleratorzXYour `Trainer` does not have an `accelerator` object. Consider upgrading `transformers`.rM),Ú
isinstanceÚstrÚconfigÚ
_name_or_pathÚsplitrrÚto_dictrÔÚpopr
Úfrom_pretrainedrór&ÚwarnÚ_create_model_from_pathr"rÚ
is_peft_modelr
Úgetattrrrr rÚhasattrr rÚ
_tag_namesÚcreate_optimizer_and_schedulerr
r‰rÚprepareÚ optimizerÚ lr_schedulerÚtruncation_sideÚAttributeErrorr) r r
r r r
rrrÚmodel_idÚ
model_nameÚ dict_argsrýrMrN~sl


ÿ
 
ÿø  ÿ
ÿ z$_UnslothIterativeSFTTrainer.__init__Ú
model_pathÚreturncCs|jpi}tj|fi|¤ŽS)z0Creates a model from a path or model identifier.)r r#)r3r
rMrMrNr%Üs
z3_UnslothIterativeSFTTrainer._create_model_from_pathÚ input_idsÚattention_maskÚlabelsc|dur dd|Dƒ}ˆjr4ˆ ddt|||ƒDƒ¡ ˆjj¡}| dd¡d|d|dˆjjk<nˆ ddt||ƒDƒ¡ ˆjj¡}ˆj durwˆj
dkr]‡fd d
|  ¡Dƒ}|Sˆj
d kro‡fd d
|  ¡Dƒ}|St d
ˆj
ƒ|S)NcSsg|]}t |¡qSrM)r%Ú ones_like)Ú.0ÚidsrMrMrNÚ
<listcomp>ãózD_UnslothIterativeSFTTrainer.prepare_model_inputs.<locals>.<listcomp>cSsg|] \}}}|||dœqS)©r5r6r7rM)r9r:ÚattÚlabrMrMrNr;çs
ÿÿÚdecoder_input_idsrr7cSsg|] \}}||dœqS))r5r6rM)r9r:r>rMrMrNr;ósÚ
keep_startcs i|] \}}||dˆjqS©©r9ÚrMrNÚ
<dictcomp>ùs zD_UnslothIterativeSFTTrainer.prepare_model_inputs.<locals>.<dictcomp>rqcs"i|]
\}}||ˆj dqSrBrCrDrGrMrNrHûs"zUnknown truncation mode: )
rr r;r<r Údevicer"r
Ú pad_token_idrôÚitemsÚ
ValueError)r5r6r7Ú
input_datarMrGrNÚprepare_model_inputsás4
þÿ
û ÿ
þ


ûþz0_UnslothIterativeSFTTrainer.prepare_model_inputsÚtextsÚ texts_labelscCs||durs|dur=tddg||gƒD]*\}}t|tƒs%t|dt|ƒƒt|dtjƒs;td|dt|dƒƒqnztgd¢|||gƒD]*\}}t|tƒs[t|dt|ƒƒt|dtjƒsqtd|dt|dƒƒqGnDt|tƒstd t|ƒƒt|dtƒs“td
t|dƒƒ|dur·t|tƒs¥td t|ƒƒt|dtƒs·td t|dƒƒ|||||fS)

Check if the input data is valid for training.
Args:
input_ids (list[`torch.LongTensor`]):
List of tensors containing the input_ids
attention_mask (list[`torch.LongTensor`]):
List of tensors containing the attention_mask
labels (list[`torch.FloatTensor`]):
List of tensors containing the labels
texts (list[`str`]):
List of string containing the text input.
texts_labels (list[`str`]):
List of string containing the text labels.
Returns:
`tuple`: The input data.
Nr5r7z! must be a list of tensors - got rz Elements in z must be tensors - got r=z''text' must be a list of strings - got z)Elements in 'text' must be strings - got z.'text_labels' must be a list of strings - got z0Elements in 'text_labels' must be strings - got )r;rÚlistrLÚtyper%rr)r5r6r7rOrPÚnameÚ tensor_listrMrMrNÚ_step_safety_checkers8
ÿý ÿ
ÿû

z0_UnslothIterativeSFTTrainer._step_safety_checkerc ˆj ¡ˆjjdkrt d¡ ˆjj¡ˆ_ ˆjjˆ_
|dur'|dur't dƒ|dur5|dur5t  
dt¡|durD|durDˆjrDt dƒ|durN|ddnd}|durZ|ddnd}|durf|ddnd}|durr|ddnd}|dur~|ddnd}ˆ |||||¡\}}}}}|dur¦ˆj|ˆjdddd }|d
|d }}|dur·ˆj|ˆjdddd d
}|dur½|}ˆ |||¡}t| ¡ƒ}i}| |¡fd d
} t |¡}
|
 d¡t|
ˆjjd| d} t| ƒD]\} ˆj ˆj¡mfdd|Dƒ}ˆ ˆj|¡}
ˆjj dkr|
 }
|
 }ˆj #|
¡ˆjj$r8ˆjj%dur8ˆj &ˆj ˆjj%¡ˆj( ˆj( ˆj+durMˆj+ ˆjjd7_ˆj |7_ ˆ Wdƒn 1skwYqïdS)
        Run an optimisation step given a list of input_ids, attention_mask, and labels or a list of text and
        text_labels.

        Args:
            input_ids (list[`torch.LongTensor`]):
                List of tensors containing the input_ids (if not provided, text will be used)
            attention_mask (list[`torch.LongTensor`], *optional*):
                List of tensors containing the attention_mask
            labels (list[`torch.FloatTensor`], *optional*):
                List of tensors containing the labels (if set to None, will default to input_ids)
            texts (list[`str`], *optional*):
                List of strings containing the text input (if not provided, input_ids will directly be used)
            texts_labels (list[`str`], *optional*):
                List of strings containing the text labels (if set to None, will default to text)

        Returns:
            `dict[str, Any]`: A summary of the training statistics
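
The defaulting rules in the docstring above (labels fall back to the inputs, and one of `input_ids` or `texts` is required) can be sketched in plain Python. This is only an illustration under stated assumptions: the function `resolve_step_inputs` is hypothetical, it uses ordinary lists instead of tensors, and it omits tokenization and the extra check for encoder-decoder models. The two messages are copied from strings visible in the compiled file.

```python
import warnings
from typing import Any, List, Optional, Tuple


def resolve_step_inputs(
    input_ids: Optional[List[Any]] = None,
    labels: Optional[List[Any]] = None,
    texts: Optional[List[str]] = None,
    texts_labels: Optional[List[str]] = None,
) -> Tuple[str, List[Any], List[Any]]:
    """Hypothetical sketch: decide which inputs and labels a step would use."""
    if input_ids is None and texts is None:
        raise ValueError("Step should include `input_ids` or `texts` as keyword arguments.")
    if input_ids is not None and texts is not None:
        # Both were given: the text inputs win and the token ids are ignored.
        warnings.warn(
            "Both `input_ids` and `texts` argument are provided. `input_ids` will be ignored. "
            "Please provide only one of the two.",
            UserWarning,
        )
    if texts is not None:
        # Text labels default to the text inputs themselves.
        return "texts", texts, texts if texts_labels is None else texts_labels
    # Tensor labels default to the input ids themselves.
    return "input_ids", input_ids, input_ids if labels is None else labels


if __name__ == "__main__":
    print(resolve_step_inputs(texts=["Hello", "World"]))
    print(resolve_step_inputs(input_ids=[[1, 2, 3]], labels=[[1, 2, -100]]))
```

Returning the source name alongside the resolved pair keeps the sketch easy to inspect; the real method instead proceeds directly to tokenization and batching.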
rrkNz@Step should include `input_ids` or `texts` as keyword arguments.ztBoth `input_ids` and `texts` argument are provided. `input_ids` will be ignored. Please provide only one of the two.z€No 'labels' or 'text_labels' are provided. When using an encoder-decoder architecture, 'labels' or 'text_labels' must be passed.TÚpt)Ú
truncationÚpaddingÚreturn_tensorsr5r6csFtƒ}|dD]ˆdvr t fdd|Dƒ¡ ˆjj¡|ˆ<q|S)Nrr=csg|]}|ˆqSrMrM)r9ÚÚkeyrMrNr;ŒszF_UnslothIterativeSFTTrainer.step.<locals>.collator.<locals>.<listcomp>)Údictr%Ústackr<r rI)ÚdataÚ return_dictrGr[rNÚcollatorˆs  &z2_UnslothIterativeSFTTrainer.step.<locals>.collatorr%)Ú
batch_sizeÚshuffleÚ
collate_fncsi|]}|ˆ|qSrMrM)r9rE)ÚbatchrMrNrHr<z4_UnslothIterativeSFTTrainer.step.<locals>.<dictcomp>r7)-r ÚtrainÚstateÚ global_stepr%Útensorr<r
rIÚtr_lossÚ_globalstep_last_loggedrLr&r$Ú UserWarningrrUr
rNrQÚkeysÚupdaterÚ from_dictÚ
set_formatrrzÚ enumeraterÚ
accumulateÚ compute_lossÚn_gpuÚmeanÚdetachÚbackwardÚsync_gradientsr‡Úclip_grad_norm_Ú
parametersr,ÚstepÚ zero_gradr-Ú_maybe_log_save_evaluate)r5r6r7rOrPÚ model_inputsÚmodel_inputs_namesÚ
batch_dictraÚ
batch_dataÚstep_dataloaderÚlossÚ tr_loss_steprM)rerNr{7s
 
ýÿ
ÿ ÿ ÿþ 
 

ü þ

 

åÿz _UnslothIterativeSFTTrainer.stepcC|jjdur|jj|jjdkr|jjdkr| |j¡|jjdurf|jj|jjdkrh|jjdkrji}| |j¡  ¡ 
¡}|j|j8_t ||jj|j dƒ|d<| 
¡|d<|jj|_ | |¡dSdSdSdS)Nrr3r„r)r
rgrhÚevaluater r“Ú_nested_gatherrjruÚitemÚroundrkÚ_get_learning_rateÚlog)ÚlogsÚtr_loss_scalarrMrMrNr}·s      
òz4_UnslothIterativeSFTTrainer._maybe_log_save_evaluatecsL|jjdurt|jjƒj}n |jj d¡d}|j|dtƒ ||¡dS)Nrr2)r1) r
rrsrSr Úcreate_model_cardrùÚ_save_checkpoint)r Útrialr1rMrNrÏs
 z,_UnslothIterativeSFTTrainer._save_checkpointr1Ú dataset_nameÚtagsc
C| ¡sdSt|jjdƒrtj |jjj¡s|jjj}nd}|dur&tƒ}n
t |t
ƒr/|h}nt|ƒ}t|jjdƒr?|  d¡|  |j
¡t|||j||tƒrXtjdurXtjjndtƒdd}| tj |jjd¡¡dS)
        Creates a draft of a model card using the information available to the `Trainer`.

        Args:
            model_name (`str` or `None`, *optional*, defaults to `None`):
                Name of the model.
            dataset_name (`str` or `None`, *optional*, defaults to `None`):
                Name of the dataset used for training.
            tags (`str`, `list[str]` or `None`, *optional*, defaults to `None`):
                Tags to be associated with the model card.
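
Because `tags` may be a string, a list of strings, or `None`, the value has to be normalized before it is merged with the trainer's own tags. The sketch below shows one way to do that, assuming the trainer-level tags are `"trl"` and `"iterative-sft"` (both strings appear in the compiled file) and that an `"unsloth"` tag is added for Unsloth models. The function `normalize_tags` and its `is_unsloth_model` flag are hypothetical names introduced for illustration.

```python
from typing import Iterable, Optional, Set, Union

# Assumed trainer-level tags; both strings appear in the compiled file.
DEFAULT_TRAINER_TAGS = ("trl", "iterative-sft")


def normalize_tags(
    tags: Union[str, Iterable[str], None] = None,
    trainer_tags: Iterable[str] = DEFAULT_TRAINER_TAGS,
    is_unsloth_model: bool = False,
) -> Set[str]:
    """Hypothetical helper: turn any accepted `tags` value into one set."""
    if tags is None:
        result: Set[str] = set()
    elif isinstance(tags, str):
        # A bare string is a single tag, not an iterable of characters.
        result = {tags}
    else:
        result = set(tags)
    if is_unsloth_model:
        result.add("unsloth")
    # Always include the tags that identify the trainer itself.
    result.update(trainer_tags)
    return result


if __name__ == "__main__":
    print(sorted(normalize_tags("sft")))
    print(sorted(normalize_tags(["sft", "trl"], is_unsloth_model=True)))
```

Checking for `str` before calling `set()` matters: `set("sft")` would otherwise split the tag into individual characters.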
NrÚunsloth_versionÚunslothz
Iterative SFT)Ú
base_modelr1rrÚ wandb_urlÚ comet_urlÚ trainer_namez README.md)Úis_world_process_zeror(r rr$ÚpathÚisdirrÚsetrrÚaddrnr)r r#ÚwandbÚrunÚurlr!ÚsaveÚjoinr
rs)r1rrr•Ú
model_cardrMrMrN×s0