DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothAlignPropTrainer.cpython-311.pyc

272 lines
34 KiB
Plaintext

2025-08-13 23:50:20 +00:00
§
2$hnãóÌdZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZm
Z
mZmZmZm Z mZmZmZmZmZmZmZmZmZm Z ddlZddlTddl!m"Z"m#Z#dd l$m%Z%ddlZddl&Z'dd
l(m)Z)ddlmZdd l*m+Z+m,Z-d d
d d
d
dœZ.ej/d d e.¬¦«d¦«Z0e"Gdde¦«¦«Z1 Gdde¦«Z2Gdde2¦«Z3dS)z8
2025.8.4
2025.8.5
4.55.1
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)Ú AcceleratorÚAlignPropConfigÚAlignPropTrainerrr ÚDDPOStableDiffusionPipelinerÚPathÚProjectConfigurationÚPyTorchModelHubMixinr Ú defaultdictÚgenerate_model_cardÚget_comet_experiment_urlÚis_wandb_availableÚloggerÚosÚset_seedÚtextwrapÚtorchÚwarnings)Ú*)Ú dataclassÚfield)ÚVersion)Ú nullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionscó’tj| d|jd¦«dd¬¦«}tj| d¦«dd¬¦«}g}t ||¦«D]\}}| tj¦«}tj|d| d¦«¬¦«  d¦«}tj
|d¬¦«}||z
} |  | ¦«Œ’ tj |¦«}| |jd|jdf¦«}|S)Néÿÿÿÿér)ÚchunksÚdim)r/Úindex)r/é)
rÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
Úlogitsr0Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logpss
úc/workspace/Fine-tuning/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothAlignPropTrainer.pyÚchunked_selective_log_softmaxrH"s5õ”[ §¢°°F´LÀÔ4DÑ!EÔ!EÐPQÐYZÐ[€NÝ”[ §¢¨rÑ!2Ô!2¸QÀaÐH€MØÐå%(¨¸Ñ%GÔ%Gð #—¥u¤}Ñ Ýœ, |¸2À{×G\ÒG\Ð]_ÑG`ÔG`Ða×iÐjlÑmˆÝ œ?¨<¸Ø)Ð,<Ñ<ˆØ×" Ýœ,Ð':ÑØ-×5°v´|ÀA´ÈÌ ÐUVÌÐ6XÑØ ÐócóÌeZdZUdZedddi¬¦«Zeeed<edddi¬¦«Z ee
ed < d ˆfd„ Z ˆxZ S)!ÚUnslothAlignPropConfiga´
Configuration class for the [`AlignPropTrainer`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
Parameters:
    exp_name (`str`, *optional*, defaults to `os.path.basename(sys.argv[0])[: -len(".py")]`):
        Name of this experiment (defaults to the file name without the extension).
    run_name (`str`, *optional*, defaults to `""`):
        Name of this run.
    seed (`int`, *optional*, defaults to `0`):
        Random seed for reproducibility.
    log_with (`str` or `None`, *optional*, defaults to `None`):
        Log with either `"wandb"` or `"tensorboard"`. Check
        [tracking](https://huggingface.co/docs/accelerate/usage_guides/tracking) for more details.
    log_image_freq (`int`, *optional*, defaults to `1`):
        Frequency for logging images.
    tracker_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
        Keyword arguments for the tracker (e.g., `wandb_project`).
    accelerator_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
        Keyword arguments for the accelerator.
    project_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
        Keyword arguments for the accelerator project config (e.g., `logging_dir`).
    tracker_project_name (`str`, *optional*, defaults to `"trl"`):
        Name of project to use for tracking.
    logdir (`str`, *optional*, defaults to `"logs"`):
        Top-level logging directory for checkpoint saving.
    num_epochs (`int`, *optional*, defaults to `100`):
        Number of epochs to train.
    save_freq (`int`, *optional*, defaults to `1`):
        Number of epochs between saving model checkpoints.
    num_checkpoint_limit (`int`, *optional*, defaults to `5`):
        Number of checkpoints to keep before overwriting old ones.
    mixed_precision (`str`, *optional*, defaults to `"fp16"`):
        Mixed precision training.
    allow_tf32 (`bool`, *optional*, defaults to `True`):
        Allow `tf32` on Ampere GPUs.
    resume_from (`str`, *optional*, defaults to `""`):
        Path to resume training from a checkpoint.
    sample_num_steps (`int`, *optional*, defaults to `50`):
        Number of sampler inference steps.
    sample_eta (`float`, *optional*, defaults to `1.0`):
        Eta parameter for the DDIM sampler.
    sample_guidance_scale (`float`, *optional*, defaults to `5.0`):
        Classifier-free guidance weight.
    train_batch_size (`int`, *optional*, defaults to `1`):
        Batch size for training.
    train_use_8bit_adam (`bool`, *optional*, defaults to `False`):
        Whether to use the 8bit Adam optimizer from `bitsandbytes`.
    train_learning_rate (`float`, *optional*, defaults to `1e-3`):
        Learning rate.
    train_adam_beta1 (`float`, *optional*, defaults to `0.9`):
        Beta1 for Adam optimizer.
    train_adam_beta2 (`float`, *optional*, defaults to `0.999`):
        Beta2 for Adam optimizer.
    train_adam_weight_decay (`float`, *optional*, defaults to `1e-4`):
        Weight decay for Adam optimizer.
    train_adam_epsilon (`float`, *optional*, defaults to `1e-8`):
        Epsilon value for Adam optimizer.
    train_gradient_accumulation_steps (`int`, *optional*, defaults to `1`):
        Number of gradient accumulation steps.
    train_max_grad_norm (`float`, *optional*, defaults to `1.0`):
        Maximum gradient norm for gradient clipping.
    negative_prompts (`str` or `None`, *optional*, defaults to `None`):
        Comma-separated list of prompts to use as negative examples.
    truncated_backprop_rand (`bool`, *optional*, defaults to `True`):
        If `True`, randomized truncation to different diffusion timesteps is used.
    truncated_backprop_timestep (`int`, *optional*, defaults to `49`):
        Absolute timestep to which the gradients are backpropagated. Used only if `truncated_backprop_rand=False`.
    truncated_rand_backprop_minmax (`tuple[int, int]`, *optional*, defaults to `(0, 50)`):
        Range of diffusion timesteps for randomized truncated backpropagation.
    push_to_hub (`bool`, *optional*, defaults to `False`):
        Whether to push the final model to the Hub.
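The three `truncated_*` parameters above interact, so a small sketch may help. It is only an illustration under stated assumptions: `TruncationSettings`, `choose_truncation_timestep`, and `keeps_gradient` are hypothetical names that do not appear in this file, the random range is assumed to be inclusive at both ends, and gradients are assumed to flow only through sampler steps at or after the chosen cutoff.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TruncationSettings:
    """Hypothetical container mirroring three documented config fields and their defaults."""

    truncated_backprop_rand: bool = True
    truncated_backprop_timestep: int = 49
    truncated_rand_backprop_minmax: Tuple[int, int] = (0, 50)


def choose_truncation_timestep(
    cfg: TruncationSettings, rng: Optional[random.Random] = None
) -> int:
    """Pick the cutoff timestep for one sampling pass (illustrative only)."""
    source = rng if rng is not None else random
    if cfg.truncated_backprop_rand:
        low, high = cfg.truncated_rand_backprop_minmax
        # Assumption: both ends of the documented range are inclusive.
        return source.randint(low, high)
    # The fixed timestep is documented as used only when randomization is off.
    return cfg.truncated_backprop_timestep


def keeps_gradient(step_index: int, cutoff: int) -> bool:
    """Assumption: steps before the cutoff are detached, later steps keep gradients."""
    return step_index >= cutoff
```

With the documented defaults, a fixed cutoff of `49` out of `50` sampler steps would mean that only the last step contributes gradients, whereas the randomized mode varies how much of the sampling chain is backpropagated through from one pass to the next.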
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr,z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksÚ inferenceÚéO
r1ÚtrlÚlogsédéÚfp16Té2çð?ç@Fç-Cëâ6
?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç{®Gáz„?ç:Œ0âŽyE>éé1c  ót¦«jdid|d|d|d|d|d|d|d|d | “d
|
d | d | d
|
d|d|d|d|d|d|d|d|d|d|d|d|d|d|d|d|| ¤Ž||_||_dS)exp_nameÚrun_nameÚseedÚlog_withÚlog_image_freqÚtracker_project_nameÚlogdirÚ
num_epochsÚ save_freqÚnum_checkpoint_limitÚmixed_precisionÚ
allow_tf32Ú resume_fromÚsample_num_stepsÚ
sample_etaÚsample_guidance_scaleÚtrain_batch_sizeÚtrain_use_8bit_adamÚtrain_learning_rateÚtrain_adam_beta1Útrain_adam_beta2Útrain_adam_weight_decayÚtrain_adam_epsilonÚ!train_gradient_accumulation_stepsÚtrain_max_grad_normÚnegative_promptsÚtruncated_backprop_randÚtruncated_backprop_timestepÚ push_to_hub©)ÚsuperÚ__init__rOrP)"Úselfrdrerfrgrhrirjrkrlrmrnrorprqrrrsrtrurvrwrxryrzr{r|r}r~rr€rOrPÚkwargsÚ __class__s" €rGzUnslothAlignPropConfig.__init__ø€ðH ŒÔð 
,˜  $8Ð#7ð
$˜ð"˜ ð$8Ð#7ð.˜$˜ð&˜$˜ð%:Ð$9ð! 0ð"/ð# 0ð$#6Ð"5ð% 0ð&#6Ð"5ð' 0ð(/ð) 0ð*/ð+ 0ð,'>Ð&=ð- 0ð."4Ð!3ð/ 0ð01RÐ0Qð1 0ð2#6Ð"5ð3 0ð4/ð5 0ð6'>Ð&=ð7 0ð8+FÐ*Eð9 0ð:&˜ð; 0ð<%9ˆÔ!Ø"4ˆÔÐÐrI)rQrRrSNr1rTrUrVr1rWrXTrRrYrZr[r1Fr\r]r^r_r`rarZNTrbFNr,)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__r rOrrÚ__annotations__rPÚintrƒÚ
__classcell__©r†s@rGrKrK3s*ø€ðMðMð\+0¨%ØØÐ+ñ+ô+И( 3œ-ððñð*/¨ØØÐ*ñ*ô*И #œððñð ØØØØØØØØ Ø ØØØØØ #ØØØ Ø"&Ø"Ø,-ØØ"&Ø&(ØØðACCCCCCCCCC5rIrKcóžeZdZdZddgZ d!dedeeje e
e e gejfdege e
e ffde d e
ee e e ge ff
d
Zd Zd ed
efdZdZdejdedejfdZdZdZdZd"dZd!de
efdZdZˆfdZ d#de
e
de
e
dee
ee
dffd „ZˆxZS)$Ú_UnslothAlignPropTrainerrRrTÚ alignpropNÚconfigÚreward_functionÚprompt_functionÚ sd_pipelineÚimage_samples_hookc ó‚ tjdt¦«|tjd¦«||_||_||_||_tdi|jj¤Ž}|jj rJtj   tj  
|jj ¦«¦«|j_ dtj  |jj ¦«vrÏtt!dtj|jj ¦«¦«¦«}t%|¦«dkrt'd|jj ¦«t)d|D¦«¦«}tj  |jj d|d¦«|j_ |dd z|_t/d|jj|jj||jjd
œ|jj¤Ž|_|jduo
|jd k} |jjrg|j |jj| s"tA| !¦«¬ ¦«n| !¦«|jj"¬
¦«tGj$d|¦«tK|jj&d¬¦«||_'|j' (d |jj) ddd¬¦«|jjdkr
tTj+}
n)|jjdkr
tTj,}
n tTj-}
|j'j. /|jj0|
¬¦«|j'j1 /|jj0|
¬¦«|j'j2 /|jj0|
¬¦«|j' 3¦«} |j 4|j5¦«|j 6|j7¦«|jj8rdtTj9j:j;_8| <t{| t¦«s|  >¦«n| ¦«|_?|j' 1|j' @|jjAdgn |jjAddd|j'j@jB¬¦«jC /|jj0¦«¦«d|_D|j'jEp |jjE|_Et|j'd¦«rj|j'jGr^|j H| |j?¦«\} |_?tt!d|  >¦«¦«¦«|_In-|j H| |j?¦«\|_I|_?|j rrtGj$d|j ¦«|j J|j ¦«t—|j  Ld¦«d¦«d z|_MdSd|_MdS) NzEAlignPropTrainer is deprecated and will be removed in version 0.23.0.z8No image_samples_hook provided; no images will be loggedÚ checkpoint_có
d|vS)Nr˜r)Úxs rGú<lambda>z3_UnslothAlignPropTrainer.__init__.<locals>.<lambda>òs  -°1Ð"4€rIrzNo checkpoints found in có^g|]*}t| d¦«d¦«Œ+S)Ú_r,)Úsplit)Ú.0ršs rGú
<listcomp>z5_UnslothAlignPropTrainer.__init__.<locals>.<listcomp>øs/Ð,XÐ,XÐ,XÀq­S°·²¸±´¸bÔ1AÑ-BÔ-BÐ,XÐ,XÐ,XrIr,r1)rgrnÚproject_configÚgradient_accumulation_stepsÚ tensorboard)Úalignprop_trainer_config)rÚ init_kwargsÚ
T)Údevice_specificFÚTimestep)ÚpositionÚdisableÚleaveÚdescÚ
dynamic_ncolsrXÚbf16)ÚdtyperRÚptÚ
max_length©Úreturn_tensorsÚpaddingÚ
truncationr±Úuse_loracó|jS©N)Ú
requires_grad)Úps rGrz3_UnslothAlignPropTrainer.__init__.<locals>.<lambda>Os¸!¼/€rIzResuming from rr)NrÚwarnÚDeprecationWarningÚ prompt_fnÚ reward_fnrÚimage_samples_callbackrÚproject_kwargsrprÚpathÚnormpathÚ
expanduserÚbasenameÚlistÚfilterÚlistdirÚlenÚ
ValueErrorÚsortedÚjoinÚ iterationr
rgrnr{Úaccelerator_kwargsÚ acceleratorÚis_main_processÚ
init_trackersriÚdictÚto_dictÚtracker_kwargsrÚinforrfr•Úset_progress_bar_configÚis_local_main_processrÚfloat16Úbfloat16r7Úvaer6ÚdeviceÚ text_encoderÚunetÚget_trainable_layersÚregister_save_state_pre_hookÚ_save_model_hookÚregister_load_state_pre_hookÚ_load_model_hookroÚbackendsÚcudaÚmatmulÚ_setup_optimizerÚ
isinstanceÚ
parametersÚ optimizerÚ tokenizerr}Úmodel_max_lengthÚ input_idsÚneg_prompt_embedÚautocastÚhasattrr¶ÚprepareÚtrainable_layersÚ
load_staterŒÚ first_epoch)
r„rr“r”r•rÚaccelerator_project_configÚ checkpointsÚcheckpoint_numbersÚis_using_tensorboardÚinference_dtyperðs
rGz!_UnslothAlignPropTrainer.__init__Öõ Œ
Ø ñ
ô
ð
ð Ð ŒMÐ ŒØŒØˆŒ Ø&8ˆÔ#å%9Ð%WÐ%W¸D¼KÔ<VÐ%WÐ%WÐ Œ;Ô  RÝ&(¤g×&6Ò&6µr´w×7IÒ7IÈ$Ì+ÔJaÑ7bÔ7bÑ&cÔ&cˆDŒKÔ ¥B¤G×$4Ò$4°T´[Ô5LÑ$MÔ$MÐØœ
 4¤;Ô#:Ñôñô õ # qÒ$Ð%YÀÄ Ô@WÐ%YÐ%YÑZÝ%+Ð,XÐ,XÈKÐ,XÑ,XÔ,XÑ%YÔ%YÐ"Ý*,¬'¯,ª,Ø”KÔ:Ð"4°RÔ"8Ð+ô+ Ô
8JÈ"Ô7MÐPQÑ7QÐ 
Ø”[Ô œKÔ)-¬ Ô(Uð 
ð 
ðŒkÔ 
ð 
ˆÔð°dÐ_¸v¼ÐR_Ò?_Ðà Ô Ô  Ø Ô ×  Ô&•t°V·^²^Ñ5EÔ5EЗ^’^Ñ œKÔ

ô
ð
õ Œ M˜MÔ!°4ÐÔà Ô×ØÔØØð 
ô
ð
ð Ô Ô +¨vÒ #œmˆOˆOØ
Ô
Ô
Ò
#œnˆOˆOå#œmˆOà ÔÔ×Ò Ô 0Ô 7¸ÐÑ ÔÔÔ)9Ô)@ÈÐ ÔÔ× Ò  Ô!1Ô!8ÀÐ ÑÔà Ô×5°dÔ6KÑ Ô×5°dÔ6KÑ Œ;Ô  9Ø48EŒNÔ Ô ×.Ý1;Ð<LÍdÑ1SÔ1SÐ × )ÐYiñ
ô
ˆŒð!%Ô 0× =Ò =Ø Ô × œ Ô<À$Ä+ÔB^ØØÔ

ô
ô Ÿš˜4Ô!
ô!
ð ô!
ˆÔðÔN°TÔ5EÔ5NˆŒ
å #   o°TÔ5EÔ5Nð oØ#'Ô#3×#;Ò#;Ð<LÈdÌnÑ#]Ô#]Ñ ˆD$”.Ý$(­Ð0IÐ0IÈ4Ï?Ê?ÑK\ÔK\Ñ)]Ô)]Ñ$^Ô$^ˆ !à48Ô4D×4LÒ4LÐM]Ð_cÔ_mÑ4nÔ4nÑ 1ˆ ! 4¤>à Ô ð ŒKÐÔ);Ð Ô × Ô(:Ñ " 6Ô#5×#;Ò#;¸CÑ#@Ô#@ÀÔ#DÑÑIˆDÔ Ð Ð à ˆDÔ Ð Ð rIcó^| |d|d|d¦«\}}|S)imagesÚpromptsÚprompt_metadata))r„Úprompt_image_pairsÚrewardÚreward_metadatas rGÚcompute_rewardsz(_UnslothAlignPropTrainer.compute_rewardsZs:Ø"&§.¢.Ø ˜ (Ð*<¸YÔ*GÐI[Ð\mÔInñ#
ô#
шðˆ
rIÚepochÚ global_stepc
óÄtt¦«}|jj ¦«t |jj¦«D]q}|j  |jj¦«5| 
¦«5tj ¦«5| 
|jj¬¦«}| |¦«}||d<|j |¦« ¦« ¦« ¦«}| |¦«}|j |¦«|jjr]|j t1|jt¦«s|j ¦«n|j|jj¦«|j ¦«|j ¦«ddd¦«n #1swxYwYddd¦«n #1swxYwYddd¦«n #1swxYwY|d |  ¦«¦«|d | !¦«¦«|d | "¦«¦«Œs|jjr‡d| #¦«D¦«}|j $|d¬ ¦«}| %d
|i¦«|j &||¬ ¦«|d z
}tt¦«}ntOd
¦«|j(:||jj)zdkr'| (|||jj*d¦«|dkr8||jj+zdkr%|jj,r|j -¦«|S)a
Perform a single step of training.
Args:
    epoch (int): The current epoch.
    global_step (int): The current global step.
Side Effects:
    - Model weights are updated
    - Logs the statistics to the accelerator trackers.
    - If `self.image_samples_callback` is not None, it will be called with the prompt_image_pairs, global_step,
      and the accelerator tracker.
Returns:
    global_step (int): The updated global step.
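The bookkeeping described in this docstring can be sketched in plain Python. This is a simplified illustration, not the trainer's code: `run_step`, `sample_fn`, and `log_fn` are hypothetical names, the model update itself is omitted, and the per-key averaging stands in for whatever reduction the accelerator performs across processes.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple


def run_step(
    epoch: int,
    global_step: int,
    accumulation_steps: int,
    sample_fn: Callable[[], Tuple[List[float], float]],
    log_fn: Callable[[Dict[str, float], int], None],
) -> int:
    """Collect per-pass statistics, log their averages, return the next global step."""
    info = defaultdict(list)
    for _ in range(accumulation_steps):
        # sample_fn is a stand-in that returns (rewards, loss) for one pass.
        rewards, loss = sample_fn()
        info["reward_mean"].append(mean(rewards))
        info["reward_std"].append(pstdev(rewards))
        info["loss"].append(loss)
    # Average each statistic over the accumulation passes before logging.
    summary = {key: mean(values) for key, values in info.items()}
    summary["epoch"] = epoch
    log_fn(summary, global_step)
    return global_step + 1
```

Returning the incremented counter, rather than mutating shared state, matches the documented contract that `step` takes `global_step` as an argument and hands back the updated value.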
)Ú
batch_sizeÚrewardsNÚ reward_meanÚ
reward_stdÚlosscóbi|],\}}|tjtj|¦«¦«Œ-Sr)rÚmeanÚtensor)Úvs rGú
<dictcomp>z1_UnslothAlignPropTrainer.step.<locals>.<dictcomp>—s0ÐL±t°q¸!A•u”z¥%¤,¨q¡/¤/ÑLrIr )Ú reductionr)Ústepr1zsOptimization step should have been performed by this point. Please check calculated gradient accumulation settings.r).rr•ÚtrainÚrangerr{Ú
accumulaterírÚ enable_gradÚ_generate_samplesrtrÿr8ÚdetachÚcpuÚnumpyÚcalculate_lossÚbackwardÚsync_gradientsÚclip_grad_norm_rær|rÚ zero_gradr<r ÚstdÚitemÚitemsÚreduceÚupdateÚlogrÉr¿rhÚtrackersrlÚ
save_state) r„rrrrÚ rewards_visrs rGrz_UnslothAlignPropTrainer.step`sNõ" Ô ˆà ÔÔ×t”{Ô -ˆÔ!×,¨TÔ-=Ô-BÑ

+ÀTÇ]Â]Á_Ä_ð

+ÕV[ÔVgÑViÔVið

+Ø%)×%;Ò%;Ø#œ{Ô&<ñ&ô&Ð×.Ð/AÑBà07Ð" .×5°gÑ>×G×U à×*¨7Ñ3àÔ ×)¨$ÑÔØÔ)¨$Ô*?ÅÑÔœ Ôôðð××*ð1










+øøøð














+øøøð














+øøøð



+ð4