DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/__pycache__/UnslothAlignPropTrainer.cpython-310.pyc
264 lines
22 KiB
o
õ×°hOoã@s´dZddlmZddlZddlmZddlmZddlmZm Z m
Z
m Z m Z m
Z
mZmZddlmZmZmZmZmZmZm
Z
mZmZmZm Z mZmZmZmZmZmZmZmZmZm Z ddlZddlTddl!m"Z"m#Z#dd l$m%Z%ddlZddl&Z'dd
l(m)Z)ddlmZdd l*m+Z+m,Z-d d
d d
d
dœZ.ej/d d e.dddƒZ0e"GdddeƒƒZ1 GdddeƒZ2Gddde2ƒZ3 e4edƒrØddl5Z5Gddde5j6ƒZ7 e 8e7dƒ¡dSdS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)Ú AcceleratorÚAlignPropConfigÚAlignPropTrainerrr ÚDDPOStableDiffusionPipelinerÚPathÚProjectConfigurationÚPyTorchModelHubMixinrÚ defaultdictÚgenerate_model_cardÚget_comet_experiment_urlÚis_wandb_availableÚloggerÚosÚset_seedÚtextwrapÚtorchÚwarnings)Ú*)Ú dataclassÚfield)ÚVersion)Ú nullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚ max_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ fullgraphÚoptionsc
Ctj| d|jd¡ddd}tj| d¡ddd}g}t||ƒD](\}}| tj¡}tj|d| d¡d  d¡}tj
|dd}||} |  | ¡q! t  |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)r-Úindex)r-é)
rÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ unsqueezeÚsqueezeÚ logsumexpÚappendÚconcat)
Úlogitsr.Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚ chunk_logitsÚ chunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rEúW/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothAlignPropTrainer.pyÚchunked_selective_log_softmax"s  
rGceZdZUdZedddidZeeed<edddidZ ee
ed <

  
     
             d!‡fdd „ Z Z S)"ÚUnslothAlignPropConfiga´
Configuration class for the [`AlignPropTrainer`].
Using [`~transformers.HfArgumentParser`] we can turn this class into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.
Parameters:
exp_name (`str`, *optional*, defaults to `os.path.basename(sys.argv[0])[: -len(".py")]`):
Name of this experiment (defaults to the file name without the extension).
run_name (`str`, *optional*, defaults to `""`):
Name of this run.
seed (`int`, *optional*, defaults to `0`):
Random seed for reproducibility.
log_with (`str` or `None`, *optional*, defaults to `None`):
Log with either `"wandb"` or `"tensorboard"`. Check
[tracking](https://huggingface.co/docs/accelerate/usage_guides/tracking) for more details.
log_image_freq (`int`, *optional*, defaults to `1`):
Frequency for logging images.
tracker_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
Keyword arguments for the tracker (e.g., `wandb_project`).
accelerator_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
Keyword arguments for the accelerator.
project_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
Keyword arguments for the accelerator project config (e.g., `logging_dir`).
tracker_project_name (`str`, *optional*, defaults to `"trl"`):
Name of project to use for tracking.
logdir (`str`, *optional*, defaults to `"logs"`):
Top-level logging directory for checkpoint saving.
num_epochs (`int`, *optional*, defaults to `100`):
Number of epochs to train.
save_freq (`int`, *optional*, defaults to `1`):
Number of epochs between saving model checkpoints.
num_checkpoint_limit (`int`, *optional*, defaults to `5`):
Number of checkpoints to keep before overwriting old ones.
mixed_precision (`str`, *optional*, defaults to `"fp16"`):
Mixed precision training.
allow_tf32 (`bool`, *optional*, defaults to `True`):
Allow `tf32` on Ampere GPUs.
resume_from (`str`, *optional*, defaults to `""`):
Path to resume training from a checkpoint.
sample_num_steps (`int`, *optional*, defaults to `50`):
Number of sampler inference steps.
sample_eta (`float`, *optional*, defaults to `1.0`):
Eta parameter for the DDIM sampler.
sample_guidance_scale (`float`, *optional*, defaults to `5.0`):
Classifier-free guidance weight.
train_batch_size (`int`, *optional*, defaults to `1`):
Batch size for training.
train_use_8bit_adam (`bool`, *optional*, defaults to `False`):
Whether to use the 8bit Adam optimizer from `bitsandbytes`.
train_learning_rate (`float`, *optional*, defaults to `1e-3`):
Learning rate.
train_adam_beta1 (`float`, *optional*, defaults to `0.9`):
Beta1 for Adam optimizer.
train_adam_beta2 (`float`, *optional*, defaults to `0.999`):
Beta2 for Adam optimizer.
train_adam_weight_decay (`float`, *optional*, defaults to `1e-4`):
Weight decay for Adam optimizer.
train_adam_epsilon (`float`, *optional*, defaults to `1e-8`):
Epsilon value for Adam optimizer.
train_gradient_accumulation_steps (`int`, *optional*, defaults to `1`):
Number of gradient accumulation steps.
train_max_grad_norm (`float`, *optional*, defaults to `1.0`):
Maximum gradient norm for gradient clipping.
negative_prompts (`str` or `None`, *optional*, defaults to `None`):
Comma-separated list of prompts to use as negative examples.
truncated_backprop_rand (`bool`, *optional*, defaults to `True`):
If `True`, randomized truncation to different diffusion timesteps is used.
truncated_backprop_timestep (`int`, *optional*, defaults to `49`):
Absolute timestep to which the gradients are backpropagated. Used only if `truncated_backprop_rand=False`.
truncated_rand_backprop_minmax (`tuple[int, int]`, *optional*, defaults to `(0, 50)`):
Range of diffusion timesteps for randomized truncated backpropagation.
push_to_hub (`bool`, *optional*, defaults to `False`):
Whether to push the final model to the Hub.
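The three `truncated_*` options described above work together. As a rough sketch only, the function below shows one way the timestep at which backpropagation is cut could be chosen from them. It is an assumption about how a sampler might consume these settings, not code recovered from this file, and `choose_truncation_timestep` is a hypothetical helper name. The default values are the ones listed in the parameter descriptions.

```python
import random
from typing import Optional, Tuple


def choose_truncation_timestep(
    truncated_backprop_rand: bool = True,
    truncated_backprop_timestep: int = 49,
    truncated_rand_backprop_minmax: Tuple[int, int] = (0, 50),
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: pick the timestep used to truncate backpropagation."""
    if not truncated_backprop_rand:
        # Fixed truncation: the absolute timestep is used as given.
        return truncated_backprop_timestep
    rng = rng or random.Random()
    low, high = truncated_rand_backprop_minmax
    # Randomized truncation: draw from the configured range.
    # Treating both ends as inclusive is an assumption of this sketch.
    return rng.randint(low, high)
```

With `truncated_backprop_rand=False` the result is always `truncated_backprop_timestep`; otherwise a new value is drawn on each call.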
helpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr*z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksÚ inferenceÚéO
r/ÚtrlÚlogsédéÚfp16Té2çð?ç@Fç-Cëâ6
?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç{®Gáz„?ç:Œ0âŽyE>éé1c ! tƒjdid|d|d|d|d|d|d|d|d | “d
|
d | d | d
|
d|d|d|d|d|d|d|d|d|d|d|d|d|d|d|d|| ¤Ž||_||_dS)exp_nameÚrun_nameÚseedÚlog_withÚlog_image_freqÚtracker_project_nameÚlogdirÚ
num_epochsÚ save_freqÚnum_checkpoint_limitÚmixed_precisionÚ
allow_tf32Ú resume_fromÚsample_num_stepsÚ
sample_etaÚsample_guidance_scaleÚtrain_batch_sizeÚtrain_use_8bit_adamÚtrain_learning_rateÚtrain_adam_beta1Útrain_adam_beta2Útrain_adam_weight_decayÚtrain_adam_epsilonÚ!train_gradient_accumulation_stepsÚtrain_max_grad_normÚnegative_promptsÚtruncated_backprop_randÚtruncated_backprop_timestepÚ push_to_hubrE)ÚsuperÚ__init__rLrM)!Úselfr`rarbrcrdrerfrgrhrirjrkrlrmrnrorprqrrrsrtrurvrwrxryrzr{r|rLrMÚkwargs©Ú __class__rErFr~Œsz %ÿþýüûúùø ÷
ö õ ô
óòñðïîíìëêéèçæåäã
zUnslothAlignPropConfig.__init__)rNrOrPNr/rQrRrSr/rTrUTrOrVrWrXr/FrYrZr[r\r]r^rWNTr_FNr*)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__rrLrrÚ__annotations__rMÚintr~Ú
__classcell__rErErrFrH3sT
NþþàrHcs4eZdZdZddgZ d.dedeeje e
e e gejfdege e
e ffde d e
ee e e ge ff
d
d Zd d
ZdedefddZddZdejdedejfddZddZddZddZd/d d!„Zd.d"e
efd#d$„Zd%d&„Zfd'd(„Z   d0d)e
e
d*e
e
d+ee
ee
dffd,d-„ZZS)1Ú_UnslothAlignPropTrainerrOrQÚ alignpropNÚconfigÚreward_functionÚprompt_functionÚ sd_pipelineÚimage_samples_hookc
Cst dt¡|durt d¡||_||_||_||_td!i|jj¤Ž}|jj r}t
j   t
j  
|jj ¡¡|j_ dt
j  |jj ¡vr}ttddt
 |jj ¡ƒƒ}t|ƒdkr]td|jj ƒtdd „|Dƒƒ}t
j  |jj d|d
¡|j_ |d
d |_td!|jj|jj||jjd œ|jj¤Ž|_|jduo|jd
k} |jjr¸|jj|jj| s¯t | dn| |jj"dt# $d|¡t%|jj&dd||_'|j'j(d |jj) dddd|jjdkrãt*j+}
n
|jjdkrít*j,}
nt*j-}
|j'j.j/|jj0|
d|j'j1j/|jj0|
d|j'j2j/|jj0|
d|j' } |j 4|j5¡|j 6|j7¡|jj8r/dt*j9j:j;_8| <t=| tƒs;|  n| ¡|_?|j' 1|j'j@|jjAdurOdgn|jjAddd|j'j@jBdjC /|jj0¡¡d|_D|j'jEpn|jjE|_EtF|j'dƒr”|j'jGr”|j H| |j?¡\} |_?ttdd|  ƒƒ|_In |j H| |j?¡\|_I|_?|j rÃt# $d|j ¡|j J|j ¡tK|j  Ld ¡d
ƒd |_MdSd|_MdS)"NzEAlignPropTrainer is deprecated and will be removed in version 0.23.0.z8No image_samples_hook provided; no images will be loggedÚ checkpoint_cSsd|vS)NrrE)ÚxrErErFÚ<lambda>õsz3_UnslothAlignPropTrainer.__init__.<locals>.<lambda>rzNo checkpoints found in cSsg|] }t| d¡dƒqS)Ú_r*)rˆÚsplit)Ú.0rrErErFÚ
<listcomp>ûsz5_UnslothAlignPropTrainer.__init__.<locals>.<listcomp>r*r/)rcrjÚproject_configÚgradient_accumulation_stepsÚ tensorboard)Úalignprop_trainer_config)Ú init_kwargsÚ
T)Údevice_specificFÚTimestep)ÚpositionÚdisableÚleaveÚdescÚ
dynamic_ncolsrUÚbf16)ÚdtyperOÚptÚ
max_length©Úreturn_tensorsÚpaddingÚ
truncationr¨Úuse_loracSs|jS©N)Ú
requires_grad)ÚprErErFr“RszResuming from r”rE)NrÚwarnÚDeprecationWarningÚ prompt_fnÚ reward_fnrŒÚimage_samples_callbackrÚproject_kwargsrlrÚpathÚnormpathÚ
expanduserÚbasenameÚlistÚfilterÚlistdirÚlenÚ
ValueErrorÚsortedÚjoinÚ iterationr rcrjrwÚaccelerator_kwargsÚ acceleratorÚis_main_processÚ
init_trackersreÚdictÚto_dictÚtracker_kwargsrÚinforrbrÚset_progress_bar_configÚis_local_main_processrÚfloat16Úbfloat16r5Úvaer4ÚdeviceÚ text_encoderÚunetÚget_trainable_layersÚregister_save_state_pre_hookÚ_save_model_hookÚregister_load_state_pre_hookÚ_load_model_hookrkÚbackendsÚcudaÚmatmulÚ_setup_optimizerÚ
isinstanceÚ
parametersÚ optimizerÚ tokenizerryÚmodel_max_lengthÚ input_idsÚneg_prompt_embedÚautocastÚhasattrr­ÚprepareÚtrainable_layersÚ
load_staterˆr•Ú first_epoch)
rrrrÚaccelerator_project_configÚ checkpointsÚcheckpoint_numbersÚis_using_tensorboardÚinference_dtyperærErErFr~Ùþ
 þÿ  þùø ÿûû
 

 ÿû ùø 
z!_UnslothAlignPropTrainer.__init__cCs"| |d|d|d¡\}}|S)imagesÚpromptsÚprompt_metadata)r´)rÚprompt_image_pairsÚrewardÚreward_metadatarErErFÚcompute_rewards]sÿz(_UnslothAlignPropTrainer.compute_rewardsÚepochÚ global_stepc Cs<ttƒ}|jj ¡t|jjƒD]¨}|j  |jj¡{| 
¡gt   ¡S|j
|jjd}| |¡}||d<|j |¡ ¡ ¡ ¡}| |¡}|j |¡|jjrf|j t|jtƒs_|j ¡n|j|jj¡|j ¡|j ¡Wdƒn1szwYWdƒn1s‰wYWdƒn1s˜wY|d |  ¡¡|d | ¡|d | ¡q|jjrædd| Dƒ}|jj$|d d
}| %d |i¡|jj&||d |d
7}ttƒ}nt'dƒ|j(dur||jj)dkr| (|||jj*d¡|dkr||jj+dkr|jj,r|j |S)a
Perform a single step of training.

Args:
epoch (int): The current epoch.
global_step (int): The current global step.

Side Effects:
- Model weights are updated.
- Logs the statistics to the accelerator trackers.
- If `self.image_samples_callback` is not None, it will be called with the prompt_image_pairs, global_step,
and the accelerator tracker.

Returns:
global_step (int): The updated global step.
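The bookkeeping half of such a step can be illustrated without any model. The sketch below is a simplified stand-in, not the trainer's code: it collects per-micro-batch reward statistics, averages them over the accumulation steps, tags the result with the epoch, and returns the incremented global step. `summarize_step` and its arguments are hypothetical names, and using the population standard deviation is a simplification (a tensor-based implementation may use a different convention).

```python
from collections import defaultdict
from statistics import fmean, pstdev


def summarize_step(micro_batch_rewards, epoch, global_step):
    """Hypothetical stand-in for the logging half of one training step."""
    info = defaultdict(list)
    for rewards in micro_batch_rewards:  # one entry per accumulation step
        info["reward_mean"].append(fmean(rewards))
        info["reward_std"].append(pstdev(rewards))
    # Average each statistic over the accumulation steps, then tag the epoch.
    summary = {key: fmean(values) for key, values in info.items()}
    summary["epoch"] = epoch
    # The caller receives the updated global step, as in the Returns section.
    return summary, global_step + 1
```

A caller would pass the returned `global_step` into the next call, which mirrors how the updated value is handed back by the method described above.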
)Ú
batch_sizeÚrewardsNÚ reward_meanÚ
reward_stdÚlosscSs"i|]
\}}|t t |¡¡qSrE)rÚmeanÚtensor)rÚvrErErFÚ
<dictcomp>šs"z1_UnslothAlignPropTrainer.step.<locals>.<dictcomp>rü)Ú reductionrõ)Ústepr/zsOptimization step should have been performed by this point. Please check calculated gradient accumulation settings.r).rrÚtrainÚrangerŒrwÚ
accumulaterãrÚ enable_gradÚ_generate_samplesrpr6ÚdetachÚcpuÚnumpyÚcalculate_lossÚbackwardÚsync_gradientsÚclip_grad_norm_rÜrxrÚ zero_gradr:ÚstdÚitemÚitemsÚreduceÚupdateÚlogr¿rdÚtrackersrhÚ
save_state) rr”Ú rewards_visrûrErErFrcsX &ÿ

 
 ÿü
 è
ÿ&
z_UnslothAlignPropTrainer.stepcCsd| ¡}|S)a(
Calculate the loss for a batch of an unpacked sample
Args:
rewards (torch.Tensor):
Differentiable reward scalars for each generated image, shape: [batch_size]
Returns:
loss (torch.Tensor) (all of these are of shape (1,))
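As a minimal illustration of a reward-based loss of this kind, the sketch below subtracts the mean reward from a constant offset, so that a higher reward gives a lower loss. The offset of 10.0 and the use of a plain mean are assumptions of this sketch and are not confirmed by the readable parts of this file. `alignprop_style_loss` is a hypothetical name, and plain Python floats stand in for tensors.

```python
from statistics import fmean


def alignprop_style_loss(rewards, offset=10.0):
    """Hypothetical sketch: constant offset minus the mean reward."""
    # Minimizing this value is equivalent to maximizing the mean reward;
    # the offset only shifts the reported number and does not affect gradients.
    return offset - fmean(rewards)
```

Because the offset is constant, it changes only the scale on which the loss is logged, not the direction of the update.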
g$@))rrErErFr ­s z'_UnslothAlignPropTrainer.calculate_lossÚ
advantagesÚ
clip_rangeÚratiocCs8| |}| t |d|d|¡}t t ||¡¡S)NrW)rÚclamprüÚmaximum)rrrrÚunclipped_lossÚ clipped_lossrErErF¼s
ýz_UnslothAlignPropTrainer.losscCsL|jjr
ddl}|jj}ntjj}|||jj|jj|jj f|jj
|jj dS)Nr)ÚlrÚbetasÚ weight_decayÚeps) rqÚ bitsandbytesÚoptimÚ AdamW8bitrÚAdamWrrrsrtrurv)rÚtrainable_layers_parametersr$Ú
optimizer_clsrErErFÊs
ûz)_UnslothAlignPropTrainer._setup_optimizercCs|j |||¡| ¡dS)rÚsave_checkpointÚpop)rÚmodelsÚweightsÚ
output_dirrErErFÚs z)_UnslothAlignPropTrainer._save_model_hookcCs|j ||¡| ¡dS)rÚload_checkpointr+)rr,Ú input_dirrErErFr×Þs z)_UnslothAlignPropTrainer._load_model_hookTc si}ˆj |dd¡}|durtfddt|ƒDƒŽ\}}n ddt|ƒDƒ}ˆjj|dddˆjjjd j ˆj j
¡}ˆj  |¡d
}|raˆjj ||ˆj
jˆj
jˆj
jˆj
jˆj
jˆj
jdd } nˆj||ˆj
jˆj
jˆj
jdd } | j}
|
|d
<||d<||d<|S)a
Generate samples from the model
Args:
batch_size (int): Batch size to use for sampling
with_grad (bool): Whether the generated RGBs should have gradients attached to it.
Returns:
prompt_image_pairs (dict[Any])
r/Ncsg|]}ˆ ¡qSrE)©rr”©rrErFr—òsz>_UnslothAlignPropTrainer._generate_samples.<locals>.<listcomp>cSsg|]}iqSrErEr1rErErFr—ôsTr©r) Ú
prompt_embedsÚnegative_prompt_embedsÚnum_inference_stepsÚguidance_scaleÚetarzr{Útruncated_rand_backprop_minmaxÚ output_type)r3r4r5r6r7r9)Úrepeatr3rrr4Ú
rgb_with_gradrŒrmrornrzr{r8) rÚ with_gradrïÚsample_neg_prompt_embedsrðÚ
prompt_idsr3Ú sd_outputrîrEr2rFrâsP  û ú÷ ú z*_UnslothAlignPropTrainer._generate_samplesÚepochscCs6d}|dur
|jj}t|j|ƒD]}| ||¡}qdS)z>
Train the model for a given number of epochs
rN)rgrr)rr@rErErFrs ÿz_UnslothAlignPropTrainer.traincCs|j |¡| ¡dS)rÚsave_pretrainedÚcreate_model_card)rÚsave_directoryrErErFÚ_save_pretrained(s  z)_UnslothAlignPropTrainer._save_pretrainedcsL|jjdurt|jjƒj}n |jj d¡d}|j|dtƒ ||¡dS)/r*)Ú
model_name) ÚargsÚ hub_model_idrr.Únamer•rBr}Ú_save_checkpoint)rÚmodelÚtrialrFrrErFrJ-s
 z)_UnslothAlignPropTrainer._save_checkpointrFÚ dataset_nameÚtagsc
C| ¡sdSt|jjdƒrtj |jjj¡s|jjj}nd}|dur&tƒ}n
t |t
ƒr/|h}nt|ƒ}t|jjdƒr?|  d¡|  |j
¡t d¡}t|||j||tƒr]tjdur]tjjndtƒd|ddd }| tj |jjd
¡¡dS) 
Creates a draft of a model card using the information available to the `Trainer`.
Args:
model_name (`str` or `None`, *optional*, defaults to `None`):
Name of the model.
dataset_name (`str` or `None`, *optional*, defaults to `None`):