unsloth_compiled_cache/__pycache__/UnslothAlignPropTrainer.cpython-310.pyc

o
õ×°hOoã@s´dZddlmZddlZddlmZddlmZddlmZm	Z	m
Z
mZmZm
Z
mZmZddlmZmZmZmZmZmZm
Z
mZmZmZmZmZmZmZmZmZmZmZmZmZm Z ddlZddlTddl!m"Z"m#Z#dd	l$m%Z%ddlZddl&Z'dd
l(m)Z)ddlmZddl*m+Z+m,Z-dd
dd
d
dœZ.ej/dde.d�dd„ƒZ0e"Gdd„deƒƒZ1	Gdd„deƒZ2Gdd„de2ƒZ3	e4edƒrØddl5Z5Gdd„de5j6ƒZ7	e 8e7dƒ¡dSdS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)ÚAcceleratorÚAlignPropConfigÚAlignPropTrainerrrÚDDPOStableDiffusionPipelinerÚPathÚProjectConfigurationÚPyTorchModelHubMixinrÚdefaultdictÚgenerate_model_cardÚget_comet_experiment_urlÚis_wandb_availableÚloggerÚosÚset_seedÚtextwrapÚtorchÚwarnings)Ú*)Ú	dataclassÚfield)ÚVersion)Únullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚmax_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ	fullgraphÚoptionsc
Cs¾tj| d|jd¡ddd�}tj| d¡ddd�}g}t||ƒD](\}}| tj¡}tj|d| d¡d� 	d¡}tj
|dd�}||}	| |	¡q!	t |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)r-Úindex)r-é)
rÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ	unsqueezeÚsqueezeÚ	logsumexpÚappendÚconcat)
Úlogitsr.Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚchunk_logitsÚchunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rEúW/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothAlignPropTrainer.pyÚchunked_selective_log_softmax"s
rGcsšeZdZUdZedddid�Zeeed<edddid�Z	ee
ed	<	
				
				
														d!‡fdd „	Z‡ZS)"ÚUnslothAlignPropConfiga´
    
    Configuration class for the [`AlignPropTrainer`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        exp_name (`str`, *optional*, defaults to `os.path.basename(sys.argv[0])[: -len(".py")]`):
            Name of this experiment (defaults to the file name without the extension).
        run_name (`str`, *optional*, defaults to `""`):
            Name of this run.
        seed (`int`, *optional*, defaults to `0`):
            Random seed for reproducibility.
        log_with (`str` or `None`, *optional*, defaults to `None`):
            Log with either `"wandb"` or `"tensorboard"`. Check
            [tracking](https://huggingface.co/docs/accelerate/usage_guides/tracking) for more details.
        log_image_freq (`int`, *optional*, defaults to `1`):
            Frequency for logging images.
        tracker_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
            Keyword arguments for the tracker (e.g., `wandb_project`).
        accelerator_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
            Keyword arguments for the accelerator.
        project_kwargs (`dict[str, Any]`, *optional*, defaults to `{}`):
            Keyword arguments for the accelerator project config (e.g., `logging_dir`).
        tracker_project_name (`str`, *optional*, defaults to `"trl"`):
            Name of project to use for tracking.
        logdir (`str`, *optional*, defaults to `"logs"`):
            Top-level logging directory for checkpoint saving.
        num_epochs (`int`, *optional*, defaults to `100`):
            Number of epochs to train.
        save_freq (`int`, *optional*, defaults to `1`):
            Number of epochs between saving model checkpoints.
        num_checkpoint_limit (`int`, *optional*, defaults to `5`):
            Number of checkpoints to keep before overwriting old ones.
        mixed_precision (`str`, *optional*, defaults to `"fp16"`):
            Mixed precision training.
        allow_tf32 (`bool`, *optional*, defaults to `True`):
            Allow `tf32` on Ampere GPUs.
        resume_from (`str`, *optional*, defaults to `""`):
            Path to resume training from a checkpoint.
        sample_num_steps (`int`, *optional*, defaults to `50`):
            Number of sampler inference steps.
        sample_eta (`float`, *optional*, defaults to `1.0`):
            Eta parameter for the DDIM sampler.
        sample_guidance_scale (`float`, *optional*, defaults to `5.0`):
            Classifier-free guidance weight.
        train_batch_size (`int`, *optional*, defaults to `1`):
            Batch size for training.
        train_use_8bit_adam (`bool`, *optional*, defaults to `False`):
            Whether to use the 8bit Adam optimizer from `bitsandbytes`.
        train_learning_rate (`float`, *optional*, defaults to `1e-3`):
            Learning rate.
        train_adam_beta1 (`float`, *optional*, defaults to `0.9`):
            Beta1 for Adam optimizer.
        train_adam_beta2 (`float`, *optional*, defaults to `0.999`):
            Beta2 for Adam optimizer.
        train_adam_weight_decay (`float`, *optional*, defaults to `1e-4`):
            Weight decay for Adam optimizer.
        train_adam_epsilon (`float`, *optional*, defaults to `1e-8`):
            Epsilon value for Adam optimizer.
        train_gradient_accumulation_steps (`int`, *optional*, defaults to `1`):
            Number of gradient accumulation steps.
        train_max_grad_norm (`float`, *optional*, defaults to `1.0`):
            Maximum gradient norm for gradient clipping.
        negative_prompts (`str` or `None`, *optional*, defaults to `None`):
            Comma-separated list of prompts to use as negative examples.
        truncated_backprop_rand (`bool`, *optional*, defaults to `True`):
            If `True`, randomized truncation to different diffusion timesteps is used.
        truncated_backprop_timestep (`int`, *optional*, defaults to `49`):
            Absolute timestep to which the gradients are backpropagated. Used only if `truncated_backprop_rand=False`.
        truncated_rand_backprop_minmax (`tuple[int, int]`, *optional*, defaults to `(0, 50)`):
            Range of diffusion timesteps for randomized truncated backpropagation.
        push_to_hub (`bool`, *optional*, defaults to `False`):
            Whether to push the final model to the Hub.
    
    NÚhelpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr*z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksÚ	inferenceÚéO
r/ÚtrlÚlogsédéÚfp16Té2çð?ç@Fç-Cëâ6
?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç{®Gáz„?ç:Œ0âŽyE>éé1c !sÐtƒjdid|“d|“d|“d|“d|“d|“d|“d|“d	|	“d
|
“d|“d|“d
|
“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“d|“| ¤Ž||_||_dS)NÚexp_nameÚrun_nameÚseedÚlog_withÚlog_image_freqÚtracker_project_nameÚlogdirÚ
num_epochsÚ	save_freqÚnum_checkpoint_limitÚmixed_precisionÚ
allow_tf32Úresume_fromÚsample_num_stepsÚ
sample_etaÚsample_guidance_scaleÚtrain_batch_sizeÚtrain_use_8bit_adamÚtrain_learning_rateÚtrain_adam_beta1Útrain_adam_beta2Útrain_adam_weight_decayÚtrain_adam_epsilonÚ!train_gradient_accumulation_stepsÚtrain_max_grad_normÚnegative_promptsÚtruncated_backprop_randÚtruncated_backprop_timestepÚpush_to_hubrE)ÚsuperÚ__init__rLrM)!Úselfr`rarbrcrdrerfrgrhrirjrkrlrmrnrorprqrrrsrtrurvrwrxryrzr{r|rLrMÚkwargs©Ú	__class__rErFr~Œsz%ÿþýüûúùø	÷
öõô
óòñðïîíìëêéèçæåäã
zUnslothAlignPropConfig.__init__)rNrOrPNr/rQrRrSr/rTrUTrOrVrWrXr/FrYrZr[r\r]r^rWNTr_FNr*)
Ú__name__Ú
__module__Ú__qualname__Ú__doc__rrLrrÚ__annotations__rMÚintr~Ú
__classcell__rErEr�rFrH3sT
NþþàrHcs4eZdZdZddgZ	d.dedeeje	e
e	egejfdege	e
effded	e
eeeegeff
d
d„Zdd
„Zdedefdd„Zdd„Zdejdedejfdd„Zdd„Zdd„Zdd„Zd/d d!„Zd.d"e
efd#d$„Zd%d&„Z‡fd'd(„Z			d0d)e
e
d*e
e
d+ee
ee
dffd,d-„Z‡ZS)1Ú_UnslothAlignPropTrainerrOrQÚ	alignpropNÚconfigÚreward_functionÚprompt_functionÚsd_pipelineÚimage_samples_hookc
	Cs�t dt¡|durt d¡||_||_||_||_td!i|jj¤Ž}|jj	r}t
j t
j 
|jj	¡¡|j_	dt
j |jj	¡vr}ttdd„t
 |jj	¡ƒƒ}t|ƒdkr]td|jj	›�ƒ‚tdd	„|Dƒƒ}t
j |jj	d|d
›�¡|j_	|d
d|_td!|jj|jj||jjdœ|jj¤Ž|_|jduo›|jd
k}	|jjr¸|jj|jj|	s¯t | !¡d�n| !¡|jj"d�t# $d|›�¡t%|jj&dd�||_'|j'j(d|jj)dddd�|jjdkrãt*j+}
n
|jjdkrít*j,}
nt*j-}
|j'j.j/|jj0|
d�|j'j1j/|jj0|
d�|j'j2j/|jj0|
d�|j' 3¡}|j 4|j5¡|j 6|j7¡|jj8�r/dt*j9j:j;_8| <t=|tƒ�s;| >¡n|¡|_?|j' 1|j'j@|jjAdu�rOdgn|jjAddd|j'j@jBd�jC /|jj0¡¡d|_D|j'jE�pn|jjE|_EtF|j'dƒ�r”|j'jG�r”|j H||j?¡\}|_?ttdd„| >¡ƒƒ|_In|j H||j?¡\|_I|_?|j	�rÃt# $d|j	›�¡|j J|j	¡tK|j	 Ld ¡d
ƒd|_MdSd|_MdS)"NzEAlignPropTrainer is deprecated and will be removed in version 0.23.0.z8No image_samples_hook provided; no images will be loggedÚcheckpoint_cSsd|vS)Nr‘rE)ÚxrErErFÚ<lambda>õsz3_UnslothAlignPropTrainer.__init__.<locals>.<lambda>rzNo checkpoints found in cSsg|]}t| d¡dƒ‘qS)Ú_r*)rˆÚsplit)Ú.0r’rErErFÚ
<listcomp>ûsz5_UnslothAlignPropTrainer.__init__.<locals>.<listcomp>r*r/)rcrjÚproject_configÚgradient_accumulation_stepsÚtensorboard)Úalignprop_trainer_config)rŒÚinit_kwargsÚ
T)Údevice_specificFÚTimestep)ÚpositionÚdisableÚleaveÚdescÚ
dynamic_ncolsrUÚbf16)ÚdtyperOÚptÚ
max_length©Úreturn_tensorsÚpaddingÚ
truncationr¨Úuse_loracSs|jS©N)Ú
requires_grad)ÚprErErFr“RszResuming from r”rE)NrÚwarnÚDeprecationWarningÚ	prompt_fnÚ	reward_fnrŒÚimage_samples_callbackrÚproject_kwargsrlrÚpathÚnormpathÚ
expanduserÚbasenameÚlistÚfilterÚlistdirÚlenÚ
ValueErrorÚsortedÚjoinÚ	iterationrrcrjrwÚaccelerator_kwargsÚacceleratorÚis_main_processÚ
init_trackersreÚdictÚto_dictÚtracker_kwargsrÚinforrbr�Úset_progress_bar_configÚis_local_main_processrÚfloat16Úbfloat16r5Úvaer4ÚdeviceÚtext_encoderÚunetÚget_trainable_layersÚregister_save_state_pre_hookÚ_save_model_hookÚregister_load_state_pre_hookÚ_load_model_hookrkÚbackendsÚcudaÚmatmulÚ_setup_optimizerÚ
isinstanceÚ
parametersÚ	optimizerÚ	tokenizerryÚmodel_max_lengthÚ	input_idsÚneg_prompt_embedÚautocastÚhasattrrÚprepareÚtrainable_layersÚ
load_staterˆr•Úfirst_epoch)
rrŒr�rŽr�r�Úaccelerator_project_configÚcheckpointsÚcheckpoint_numbersÚis_using_tensorboardÚinference_dtyperærÒrErErFr~Ùsºþ
þÿþùøÿûû


ÿûùø
z!_UnslothAlignPropTrainer.__init__cCs"| |d|d|d¡\}}|S)NÚimagesÚpromptsÚprompt_metadata)r´)rÚprompt_image_pairsÚrewardÚreward_metadatarErErFÚcompute_rewards]sÿz(_UnslothAlignPropTrainer.compute_rewardsÚepochÚglobal_stepc	Cs<ttƒ}|jj ¡t|jjƒD]¨}|j 	|jj¡�{| 
¡�gt ¡�S|j
|jjd�}| |¡}||d<|j |¡ ¡ ¡ ¡}| |¡}|j |¡|jjrf|j t|jtƒs_|j ¡n|j|jj¡|j ¡|j ¡Wdƒn1szwYWdƒn1s‰wYWdƒn1s˜wY|d |  ¡¡|d | !¡¡|d | "¡¡q|jjrædd„| #¡Dƒ}|jj$|d	d
�}| %d|i¡|jj&||d�|d
7}ttƒ}nt'dƒ‚|j(du�r||jj)dk�r| (|||jj*d¡|dk�r||jj+dk�r|jj,�r|j -¡|S)a
        Perform a single step of training.

        Args:
            epoch (int): The current epoch.
            global_step (int): The current global step.

        Side Effects:
            - Model weights are updated
            - Logs the statistics to the accelerator trackers.
            - If `self.image_samples_callback` is not None, it will be called with the prompt_image_pairs, global_step,
              and the accelerator tracker.

        Returns:
            global_step (int): The updated global step.
        )Ú
batch_sizeÚrewardsNÚreward_meanÚ
reward_stdÚlosscSs"i|]
\}}|t t |¡¡“qSrE)rÚmeanÚtensor)r–ÚkÚvrErErFÚ
<dictcomp>šs"z1_UnslothAlignPropTrainer.step.<locals>.<dictcomp>rü)Ú	reductionrõ)Ústepr/zsOptimization step should have been performed by this point. Please check calculated gradient accumulation settings.r).rr»r�rÒÚtrainÚrangerŒrwrÄÚ
accumulaterãrÚenable_gradÚ_generate_samplesrprôr6ÚdetachÚcpuÚnumpyÚcalculate_lossÚbackwardÚsync_gradientsÚclip_grad_norm_rÜrærÝrxrÞrÚ	zero_gradr:rüÚstdÚitemÚitemsÚreduceÚupdateÚlogr¿rµrdÚtrackersrhrÅÚ
save_state)	rrõrörÊr”rñrøÚrewards_visrûrErErFrcsX&ÿ


ÿü
è€€
ÿ&
z_UnslothAlignPropTrainer.stepcCsd| ¡}|S)a(
        Calculate the loss for a batch of an unpacked sample

        Args:
            rewards (torch.Tensor):
                Differentiable reward scalars for each generated image, shape: [batch_size]

        Returns:
            loss (torch.Tensor): The loss, of shape (1,)
        g$@)rü)rrørûrErErFrsz'_UnslothAlignPropTrainer.calculate_lossÚ
advantagesÚ
clip_rangeÚratiocCs8||}|t |d|d|¡}t t ||¡¡S)NrW)rÚclamprüÚmaximum)rrrrÚunclipped_lossÚclipped_lossrErErFrû¼s
ýz_UnslothAlignPropTrainer.losscCsL|jjr
ddl}|jj}ntjj}|||jj|jj|jj	f|jj
|jjd�S)Nr)ÚlrÚbetasÚweight_decayÚeps)rŒrqÚbitsandbytesÚoptimÚ	AdamW8bitrÚAdamWrrrsrtrurv)rÚtrainable_layers_parametersr$Ú
optimizer_clsrErErFrÛÊs
ûz)_UnslothAlignPropTrainer._setup_optimizercCs|j |||¡| ¡dSr®)r�Úsave_checkpointÚpop)rÚmodelsÚweightsÚ
output_dirrErErFrÕÚsz)_UnslothAlignPropTrainer._save_model_hookcCs|j ||¡| ¡dSr®)r�Úload_checkpointr+)rr,Ú	input_dirrErErFr×Þsz)_UnslothAlignPropTrainer._load_model_hookTcsi}ˆj |dd¡}|durt‡fdd„t|ƒDƒŽ\}}n	dd„t|ƒDƒ}ˆjj|dddˆjjjd	�j ˆj	j
¡}ˆj |¡d
}|raˆjj||ˆj
jˆj
jˆj
jˆj
jˆj
jˆj
jdd�	}	nˆj||ˆj
jˆj
jˆj
jdd�}	|	j}
|
|d
<||d<||d<|S)a
        Generate samples from the model

        Args:
            batch_size (int): Batch size to use for sampling
            with_grad (bool): Whether the generated RGB images should have gradients attached to them.

        Returns:
            prompt_image_pairs (dict[Any])
        r/Ncsg|]}ˆ ¡‘qSrE)r³©r–r”©rrErFr—òsz>_UnslothAlignPropTrainer._generate_samples.<locals>.<listcomp>cSsg|]}i‘qSrErEr1rErErFr—ôsr§r¨Tr©r)	Ú
prompt_embedsÚnegative_prompt_embedsÚnum_inference_stepsÚguidance_scaleÚetarzr{Útruncated_rand_backprop_minmaxÚoutput_type)r3r4r5r6r7r9rîrïrð)râÚrepeatr3rr�rßràrár4rÄrÐrÑÚ
rgb_with_gradrŒrmrornrzr{r8rî)rr÷Ú	with_gradrïrñÚsample_neg_prompt_embedsrðÚ
prompt_idsr3Ú	sd_outputrîrEr2rFrâsP ûú÷ú	z*_UnslothAlignPropTrainer._generate_samplesÚepochscCs6d}|dur
|jj}t|j|ƒD]}| ||¡}qdS)z>
        Train the model for a given number of epochs
        rN)rŒrgrrèr)rr@rörõrErErFrsÿz_UnslothAlignPropTrainer.traincCs|j |¡| ¡dSr®)r�Úsave_pretrainedÚcreate_model_card)rÚsave_directoryrErErFÚ_save_pretrained(sz)_UnslothAlignPropTrainer._save_pretrainedcsL|jjdurt|jjƒj}n	|jj d¡d}|j|d�tƒ ||¡dS)Nú/r*)Ú
model_name)	ÚargsÚhub_model_idrr.Únamer•rBr}Ú_save_checkpoint)rÚmodelÚtrialrFr�rErFrJ-s
z)_UnslothAlignPropTrainer._save_checkpointrFÚdataset_nameÚtagsc
Csê| ¡sdSt|jjdƒrtj |jjj¡s|jjj}nd}|dur&tƒ}n
t	|t
ƒr/|h}nt|ƒ}t|jjdƒr?| d¡| |j
¡t d¡}t|||j||tƒr]tjdur]tjjndtƒd|ddd	�}| tj |jjd
¡¡dS)aî
        Creates a draft of a model card using the information available to the `Trainer`.

        Args:
            model_name (`str` or `None`, *optional*, defaults to `None`):
                Name of the model.
            dataset_name (`str` or `None`, *optional*, defaults to `None`):