unsloth_compiled_cache/__pycache__/UnslothPRMTrainer.cpython-310.pyc

o
=—°hAŸã@sÆdZddlmZddlZddlmZddlmZddlmZm	Z	m
Z
mZmZm
Z
mZmZddlmZmZmZmZmZmZmZm
Z
mZmZmZmZmZmZmZmZmZm Z mZm!Z!m"Z"m#Z#m$Z$m%Z%m&Z&m'Z'm(Z(mZm)Z)m*Z*m+Z+mZm,Z,m
Z
mZmZmZm'Z'm)Z)mZddl)Z)ddlTddl-m.Z.m/Z/dd	l0m1Z1ddlZddl2Z3dd
l4m5Z5ddlmZddl6m7Z7m8Z9dd
dd
d
dœZ:ej;dde:d�dd„ƒZ<e.Gdd„deƒƒZ=	Gdd„deƒZ>Gdd„de>ƒZ?dS)z9
2025.8.9
2025.8.10
4.55.4
0.21.0
__UNSLOTH_VERSIONING__
é)ÚTensorN)Ú
functional)ÚAnyÚListÚOptionalÚTupleÚUnionÚDictÚSetÚCallable)(ÚBaseImageProcessorrÚDataCollatorÚ"DataCollatorForTokenClassificationÚDatasetÚEvalPredictionÚFeatureExtractionMixinrÚ	PRMConfigÚ
PRMTrainerÚPartialStateÚPathÚ	PeftModelÚPreTrainedModelÚPreTrainedTokenizerBaseÚProcessorMixinÚTrainerÚTrainerCallbackrÚchainÚcompute_accuracyÚdisable_dropout_in_modelÚfeaturesÚgenerate_model_cardÚinspectÚis_peft_availableÚis_wandb_availableÚnnÚosÚprepare_model_for_kbit_trainingÚtextwrapÚtorchÚwarningsrrrrr"r%r()Ú*)Ú	dataclassÚfield)ÚVersion)Únullcontext)ÚDataCollatorForSeq2SeqÚDataCollatorForLanguageModelingTF)Úepilogue_fusionÚmax_autotuneÚ
shape_paddingz
trace.enabledztriton.cudagraphs)ÚdynamicÚ	fullgraphÚoptionsc
Cs¾tj| d|jd¡ddd�}tj| d¡ddd�}g}t||ƒD](\}}| tj¡}tj|d| d¡d� 	d¡}tj
|dd�}||}	| |	¡q!	t |¡}| |jd|jdf¡}|S)Néÿÿÿÿér)ÚchunksÚdim)r:Úindex)r:é)
r(ÚchunkÚreshapeÚshapeÚzipÚtoÚfloat32ÚgatherÚ	unsqueezeÚsqueezeÚ	logsumexpÚappendÚconcat)
Úlogitsr;Úchunked_logitsÚ
chunked_indexÚall_per_token_logpsÚchunk_logitsÚchunk_indexÚselected_logitsÚlogsumexp_valuesÚper_token_logps©rRúQ/workspace/DS-LLM-TEMPLATE-FINETUNING/unsloth_compiled_cache/UnslothPRMTrainer.pyÚchunked_selective_log_softmax"s
rTcsŒeZdZUdZedddid�Zeeed<edddid�Z	ee
ed	<eddd
id�Zee
ed<						
																														 									!	!					"	#								$														$						%	&				'												(									#				$				)	*														+	,			$					d/‡fd-d.„	Z‡Z
S)0ÚUnslothPRMConfiga:
    
    Configuration class for the [`PRMTrainer`].

    This class includes only the parameters that are specific to PRM training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) used for truncation.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt used for truncation.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion used for truncation. The completion is the concatenation of the steps.
        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model.
        step_separator (`str`, *optional*, defaults to `"\n"`):
            Separator used to separate each step of the reasoning process.
        train_on_last_step_only (`bool`, *optional*, defaults to `False`):
            Whether to train only on the last step.
        dataset_num_proc (`int`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
    
    NÚhelpzvLLM SamplingParams)ÚdefaultÚmetadataÚvllm_sampling_paramsr7z8Chunk size to reduce memory usage. -1 is most efficient.Úunsloth_num_chunksz'Maximum sequence length to truncate to.Úmax_seq_lengthFÚnor8éréúç-Cëâ6
?ç{®Gáz„?çÍÌÌÌÌÌì?ç+‡ÙÎ÷ï?ç:Œ0âŽyE>çð?ç@Úlinearçš™™™™™¹?ÚpassiveÚwarningTÚstepsr<éôéO
ÚO1ÚautoÚçÚ
adamw_8bitÚlengthÚ
every_saveÚlastéééc‹�sÆ|dkrtd|›d�ƒ‚|dkrtd|›d�ƒ‚|dur(|#dkr(|$dkr(d}d	}#|‡dur:d
dlm}Œt|Œƒdd
ƒ}‡tƒjd•id|“d|“d|“d|“d|“d|“d|“d|“d|	“d|
“d|“d|“d|
“d|“d|“d|“d|“d|“d |“d!|“d"|“d#|“d$|“d%|“d&|“d'|“d(|“d)|“d*|“d+|“d,|“d-| “d.|!“d/|"“d0|#“d1|$“d2|%“d3|&“d4|'“d5|(“d6|)“d7|*“d8|+“d9|,“d:|-“d;|.“d<|/“d=|0“d>|1“d?|2“d@|3“dA|4“dB|5“dC|6“dD|7“dE|8“dF|9“dG|:“dH|;“dI|<“dJ|=“dK|>“dL|?“dM|@“dN|A“dO|B“dP|C“dQ|D“dR|E“dS|F“dT|G“dU|H“dV|I“dW|J“dX|K“dY|L“dZ|M“d[|N“d\|O“d]|P“d^|Q“d_|R“d`|S“da|T“db|U“dc|V“dd|W“de|X“df|Y“dg|Z“dh|[“di|\“dj|]“dk|^“dl|_“dm|`“dn|a“do|b“dp|c“dq|d“dr|e“ds|f“dt|g“du|h“dv|i“dw|j“dx|k“dy|l“dz|m“d{|n“d||o“d}|p“d~|q“d|r“d€|s“d�|t“d‚|u“dƒ|v“d„|w“d…|x“d†|y“d‡|z“dˆ|{“d‰||“dŠ|}“d‹|~“dŒ|“d�|€“dŽ|�“d�|‚“d�|ƒ“d‘|„“d’|…“d“|†“d”|‡“|‹¤Ž|ˆ|_|‰|_|Š|_	dS)–NgH¯¼šò×z>z Unsloth: Your learning rate of `zi` is too small and less than 1e-7! Consider increasing it, otherwise gradient updates will be close to 0!r<za` is way too larger > 1! Consider decreasing it to 1e-1, otherwise gradient updates will explode!rjrkÚunsloth_training_checkpointsr\r)Ú	cpu_countr8r]Ú
output_dirÚoverwrite_output_dirÚdo_trainÚdo_evalÚ
do_predictÚ
eval_strategyÚprediction_loss_onlyÚper_device_train_batch_sizeÚper_device_eval_batch_sizeÚper_gpu_train_batch_sizeÚper_gpu_eval_batch_sizeÚgradient_accumulation_stepsÚeval_accumulation_stepsÚ
eval_delayÚtorch_empty_cache_stepsÚ
learning_rateÚweight_decayÚ
adam_beta1Ú
adam_beta2Úadam_epsilonÚ
max_grad_normÚnum_train_epochsÚ	max_stepsÚlr_scheduler_typeÚwarmup_ratioÚwarmup_stepsÚ	log_levelÚlog_level_replicaÚlog_on_each_nodeÚlogging_dirÚlogging_strategyÚlogging_first_stepÚ
logging_stepsÚlogging_nan_inf_filterÚ
save_strategyÚ
save_stepsÚsave_total_limitÚsave_safetensorsÚsave_on_each_nodeÚsave_only_modelÚ'restore_callback_states_from_checkpointÚno_cudaÚuse_cpuÚuse_mps_deviceÚseedÚ	data_seedÚ
jit_mode_evalÚuse_ipexÚbf16Úfp16Úfp16_opt_levelÚhalf_precision_backendÚbf16_full_evalÚfp16_full_evalÚtf32Ú
local_rankÚddp_backendÚ
tpu_num_coresÚtpu_metrics_debugÚdebugÚdataloader_drop_lastÚ
eval_stepsÚdataloader_num_workersÚdataloader_prefetch_factorÚ
past_indexÚrun_nameÚdisable_tqdmÚremove_unused_columnsÚlabel_namesÚload_best_model_at_endÚmetric_for_best_modelÚgreater_is_betterÚignore_data_skipÚfsdpÚfsdp_min_num_paramsÚfsdp_configÚ"fsdp_transformer_layer_cls_to_wrapÚaccelerator_configÚ	deepspeedÚlabel_smoothing_factorÚoptimÚ
optim_argsÚ	adafactorÚgroup_by_lengthÚlength_column_nameÚ	report_toÚddp_find_unused_parametersÚddp_bucket_cap_mbÚddp_broadcast_buffersÚdataloader_pin_memoryÚdataloader_persistent_workersÚskip_memory_metricsÚuse_legacy_prediction_loopÚpush_to_hubÚresume_from_checkpointÚhub_model_idÚhub_strategyÚ	hub_tokenÚhub_private_repoÚhub_always_pushÚhub_revisionÚgradient_checkpointingÚgradient_checkpointing_kwargsÚinclude_inputs_for_metricsÚeval_do_concat_batchesÚfp16_backendÚpush_to_hub_model_idÚpush_to_hub_organizationÚpush_to_hub_tokenÚ
mp_parametersÚauto_find_batch_sizeÚfull_determinismÚtorchdynamoÚ	ray_scopeÚddp_timeoutÚ
torch_compileÚtorch_compile_backendÚtorch_compile_modeÚinclude_tokens_per_secondÚinclude_num_input_tokens_seenÚneftune_noise_alphaÚoptim_target_modulesÚbatch_eval_metricsÚ
eval_on_startÚuse_liger_kernelÚliger_kernel_configÚeval_use_gather_objectÚaverage_tokens_across_devicesÚ
max_lengthÚmax_prompt_lengthÚmax_completion_lengthÚdisable_dropoutÚstep_separatorÚtrain_on_last_step_onlyÚdataset_num_procrR)
ÚFloatingPointErrorÚ
OverflowErrorÚmultiprocessingryÚmaxÚsuperÚ__init__rYrZr[)�Úselfrzr{r|r}r~rr€r�r‚rƒr„r…r†r‡rˆr‰rŠr‹rŒr�rŽr�r�r‘r’r“r”r•r–r—r˜r™ršr›rœr�ržrŸr r¡r¢r£r¤r¥r¦r§r¨r©rªr«r¬rr®r¯r°r±r²r³r´rµr¶r·r¸r¹rºr»r¼r½r¾r¿rÀrÁrÂrÃrÄrÅrÆrÇrÈrÉrÊrËrÌrÍrÎrÏrÐrÑrÒrÓrÔrÕrÖr×rØrÙrÚrÛrÜrÝrÞrßràrárârãrärårærçrèrérêrërìrírîrïrðrñròrórôrõrör÷rørùrúrûrürýrþrÿrrYrZr[Úkwargsry©Ú	__class__rRrSr^sXÿþýüûúùø	÷
öõô
óòñðïîíìëêéèçæåäãâá à!ß"Þ#Ý$Ü%Û&Ú'Ù(Ø)×*Ö+Õ,Ô-Ó.Ò/Ñ0Ð1Ï2Î3Í4Ì5Ë6Ê7É8È9Ç:Æ;Å<Ä=Ã>Â?Á@ÀA¿B¾C½D¼E»FºG¹H¸I·J¶KµL´M³N²O±P°Q¯R®ST¬U«VªW©X¨Y§Z¦[¥\¤]£^¢_¡` aŸbžc�dœe›fšg™h˜i—j–k•l”m“n’o‘p�q�rŽs�tŒu‹vŠw‰xˆy‡z†{…|„}ƒ~‚��ÿ�þ�ý�ü�û�ú�ù�ø	
zUnslothPRMConfig.__init__)ŠNNFFFr\Fr8r8NNr]r]rr^r_r`rarbrcrdrer7rfrgrrhriTNrjFr<FrjrkNTFFFFFFrlrlFFFFrmrnFFNr7NNFroFNrNr7NNTNFNNFrorNNNNrprqNFFrrNNNNTFTFFNNrsNNFNFNFTrnNNNroTFNrtruFNNFFNNFFFNFTrvrwNTroFNNr7N)Ú__name__Ú
__module__Ú__qualname__Ú__doc__r,rYrrÚ__annotations__rZÚintr[rÚ
__classcell__rRrRr	rSrU3s4
þþþ�órUcsLeZdZdZddgZ												ddeeeej	fdee
deed	eed
eeee
eeffdeeeeeefdeegefd
eeege
fdeeedeejjejjjfdeeejejgejfdee
f‡fdd„
Zedd„ƒZ ‡fdd„Z!			ddeedeedeeeedffdd„Z"‡Z#S)Ú_UnslothPRMTrainerroÚtrlÚprmN©NNÚmodelÚargsÚ
data_collatorÚ
train_datasetÚeval_datasetÚprocessing_classÚ
model_initÚcompute_metricsÚ	callbacksÚ
optimizersÚpreprocess_logits_for_metricsÚpeft_configc
s(tƒs|durtdƒ‚tƒrU|durUt|tƒsUt|ddƒs#t|ddƒrSdtt t¡j	ƒv}
d|j
i}|
s?|jdur?t 
d¡n|
rK|jdurK|j|d<t|fi|¤Ž}|}|jr\t|ƒ|durbt}|duru|durntdƒ‚t||jd	�}d
|jvrñtƒ ¡�j||j|j|j|j|jdœ}i|¥ddi¥}|j|j||j|jd
t t t  d¡¡t t  d¡¡dœ¡d�}i|¥ddi¥}|durâ|j|j||j|jdt t t  d¡¡t t  d¡¡dœ¡d�}Wdƒn1sìwYt!ƒj"|||||||||	|
|d�t#|j$dƒ�r|j$ %|j&¡dSdS)NzvPEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it to use the PEFT modelsÚis_loaded_in_8bitFÚis_quantizedràÚuse_gradient_checkpointingzÂYou passed `gradient_checkpointing_kwargs` in the trainer's kwargs, but your peft version does not support it. please update to the latest version of peft to use `gradient_checkpointing_kwargs`.z^A processing_class must be specified when using the default DataCollatorForTokenClassification)rúÚ	input_ids)Ú	tokenizerrþrúrûrürÿÚis_evalzTokenizing train datasetÚint64)Úlabelsr%)Ú	fn_kwargsÚnum_procÚremove_columnsÚdescrTzTokenizing eval dataset)rrrrrrrrrrr Úadd_model_tags)'r"Ú
ValueErrorÚ
isinstancerÚgetattrÚlistr!Ú	signaturer&Ú
parametersrßràr)ÚwarnrýrrrrúÚcolumn_namesrÚmain_process_firstrþrûrürÿÚmapÚtokenize_rowrrÚFeaturesÚSequenceÚValuerrÚhasattrrr.Ú
_tag_names)rrrrrrrrrrrr r!Ú_supports_gc_kwargsÚprepare_model_kwargsr*Útrain_fn_kwargsÚeval_fn_kwargsr	rRrSrˆs¦ÿ

ÿ
ÿ
ÿ
úþÿúþÿú€æ(õÿz_UnslothPRMTrainer.__init__c
sJˆ|ddd�d}‡fdd„|dDƒ}	|r.|s.dgt|d	ƒd
t|d	dƒg}
n	dd„|d	Dƒ}
ˆj|dd�‰‡fd
d„|	Dƒ}	dd„t|	|
ƒDƒ}
tt|	Žƒ}tt|
Žƒ}
ˆjdurhˆjg|}|durs||d…}|durƒ|d|…}|
d|…}
||}dgt|ƒ|
}
|dur |d|…}|
d|…}
||
dœS)a/	
        Tokenize a row of the dataset.

        Args:
            features (`dict[str, str]`):
                Row of the dataset, should contain the keys `"prompt"`, `"completions"`, and `"labels"`.
            tokenizer (`PreTrainedTokenizerBase`):
                Tokenizer used to process the data.
            step_separator (`str`):
                Separator between steps in the completion.
            max_length (`int` or `None`):
                Maximum length of the sequences (prompt + completion). If `None`, the sequences are not truncated.
            max_prompt_length (`int` or `None`):
                Maximum length of the prompt. If `None`, the prompt is not truncated.
            max_completion_length (`int` or `None`):
                Maximum length of the completion sequences. If `None`, the completion sequences are not truncated.
            train_on_last_step_only (`bool`):
                Whether to train only on the last step. If `True`, the labels are `-100` for all tokens except the last
                token of the completion.
            is_eval (`bool`):
                Whether the function is used to tokenize samples from a training or an evaluation dataset. Used only if
                `train_on_last_step_only` is set to `True`.

        Returns:
            `dict[str, list[int]]`:
                Tokenized sequences with the keys `"input_ids"` and `"labels"`.

        Example:
        ```python
        >>> from transformers import AutoTokenizer

        >>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
        >>> features = {
        ...     "prompt": "Which number is larger, 9.8 or 9.11?",
        ...     "completions": ["11 is greater than 8.", "Hence, 9.11 > 9.8."],
        ...     "labels": [True, False],
        ... }
        >>> PRMTrainer.tokenize_row(
        ...     features, tokenizer, "\n", max_length=None, max_prompt_length=None,
        ...     max_completion_length=None, train_on_last_step_only=False, is_eval=False,
        ... )
        {'input_ids': [23085, 1372, 374, 8131, 11, 220, 24, 13, 23, 476, 220, 24, 13, 16, 16, 30, 16, 16, 374, 7046, 1091, 220, 23, 13, 198, 39, 763, 11, 220, 24, 13, 16, 16, 861, 220, 24, 13, 23, 13, 198],
         'labels': [-100, -100, -100, -100, -100, -100, -100, -100, 1, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 0]}
        ```
        ÚpromptF©Úadd_special_tokensr%csg|]
}ˆ|dd�d‘qS)FrDr%rR©Ú.0Ú
completion)r&rRrSÚ
<listcomp>7sÿz3_UnslothPRMTrainer.tokenize_row.<locals>.<listcomp>Úcompletionséœÿÿÿr)r<r7cSsg|]}t|ƒ‘qSrR)r)rGÚlabelrRrRrSrI=ócsg|]}|ˆ‘qSrRrRrF)Ú
separator_idsrRrSrIArMcSs(g|]\}}dgt|ƒd|g‘qS)rKr<)Úlen)rGrHrLrRrRrSrIDs(N)r%r))rOrÚencoder@r2rÚbos_token_id)
rr&rþrúrûrürÿr'Ú
prompt_idsÚcompletions_idsr)Úcompletion_idsr%rR)rNr&rSr9þs28
ÿ*

z_UnslothPRMTrainer.tokenize_rowcsL|jjdurt|jjƒj}n	|jj d¡d}|j|d�tƒ ||¡dS)Nú/r7)Ú
model_name)	rrÙrrzÚnameÚsplitÚcreate_model_cardrÚ_save_checkpoint)rrÚtrialrVr	rRrSrZ^s
z#_UnslothPRMTrainer._save_checkpointrVÚdataset_nameÚtagscCsä| ¡sdSt|jjdƒrtj |jjj¡s|jjj}nd}|dur&tƒ}n
t	|t
ƒr/|h}nt|ƒ}t|jjdƒr?| d¡| |j
¡t d¡}t|||j||tƒr]tjdur]tjjndd|dd�	}| tj |jjd	¡¡dS)
aî
        Creates a draft of a model card using the information available to the `Trainer`.

        Args:
            model_name (`str` or `None`, *optional*, defaults to `None`):
                Name of the model.
            dataset_name (`str` or `None`, *optional*, defaults to `None`):
                Name of the dataset used for training.
            tags (`str`, `list[str]` or `None`, *optional*, defaults to `None`):
                Tags to be associated with the model card.
        NÚ
_name_or_pathÚunsloth_versionÚunslotha²        @article{uesato2022solving,
            title        = {{Solving Math Word Problems With Process- and Outcome-Based Feedback}},
            author       = {Uesato, Jonathan and Kushman, Nate and Kumar, Ramana and Song, Francis and Siegel, Noah and Wang, Lisa and Creswell, Antonia and Irving, Geoffrey and Higgins, Irina},
            year         = 2022,
            journal      = {arXiv preprint arXiv:2211.14275}
        }ÚPRMzBSolving math word problems with process-and outcome-based feedback)	Ú
base_modelrVrÙr\r]Ú	wandb_urlÚtrainer_nameÚtrainer_citationÚpaper_titlez	README.md)Úis_world_process_zeror=rÚconfigr%ÚpathÚisdirr^Úsetr0ÚstrÚaddÚupdater>r'Údedentr rÙr#ÚwandbÚrunÚurlÚsaveÚjoinrrz)rrVr\r]rbÚcitationÚ
model_cardrRrRrSrYfs4 


÷z$_UnslothPRMTrainer.create_model_card)NNNNNNNNNrNN)NNN)$rrr
rr>rrrr$ÚModulerr
rÚdictrlrrrrrrr2rÚtupler(rÊÚ	OptimizerÚlr_schedulerÚLambdaLRrrÚstaticmethodr9rZrYrrRrRr	rSrƒsnîþýüûúÿù
öõ
ô
óïîv
_
üþýürcs8eZdZdZ											d‡fdd„	Z‡ZS)ÚUnslothPRMTrainera@	
    
    Initialize PRMTrainer.

    Args:
        model (`transformers.PreTrainedModel`):