vllm.model_executor.layers.quantization.utils.marlin_utils_fp4
_nvfp4_compute_scale_factor
Compute the power-of-2 scale_factor needed so that all non-zero values in marlin_scales * 2^7 are >= 2 after rescaling. Returns a Python float (power of 2, >= 1.0).
Source code in vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py
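The condition above can be illustrated on plain Python floats. The following is a minimal sketch and a hypothetical re-derivation, not the vLLM source. It treats the scales as a flat list instead of a torch Tensor. It assumes the function returns the smallest power of 2 (never below 1.0) that lifts the smallest non-zero value of scale * 2^7 to at least 2. The name compute_scale_factor_sketch is illustrative only.

```python
from typing import Iterable


def compute_scale_factor_sketch(scales: Iterable[float]) -> float:
    """Hypothetical analogue of _nvfp4_compute_scale_factor (assumption).

    Returns the smallest power of 2 (>= 1.0) such that every non-zero
    value of scale * 2**7, multiplied by the factor, is >= 2.
    """
    # Only non-zero scales constrain the factor; zeros are ignored.
    # Scales are assumed non-negative; abs() only guards the comparison.
    nonzero = [abs(s) for s in scales if s != 0]
    if not nonzero:
        return 1.0
    smallest = min(nonzero) * 2**7
    factor = 1.0
    # Doubling is exact in binary floating point, so the loop yields an
    # exact power of 2 without rounding error from log2/ceil.
    while smallest * factor < 2:
        factor *= 2.0
    return factor


print(compute_scale_factor_sketch([0.0, 0.5]))        # 1.0 (0.5 * 128 = 64 already >= 2)
print(compute_scale_factor_sketch([2**-10, 1.0]))     # 16.0 (2**-3 needs 2**4 to reach 2)
print(compute_scale_factor_sketch([0.0, 3 * 2**-9]))  # 4.0 (0.75 -> 1.5 -> 3.0)
```

Using a doubling loop rather than math.ceil(math.log2(...)) keeps the result exact even when the ratio is already a power of 2.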
nvfp4_marlin_process_scales
nvfp4_marlin_process_scales(
marlin_scales: Tensor, scale_factor: float | None = None
) -> tuple[Tensor, float]
Process NVFP4 weight scales into the special S0E5M3 format for Marlin: an 8-bit float with no sign bit, 5 exponent bits, and 3 mantissa bits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| marlin_scales | Tensor | Weight scales tensor in half precision, already permuted for the Marlin kernel layout. | required |
| scale_factor | float \| None | Optional power-of-2 rescaling factor. If None, the factor is computed automatically (see _nvfp4_compute_scale_factor) so that every non-zero value of marlin_scales * 2^7 is >= 2 after rescaling. | None |
Returns:
| Type | Description |
|---|---|
| tuple[Tensor, float] | A tuple of (processed_scales, scale_factor). |
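The S0E5M3 encoding can be sketched per value with Python's struct module instead of torch tensor views. This is a minimal sketch under stated assumptions, not the vLLM implementation. It ignores the Marlin layout permutation entirely. It assumes the encoding stores scale * 2^7 * scale_factor as float16, shifts out the (zero) sign bit, and keeps the high byte, which holds the 5 exponent bits and the top 3 mantissa bits. Under that assumption, a value >= 2 has a biased fp16 exponent of at least 16, so the top bit of the byte is 1. That would explain the >= 2 requirement. The helper names are hypothetical. Overflow for large factors is not handled. How the caller compensates for scale_factor downstream is not described on this page.

```python
import struct


def to_s0e5m3_sketch(scale: float, scale_factor: float = 1.0) -> int:
    """Hypothetical helper: encode one non-negative scale as an S0E5M3 byte."""
    value = scale * 2**7 * scale_factor
    # '<e' packs an IEEE 754 half-precision float in little-endian order.
    bits = int.from_bytes(struct.pack("<e", value), "little")
    # Shift out the sign bit (zero for non-negative scales), keep the top 8 bits:
    # 5 exponent bits followed by the 3 most significant mantissa bits.
    return ((bits << 1) & 0xFFFF) >> 8


def from_s0e5m3_sketch(byte: int, scale_factor: float = 1.0) -> float:
    """Hypothetical helper: decode an S0E5M3 byte (normal values only)."""
    exponent = byte >> 3     # 5 exponent bits, fp16 bias of 15
    mantissa = byte & 0b111  # 3 mantissa bits
    value = 2.0 ** (exponent - 15) * (1 + mantissa / 8)
    return value / (2**7 * scale_factor)


for s in (1.0, 0.375):
    b = to_s0e5m3_sketch(s)
    # Prints the scale, its encoded byte, the byte's top bit, and the round trip.
    print(s, hex(b), b >> 7, from_s0e5m3_sketch(b))
# 1.0 0xb0 1 1.0
# 0.375 0xa4 1 0.375

# A small scale has top bit 0 until it is rescaled by a factor of 16.
print(to_s0e5m3_sketch(2**-10) >> 7, to_s0e5m3_sketch(2**-10, 16.0) >> 7)  # 0 1
```

Truncating the fp16 mantissa to 3 bits loses nothing for inputs that originate as FP8 E4M3 scales, since those carry at most 3 mantissa bits.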