vllm.model_executor.models.skyworkr1v

SkyworkR1VImageEmbeddingInputs

Bases: TensorSchema

Dimensions
  • ni: Number of images
  • ifs: Image feature size
  • hs: Hidden size (must match the hidden size of the language model backbone)
Source code in vllm/model_executor/models/skyworkr1v.py
class SkyworkR1VImageEmbeddingInputs(TensorSchema):
    """
    Dimensions:
        - ni: Number of images
        - ifs: Image feature size
        - hs: Hidden size (must match the hidden size of language model
          backbone)
    """

    type: Literal["image_embeds"] = "image_embeds"

    data: Annotated[
        torch.Tensor | list[torch.Tensor],
        TensorShape("ni", "ifs", "hs"),
    ]
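The `TensorShape("ni", "ifs", "hs")` annotation declares that `data` is a rank-3 tensor (or a list of per-image tensors) whose last dimension must equal the language model's hidden size. As an illustrative sketch (not vLLM's actual validation code, and the sizes below are made up), the shape contract can be expressed with a plain Python check:

```python
def check_image_embeds_shape(shape, hidden_size):
    """Hypothetical check mirroring TensorShape("ni", "ifs", "hs"):
    rank 3, with the trailing dim equal to the backbone hidden size."""
    if len(shape) != 3:
        return False
    ni, ifs, hs = shape  # number of images, image feature size, hidden size
    return hs == hidden_size

# Example: 2 images, 256 image features each, hidden size 4096 (illustrative).
print(check_image_embeds_shape((2, 256, 4096), hidden_size=4096))  # True
print(check_image_embeds_shape((2, 256, 1024), hidden_size=4096))  # False
```

In the real schema, a mismatched `hs` would be rejected when the inputs are validated against the annotation rather than by a hand-written check like this.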

SkyworkR1VImagePixelInputs

Bases: TensorSchema

Dimensions
  • bnp: Batch size * number of images * (1 + num_patches)
  • c: Number of channels (3)
  • h: Height
  • w: Width
  • bn: Batch size * number of images
Source code in vllm/model_executor/models/skyworkr1v.py
class SkyworkR1VImagePixelInputs(TensorSchema):
    """
    Dimensions:
        - bnp: Batch size * number of images * (1 + num_patches)
        - c: Number of channels (3)
        - h: Height
        - w: Width
        - bn: Batch size * number of images
    """

    type: Literal["pixel_values"] = "pixel_values"

    pixel_values_flat: Annotated[
        torch.Tensor,
        TensorShape("bnp", 3, "h", "w"),
    ]

    num_patches: Annotated[
        torch.Tensor,
        TensorShape("bn"),
    ]
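Reading the docstring literally, the leading dimension `bnp` of `pixel_values_flat` aggregates every image's patches plus one extra (thumbnail-style) tile, while `num_patches` holds one entry per image (length `bn`). A minimal sketch of that bookkeeping, assuming `num_patches` counts the additional patches per image and each image also contributes one extra tile (this interpretation is inferred from the docstring, not confirmed from the surrounding vLLM code):

```python
def flattened_tile_count(num_patches_per_image):
    """Compute the leading dim `bnp` of pixel_values_flat from a
    bn-length sequence of per-image patch counts: each image is
    assumed to contribute 1 + num_patches tiles of shape (3, h, w)."""
    return sum(1 + n for n in num_patches_per_image)

# Two images in the batch, with 4 and 6 patches respectively (illustrative).
print(flattened_tile_count([4, 6]))  # 12
```

Under this reading, `pixel_values_flat` for the example above would have shape `(12, 3, h, w)` and `num_patches` would be the length-2 tensor `[4, 6]`.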