# vllm.utils.multi_stream_utils

## maybe_execute_in_parallel
```python
maybe_execute_in_parallel(
    fn0: Callable[[], Any],
    fn1: Callable[[], Any],
    event0: Event,
    event1: Event,
    aux_stream: Stream | None = None,
) -> tuple[Any, Any]
```
Run two functions potentially in parallel on separate CUDA streams.
When aux_stream is provided, fn0 runs on the current (default) stream and fn1 runs on aux_stream, synchronized via CUDA events. When aux_stream is None, both functions execute sequentially on the current stream.
This design follows TensorRT-LLM's maybe_execute_in_parallel pattern (tensorrt_llm/_torch/modules/multi_stream_utils.py).
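The fork/join choreography this signature implies can be sketched as follows. This is a minimal illustration, not vLLM's actual implementation: the exact event/stream ordering is assumed from the parameter descriptions below, and `torch` is imported lazily so the sequential fallback works without CUDA.

```python
from typing import Any, Callable


def maybe_execute_in_parallel(
    fn0: Callable[[], Any],
    fn1: Callable[[], Any],
    event0,  # assumed to be a torch.cuda.Event
    event1,  # assumed to be a torch.cuda.Event
    aux_stream=None,  # assumed to be a torch.cuda.Stream or None
) -> tuple[Any, Any]:
    """Sketch of the two-stream fork/join pattern described above."""
    if aux_stream is None:
        # Multi-stream disabled: run both callables sequentially on the
        # current stream.
        return fn0(), fn1()

    import torch  # imported lazily; the sequential path needs no CUDA

    # Fork: event0 marks the point where all prior work on the current
    # stream has been enqueued, so fn1 cannot race ahead of its inputs.
    event0.record()
    with torch.cuda.stream(aux_stream):
        aux_stream.wait_event(event0)
        result1 = fn1()
        event1.record()
    result0 = fn0()  # runs on the default stream, overlapping with fn1
    # Join: the default stream blocks until fn1's kernels have completed.
    torch.cuda.current_stream().wait_event(event1)
    return result0, result1
```

Note that the event waits are device-side: the host enqueues both callables without blocking, and only downstream kernels on the default stream wait for `event1`.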
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fn0 | Callable[[], Any] | Callable for the default stream. | required |
| fn1 | Callable[[], Any] | Callable for the auxiliary stream. | required |
| event0 | Event | CUDA event recorded before fn0 so aux_stream can wait. | required |
| event1 | Event | CUDA event recorded after fn1 so the default stream can wait. | required |
| aux_stream | Stream \| None | The second CUDA stream for fn1. Multi-stream is disabled when aux_stream is None. | None |
Returns:
| Type | Description |
|---|---|
| tuple[Any, Any] | Tuple of (fn0_result, fn1_result). |