Clarification on GPU thread-group sizing, GPU utilization, and runtime estimation in MMseqs2-GPU #1062

@slee-ai

Description

I am using MMseqs2-GPU for large-scale MSA generation (millions of queries) and would like to better understand GPU behavior so I can estimate runtime and GPU-hours accurately.

  1. CPU --threads vs GPU threads
  • Can you confirm that --threads controls only CPU threads and does not affect GPU thread configuration?
  2. GPU cores / SMs / thread-groups
  • The GPU paper states that MMseqs2-GPU uses thread-groups of size 4, 8, 16, or 32, each processing one alignment/tile. How is the thread-group size chosen internally? Does it depend on query length, target length, tile size, GPU architecture, or a fixed heuristic? (I sketch one guess at such a heuristic after this list.)
  • Does MMseqs2-GPU always attempt to fully occupy all SMs, or can some SMs remain idle depending on sequence lengths or batch size?
  3. Runtime / TCUPS estimation in practice
  • The GPU paper reports TCUPS on synthetic databases in which query and target lengths match. For real databases (e.g., UniRef30 2023, envDB 202108), is there a recommended way to estimate runtime from TCUPS? (My current back-of-the-envelope calculation follows this list.)
  4. VRAM usage and host-memory streaming
  • When MMseqs2-GPU requires more VRAM than is available, does it spill to host memory?
  5. Best practices for large-batch GPU searches
  • Are there recommended batch sizes to maximize GPU utilization and minimize MSA computation time?
  • Aside from --gpu and --threads, are there any GPU-specific, user-settable parameters?
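
On question 2, here is my own guess at how such a heuristic could look, just so you can correct my mental model. This is purely illustrative and not taken from the MMseqs2 source; the constant COLUMNS_PER_THREAD and the cutoffs are hypothetical:

```python
# Purely illustrative sketch (my assumption, NOT MMseqs2's actual logic):
# pick the smallest allowed thread-group size such that each thread in the
# group covers at most a fixed number of DP columns of the query.
COLUMNS_PER_THREAD = 32  # hypothetical tuning constant

def pick_group_size(query_length: int) -> int:
    """Return the smallest group size in (4, 8, 16, 32) for which each
    thread handles at most COLUMNS_PER_THREAD columns."""
    for group_size in (4, 8, 16, 32):
        if query_length <= group_size * COLUMNS_PER_THREAD:
            return group_size
    return 32  # very long queries still use a full 32-thread warp

if __name__ == "__main__":
    for length in (100, 250, 500, 1200):
        print(f"query length {length:>5} -> group size {pick_group_size(length)}")
```

Is the real selection closer to something like this, or is it fixed per kernel or per GPU architecture?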
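
On question 3, this is the naive estimate I am currently using. It assumes the gapped alignment stage dominates and counts one cell update per query-residue/target-residue pair; since prefiltering shrinks the set of targets that are actually aligned, plugging in full database sizes should give an upper bound. All numbers in the example are made up:

```python
# Back-of-the-envelope GPU-hours estimate from a sustained TCUPS figure.
# Assumptions (mine, not from the MMseqs2 docs): runtime is dominated by
# DP cell updates, i.e. total query residues times total target residues.

def estimate_runtime_hours(query_residues: float,
                           target_residues: float,
                           tcups: float) -> float:
    """Hours to perform query_residues * target_residues cell updates
    at `tcups` tera (1e12) cell updates per second."""
    cells = query_residues * target_residues
    seconds = cells / (tcups * 1e12)
    return seconds / 3600.0

# Example: 1e6 queries of average length 300 against a 1e10-residue
# database, at a hypothetical sustained 0.5 TCUPS on a single GPU.
hours = estimate_runtime_hours(1e6 * 300, 1e10, 0.5)
print(f"~{hours:.0f} GPU-hours (upper bound; prefiltering will reduce this)")
```

Does this kind of estimate roughly match what you observe in practice, or does the prefilter change the picture entirely?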

Thank you very much! Having clarity on these points would be extremely helpful for accurate GPU resource planning.
