I am using MMseqs2-GPU for large-scale MSA generation (millions of queries) and would like to better understand GPU behavior so I can estimate runtime and GPU-hours accurately.
- CPU --threads vs GPU threads
- Can you confirm that --threads controls only CPU threads and does not affect GPU threads?
- GPU cores / SMs / thread-groups
- The GPU paper states that MMseqs2-GPU uses thread-groups of size 4, 8, 16, or 32, each processing one alignment/tile. How is the thread-group size chosen internally? Does it depend on query length, target length, tile size, GPU architecture, or a fixed heuristic? (A hypothetical sketch of what I mean follows this group of questions.)
- Does MMseqs2-GPU always attempt to fully occupy all SMs, or can some SMs remain idle depending on sequence lengths or batch size?
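To make the thread-group question concrete, here is a purely hypothetical sketch (not taken from the MMseqs2 source) of the kind of length-based heuristic I am imagining; the group sizes come from the paper, but the selection logic and thresholds are invented:

```python
# Purely hypothetical -- NOT the actual MMseqs2-GPU selection logic.
# The GPU paper mentions thread-group sizes of 4, 8, 16, or 32; this
# sketch only illustrates one plausible way to derive the size from
# query length, with a made-up columns-per-thread threshold.

def guess_thread_group_size(query_length: int) -> int:
    """Pick the smallest allowed group size so that each thread handles
    at most ~64 DP-matrix columns (the 64 is an invented threshold)."""
    for group_size in (4, 8, 16, 32):
        if query_length <= group_size * 64:
            return group_size
    return 32

print(guess_thread_group_size(150))   # short query -> 4
print(guess_thread_group_size(2000))  # long query  -> 32
```

Is the size chosen per alignment like this, per batch, or fixed per kernel launch?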
- Runtime / TCUPS estimation in practice
- The GPU paper reports TCUPS on synthetic databases where query and target lengths match. For real databases (e.g., UniRef30 2023, envDB 202108), is there a recommended way to estimate runtime from a TCUPS figure? (The naive calculation I use today is sketched below.)
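For reference, here is the back-of-the-envelope estimate I currently use (plain Python). It assumes the GPU aligns every query residue against every target residue at the reported TCUPS, i.e., it ignores the prefilter and any length-padding effects, which is presumably why it drifts on real databases:

```python
# Naive runtime estimate from a TCUPS figure (1 TCUPS = 1e12 cell
# updates per second). Assumes cells = total query residues x total
# target residues, i.e., full all-vs-all with no prefilter -- almost
# certainly an overestimate for a real mmseqs search pipeline.

def estimate_runtime_seconds(total_query_residues: int,
                             total_target_residues: int,
                             tcups: float) -> float:
    cells = total_query_residues * total_target_residues
    return cells / (tcups * 1e12)

# Example: 1M queries of average length 350 against a 7e10-residue
# database at an assumed 5 TCUPS (all numbers are illustrative only).
seconds = estimate_runtime_seconds(1_000_000 * 350, 70_000_000_000, 5.0)
print(f"~{seconds / 3600:.0f} GPU-hours (naive upper bound)")
```

Is there a recommended correction factor (or a better formula) for real, length-heterogeneous databases?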
- VRAM usage and host-memory streaming
- When MMseqs2-GPU requires more VRAM than is available, does it spill to host memory (i.e., stream database chunks from host RAM), or does the search fail? (My current manual workaround is sketched below.)
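For now I plan around this manually: I check free VRAM before launching and split the target database myself if it looks too big. A sketch of the check (using nvidia-smi, since I am not aware of an MMseqs2 option that reports this; the 10% headroom is my own guess):

```python
# Check free VRAM before a run to decide whether the padded target DB
# plausibly fits on the card. Uses nvidia-smi's CSV query interface.
import subprocess

def free_vram_mib(gpu_index: int = 0) -> int:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits", "-i", str(gpu_index)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

db_size_mib = 40_000  # size of the padded target DB (hypothetical value)
print("fits:", db_size_mib < 0.9 * free_vram_mib())  # 10% headroom, arbitrary
```

If MMseqs2-GPU already streams from host memory automatically, this manual splitting may be unnecessary, which is exactly what I would like to confirm.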
- Best practices for large-batch GPU searches
- Are there recommended batch sizes that maximize GPU utilization and minimize MSA computation time?
- Aside from --gpu and --threads, are there other user-specified, GPU-specific parameters worth tuning? (My current invocation is included below for context.)
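For context, my current invocation looks like the sketch below (wrapped in Python here; queryDB/targetDB_pad/resultDB are placeholders). I built the padded target database with makepaddedseqdb, which I understand GPU search requires; please correct me if that, or any flag, is wrong:

```python
# My current GPU search invocation, for context. Database names are
# placeholders; targetDB_pad was created beforehand with
# `mmseqs makepaddedseqdb targetDB targetDB_pad` (my understanding of
# the required setup -- please correct if wrong).
import subprocess

subprocess.run(
    ["mmseqs", "search",
     "queryDB", "targetDB_pad", "resultDB", "tmp",
     "--gpu", "1",        # enable GPU-accelerated search
     "--threads", "16"],  # CPU threads only, if my understanding above is right
    check=True,
)
```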
Thank you very much! Having clarity on these points would be extremely helpful for accurate GPU resource planning.