Hi VGLLM team, thanks a lot for the excellent work!
I’m currently experimenting with introducing VGGT into another base model. In my setup:
Base model: nvila-8b
Data: VLM3R’s vsi dataset only
Settings:
• On the source model, I enabled tuning_mm_mlp (i.e., MM projector in the code) and LoRA on the LMM backbone.
• On the VGGT-integrated model, I enabled tuning_mm_mlp, the fusion module, and LoRA on the backbone.
• I have tried multiple fusion strategies.
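For clarity, the two trainable-parameter setups above can be written as a name-based freezing rule. This is a minimal illustration, not the actual VG-LLM or NVILA code. The parameter names and the matching keys (`mm_projector.`, `fusion.`, `lora_`) are hypothetical placeholders, and the real module names in the codebase may differ.

```python
# Minimal sketch: which parameters are trainable in each setup.
# All parameter names and matching keys below are HYPOTHETICAL placeholders.

# Source model: MM projector (tuning_mm_mlp) + LoRA adapters on the backbone.
SOURCE_KEYS = ("mm_projector.", "lora_")
# VGGT-integrated model: additionally trains the fusion module.
VGGT_KEYS = ("mm_projector.", "fusion.", "lora_")

# Example parameter names (hypothetical); a real model would supply
# these via model.named_parameters().
EXAMPLE_PARAMS = [
    "vision_tower.blocks.0.attn.qkv.weight",
    "vggt.aggregator.blocks.0.attn.qkv.weight",
    "mm_projector.0.weight",
    "fusion.cross_attn.in_proj_weight",
    "llm.layers.0.self_attn.q_proj.weight",
    "llm.layers.0.self_attn.q_proj.lora_A.weight",
]


def is_trainable(name, keys):
    """A parameter is trainable if any key occurs in its dotted name."""
    return any(key in name for key in keys)


def split_params(names, keys):
    """Partition parameter names into (trainable, frozen) lists."""
    trainable = [n for n in names if is_trainable(n, keys)]
    frozen = [n for n in names if not is_trainable(n, keys)]
    return trainable, frozen


if __name__ == "__main__":
    for label, keys in (("source", SOURCE_KEYS), ("vggt", VGGT_KEYS)):
        trainable, _ = split_params(EXAMPLE_PARAMS, keys)
        print(label, "trainable:", trainable)
```

In PyTorch, the same rule could be applied with `for name, p in model.named_parameters(): p.requires_grad = is_trainable(name, keys)`, which makes it easy to check that the VGGT run differs from the source run only by the fusion parameters.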
Observation:
Under these settings, the model with VGGT consistently underperforms the source model by a clear margin (about 1 to 7 points).
Question:
Have you tried enabling tuning_mm_mlp in your experiments? Would this observation suggest that fine-tuning the vision encoder could yield better results than introducing VGGT?
Any insights into this phenomenon would be greatly appreciated.
Next steps:
I’m also running experiments in which the projector is frozen and the fusion module strictly follows your paper’s setting, and I will update this thread with the results once they are ready.
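The planned ablation should differ from the VGGT-integrated run only in whether the projector is trained. As a sketch (only `tuning_mm_mlp` comes from the code referenced in this post; `tune_fusion` and `lora_backbone` are hypothetical placeholder names), the runs can be written as config dicts and diffed to confirm that exactly one setting changes:

```python
# Only tuning_mm_mlp is a flag name taken from the post;
# tune_fusion and lora_backbone are HYPOTHETICAL placeholders.
VGGT_RUN = {"tuning_mm_mlp": True, "tune_fusion": True, "lora_backbone": True}
# Planned ablation: identical to VGGT_RUN except the MM projector is frozen.
FROZEN_PROJECTOR_RUN = {**VGGT_RUN, "tuning_mm_mlp": False}


def changed_flags(a, b):
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print(changed_flags(VGGT_RUN, FROZEN_PROJECTOR_RUN))
```

Diffing configs this way is a cheap guard against accidentally changing a second setting between runs, which would make the comparison harder to interpret.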
Thanks in advance for the community’s help!