
Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?

📄 Paper   |   🌐 Website  

This is the official repository of our paper “Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?” by Jiahe Jin, Yanheng He, and Mingyan Yang.

Abstract

In this work, we identify the "2D-Cheating" problem in 3D LLM evaluation: many benchmark tasks can be easily solved by VLMs given rendered images of point clouds, which exposes how current evaluations fail to test the unique 3D capabilities of 3D LLMs. We test VLM performance across multiple 3D LLM benchmarks and, using the results as a reference, propose principles for better assessing genuine 3D understanding. We also advocate explicitly separating 3D abilities from 1D or 2D aspects when evaluating 3D LLMs.

Usage

We conducted experiments on the following benchmarks. The commands for reproducing each experiment are listed below:

3D MM-Vet

Generating Results

```bash
python ./src/object/3dmmvet/inference.py
```

Evaluating Results

```bash
python ./src/object/3dmmvet/eval.py
```

ObjaverseXL-LVIS Caption

Generating Results

```bash
python ./src/object/objaverseXL-LVIS_caption/vlm3d.py
```

Evaluating Results

```bash
python ./src/object/objaverseXL-LVIS_caption/evaluate.py
```
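The exact metric used by the evaluation script is not shown here; as an illustrative stand-in, caption quality can be scored with a simple token-overlap F1 between the predicted and reference captions (`caption_f1` is a hypothetical helper, not part of this repo):

```python
from collections import Counter

def caption_f1(pred: str, ref: str) -> float:
    """Token-level F1 overlap between a predicted and a reference caption."""
    pred_tokens = pred.lower().split()
    ref_tokens = ref.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Count tokens that appear in both captions (with multiplicity).
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(caption_f1("a red wooden chair", "a red chair made of wood"))
```

Real captioning benchmarks typically use stronger metrics (BLEU, CIDEr, or LLM-based judging); this sketch only conveys the reference-comparison structure of the evaluation.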

Rendering Scene Point Cloud

Render BEV images:

```bash
python ./src/scene/render/parallel_render_bev.py
```

Render multi-view images:

```bash
python ./src/scene/render/parallel_render_multi.py
```
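The repo's renderers are not reproduced here, but the core idea of a BEV (bird's-eye-view) render is to project the scene point cloud orthographically onto the ground plane. A minimal toy sketch (`render_bev` is hypothetical, not the repo's implementation, which renders textured images rather than occupancy maps):

```python
import numpy as np

def render_bev(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Project an (N, 3) point cloud top-down onto an occupancy image.

    Drops the z (height) coordinate, rescales x/y into pixel coordinates,
    and marks each occupied cell.
    """
    xy = points[:, :2]
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    scale = (resolution - 1) / np.maximum(maxs - mins, 1e-8)
    pix = ((xy - mins) * scale).astype(int)
    image = np.zeros((resolution, resolution), dtype=np.uint8)
    image[pix[:, 1], pix[:, 0]] = 255  # mark occupied cells
    return image

cloud = np.random.rand(1000, 3)  # synthetic point cloud
bev = render_bev(cloud)
print(bev.shape)
```

Multi-view rendering follows the same pattern but applies a rotation to the cloud (or moves a virtual camera) before projecting, producing one image per viewpoint.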

ScanQA

Generating Results

Generate BEV results:

```bash
python ./src/scene/evaluation/scanqa/generate/vlm3d.py
```

Generate multi-view results:

```bash
bash ./src/scene/evaluation/scanqa/generate/generate.sh
```

Evaluating Results

Evaluate single-view results:

```bash
python ./src/scene/evaluation/scanqa/evaluation/test.py
```

Evaluate HIS results:

```bash
python ./src/scene/evaluation/scanqa/evaluation/test_HIS.py
```

Evaluate BoN results:

```bash
python ./src/scene/evaluation/scanqa/evaluation/test_BoN.py
```
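Reading "BoN" as best-of-N over per-view answers (an assumption about what `test_BoN.py` computes, not confirmed by this README), the scoring idea can be sketched as an oracle that counts a question as correct if any of the N candidate answers matches the ground truth:

```python
def exact_match(pred: str, gt: str) -> float:
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(pred.strip().lower() == gt.strip().lower())

def best_of_n(candidates: list[str], gt: str) -> float:
    """Oracle best-of-N: take the best score over all candidate answers."""
    return max(exact_match(c, gt) for c in candidates)

# One answer per rendered view of the same scene (illustrative data).
answers = ["a sofa", "A chair", "a table"]
print(best_of_n(answers, "a chair"))
```

Such an oracle score gives an upper bound on what the VLM could achieve with ideal view selection, which is useful as a reference when comparing against 3D LLMs that see the full point cloud.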

SQA3D

To test the VLM's performance on SQA3D, run the following command:

```bash
python ./src/scene/evaluation/sqa3d/test_sqa_vlm.py
```
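SQA3D answers are short free-form strings, so scoring usually relies on exact match after normalization. A common SQuAD-style normalization (an illustrative sketch, not necessarily what `test_sqa_vlm.py` does):

```python
import re
import string

def normalize(ans: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    ans = ans.lower()
    ans = "".join(ch for ch in ans if ch not in string.punctuation)
    ans = re.sub(r"\b(a|an|the)\b", " ", ans)
    return " ".join(ans.split())

def exact_match(pred: str, gt: str) -> bool:
    return normalize(pred) == normalize(gt)

print(exact_match("The sofa.", "sofa"))
```

Normalizing both sides keeps the metric from penalizing trivial phrasing differences ("The sofa." vs. "sofa") that say nothing about 3D understanding.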

Acknowledgement

We would like to express our sincere gratitude to Prof. Yonglu Li for his valuable guidance and support throughout this research, from topic selection to the final writing. His insightful discussions and feedback have been essential to the completion of this work. We would also like to thank Ye Wang for kindly sharing the viewpoint dataset in ScanNet.

Data Attribution

This project uses data from:

Citation

If you find this work useful, please cite our paper:

@article{revisit3dllmbenchmark,
  author       = {Jiahe Jin and Yanheng He and Mingyan Yang},
  title        = {Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?},
  year         = {2025},
  journal      = {arXiv preprint arXiv:2502.08503},
  url          = {https://arxiv.org/abs/2502.08503}
}

About

[ACL 2025] Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?
