Dyn-Bench
A benchmark for spatio-temporal dynamics reasoning in the physical 4D world.
Dyn-Bench studies how multimodal large language models (MLLMs) perceive, track, and reason about dynamic content in the physical 4D world. It evaluates spatio-temporal understanding through dynamic inter-object perception, object-scene tracking, and camera-object reasoning, with paired grounding tasks for dynamic objects.
The project provides a large-scale benchmark, a detailed evaluation pipeline, model comparisons, and a public dataset for analyzing dynamic visual reasoning in MLLMs.
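To make the evaluation setup concrete, here is a minimal sketch of how a multiple-choice benchmark like this is typically scored overall and per task. The item schema, task names, and the `score` function are assumptions for illustration, not the actual Dyn-Bench format or pipeline:

```python
# Hypothetical item schema: Dyn-Bench's real data format may differ.
# Each item pairs a question about a dynamic scene with a ground-truth answer.
ITEMS = [
    {"id": "dyn_0001", "task": "object_scene_tracking",
     "question": "Which object moves left across the scene?",
     "choices": ["A", "B", "C", "D"], "answer": "B"},
    {"id": "dyn_0002", "task": "camera_object_reasoning",
     "question": "Is the camera panning right relative to the object?",
     "choices": ["yes", "no"], "answer": "yes"},
]

def score(items, predict):
    """Return overall accuracy and per-task accuracy for a prediction function.

    `predict` maps one item to a chosen answer string (e.g. an MLLM's output
    mapped to one of the item's choices).
    """
    per_task = {}
    for item in items:
        correct = predict(item) == item["answer"]
        hits, total = per_task.get(item["task"], (0, 0))
        per_task[item["task"]] = (hits + int(correct), total + 1)
    overall = sum(h for h, _ in per_task.values()) / len(items)
    return overall, {t: h / n for t, (h, n) in per_task.items()}

if __name__ == "__main__":
    # Trivial baseline: always pick the first choice.
    overall, by_task = score(ITEMS, lambda item: item["choices"][0])
    print(overall, by_task)
```

Per-task accuracy matters here because aggregate accuracy can hide large gaps between, say, tracking and camera-motion reasoning.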
Project page / Paper / Code / Dataset