Dyn-Bench

A benchmark for spatio-temporal dynamics reasoning in the physical 4D world.

Dyn-Bench studies how multimodal large language models perceive, track, and reason about dynamic content in the physical 4D world. It evaluates spatio-temporal understanding through three task families: dynamic inter-object perception, object-scene tracking, and camera-object reasoning, each paired with grounding tasks for dynamic objects.

The project provides a large-scale benchmark, a detailed evaluation pipeline, comparisons across models, and a public dataset for analyzing dynamic visual reasoning in MLLMs.
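To make the evaluation setup concrete, here is a minimal sketch of how accuracy might be scored on benchmark-style multiple-choice items. The `Item` fields, task names, and scoring function are illustrative assumptions, not the actual Dyn-Bench schema or pipeline:

```python
# Hypothetical sketch: exact-match accuracy over multiple-choice items.
# The schema below is assumed for illustration, not the real Dyn-Bench format.
from dataclasses import dataclass

@dataclass
class Item:
    task: str            # e.g. "object-scene tracking" (task family)
    question: str        # question posed to the MLLM about the clip
    choices: list[str]   # answer options shown to the model
    answer: str          # ground-truth choice label, e.g. "A"

def accuracy(items: list[Item], predictions: list[str]) -> float:
    """Fraction of model predictions matching the ground-truth label."""
    if not items:
        return 0.0
    correct = sum(p == it.answer for p, it in zip(predictions, items))
    return correct / len(items)

items = [
    Item("dynamic inter-object perception", "Which object starts moving first?",
         ["A. the car", "B. the pedestrian"], "A"),
    Item("camera-object reasoning", "Is the camera moving toward the car?",
         ["A. yes", "B. no"], "B"),
]
print(accuracy(items, ["A", "A"]))  # one of two correct -> 0.5
```

A real pipeline would additionally parse free-form model outputs into choice labels and report per-task-family scores; this sketch only shows the final scoring step.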

Project page / Paper / Code / Dataset