publications

2026

  1. SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
    Jian Zhang*, Shijie Zhou*, Bangya Liu*, Achuta Kadambi, and Zhiwen Fan
    In CVPR, 2026
  2. VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
    Zhiwen Fan*, Jian Zhang*, Renjie Li, Junge Zhang, Runjin Chen, Hezhen Hu, Kevin Wang, Huaizhi Qu, Shijie Zhou, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Tianlong Chen, Jiachen Li, Zhengzhong Tu, Zhangyang Wang, and Rakesh Ranjan
    In CVPR, 2026
  3. Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
    Yuzhi Huang*, Kairun Wen*, Rongxin Gao*, Dongxuan Liu, Yibin Lou, Jie Wu, Jing Xu, Jian Zhang, Zheng Yang, Yunlong Lin, Chenxin Li, Panwang Pan, Junbin Lu, Jingyan Jiang, Xinghao Ding, Yue Huang, and Zhi Wang
    In CVPR, 2026

2025

  1. DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
    Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, and Zhiwen Fan
    In NeurIPS, 2025

2024

  1. Large Spatial Model: End-to-end Unposed Images to Semantic 3D
    Zhiwen Fan*, Jian Zhang*, Wenyan Cong, Peihao Wang, Renjie Li, Kairun Wen, Shijie Zhou, Achuta Kadambi, Zhangyang Wang, Danfei Xu, Boris Ivanovic, Marco Pavone, and Yue Wang
    In NeurIPS, 2024
  2. InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
    Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, and Yue Wang
    arXiv preprint, 2024