publications

2026

  1. SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
    Jian Zhang*, Shijie Zhou*, Bangya Liu*, and 2 more authors
    In CVPR, 2026
  2. VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
    Zhiwen Fan*, Jian Zhang*, Renjie Li, and 8 more authors
    In CVPR, 2026
  3. Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
    Yuzhi Huang*, Kairun Wen*, Rongxin Gao*, and 14 more authors
    In CVPR, 2026

2025

  1. DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
    Kairun Wen, Runyu Chen, Hui Zheng, and 8 more authors
    In NeurIPS, 2025

2024

  1. Large Spatial Model: End-to-End Unposed Images to Semantic 3D
    Zhiwen Fan*, Jian Zhang*, Wenyan Cong, and 8 more authors
    In NeurIPS, 2024
  2. InstantSplat: Unbounded Sparse-View Pose-Free Gaussian Splatting in 40 Seconds
    Zhiwen Fan, Wenyan Cong, Kairun Wen, and 8 more authors
    arXiv preprint, 2024