Jian Zhang

Master's student at Xiamen University

I am a master’s student at Xiamen University working on geometric reasoning for 3D vision, embodied intelligence, and AI-native content creation. My research goal is to build systems that understand the dynamic world from sparse observations and can assist humans with reliable spatial perception.

I recently co-developed projects such as VLM-3R, Large Spatial Model (NeurIPS 2024), DynamicVerse (NeurIPS 2025), and InstantSplat, where we combine large multimodal models with geometric priors to enable end-to-end semantic reconstruction, world modeling, and fast Gaussian splatting. These collaborations often involve partners from both academia and industry, and I actively maintain the associated open-source artifacts to keep our results reproducible.

selected publications

  1. VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
    Zhiwen Fan*, Jian Zhang*, Renjie Li, and 8 more authors
    arXiv preprint, 2025
  2. Large Spatial Model: End-to-end Unposed Images to Semantic 3D
    Zhiwen Fan*, Jian Zhang*, Wenyan Cong, and 8 more authors
    In NeurIPS, 2024