Hi there!

I am currently a final-year Ph.D. candidate at The Hong Kong Polytechnic University (PolyU), supervised by Prof. Wangmeng Zuo and Prof. Lei Zhang (IEEE Fellow). I have served as one of the seven AAAI Student Committee Members since 2024. I was awarded the China Association for Science and Technology (CAST) Youth Talent Support Project (中国科协青年人才托举工程博士专项).

I have been a research intern at Microsoft since July 2020, working closely with Dr. Lijuan Wang, Dr. Zhengyuan Yang, and Dr. Zicheng Liu (IEEE Fellow) at Microsoft CoreAI since May 2024, and with Dr. Nan Duan at Microsoft Research Asia (MSRA) from July 2020 to April 2024. I received my B.E. degree in Computer Science and Technology from Harbin Institute of Technology (HIT) in June 2022. I was exceptionally admitted to HIT in 2017, exempt from the entrance examination due to my outstanding performance in the National Olympiad in Informatics (NOI).

I have published over 20 papers in top-tier conferences and journals. My research focuses on reasoning and cognition in large multimodal models.

I expect to graduate in mid-2026 and am actively seeking full-time research opportunities in AI, in either academia or industry. Please contact me if my background aligns with your interests. My CV is available upon request.

Selected Publications

Published

  1. M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
    Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Dongdong Zhang, Nan Duan
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
  2. ImaginaryNet: Learning Object Detectors without Real Images and Annotations
    Minheng Ni, Zitong Huang, Kailai Feng, Wangmeng Zuo
    Proceedings of the International Conference on Learning Representations (ICLR), 2023.
  3. NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN
    Minheng Ni, Xiaoming Li, Wangmeng Zuo
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  4. ORES: Open-vocabulary Responsible Visual Synthesis
    Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan
    Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
  5. Responsible Visual Editing
    Minheng Ni, Yeli Shen, Lei Zhang, Wangmeng Zuo
    Proceedings of the European Conference on Computer Vision (ECCV), 2024.
  6. Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
    Minheng Ni, Yutao Fan, Lei Zhang, Wangmeng Zuo
    Proceedings of the International Conference on Learning Representations (ICLR), 2025.
  7. Ref-diff: Zero-shot Referring Image Segmentation with Generative Models
    Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, Wangmeng Zuo
    Science China Information Sciences (SCIS), 2025.
  8. Robot be Harmful: Responsible Robotic Manipulation via Safety-as-Policy
    Minheng Ni, Lei Zhang, Zihan Chen, Kaixin Bai, Zhaopeng Chen, Jianwei Zhang, Lei Zhang, Wangmeng Zuo
    IEEE Robotics and Automation Letters (RA-L), 2025.
  9. Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
    Minheng Ni, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Wangmeng Zuo, Lijuan Wang
    Advances in Neural Information Processing Systems (NeurIPS), 2025.

Preprints

  1. AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
    Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan
    arXiv Preprint, 2024.
  2. Measurement of LLM’s Philosophies of Human Nature
    Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Wangmeng Zuo
    arXiv Preprint, 2025.