Hi there!

I am currently a 3rd-year Ph.D. candidate at the Hong Kong Polytechnic University (PolyU), supervised by Prof. Wangmeng Zuo and Prof. Lei Zhang. Before that, I received my B.E. degree in Computer Science and Technology from Harbin Institute of Technology (HIT) in 2022. I was admitted to HIT in 2017 with an exemption from the national college entrance examination, owing to my outstanding performance in the National Olympiad in Informatics (NOI).

Currently, I am a research intern at Microsoft Generative AI (MS GenAI). I also worked as a research intern at Microsoft Research Asia (MSRA) from 2020 to 2024.

I have published 14 papers in top-tier conferences and journals. My research focuses on the interplay between multimodal generation and understanding, using models such as LLMs, VLMs, and diffusion models. Recently, I have been exploring AI ethics and cognition.

Selected Publications

  1. M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
    Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Dongdong Zhang, Nan Duan
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
  2. ImaginaryNet: Learning Object Detectors without Real Images and Annotations
    Minheng Ni, Zitong Huang, Kailai Feng, Wangmeng Zuo
    Proceedings of the International Conference on Learning Representations (ICLR), 2023.
  3. NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN
    Minheng Ni, Xiaoming Li, Wangmeng Zuo
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  4. ORES: Open-vocabulary Responsible Visual Synthesis
    Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan
    Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
  5. Responsible Visual Editing
    Minheng Ni, Yeli Shen, Lei Zhang, Wangmeng Zuo
    Proceedings of the European Conference on Computer Vision (ECCV), 2024.
  6. Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
    Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, Wangmeng Zuo
    arXiv preprint, 2023.
  7. AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
    Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan
    arXiv preprint, 2024.
  8. Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
    Minheng Ni, Yutao Fan, Lei Zhang, Wangmeng Zuo
    arXiv preprint, 2024.