Hi There!
I am currently a 3rd-year Ph.D. candidate at the Hong Kong Polytechnic University (PolyU) supervised by Prof. Wangmeng Zuo and Prof. Lei Zhang. Before that, I received my B.E. degree in Computer Science and Technology from Harbin Institute of Technology (HIT) in 2022. I was exceptionally admitted to HIT in 2017, exempt from the entrance examination due to my outstanding performance in the National Olympiad in Informatics (NOI).
Currently, I am a research intern at Microsoft Generative AI (MS GenAI). I also worked as a research intern at Microsoft Research Asia (MSRA) from 2020 to 2024.
I have published 14 top-tier conference/journal papers. My research interests center on the collaboration of multimodal generation and understanding, with models such as LLMs, VLMs, and diffusion models. Recently, I have been exploring the fields of AI ethics and cognition.
Selected Publications
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Dongdong Zhang, Nan Duan
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations
Minheng Ni, Zitong Huang, Kailai Feng, Wangmeng Zuo
Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN
Minheng Ni, Xiaoming Li, Wangmeng Zuo
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- ORES: Open-vocabulary Responsible Visual Synthesis
Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
- Responsible Visual Editing
Minheng Ni, Yeli Shen, Lei Zhang, Wangmeng Zuo
Proceedings of the European Conference on Computer Vision (ECCV), 2024.
- Ref-diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, Wangmeng Zuo
ArXiv Preprint, 2023.
- AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan
ArXiv Preprint, 2024.
- Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, Yutao Fan, Lei Zhang, Wangmeng Zuo
ArXiv Preprint, 2024.