


Haotian Zhang 0005
Person information
- affiliation: Apple AI/ML, Cupertino, CA, USA
- affiliation: University of Washington, Department of Electrical and Computer Engineering, Seattle, WA, USA
Other persons with the same name
- Haotian Zhang: disambiguation page
- Haotian Zhang 0001: University of Waterloo, Ontario, Canada
- Haotian Zhang 0002: Carnegie Mellon University, Pittsburgh, PA, USA
- Haotian Zhang 0003: Shandong University of Technology, Zibo, China
- Haotian Zhang 0004: NVIDIA, Canada (and 1 more)
2020 – today
- 2025
- [c19]Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang:
Improve Vision Language Model Chain-of-thought Reasoning. ACL (1) 2025: 1631-1662
- [c18]Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Wenze Hu, Juan Lao Tebar, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang:
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. ICLR 2025
- [c17]Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeffrey Nichols, Yinfei Yang, Zhe Gan:
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms. ICLR 2025
- [c16]Hanrong Ye, Haotian Zhang, Erik A. Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang:
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA. ICLR 2025
- [c15]Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, et al.:
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. ICLR 2025
- 2024
- [c14]Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao:
VeCLIP: Improving CLIP Training via Visual-Enriched Captions. ECCV (42) 2024: 111-127
- [c13]Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan:
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. ECCV (64) 2024: 240-255
- [c12]Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang:
MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training. ECCV (29) 2024: 304-323
- [c11]Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang:
Ferret: Refer and Ground Anything Anywhere at Any Granularity. ICLR 2024
- [c10]Zhengfeng Lai, Haoping Bai, Haotian Zhang, Xianzhi Du, Jiulong Shan, Yinfei Yang, Chen-Nee Chuah, Meng Cao:
Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models. WACV 2024: 2679-2689
- [i17]Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan:
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts. CoRR abs/2402.13220 (2024)
- [i16]Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang:
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training. CoRR abs/2403.09611 (2024)
- [i15]Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan:
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. CoRR abs/2404.05719 (2024)
- [i14]Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang:
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models. CoRR abs/2404.07973 (2024)
- [i13]Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang:
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. CoRR abs/2409.20566 (2024)
- [i12]Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang:
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. CoRR abs/2410.02740 (2024)
- [i11]Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan:
Contrastive Localized Language-Image Pre-Training. CoRR abs/2410.02746 (2024)
- [i10]Hanrong Ye, Haotian Zhang, Erik A. Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang:
MM-Ego: Towards Building Egocentric Multimodal LLMs. CoRR abs/2410.07177 (2024)
- [i9]Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang:
Improve Vision Language Model Chain-of-thought Reasoning. CoRR abs/2410.16198 (2024)
- [i8]Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang, Zhe Gan:
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms. CoRR abs/2410.18967 (2024)
- 2023
- [i7]Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao:
From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions. CoRR abs/2310.07699 (2023)
- [i6]Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang:
Ferret: Refer and Ground Anything Anywhere at Any Granularity. CoRR abs/2310.07704 (2023)
- 2022
- [c9]Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao:
Grounded Language-Image Pre-training. CVPR 2022: 10955-10965
- [c8]Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao:
GLIPv2: Unifying Localization and Vision-Language Understanding. NeurIPS 2022
- [c7]Jiarui Cai, Yizhou Wang, Hung-Min Hsu, Haotian Zhang, Jenq-Neng Hwang:
DIOR: DIstill Observations to Representations for Multi-Object Tracking and Segmentation. WACV (Workshops) 2022: 520-529
- [i5]Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao:
GLIPv2: Unifying Localization and Vision-Language Understanding. CoRR abs/2206.05836 (2022)
- 2021
- [c6]Haotian Zhang, Haorui Ji, Aotian Zheng, Jenq-Neng Hwang, Ren-Hung Hwang:
Monocular 3D Localization of Vehicles in Road Scenes. ICCVW 2021: 2855-2864
- [c5]Yizhou Wang, Jenq-Neng Hwang, Gaoang Wang, Hui Liu, Kwang-Ju Kim, Hung-Min Hsu, Jiarui Cai, Haotian Zhang, Zhongyu Jiang, Renshu Gu:
ROD2021 Challenge: A Summary for Radar Object Detection Challenge for Autonomous Driving Applications. ICMR 2021: 553-559
- [i4]Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao:
Grounded Language-Image Pre-training. CoRR abs/2112.03857 (2021)
- 2020
- [j1]Yanting Zhang, Haotian Zhang, Gaoang Wang, Jie Yang, Jenq-Neng Hwang:
Bundle Adjustment for Monocular Visual Odometry Based on Detections of Traffic Signs. IEEE Trans. Veh. Technol. 69(1): 151-162 (2020)
- [i3]Jiarui Cai, Yizhou Wang, Haotian Zhang, Hung-Min Hsu, Chengqian Ma, Jenq-Neng Hwang:
IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency. CoRR abs/2006.13458 (2020)
2010 – 2019
- 2019
- [c4]Longyin Wen, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Ajit Jadhav, Bing Dong, Brejesh Lall, Chang Liu, Chunhui Zhang, Dong Wang, Pengfei Zhu, Feng Ni, Filiz Bunyak, Gaoang Wang, Guizhong Liu, Guna Seetharaman, Guorong Li, Håkan Ardö, Haotian Zhang, Hongyang Yu, Huchuan Lu, Dawei Du, Jenq-Neng Hwang, Jiatong Mu, Jinrong Hu, Kannappan Palaniappan, Long Chen, Lu Ding, Martin Lauer, Mikael G. Nilsson, Noor M. Al-Shakarji, Prerana Mukherjee, Xiao Bian, Qingming Huang, Robert Laganière, Shuhao Chen, Siyang Pan, Vinay Kaushik, Wei Shi, Wei Tian, Weiqiang Li, Xin Chen, Xinyu Zhang, Haibin Ling, Yanting Zhang, Yanyun Zhao, Yong Wang, Yuduo Song, Yuehan Yao, Zhaotang Chen, Zhenyu Xu, Zhibin Xiao, Zhihang Tong, Zhipeng Luo, Qinghua Hu, Zhuojin Sun, Jiayu Zheng, Tao Peng, Xinyao Wang:
VisDrone-MOT2019: The Vision Meets Drone Multiple Object Tracking Challenge Results. ICCV Workshops 2019: 189-198
- [c3]Yanting Zhang, Jie Yang, Haotian Zhang, Jenq-Neng Hwang:
Bundle Adjustment for Monocular Visual Odometry Based on Detected Traffic Sign Features. ICIP 2019: 4350-4354
- [c2]Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang:
Exploit the Connectivity: Multi-Object Tracking with TrackletNet. ACM Multimedia 2019: 482-490
- [c1]Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang:
Eye in the Sky: Drone-Based Object Tracking and 3D Localization. ACM Multimedia 2019: 899-907
- [i2]Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang:
Eye in the Sky: Drone-Based Object Tracking and 3D Localization. CoRR abs/1910.08259 (2019)
- 2018
- [i1]Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang:
Exploit the Connectivity: Multi-Object Tracking with TrackletNet. CoRR abs/1811.07258 (2018)
last updated on 2025-10-08 23:32 CEST by the dblp team
all metadata released as open data under CC0 1.0 license