Biography: Hongliang Li received his Ph.D. degree in Electronics and Information Engineering from Xi'an Jiaotong University, China, in 2005. From 2005 to 2006, he was a Research Associate in the Visual Signal Processing and Communication Laboratory (VSPC) of the Chinese University of Hong Kong (CUHK). From 2006 to 2008, he was a Postdoctoral Fellow in the same laboratory at CUHK. He is currently a Professor in the School of Electronic Engineering, University of Electronic Science and Technology of China. His research interests include image segmentation, object detection, image and video coding, visual attention, and multimedia communication systems.
Dr. Li has authored or co-authored numerous technical articles in well-known international journals and conferences. He is a co-editor of the Springer book "Video Segmentation and Its Applications". Dr. Li is involved in many professional activities. He is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology and of the Journal of Visual Communication and Image Representation, and an Area Editor of Signal Processing: Image Communication (Elsevier). He served as a Technical Program Chair for VCIP 2016 and PCM 2017, General Chair of ISPACS 2010, Publicity Chair of IEEE VCIP 2013, Local Chair of IEEE ICME 2014, and a TPC member for a number of international conferences, e.g., ICME 2012, ICME 2013, ISCAS 2013, PCM 2007, PCM 2009, and VCIP 2010. He is a Senior Member of the IEEE.
Title: Vision-Language Mapping for Referring Image Segmentation
Abstract: Image segmentation is a classic topic in computer vision. In recent years, referring image segmentation has attracted increasing research interest. Beyond traditional image segmentation, referring image segmentation aims to segment out the object described by a language query from an image. Because it must correlate vision and language, referring image segmentation is a more challenging task. This talk focuses on building more accurate vision-language mappings for the referring image segmentation task. I will first present a key word extraction model, which extracts key words from the language query to suppress noise in the query and to highlight the desired object. Next, I will present a key-word-aware visual context model. This model is designed to learn the relationships among multiple visual objects based on the language query, which is crucial for localizing and segmenting objects in accordance with the query. Finally, I will present a query reconstruction network that builds a bidirectional vision-language mapping to verify vision-language consistency. Furthermore, an iterative segmentation correction method is proposed to correct segmentations that are inconsistent with their queries.