Biography: Yuxin Peng is a Professor at the Wangxuan Institute of Computer Technology, Peking University, a Chief Scientist of the 863 Program, the chairman of the Expert Committee of the China Artificial Intelligence Industry Innovation Alliance, and a member of the "Artificial Intelligence 2.0" Planning Expert Committee of the Chinese Academy of Engineering, among other roles. He has authored more than 150 papers in refereed international journals and conference proceedings, including 70 papers in IJCV, TIP, TCSVT, TMM, TCYB, TOMM, ACM MM, ICCV, CVPR, IJCAI, and AAAI. He has filed 41 patent applications, 24 of which have been granted. His current research interests include cross-media intelligence, image and video analysis and retrieval, and computer vision. He serves on the editorial boards of journals including IEEE TCSVT. He led his team to win first place in the TRECVID video instance search evaluation in recent years. He presided over the development of an Internet multi-modal content analysis and recognition system, and received the First Prize of the Beijing Science and Technology Award in 2016 (ranked first).
Title: Cross-media Intelligence: Representation, Analysis, and Application
Abstract: With the rapid development of multimedia and Internet technologies, we have witnessed a dramatic growth of multimedia big data, such as images, video, text, and audio. These data are multi-source, heterogeneous, and correlated, which brings cross-media and cross-source challenges for data representation, information retrieval, knowledge discovery, reasoning, and decision-making. Therefore, how to simulate the human brain, which perceives the outside world through sensory organs such as vision and hearing, has become important for improving the intelligence level of computers. This talk first introduces the tasks and goals of cross-media intelligence in the "AI 2.0" initiative of the Chinese Academy of Engineering. Then we introduce our relevant research progress, including fine-grained image classification, cross-media retrieval, and cross-media generation (text-to-image generation and video captioning), and finally present a system demonstration.