Description
Automatic character identification in multimedia videos is a broad and challenging problem. Person identities can serve as a foundation and building block for many higher-level video analysis tasks, such as semantic indexing, search and retrieval, interaction analysis and video summarization. The goal of this project is to exploit textual, audio and video information to automatically identify characters in TV series and movies without requiring any manual annotation for training character models. A fully automatic and unsupervised approach is especially appealing given the huge amount of available multimedia data (and its growth rate). Text, audio and video provide complementary cues to the identity of a person, and combining them allows a person to be identified more reliably than from any single modality alone.
In this context, LIMSI (www.limsi.fr) proposes two projects, each focusing on a different aspect of this multimodal problem. Depending on the outcome of the internship, both projects may lead to a PhD scholarship (funding for one is already secured).