Laboratory for Multimodal Processing

LAMP focuses on multimodal machine learning at the intersection of natural language processing and computer vision. Our recent work includes image captioning, multimodal translation, learning from videos, and multilingual multimodal representation learning. We collaborate closely with members of CoAStaL.


M. Nikolaus, M. Abdou, M. Lamm, R. Aralikatte, and D. Elliott. Compositional Generalization in Image Captioning. To appear in Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL '19), Hong Kong, China.

S. Gella, D. Elliott, and F. Keller. Cross-lingual Visual Verb Sense Disambiguation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '19), Minneapolis, Minnesota, USA. [data]

R. Sanabria, O. Caglayan, S. Palaskar, D. Elliott, L. Barrault, L. Specia, and F. Metze. How2: A Large-scale Dataset for Multimodal Language Understanding. In Proceedings of the Visually Grounded Interaction and Language Workshop (ViGIL '18), Montreal, Canada.

D. Elliott. Adversarial Evaluation of Multimodal Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP '18), Brussels, Belgium.

Á. Kádár, D. Elliott, M-A. Côté, G. Chrupała, and A. Alishahi. Lessons Learned in Multilingual Grounded Language Learning. In Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL '18), Brussels, Belgium.