F. Liu, E. Bugliarello, E. M. Ponti, S. Reddy, N. Collier, and D. Elliott. Visually Grounded Reasoning across Languages and Cultures. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP '21). Best Long Paper Award.
S. Frank, E. Bugliarello, and D. Elliott. Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP '21).
I. Parfenova, D. Elliott, R. Fernández, and S. E. Pezzelle. Probing Cross-Modal Representations in Multi-Step Relational Reasoning. In Proceedings of the Sixth Workshop on Representation Learning for NLP.
E. Bugliarello, R. Cotterell, N. Okazaki, and D. Elliott. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs. Accepted in Transactions of the Association for Computational Linguistics (TACL).
E. Bugliarello and D. Elliott. The Role of Syntactic Planning in Compositional Image Captioning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL '21).
T. Srinivasan, R. Sanabria, F. Metze and D. Elliott. Fine-Grained Grounding for Multimodal Speech Recognition. In Findings of Empirical Methods in Natural Language Processing (EMNLP '20).
B. Higy, D. Elliott and G. Chrupała. Textual supervision for visually grounded spoken language understanding. In Findings of Empirical Methods in Natural Language Processing (EMNLP '20).
A. Suglia, I. Konstas, A. Vanzo, E. Bastianelli, D. Elliott, S. Frank and O. Lemon. CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL '20).
L. Specia, R. Arora, L. Barrault, O. Caglayan, A. Duarte, D. Elliott, S. Gella, N. Holzenberger, C. Lala, S. J. Lee, J. Libovický, P. Madhyastha, F. Metze, K. Mulligan, A. Ostapenko, S. Palaskar, R. Sanabria, and J. Wang. Grounded Sequence to Sequence Transduction. IEEE Journal of Selected Topics in Signal Processing. (Email me if you have difficulty accessing this paper.)
U. Sulubacak, O. Caglayan, S. Grönroos, A. Rouhe, D. Elliott, L. Specia, and J. Tiedemann. Multimodal Machine Translation through Visuals and Speech. Machine Translation.
M. Nikolaus, M. Abdou, M. Lamm, R. Aralikatte, and D. Elliott. Compositional Generalization in Image Captioning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL '19), Hong Kong, China.
S. Gella, D. Elliott, and F. Keller. Cross-lingual Visual Verb Sense Disambiguation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '19), Minneapolis, Minnesota, USA.
R. Sanabria, O. Caglayan, S. Palaskar, D. Elliott, L. Barrault, L. Specia, F. Metze. How2: A Large-scale Dataset for Multimodal Language Understanding. In Proceedings of the Visually Grounded Interaction and Language Workshop (ViGiL '18), Montreal, Canada.
D. Elliott. Adversarial Evaluation of Multimodal Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP '18), Brussels, Belgium.
Á. Kádár, D. Elliott, M-A. Côté, G. Chrupała, A. Alishahi. Lessons learned in multilingual grounded language learning. In Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL '18), Brussels, Belgium.