Desmond Elliott at the University of Copenhagen

Preprints

E. Zaranis, A. Farinhas, S. Santos, B. Canaverde, M. M. Ramos, A. K. Surikuchi, A. Viveiros, B. Liao, E. Bueno-Benito, N. Sivakumaran, P. Vasylenko, S. Yu, S. Sannigrahi, W. Mohammed, B. Peters, D. S. Villegas, E. Stengel-Eskin, G. Attanasio, J. Yoon, S. Frank, A. Suglia, C. Zerva, D. Elliott, M. Dimiccoli, M. Bansal, O. Lanz, R. Bernardi, R. Fernández, S. Pezzelle, V. Niculae, and A. F. T. Martins. Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding.

K. Dobler, D. Elliott, and G. de Melo. AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings.

I. Salazar, M. Fernández Burda, S. Bin Islam, A. Soltani Moakhar, S. Singh, F. Farestam, A. Romanou, D. Boiko, D. Khullar, M. Zhang, D. Krzemiński, J. Novikova, L. Shimabucoro, J. Marvin Imperial, R. Maheshwary, S. Duwal, A. Amayuelas, S. Rajwal, J. Purbey, A. Ruby, N. Popovič, M. Suppa, A. Toushik Wasi, R. Mohan Rao Kadiyala, O. Tsymboi, M. Kostritsya, B. Soltani Moakhar, G. da Costa Merlin, O. Ferracioli Coletti, M. Jabbari Shiviari, M. farahani fard, S. Fernandez, M. Grandury, D. Abulkhanov, D. Sharma, A. Guarnier De Mitri, L. Bossatto Marchezi, S. Heydari, J. Obando-Ceron, N. Kohut, B. Ermis, D. Elliott, E. Ferrante, S. Hooker, and M. Fadaee. Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation.

D. S. Villegas, I. Ziegler, and D. Elliott. ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models.

V. Beliveau, H. Kaas, M. Prener, C. Ladefoged, D. Elliott, G. M. Knudsen, L. H. Pinborg, and M. Ganz. Classification of Radiological Text in Small and Imbalanced Datasets in a Non-English Language.

2025

I. Kesen, J. F. Lotz, I. Ziegler, P. Rust, and D. Elliott. Multilingual Pretraining for Pixel Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP '25).

I. Ziegler, A. Köksal, D. Elliott, and H. Schuetze. CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation. Transactions of the Association for Computational Linguistics (TACL).

L. de Grazia, P. Pastells, M. V. Chas, D. Elliott, D. S. Villegas, M. Farrús, and M. Taulé. MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos. In Proceedings of the Second Conference on Language Modeling (COLM).

A. Bavaresco, R. Bernardi, L. Bertolazzi, D. Elliott, R. Fernández, A. Gatt, E. Ghaleb, M. Giulianelli, M. Hanna, A. Koller, and A. F. Martins. LLMs instead of Human Judges? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL '25).

N. Borenstein, G. Warren, D. Elliott, and I. Augenstein. Can Community Notes Replace Professional Fact-Checkers? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL '25).

D. Oneata, D. Elliott, and S. Frank. Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era. In Findings of the Association for Computational Linguistics: ACL 2025. Best Paper Honorable Mention at CVPR Visual Concepts Workshop.

C. Fierro, N. Foroutan, D. Elliott, and A. Søgaard. How Do Multilingual Language Models Remember Facts? In Findings of the Association for Computational Linguistics: ACL 2025.

N. Horn and D. Elliott. Tracking Universal Features Through Fine-Tuning and Model Merging. Proceedings of the 10th Workshop on Representation Learning for NLP.

A. Schiavone, L. M. Pehrson, S. Ingala, R. Bonnevie, M. Fraccaro, D. Li, M. B. Nielsen, and D. Elliott. Effective Machine Learning Techniques for Non-English Radiology Report Classification: A Danish Case Study. AI.

2024

I. Ziegler, A. Köksal, D. Elliott, and H. Schuetze. CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation. NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability.

W. Li, C. Zhang, J. Li, Q. Peng, R. Tang, L. Zhou, W. Zhang, G. Hu, Y. Yuan, A. Søgaard, D. Hershcovich, and D. Elliott. FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP '24).

W. Li, J. Li, R. Ramos, R. Tang, and D. Elliott. Understanding Retrieval Robustness for Retrieval-augmented Image Captioning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL '24).

V. Beliveau, H. Kaas, M. Prener, C. Ladefoged, D. Elliott, G. M. Knudsen, L. H. Pinborg, and M. Ganz. Classification of Medical Text in Small and Imbalanced Datasets in a Non-English Language. Medical Imaging with Deep Learning 2024.

S. Yagcioglu, O. B. İnce, A. Erdem, E. Erdem, D. Elliott, and D. Yuret. Sequential Compositional Generalization in Multimodal Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '24).

R. Ramos, E. Bugliarello, B. Martins, and D. Elliott. PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model. In Findings of the Association for Computational Linguistics: NAACL 2024.

W. Li, J. F. Lotz, C. Qiu, and D. Elliott. The Role of Data Augmentation in Image Captioning. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL '24).

2023

J. F. Lotz, E. Salesky, P. Rust, and D. Elliott. Text Rendering Strategies for Pixel Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP '23).

L. Cabello, E. Bugliarello, S. Brandl, and D. Elliott. Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP '23).

N. Borenstein, P. Rust, D. Elliott, and I. Augenstein. PHD: Pixel-Based Language Modeling of Historical Documents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP '23).

T. Hirasawa, E. Bugliarello, D. Elliott and M. Komachi. Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation. In Proceedings of the Eighth Conference on Machine Translation (WMT).

R. Ramos, B. Martins, and D. Elliott. Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting. In Findings of the Association for Computational Linguistics: ACL 2023.

R. Ramos, B. Martins, D. Elliott, and Y. Kementchedjhieva. SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR '23).

P. Rust, J. F. Lotz, E. Bugliarello, E. Salesky, M. de Lhoneux, and D. Elliott. Language Modelling with Pixels. Proceedings of the International Conference on Learning Representations (ICLR '23).

D. Rigoni, D. Elliott and S. Frank. Cleaner categories improve object detection and visual-textual grounding. Scandinavian Conference on Image Analysis (SCIA '23).

R. Ramos, D. Elliott, and B. Martins. Retrieval-augmented Image Captioning. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL '23).

R. K. Jørgensen, O. Brandt, M. Hartmann, X. Dai, C. Igel, and D. Elliott. MultiFin: A Dataset for Multilingual Financial NLP. In Findings of the Association for Computational Linguistics: EACL 2023.

2022

C. Qiu, D. Oneață, E. Bugliarello, S. Frank, and D. Elliott. Multilingual Multimodal Learning with Machine Translated Text. In Findings of the Association for Computational Linguistics: EMNLP 2022.

X. Dai, I. Chalkidis, S. Darkner, and D. Elliott. Revisiting Transformer-based Models for Long Document Classification. In Findings of the Association for Computational Linguistics: EMNLP 2022.

C. Fierro, L. C. Piqueras, J. F. Lotz, P. Rust, J. Rommedahl, J. K. Due, C. Igel, D. Elliott, C. B. Pedersen, I. Salazar, and A. Søgaard. Date Recognition in Historical Parish Records. In Proceedings of the International Conference on Frontiers in Handwriting Recognition.

E. Bugliarello, F. Liu, J. Pfeiffer, S. Reddy, D. Elliott, E. M. Ponti, and I. Vulić. IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages. In Proceedings of the 39th International Conference on Machine Learning (ICML '22).

C. Grønbæk, Y. Liang, D. Elliott, and A. Krogh. Prediction of DNA from context using neural networks. PeerJ.

2021

F. Liu, E. Bugliarello, E. M. Ponti, S. Reddy, N. Collier, and D. Elliott. Visually Grounded Reasoning across Languages and Cultures. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP '21). Best Long Paper Award.

S. Frank, E. Bugliarello, and D. Elliott. Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP '21).

R. K. Jørgensen, M. Hartmann, X. Dai, and D. Elliott. mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model. In Findings of the Association for Computational Linguistics: EMNLP 2021.

D. Li, L. M. Pehrson, C. A. Lauridsen, L. Tøttrup, M. Fraccaro, D. Elliott, H. D. Zajac, S. Darkner, J. F. Carlsen, and M. B. Nielsen. The Added Effect of Artificial Intelligence on Physicians’ Performance in Detecting Thoracic Pathologies on CT and Chest X-ray: A Systematic Review. Diagnostics 11(12).

E. Bugliarello, R. Cotterell, N. Okazaki, and D. Elliott. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs. Transactions of the Association for Computational Linguistics (TACL).

I. Parfenova, D. Elliott, R. Fernández, and S. E. Pezzelle. Probing Cross-Modal Representations in Multi-Step Relational Reasoning. Sixth Workshop on Representation Learning for NLP.

E. Bugliarello and D. Elliott. The Role of Syntactic Planning in Compositional Image Captioning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL '21).

D. Elliott. Beyond Text and Back Again. In Proceedings of the Second International Workshop on NLP Beyond Text.

2020

T. Srinivasan, R. Sanabria, F. Metze, and D. Elliott. Fine-Grained Grounding for Multimodal Speech Recognition. In Findings of the Association for Computational Linguistics: EMNLP 2020.

B. Higy, D. Elliott, and G. Chrupała. Textual supervision for visually grounded spoken language understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020.

M. Bollmann and D. Elliott. On Forgetting to Cite Older Papers: An Analysis of the ACL Anthology. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL '20).

A. Suglia, I. Konstas, A. Vanzo, E. Bastianelli, D. Elliott, S. Frank, and O. Lemon. CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL '20).

M. Abdou, V. Ravishankar, M. Barrett, Y. Belinkov, D. Elliott, and A. Søgaard. The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL '20).

L. Specia, R. Arora, L. Barrault, O. Caglayan, A. Duarte, D. Elliott, S. Gella, N. Holzenberger, C. Lala, S. J. Lee, J. Libovický, P. Madhyastha, F. Metze, K. Mulligan, A. Ostapenko, S. Palaskar, R. Sanabria, and J. Wang. Grounded Sequence to Sequence Transduction. IEEE Journal of Selected Topics in Signal Processing.

U. Sulubacak, O. Caglayan, S. Grönroos, A. Rouhe, D. Elliott, L. Specia, and J. Tiedemann. Multimodal Machine Translation through Visuals and Speech. Machine Translation.

2019

M. Nikolaus, M. Abdou, M. Lamm, R. Aralikatte, and D. Elliott. Compositional Generalization in Image Captioning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL '19), Hong Kong, China.

M. Barrett, Y. Kementchedjhieva, Y. Elazar, D. Elliott, and A. Søgaard. Adversarial Removal of Demographic Attributes Revisited. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP '19), Hong Kong, China.

K. D. Chowdhury and D. Elliott. Understanding the Effect of Textual Adversaries in Multimodal Machine Translation. In Proceedings of the Workshop on Beyond Vision and Language: Integrating Real-world Knowledge, Hong Kong, China. Best Poster Award.

S. Gella, D. Elliott, and F. Keller. Cross-lingual Visual Verb Sense Disambiguation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '19), Minneapolis, Minnesota, USA. [data]

2018

R. Sanabria, O. Caglayan, S. Palaskar, D. Elliott, L. Barrault, L. Specia, F. Metze. How2: A Large-scale Dataset for Multimodal Language Understanding. In Proceedings of the Visually Grounded Interaction and Language Workshop (ViGiL '18), Montreal, Canada.

D. Elliott. Adversarial Evaluation of Multimodal Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP '18), Brussels, Belgium.

Á. Kádár, D. Elliott, M.-A. Côté, G. Chrupała, and A. Alishahi. Lessons learned in multilingual grounded language learning. In Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL '18), Brussels, Belgium.

L. Barrault, F. Bougares, L. Specia, C. Lala, D. Elliott, and S. Frank. Findings of the Third Shared Task on Multimodal Machine Translation. In Proceedings of the Third Conference on Machine Translation (WMT '18), Brussels, Belgium.

E. van Miltenburg, D. Elliott, and P. Vossen. Talking about other people: an endless range of possibilities. In Proceedings of the 11th International Natural Language Generation Conference (INLG '18), Tilburg, The Netherlands.

E. van Miltenburg, D. Elliott, P. Vossen. Measuring the Diversity of Automatic Image Descriptions. In Proceedings of the 27th International Conference on Computational Linguistics (COLING '18), Santa Fe, New Mexico, U.S.A. Area Chair Favourite.

S. Frank, D. Elliott and L. Specia. Assessing Multilingual Multimodal Image Description: Studies of Native Speaker Preferences and Translator Choices. Journal of Natural Language Engineering.

2017

D. Elliott and Á. Kádár. Imagination Improves Multimodal Translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (IJCNLP '17), Taipei, Taiwan. [code]

D. Elliott, S. Frank, L. Barrault, F. Bougares, and L. Specia. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description. Proceedings of the Second Conference on Machine Translation (WMT '17), Copenhagen, Denmark. [slides]

E. van Miltenburg, D. Elliott, and P. Vossen. Cross-linguistic differences and similarities in image descriptions. Proceedings of the 10th International Natural Language Generation Conference (INLG '17), Santiago de Compostela, Spain.

E. van Miltenburg and D. Elliott. Room for improvement in automatic image description: an error analysis.

R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, and B. Plank. Automatic description generation from images: A survey of models, datasets, and evaluation measures (Extended Abstract). In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI '17), Melbourne, Australia.

2016

L. Specia, S. Frank, K. Sima'an, and D. Elliott. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description. Proceedings of the First Conference on Machine Translation (WMT '16), Berlin, Germany.

I. Calixto, D. Elliott, and S. Frank. DCU-UvA Multimodal MT System Report. Proceedings of the First Conference on Machine Translation (WMT '16), Berlin, Germany.

D. Elliott, S. Frank, K. Sima'an, and L. Specia. Multi30K: Multilingual English-German Image Descriptions. Workshop on Vision and Language at ACL '16, Berlin, Germany. [slides]

E. van Miltenburg, R. Morante, and D. Elliott. Pragmatic factors in image description: the case of negations. Workshop on Vision and Language at ACL '16, Berlin, Germany.

M. Kleppe and D. Elliott. 1 Million Dutch Newspaper Images available for researchers: The KBK-1M Dataset. In Proceedings of Digital Humanities 2016 (DH '16), Kraków, Poland.

D. Elliott and M. Kleppe. 1 Million Captioned Dutch Newspaper Images. In Proceedings of the Language Resources and Evaluation Conference (LREC '16), Portorož, Slovenia. [slides]

L. Hollink, A. Bedjeti, M. van Harmelen, and D. Elliott. A corpus of images and text in online news. In Proceedings of the Language Resources and Evaluation Conference (LREC '16), Portorož, Slovenia.

2015

D. Elliott, S. Frank, and E. Hasler. Multilingual Image Description with Neural Sequence Models. [code] [slides]

D. Elliott and A. P. de Vries. Describing Images using Inferred Visual Dependency Representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL '15), Beijing, China.

D. Elliott. A Structured Representation of Images for Language Generation and Image Retrieval. Ph.D. Thesis, University of Edinburgh.

2014

D. Elliott. Towards Succinct and Relevant Image Descriptions. In Proceedings of the Workshop on Vision and Language 2014 at COLING '14, Dublin, Ireland.

D. Elliott, V. Lavrenko, and F. Keller. Query-by-Example Image Retrieval using Visual Dependency Representations. In Proceedings of the 25th International Conference on Computational Linguistics (COLING '14), Dublin, Ireland. [data] [code] [slides]

D. Elliott and F. Keller. Comparing Automatic Evaluation Measures for Image Description. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL '14), Baltimore, Maryland, U.S.A. [data] [code] (This version of the paper corrects an omission of citing the papers of the evaluation measures, which was introduced between the submitted and camera-ready copies.)

Earlier

D. Elliott and F. Keller. Image Description using Visual Dependency Representations. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP '13), Seattle, Washington, U.S.A. [data] [code]

D. Elliott and L. Azzopardi. Practical Considerations when Filtering Documents. In Proceedings of the Fourth Information Interaction in Context Symposium (IIiX '12), Nijmegen, Netherlands.

D. Elliott and F. Keller. A Treebank of Visual and Linguistic Data. In Proceedings of the Workshop on Integrating Language and Vision at Neural Information Processing Systems 2011 (NIPS 25), Granada, Spain. Best Poster Award.

D. Elliott, R. Glassey, T. Polajnar, and L. Azzopardi. Finding and Filtering Information for Children. In Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10), Geneva, Switzerland.

D. Elliott and J. M. Jose. A Proactive Personalised Retrieval System. In Proceedings of the ACM Eighteenth Conference on Information and Knowledge Management (CIKM '09), Hong Kong, China.