Job Properties
  • Job Type
    Full-time Position
  • Category
    Research & Science
  • Languages
    English, Dutch
  • Date Posted
    December 18, 2021

    PhD position in Vision and Language Processing

    The grounding of language in perception is a central problem in Artificial Intelligence. Neural Vision and Language models partly address this problem by linking linguistic expressions with visual features extracted from images or videos. One important application in this area is the generation of text from visual input. For example, in image captioning, the task is to automatically describe a picture using a short caption in a language such as English. However, most image-to-text models produce highly descriptive captions, focusing on what the image contains. In contrast, when humans describe pictures, they often go beyond mere description. Pictures often evoke stories for people, and the same picture may also evoke different stories.

    In this PhD project, we will investigate visually grounded narrative generation, seeking to develop deep neural models which can generate a short, story-like text from an image. Such narratives can be based on facts, that is, grounded in both the image contents and additional factual information related to what the image depicts. However, narratives can also be creative and include content which deviates from these facts. For instance, a narrative text can also try to convey the emotional impact of what an image depicts.

    Therefore, a central question in this project concerns the different sources of information that could be leveraged to produce visually grounded narrative. These include the visual input, but potentially also other sources. For example, pictures can evoke historical events; they may have emotional connotations (e.g., when they depict a sad or joyful event); or they may depict a familiar scene. This information is not usually explicitly marked up in visual data. Hence, one question is: What additional knowledge sources are needed to support the generation of narrative texts from images? A second important consideration is what type of short narrative can be generated, from the purely factual, to the more creative.

    As a PhD candidate, you will:

    • study the relationship between vision and language, with a focus on how current deep learning models combine visual and textual information;
    • review available datasets for image-to-text generation, and develop methods for augmenting data to meet the challenge of factual or creative generation of short narratives from visual inputs;
    • develop techniques to combine visual data with additional information, to drive the generation process;
    • conduct empirical studies to evaluate the output of a neural generator, using both automatic means and experiments with human participants.

    We offer a five-year position. The PhD project constitutes the bulk of your commitment, with 30% of your employment time dedicated to teaching. Thus, in addition to your research, you will be involved in supporting the preparation and teaching of Bachelor's and Master's courses, and supervising student theses. We offer the opportunity to take significant steps towards acquiring a basic teaching qualification (BKO), which qualifies you as a teacher in the Dutch higher education system.


    We are looking for a candidate with the following qualifications:

    • a Master’s degree in AI, NLP, Machine Learning, or a related discipline;
    • a strong interest in Natural Language Processing;
    • prior experience with deep learning methods – you are expected to develop neural models, train and evaluate them;
    • good knowledge of Python and standard deep learning libraries;
    • willingness to teach and develop your pedagogical skills;
    • fluency in English, both in speech and writing.

    In addition, the following qualifications and experience will be considered a plus:

    • prior experience in designing and executing experiments with human participants;
    • knowledge of statistics for data analysis.


    We offer an exciting opportunity to do cutting-edge research within the NLP group, an ambitious and growing team of international researchers. You will be working in a group which combines an interest in scientific and theoretical issues related to the computational treatment of natural language with an interest in the practical applications of NLP.

    Apart from this we offer:

    • a full-time position for five years;
    • a full-time gross salary starting at €2,443 and increasing to €3,122 per month (scale P of the Collective Labour Agreement Dutch Universities (cao));
    • 8% holiday bonus and 8.3% end-of-year bonus;
    • a pension scheme, partially paid parental leave, and flexible employment conditions based on the Collective Labour Agreement Dutch Universities.

    In addition to the employment conditions laid down in the cao for Dutch Universities, Utrecht University has a number of its own arrangements. For example, there are agreements on professional development, leave arrangements and sports. We also give you the opportunity to expand your terms of employment via the Employment Conditions Selection Model. This is how we like to encourage you to continue to grow.

    More information about working at the Faculty of Science can be found here.

    About the organisation

    You will work within the Natural Language Processing Group at the Department of Information and Computing Sciences at Utrecht University. The group is led by Prof. Kees van Deemter and is composed of researchers who have expertise in language generation and understanding, multimodal language processing, computational social science, and deep learning methods. The group also has a keen interest in the interface between NLP and cognitive science, with strong ties to other research groups within the ICS Department, as well as the Utrecht Institute of Linguistics OTS at Utrecht University.

    Within the NLP group, you will be working under the supervision of Dr. Albert Gatt in the area of vision and language research. You will also interact closely with other group members, through regular meetings and seminars.

    At the Faculty of Science, there are 6 departments to make a fundamental connection with: Biology, Chemistry, Information and Computing Sciences, Mathematics, Pharmaceutical Sciences and Physics. Each of these is made up of distinct institutes that work together to focus on answering some of humanity’s most pressing problems. More fundamental still are the individual research groups – the building blocks of our ambitious scientific projects. Find out more about us.

    Utrecht University is a friendly and ambitious university at the heart of an ancient city. We love to welcome new scientists to our city – a thriving cultural hub that is consistently rated as one of the world’s happiest cities. We are renowned for our innovative interdisciplinary research and our emphasis on inspirational research and excellent education. We are equally well-known for our familiar atmosphere and the can-do attitude of our people. This fundamental connection attracts researchers, professors and PhD candidates from all over the globe, making both the university and the Faculty of Science a vibrant international and wonderfully diverse community.

    Additional information

    If you have any questions regarding the position, please contact Dr. Albert Gatt via

    Do you have a question about the application procedure? Please send an email to
