Assessing Constructed Responses with Explainable Natural Language Processing


Speakers

Sebastian Gombert
DIPF, Germany

Start

24/05/2022 - 14:00

End

24/05/2022 - 15:30

Assessing Constructed Responses with Explainable Natural Language Processing

Tuesday 24/05 14:00-15:30h
Outdoor Area B
Abstract

The assessment of constructed responses is useful for measuring the active knowledge of learners. In the past, it has been approached with various NLP methods, ranging from keyword and pattern matching to different machine learning algorithms. Since 2019, transformer language models have come to dominate the field of natural language processing and have also been applied successfully to the assessment of constructed responses. They often achieve high predictive performance without the need for extensive task-specific feature engineering, but they are also black-box models. Consequently, it is not trivial to assess whether these models pick up on the correct signals or learn simple, unstable shortcuts, and without explaining them it is hard to predict how robustly they will behave in high-stakes scenarios. Models used in such scenarios need to be reliable, which can only be guaranteed if they are explainable. In this workshop, we will explore how to apply and explain transformer language models as well as explainable feature-based models for assessment, and how to gain insight into these models.
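
As an illustration of the kind of model discussed in the workshop, the following minimal Python sketch treats the scoring of a constructed response as a text-classification problem with Huggingface Transformers. The checkpoint name, the three score categories, and the example response are placeholders; in practice, a model fine-tuned on scored responses would be used.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Placeholder checkpoint and label set; a model fine-tuned on scored
# responses would be loaded here in practice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # e.g. incorrect / partially correct / correct
)

response = "Energy is transferred from the sun to the plant as light."
inputs = tokenizer(response, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Index of the predicted score category (arbitrary here, since the
# classification head above is freshly initialised, not fine-tuned)
print(int(torch.argmax(logits, dim=-1)))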

 

Needs Analysis

This workshop will be useful for participants who want to learn how to apply explainable natural language processing methodology for scoring constructed responses. It touches upon explaining transformer language models as well as feature-based models, such as generalized additive or tree-based models, trained for the assessment of constructed responses. Participants will get an overview of the different techniques they can use for assessing constructed responses and will learn how to explain the predictions of their models and gain insight into them. Finally, we will reflect on some ethical aspects related to the automatic assessment of constructed responses.

 

Learning Objectives

The objective is to make participants aware of the different NLP techniques that can be used for the assessment of constructed responses. In addition, participants should get an overview of the techniques that can be used to explain the resulting models and to gain interpretable insights from them. Finally, participants will reflect on why it is important to use explainable techniques for the assessment of constructed responses.

 

Pre-activities

A basic understanding of assessment and/or natural language processing methodology can be a plus, but is not required to follow the workshop. An optional reader that participants can use to dive deeper into the workshop topic will be provided in advance.

 

Session Description
  • Background presentation:
    – Assessment: placement, formative, summative; objective vs. subjective; holistic vs. analytic scoring.
    – Constructed response assessment
    – NLP and the assessment of constructed responses: a short historical overview.
    – Black-box vs. glass-box models
    – Transformer language models: short introduction
    – Feature-based glass-boxes: Tree-based and generalized additive models. 25’
  • Group discussion: what are the ethical issues with using black-box models in assessment scenarios? When are they justified, and when are they problematic? What are the requirements for making them applicable? What do sufficient explanations need to look like? 20’
  • Existing frameworks and tools: Pytorch & Huggingface Transformers, Huggingface ModelHub, Interpret.ml, LIME, Transformer Explainability (short illustrative sketches of Interpret.ml and LIME follow after this outline) 10’
  • Group discussion: Model-intrinsic methods (Transformer Explainability) vs. model-extrinsic methods (LIME) 10’
  • Examples of Explainable NLP from our group:
    – DIPF contribution to the NAEP scoring challenge
    – Paper: Identifying Energy Knowledge in Constructed Responses with Explainable NLP
    – Applications in the ALICE project: formative knowledge assessment with explainable models 15’
  • Q&A: Challenges, opportunities 10’
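
As referenced in the outline above, the following minimal sketch illustrates a feature-based glass-box scorer using an Explainable Boosting Machine, a generalized additive model from the Interpret.ml (interpret) package. The hand-crafted features and toy labels are illustrative placeholders, not features from an actual assessment task.

import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

# Hypothetical hand-crafted features extracted from constructed responses
features = pd.DataFrame({
    "mentions_energy":   [1, 1, 0, 0, 1, 0, 1, 0],
    "mentions_transfer": [1, 0, 0, 0, 1, 0, 1, 0],
    "response_length":   [12, 9, 5, 7, 14, 4, 11, 6],
})
scores = [1, 1, 0, 0, 1, 0, 1, 0]  # toy labels: 1 = correct, 0 = incorrect

ebm = ExplainableBoostingClassifier()
ebm.fit(features, scores)

# Global explanation: overall importance of each feature; in a notebook,
# interpret's show() renders the per-feature shape functions interactively.
global_explanation = ebm.explain_global()
print(global_explanation.data())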

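The second sketch illustrates a model-extrinsic explanation with LIME: a simple bag-of-words scorer is trained on a toy data set, and LIME perturbs an unseen response to estimate which words drive the predicted score. The training examples and labels are again illustrative placeholders; in practice, the classifier would be a trained scoring model, for instance the transformer sketched above.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy scoring data: 1 = response expresses the target energy concept
train_texts = [
    "energy is transferred from the sun to the plant",
    "the plant uses light energy to make food",
    "light from the sun gives the plant energy",
    "the plant grows because it is watered",
    "plants need soil to grow big",
    "the plant grows in the garden",
]
train_labels = [1, 1, 1, 0, 0, 0]

scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
scorer.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["no_concept", "concept"])
explanation = explainer.explain_instance(
    "sunlight gives the plant energy to grow",
    scorer.predict_proba,  # LIME perturbs the text and queries this function
    num_features=5,
)
print(explanation.as_list())  # word-level contributions to the predicted score
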
 

Post-activities

Slides and a workshop script will be provided.

The following publications are useful to dive deeper into the topic of explainable natural language processing methodology:

  • Anders Søgaard. 2021. Explainable Natural Language Processing. Morgan & Claypool.
  • Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Qiu Han, Guoyin Wang, Eduard Hovy, and Jiwei Li. 2021. Interpreting Deep Learning Models in Natural Language Processing: A Review. arXiv preprint arXiv:2110.10470.
  • Yonatan Belinkov and James Glass. 2019. Analysis Methods in Neural Language Processing: A Survey. Transactions of the Association for Computational Linguistics (TACL). https://doi.org/10.1162/tacl_a_00254

Huggingface Transformers:
https://huggingface.co/docs/transformers/index

Transformer Explainability:

Learn about older approaches for scoring short answers:
https://webis.de/downloads/publications/papers/burrows_2015.pdf