Speakers
Bruce McLaren
Carnegie Mellon University, USA
Yongchao Wu
Stockholm University, Sweden
Steven James Moore
Carnegie Mellon University, USA
Ishari Amarasinghe
Radboud University, The Netherlands / Universitat Pompeu Fabra, Spain
Start
06/09/2023 - 11:00
End
06/09/2023 - 13:00
Address
Auditorium

Session 1: Generative AI for Learning & Teaching
Chair: Marco Kalz
Evaluating ChatGPT’s Decimal Skills and Feedback Generation to Students’ Self-explanations in a Digital Learning Game
Hayden Stec, Huy Nguyen, Xinying Hou, Sarah Di and Bruce McLaren
Abstract: While open-ended self-explanations have been shown to promote robust learning in multiple studies, they pose significant challenges to automated grading and feedback in technology-enhanced learning due to the unconstrained nature of students’ input. Our work investigates whether recent advances in Large Language Models, particularly ChatGPT, can address this issue. Using decimal exercises and student data from a prior study of the learning game Decimal Point, comprising more than 5,000 open-ended self-explanation responses, we investigate ChatGPT’s capability in (1) solving the in-game exercises, (2) determining the correctness of students’ answers, and (3) providing meaningful feedback on incorrect answers. Our results showed that ChatGPT responded well to conceptual questions but struggled with decimal place values and number line problems. In addition, it accurately assessed the correctness of 75% of the students’ answers and generated generally high-quality feedback, comparable to that of human instructors. We conclude with a discussion of ChatGPT’s strengths and weaknesses and suggest several avenues for extending its use in digital teaching and learning.
📄 Read More: https://link.springer.com/chapter/10.1007/978-3-031-42682-7_19
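To give a concrete sense of the pipeline this abstract describes, here is a minimal sketch of LLM-based grading of one self-explanation. It is not the authors’ actual setup: the prompt, the verdict format, the model choice, and the `grade_self_explanation` helper are illustrative assumptions, written against the openai Python client (v1+).

```python
# Hedged sketch: one possible LLM grading call, not the paper's real prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GRADER_PROMPT = (
    "You are grading a student's self-explanation for a decimal exercise. "
    "Reply with 'CORRECT' or 'INCORRECT' on the first line, then one short, "
    "encouraging feedback sentence aimed at a middle-school student."
)

def grade_self_explanation(exercise: str, student_answer: str) -> tuple[bool, str]:
    """Ask the model to judge correctness and draft feedback (illustrative)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic output for grading
        messages=[
            {"role": "system", "content": GRADER_PROMPT},
            {"role": "user", "content": f"Exercise: {exercise}\nStudent answer: {student_answer}"},
        ],
    )
    verdict, _, feedback = response.choices[0].message.content.partition("\n")
    return verdict.strip().upper().startswith("CORRECT"), feedback.strip()

print(grade_self_explanation(
    "Which is larger, 0.8 or 0.75? Explain why.",
    "0.75 is larger because it has more digits.",
))
```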
Towards Improving the Reliability and Transparency of ChatGPT for Educational Question Answering
Yongchao Wu, Aron Henriksson, Martin Duneld and Jalal Nouri
Abstract: Large language models (LLMs), such as ChatGPT, have shown remarkable performance on various natural language processing (NLP) tasks, including educational question answering (EQA). However, LLMs generate text entirely from knowledge obtained during pre-training, which means they struggle with recent information or domain-specific knowledge bases. Moreover, when LLMs provide only answers, without any grounding material, it is difficult for students to judge their validity. We therefore propose a method for integrating information retrieval systems with LLMs when developing EQA systems, which both improves EQA performance and grounds the answers in the educational context. Our experiments show that the proposed system outperforms vanilla ChatGPT by a wide margin: 110.9%, 67.8%, and 43.3% on BLEU, ROUGE, and METEOR scores, respectively. In addition, we argue that using the retrieved educational context enhances the transparency and reliability of the EQA process, making it easier to determine the correctness of the answers.
📄 Read More: https://link.springer.com/chapter/10.1007/978-3-031-42682-7_32
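The retrieve-then-generate pattern the abstract describes can be sketched in a few lines: fetch the course passages most relevant to the question, then build a prompt that grounds the answer in them. This is a simplified stand-in, not the paper’s system; the TF-IDF retriever, the tiny corpus, and the prompt wording are all assumptions for illustration.

```python
# Hedged sketch of retrieval-augmented EQA using a simple TF-IDF retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative stand-in for a real course knowledge base.
course_passages = [
    "Cohen's kappa measures inter-rater agreement corrected for chance.",
    "BLEU compares n-gram overlap between a candidate and reference text.",
    "A decimal's place value determines the weight of each digit.",
]

vectorizer = TfidfVectorizer().fit(course_passages)
passage_vectors = vectorizer.transform(course_passages)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), passage_vectors)[0]
    return [course_passages[i] for i in scores.argsort()[::-1][:k]]

def build_grounded_prompt(question: str) -> str:
    """Prepend retrieved context so the LLM's answer can be checked against it."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    return f"Answer using only this course material:\n{context}\n\nQuestion: {question}"

print(build_grounded_prompt("What does BLEU measure?"))
```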
Assessing the Quality of Multiple-Choice Questions: Automated Methods for Identifying Item-Writing Flaws
Steven Moore, Huy Nguyen, John Stamper and Tianying Chen
Abstract: Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom use. Existing methods for evaluating multiple-choice questions often focus on machine-readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed against a machine-learning-based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. Analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, compared to 79% for GPT-4. Both methods proved effective in identifying common item-writing flaws in student-generated questions across the subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential of these automated methods for improving question quality based on the identified flaws.
📄 Read More: https://link.springer.com/chapter/10.1007/978-3-031-42682-7_16
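A rule-based flaw detector of the kind described can be as simple as a set of pattern checks applied to each question. The three rules below are common item-writing flaws from the assessment literature, chosen for illustration; they are not the paper’s full 19-flaw rule set, and the `find_flaws` helper is a hypothetical name.

```python
# Hedged sketch: three example item-writing-flaw rules, not the paper's full set.
import re

def find_flaws(stem: str, options: list[str], answer_index: int) -> list[str]:
    """Flag common item-writing flaws in one multiple-choice question."""
    flaws = []
    # Rule 1: "all of the above" invites test-taking strategies over knowledge.
    if any(re.search(r"\ball of the above\b", o, re.IGNORECASE) for o in options):
        flaws.append("uses 'all of the above' as an option")
    # Rule 2: negatively worded stems are easy to misread.
    if re.search(r"\b(not|except)\b", stem, re.IGNORECASE):
        flaws.append("negatively worded stem")
    # Rule 3: a conspicuously longest correct option gives the answer away.
    lengths = [len(o) for o in options]
    if lengths[answer_index] == max(lengths) and lengths.count(max(lengths)) == 1:
        flaws.append("correct option is noticeably the longest")
    return flaws

print(find_flaws(
    stem="Which of the following is NOT a planet?",
    options=["Mars", "Venus",
             "The Moon, which orbits Earth rather than the Sun",
             "All of the above"],
    answer_index=2,
))
```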
Generative Pre-trained Transformers for Coding Text Data? An Analysis with Classroom Orchestration Data
Ishari Amarasinghe, Francielle Marques, Ariel Ortiz-Beltran and Davinia Hernández-Leo
Abstract: Content analysis is of importance for researchers in technology-enhanced learning. Text transcripts, for example those obtained from video recordings, enable the application of a coding scheme that groups the text into categories highlighting its key themes. However, manually assigning codes to text is demanding and requires considerable time and effort from human annotators. This study therefore explores the possibility of using Generative Pre-trained Transformer 3 (GPT-3) models to automate text coding, compared against baseline classical machine learning approaches, using a dataset manually coded for the orchestration actions of six teachers in classroom collaborative learning sessions. Our findings showed that a fine-tuned GPT-3 (curie) model outperformed the classical approaches (F1 score of 0.87) and reached a Cohen’s kappa of 0.77, indicating moderate agreement between manual and machine coding. The study also exposes limitations of our text transcripts and highlights the importance of multimodal observations that capture the context of orchestration actions.
📄 Read More: https://link.springer.com/chapter/10.1007/978-3-031-42682-7_3
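The agreement statistics reported in this abstract (F1 and Cohen’s kappa between human and machine coding) are straightforward to compute once both code sequences exist. A minimal sketch with scikit-learn, using made-up orchestration codes rather than the study’s data:

```python
# Hedged sketch: agreement metrics between human and machine coding.
from sklearn.metrics import cohen_kappa_score, f1_score

# Illustrative placeholder codes, one per transcript segment.
human_codes   = ["explain", "monitor", "explain", "adapt", "monitor", "explain"]
machine_codes = ["explain", "monitor", "monitor", "adapt", "monitor", "explain"]

kappa = cohen_kappa_score(human_codes, machine_codes)  # chance-corrected agreement
f1 = f1_score(human_codes, machine_codes, average="weighted")
print(f"Cohen's kappa: {kappa:.2f}, weighted F1: {f1:.2f}")
```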