Natural Language Processing for Educational Assessment: Developing an AI-Driven Question Classification System
Subject
Computer Science
Description
Natural Language Processing
Creator
Edwin Eldho Paul
Date
2025
Abstract
Students and teachers often struggle to find past paper questions that align with specific syllabus topics. GCSE and A-Level Computer Science exams from AQA, Edexcel, and OCR provide a wide range of practice material, but papers are typically organised by year and exam board rather than by topic. Students must therefore sift through entire papers manually to locate relevant questions, which reduces the time available for effective revision. This project aims to develop an automated system that classifies past paper questions by syllabus topic and labels them accordingly, focusing initially on OCR GCSE Computer Science papers. Using Natural Language Processing (NLP) techniques, namely TF–IDF vectorisation, Support Vector Machine (SVM) classification, and data augmentation, the system automatically tags each question with its corresponding syllabus code, while trials with transformer-based models such as BERT and DistilBERT explore their potential for educational question classification. The resulting labelled, searchable database makes past papers more accessible for targeted revision, with practical implications for students, teachers, and educational platforms. It also lays the groundwork for future AI-driven educational tools such as automated worksheet generation and personalised learning systems.
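The abstract names TF–IDF vectorisation, SVM classification, and data augmentation but gives no implementation details. The following is a minimal, self-contained sketch of how such a pipeline could fit together, not the project's actual code. Several assumptions are made here: it is written in pure Python rather than with a library such as scikit-learn; the synonym table used for augmentation is hypothetical; the topic codes "1.1" and "2.1" are placeholders; and the classifier is a bias-free one-vs-rest linear SVM trained by stochastic subgradient descent on the hinge loss.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical synonym table for a toy augmentation step (illustrative only).
SYNONYMS = {"describe": "explain", "explain": "describe", "state": "give"}


def augment(question: str) -> str:
    # Synonym replacement: swap every token that has an entry in SYNONYMS.
    return " ".join(SYNONYMS.get(tok, tok) for tok in tokenize(question))


class SimpleTfidf:
    def fit(self, docs: list[str]) -> "SimpleTfidf":
        n = len(docs)
        # Document frequency: in how many documents each token appears.
        df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
        terms = sorted(df)
        self.vocab = {tok: i for i, tok in enumerate(terms)}
        # Smoothed idf: log((1 + n) / (1 + df)) + 1.
        self.idf = [math.log((1 + n) / (1 + df[tok])) + 1 for tok in terms]
        return self

    def transform(self, doc: str) -> list[float]:
        vec = [0.0] * len(self.vocab)
        # Raw term count times idf; tokens unseen during fit are ignored.
        for tok, count in Counter(tokenize(doc)).items():
            if tok in self.vocab:
                i = self.vocab[tok]
                vec[i] = count * self.idf[i]
        # L2-normalise so long and short questions are comparable.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class OneVsRestLinearSVM:
    """Linear SVM (hinge loss + L2 penalty, no bias term) trained by SGD."""

    def __init__(self, lam: float = 0.01, lr: float = 0.1, epochs: int = 200):
        self.lam, self.lr, self.epochs = lam, lr, epochs

    def fit(self, X: list[list[float]], labels: list[str]) -> "OneVsRestLinearSVM":
        self.classes = sorted(set(labels))
        self.weights = {}
        for cls in self.classes:
            # One binary classifier per topic: this topic (+1) vs the rest (-1).
            w = [0.0] * len(X[0])
            ys = [1.0 if lab == cls else -1.0 for lab in labels]
            for _ in range(self.epochs):
                for x, y in zip(X, ys):
                    violated = y * dot(w, x) < 1.0
                    # Subgradient step on lam/2 * ||w||^2 + max(0, 1 - y * w.x).
                    w = [wi - self.lr * (self.lam * wi - (y * xi if violated else 0.0))
                         for wi, xi in zip(w, x)]
            self.weights[cls] = w
        return self

    def predict(self, x: list[float]) -> str:
        # Pick the class whose binary classifier gives the highest score.
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


# Toy training set; the topic codes are placeholders, not real syllabus data.
train = [
    ("Describe the purpose of the CPU cache", "1.1"),
    ("Explain what the program counter register stores", "1.1"),
    ("State the role of the ALU in the CPU", "1.1"),
    ("Describe how a binary search algorithm works", "2.1"),
    ("Explain why merge sort is efficient", "2.1"),
    ("Give one advantage of linear search over binary search", "2.1"),
]
# Data augmentation: add a synonym-swapped copy when it differs from the original.
augmented = [(augment(q), lab) for q, lab in train if augment(q) != " ".join(tokenize(q))]
questions = [q for q, _ in train + augmented]
labels = [lab for _, lab in train + augmented]

vectorizer = SimpleTfidf().fit(questions)
model = OneVsRestLinearSVM().fit([vectorizer.transform(q) for q in questions], labels)

print(model.predict(vectorizer.transform("What does the CPU cache do?")))  # prints 1.1
print(model.predict(vectorizer.transform("Compare binary search with linear search")))  # prints 2.1
```

L2-normalised TF–IDF vectors paired with a linear SVM are a common baseline for short-text classification because the feature space is high-dimensional and sparse; a production system would more likely use a tested library implementation than this hand-rolled trainer.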
Citation
u5568000, “Natural Language Processing for Educational Assessment: Developing an AI-Driven Question Classification System,” URSS SHOWCASE, accessed November 2, 2025, https://linen-dog.lnx.warwick.ac.uk/items/show/951.