Potential of LLMs and Automated Text Analysis in Interpreting Student Course Feedback

Integrating Large Language Models (LLMs) with automated text analysis tools offers a novel approach to interpreting student course feedback. As educators and administrators strive to refine teaching methods and enhance learning experiences, AI could help draw deeper insights from that feedback. Student feedback has traditionally been treated as a vast collection of qualitative data filled with sentiments, preferences, and suggestions, and it can now be analyzed more effectively. This post explores how LLMs can be used to interpret and classify student feedback, highlighting workflows that could benefit most teachers.

The Advantages of LLMs in Feedback Interpretation

Bano et al. (2023) shed light on the capabilities of LLMs, such as ChatGPT, in analyzing qualitative data, including student feedback. Their research found a significant alignment between human and LLM classifications of Alexa voice assistant app reviews, demonstrating LLMs’ ability to understand and categorize feedback effectively. This indicates that LLMs can grasp the nuances of student feedback, especially when the data is rich in specific word choices and context related to course content or teaching methodologies.
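To make "alignment between human and LLM classifications" concrete, one common way to quantify agreement between two raters is Cohen's kappa, which corrects raw agreement for chance. The sketch below is illustrative only: it is not necessarily the measure Bano et al. used, and the labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Share of items where both raters chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels for six comments (not data from the study)
human = ["praise", "concern", "suggestion", "praise", "concern", "praise"]
llm = ["praise", "concern", "praise", "praise", "concern", "suggestion"]

kappa = cohens_kappa(human, llm)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so a value in between prompts a closer look at the comments where the two raters disagree.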

LLMs excel at processing and interpreting large volumes of text, identifying patterns, and extracting themes from qualitative feedback. Their capacity for thematic analysis at scale can help educators identify common concerns, praise, or suggestions in students’ comments, tasks that are cumbersome and time-consuming when done manually.

Limitations and Challenges

Despite their advantages, LLMs have limitations. Linse (2017) highlights that fully understanding the subtleties of student feedback requires more than text analysis; it demands contextual understanding and an awareness of biases. LLMs might not accurately interpret outliers and statistical anomalies, often necessitating human intervention to identify root causes.

Kastrati et al. (2021) identify several challenges in analyzing student feedback sentiment. One major challenge is accurately identifying and interpreting figurative speech, such as sarcasm and irony, which can convey sentiments opposite to their literal meanings. Additionally, many feedback analysis techniques designed for specific domains may falter when applied to the varied contexts of educational feedback. Handling complex linguistic features, such as double negatives, unknown proper names, abbreviations, and words with multiple meanings commonly found in student feedback, presents further difficulties. Lastly, there is a risk that LLMs might inadvertently reinforce biases in their training data, leading to skewed feedback interpretations.
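A deliberately naive example shows why negation and sarcasm are hard. The word lists and comments below are hypothetical, and the scorer is a simple keyword counter, not an LLM or any technique from Kastrati et al.; it only illustrates how literal word matching can invert the intended sentiment.

```python
# Tiny hypothetical sentiment lexicons, for illustration only
POSITIVE = {"great", "good", "helpful", "clear", "fun"}
NEGATIVE = {"bad", "boring", "confusing", "hard", "useless"}

def naive_sentiment(comment):
    """Label a comment by counting positive and negative keywords."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

negated_praise = "The lesson was not bad at all."
sarcastic_complaint = "Great, another worksheet. Just what I needed."

# The scorer sees "bad" and ignores "not", so mild praise is read as negative
print(naive_sentiment(negated_praise))
# The scorer sees "great" and misses the sarcasm, so a complaint is read as positive
print(naive_sentiment(sarcastic_complaint))
```

More capable models handle many of these cases better, but the same failure pattern is what a human reviewer should watch for when spot-checking automated labels.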

Tools and Workflows

According to ChatGPT (OpenAI, 2024), a suggested workflow for analyzing data from course feedback forms is summarized as follows:

  1. Data Collection: Utilize tools such as Google Forms or Microsoft Forms to design and distribute course feedback forms, emphasizing open-ended questions to gather qualitative feedback from students.
  2. Data Aggregation: Employ automation to compile feedback data into a single repository, like a Google Sheet or Microsoft Excel spreadsheet, simplifying the analysis process.
  3. Initial Thematic Analysis: Import the aggregated feedback into qualitative data analysis software such as NVivo or ATLAS.ti. Use the software’s coding capabilities to identify recurring themes or sentiments in the feedback.
  4. LLM-Assisted Analysis: Engage an LLM, like OpenAI’s GPT, to further analyze the identified themes, categorize comments, and potentially uncover new themes that were not initially evident. It’s crucial to review AI-generated themes for their accuracy and relevance.
  5. Quantitative Integration: Combine qualitative insights with quantitative data from the feedback forms (e.g., ratings) using tools like Microsoft Excel or Google Sheets. This integration offers a more holistic view of student feedback.
  6. Visualization and Presentation: Apply data visualization tools such as Google Charts or Tableau to create interactive dashboards or charts that present the findings of the qualitative analysis. Employing visual aids like word clouds for common themes, sentiment analysis graphs, and charts showing thematic distribution can render the data more engaging and comprehensible.
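Step 4 of this workflow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the theme list is hypothetical, and `ask_llm` is a placeholder for whatever function sends a prompt to your chosen model and returns its text reply (no specific vendor API is assumed). Any answer outside the allowed themes is flagged for human review rather than trusted.

```python
# Hypothetical theme list; in practice it would come from the initial coding pass
THEMES = ["pacing", "clarity", "engagement", "technical issues", "other"]

def build_prompt(comment, themes=THEMES):
    """Build a constrained classification prompt for a single comment."""
    options = ", ".join(themes)
    return (
        "Classify the student comment into exactly one theme.\n"
        f"Allowed themes: {options}\n"
        f"Comment: {comment}\n"
        "Answer with the theme name only."
    )

def classify_comments(comments, ask_llm, themes=THEMES):
    """Classify each comment and flag any answer outside the allowed themes."""
    results = []
    for comment in comments:
        answer = ask_llm(build_prompt(comment, themes)).strip().lower()
        needs_review = answer not in themes
        results.append({
            "comment": comment,
            "theme": None if needs_review else answer,
            "needs_review": needs_review,
        })
    return results

def fake_llm(prompt):
    # Stand-in so the sketch runs offline; replace with a real model call
    return "Clarity" if "instructions" in prompt else "not sure"

results = classify_comments(
    ["The instructions were easy to follow.", "It was okay I guess."],
    ask_llm=fake_llm,
)
for row in results:
    print(row)
```

Restricting the model to a fixed list of themes and routing everything else to a person keeps the review step from the workflow built into the code.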

Case Study: Minecraft Education Lesson

ChatGPT’s recommended workflow was used to analyze feedback from a recent lesson on teaching functions in Minecraft Education.

Step 1: Data Collection

A Google Forms survey comprising three quantitative five-point Likert scale questions and three qualitative open-ended questions was distributed to students to gather comprehensive feedback.

Figure: MCE questionnaire

Step 2: Data Aggregation

Using Google Forms’ export to CSV feature, all survey responses were consolidated into a single file, facilitating efficient data management.
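For readers who prefer to work with the exported file in code, the sketch below separates Likert ratings from open-ended comments using only Python's standard library. The column names and rows are hypothetical stand-ins; in a real export the headers are the survey's own question texts.

```python
import csv
import io
from statistics import mean

# Hypothetical export; real headers would be the survey's question texts
SAMPLE_CSV = """Timestamp,Enjoyment (1-5),Difficulty (1-5),What did you like?
2024/02/10 10:01:00,5,2,Building the function in the game was fun
2024/02/10 10:02:30,4,3,I liked working with a partner
2024/02/10 10:03:10,3,4,
"""

def split_responses(csv_file, rating_columns, text_columns):
    """Separate numeric ratings from free-text comments, skipping blanks."""
    ratings = {col: [] for col in rating_columns}
    comments = {col: [] for col in text_columns}
    for row in csv.DictReader(csv_file):
        for col in rating_columns:
            if row[col].strip():
                ratings[col].append(int(row[col]))
        for col in text_columns:
            if row[col].strip():
                comments[col].append(row[col].strip())
    return ratings, comments

ratings, comments = split_responses(
    io.StringIO(SAMPLE_CSV),
    rating_columns=["Enjoyment (1-5)", "Difficulty (1-5)"],
    text_columns=["What did you like?"],
)
averages = {col: mean(values) for col, values in ratings.items()}
print(averages)
print(comments)
```

To read an actual export instead of the inline sample, pass a file opened with `open("responses.csv", newline="", encoding="utf-8")`, where `responses.csv` is a placeholder file name.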

Step 3: Initial Thematic Analysis

The survey data was then imported into ATLAS.ti, an online thematic analysis tool with AI capabilities, to generate initial codes from the qualitative data. This process revealed several major themes, providing valuable insights from the feedback.

Figure: Results of AI coding

Step 4: Manual Verification and Analysis

Upon reviewing the survey data manually, the main themes identified by ATLAS.ti were confirmed. Additionally, this manual step highlighted specific approaches students took to solve problems presented in the lesson. Generally, the AI-generated codes were quite accurate, but a closer analysis of the comments (like the ones below) revealed even more insightful student suggestions.

Figure: AI coding

Step 5: Quantitative Integration

Because the exported file already held both the qualitative and quantitative responses, no separate quantitative integration step was needed.

Step 6: LLM-Assisted Analysis and Visualization

Next, themes were further analyzed using ChatGPT’s code interpreter feature. ChatGPT helped analyze the data and summarized the aggregated data very accurately. It even provided Python code for generating additional visualizations, enhancing the interpretation of the feedback.

Figure: Python pandas code

ChatGPT’s guidance facilitated the creation of insightful visualizations such as bar charts and word clouds.

Figure: Bar chart of qualitative data
Figure: Word cloud output
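The code from the screenshot is not reproduced here. As a rough illustration of the kind of script involved, the sketch below counts coded themes and saves a bar chart. It is not the code ChatGPT produced: the theme labels are hypothetical, counting uses the standard library rather than pandas, and the plotting function assumes matplotlib is installed.

```python
from collections import Counter

# Hypothetical theme labels, one per coded comment (not the lesson's real data)
coded_themes = ["engagement", "clarity", "engagement", "pacing", "engagement", "clarity"]

def theme_counts(labels):
    """Return (theme, count) pairs sorted from most to least frequent."""
    return Counter(labels).most_common()

def plot_theme_counts(counts, path="theme_counts.png"):
    """Save a bar chart of theme frequencies; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    themes = [theme for theme, _ in counts]
    values = [count for _, count in counts]
    fig, ax = plt.subplots()
    ax.bar(themes, values)
    ax.set_xlabel("Theme")
    ax.set_ylabel("Number of comments")
    ax.set_title("Themes in open-ended feedback")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)

counts = theme_counts(coded_themes)
print(counts)
```

Calling `plot_theme_counts(counts)` writes the chart to `theme_counts.png`. A word cloud needs an additional third-party package, such as `wordcloud`, and is left out of this sketch.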

Python offers a wealth of data visualization libraries for even more detailed analysis (https://mode.com/blog/python-data-visualization-libraries).

Best Practices for Using LLMs in Feedback Analysis

Research by Bano et al. (2023) and insights from Linse (2017) highlight the potential of LLMs and automated text analysis tools in interpreting student course feedback. Adopting best practices for integrating these technologies is critical for educators and administrators to make informed decisions that enhance teaching quality and the student learning experience, contributing to a more responsive and dynamic educational environment. Below are several recommendations:

  1. Educators or trained administrators must review AI-generated themes and categorizations to ensure alignment with the intended context and uncover nuances possibly missed by the AI. This step is vital for identifying subtleties and complexities that LLMs may not detect.
  2. Utilize insights from both AI and human analyses to inform changes in teaching practices or course content. Then, assess whether subsequent feedback reflects the effects of these changes, thereby establishing an iterative loop for continuous improvement.
  3. Offer guidance on using student course evaluations constructively. This involves understanding the context of evaluations, looking beyond average scores to grasp the distribution, and considering student feedback as one of several measures for assessing and enhancing teaching quality.
  4. This process should act as part of a holistic teaching evaluation system, which should also encompass peer evaluations, self-assessments, and reviews of teaching materials. A comprehensive approach offers a more precise and balanced assessment of teaching effectiveness.
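The advice in recommendation 3 to look beyond average scores is easy to demonstrate. In the sketch below, two sets of hypothetical five-point ratings share the same mean but tell very different stories once the distribution is tallied.

```python
from collections import Counter
from statistics import mean

def likert_distribution(scores, scale=(1, 2, 3, 4, 5)):
    """Count how many responses fall on each point of the scale."""
    counts = Counter(scores)
    return {point: counts.get(point, 0) for point in scale}

# Hypothetical ratings for the same question in two class sections
section_a = [3, 3, 3, 3, 3, 3]  # everyone lukewarm
section_b = [1, 1, 1, 5, 5, 5]  # sharply divided

for name, scores in [("A", section_a), ("B", section_b)]:
    print(name, "mean:", mean(scores), "distribution:", likert_distribution(scores))
```

Both sections average 3, yet section B is split between strongly negative and strongly positive responses, which is exactly the situation where reading the open-ended comments matters most.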


Bano, M., Zowghi, D., & Whittle, J. (2023). Exploring Qualitative Research Using LLMs. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2306.13298

Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106. https://doi.org/10.1016/j.stueduc.2016.12.004

Kastrati, Z., Dalipi, F., Imran, A. S., Pireva Nuci, K., & Wani, M. A. (2021). Sentiment Analysis of Students’ Feedback with NLP and Deep Learning: A Systematic Mapping Study. Applied Sciences, 11(9), 3986. https://doi.org/10.3390/app11093986

OpenAI. (2024). ChatGPT (Feb 10, 2024) [Large language model]. https://chat.openai.com/chat
