Security Risks of Public Package Managers and Developer Responsibilities

Introduction

Open-source development ecosystems rely heavily on package managers such as Node Package Manager (NPM), RubyGems, and Pip. These tools provide developers with easy access to a vast library of reusable software packages, accelerating development timelines and reducing costs. However, the convenience of public repositories comes with significant security risks. Because these packages are developed by the public, often anonymously, they may contain vulnerabilities or malicious code, or introduce indirect threats through their dependencies. This post explores the most common security risks developers face when using packages from public repositories and how to identify these threats. We will also examine developers’ ethical responsibilities when using package managers and discuss how developers can help mitigate some of these issues.

Security Risks in Public Package Managers

One of the most prominent risks associated with public repositories is the presence of malicious or vulnerable packages. For example, the NPM ecosystem has been found to contain several security vulnerabilities, many of which arise from the extensive use of transitive dependencies: dependencies of dependencies that are automatically installed when a developer installs a package. These transitive dependencies significantly increase the attack surface, as a vulnerability in even one of them can cascade to affect the entire project (Decan et al., 2018; Kabir et al., 2022; Latendresse et al., 2022).

Several incidents have highlighted the dangers of these vulnerabilities. In the November 2018 event-stream incident, a popular utility library for working with data streams in Node.js unknowingly incorporated a malicious dependency, leading to over two million downloads of malware (Zerouali et al., 2022). Similarly, the removal of left-pad, a small but widely used NPM package, caused widespread disruption, impacting thousands of projects (Zimmermann et al., 2019). These incidents demonstrate how software dependencies in public repositories can lead to emergent security problems.

Identifying Security Risks in Dependencies

There are two primary ways developers can identify security risks in dependencies: direct and transitive analysis. Direct dependencies are those explicitly declared in the package manifest (e.g., package.json for NPM), whereas transitive dependencies are automatically included through other installed packages (Decan et al., 2018; Zerouali et al., 2022).

Transitive dependencies are one of the most critical sources of risk. Research shows that roughly 40% of NPM packages rely on code with known vulnerabilities, many of which stem from transitive dependencies (Zimmermann et al., 2019). As projects scale up, the number of indirect dependencies grows, making tracking and assessing vulnerabilities difficult.
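To make the direct/transitive distinction concrete, here is a minimal Python sketch. The dependency graph and every package name in it are hypothetical, invented only for illustration; a real project would read this information from its lockfile or from the package manager. The sketch walks the graph to list indirect dependencies and then shows which packages are exposed when one deeply nested package turns out to be vulnerable.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on directly.
DEPENDENCY_GRAPH = {
    "my-app": ["web-framework", "date-utils"],
    "web-framework": ["http-parser", "stream-helper"],
    "stream-helper": ["tiny-pad"],
    "date-utils": [],
    "http-parser": [],
    "tiny-pad": [],
}


def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited via another path
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen


def affected_by(graph, vulnerable):
    """Return packages whose dependency tree contains the vulnerable package."""
    return {
        package
        for package in graph
        if vulnerable in transitive_dependencies(graph, package)
    }


direct = set(DEPENDENCY_GRAPH["my-app"])
indirect = transitive_dependencies(DEPENDENCY_GRAPH, "my-app") - direct

print("direct:", sorted(direct))
print("indirect:", sorted(indirect))
print("exposed to tiny-pad:", sorted(affected_by(DEPENDENCY_GRAPH, "tiny-pad")))
```

Even in this toy graph, the application declares two dependencies but installs five packages, and a flaw in the smallest one reaches the application through two intermediate packages.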

Developers can use tools such as npm audit, which connects directly to NPM’s database of known vulnerabilities, or Snyk, a tool that provides real-time monitoring. These tools analyze the entire dependency tree and alert developers to packages with security problems, including those pulled in as transitive dependencies (Kabir et al., 2022). However, a challenge with such tools is the frequent occurrence of false positives, particularly for vulnerabilities in development dependencies that are never deployed in production. For example, npm audit may flag vulnerabilities in packages that are part of the development environment and are never included in the final production build. While these vulnerabilities are technically present, they do not threaten the production application because the flagged dependencies are not part of the final product (Latendresse et al., 2022).
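One way to triage such findings is to check whether a flagged package is installed only for development. The Python sketch below assumes a lockfile in the lockfileVersion 2/3 layout, where each installed package sits under the "packages" key and development-only packages carry a "dev" flag. The lockfile contents, the package names, and the list of flagged packages are all hypothetical, and the helper functions are illustrative rather than part of any tool.

```python
import json

# Hypothetical, heavily trimmed lockfile in the lockfileVersion 2/3 layout.
SAMPLE_LOCKFILE = json.loads("""
{
  "name": "my-app",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "my-app",
         "dependencies": {"web-framework": "^2.0.0"},
         "devDependencies": {"test-runner": "^5.0.0"}},
    "node_modules/web-framework": {"version": "2.1.0"},
    "node_modules/http-parser": {"version": "1.4.2"},
    "node_modules/test-runner": {"version": "5.3.1", "dev": true},
    "node_modules/assert-lib": {"version": "0.9.0", "dev": true}
  }
}
""")


def split_by_environment(lockfile):
    """Separate installed packages into production and development-only sets."""
    production, development = set(), set()
    for path, entry in lockfile["packages"].items():
        if path == "":
            continue  # the empty key describes the root project itself
        name = path.split("node_modules/")[-1]
        if entry.get("dev"):
            development.add(name)
        else:
            production.add(name)
    return production, development


def triage(flagged, lockfile):
    """Split flagged package names into production findings and dev-only findings."""
    production, _ = split_by_environment(lockfile)
    urgent = [name for name in flagged if name in production]
    dev_only = [name for name in flagged if name not in production]
    return urgent, dev_only


# Hypothetical audit result: names of packages reported as vulnerable.
urgent, dev_only = triage(["http-parser", "assert-lib"], SAMPLE_LOCKFILE)
print("review first:", urgent)
print("development only:", dev_only)
```

Development-only findings still deserve attention, since they run on developer machines and build servers, but separating them makes it easier to prioritize what ships to users. Recent npm versions also document an option for omitting development dependencies from an audit; check the documentation for the version you use.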

To mitigate these risks, developers should:

  • Regularly audit their dependencies with tools like npm audit and manually ensure required fixes are applied promptly (Kabir et al., 2022).
  • Lock down dependency versions with a lockfile such as package-lock.json to avoid inadvertently updating to a vulnerable version (Zimmermann et al., 2019).
  • Remove unused or redundant dependencies. Kabir et al. (2022) found that 90% of projects sampled (n=841) had unused dependencies, and 83% had duplicated dependencies, unnecessarily increasing the attack surface.
  • Incorporate Software Composition Analysis (SCA) tools such as Snyk into the development workflow to detect vulnerabilities deep within the dependency tree (Latendresse et al., 2022).
  • Apply “tree shaking” techniques to remove unused transitive dependencies from production builds (Latendresse et al., 2022).

Ethical Responsibilities of Developers and Educators

Developers have an ethical responsibility to safeguard the software they create and the users who depend on it. When using packages from public repositories, developers must ensure they are not exposing users to security risks. This responsibility ties into the ISTE standard 4.7d, which emphasizes empowering individuals to make informed decisions to protect personal data and curate a secure digital profile. Developers must prioritize security especially in components that handle sensitive data.

One crucial aspect of this responsibility is ensuring the safety of third-party packages and educating others on best practices. For computer science educators, this involves teaching students how to assess package security and encouraging them to use secure alternatives. Educators should also model responsible practices, such as regularly updating dependencies and employing security audits in their projects. Strategies for this were outlined in an earlier post on CRAP detection in NPM.

From an educational standpoint, understanding the security risks associated with public package managers can be incorporated into the SAMR model of educational technology integration. At the Substitution level, students might learn how to install dependencies using package managers. At the Augmentation level, they could explore using tools like npm audit or Snyk to discover package vulnerabilities. The Modification stage would involve students modifying code to replace insecure dependencies, while the Redefinition stage would have them design more secure workflows for integrating third-party libraries into their applications.

References

Decan, A., Mens, T., & Constantinou, E. (2018). On the impact of security vulnerabilities in the npm package dependency network. Proceedings of the 15th International Conference on Mining Software Repositories. https://doi.org/10.1145/3196398.3196401

Latendresse, J., Mujahid, S., Costa, D. E., & Shihab, E. (2022). Not All Dependencies are Equal: An Empirical Study on Production Dependencies in NPM. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. https://doi.org/10.1145/3551349.3556896

Kabir, M. M. A., Wang, Y., Yao, D., & Meng, N. (2022). How Do Developers Follow Security-Relevant Best Practices When Using NPM Packages? 2022 IEEE Secure Development Conference (SecDev). https://doi.org/10.1109/secdev53368.2022.00027

Zerouali, A., Mens, T., Decan, A., & De Roover, C. (2022). On the impact of security vulnerabilities in the npm and RubyGems dependency networks. Empirical Software Engineering, 27(5). https://doi.org/10.1007/s10664-022-10154-1

Zimmermann, M., Staicu, C.-A., Tenny, C., & Pradel, M. (2019). Small World with High Risks: A Study of Security Threats in the npm Ecosystem. 28th USENIX Security Symposium (USENIX Security 19). https://www.usenix.org/conference/usenixsecurity19/presentation/zimmerman

Models to Measure Students’ Learning in Computer Science

As computer science becomes integrated into K-12 education systems worldwide, educators and researchers continuously search for effective methods to measure and understand students’ learning levels in this field. The challenge lies in developing reliable and comprehensive assessment models that accurately and discreetly gauge student learning. Teachers must assess learning to support students’ educational needs better. Similarly, students and parents expect schools to document students’ proficiency in computing and their practical application. Unlike conventional subjects such as math and science, very few relevant assessments are available for K-12 CS education. This article explores specific models used to measure knowledge in various CS contexts and then examines several examples of student learning indicators in computer science.

Randomized Controlled Trials and Measurement Techniques

An innovative approach to measuring student performance in computer science education involves evaluating the effectiveness of teaching parallel programming concepts. Research by Daleiden et al. (2020) focuses on assessing students’ understanding and application of these concepts.

The Token Accuracy Map (TAM) technique supplements traditional empirical analysis methods, such as timings, error counting, or compiler errors, which often lack the depth needed to analyze the cause of errors or to provide detailed insights into the specific problem areas students encounter. The study applied TAM to examine student performance across two parallel programming paradigms: threads and process-oriented programming based on Communicating Sequential Processes (CSP), measuring programming accuracy through an automated process.

The TAM approach analyzes the accuracy of student-submitted code by comparing it against a reference solution using a token-based comparison. Each element of the code, or “token,” is compared to determine its correctness, and the results are aggregated to provide an overall accuracy score ranging from 0% to 100%. This scoring system reflects the percentage of correctness, allowing for a detailed examination of which elements of different programming paradigms students intuitively understand or are more likely to implement correctly.
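The idea of a token-level accuracy score can be illustrated with a short Python sketch. This is not the algorithm from Daleiden et al. (2020): it swaps in Python's standard difflib sequence matching as a stand-in for their comparison, uses a deliberately simple tokenizer, and the two code snippets are hypothetical. It only shows how per-token matches can be aggregated into a percentage and how the unmatched tokens point to the specific element a student got wrong.

```python
import re
from difflib import SequenceMatcher

# Simple tokenizer: identifiers, integers, or any single non-space symbol.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source code into a flat list of tokens."""
    return TOKEN_PATTERN.findall(source)


def token_accuracy(reference, submission):
    """Return (score in percent, list of (reference token, matched?) pairs)."""
    ref_tokens = tokenize(reference)
    sub_tokens = tokenize(submission)
    matcher = SequenceMatcher(a=ref_tokens, b=sub_tokens, autojunk=False)
    matched = [False] * len(ref_tokens)
    for block in matcher.get_matching_blocks():
        for index in range(block.a, block.a + block.size):
            matched[index] = True  # this reference token appears in order
    score = 100.0 * sum(matched) / len(ref_tokens) if ref_tokens else 0.0
    return score, list(zip(ref_tokens, matched))


# Hypothetical reference solution and student submission.
reference = "thread t = spawn(worker);"
submission = "thread t = start(worker);"

score, token_map = token_accuracy(reference, submission)
missed = [token for token, ok in token_map if not ok]
print(f"accuracy: {score:.1f}%")
print("tokens to review:", missed)
```

Aggregating the per-token results across many students would show which tokens are missed most often, which is the kind of granular view the technique is meant to provide.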

This approach extends error counts, offering insights into students’ mistakes at a granular level. Such detailed analysis enables researchers and educators to identify specific programming concepts requiring further clarification or alternative teaching approaches. Additionally, TAM can highlight the strengths and weaknesses of different programming paradigms from a learning perspective, thereby guiding curriculum development and instructional design.

Competence Structure Models in Informatics

Magenheim et al. (2015) describe work toward a competence structure model for informatics, with a focus on system comprehension and object-oriented modelling. This work, part of the MoKoM project (Modeling and Measurement of Competences in Computer Science Education), seeks to create a model that is both theoretically sound and empirically validated. The project’s goals include identifying essential competencies in the field, organizing them into a coherent framework, and devising assessments to measure them accurately. The study employed Item Response Theory (IRT) to construct the test instrument and analyze survey data.

The initial foundation of the competence model was based on theoretical concepts from international syllabi and curricula, such as the ACM’s “Model Curriculum for K-12 Computer Science” and expert papers on software development. This framework encompasses cognitive and non-cognitive skills pertinent to computer science, especially emphasizing system comprehension and object-oriented modelling.

The study further included conducting expert interviews using the Critical Incident Technique to validate the model’s applicability to real-world scenarios and its empirical accuracy. This method was instrumental in pinpointing and defining the critical competencies needed to apply and understand informatics systems. It also provided a detailed insight into student learning in informatics, identifying specific strengths and areas for improvement.

Limitations

A limitation of these approaches is their specificity, which may hinder scalability to broader contexts or different courses. Nonetheless, the findings indicate that detailed, granular measurements can offer valuable insights into the nature and types of students’ errors and uncover learning gaps. The approaches discussed next propose a more general strategy for assessing learning in computer science.

Evidence-Centered Design for High School Introductory CS Courses

Another method for evaluating student learning in computer science involves using Evidence-Centered Design (ECD). Newton et al. (2021) demonstrate the application of ECD to develop assessments that align with the curriculum of introductory high school computer science courses. ECD focuses on beginning with a clear definition of the knowledge, skills, and abilities students are expected to gain from their coursework, followed by creating assessments that directly evaluate these outcomes.

The approach entails specifying the domain-specific tasks that students should be capable of performing, identifying the evidence that would indicate their proficiency, and designing assessment tasks that would generate such evidence. The model further includes an analysis of assessment items for each instructional unit, considering their difficulty, discrimination index, and item type (e.g., multiple-choice, open-ended, etc.). This analysis aids in refining the assessments to gauge student competencies and understanding more accurately.
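For readers unfamiliar with item analysis, the Python sketch below computes two of the statistics mentioned above using their common classical test theory definitions: difficulty as the proportion of students answering an item correctly, and the discrimination index as the difference in that proportion between the highest-scoring and lowest-scoring groups. These definitions, the 27% group size, and the scores are assumptions for illustration; Newton et al. (2021) may have computed their statistics differently.

```python
def item_difficulty(item_scores):
    """Proportion of students who answered the item correctly (0 to 1).

    Under this definition a higher value means an easier item.
    """
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Difference in proportion correct between top and bottom scorers."""
    # Rank student indices from highest to lowest total test score.
    ranked = sorted(
        range(len(total_scores)),
        key=lambda index: total_scores[index],
        reverse=True,
    )
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data for ten students: total test score and whether each
# student answered one particular item correctly (1) or not (0).
total_scores = [95, 90, 85, 80, 70, 65, 60, 50, 40, 30]
item_scores = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

print("difficulty:", item_difficulty(item_scores))
print("discrimination:", discrimination_index(item_scores, total_scores))
```

An item with a discrimination index near zero, or a negative one, separates stronger and weaker students poorly and is a candidate for revision.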

This model offers a more precise measurement of student learning by ensuring that assessments are closely linked to curriculum objectives and learning outcomes.

Other General Student Indicators

The Exploring Computer Science website, a premier resource for research on indicators of student learning in computer science, identifies several key metrics for understanding concepts within the field:

  • Student-Reported Increase in Knowledge of CS Concepts: Students are asked to self-assess their knowledge in problem-solving techniques, design, programming, data analysis, and robotics, rating their understanding before and after instruction.
  • Persistent Motivation in Computer Problem Solving: This self-reported measure uses a 5-point Likert scale to evaluate students’ determination to tackle computer science problems. Questions include, “Once I start working on a computer science problem or assignment, I find it hard to stop,” and “When a computer science problem arises that I can’t solve immediately, I stick with it until I find a solution.”
  • Student Engagement: This metric again relies on self-reporting to gauge a student’s interest in further pursuing computer science in their studies. It assesses enthusiasm and inclination towards the subject.
  • Use of CS Vocabulary: Through pre- and post-course surveys, students respond to the prompt: “What might it mean to think like a Computer Scientist?”. Responses are analyzed for the use of computer science-related keywords such as “analyze,” “problem-solving,” and “programming.” A positive correlation was found between CS vocabulary use and self-reported CS knowledge levels.
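The vocabulary indicator in the last item can be approximated with a simple keyword count. The Python sketch below uses the three example keywords quoted above; the two student responses are made up for illustration, and the exact-word matching is a simplification (it would miss variants such as "analyzing"), so it should not be read as the procedure used by Exploring Computer Science.

```python
import re
from collections import Counter

# Example keywords taken from the list above; a real analysis would use more.
CS_KEYWORDS = ("analyze", "problem-solving", "programming")


def count_cs_vocabulary(response, keywords=CS_KEYWORDS):
    """Count whole-word occurrences of each keyword in one response."""
    text = response.lower()
    counts = Counter()
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, text))
    return counts


# Hypothetical pre- and post-course answers to the survey prompt.
pre_response = "It means being good with computers."
post_response = (
    "It means you analyze a problem, use problem-solving steps, "
    "and test your programming."
)

pre_total = sum(count_cs_vocabulary(pre_response).values())
post_total = sum(count_cs_vocabulary(post_response).values())
print("keywords before:", pre_total)
print("keywords after:", post_total)
```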

Comparing the Models

Each model discussed provides distinct benefits but converges on a shared objective: to gauge students’ understanding of computer science precisely. The Evidence-Centered Design (ECD) model is notable for its methodical alignment of assessments with educational objectives, so that evaluations reflect the intended learning outcomes. Conversely, the randomized controlled trial and its new measurement technique present a solid approach for empirically assessing the impact of instructional strategies on student learning achievements. Finally, the competence structure model offers an exhaustive framework for identifying and evaluating specific competencies within a particular field, like informatics, ensuring a thorough understanding of student abilities. As the field continues to evolve, so will our methods for measuring student success.

References

Daleiden, P., Stefik, A., Uesbeck, P. M., & Pedersen, J. (2020). Analysis of a Randomized Controlled Trial of Student Performance in Parallel Programming using a New Measurement Technique. ACM Transactions on Computing Education, 20(3), 1–28. https://doi.org/10.1145/3401892

Magenheim, J., Schubert, S., & Schaper, N. (2015). Modelling and measurement of competencies in computer science education. KEYCIT 2014: key competencies in informatics and ICT, 7(1), 33–57.

Newton, S., Alemdar, M., Rutstein, D., Edwards, D., Helms, M., Hernandez, D., & Usselman, M. (2021). Utilizing Evidence-Centered Design to Develop Assessments: A High School Introductory Computer Science Course. Frontiers in Education, 6. https://doi.org/10.3389/feduc.2021.695376

Potential of LLMs and Automated Text Analysis in Interpreting Student Course Feedback

Integrating Large Language Models (LLMs) with automated text analysis tools offers a novel approach to interpreting student course feedback. As educators and administrators strive to refine teaching methods and enhance learning experiences, leveraging AI’s capabilities could unlock more profound insights from student feedback. Traditionally seen as a vast collection of qualitative data filled with sentiments, preferences, and suggestions, this feedback can now be more effectively analyzed. This blog will explore how LLMs can be utilized to interpret and classify student feedback, highlighting workflows that could benefit most teachers.

The Advantages of LLMs in Feedback Interpretation

Bano et al. (2023) shed light on the capabilities of LLMs, such as ChatGPT, in analyzing qualitative data, including student feedback. Their research found a significant alignment between human and LLM classifications of Alexa voice assistant app reviews, demonstrating LLMs’ ability to understand and categorize feedback effectively. This indicates that LLMs can grasp the nuances of student feedback, especially when the data is rich in specific word choices and context related to course content or teaching methodologies.
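Alignment between a human coder and an LLM can be quantified in several ways. The Python sketch below shows two common ones, simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The labels are hypothetical, and this is not a reproduction of the analysis in Bano et al. (2023), who may have used different measures.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two raters chose the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    total = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum(counts_a[l] * counts_b[l] for l in all_labels) / (total * total)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of six student comments.
human_labels = ["praise", "concern", "suggestion", "praise", "concern", "praise"]
llm_labels = ["praise", "concern", "praise", "praise", "concern", "suggestion"]

print(f"percent agreement: {percent_agreement(human_labels, llm_labels):.2f}")
print(f"Cohen's kappa: {cohens_kappa(human_labels, llm_labels):.2f}")
```

In this made-up example the raters agree on four of six comments, yet kappa is noticeably lower than the raw agreement because some of that agreement would be expected by chance.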

LLMs excel at processing and interpreting large volumes of text, identifying patterns, and extracting themes from qualitative feedback. Their capacity for thematic analysis at scale can assist educators in identifying common concerns, praises, or suggestions within students’ comments, tasks that might be cumbersome and time-consuming through manual efforts.

Limitations and Challenges

Despite their advantages, LLMs have limitations. Linse (2017) highlights that fully understanding the subtleties of student feedback requires more than text analysis; it demands contextual understanding and an awareness of biases. LLMs might not accurately interpret outliers and statistical anomalies, often necessitating human intervention to identify root causes.

Kastrati et al. (2021) identify several challenges in analyzing student feedback sentiment. One major challenge is accurately identifying and interpreting figurative speech, such as sarcasm and irony, which can convey sentiments opposite to their literal meanings. Additionally, many feedback analysis techniques designed for specific domains may falter when applied to the varied contexts of educational feedback. Handling complex linguistic features, such as double negatives, unknown proper names, abbreviations, and words with multiple meanings commonly found in student feedback, presents further difficulties. Lastly, there is a risk that LLMs might inadvertently reinforce biases in their training data, leading to skewed feedback interpretations.

Tools and Workflows

According to ChatGPT (OpenAI, 2024), a suggested workflow for analyzing data from course feedback forms is summarized as follows:

  1. Data Collection: Utilize tools such as Google Forms or Microsoft Forms to design and distribute course feedback forms, emphasizing open-ended questions to gather qualitative feedback from students.
  2. Data Aggregation: Employ automation to compile feedback data into a single repository, like a Google Sheet or Microsoft Excel spreadsheet, simplifying the analysis process.
  3. Initial Thematic Analysis: Import the aggregated feedback into qualitative data analysis software such as NVivo or ATLAS.ti. Use the software’s coding capabilities to identify recurring themes or sentiments in the feedback.
  4. LLM-Assisted Analysis: Engage an LLM, like OpenAI’s GPT, to further analyze the identified themes, categorize comments, and potentially uncover new themes that were not initially evident. It’s crucial to review AI-generated themes for their accuracy and relevance.
  5. Quantitative Integration: Combine qualitative insights with quantitative data from the feedback forms (e.g., ratings) using tools like Microsoft Excel or Google Sheets. This integration offers a more holistic view of student feedback.
  6. Visualization and Presentation: Apply data visualization tools such as Google Charts or Tableau to create interactive dashboards or charts that present the findings of the qualitative analysis. Employing visual aids like word clouds for common themes, sentiment analysis graphs, and charts showing thematic distribution can render the data more engaging and comprehensible.
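The aggregation and quantitative integration steps of this workflow can be sketched with Python's standard library alone. The CSV text below stands in for a form export; its column names and every response in it are hypothetical. The sketch separates Likert-style columns from open-ended ones, then reports both the mean and the full distribution for each rating question, leaving the comments ready for thematic coding.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical export: one timestamp, two rating questions, one open question.
SAMPLE_EXPORT = """\
Timestamp,The lesson was engaging (1-5),I understand functions (1-5),What helped you learn?
2024-02-01 10:00,5,4,Building the house with a function was fun
2024-02-01 10:02,4,3,I liked working with my partner
2024-02-01 10:05,2,2,The instructions were confusing at first
"""

LIKERT_VALUES = {"1", "2", "3", "4", "5"}


def load_responses(csv_text):
    """Parse CSV text into a list of dictionaries, one per respondent."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def summarize(rows, skip_columns=("Timestamp",)):
    """Return (rating summaries, open-ended comments) keyed by question."""
    ratings, comments = {}, {}
    for column in rows[0].keys():
        if column in skip_columns:
            continue
        values = [row[column].strip() for row in rows]
        if all(value in LIKERT_VALUES for value in values):
            numbers = [int(value) for value in values]
            ratings[column] = {
                "mean": mean(numbers),
                "distribution": Counter(numbers),
            }
        else:
            comments[column] = values
    return ratings, comments


ratings, comments = summarize(load_responses(SAMPLE_EXPORT))
for question, summary in ratings.items():
    print(question, summary["mean"], dict(summary["distribution"]))
for question, answers in comments.items():
    print(question, len(answers), "comments")
```

Keeping the distribution alongside the mean matters with small classes, where one or two low ratings can be hidden by an acceptable average.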

Case Study: Minecraft Education Lesson

ChatGPT’s recommended workflow was used to analyze feedback from a recent lesson on teaching functions in Minecraft Education.

Step 1: Data Collection

A Google Forms survey was distributed to students, which comprised three quantitative five-point Likert scale questions and three qualitative open-ended questions to gather comprehensive feedback.

[Figure: MCE Questionnaire]

Step 2: Data Aggregation

Using Google Forms’ export to CSV feature, all survey responses were consolidated into a single file, facilitating efficient data management.

Step 3: Initial Thematic Analysis

The survey data was then imported into ATLAS.ti, an online thematic analysis tool with AI capabilities, to generate initial codes from the qualitative data. This process revealed several major themes, providing valuable insights from the feedback.

[Figure: Results of AI Coding]

Step 4: Manual Verification and Analysis

Upon reviewing the survey data manually, the main themes identified by ATLAS.ti were confirmed. Additionally, this manual step highlighted specific approaches students took to solve problems presented in the lesson. Generally, the AI-generated codes were quite accurate, but a closer analysis of the comments (like the ones below) shows even more insightful student suggestions.

[Figure: AI Coding]

Step 5: Quantitative Integration

With both qualitative and quantitative data already at hand, a separate quantitative integration step was not needed.

Step 6: LLM-Assisted Analysis and Visualization

Next, themes were further analyzed using ChatGPT’s code interpreter feature. ChatGPT helped analyze the data and summarized the aggregated data very accurately. It even provided Python code for generating additional visualizations, enhancing the interpretation of the feedback.

[Figure: Python pandas code]

ChatGPT’s guidance facilitated the creation of insightful visualizations such as bar charts and word clouds.

[Figure: Bar chart of qualitative data]
[Figure: Word cloud output]

Python offers a wealth of data visualization libraries for even more detailed analysis (https://mode.com/blog/python-data-visualization-libraries).
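The counting that underlies a word cloud or a bar chart of themes is straightforward to sketch. The Python example below is not the code ChatGPT produced for the case study; it is a minimal standard-library illustration with made-up comments and a small hand-picked stop-word list. It tallies word frequencies and prints a plain-text bar chart, and a plotting library could render the same counts graphically.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {
    "the", "a", "an", "and", "to", "of", "i", "it", "was", "with",
    "my", "in", "at", "is", "for", "were", "me",
}


def word_frequencies(comments, top_n=5):
    """Return the top_n most frequent non-stop words as (word, count) pairs."""
    words = []
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(top_n)


def text_bar_chart(pairs):
    """Render (label, count) pairs as a simple text bar chart."""
    lines = []
    for label, count in pairs:
        lines.append(f"{label:<12} {'#' * count} {count}")
    return "\n".join(lines)


# Hypothetical open-ended comments from a lesson on functions.
SAMPLE_COMMENTS = [
    "Functions made building faster",
    "I liked building with functions",
    "Building the house was fun",
]

top_words = word_frequencies(SAMPLE_COMMENTS, top_n=2)
print(text_bar_chart(top_words))
```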

Best Practices for Using LLMs in Feedback Analysis

Research by Bano et al. (2023) and insights from Linse (2017) highlight the potential of LLMs and automated text analysis tools in interpreting student course feedback. Adopting best practices for integrating these technologies is critical for educators and administrators to make informed decisions that enhance teaching quality and the student learning experience, contributing to a more responsive and dynamic educational environment. Below are several recommendations:

  1. Educators or trained administrators must review AI-generated themes and categorizations to ensure alignment with the intended context and uncover nuances possibly missed by the AI. This step is vital for identifying subtleties and complexities that LLMs may not detect.
  2. Utilize insights from both AI and human analyses to inform changes in teaching practices or course content. Then, assess whether subsequent feedback reflects the effects of these changes, thereby establishing an iterative loop for continuous improvement.
  3. Offer guidance on using student course evaluations constructively. This involves understanding the context of evaluations, looking beyond average scores to grasp the distribution, and considering student feedback as one of several measures for assessing and enhancing teaching quality.
  4. This process should act as part of a holistic teaching evaluation system, which should also encompass peer evaluations, self-assessments, and reviews of teaching materials. A comprehensive approach offers a more precise and balanced assessment of teaching effectiveness.

References

Bano, M., Zowghi, D., & Whittle, J. (2023). Exploring Qualitative Research Using LLMs. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2306.13298

Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106. https://doi.org/10.1016/j.stueduc.2016.12.004

Kastrati, Z., Dalipi, F., Imran, A. S., Pireva Nuci, K., & Wani, M. A. (2021). Sentiment Analysis of Students’ Feedback with NLP and Deep Learning: A Systematic Mapping Study. Applied Sciences, 11(9), 3986. https://doi.org/10.3390/app11093986

OpenAI. (2024). ChatGPT (Feb 10, 2024) [Large language model]. https://chat.openai.com/chat

Effective Technology Tools for K-12 CS Teachers

Technology plays a crucial role in teaching computer science and programming concepts to K-12 students. The most effective technology tools include interactive coding platforms such as Scratch, Snap!, and Blockly. These tools provide a user-friendly interface and visual coding blocks, allowing students to learn programming concepts through hands-on activities and projects (Amanullah & Bell, 2020). Additionally, online learning platforms such as Code.org offer computer science resources designed specifically for K-12 teachers. This post examines various technologies used to teach CS in K-12 schools, drawing insights from a comprehensive study on visual programming languages (VPLs) and their suitability across different school levels.

Role of VPLs in K-12 Education

VPLs like Scratch and ALICE have revolutionized CS education in schools. Scratch, developed by MIT, is particularly effective in elementary education due to its simplicity and interactive environment, making it an ideal tool for introducing programming concepts (Sáez-López et al., 2016). Although not web-based, ALICE has had a positive impact at all educational levels: elementary, high school, and undergraduate. Its ability to facilitate learning and enhance student confidence makes it an asset in the CS curriculum (Graczyńska, 2010). In a 2019 study, do Nascimento et al. concluded that different VPLs suit different school levels. The study focused on three VPLs: ALICE, Scratch, and iVProg. The findings indicate that Scratch is strongly suited to elementary education, while ALICE is more appropriate for high school students. iVProg, on the other hand, shows indications of suitability for the high school and undergraduate levels.

Enhancing Computational Thinking with Scratch

Studies have shown that Scratch’s block-based programming approach can significantly improve students’ computational thinking skills. Its integration into various disciplines through programming games and projects encourages creative problem-solving and logical reasoning among students (Stewart & Baek, 2023). In a significant study, Scratch was also found to integrate well into other subjects in the curriculum, such as math, science, and even art and history, where students achieved comprehension and application levels in Bloom’s taxonomy (Sáez-López et al., 2016).

[Figure: Scratch Interface]

Scratch’s intuitive drag-and-drop interface simplifies the programming process, allowing students to focus on the logic behind their creations rather than on code syntax. Overall, the visual programming approach via Scratch was effective for developing computational thinking, improving programming skills, enabling the creation of interactive projects, and supporting active learning pedagogies (Sáez-López et al., 2016). This is significant since Sun, Hu, and Zhou (2022) found that although girls in K-12 had higher computational thinking skills, they had more negative programming attitudes, which may impact their continued development in computational thinking. Visual programming may therefore be a good strategic approach to engaging girls in computer science.

ALICE for STEAM Education

ALICE (which stands for Alice Learning in a Cyberworld Environment) is a free 3D programming platform developed at Carnegie Mellon University. The visual aspect of ALICE makes programming concepts more engaging and hands-on for students. Actions like loops, methods, and events correspond to actual animated motions they can see on screen. This helps concretize abstract coding notions that beginners often struggle to grasp.

[Figure: ALICE Lists]

Graczyńska (2010) highlights several example uses of ALICE targeted at middle school students:

  • Creating videos set to music, with lyrics displayed as subtitles. This combines coding with music appreciation and language arts.
  • Recording narration for animations, like reciting poetry in English or other languages. This boosts public speaking and foreign language skills.
  • Building simple games with sound effects and animations like fire. This makes programming exciting and fun.

After testing ALICE with students, Graczyńska found increased engagement and interest in programming and academics overall. The visual nature of ALICE also helps attract female students to computer science, where they are traditionally underrepresented.

The use of 3D visual programming tools like ALICE has shown positive effects on students’ performance and attitude towards computer programming. Al-Tahat (2019) found that teaching visual programming greatly improved understanding of related concepts in object-oriented programming, making it a strong fit for the intermediate grades.

Challenges and Future Directions

The adoption of these technologies in K-12 computer science (CS) education has shown promise, yet challenges remain to be addressed. There is substantial evidence that incorporating VPLs into the K-12 curriculum significantly boosts female engagement (Sun et al., 2022; Graczyńska, 2010). Therefore, it is important to focus on course design that appeals to diverse learners, including females and underrepresented minorities. Additionally, ongoing research and development are necessary to keep up to date with technological progress and the changing needs of education (McGill et al., 2023). Sáez-López et al. (2016) have suggested that VPLs should be implemented across various subjects, particularly in social sciences and the arts, where their visual nature can inspire creative projects. Lastly, the successful integration of new programming tools hinges on teacher training and professional development. Teachers need robust support to acquire and apply these technologies effectively.

References

Amanullah, K., & Bell, T. (2020). Teaching Resources for Young Programmers: the use of Patterns. 2020 IEEE Frontiers in Education Conference (FIE). https://doi.org/10.1109/fie44824.2020.9273985

Sáez-López, J.-M., Román-González, M., & Vázquez-Cano, E. (2016). Visual programming languages integrated across the curriculum in elementary school: A two year case study using “Scratch” in five schools. Computers & Education, 97, 129–141. https://doi.org/10.1016/j.compedu.2016.03.003

Graczyńska, E. (2010). ALICE as a tool for programming at schools. Natural Science, 02(02), 124–129. https://doi.org/10.4236/ns.2010.22021

do Nascimento, M. D., Felix, I. M., Ferreira, B. M., de Souza, L. M., Dantas, D. L., de Oliveira Brandao, L., & de Oliveira Brandao, A. (2019). Which visual programming language best suits each school level? A look at Alice, iVProg, and Scratch. 2019 IEEE World Conference on Engineering Education (EDUNINE). https://doi.org/10.1109/edunine.2019.8875788

Stewart, W., & Baek, K. (2023). Analyzing computational thinking studies in Scratch programming: A review of elementary education literature. International Journal of Computer Science Education in Schools6(1), 35–58. https://doi.org/10.21585/ijcses.v6i1.156

Sun, L., Hu, L., & Zhou, D. (2022). Programming attitudes predict computational thinking: Analysis of differences in gender and programming experience. Computers & Education181, 104457. https://doi.org/10.1016/j.compedu.2022.104457

Al-Tahat, K. (2019). The Impact of a 3D Visual Programming Tool on Students’ Performance and Attitude in Computer Programming. Journal of Cases on Information Technology, 21(1), 52–64. https://doi.org/10.4018/jcit.2019010104

Teaching Programming with Minecraft Education: A Reflection

Introduction

Integrating innovative tools to enhance learning is essential in the dynamic landscape of computer science education. This term, I embarked on a collaborative journey to weave Minecraft Education into a Programming 11/12 course. Our objective was to enliven the curriculum by presenting programming concepts in a more engaging and interactive manner. This reflection delves into our experiences, with a particular focus on the concept of functions.

Lesson Overview

Our lesson was carefully prepared to guide students through the fundamentals of functions in programming via the Minecraft Education platform. This approach aimed to convert abstract concepts into concrete, relatable experiences, thus making learning both enjoyable and impactful.

The session began with a simple introduction to functions in Minecraft Education using MakeCode, drawing parallels with real-life scenarios to demystify these concepts. The goal was to underscore the significance of reusing code efficiently. For instance, we showcased a function that could construct various parts of a structure, such as walls, roofs, and fences. This hands-on demonstration helped students visualize the workings of functions, deepening their comprehension.
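To make the idea of a reusable building function concrete, here is a minimal sketch in plain Python. It does not use the MakeCode API; the `World` dictionary, `build_wall`, and `build_enclosure` are hypothetical names invented for illustration, and the "world" is just a mapping from coordinates to block names.

```python
# A plain-Python stand-in for a Minecraft world: maps (x, y, z) -> block name.
# This is NOT the MakeCode API; it only mimics the idea of placing blocks.
World = dict[tuple[int, int, int], str]


def build_wall(world: World, x: int, z: int, length: int, height: int,
               block: str, along_x: bool = True) -> None:
    """Place a length-by-height wall of `block` starting at (x, 0, z)."""
    for step in range(length):
        for y in range(height):
            pos = (x + step, y, z) if along_x else (x, y, z + step)
            world[pos] = block


def build_enclosure(world: World, x: int, z: int, size: int, height: int,
                    block: str) -> None:
    """Reuse build_wall four times to make a square enclosure."""
    build_wall(world, x, z, size, height, block, along_x=True)
    build_wall(world, x, z + size - 1, size, height, block, along_x=True)
    build_wall(world, x, z, size, height, block, along_x=False)
    build_wall(world, x + size - 1, z, size, height, block, along_x=False)


world: World = {}
build_enclosure(world, 0, 0, size=5, height=1, block="fence")    # animal pen
build_enclosure(world, 10, 0, size=4, height=3, block="planks")  # barn walls
```

The same `build_wall` function produces both a one-block-high fence and three-block-high barn walls, which is the kind of reuse the demonstration was meant to highlight.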

Subsequently, we organized the students into small teams for a series of Minecraft challenges. Each group applied their newfound knowledge to construct farm elements using coded functions. Students were encouraged to build barns, animal enclosures, and residential structures, and this immersive experience was crucial in reinforcing the lessons and empowering students to explore coding within the game environment. While the MakeCode IDE is freely available online at https://minecraft.makecode.com/, it is important to note that running the code within Minecraft Education itself requires a paid subscription for each student (which we lacked for this iteration).

Following the building activities, groups presented their projects, explained their code, and engaged in Q&A sessions. This exercise culminated in the creation of a complete farm ecosystem (with a small amount of manual intervention), facilitating peer learning and evaluating their understanding of the lesson.

The lesson wrapped up with a debriefing segment, which focused on the role of functions in streamlining complex coding tasks. We also distributed surveys to gauge the students’ experiences with the lesson.

Reflections and Learnings

Reflecting on the teaching process, I’ve recognized the crucial need for thorough preparation ahead of each class. Although the lesson itself was effective, there are areas where we could have utilized our time more judiciously.

Time Management:

Our planning meetings often veered towards administrative topics, detracting from the core lesson content. This experience has ingrained in me the importance of arriving at meetings well-prepared and with preliminary research completed, to maximize our collaborative efforts.

Technical Challenges:

Establishing a connection to the same Minecraft world across various platforms, such as PC and Mac, presented significant hurdles. This impacted our preparations and underscored the necessity for preemptive compatibility checks for future sessions. The tightly controlled environment of Minecraft Education by Microsoft impeded remote learning, suggesting that Minecraft Education is best suited to in-lab settings. Remote functionality was unreliable, as indicated by non-descriptive connection error messages like “timed out,” and support from Microsoft was less than helpful. The trial version of the software, supposedly available to schools with Microsoft logins, also failed to work, potentially necessitating IT intervention.

Student Engagement:

The lesson garnered positive feedback and high engagement levels, with the practical application of programming concepts within a familiar gaming environment being a key factor in its success. Nonetheless, some students noted that the inability to run the code hindered the debugging process. Ensuring every student has access to the necessary software and hardware will be a priority for future lessons.

The Power of Interactive Learning:

A major insight from this endeavour is the profound impact of interactive learning tools such as Minecraft in teaching intricate subjects like programming. Students were more engaged and assimilated the concept of functions more thoroughly compared to conventional teaching methods.

Conclusion

Incorporating Minecraft into our programming curriculum has been enlightening for students and educators. It has accentuated the significance of preparation, flexibility, and the assurance of technical compatibility to facilitate a seamless learning experience. The positive student feedback and evident boost in engagement and comprehension underscore our conviction in the power of interactive learning tools. As we progress, we are determined to refine our methods, confront the technical obstacles, and seek inventive strategies to render education more captivating and effective.

The Role of ChatGPT in Introductory Programming Courses

Introduction

Programming education is on the cusp of a major transformation with the emergence of large language models (LLMs) like ChatGPT. These AI systems have demonstrated impressive capabilities in generating, explaining, and summarizing code, leading to proposals for their integration into coding courses. Aligning with ISTE Standard 4.1e for coaches, which urges the “connection of leaders, educators, and various experts to maximize technology’s potential for learning,” this post examines how ChatGPT and similar tools can be effectively integrated into introductory programming classes. It covers the benefits of AI tutors, insights from educators on their use, and current best practices and trends for deployment in the classroom.

The Current State of AI in Computer Science Education

The current integration of AI in computer science education is showing promising results. ChatGPT excels in providing personalized and patient explanations of programming concepts, offering code examples and solutions tailored to students’ individual needs. Its interactive conversational interface encourages students to engage in a dialogue, solidifying their understanding through active participation and feedback. Students can present coding issues in simple terms and receive a comprehensive, step-by-step explanation from ChatGPT, clarifying fundamental principles throughout the process.

Such dynamic assistance clarifies misunderstandings more effectively than static textbooks or videos. ChatGPT’s round-the-clock availability as an AI tutor offers crucial support, bridging gaps when human instructors are unavailable. According to research by Kazemitabaar et al. (2023), using LLMs like ChatGPT can bolster students’ abilities to design algorithms and write code, reducing the stress often accompanying these tasks. The study also noted increased enthusiasm for learning programming among many students after exposure to LLM-based instruction.

Pros of Incorporating ChatGPT into the Classroom

The rapid advancement of AI systems such as ChatGPT offers many opportunities and poses some challenges in computing education. ChatGPT’s conversational interface and its capability to provide personalized content make it an exceptional asset for adaptive learning in AI-assisted teaching. Biswas (2023) identifies multiple applications for LLMs in educational settings, including their role in creating practice problems and code examples that enhance teaching. Furthermore, ChatGPT can anticipate and provide relevant code snippets tailored to the programming task and user preferences, accelerating development processes. It can also fill in gaps in code by analyzing the existing framework and project parameters. Additionally, LLM-facilitated platforms help with explanations, documentation, and resource location for troubleshooting and diagnosing issues from error messages, streamlining debugging and reducing the time spent on minor yet frustrating problems.

Cons of Incorporating ChatGPT in Education

Despite the advantages of ChatGPT, there is concern that its proficiency in solving basic programming tasks may lead to student overreliance on its code generation, potentially diminishing actual learning, as evidenced by Finnie-Ansley et al. (2022) and Kazemitabaar et al. (2023). Finnie-Ansley et al.’s research indicates that, while LLMs can perform at a high level (scoring in the top quartile on CS1 exams), they are not without significant error rates. Moreover, the benefits attributed to ChatGPT, such as code completion, syntax correction, and debugging assistance, overlap with features already available in modern Integrated Development Environments (IDEs).

Concerns extend to ChatGPT facilitating ‘AI-assisted cheating,’ which threatens academic integrity and assessment validity (Finnie-Ansley et al., 2022). To counteract this, researchers suggest crafting more innovative, conceptual assignments beyond simple coding tasks (Finnie-Ansley et al., 2022; Kazemitabaar et al., 2023). Educators in computing must adopt careful strategies for integrating ChatGPT, using it as a scaffolded instructional tool rather than a crutch for solving exam problems, to maintain a focus on in-depth learning.

Instructors’ Perspectives and Experiences

In a study conducted in 2023, Lau and Guo interviewed 20 introductory programming instructors from nine countries regarding their adaptation strategies for LLMs like ChatGPT and GitHub Copilot. In the near term, most instructors intend to limit the use of LLMs to curb cheating on assignments, which they view as a potential detriment to learning. Their strategies range from emphasizing in-person examinations to scrutinizing code submissions for patterns indicative of LLM use and outright prohibiting certain tools. Some, however, are keen to explore the capabilities of ChatGPT, proposing its cautious application, such as demonstrating its limitations to students by having them assess its output against test cases.

In contemplating the future, these educators showed greater willingness to integrate LLMs as teaching tools, recognizing their congruence with real-world job skills, their potential to enhance accessibility, and their use in facilitating more innovative forms of coursework. For example, they discussed transitioning from having students write original code to evaluating and improving upon code produced by LLMs. A few envisioned LLMs functioning as custom-tailored teaching aids for individual learners.

Pedagogical Strategies and Opportunities for Future Research

Designing problems that demand a deep understanding of concepts rather than the execution of routine coding tasks, which LLMs easily handle, is a vital pedagogical shift proposed by Finnie-Ansley et al. (2022) and Kazemitabaar et al. (2023). Utilizing ChatGPT as an interactive educational tool to complement teaching—instead of as a mere solution provider—may strike an optimal balance between its advantages and potential drawbacks. Given the pace at which AI technology is being adopted in education, there’s a pressing need for further empirical research to identify the most effective ways to integrate these tools and assess their impact on student learning.

References

Biswas, S. (2023). Role of ChatGPT in Computer Programming. Mesopotamian Journal of Computer Science, 8–16. https://doi.org/10.58496/mjcsc/2023/002

Kazemitabaar, M., Chow, J., Carl, M., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. https://doi.org/10.1145/3544548.3580919

Finnie-Ansley, J., Denny, P., Becker, B. A., Luxton-Reilly, A., & Prather, J. (2022). The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. Australasian Computing Education Conference. https://doi.org/10.1145/3511861.3511863

Lau, S., & Guo, P. (2023, August). From “Ban it till we understand it” to “Resistance is futile”: How university programming instructors plan to adapt as more students use AI code generation and explanation tools such as ChatGPT and GitHub Copilot. In Proceedings of the 2023 ACM Conference on International Computing Education Research-Volume 1 (pp. 106-121). https://doi.org/10.1145/3568813.3600138

Measuring Student Contribution in a Software Engineering Team

Introduction

In software engineering, there is very little consensus on how to measure an individual developer’s contribution. Although many measures have been proposed, their usefulness in the industry lacks validation, particularly from the perspectives of team leaders and managers (Lima et al., 2015). The lack of measurement also challenges educators (Gardner et al., 2003). This post will examine student developer contributions within the context of a software engineering project.

ISTE Standard 4.6 advocates for ed tech coaches to be data-driven decision-makers using qualitative and quantitative data to inform their decisions. Standard 4.6b states, “Support educators to interpret qualitative and quantitative data to inform their decisions and support individual student learning.” Techniques discussed in this article could be used to measure student engagement and fulfillment in a team project and give insight into where instruction can be altered in a software engineering course.

I will begin by examining the use of chat platforms like Discord to track individual student contributions. Next, I’ll discuss the role of peer evaluations in assessing team member input. Lastly, I’ll introduce repository mining techniques to quantify these contributions.

Live Chat Activity

We’ll start with what I consider the least effective among the three metrics. In recent years, many modern developers have adopted Discord as a tool for real-time communication and collaboration in software engineering projects. Fundamentally, Discord channels serve as dedicated spaces for text, voice, and video communication. In educational contexts, these channels can be structured to reflect the various teams within a software project, facilitating organized, topic-specific discussions. Such channels can host various activities, from casual interactions and planning sessions to problem-solving discussions and code reviews, closely mirroring a real-world software development environment. Furthermore, Discord captures all these interactions, creating a comprehensive, searchable archive of every conversation and exchange.

Moreover, thanks to its bot-integration features, Discord is increasingly seen as an innovative tool for gauging student contributions in team-based projects. Analytical bots like Statbot offer detailed statistics on individual interactions on the platform, enabling the assessment of each student’s engagement. Chat histories also supply qualitative data on the quality of contributions in software engineering team projects.
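As a rough illustration of the kind of per-student statistic such a bot might report, the sketch below counts messages and words per author. It does not use Statbot or the Discord API; the `chat_log` data and the `message_stats` helper are hypothetical and assume the channel history has already been exported as (author, message) pairs.

```python
from collections import Counter

# Hypothetical export of a team channel: (author, message) pairs.
chat_log = [
    ("ana", "Pushed the login form, can someone review?"),
    ("ben", "On it."),
    ("ana", "Thanks! I'll start on the tests next."),
    ("cy", "ok"),
    ("ben", "Left two comments on the PR about input validation."),
]


def message_stats(log):
    """Return {author: (message_count, total_words)} for a chat log."""
    counts = Counter(author for author, _ in log)
    words = Counter()
    for author, text in log:
        words[author] += len(text.split())
    return {author: (counts[author], words[author]) for author in counts}


stats = message_stats(chat_log)
for author, (messages, total_words) in sorted(stats.items()):
    print(f"{author}: {messages} messages, {total_words} words")
```

Counts like these say nothing about whether a message was useful, which is why they are best read alongside the conversations themselves.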

However, while bots offer valuable quantitative and analytical insights, it’s important to complement this data with qualitative evaluations. Direct observations, feedback sessions, and individual discussions remain indispensable for grasping the subtleties of each student’s input. It’s also vital to address privacy concerns and uphold ethical standards in monitoring, ensuring clear guidelines and transparency from the instructor’s side.

Peer Evaluations

Gardner et al. (2003) conducted a study exploring the use of group member ratings to gauge relative contributions among students in a software engineering team project course. At the end of the project, students rate each team member’s contributions across four criteria using a five-point scale:

  • Attendance at team meetings.
  • Volunteering for and carrying out tasks.
  • Quality of work performed.
  • Effectiveness in communicating ideas.

The findings suggest that these anonymous peer ratings are reliable for ranking team members on their contributions. While students often rate themselves higher than their teammates, the relative contributions ranking remains consistent, which aligns with previous research (West, 2018, Ch.16).
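A small sketch of how such ratings could be aggregated is shown below. The data is made up, and the averaging scheme (a simple mean over raters and criteria, optionally excluding self-ratings) is my own assumption rather than the procedure used in the study.

```python
from statistics import mean

CRITERIA = ("attendance", "tasks", "quality", "communication")

# ratings[rater][ratee] = scores (1-5) in CRITERIA order; made-up data.
ratings = {
    "ana": {"ana": (5, 5, 5, 5), "ben": (4, 4, 3, 4), "cy": (2, 3, 3, 2)},
    "ben": {"ana": (5, 4, 5, 4), "ben": (5, 5, 4, 5), "cy": (3, 2, 3, 3)},
    "cy":  {"ana": (4, 5, 4, 5), "ben": (4, 3, 4, 4), "cy": (4, 4, 4, 4)},
}


def peer_scores(ratings, include_self=False):
    """Average each student's received ratings over raters and criteria."""
    received = {}
    for rater, given in ratings.items():
        for ratee, scores in given.items():
            if rater == ratee and not include_self:
                continue  # drop self-ratings unless explicitly requested
            received.setdefault(ratee, []).extend(scores)
    return {student: mean(vals) for student, vals in received.items()}


def ranking(scores):
    """Order students from highest to lowest average score."""
    return sorted(scores, key=scores.get, reverse=True)


without_self = peer_scores(ratings)
with_self = peer_scores(ratings, include_self=True)
print(ranking(without_self), ranking(with_self))
```

In this toy data every student rates themselves higher than their teammates do, yet the ranking is the same with or without self-ratings, mirroring the pattern described above.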

This approach quantifies peer perceptions of engagement and effort. It motivates students to interact and collaborate and allows teams to self-manage contributions. However, limitations exist. Students may not accurately judge true contributions. Dominant personalities could influence ratings. Moreover, if grades hinge directly on these ratings, it might encourage score inflation.

Despite their limitations, peer ratings offer a systematic method to encourage and gauge participation in team projects. They represent the firsthand insights of teammates into individual efforts and team dynamics. Instructors should triangulate peer evaluations with other performance indicators to mitigate potential biases. When applied thoughtfully, group member ratings can be a scalable tool to enhance accountability and ensure equitable effort distribution within student engineering teams.

Using Git Repositories

While subjective peer evaluations are commonly used, analyzing data from git repositories provides an objective lens into individual contributions, revealing insights into aspects like collaboration patterns, subsystem ownership, and consistency of participation (Lima et al., 2015). Instructors can combine these repository-based metrics with subjective evaluations to better assess student effort and engagement.

A fundamental metric is examining each student’s number of commits over time, called code contribution (Lima et al., 2015). This helps reveal whether students contribute regularly throughout the project or make concentrated commits right before deadlines. Students with relatively few commits thinly spread across the weeks likely contributed minimally, while a student with a steady stream of commits each week demonstrates consistent engagement (Glassy, 2006).
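One way to look at commit timing is to bucket commits by author and week. The sketch below assumes the history has been exported as `author|YYYY-MM-DD` lines, for example with `git log --pretty=format:"%an|%ad" --date=short`; the sample data and the `commits_per_week` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date


def commits_per_week(log_lines):
    """Count commits per author per ISO week from 'author|YYYY-MM-DD' lines."""
    weekly = defaultdict(lambda: defaultdict(int))
    for line in log_lines:
        author, day = line.strip().split("|")
        iso = date.fromisoformat(day).isocalendar()
        weekly[author][(iso[0], iso[1])] += 1  # key: (ISO year, ISO week)
    return {author: dict(weeks) for author, weeks in weekly.items()}


# Made-up history: ana commits steadily, ben commits only in the final week.
sample = [
    "ana|2024-01-08", "ana|2024-01-16", "ana|2024-01-24", "ana|2024-01-30",
    "ben|2024-01-30", "ben|2024-01-31", "ben|2024-01-31", "ben|2024-02-01",
]
weekly = commits_per_week(sample)
for author, weeks in weekly.items():
    print(author, sorted(weeks.items()))
```

Both students have four commits in total, but the weekly breakdown separates steady participation from a burst right before a deadline.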

Examining the content of commits also provides insights into contribution quality. Code complexity is widely accepted as a measure of contribution because it reflects the difficulty of the sub-problem being solved. The best-known complexity measure, cyclomatic complexity, was proposed by McCabe in 1976 and is still widely used today to examine git repositories. The measures compare code complexity before and after a team member has altered it. Consistently low commit complexity suggests weaker contributions to the team’s software development processes.
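As a rough sketch of a before-and-after comparison, the code below approximates McCabe's cyclomatic complexity for Python source by counting decision points with the standard `ast` module. The set of node types counted is a simplification I chose for illustration, and the `before` and `after` snippets are invented; dedicated tools apply more complete rules.

```python
import ast

# Node types that add a decision point (an approximation of McCabe's metric).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                 ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # each extra operand of and/or adds one more path
            complexity += len(node.values) - 1
    return complexity


# Hypothetical file contents before and after a student's commit.
before = """
def grade(score):
    return "pass" if score >= 50 else "fail"
"""

after = """
def grade(score):
    if score < 0 or score > 100:
        raise ValueError("score out of range")
    for cutoff, letter in ((90, "A"), (75, "B"), (50, "C")):
        if score >= cutoff:
            return letter
    return "F"
"""

delta = cyclomatic_complexity(after) - cyclomatic_complexity(before)
print(f"complexity change introduced by the commit: {delta:+d}")
```

A positive change only shows that the commit added branching logic; it does not by itself show that the added logic was necessary or correct.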

A variation of the code complexity measure is the set of bug-related measures, which capture a developer’s contribution to introducing and fixing bugs. However, these measures have limitations because some bug fixes do not require writing code, so the developer’s effort goes uncounted (Lima et al., 2015). Advanced repository analysis can also reveal collaboration patterns within student teams. Poncin et al. (2011) used FRASR to extract event logs from student repository data and ProM to analyze the resulting development process. This analysis can also account for developer roles and adherence to certain development models.

Of course, reliance solely on git metrics has limitations. First, commits mainly represent coding contributions, overlooking other forms of participation like verbal collaboration and project leadership (Lima et al., 2015). Second, students can artificially inflate their repository activity metrics if they know the algorithm being used. Despite these drawbacks, analyzing git data provides valuable insights into individual participation on student software teams. Instructors should interpret repository metrics not as absolute contribution measures but as launching points for further investigation.

Conclusion

By balancing quantitative git data with qualitative peer evaluations, product assessments, and student interviews, instructors can obtain a more equitable evaluation of individuals. Nonetheless, there is a strong correlation between subjective and objective measures of contribution to a project (Hundhausen et al., 2022). Software engineering courses require team projects, but assessing individual accountability remains vital. Combining subjective reviews and objective repository analysis helps reveal a more accurate picture of each student’s contributions and commitment.

References

Lima, J., Treude, C., Figueira Filho, F., & Kulesza, U. (2015). Assessing developer contribution with repository mining-based metrics. https://doi.org/10.1109/icsm.2015.7332509

Gardner, W. (2003). Assessing individual contributions to group software projects. In 8th Western Canadian Conference on Computing Education (WCCCE’03) (pp. 33-50).

Hundhausen, C. D., Conrad, P. T., Carter, A. S., & Adesope, O. (2022). Assessing individual contributions to software engineering projects: a replication study. Computer Science Education, 32(3), 335–354. https://doi.org/10.1080/08993408.2022.2071543

West, R. E. (2018). Foundations of Learning and Instructional Design Technology. https://doi.org/10.59668/3

Glassy, L. (2006). Using version control to observe student software development processes. Journal of Computing Sciences in Colleges, 21(3), 99–106.

McCabe, T. J. (1976). A Complexity Measure. IEEE Transactions on Software Engineering, SE-2(4), 308–320. https://doi.org/10.1109/tse.1976.233837

Poncin, W., Serebrenik, A., & van den Brand, M. (2011). Mining student capstone projects with FRASR and ProM. https://doi.org/10.1145/2048147.2048181

Teaching Computer Science with Minecraft

Introduction to Minecraft

Minecraft is one of the most popular games of 2023, boasting over 140 million monthly active users, according to searchlogistics.com. Despite this popularity, many players overlook that Minecraft offers an engaging and immersive environment for learning terminal commands, programming basics, computational thinking, and even artificial intelligence. ISTE standard 4.3a for coaches indicates that a successful coach should “Establish trusting and respectful coaching relationships that encourage educators to explore new instructional strategies.” So, in this blog post, I will delve into the educational benefits of Minecraft and explore the differences between the Java and Education editions.

While Minecraft is often regarded as merely a game, educators have recognized its potential as a valuable learning tool. At its core, Minecraft is built upon programming concepts. Players use blocks made of various materials to construct anything they can imagine, from simple houses to complex machines that require advanced knowledge of electronics, chemistry, and physics. This encourages computational thinking, creativity, and problem-solving as students work to bring their visions to life.

Concerning programming, Minecraft helps teach fundamental coding concepts, including commands, functions, variables, loops, and conditionals. Students can employ block-based coding or full-fledged programming languages such as Python and JavaScript to automate actions within the game. This hands-on approach to learning captivates students more effectively than traditional coding lessons, as Minecraft provides them with an imaginative space to immediately apply their newfound skills. Creating Minecraft modifications (mods) teaches students how to extend existing programs, a critical programming skill.

Minecraft Versions

Several versions of Minecraft are available for players to choose from, including Minecraft: Java Edition, Minecraft: Bedrock Edition, Minecraft: Education Edition, and Minecraft: Pocket Edition. However, for the specific purpose of our educational analysis, we will concentrate solely on the Java and Education editions. These two versions offer unique features and opportunities for learning that make them particularly relevant in an educational context.

Minecraft: Java Edition

The Java Edition is the original version of Minecraft developed in 2009 by Mojang Studios for Windows, macOS, and Linux, and maintains its popularity among long-time Minecraft players.

The Java Edition offers distinct advantages when teaching advanced computer science concepts due to its “mod-ability” and access to the source code of the game environment. The semi-open-source nature of the Java Edition allows for limitless customization through mods and plugins. Writing mods can illustrate a wide range of advanced programming concepts, including event handling, parallel programming, algorithms, data structures, debugging, and software design patterns. Developing mods not only imparts practical software development skills but also encourages students to show their creativity.

The Minecraft community has produced numerous mods that cater to various lesson plans. For instance, ComputerCraft introduces programmable turtle robots, while RedstonePlus enhances the game with advanced circuitry. The diversity of available mods supports a wide range of educational objectives, not only in CS but other disciplines.

Minecraft: Education/Bedrock Edition

Minecraft: Bedrock Edition was initially released in August 2011 and is particularly advantageous for classrooms with various devices. Bedrock Edition supports mobile devices such as iPads and Android tablets, which many schools already incorporate into their teaching environments. This enables students to start their Minecraft lessons on a classroom desktop computer during the day and seamlessly continue playing on their smartphones or game consoles at home.

However, Bedrock Edition offers less mod support and limited access to code customization. Minecraft Education Edition is a version of Bedrock specifically tailored for classroom use. According to Microsoft, it “typically runs about one full version behind the current Minecraft Bedrock production version” (FAQ: Game Features, 2023).

Advantages of Minecraft Education in the Classroom

One of the most significant advantages of Minecraft Education in a computer science course is its block-based CodeBuilder / MakeCode editor, similar to Scratch or Snap. This editor allows students to drag and drop commands to perform actions in the game. Younger students can learn coding logic and structure by creating houses, gardens, and machines using these visual blocks before transitioning to text-based programming languages like Python or JavaScript.

Another advantage of Education Edition is the teachers’ ability to implement special restrictions, such as limiting chat or preventing students from destroying blocks. These classroom controls create a safe environment for student exploration. Teachers can also switch to spectator mode to observe students and provide feedback, and they can build worlds and restrict access as needed. Here is a quick start guide for reference.

The Education Edition library offers hundreds of pre-made interactive worlds and lesson plans aligned with computer science curriculum standards (source: https://education.minecraft.net/en-us/resources/computer-science-subject-kit). Teachers can find lesson plans tailored to any grade level, making it much easier for educators to get started with Minecraft compared to building worlds from scratch.

Bile (2022) found that children aged 8 to 10 in a Minecraft education setting were able to solve abstract and complex scientific problems without prior prompting or theoretical knowledge. The game format also helped students retain knowledge better. Vostinar & Dobrota (2022) similarly found that in a primary school class, even though the majority of students had not programmed before in either block-based languages or Python, they found the lesson enjoyable and easy. Furthermore, according to Klimová et al. (2021), girls in grades 5-10 typically outperform boys in Minecraft education coding challenges, suggesting it may be a valuable tool for increasing diversity in computer science.

Disadvantages of Minecraft

As Vostinar & Dobrota (2022, p. 652) pointed out, there are significant disadvantages to using Minecraft in education. One such drawback is that Minecraft is not free and requires an additional cost per student, which, as mentioned in my previous post, raises ethical concerns about the practice of making students pay for educational software. Another disadvantage is that Minecraft may only appeal to a certain type of student, particularly those with a more creative inclination, potentially excluding students who do not have an affinity for the game.

Furthermore, teachers must become proficient in the game’s mechanics and capabilities to integrate it into the classroom effectively. Given the abundance of “cheats” in Minecraft, more experienced players may find trivial command-line solutions to problems if the teacher is unaware of their existence. Finally, as highlighted by Vostinar & Dobrota (2022), it’s essential to impose adequate constraints on the virtual world, especially when students collaborate, to prevent them from destroying the world with TNT blocks and other mining tools.

References:

Vostinar, P., & Dobrota, R. (2022). Minecraft as a Tool for Teaching Online Programming. 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO). https://doi.org/10.23919/mipro55190.2022.9803384

Bile, A. (2022). Development of intellectual and scientific abilities through game-programming in Minecraft. Education and Information Technologies, 1–16. https://doi.org/10.1007/s10639-022-10894-z

Klimová, N., Sajben, J., & Lovászová, G. (2021). Online Game-Based Learning through Minecraft: Education Edition Programming Contest. https://doi.org/10.1109/educon46332.2021.9453953

FAQ: Game Features. (2023, September 15). Minecraft Education. https://educommunity.minecraft.net/hc/en-us/articles/360047117692-FAQ-Game-Features

The Pros and Cons of Autograders in Programming Courses

Programming courses typically require assignments where students write code to fulfill specific specifications. In such courses, an autograder serves as an automated tool designed to assess student code submissions by conducting input and output tests. Autograders have been in existence since the inception of computer science as a field of study (Hollingsworth, 1960). More recently, with the increase of massive online programming courses hosting up to 500 students, autograders have gained popularity as an efficient means for grading programming assignments (Keuning et al., 2018). They are instrumental in student engagement (Iosup & Epema, 2014) and pivotal in providing students with constructive feedback (Keuning et al., 2018). However, like any educational technology, autograders come with their own set of advantages and disadvantages that warrant consideration. This post aims to explore the significant pros and cons of employing autograders for assessments in programming courses.

Several renowned proprietary programming autograders are currently available, including CodePost, CodeGrade, Codio, and Mimir. Each tool offers a wealth of academic programming resources, including built-in problems, user-friendly interfaces, flexible question setting, and code review capabilities. However, these companies impose a substantial annual fee on institutions, ranging from $20,000 to $100,000 CAD, for a standard school comprising 1000 students. Additionally, each student is required to pay a monthly fee between $10 and $50 CAD.

In my view, such pricing is excessive (and greedy) and contradicts the principles outlined in the computer science code of ethics, particularly when the software is intended to advance software development. As a result, many post-secondary institutions opt to develop and maintain autograders in-house, tailoring them to their specific preferences. This approach allows faculty to propose new features and enhancements, and students can also contribute suggestions for improvement.

Advantages of Autograders

One of the most compelling incentives for using an autograder is the significant time savings it offers instructors compared to manual grading. Studies indicate that autograders can assess assignments at least three to four times faster than human graders (Ihantola et al., 2010; Keuning et al., 2018). This substantial reduction in grading workload allows instructors to allocate more time to essential teaching tasks such as lesson planning, curriculum development, and providing student support and feedback. The time savings can be particularly substantial in large classes.

Autograders also benefit students by providing quicker feedback on their work. This is especially valuable in introductory programming classes, where receiving prompt results on smaller assignments can significantly enhance student learning and motivation (Keuning et al., 2018). Unlike human grading, which can take days or weeks, autograders can assess submissions within seconds or minutes and instantly inform students whether their code has passed or failed the test cases. This expedited feedback allows students to validate and refine their work much more rapidly than traditional grading methods permit.

A prevalent concern with human graders is the inconsistency in grading from one assignment to another, from one student to another, or even within a single assignment. Factors such as fatigue, emotional states, and biases can impact the quality of human grading, potentially leading to unfairness or errors. Autograders, by contrast, eliminate this subjectivity by applying uniform standards and tests to all submissions, ensuring consistent and equitable grading across the entire class, and thereby enhancing student satisfaction (Hagerer, 2021).

In courses that employ autograders, students quickly learn the necessity of writing code that meets all the autograder test cases to secure maximum assignment credit. While the efficacy of test-driven development (TDD) as a software testing methodology is debatable, this workflow provides students with experience in the TDD framework. Here, students continually run tests on their code to rectify errors and attain the desired functionality (Wang et al., 2011). Essentially, autograders compel students to consider testing as an integral part of coding, rather than merely striving to meet the minimal functional requirements.

Disadvantages of Autograders

A significant drawback of autograders, frequently cited in the literature, is their inflexibility compared to human graders (Ihantola et al., 2010; Keuning et al., 2018; Wang et al., 2018). Autograders strictly apply identical test cases to all submissions without exception. Consequently, creative solutions that meet the assignment requirements but deviate from the expected implementation or output format are marked incorrect. Even a minor discrepancy such as a missing whitespace character can be the difference between a pass and a fail. Unlike autograders, human graders can exercise judgment to accommodate alternative approaches.
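The whitespace problem can be illustrated with a short Python sketch. The two comparison functions below are hypothetical and are not taken from any specific autograder; the tolerant variant is just one assumed policy an in-house tool might adopt.

```python
def strict_match(actual: str, expected: str) -> bool:
    """Exact, character-for-character comparison."""
    return actual == expected


def tolerant_match(actual: str, expected: str) -> bool:
    """Compare token by token, ignoring differences in whitespace runs
    and leading or trailing whitespace (an assumed, more lenient policy)."""
    return actual.split() == expected.split()


expected_output = "Total: 42\n"
student_output = "Total: 42"  # correct answer, but the trailing newline is missing

print(strict_match(student_output, expected_output))    # False
print(tolerant_match(student_output, expected_output))  # True
```

Even the tolerant version remains rigid in other ways: an output such as "Total:42" would still fail, because the comparison is by whitespace-separated tokens. A human grader would likely accept it without a second thought.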

Most autograders assess the functional correctness of student code by evaluating its output for given tests. However, programming courses also aim to instill good coding practices in students, such as readability, modularization, adherence to naming conventions, coherent design, and appropriate commenting. Autograders do not adequately assess these crucial design and style aspects, which can lead students to neglect good design principles as long as their code passes the functionality tests.

Another concern is that while autograders are designed to offer students a structured means to advance their knowledge across multiple courses, achieving uniformity in their application across various courses is challenging, especially in larger institutions. Typically, post-secondary institutions employ autograders to maintain consistency across different courses, enabling students to track their progress effectively. However, in institutions where numerous faculty members teach diverse courses with varying requirements, achieving universal acceptance and use of autograders is complex. Faculty members may prefer different tools they are more comfortable with, and some might choose not to use autograders. This results in a lack of uniformity in tool usage from one course to another, creating a disjointed student experience.

Relying exclusively on autograders poses the risk of students learning to pass test cases without acquiring a deeper understanding of programming concepts and problem-solving skills. The emphasis on meeting the autograder’s criteria can lead students to adopt a procedural approach, focusing on achieving the correct output rather than understanding the underlying logic. Some might resort to a trial-and-error method, tweaking their program until it gains autograder approval. While this approach may secure the desired grades, it does not foster genuine understanding or long-term retention of knowledge. Baniassad et al. (2021) introduced a submission penalty at the University of British Columbia to discourage over-reliance on their in-house autograding tool. This adaptation exemplifies the flexibility of modifying tool requirements, a possibility uniquely available when the tool is developed in-house.
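As a rough illustration of how a submission penalty might be computed, consider the Python sketch below. It is not the scheme described by Baniassad et al. (2021); the function name penalized_score, the free allowance of five submissions, the two-point deduction, and the twenty-point cap are all assumptions chosen only for the example.

```python
def penalized_score(raw_score: int, submissions: int,
                    free_submissions: int = 5,
                    penalty_per_extra: int = 2,
                    max_penalty: int = 20) -> int:
    """Deduct points (out of 100) for each submission beyond a free allowance.

    All thresholds are hypothetical defaults, not values from any real course.
    """
    extra = max(0, submissions - free_submissions)
    penalty = min(max_penalty, extra * penalty_per_extra)
    return max(0, raw_score - penalty)


print(penalized_score(100, 3))   # 100: within the free allowance
print(penalized_score(100, 8))   # 94: three extra submissions
print(penalized_score(100, 30))  # 80: penalty capped at 20 points
```

Capping the penalty is a design choice: it keeps the deterrent in place without letting trial and error wipe out an otherwise correct solution.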

Finally, like any web-based software system, autograders can experience technical issues that lead to grading failures and student frustration. The UC Berkeley incident highlights the “single point of failure” risk where an autograder disruption blocks all grading capabilities. Unlike distributed human graders, a centralized automated grader represents a vulnerability to technical problems. Some students may fail to meet deadlines through no fault of their own. Furthermore, if instructors refuse to make accommodations for autograder malfunctions, students can feel cheated and perceive the grading as unfairly disconnected from actual instruction. This speaks to larger concerns around over-reliance on algorithmic systems in education. Automated aids like autograders should not be seen as the sole means of assessment.

Conclusion

The existing body of research on autograders underscores that they are not a panacea for replacing human graders entirely. Instead, to optimize their advantages and mitigate their limitations, autograders are most effective when thoughtfully integrated into a course assessment strategy, complemented by manual grading where it is most beneficial. Below are some best practices for incorporating autograders effectively:

  • Employ autograders for basic functionality testing, while manually reviewing selected assignments for flexibility, creativity, and design.
  • Utilize autograders to assess the correctness of core logic, and rely on human graders to evaluate structure, style, and readability.
  • Complement autograder evaluations with human feedback on prevalent mistakes and areas requiring enhancement.
  • Impose penalties for excessive submissions to discourage over-reliance on the autograder.

Proper integration of autograders aligns with technology integration frameworks like SAMR, enhancing existing processes without entirely transforming the grading in programming courses. It also redefines the manner in which students engage with programming, introducing a more gamified approach. Like any educational technology, the value of autograders is derived from their strategic utilization within well-defined goals and contexts.

References

Hollingsworth, J. (1960). Automatic graders for programming classes. Communications of the ACM, 3(10), 528–529. https://doi.org/10.1145/367415.367422

Keuning, H., Jeuring, J., & Heeren, B. (2016). Towards a Systematic Review of Automated Feedback Generation for Programming Exercises. Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education. https://doi.org/10.1145/2899415.2899422

Iosup, A., & Epema, D. (2014). An experience report on using gamification in technical higher education. Proceedings of the 45th ACM Technical Symposium on Computer Science Education – SIGCSE ’14. https://doi.org/10.1145/2538862.2538899

Ihantola, P., Ahoniemi, T., Karavirta, V., & Seppälä, O. (2010). Review of recent systems for automatic assessment of programming assignments. Proceedings of the 10th Koli Calling International Conference on Computing Education Research – Koli Calling ’10. https://doi.org/10.1145/1930464.1930480

Hagerer, G. (2021). An Analysis of Programming Course Evaluations Before and After the Introduction of an Autograder. Ieeexplore.ieee.org.

Wang, T., Su, X., Ma, P., Wang, Y., & Wang, K. (2011). Ability-training-oriented automated assessment in introductory programming course. Computers & Education, 56(1), 220–226. https://doi.org/10.1016/j.compedu.2010.08.003

Baniassad, E., Zamprogno, L., Hall, B., & Holmes, R. (2021). STOP THE (AUTOGRADER) INSANITY: Regression Penalties to Deter Autograder Overreliance. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education. https://doi.org/10.1145/3408877.3432430

Reflecting on a Study of Competitive Programming and Cultural Inclusion

Length of Study

The study is designed to take place over two academic terms, which provides adequate time to collect meaningful data. The inclusion of an initial summer term without competitive programming establishes a baseline for comparison. The second summer term incorporates competitive programming using standardized questions, allowing assessment of this pedagogical approach. The fall term offering adds the dimension of culturally relevant questions, enabling analysis of their impact. Extending the study over multiple terms enables more robust data collection and analysis.

Promoting Active and Engaged Learning

The core content is delivered through weekly lectures focused on programming concepts. The competitive programming contests complement the lectures by providing opportunities to practice applying concepts. Weekly competitive programming contests foster active learning in several key ways. Students must apply conceptual knowledge to solve concrete programming problems. This process reinforces their understanding and helps identify knowledge gaps. The contest format adds an engaging gamification element through scoring, feedback, and peer comparison. Using standardized questions initially assesses whether baseline content needs are being met.

Introducing culturally relevant questions aims to promote better integration of concepts by relating them to students’ cultural knowledge and experiences. Having students co-create contest questions in the fall term further activates learning. They must think critically to develop culturally relevant problems that integrate with the content. This approach promotes deeper engagement with the material and encourages collaboration with classmates, allowing students to take ownership of their learning.

Addressing Teachers’ Needs

The study aims to provide teachers with insight into using competitive programming and culturally relevant pedagogy. The data collected will help determine the effectiveness of these approaches in an international educational setting. Instructors will gain an understanding of how competitive programming engages students versus standardized practice problems. They will also see whether student-created culturally relevant questions increase participation and motivation. The study addresses teachers’ needs for effective and inclusive instructional strategies. They will gain practical knowledge from the comparative data on different contest designs.

Promoting Collaborative Participation

Collaboration is encouraged through the group development of culturally relevant contest questions. Students can brainstorm and build on each other’s ideas, which fosters teamwork. Producing questions from diverse cultural perspectives requires working together. Students are also given the choice of problem-solving in teams. Students can motivate each other and strategize in groups for the competitions. Their scores are tracked on a collective leaderboard which reinforces the collaborative element. The shift from individual to team contest creation necessitates and enables productive collaboration.

The multi-term study design, interactive contest format, customized problems, and collaborative elements demonstrate an interesting pedagogical approach that promotes engaged and inclusive learning. The results should provide valuable insights for computer science educators.