Measuring Student Contribution in a Software Engineering Team

Introduction

In software engineering, there is little consensus on how to measure an individual developer's contribution. Although many measures have been proposed, their usefulness in industry lacks validation, particularly from the perspective of team leaders and managers (Lima et al., 2015). The absence of accepted measures also challenges educators (Gardner et al., 2003). This post examines student developer contributions within the context of a software engineering project.

ISTE Standard 4.6 advocates for ed tech coaches to be data-driven decision-makers using qualitative and quantitative data to inform their decisions. Standard 4.6b states, “Support educators to interpret qualitative and quantitative data to inform their decisions and support individual student learning.” Techniques discussed in this article could be used to measure student engagement and fulfillment in a team project and give insight into where instruction can be altered in a software engineering course.

I will begin by examining the use of chat platforms like Discord to track individual student contributions. Next, I’ll discuss the role of peer evaluations in assessing team member input. Lastly, I’ll introduce repository mining techniques to quantify these contributions.

Live Chat Activity

We’ll start with what I consider the least effective of the three approaches. In recent years, many development teams have adopted Discord as a tool for real-time communication and collaboration in software engineering projects. Fundamentally, Discord channels serve as dedicated spaces for text, voice, and video communication. In educational contexts, these channels can be structured to reflect the various teams within a software project, facilitating organized, topic-specific discussions. Such channels can host a range of activities, from casual interactions and planning sessions to problem-solving discussions and code reviews, closely mirroring a real-world software development environment. Furthermore, Discord captures all of these interactions, creating a comprehensive, searchable archive of every conversation and exchange.

Moreover, thanks to its bot-integration features, Discord is increasingly seen as an innovative tool for gauging student contributions in team-based projects. Analytical bots like Statbot offer detailed statistics on individual activity on the platform, enabling the assessment of each student’s engagement. Chat histories also preserve a record from which the quality of each student’s contributions can be judged.
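For instructors who want raw numbers without relying on a third-party bot, a short script using the discord.py library can tally messages per student in a team channel. The sketch below is illustrative only: the channel ID and token are placeholders, the bot needs the message-content intent and permission to read message history, and message counts are at best a rough proxy for engagement.

```python
# Minimal sketch: count messages per author in one channel with discord.py.
# CHANNEL_ID and BOT_TOKEN are placeholders the instructor must supply.
from collections import Counter

import discord

CHANNEL_ID = 123456789012345678  # hypothetical team channel ID
BOT_TOKEN = "your-bot-token-here"

intents = discord.Intents.default()
intents.message_content = True  # required to read message content in discord.py 2.x
client = discord.Client(intents=intents)

@client.event
async def on_ready():
    channel = client.get_channel(CHANNEL_ID)
    counts = Counter()
    # Walk the channel history; limit=None fetches all available messages.
    async for message in channel.history(limit=None):
        if not message.author.bot:
            counts[message.author.display_name] += 1
    for name, n in counts.most_common():
        print(f"{name}: {n} messages")
    await client.close()

client.run(BOT_TOKEN)
```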

However, while bots offer valuable quantitative and analytical insights, it’s important to complement this data with qualitative evaluations. Direct observations, feedback sessions, and individual discussions remain indispensable for grasping the subtleties of each student’s input. It’s also vital to address privacy concerns and uphold ethical standards in monitoring, ensuring clear guidelines and transparency from the instructor’s side.

Peer Evaluations

Gardner et al. (2003) conducted a study exploring the use of group member ratings to gauge relative contributions among students in a software engineering team project course. At the end of the project, students rate each team member’s contributions across four criteria using a five-point scale:

  • Attendance at team meetings.
  • Volunteering for and carrying out tasks.
  • Quality of work performed.
  • Effectiveness in communicating ideas.

The findings suggest that these anonymous peer ratings are reliable for ranking team members on their contributions. While students often rate themselves higher than their teammates, the relative contributions ranking remains consistent, which aligns with previous research (West, 2018, Ch.16).

This approach quantifies peer perceptions of engagement and effort. It motivates students to interact and collaborate and allows teams to self-manage contributions. However, limitations exist. Students may not accurately judge true contributions. Dominant personalities could influence ratings. Moreover, if grades hinge directly on these ratings, it might encourage score inflation.

Despite its limitations, peer ratings offer a systematic method to encourage and gauge participation in team projects. They represent the firsthand insights of teammates into individual efforts and team dynamics. Instructors should triangulate peer evaluations with other performance indicators to mitigate potential biases. When applied thoughtfully, group member ratings can be a scalable tool to enhance accountability and ensure equitable effort distribution within student engineering teams.
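If the ratings are collected in a simple table, a few lines of code can turn them into the relative ranking discussed above. The sketch below is a hypothetical aggregation, not Gardner et al.'s exact procedure: it averages the four criteria per rating, drops self-ratings, and reports each student's mean peer score.

```python
# Hypothetical aggregation of peer ratings (not Gardner et al.'s exact method):
# each rater scores every teammate on four criteria using a 1-5 scale.
from statistics import mean

# ratings[rater][ratee] = [attendance, volunteering, quality, communication]
ratings = {
    "Ana": {"Ana": [5, 5, 5, 5], "Ben": [4, 3, 4, 4], "Cam": [5, 5, 4, 5]},
    "Ben": {"Ana": [4, 4, 5, 4], "Ben": [5, 5, 5, 5], "Cam": [5, 4, 4, 5]},
    "Cam": {"Ana": [4, 4, 4, 4], "Ben": [3, 3, 4, 3], "Cam": [5, 5, 5, 5]},
}

def peer_scores(ratings):
    """Average each student's ratings across criteria and raters, excluding self-ratings."""
    scores = {}
    for ratee in ratings:
        peer_marks = [mean(row[ratee]) for rater, row in ratings.items() if rater != ratee]
        scores[ratee] = round(mean(peer_marks), 2)
    return scores

# Rank teammates by mean peer score, highest first.
for student, score in sorted(peer_scores(ratings).items(), key=lambda kv: -kv[1]):
    print(f"{student}: {score}")
```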

Using Git Repositories

While subjective peer evaluations are commonly used, analyzing data from git repositories provides an objective lens into individual contributions, revealing insights into aspects like collaboration patterns, subsystem ownership, and consistency of participation (Lima et al., 2015). Instructors can combine these repository-based metrics with subjective evaluations to assess student effort and engagement better.

A fundamental metric is each student’s number of commits over time, which Lima et al. (2015) call code contribution. This helps reveal whether students contribute regularly throughout the project or make concentrated commits right before deadlines. A student with relatively few commits thinly spread across the weeks likely contributed minimally, while a student with a steady stream of commits each week demonstrates consistent engagement (Glassy, 2006).
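A minimal sketch of this code-contribution measure needs nothing more than `git log`: the script below (assumed to run inside the team's repository clone) buckets commits by author and ISO week, which makes last-minute spikes easy to spot.

```python
# Count commits per author per ISO week by parsing `git log` output.
# Assumes the script is run inside the team's repository clone.
import subprocess
from collections import Counter
from datetime import date

log = subprocess.run(
    ["git", "log", "--pretty=format:%ad|%an", "--date=short"],
    capture_output=True, text=True, check=True,
).stdout

weekly_commits = Counter()
for line in log.splitlines():
    day, author = line.split("|", 1)          # e.g. "2023-10-05|Ana Diaz"
    year, week, _ = date.fromisoformat(day).isocalendar()
    weekly_commits[(author, f"{year}-W{week:02d}")] += 1

for (author, week), count in sorted(weekly_commits.items()):
    print(f"{week}  {author:20s} {count:3d} commits")
```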

Examining the content of commits also provides insight into contribution quality. Code complexity is widely accepted as one such measure: it accounts for the complexity and difficulty of the sub-problem being solved. McCabe (1976) proposed the cyclomatic complexity measure, which is still widely used today; applied to repository mining, it compares the complexity of code before and after a team member has altered it. Consistently low commit complexity suggests weaker contributions to the team’s software development effort.
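One way to operationalize this for Python projects is to combine `git show` with the third-party radon package, which computes McCabe's cyclomatic complexity. Lima et al. do not prescribe a specific tool, so radon, the commit hash, and the file path below are my own illustrative choices.

```python
# Sketch: compare McCabe cyclomatic complexity of one Python file before and
# after a commit, using the third-party `radon` package (pip install radon).
# COMMIT and PATH are placeholders for a real commit hash and file path.
import subprocess
from radon.complexity import cc_visit

COMMIT = "abc1234"          # hypothetical commit to inspect
PATH = "src/scheduler.py"   # hypothetical file touched by the commit

def total_complexity(revision: str, path: str) -> int:
    """Sum cyclomatic complexity over all functions/classes in the file at `revision`."""
    source = subprocess.run(
        ["git", "show", f"{revision}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(block.complexity for block in cc_visit(source))

before = total_complexity(f"{COMMIT}~1", PATH)   # parent of the commit
after = total_complexity(COMMIT, PATH)
print(f"{PATH}: complexity {before} -> {after} (delta {after - before})")
```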

A variation on the code complexity measure is the family of bug-related measures, which capture a developer’s contribution to introducing and fixing bugs. This measure has limitations, however, because some bug fixes do not require writing code, which understates the developer’s effort (Lima et al., 2015). Advanced repository analysis can also reveal collaboration patterns within student teams. Poncin et al. (2011) combined FRASR and ProM to extract event logs from student repository data (with FRASR) and then analyze the development process (with ProM). Their approach also accounts for developer roles and adherence to particular development models.

Of course, reliance solely on git metrics has limitations. First, commits mainly represent coding contributions, overlooking other forms of participation like verbal collaboration and project leadership (Lima et al., 2015). Second, students can artificially inflate their repository activity metrics if they know the algorithm being used. Despite these drawbacks, analyzing git data provides valuable insights into individual participation on student software teams. Instructors should interpret repository metrics not as absolute contribution measures but as launching points for further investigation.

Conclusion

By balancing quantitative git data with qualitative peer evaluations, product assessments, and student interviews, instructors can obtain a more equitable evaluation of individuals. Encouragingly, subjective and objective measures of contribution to a project correlate strongly (Hundhausen et al., 2022). Software engineering courses require team projects, and assessing individual accountability within them remains vital. Combining subjective reviews with objective repository analysis helps reveal a more accurate picture of each student’s contributions and commitment.

References

Lima, J., Treude, C., Figueira Filho, F., & Kulesza, U. (2015). Assessing developer contribution with repository mining-based metrics. https://doi.org/10.1109/icsm.2015.7332509

Gardner, W. (2003). Assessing individual contributions to group software projects. In 8th Western Canadian Conference on Computing Education (WCCCE’03) (pp. 33-50).

Hundhausen, C. D., Conrad, P. T., Carter, A. S., & Adesope, O. (2022). Assessing individual contributions to software engineering projects: a replication study. Computer Science Education, 32(3), 335–354. https://doi.org/10.1080/08993408.2022.2071543

West, R. E. (2018). Foundations of Learning and Instructional Design Technology. https://doi.org/10.59668/3

Glassy, L. (2006). Using version control to observe student software development processes. Journal of Computing Sciences in Colleges, 21(3), 99–106.

McCabe, T. J. (1976). A Complexity Measure. IEEE Transactions on Software Engineering, SE-2(4), 308–320. https://doi.org/10.1109/tse.1976.233837

Poncin, W., Serebrenik, A., & van den Brand, M. (2011). Mining student capstone projects with FRASR and ProM. https://doi.org/10.1145/2048147.2048181

Teaching Computer Science with Minecraft

Introduction to Minecraft

Minecraft remains one of the most popular games of 2023, boasting over 140 million monthly active users, according to searchlogistics.com. Despite this popularity, many players overlook the fact that Minecraft offers an engaging and immersive environment for learning terminal commands, programming basics, computational thinking, and even artificial intelligence. ISTE Standard 4.3a for coaches indicates that a successful coach should “Establish trusting and respectful coaching relationships that encourage educators to explore new instructional strategies.” So, in this blog post, I will delve into the educational benefits of Minecraft and explore the differences between the Java and Education editions.

While Minecraft is often regarded as merely a game, educators have recognized its potential as a valuable learning tool. At its core, Minecraft is built upon programming concepts. Players use blocks made of various materials to construct anything they can imagine, from simple houses to complex machines that require advanced knowledge of electronics, chemistry, and physics. This encourages computational thinking, creativity, and problem-solving as students work to bring their visions to life.

Concerning programming, Minecraft helps teach fundamental coding concepts, including commands, functions, variables, loops, and conditionals. Students can employ block-based coding or full-fledged programming languages such as Python and JavaScript to automate actions within the game. This hands-on approach to learning captivates students more effectively than traditional coding lessons, as Minecraft provides them with an imaginative space to immediately apply their newfound skills. Creating Minecraft modifications (mods) teaches students how to extend existing programs, a critical programming skill.
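As a small illustration, the community mcpi library lets students script the Java Edition from Python, provided the server runs the RaspberryJuice plugin (or the old Pi Edition is used). That setup is an assumption on my part, but the resulting lesson on loops and coordinates is only a few lines long.

```python
# Build a small stone wall next to the player using nested loops (mcpi library).
# Assumes a Minecraft: Java Edition server with the RaspberryJuice plugin,
# or Minecraft: Pi Edition, is reachable on the default host/port.
from mcpi.minecraft import Minecraft

STONE = 1  # block ID for stone

mc = Minecraft.create()           # connect to the running game
pos = mc.player.getTilePos()      # player's current block coordinates

# A 5-wide, 3-high wall, two blocks away from the player on the x axis.
for dx in range(5):
    for dy in range(3):
        mc.setBlock(pos.x + 2, pos.y + dy, pos.z + dx, STONE)

mc.postToChat("Wall built with two nested loops!")
```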

Minecraft Versions

Several versions of Minecraft are available for players to choose from, including Minecraft: Java Edition, Minecraft: Bedrock Edition, Minecraft: Education Edition, and Minecraft: Pocket Edition. However, for the specific purpose of our educational analysis, we will concentrate solely on the Java and Education editions. These two versions offer unique features and opportunities for learning that make them particularly relevant in an educational context.

Minecraft: Java Edition

The Java Edition is the original version of Minecraft developed in 2009 by Mojang Studios for Windows, macOS, and Linux, and maintains its popularity among long-time Minecraft players.

The Java Edition offers distinct advantages for teaching advanced computer science concepts due to its “mod-ability” and the degree of access modders have to the game’s internals. This openness allows nearly limitless customization through mods and plugins. Writing mods can illustrate a wide range of advanced programming concepts, including event handling, parallel programming, algorithms, data structures, debugging, and software design patterns. Developing mods not only imparts practical software development skills but also encourages students to exercise their creativity.

The Minecraft community has produced numerous mods that cater to various lesson plans. For instance, ComputerCraft introduces programmable turtle robots, while RedstonePlus enhances the game with advanced circuitry. The diversity of available mods supports a wide range of educational objectives, not only in CS but other disciplines.

Minecraft: Education/Bedrock Edition

Minecraft: Bedrock Edition traces its origins to the Pocket Edition first released in August 2011 and is particularly advantageous for classrooms with a mix of devices. Bedrock Edition supports mobile devices such as iPads and Android tablets, which many schools already incorporate into their teaching environments. This enables students to start a Minecraft lesson on a classroom desktop computer during the day and seamlessly continue on a smartphone or game console at home.

However, Bedrock Edition offers less mod support and limited access to code customization. Minecraft Education Edition is a version of Bedrock specifically tailored for classroom use. According to Microsoft, it “typically runs about one full version behind the current Minecraft Bedrock production version” (FAQ: Game Features, 2023).

Advantages of Minecraft Education in the Classroom

One of the most significant advantages of Minecraft Education in a computer science course is its block-based CodeBuilder / MakeCode editor, similar to Scratch or Snap. This editor allows students to drag and drop commands to perform actions in the game. Younger students can learn coding logic and structure by creating houses, gardens, and machines using these visual blocks before transitioning to text-based programming languages like Python or JavaScript.

Another advantage of Education Edition is the teachers’ ability to implement special restrictions, such as limiting chat or preventing students from destroying blocks. These classroom controls create a safe environment for student exploration. Teachers can also switch to spectator mode to observe students and provide feedback; they also have the capability to build worlds and restrict access as needed. Here is a quick start guide for reference.

The Education Edition library offers hundreds of pre-made interactive worlds and lesson plans aligned with computer science curriculum standards (source: https://education.minecraft.net/en-us/resources/computer-science-subject-kit). Teachers can find lesson plans tailored to any grade level, making it much easier for educators to get started with Minecraft compared to building worlds from scratch.

Bile (2022) found that children aged 8 to 10 in a Minecraft education setting were able to solve abstract and complex scientific problems without prior prompting or theoretical knowledge, and that the game format helped students retain knowledge better. Vostinar & Dobrota (2022) similarly found that primary school students rated a Minecraft programming lesson enjoyable and easy, even though most of them had never programmed in blocks or Python before. Furthermore, Klimová et al. (2021) report that girls in grades 5-10 typically outperform boys in Minecraft Education coding challenges, suggesting the platform may be a valuable tool for increasing diversity in computer science.

Disadvantages of Minecraft

As Vostinar & Dobrota (2022, p. 652) pointed out, there are significant disadvantages to using Minecraft in education. One such drawback is that Minecraft is not free and requires an additional cost per student, which, as mentioned in my previous post, raises ethical concerns about the practice of making students pay for educational software. Another disadvantage is that Minecraft may only appeal to a certain type of student, particularly those with a more creative inclination, potentially excluding students who do not have an affinity for the game.

Furthermore, teachers must become proficient in the game’s mechanics and capabilities to integrate it into the classroom effectively. Given the abundance of “cheats” in Minecraft, more experienced players may find trivial command-based solutions to problems if the teacher is unaware of their existence. Finally, as highlighted by Vostinar & Dobrota (2022), it is essential to impose adequate constraints on the virtual world, especially when students collaborate, to prevent them from destroying the shared world with TNT and other destructive tools.

References

Vostinar, P., & Dobrota, R. (2022). Minecraft as a Tool for Teaching Online Programming. 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO). https://doi.org/10.23919/mipro55190.2022.9803384

Bile, A. (2022). Development of intellectual and scientific abilities through game-programming in Minecraft. Education and Information Technologies, 1–16. https://doi.org/10.1007/s10639-022-10894-z

Klimová, N., Sajben, J., & Lovászová, G. (2021). Online Game-Based Learning through Minecraft: Education Edition Programming Contest. https://doi.org/10.1109/educon46332.2021.9453953

FAQ: Game Features. (2023, September 15). Minecraft Education. https://educommunity.minecraft.net/hc/en-us/articles/360047117692-FAQ-Game-Features

The Pros and Cons of Autograders in Programming Courses

Programming courses typically require assignments where students write code to fulfill specific specifications. In such courses, an autograder serves as an automated tool designed to assess student code submissions by conducting input and output tests. Autograders have been in existence since the inception of computer science as a field of study (Hollingsworth, 1960). More recently, with the increase of massive online programming courses hosting up to 500 students, autograders have gained popularity as an efficient means for grading programming assignments (Keuning et al., 2018). They are instrumental in student engagement (Iosup & Epema, 2014) and pivotal in providing students with constructive feedback (Keuning et al., 2018). However, like any educational technology, autograders come with their own set of advantages and disadvantages that warrant consideration. This post aims to explore the significant pros and cons of employing autograders for assessments in programming courses.

Several renowned proprietary programming autograders are currently available, including CodePost, CodeGrade, Codio, and Mimir. Each tool offers a wealth of academic programming resources, including built-in problems, user-friendly interfaces, flexible question setting, and code review capabilities. However, these companies impose a substantial annual fee on institutions, ranging from $20,000 to $100,000 CAD, for a standard school comprising 1000 students. Additionally, each student is required to pay a monthly fee between $10 and $50 CAD.

In my view, such pricing is excessive (and greedy) and contradicts the principles outlined in the computer science code of ethics, particularly when the software is intended to advance software development. As a result, many post-secondary institutions opt to develop and maintain autograders in-house, tailoring them to their specific preferences. This approach allows faculty to propose new features and enhancements, and students can also contribute suggestions for improvement.

Advantages of Autograders

One of the most compelling incentives for using an autograder is the significant time savings it offers instructors compared to manual grading. Studies indicate that autograders can assess assignments at least three to four times faster than human graders (Ihantola et al., 2010; Keuning et al., 2018). This substantial reduction in grading workload allows instructors to allocate more time to essential teaching tasks such as lesson planning, curriculum development, and providing student support and feedback. The time savings can be particularly substantial in large classes.

Autograders also benefit students by providing quicker feedback on their work. This is especially valuable in introductory programming classes, where receiving prompt results on smaller assignments can significantly enhance student learning and motivation (Keuning et al., 2018). Unlike human grading, which can take days or weeks, autograders can assess submissions within seconds or minutes and instantly inform students whether their code has passed or failed the test cases. This expedited feedback allows students to validate and refine their work much more rapidly than traditional grading methods permit.

A prevalent concern with human graders is the inconsistency in grading from one assignment to another, from one student to another, or even within a single assignment. Factors such as fatigue, emotional states, and biases can impact the quality of human grading, potentially leading to unfairness or errors. Autograders, by contrast, eliminate this subjectivity by applying uniform standards and tests to all submissions, ensuring consistent and equitable grading across the entire class, and thereby enhancing student satisfaction (Hagerer, 2021).

In courses that employ autograders, students quickly learn that their code must pass all of the autograder’s test cases to earn full assignment credit. While the efficacy of test-driven development (TDD) as a software testing methodology is debatable, this workflow gives students experience with a TDD-like cycle in which they repeatedly run tests against their code to fix errors and reach the desired functionality (Wang et al., 2011). Essentially, autograders compel students to treat testing as an integral part of coding, rather than merely striving to meet the minimal functional requirements.
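To make this concrete, here is a minimal sketch of the kind of test suite an autograder might run against a submission. The module name `submission` and the function `average` are hypothetical, and real autograders layer sandboxing, time limits, and weighted scoring on top of something like this.

```python
# Minimal sketch of autograder-style tests for a hypothetical student function
# `average(numbers)` in a module named `submission`.
import unittest

from submission import average  # hypothetical student module


class TestAverage(unittest.TestCase):
    def test_typical_values(self):
        self.assertAlmostEqual(average([1, 2, 3, 4]), 2.5)

    def test_single_value(self):
        self.assertAlmostEqual(average([7]), 7.0)

    def test_empty_list_raises(self):
        # The assignment spec (hypothetically) requires a ValueError here.
        with self.assertRaises(ValueError):
            average([])


if __name__ == "__main__":
    unittest.main(verbosity=2)
```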

Disadvantages of Autograders

A significant drawback of autograders, frequently cited in literature, is their inflexibility compared to human graders (Ihantola et al., 2010; Keuning et al., 2018; Wang et al., 2018). Autograders strictly apply identical test cases to all submissions without exception. Consequently, creative solutions that meet the assignment requirements but deviate from the expected implementation or output format are marked incorrect. Even a minor discrepancy such as a missing whitespace can be the difference between a pass and a fail. Unlike autograders, human graders can exercise judgment to accommodate alternative approaches.
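The whitespace problem is easy to demonstrate with a simple input/output check; whether the harness normalizes output before comparing is a design decision the autograder author has to make deliberately. In the illustrative sketch below, the program name and expected output are made up.

```python
# Illustration of strict vs. normalized output comparison in an I/O-based test.
# `student_prog.py` and the expected text are hypothetical.
import subprocess

expected = "Hello, World!"
result = subprocess.run(
    ["python", "student_prog.py"], capture_output=True, text=True, timeout=5
)

strict_pass = result.stdout == expected + "\n"            # fails on any stray whitespace
lenient_pass = result.stdout.strip() == expected.strip()  # tolerates trailing spaces/newlines

print(f"strict: {'PASS' if strict_pass else 'FAIL'}, "
      f"lenient: {'PASS' if lenient_pass else 'FAIL'}")
```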

Most autograders assess the functional correctness of student code, evaluating its output for given tests. However, programming courses also aim to instill good coding practices in students, such as readability, modularization, adherence to naming conventions, coherent design, and appropriate commenting. Autograders do not adequately assess these crucial design and style aspects, leading students to neglect good design principles as long as their code passes the functionality tests.
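The gap is easy to see: the two functions below behave identically and would pass the same functional tests, yet only one reflects the readability and design habits a course is trying to teach.

```python
# Both functions pass the same functional tests, but an autograder that only
# checks outputs cannot tell the readable version from the obfuscated one.

def total_even(numbers):
    """Return the sum of the even numbers in `numbers`."""
    return sum(n for n in numbers if n % 2 == 0)

def t(x):
    s=0
    for i in range(len(x)):
        if x[i]%2==0:s=s+x[i]
    return s

assert total_even([1, 2, 3, 4]) == t([1, 2, 3, 4]) == 6
```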

Another concern is uniformity. Autograders are meant to give students a structured way to build knowledge and track progress across multiple courses, but achieving consistent adoption is challenging, especially in larger institutions. Where numerous faculty members teach diverse courses with varying requirements, universal acceptance of a single autograder is difficult: some instructors prefer tools they are more comfortable with, and others choose not to use autograders at all. The result is a patchwork of tools from one course to the next and a disjointed student experience.

Relying exclusively on autograders poses the risk of students learning to pass test cases without acquiring a deeper understanding of programming concepts and problem-solving skills. The emphasis on meeting the autograder’s criteria can lead students to adopt a procedural approach, focusing on achieving the correct output rather than understanding the underlying logic. Some might resort to a trial-and-error method, tweaking their program until it gains autograder approval. While this approach may secure the desired grades, it does not foster genuine understanding or long-term retention of knowledge. Baniassad et al. (2021) introduced a submission penalty at the University of British Columbia to discourage over-reliance on their in-house autograding tool. This adaptation exemplifies the flexibility of modifying tool requirements, a possibility uniquely available when the tool is developed in-house.
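As a purely hypothetical illustration (not the regression-penalty scheme Baniassad et al. actually describe), a deterrent can be as simple as deducting a small percentage of the raw score for each submission beyond a free allowance:

```python
# Hypothetical submission penalty: each submission beyond FREE_ATTEMPTS costs
# PENALTY_PER_EXTRA of the raw score, floored at 50%. This is an illustration,
# not the regression-penalty scheme described by Baniassad et al. (2021).
FREE_ATTEMPTS = 5
PENALTY_PER_EXTRA = 0.02   # 2% per extra submission
MIN_MULTIPLIER = 0.5

def penalized_score(raw_score: float, submissions: int) -> float:
    extra = max(0, submissions - FREE_ATTEMPTS)
    multiplier = max(MIN_MULTIPLIER, 1.0 - PENALTY_PER_EXTRA * extra)
    return round(raw_score * multiplier, 1)

print(penalized_score(92.0, submissions=4))    # 92.0 -- within the free allowance
print(penalized_score(92.0, submissions=12))   # 79.1 -- 14% penalty for 7 extras
```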

Finally, like any web-based software system, autograders can experience technical issues that lead to grading failures and student frustration. The UC Berkeley incident highlights the “single point of failure” risk: when a centralized autograder goes down, all grading capability goes with it, and some students miss deadlines through no fault of their own. Unlike distributed human graders, a centralized automated grader is a single vulnerability to technical problems. If instructors then refuse to make accommodations for autograder malfunctions, students can feel cheated and perceive the grading as unfairly disconnected from actual instruction. This speaks to larger concerns about over-reliance on algorithmic systems in education; automated aids like autograders should not be the sole means of assessment.

Conclusion

The existing body of research on autograders underscores that they are not a panacea for replacing human graders entirely. Instead, to optimize their advantages and mitigate their limitations, autograders are most effective when thoughtfully integrated into a course assessment strategy, complemented by manual grading where it is most beneficial. Below are some best practices for incorporating autograders effectively:

  • Employ autograders for basic functionality testing, while manually reviewing selected assignments for flexibility, creativity, and design.
  • Utilize autograders to assess the correctness of core logic, and rely on human graders to evaluate structure, style, and readability.
  • Complement autograder evaluations with human feedback on prevalent mistakes and areas requiring enhancement.
  • Impose penalties for excessive submissions to discourage over-reliance on the autograder.

Proper integration of autograders aligns with technology integration frameworks like SAMR, enhancing existing grading processes without entirely transforming how programming courses are assessed. It also changes the way students engage with programming, introducing a more gamified approach. Like any educational technology, the value of autograders comes from strategic use within well-defined goals and contexts.

References

Hollingsworth, J. (1960). Automatic graders for programming classes. Communications of the ACM, 3(10), 528–529. https://doi.org/10.1145/367415.367422

Keuning, H., Jeuring, J., & Heeren, B. (2016). Towards a Systematic Review of Automated Feedback Generation for Programming Exercises. Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education. https://doi.org/10.1145/2899415.2899422

Iosup, A., & Epema, D. (2014). An experience report on using gamification in technical higher education. Proceedings of the 45th ACM Technical Symposium on Computer Science Education – SIGCSE ’14. https://doi.org/10.1145/2538862.2538899

Ihantola, P., Ahoniemi, T., Karavirta, V., & Seppälä, O. (2010). Review of recent systems for automatic assessment of programming assignments. Proceedings of the 10th Koli Calling International Conference on Computing Education Research – Koli Calling ’10. https://doi.org/10.1145/1930464.1930480

Hagerer, G. (2021). An Analysis of Programming Course Evaluations Before and After the Introduction of an Autograder. IEEE Xplore.

Wang, T., Su, X., Ma, P., Wang, Y., & Wang, K. (2011). Ability-training-oriented automated assessment in introductory programming course. Computers & Education, 56(1), 220–226. https://doi.org/10.1016/j.compedu.2010.08.003

Baniassad, E., Zamprogno, L., Hall, B., & Holmes, R. (2021). Stop the (Autograder) Insanity: Regression Penalties to Deter Autograder Overreliance. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education. https://doi.org/10.1145/3408877.3432430