Skip to search boxSkip to navigationSkip to main content

WIP: ARTful Insights from a Pilot Study on GPT-Based Automatic Code Reviews in Undergraduate Computer Science Programs

Research Output: Chapter in Book/Report/Conference proceeding Conference contribution

Abstract

This work in progress research paper describes a pilot study using a Large Language Model (LLM) Generative Pre-Trained Transformer-based (GPT) system that generates industry-style code reviews for student feedback on software development projects in Computer Science 2nd, 3rd, and 4th+ semester classes (CS2, CS3, CS4+) at an ABET accredited baccalaureate institution. Code reviews are a valuable, but work-intensive, component of the software engineering process and provide important training to undergraduate students in the form of mentor-peer knowledge transfer. Participants in this study engaged in iterative experiential learning using the Automatic Review Tool (ART), an artificial intelligence tool to support software engineering as an Automatic Static Analysis Tool in the Continuous Integration pipeline alongside software testing harnesses and code style checkers. This pilot study was based on earlier results from a full computer science second semester (CS2) class (n=74) to develop an ART-generated code review intervention pilot study with a small group of students in CS2 / 3 and CS4. The project underway uses an experiential learning and iterative feedback process to answer research questions including 'Does ART provide accurate and actionable code reviews for students' and 'Which levels of students are best prepared to receive and use ART-based code reviews?' During this pilot study, the project used a mixed methods research approach with a series of surveys, code review interventions, and numerical analysis of the code reviews' accuracy. Results showed a reasonable degree of code review accuracy by ART and the students learned code review skills from interaction with the ART-based reviews they received. Ongoing work includes increasing the scale of data collection, using this work to refine and focus the ART-based reviews onto the categories of feedback that students find the most valuable, and building out a more modular tool for wider release in the academic community.