Brian Yu
August 30, 2023

There's more to assessing code quality than automated correctness testing

We know that hiring engineers can be a challenge. One of the key skills that technical interviews are meant to assess is an engineer’s ability to write high-quality code. But when technical assessments measure only one dimension of code quality, engineering recruiters and managers miss out on valuable insights into how an engineer would actually perform in a real coding environment.

The traditional way to evaluate a candidate’s programming skill is to give them a technical problem to solve and then measure their performance through automated correctness testing: the candidate’s solution is checked against a pre-written set of test cases that assess how correctly the code solves the stated problem.
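For illustration, here is a minimal sketch of what that looks like in practice. The problem ("return the two largest values in a list"), the candidate function, and the test cases are all invented for this example, not drawn from an actual assessment:

    # A minimal sketch of automated correctness testing, using Python's
    # built-in unittest module. The problem and solution are hypothetical.
    import unittest


    def two_largest(values):
        """Candidate's submission: return the two largest values, largest first."""
        ordered = sorted(values, reverse=True)
        return ordered[0], ordered[1]


    class TwoLargestTests(unittest.TestCase):
        """Pre-written test cases the submission is graded against."""

        def test_distinct_values(self):
            self.assertEqual(two_largest([3, 1, 4, 1, 5]), (5, 4))

        def test_duplicates(self):
            self.assertEqual(two_largest([7, 7, 2]), (7, 7))

        def test_negative_values(self):
            self.assertEqual(two_largest([-5, -2, -9]), (-2, -5))


    if __name__ == "__main__":
        unittest.main()

The grader runs the suite and records how many cases pass, which is what makes this approach easy to score at scale.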

While this approach is easily scalable and allows for numerical comparisons between different candidates, there are several ways this strategy fails to evaluate critical engineering skills. 

Limitation #1. Real-world engineering problems aren’t clear-cut

Automated correctness testing places constraints on the types of questions that can be asked in interviews. When the interview needs to be evaluated by pre-written tests, the questions asked have to be well-defined problems with precise descriptions of what correct input and output should look like.

But real-world engineering problems are often more complex and open-ended. They require the engineer to define ambiguous ideas precisely, to determine what the correct behavior should be, and to question assumptions about whether the problem they’re solving is even the right one to achieve the project’s goals.

Limitation #2. The computer isn’t the only audience that reads code

Writing code, like many other forms of communication, means addressing multiple audiences. The computer is one audience, and it’s important that the code the computer runs matches the engineer’s intent.

But the second audience, no less important than the first, is people: other engineers who might read the code, or even the same engineer who might look back on their own code in the future. For this audience, it matters how clearly the code’s intent is expressed, how easy the code is to test and maintain, and how well the code is abstracted and logically organized. Writing high-quality code means communicating with people as much as it means communicating with machines, and an effective assessment of engineering skill should take that into consideration.

A correctness-only measure of code captures only how well the engineer has communicated their intentions to the computer, and misses entirely how well the engineer communicates their ideas and intentions to other engineers.
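As a hypothetical illustration (both functions below are invented for this post), the two implementations compute the same result and would pass the same correctness tests, but only one of them communicates its intent to a human reader:

    # Two functionally identical implementations of a hypothetical task:
    # the total price of an order after a percentage discount.
    # An automated test suite scores them identically; a human reader does not.

    def f(a, b, c):
        # Unclear: single-letter names give no hint of what the numbers mean.
        return sum(x * y for x, y in a) * (1 - b) + c


    def order_total(line_items, discount_rate, shipping_fee):
        """Return the order total: discounted item subtotal plus shipping.

        line_items is a sequence of (unit_price, quantity) pairs;
        discount_rate is a fraction such as 0.10 for 10% off.
        """
        subtotal = sum(unit_price * quantity for unit_price, quantity in line_items)
        return subtotal * (1 - discount_rate) + shipping_fee


    # Both calls print 104.0, so a correctness-only check can't tell them apart.
    print(f([(20.0, 5)], 0.10, 14.0))
    print(order_total([(20.0, 5)], discount_rate=0.10, shipping_fee=14.0))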

How we evaluate code quality at Byteboard 

It’s for this reason that at Byteboard, we have human graders look critically at each candidate’s code and evaluate it along multiple dimensions of code quality. We consider how well the candidate follows the conventions and idioms of their preferred programming language. We consider how well-documented the candidate’s code is. And we consider the decisions the candidate makes about code clarity, from something as small as how to name a variable to something as large as how to structure code for a complex task.
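For instance (a hypothetical Python example, not an actual Byteboard prompt), both functions below are correct, but one follows the language’s conventions and the other fights them, and that difference is visible only to a human reviewer:

    # Two correct ways to collect the squares of the even numbers in a list.
    # The first works but ignores Python conventions; the second is idiomatic.

    def squares_of_evens_unidiomatic(numbers):
        result = []
        i = 0
        while i < len(numbers):          # manual index bookkeeping
            if numbers[i] % 2 == 0:
                result.append(numbers[i] ** 2)
            i = i + 1
        return result


    def squares_of_evens(numbers):
        """Return the squares of the even numbers, in order."""
        return [n ** 2 for n in numbers if n % 2 == 0]  # idiomatic comprehension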

Automated correctness tests can only say whether a candidate’s code produces the expected result. And when the code doesn’t produce the expected result, automated testing reveals little about why it deviated from expectations.

Maybe it’s because the candidate’s logical approach was entirely wrong. Maybe their approach was right, but they struggled to express their ideas in code. Or maybe a small syntax error prevented otherwise correct code from running, or they missed a key implementation detail, or there was an edge case they didn’t consider.

Each of those cases suggests something different about the candidate and their strengths and weaknesses. But automated testing treats all of those cases the same: not passing. Byteboard’s human review lets us take a strengths-based approach and consider what specific strengths the candidate demonstrated even when the code didn’t produce the expected result.
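For example (again hypothetical), the function below reflects a sound overall approach and would satisfy most test cases, but a single unhandled edge case is enough for a correctness-only grader to report nothing more informative than a failure:

    # A hypothetical near-miss: the approach is sound, but the candidate
    # missed one edge case, so an automated grader simply reports "not passing".

    def average(values):
        """Return the arithmetic mean of a list of numbers."""
        return sum(values) / len(values)   # breaks when values is empty


    print(average([2, 4, 6]))   # 4.0 -- the typical cases pass

    try:
        average([])              # the missed edge case
    except ZeroDivisionError:
        print("edge case failed: empty input")

A human reviewer can note that the core logic is right and that the gap is a missed edge case, which says something very different about the candidate than a wholly incorrect approach would.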

These factors all contribute to Byteboard’s understanding of an engineer’s code quality: an understanding that’s broader than unit testing alone could provide. Strong software engineers are expected to communicate and collaborate effectively on code, and a multifaceted approach to code quality helps us identify candidates who are highly capable in those skills.