ACES Sophomore Collaborates on Study Identifying a Programmer by their Code

ACES sophomore Andrew Liu was featured in The Diamondback for his collaboration on a Drexel University study of how programmers can be identified by the code they write.

Though computer code might seem less distinctive than handwriting, researchers have shown that each programmer has a style of their own, so much so that an author can be recognized from the code alone.

A study from Drexel University, co-authored by a student at the University of Maryland, analyzed the code of 250 programmers and found that code could be matched to its author with high accuracy based on inline variations, such as naming style, and on deeper structural differences.

The researchers used publicly accessible code from the 2014 Google Code Jam for analysis. They looked specifically at code in the C++ language, and they had about 650 lines from each author. They then examined surface-level features, as well as more structural qualities, such as abstract syntax trees and random forest regression, said Andrew Liu, a sophomore computer science major at this university and co-author of the study. 

"We got these features from the source code, and then we use data mining to get these features and group them with their author," said Liu, a member of the Advanced Cybersecurity Experience for Students program. 

You can finish reading Joe Zimmermann's article on the study at The Diamondback Online.

Published February 10, 2015