Software code is written to be read by both computers and humans. Machines quickly and perfectly understand the computational meaning, while humans read it the same way they read natural language: not as quickly and sometimes incorrectly. With a new $1.2M three-year grant, a group of software engineers and social scientists at 不良研究所 will leverage this bimodality to develop tools that make writing, reading and maintaining code easier and improve the overall programming experience.
The project is led by a computer science distinguished professor, together with departmental colleagues and cross-campus collaborators from the Department of Science and Technology Studies and the Department of Linguistics.
“When you write a program, it has two audiences,” explained Devanbu. “Code is meant for human consumption and it’s meant for computer consumption, and the fact that people choose to write code in ways that are easy for human beings to read reflects this bimodality.”
Since code has a human audience, there are millions upon millions of lines of code freely available on the internet that the team can use. With this wealth of data from different types of programs and coders of all experience levels, they can train algorithms to improve the human experience of programming.
Writing for Humans
Making code easy for people to read is a critical part of programming. Humans write, edit, check and maintain code, and the best way to catch potentially catastrophic errors is code review, a proofreading-like process where a second person reads through code and explains what it’s doing. The team plans to train an algorithm that can rewrite code so it’s easier to read without changing the computational meaning.
The initial studies, led by Morgan, have shown that small changes to how code is phrased (the equivalent of changing “pepper and salt” to “salt and pepper”) can make a huge difference. The team's goal is to train an algorithm that knows this and can read through code to give it a readability score, where a higher score means the code is harder to read. The program can then rewrite the code until the score drops as far as possible. Improving readability will help coders and code reviewers parse through software and easily find and fix potential errors.
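To make the score-and-rewrite idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_score` is a hand-written stand-in, not the learned model the team plans to train, and the two candidate snippets are invented. As in the article, a higher score means harder to read.

```python
import ast


def toy_score(source):
    """Hypothetical stand-in for a learned score: higher means harder to read."""
    score = 0
    for node in ast.walk(ast.parse(source)):
        # Penalize explicit negations such as `not (...)`.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            score += 1
        # Penalize comparisons that put a literal on the left, e.g. `0 >= v`.
        if isinstance(node, ast.Compare) and isinstance(node.left, ast.Constant):
            score += 1
    return score


# Two invented phrasings of the same statement.
candidates = [
    "if not (0 >= v): total = v + total",
    "if v > 0: total += v",
]

# Keep the phrasing with the lowest score.
best = min(candidates, key=toy_score)
print(best)
```

A real system would generate the candidate rewrites automatically and learn the score from large amounts of existing code; this sketch only shows the selection step.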
“Because these programs have well-defined meanings, we can change the code around without affecting the meaning,” said Devanbu. “If I can change around my program so the meaning doesn’t change but it’s much easier to read, then code review will be much easier.”
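As a small illustration of changing code around without changing what it computes, the Python sketch below writes the same calculation two ways. The function names and sample inputs are hypothetical, and comparing outputs on a handful of inputs is only a spot check, not a proof that the two versions always agree.

```python
def hard_to_read(values):
    # Unusual operand order and a negated comparison.
    total = 0
    for v in values:
        if not (0 >= v):
            total = v + total
    return total


def easy_to_read(values):
    # Conventional phrasing of the same logic: sum the positive values.
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total


def same_meaning_on(samples):
    """Return True if both versions agree on every sample input."""
    return all(hard_to_read(s) == easy_to_read(s) for s in samples)


samples = [[], [1, 2, 3], [-1, 0, 5], [-4, -2]]
print(same_meaning_on(samples))
```

Both functions return the same result for each sample, yet a reviewer can tell what the second one does at a glance.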
Learning Through Bimodality
The team also plans to use the data to make programming better for beginners. Beginners can be disheartened if their program keeps crashing, but often the problem is a slight error like a typo or a missing parenthesis instead of more significant systematic issues. If the team can train an algorithm that recognizes enough different programs, syntaxes and operations, Devanbu thinks it can work like a spellchecker that can identify spelling and grammatical errors in code. This would keep beginners encouraged and focused on learning how to code instead of finding these mistakes.
“The machine learning models are clever, so if you are converting Celsius to Fahrenheit, for example, the model would have seen that formula many times so it would remember it,” he explained. “If you use variable names that look like they’re temperature variable names, then it knows that that’s probably what you meant to do and it would be able to fix that.”
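The Celsius-to-Fahrenheit example can be sketched in a few lines of Python. This is not the machine learning model Devanbu describes: it is a hand-written check, with a hypothetical `student_version` containing a typical beginner slip, that shows the kind of small error such a tool would be expected to catch.

```python
def celsius_to_fahrenheit(celsius):
    # Standard conversion: F = C * 9/5 + 32.
    return celsius * 9 / 5 + 32


def student_version(celsius):
    # Invented beginner slip: the 9 and the 5 are swapped.
    return celsius * 5 / 9 + 32


def disagreements(candidate, reference, samples):
    """List the sample inputs on which the candidate differs from the reference."""
    return [c for c in samples if abs(candidate(c) - reference(c)) > 1e-9]


samples = [-40, 0, 37, 100]
print(disagreements(student_version, celsius_to_fahrenheit, samples))
```

Note that the slip goes unnoticed at 0 degrees, where both versions return 32, which is one reason a single test value is not enough to find this kind of mistake.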
Thinking bimodally about code forces programmers to think both about the software they’re writing and the human context it’s being developed in. Devanbu believes this is a crucial part of the computer science curriculum, so for the final part of the project, he will be working with Díaz to develop a new curriculum where students learn coding and ethics simultaneously from the beginning of their college careers.
“Programming is a deeply human experience, even though we don’t always think of it that way,” he said. “When you learn to write in a natural language, you’re taught to write in the context of some human experience, like war or romance, and you are asked to think about how that maps onto the way you write. Code shouldn’t be taught any differently.”