Code.AI is the AI4Code research team at CarperAI. We work at the intersection of Software Engineering and Machine Learning, and one of our main goals is to apply Deep Learning research to build intelligent systems that can code.

Our Team

Nathan Cooper

Nathan is a nerd and a Ph.D. candidate at William and Mary, supervised by Dr. Denys Poshyvanyk. His research lies at the intersection of Software Engineering and Deep Learning, specifically the creation of intelligent tools that help software developers.

Erfan Al-Hossami

Erfan is a Ph.D. student at UNC Charlotte under the supervision of Dr. Razvan Bunescu. He is passionate about building a new generation of conversational assistants that help programmers think, do, learn, and navigate an ever-evolving world. Toward that goal, Erfan researches dialogue systems, code intelligence, learning analytics, and computer science education.

Reshinth Adithyan

Reshinth is interested in the extent to which naturalness affects the behaviour of a formal language like code. To answer that question, he explores both ends of the spectrum, from applying Machine Learning to aid static analysis to enhancing the bimodal aspects of code and natural language.

Duy Phung

Duy is a Data Engineer at Code.AI at CarperAI. He previously worked as a Senior Research Resident at VinAI Research in Vietnam. His background is in text generation tasks such as Machine Translation, Text-to-SQL, and Code Generation. He would like to build code generation models that accelerate software development and to use human feedback to improve them.

Our Projects

The Code Pile Project

Foundation models in the NLP domain have unlocked numerous applications and serve as building blocks for specialized models via finetuning. Foundation models for Software Engineering could play the same role, powering everything from coding assistants to CarperAI's reinforcement learning projects. To enable the training of these models, we will collect software engineering data that goes beyond the GitHub source code that current efforts focus on. This includes StackOverflow, documentation sites of popular libraries and frameworks, tutorial websites such as Tutorialspoint and GeeksforGeeks, programming-specific Reddit communities, and other GitHub repository data such as issues, pull requests, community discussions, and diffs. To better understand the data these foundation models are trained on, we will pay special attention to statistics on vulnerable code.
