Erfan is a PhD student at UNC Charlotte under the supervision of Dr. Razvan Bunescu. He is passionate about building a new generation of conversational assistants to help programmers think, do, learn, and navigate an ever-evolving world. Towards that goal, Erfan conducts research on dialogue systems, code intelligence, learning analytics, and computer science education.
Reshinth is interested in the extent to which naturalness impacts the behaviour of a formal language like code. To answer that question, he likes to ponder both ends of the spectrum, from applying machine learning to aid static analysis to enhancing the bimodal aspects of code and natural language.
Duy is a Data Engineer at CodeAI @ Carper. Previously, he worked as a Senior Research Resident at VinAI Research Vietnam. His background is in text generation models for tasks such as Machine Translation, Text-to-SQL, and Code Generation. He would love to build code generation models that accelerate software development and to use feedback from humans to improve these models.
The Code Pile Project
Foundation models in the NLP domain have unlocked numerous applications and serve as building blocks for specialized models via finetuning. Foundation models for software engineering could serve a similar purpose, from powering coding assistant applications to acting as the building blocks of CarperAI's reinforcement learning projects. To enable the training of these foundation models, we will collect software engineering-specific data that goes beyond the GitHub source code that current efforts focus on. This includes StackOverflow, documentation sites of popular libraries and frameworks, tutorial websites such as TutorialsPoint and GeeksforGeeks, programming-specific Reddit communities, and other repository data from GitHub such as issues, pull requests, community discussions, and diffs. To better understand the data these foundation models are trained on, we will pay special attention to the statistics of vulnerable code.