Pushing the limits
We are a democratized AI research team, enhancing the performance of preference learning.
Spun out of EleutherAI, CarperAI is doing human preference learning at scale via a representation learning + RL approach.
We’re excited about building large-scale, personalized preference models for natural text.
Meet the team
Alex is a mathematics PhD student at Georgia Tech working with Wenjing Liao and Mark Riedl. He has interests in learning theory, natural language preference learning and their combinations.
Louis is a PhD student at Brown University, studying under Professor Ellie Pavlick. His background is in preference modeling and computational narrative theory.
Shahbuland is an undergraduate student at the University of Waterloo. His main research interests are in representation learning and machine-generated content.
Aman is a Research Engineer at CarperAI. Their background is in ML for novel interfaces and generative art. They are interested in the future of code, art, and machine learning.
Ryan is a PhD student at the NYU Center for Data Science working with Mengye Ren. He is primarily interested in natural language processing, reinforcement learning, and robust machine reasoning.
Written stories paired with critiques are a good source of data for preference learning. Critiques are an information-rich signal for gauging preferences about story content. With Contrastive Anecdote Review Pretraining (CARP, for short) we presented the Story-Critique dataset of passage/critique pairs (e.g. "Geese are better then ducks" paired with "Should be than not then"), and the CARP model. CARP was trained on the Story-Critique dataset to produce embeddings of passages and reviews, where high similarity between a passage embedding and a review embedding generally corresponds to a review that would fit the given passage. By measuring how well a review fits a story, we get a measure of how well the story does under a certain preference. The model and checkpoint are publicly available and the dataset can be shared on request.
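To make the idea of "a review that fits a passage" concrete, here is a minimal sketch of scoring passage/review pairs by embedding similarity. It assumes cosine similarity as the measure and uses hand-written 3-dimensional vectors in place of real encoder outputs; the vectors, the second passage/review pair, and the helper names are all hypothetical and not taken from the CARP code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(passage_embs, review_embs):
    """Entry [i][j] scores how well review j fits passage i."""
    return [[cosine_similarity(p, r) for r in review_embs] for p in passage_embs]


def best_review_index(scores_row):
    """Index of the highest-scoring review for one passage."""
    return max(range(len(scores_row)), key=scores_row.__getitem__)


# Hypothetical toy embeddings; a trained model would produce these
# from its passage encoder and review encoder respectively.
passages = {
    "Geese are better then ducks": [0.9, 0.1, 0.0],
    "The goose was sad": [0.0, 0.2, 0.9],
}
reviews = {
    "Should be than not then": [1.0, 0.0, 0.1],
    "Too gloomy for my taste": [0.1, 0.1, 1.0],
}

scores = similarity_matrix(list(passages.values()), list(reviews.values()))
review_texts = list(reviews)
for passage, row in zip(passages, scores):
    print(passage, "->", review_texts[best_review_index(row)])
```

With contrastive training, matching pairs are pushed toward the diagonal of this matrix, so reading off a row gives a rough ranking of which critiques apply to a given passage.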
After CARP, we wanted to use its similarity scores to guide text generation toward stated preferences. By embedding a passage and a preference (where a "preference" is treated as a review), we can use the similarity score as a reward for how well a model met the specified preference. As an example, suppose we had the preference that "the protagonist should be happy". This setup should penalize passages like "the goose was sad" and reward passages such as "the goose was happy". CARP alone was not capable of this, so we designed CARP-CoOp to take a step towards this goal. The CoOp in the name comes from context optimization. Rather than simply feeding in predetermined reviews to compare stories against, we use context optimization to tune the review before feeding it to CARP's review encoder.
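The sketch below illustrates the two ideas in this paragraph under heavy simplification: similarity used as a reward, and a learnable context that is tuned so the reward separates a liked passage from a disliked one. It is not the CARP-CoOp implementation. The assumptions are that the context is a vector added directly to a frozen review embedding (a stand-in for tuning the review before it reaches the review encoder), that the embeddings are hand-written toy vectors, and that gradients are estimated by finite differences; all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reward(passage_emb, review_emb):
    """Reward for a passage under a preference: embedding similarity."""
    return cosine_similarity(passage_emb, review_emb)


def margin(context, base_review, liked, disliked):
    """How much more reward the liked passage gets than the disliked one."""
    tuned = [b + c for b, c in zip(base_review, context)]
    return reward(liked, tuned) - reward(disliked, tuned)


def tune_context(base_review, liked, disliked, steps=200, lr=0.05, eps=1e-5):
    """Gradient ascent on the margin, using central finite differences."""
    context = [0.0] * len(base_review)
    for _ in range(steps):
        grad = []
        for i in range(len(context)):
            up = context[:]
            up[i] += eps
            down = context[:]
            down[i] -= eps
            diff = margin(up, base_review, liked, disliked) - margin(
                down, base_review, liked, disliked
            )
            grad.append(diff / (2 * eps))
        context = [c + lr * g for c, g in zip(context, grad)]
    return context


# Hypothetical toy embeddings (assumed, not produced by a real encoder).
preference = [0.5, 0.5, 0.2]   # "the protagonist should be happy"
happy = [0.9, 0.1, 0.3]        # "the goose was happy"
sad = [0.1, 0.9, 0.3]          # "the goose was sad"

zero_context = [0.0] * len(preference)
margin_before = margin(zero_context, preference, happy, sad)
context = tune_context(preference, happy, sad)
margin_after = margin(context, preference, happy, sad)
tuned_preference = [b + c for b, c in zip(preference, context)]

print("margin before tuning:", round(margin_before, 3))
print("margin after tuning:", round(margin_after, 3))
```

In this toy setup the untuned preference embedding scores both passages about equally, and the tuned context shifts it so the happy passage earns the higher reward. A real system would tune the context with labeled examples and backpropagation through the encoder rather than finite differences.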
Code.AI is the AI4Code research team at CarperAI.
TL;DR Today we're going to tell you all about DRLX - our library for Diffusion Reinforcement Learning! Released a few weeks ago, DRLX is a library...
CarperAI is happy to announce the paper and 0.9 release of OpenELM! OpenELM is an open-source library that enables evolutionary search with language...