CarperAI is the newest lab within the EleutherAI research collective and focuses on improving the performance and safety of large language models (LLMs) with reinforcement learning.
- CarperAI will release a Chinchilla-optimal large language model explicitly trained to follow human instructions, partnering with Scale AI, Multi, Humanloop, and Hugging Face.
- The open-source LLM will be trained using Reinforcement Learning from Human Feedback (RLHF), a technique to improve LLMs' safety and ease of use. The open-source release is crucial for enabling academics, independent researchers, and startups to conduct science and build upon state-of-the-art models.
CarperAI, a new research lab within the EleutherAI research collective, aims to democratize the "instruction-tuning" of large language models, the same way Stable Diffusion democratized image generation. Industry leader OpenAI pioneered the technique of teaching LLMs to follow instructions with its InstructGPT models last year. However, such models are either locked behind APIs or not released at all, limiting their value to most academics, hobbyists, and smaller companies. Last week, CarperAI released trlX, the first public implementation of the technique that can be used to train models with billions of parameters, to widespread acclaim.
Today, they're going a step further and announcing a broad coalition aimed at training and publicly releasing instruction-tuned models with EleutherAI and Multi, experts in training large language models, and Scale, Humanloop, and Hugging Face, experts in labelling and human annotation.
Large language models have demonstrated extraordinary capabilities and pushed the frontier of AI.
They enable better search, writing assistants, code generation and even generalist assistants that automate tasks. Notably, compared to traditional supervised machine learning, they do not need large labelled datasets to be adapted for new tasks. Instead, most large language models are trained on the simple task of next-word prediction on massive unlabelled datasets.
Unfortunately, LLMs trained by next-word prediction are difficult to use, often produce factually inaccurate or offensive output, and can be used in harmful applications.
A partial solution is to take a language model trained in the usual way and adjust it afterwards to produce more socially acceptable and honest content. This is done by repeatedly prompting the language model with instructions, gathering feedback from humans on its outputs, and adjusting the model's parameters in the direction of better predicted human feedback. For example, OpenAI, DeepMind, and Anthropic have used Reinforcement Learning from Human Feedback (RLHF) to produce LLMs that can follow instructions and are considerably more truthful and easier to use. In prior work, OpenAI found that the outputs from models trained with RLHF were preferred to those from 100x larger models trained without human feedback.
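The prompt, feedback, and update loop described above can be illustrated with a toy sketch. It is not CarperAI's or trlX's implementation: the "model" here is only a softmax distribution over a few canned responses, the function `rate` is a hypothetical stand-in for a human rater, and the update is plain REINFORCE with a running baseline. Production RLHF systems typically train a separate reward model on human preferences and use a more elaborate policy-gradient method.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_loop(responses, rate, steps=2000, lr=0.1, seed=0):
    """Toy feedback loop: sample a response, score it, nudge the policy.

    `responses` is a list of candidate outputs, and `rate` is a
    hypothetical function standing in for human feedback (higher is better).
    Returns the final probability assigned to each response.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(responses)  # the policy's only parameters
    baseline = 0.0                   # running average of observed rewards

    for _ in range(steps):
        probs = softmax(logits)
        # "Prompt" the policy: sample one response from its distribution.
        i = rng.choices(range(len(responses)), weights=probs)[0]
        # Gather (simulated) feedback on that output.
        reward = rate(responses[i])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # REINFORCE update: d log p(i) / d logit_j = 1[j == i] - p_j.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad

    return softmax(logits)


if __name__ == "__main__":
    candidates = ["helpful answer", "rude answer", "off-topic answer"]
    # Assumed rater: only the helpful answer earns a reward.
    final = feedback_loop(candidates, lambda r: 1.0 if r.startswith("helpful") else 0.0)
    for text, p in zip(candidates, final):
        print(f"{text}: {p:.3f}")
```

Over many iterations, the probability mass shifts toward the responses the rater prefers, which is the sense in which parameters move "in the direction of better predicted human feedback."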
Few organizations have the resources and technical expertise to build a large language model of this scale and complexity.
Instruction-tuning requires expertise in training large language models, which few outside major tech companies possess. CarperAI's models will be trained by EleutherAI, its parent organization and a pioneer in training open-source LLMs, and Multi, a new AI startup working on applying bleeding-edge LLM technology to enterprise automation. In addition, CarperAI is partnering with Scale, Humanloop, and Hugging Face to fine-tune the model. Scale accelerates the development of AI by providing AI data and model infrastructure and full-service operational AI solutions, and Humanloop specializes in adapting LLMs from human feedback. Together, they will help collect the human feedback data that will be used to improve the underlying language model. Hugging Face will provide the hosting mechanisms to share and load the models in an accessible way. It will also collaborate on developing demos with its Spaces and evaluation tools.
There have been open-source releases of large language models before, but this is the first attempt to create an open model trained with RLHF.
We view RLHF training as an essential step in making LLMs useful and safe to be deployed in a public setting. The risks of LLMs have been well documented and range from spreading misinformation to reinforcing social biases. Compared to standard language models, training with RLHF dramatically reduces these risks and, at the same time, increases the model's usefulness.
It is expected that the release of this model will spur both research and innovation. In addition, it will enable many new applications and companies and allow us to deepen our understanding of state-of-the-art AI systems.