In few other areas of the global economy are the rounds of creative destruction more rapid and more intense than in the HPC-AI sector. We see this today with the generative AI sensation: OpenAI’s launch of ChatGPT in November 2022 has created a firestorm of competitive responses.
At least two companies – Cerebras, maker of the dinner-plate-sized Wafer Scale Engine processing unit, and big data “lakehouse” software company Databricks – have announced open source large language models that they say use far fewer parameters yet deliver ChatGPT-like performance at lower cost.
Starting with Cerebras, the company announced today it is releasing (on Hugging Face and GitHub) a series of seven GPT-based LLMs for open source use by the research community.
“This is the first time a company has used non-GPU based AI systems to train LLMs up to 13 billion parameters and is sharing the models, weights, and training recipe via the industry standard Apache 2.0 license,” Cerebras said, adding that all seven models were trained on the 16 CS-2 systems in the Cerebras Andromeda AI supercomputer.
Andromeda is a Cerebras wafer-scale cluster for AI with 13.5 million cores that delivers more than 1 exaFLOP of AI compute and 120 petaFLOPS of dense compute, according to the company.
In what Cerebras said is a first among AI hardware companies, researchers at the company used the Andromeda supercomputer to train a series of seven GPT models with 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B parameters.
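Since the checkpoints are on Hugging Face under the permissive Apache 2.0 license, trying one should be straightforward. Here is a minimal sketch of loading a released model and generating text, assuming the checkpoints are published under repo ids like cerebras/Cerebras-GPT-1.3B (the exact identifiers are an assumption here; check the Cerebras pages on Hugging Face):

```python
# Minimal sketch: load one of the released Cerebras-GPT checkpoints and
# generate text. The repo id is assumed from the announced model sizes;
# the actual identifiers live on Cerebras' Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cerebras/Cerebras-GPT-1.3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Generative AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```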
Cerebras said this is typically a multi-month undertaking that was completed in a few weeks, a speed-up the company attributed to the CS-2 systems within Andromeda and to its weight streaming architecture, which eliminates the complexity of distributing compute across smaller processing units.
Cerebras CEO and co-founder Andrew Feldman told us the Wafer Scale Engine is a key distinction from OpenAI’s compute layer, which uses thousands of GPUs. Additionally, the GPT-3 LLM has 175 billion parameters (OpenAI is not disclosing the number of parameters in the recently announced GPT-4), whereas the largest of the GPT models offered by Cerebras has 13 billion. More parameters and more processors, Feldman said, mean training and inferencing for ChatGPT have to be distributed across the GPUs, a time-consuming and labor-intensive task.
But the Wafer Scale Engine’s size means Cerebras doesn’t need to distribute calculations over multiple CS-2s, Feldman explained. “The reason you do distributed work is because your individual unit, the GPU, isn’t big enough to do the work itself,” he said. “Because we have a Wafer Scale Engine, we don’t need to break up the work.”
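Some back-of-envelope arithmetic makes Feldman’s point concrete. Assuming fp16 weights (2 bytes per parameter) and an 80 GB accelerator – illustrative figures, not ones cited by Cerebras – the weights of a 175-billion-parameter model alone overflow any single GPU, while a 13-billion-parameter model nearly fits on one:

```python
# Back-of-envelope arithmetic behind the distribution argument. Assumes
# fp16 weights (2 bytes/parameter) and 80 GB of memory per GPU; optimizer
# state and activations (ignored here) push real requirements much higher.
BYTES_PER_PARAM = 2      # fp16
GPU_MEMORY_GB = 80       # illustrative high-end datacenter GPU

for name, params in [("GPT-3 (175B)", 175e9), ("Cerebras-GPT (13B)", 13e9)]:
    weights_gb = params * BYTES_PER_PARAM / 1e9
    gpus = weights_gb / GPU_MEMORY_GB
    print(f"{name}: ~{weights_gb:,.0f} GB of weights "
          f"(~{gpus:.1f} x {GPU_MEMORY_GB} GB GPUs just to hold them)")
```

At GPT-3 scale that works out to roughly 350 GB of weights – several GPUs’ worth of memory before training even starts – which is the distribution burden the wafer-scale approach is designed to avoid.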
“Few organizations are capable of training truly large-scale models. Even fewer have done so on dedicated AI hardware,” said Sean Lie, co-founder and chief software architect at Cerebras. “Releasing seven fully trained GPT models into the open-source community shows just how efficient clusters of Cerebras CS-2 systems can be and how they can rapidly solve the largest scale AI problems – problems that typically require hundreds or thousands of GPUs.”
On open sourcing, Cerebras said other companies have stopped releasing their models to the public as the cost and complexity of LLM development grow.
“By releasing seven GPT models, Cerebras not only demonstrates the power of its CS-2 systems and Andromeda supercomputer as being amongst the premier training platforms, but elevates Cerebras researchers to the upper echelon of AI practitioners,” said Karl Freund, founder and principal analyst, Cambrian AI. “There are a handful of companies in the world capable of deploying end-to-end AI training infrastructure and training the largest of LLMs to state-of-the-art accuracy. Cerebras must now be counted among them. Moreover, by releasing these models into the open-source community with the permissive Apache 2.0 license, Cerebras shows commitment to ensuring that AI remains an open technology that broadly benefits humanity.”
Meanwhile, late last week, Databricks introduced Dolly, which the company described as a cheap-to-build LLM that uses instruction-following techniques found in OpenAI’s models and works by modifying an existing open source, 6-billion-parameter model from EleutherAI.
The company said it has open sourced a notebook for building Dolly on the Databricks platform.
“We show that anyone can take a dated, off-the-shelf open source large language model and give it magical ChatGPT-like instruction-following ability by training it in 30 minutes on one machine, using high-quality training data,” Databricks said in a blog post entitled “Hello Dolly: Democratizing the Magic of ChatGPT with Open Models.”
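For readers who want a feel for the recipe, here is a minimal, hypothetical sketch of that approach using Hugging Face tooling. It is not the Databricks notebook: the repo id EleutherAI/gpt-j-6B, the prompt format, and the one-example dataset are all illustrative assumptions, and a real run would use a full instruction-following corpus and GPU-backed training arguments.

```python
# Hypothetical sketch of Dolly's approach: instruction-tune an existing
# open source 6B EleutherAI model. This is NOT the Databricks notebook.
# The repo id, prompt format, and one-example dataset are illustrative only.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-j-6B"  # assumed repo id for the 6B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# A real run would use a large instruction-following corpus; one pair here.
examples = [
    {"instruction": "Write a tagline for an open source LLM.",
     "response": "Instruction following for everyone."},
]
texts = [f"### Instruction:\n{e['instruction']}\n\n### Response:\n{e['response']}"
         for e in examples]

class InstructionDataset(torch.utils.data.Dataset):
    """Tokenized instruction/response pairs for causal-LM fine-tuning."""
    def __init__(self, texts):
        enc = tokenizer(texts, truncation=True, padding="max_length",
                        max_length=256, return_tensors="pt")
        self.input_ids = enc["input_ids"]
        self.attention_mask = enc["attention_mask"]
    def __len__(self):
        return len(self.input_ids)
    def __getitem__(self, i):
        return {"input_ids": self.input_ids[i],
                "attention_mask": self.attention_mask[i],
                "labels": self.input_ids[i]}  # causal LM: labels mirror inputs

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dolly-sketch",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=InstructionDataset(texts),
)
trainer.train()
trainer.save_model("dolly-sketch")        # reused in the evaluation sketch below
tokenizer.save_pretrained("dolly-sketch")
```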
The Databricks bloggers said they were surprised that instruction-following does not seem to require the latest or largest models, noting that their model has only 6 billion parameters, compared to 175 billion for GPT-3.
“We evaluated Dolly on the instruction-following capabilities described in the InstructGPT paper that ChatGPT is based on and found that it exhibits many of the same qualitative capabilities, including text generation, brainstorming and open Q&A,” Databricks said. “Of particular note in these examples is not the quality of the generated text, but rather the vast improvement in instruction-following capability that results from fine tuning a years-old open source model on a small, high quality dataset.”
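A qualitative spot-check along those lines might look like the following sketch, which prompts the hypothetical fine-tuned checkpoint from the sketch above with the kinds of tasks Databricks names; the prompts and checkpoint path are illustrative, not taken from the Databricks evaluation.

```python
# Qualitative spot-check in the spirit of the capabilities Databricks lists
# (text generation, brainstorming, open Q&A). The "dolly-sketch" checkpoint
# refers to the hypothetical fine-tune above, not an official Dolly release.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "dolly-sketch"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

prompts = [
    "### Instruction:\nBrainstorm five names for an open source LLM.\n\n### Response:\n",
    "### Instruction:\nWhat is a data lakehouse?\n\n### Response:\n",
]
for p in prompts:
    ids = tokenizer(p, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=80, do_sample=True, top_p=0.9)
    print(tokenizer.decode(out[0], skip_special_tokens=True), "\n")
```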