AI researchers develop 'reasoning' model like OpenAI's o1 and DeepSeek's R1 for under $50
A team of AI researchers from Stanford University and the University of Washington has achieved a breakthrough in AI development by training a sophisticated “reasoning” model for less than $50 in cloud compute credits. The researchers used AI models from Google and Alibaba to create a chatbot whose reasoning is said to be as good as that of ChatGPT-maker OpenAI’s o1 models.
The model, named s1, is claimed to perform comparably to cutting-edge reasoning models such as OpenAI's o1 and DeepSeek's R1 on tests of math and coding ability. The s1 model, along with the data and code used in its training, has been made available on GitHub.
The researchers behind s1 said that they utilised a readily available base model and refined it through a process known as distillation – a technique that involves extracting the “reasoning” capabilities from another AI model by training s1 on its answers.
The researchers claim that their model s1 has been distilled from Google's Gemini 2.0 Flash Thinking Experimental model. The approach mirrors the method used by Berkeley researchers, who created an AI reasoning model for around $450 last month.
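Distillation of this kind is typically implemented by querying a stronger "teacher" model and saving its reasoning traces as training targets. The sketch below is a minimal illustration of that general idea, not the s1 team's actual pipeline; the `query_teacher` function and the output file name are hypothetical stand-ins for whatever API the teacher model exposes.

```python
import json

def query_teacher(question: str) -> dict:
    """Hypothetical call to a teacher model (e.g. a hosted reasoning LLM).
    Assumed to return the model's step-by-step reasoning and final answer."""
    raise NotImplementedError("Replace with the teacher model's real API client.")

def build_distillation_set(questions, out_path="distill_traces.jsonl"):
    """Collect (question, reasoning trace, answer) triples for later fine-tuning."""
    with open(out_path, "w") as f:
        for q in questions:
            response = query_teacher(q)
            record = {
                "question": q,
                "reasoning": response["reasoning"],  # the teacher's reasoning trace
                "answer": response["answer"],        # the teacher's final answer
            }
            f.write(json.dumps(record) + "\n")

# Usage (illustrative): build_distillation_set(["What is 17 * 24?"])
```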
The researchers behind s1 aimed to identify the simplest approach to achieve strong reasoning performance and "test-time scaling," which allows an AI model to deliberate more before generating a response.
“Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance,” the researchers said in their paper published last week (via TechCrunch), adding that they “seek the simplest approach to achieve test-time scaling and strong reasoning performance.”
The s1 research suggests that reasoning models can be effectively distilled using a relatively small dataset and a process called supervised fine-tuning (SFT), where an AI model is explicitly trained to mimic specific behaviors in the dataset.
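In practice, supervised fine-tuning on reasoning traces usually means formatting each example as question plus reasoning plus answer and minimizing ordinary next-token cross-entropy on that text. The snippet below is a generic sketch of that step using Hugging Face `transformers`; the base model name, prompt format, and hyperparameters are illustrative assumptions, not details taken from the s1 paper.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any small open causal LM stands in here for the base model.
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def format_example(ex):
    # Concatenate question, reasoning trace, and answer into one training string.
    return f"Question: {ex['question']}\nReasoning: {ex['reasoning']}\nAnswer: {ex['answer']}"

def collate(batch):
    texts = [format_example(ex) for ex in batch]
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=2048, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()          # causal LM targets are the inputs
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    return enc

def fine_tune(dataset, epochs=3, lr=1e-5):
    """Supervised fine-tuning: the model learns to reproduce the reasoning traces."""
    loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss  # next-token cross-entropy over the full trace
            loss.backward()
            optim.step()
            optim.zero_grad()
```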
“First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to doublecheck its answer, often fixing incorrect reasoning steps,” the researchers said.
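The "budget forcing" the paper describes is essentially a decoding-time control: cut the thinking phase off once a token budget is spent, or, when the model tries to stop thinking early, append "Wait" and let it continue. Below is a rough sketch of such a control loop; `generate_until` is a hypothetical helper standing in for whatever incremental-decoding API the model serves, and the end-of-thinking delimiter is illustrative rather than the exact token s1 uses.

```python
END_OF_THINKING = "</think>"  # illustrative delimiter, not necessarily s1's exact token

def generate_until(prompt: str, stop: str, max_new_tokens: int) -> str:
    """Hypothetical wrapper around the model's decoding API: returns generated
    text, stopping when `stop` is produced or `max_new_tokens` is reached."""
    raise NotImplementedError

def think_with_budget(question: str, max_thinking_tokens: int = 4096,
                      max_waits: int = 2) -> str:
    """Control test-time compute: cap the length of the thinking phase, and when
    the model tries to stop early, append "Wait" so it keeps reasoning (which
    often leads it to re-check and fix earlier steps)."""
    trace = ""
    waits = 0
    while True:
        remaining = max_thinking_tokens - len(trace.split())  # crude token estimate
        if remaining <= 0:
            break  # budget exhausted: forcefully terminate the thinking phase
        chunk = generate_until(question + trace, stop=END_OF_THINKING,
                               max_new_tokens=remaining)
        trace += chunk
        if waits >= max_waits:
            break  # stop extending; let the model produce its final answer next
        trace += "Wait"  # suppress the end of thinking and lengthen the trace
        waits += 1
    return trace
```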