Meta releases its biggest ‘open’ AI model yet

people walking past Meta signage
Image Credits: TOBIAS SCHWARZ/AFP / Getty Images

Meta’s latest open source AI model is its biggest yet.

Today, Meta said it is releasing Llama 3.1 405B, a model containing 405 billion parameters. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

At 405 billion parameters, Llama 3.1 405B isn’t the absolute largest open source model out there, but it’s the biggest in recent years. Trained using 16,000 Nvidia H100 GPUs, it also benefits from newer training and development techniques that Meta claims make it competitive with leading proprietary models like OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet (with a few caveats).

As with Meta’s previous models, Llama 3.1 405B is available to download or use on cloud platforms like AWS, Azure and Google Cloud. It’s also being used on WhatsApp and Meta.ai, where it’s powering a chatbot experience for U.S.-based users.

New and improved

Like other open and closed source generative AI models, Llama 3.1 405B can perform a range of different tasks, from coding and answering basic math questions to summarizing documents in eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai). It’s text-only, meaning that it can’t, for example, answer questions about an image, but most text-based workloads — think analyzing files like PDFs and spreadsheets — are within its purview.

Meta wants to make it known that it is experimenting with multimodality. In a paper published today, researchers at the company write that they’re actively developing Llama models that can recognize images and videos, and understand (and generate) speech. Still, these models aren’t yet ready for public release.

To train Llama 3.1 405B, Meta used a dataset of 15 trillion tokens dating up to 2024 (tokens are subword pieces that models can internalize more easily than whole words; at a typical rate of about 0.75 English words per token, 15 trillion tokens works out to a mind-boggling 11 trillion words or so). It’s not a new training set per se, since Meta used the base set to train earlier Llama models, but the company claims it refined its data curation pipelines and adopted “more rigorous” quality assurance and data filtering approaches in developing this model.
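
The scale of that training set can be sanity-checked with some back-of-envelope arithmetic. The sketch below assumes the common rule of thumb of roughly 0.75 English words per token; actual ratios vary by tokenizer and language, so treat the result as an estimate, not a figure from Meta.

```python
# Back-of-envelope scale of a 15-trillion-token training set.
# Assumes ~0.75 English words per token, a common rule of thumb;
# real ratios depend on the tokenizer and the language mix.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(n_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Convert a token count to an approximate word count."""
    return int(n_tokens * words_per_token)

training_tokens = 15_000_000_000_000  # 15 trillion
print(f"~{tokens_to_words(training_tokens):,} words")  # ~11,250,000,000,000 words
```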

The company also used synthetic data (data generated by other AI models) to fine-tune Llama 3.1 405B. Most major AI vendors, including OpenAI and Anthropic, are exploring applications of synthetic data to scale up their AI training, but some experts believe that synthetic data should be a last resort due to its potential to exacerbate model bias.

For its part, Meta insists that it “carefully balance[d]” Llama 3.1 405B’s training data, but declined to reveal exactly where the data came from (outside of webpages and public web files). Many generative AI vendors see training data as a competitive advantage and so keep it and any information pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive for companies to reveal much. 

Meta Llama 3.1
Image Credits: Meta

In the aforementioned paper, Meta researchers wrote that compared to earlier Llama models, Llama 3.1 405B was trained on an increased mix of non-English data (to improve its performance on non-English languages), more “mathematical data” and code (to improve the model’s mathematical reasoning skills), and recent web data (to bolster its knowledge of current events).

Recent reporting by Reuters revealed that Meta at one point used copyrighted e-books for AI training despite its own lawyers’ warnings. The company controversially trains its AI on Instagram and Facebook posts, photos and captions, and makes it difficult for users to opt out. What’s more, Meta, along with OpenAI, is the subject of an ongoing lawsuit brought by authors, including comedian Sarah Silverman, over the companies’ alleged unauthorized use of copyrighted data for model training.

“The training data, in many ways, is sort of like the secret recipe and the sauce that goes into building these models,” Ragavan Srinivasan, VP of AI program management at Meta, told TechCrunch in an interview. “And so from our perspective, we’ve invested a lot in this. And it is going to be one of these things where we will continue to refine it.”

Bigger context and tools

Llama 3.1 405B has a larger context window than previous Llama models: 128,000 tokens, or roughly the length of a 300-page novel. A model’s context, or context window, refers to the input data (e.g. text) that the model considers before generating output (e.g. additional text).

One of the advantages of models with larger contexts is that they can summarize longer text snippets and files. When powering chatbots, such models are also less likely to forget topics that were recently discussed.
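A practical consequence is that applications can check whether a document will fit in the window before sending it. The sketch below is illustrative only: it uses a crude heuristic of ~1.3 tokens per word, whereas a real application would count tokens with the model’s own tokenizer.

```python
# Rough check of whether a text fits a 128,000-token context window.
# The ~1.3 tokens-per-word heuristic is illustrative; production code
# should count tokens with the model's actual tokenizer.
CONTEXT_WINDOW = 128_000
TOKENS_PER_WORD = 1.3

def estimate_tokens(text: str) -> int:
    """Approximate token count from whitespace-separated words."""
    return int(len(text.split()) * TOKENS_PER_WORD)

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """Leave headroom for the model's reply as well as the prompt."""
    return estimate_tokens(text) <= CONTEXT_WINDOW - reserve_for_output

short_doc = "word " * 50_000   # ~50,000 words -> ~65,000 tokens
long_doc = "word " * 120_000   # ~120,000 words -> ~156,000 tokens
print(fits_in_context(short_doc), fits_in_context(long_doc))  # True False
```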

Two other new, smaller models Meta unveiled today, Llama 3.1 8B and Llama 3.1 70B — updated versions of the company’s Llama 3 8B and Llama 3 70B models released in April — also have 128,000-token context windows. The previous models’ contexts topped out at 8,000 tokens, which makes this upgrade fairly substantial — assuming the new Llama models can effectively reason across all that context.

All of the Llama 3.1 models can use third-party tools, apps and APIs to complete tasks, like rival models from Anthropic and OpenAI. Out of the box, they’re trained to tap Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. In addition, Meta claims the Llama 3.1 models can use certain tools they haven’t seen before — to an extent.
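Under the hood, tool use follows a common pattern: the model emits a structured tool request, and the application routes it to the matching tool and feeds the result back. A minimal sketch of that dispatch loop is below; the tool names and JSON call format are illustrative assumptions, not Meta’s actual interface.

```python
import json

# Minimal sketch of tool-call dispatch: the model emits a structured
# tool request; the application runs the named tool on the argument.
# Tool names and call format here are hypothetical stand-ins.

def search_tool(query: str) -> str:
    # Stand-in for a real search API such as Brave Search.
    return f"search results for {query!r}"

def math_tool(expression: str) -> str:
    # Stand-in for a symbolic math service; handles only trivial arithmetic.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"search": search_tool, "math": math_tool}

def dispatch(model_output: str) -> str:
    """Route a model-emitted JSON tool call to the matching tool."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](call["argument"])

print(dispatch('{"tool": "math", "argument": "6 * 7"}'))  # 42
```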

Building an ecosystem

If benchmarks are to be believed (not that benchmarks are the be-all and end-all in generative AI), Llama 3.1 405B is a very capable model indeed. That’d be a good thing, considering some of the painfully obvious limitations of previous-generation Llama models.

Llama 3.1 405B performs on par with OpenAI’s GPT-4, and achieves “mixed results” compared to GPT-4o and Claude 3.5 Sonnet, per human evaluators that Meta hired, the paper notes. While Llama 3.1 405B is better at executing code and generating plots than GPT-4o, its multilingual capabilities are overall weaker, and it trails Claude 3.5 Sonnet in programming and general reasoning.

And because of its size, it needs beefy hardware to run. Meta recommends at least a server node.

That’s perhaps why Meta’s pushing its smaller new models, Llama 3.1 8B and Llama 3.1 70B, for general-purpose applications like powering chatbots and generating code. Llama 3.1 405B, the company says, is better reserved for model distillation — the process of transferring knowledge from a large model to a smaller, more efficient model — and generating synthetic data to train (or fine-tune) alternative models.
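Distillation typically works by training the small model to match the large model’s softened output distribution rather than just hard labels. The toy sketch below shows that soft-target objective in pure Python; it is a conceptual illustration, not Meta’s recipe, and real distillation applies this per token at vastly larger scale.

```python
import math

# Toy sketch of the knowledge-distillation objective: a student model is
# trained to minimize the divergence between its output distribution and
# the teacher's. Logit values here are made up for illustration.

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 1.0, 0.1]  # hypothetical large-model outputs
student_logits = [1.8, 1.1, 0.3]  # hypothetical small-model outputs

# Temperature > 1 softens both distributions, exposing the teacher's
# relative preferences among non-top answers ("dark knowledge").
T = 2.0
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(round(loss, 4))
```

Minimizing this loss over many examples nudges the student toward the teacher’s behavior at a fraction of the inference cost.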

To encourage the synthetic data use case, Meta said it has updated Llama’s license to let developers use outputs from the Llama 3.1 model family to develop third-party generative AI models (whether that’s a wise idea is up for debate). Importantly, the license still constrains how developers can deploy Llama models: App developers with more than 700 million monthly users must request a special license from Meta that the company will grant at its discretion.

That change in licensing around outputs, which allays a major criticism of Meta’s models within the AI community, is part of the company’s aggressive push for mindshare in generative AI.

Alongside the Llama 3.1 family, Meta is releasing what it’s calling a “reference system” and new safety tools — several of these block prompts that might cause Llama models to behave in unpredictable or undesirable ways — to encourage developers to use Llama in more places. The company is also previewing and seeking comment on the Llama Stack, a forthcoming API for tools that can be used to fine-tune Llama models, generate synthetic data with Llama and build “agentic” applications — apps powered by Llama that can take action on a user’s behalf.

“[What] We have heard repeatedly from developers is an interest in learning how to actually deploy [Llama models] in production,” Srinivasan said. “So we’re trying to start giving them a bunch of different tools and options.”

Play for market share

In an open letter published this morning, Meta CEO Mark Zuckerberg lays out a vision for the future in which AI tools and models reach the hands of more developers around the world, ensuring people have access to the “benefits and opportunities” of AI.

It’s couched very philanthropically, but implicit in the letter is Zuckerberg’s desire that these tools and models be of Meta’s making.

Meta’s racing to catch up to companies like OpenAI and Anthropic, and it is employing a tried-and-true strategy: give tools away for free to foster an ecosystem and then slowly add products and services, some paid, on top. Spending billions of dollars on models that it can then commoditize also has the effect of driving down Meta competitors’ prices and spreading the company’s version of AI broadly. It also lets the company incorporate improvements from the open source community into its future models.

Llama certainly has developers’ attention. Meta claims Llama models have been downloaded over 300 million times, and more than 20,000 Llama-derived models have been created so far.

Make no mistake, Meta’s playing for keeps. It is spending millions on lobbying regulators to come around to its preferred flavor of “open” generative AI. None of the Llama 3.1 models solve the intractable problems with today’s generative AI tech, like its tendency to make things up and regurgitate problematic training data. But they do advance one of Meta’s key goals: becoming synonymous with generative AI.

There are costs to this. In the research paper, the co-authors — echoing Zuckerberg’s recent comments — discuss energy-related reliability issues with training Meta’s ever-growing generative AI models.

“During training, tens of thousands of GPUs may increase or decrease power consumption at the same time, for example, due to all GPUs waiting for checkpointing or collective communications to finish, or the startup or shutdown of the entire training job,” they write. “When this happens, it can result in instant fluctuations of power consumption across the data center on the order of tens of megawatts, stretching the limits of the power grid. This is an ongoing challenge for us as we scale training for future, even larger Llama models.”

One hopes that training those larger models won’t force more utilities to keep old coal-burning power plants around.
