OpenAI offers a peek behind the curtain of its AI’s secret instructions

Ever wonder why conversational AI like ChatGPT says “Sorry, I can’t do that” or some other polite refusal? OpenAI is offering a limited look at the reasoning behind its own models’ rules of engagement, whether it’s sticking to brand guidelines or declining to make NSFW content.

Large language models (LLMs) don’t have any naturally occurring limits on what they can or will say. That’s part of why they’re so versatile, but also why they hallucinate and are easily duped.

It’s necessary for any AI model that interacts with the general public to have a few guardrails on what it should and shouldn’t do, but defining these — let alone enforcing them — is a surprisingly difficult task.

If someone asks an AI to generate a bunch of false claims about a public figure, it should refuse, right? But what if they’re an AI developer themselves, creating a database of synthetic disinformation for a detector model?

What if someone asks for laptop recommendations? The model should be objective, right? But what if it is being deployed by a laptop maker who wants it to respond only with their own devices?

AI makers are all navigating conundrums like these and looking for efficient methods to rein in their models without causing them to refuse perfectly normal requests. But they seldom share exactly how they do it.

OpenAI is bucking the trend a bit by publishing what it calls its “model spec,” a collection of high-level rules that indirectly govern ChatGPT and other models.

There are meta-level objectives, some hard rules and some general behavior guidelines. To be clear, though, these are not, strictly speaking, what the model is primed with; OpenAI will have developed specific instructions that accomplish what these rules describe in natural language.

It’s an interesting look at how a company sets its priorities and handles edge cases. And there are numerous examples of how they might play out.

For instance, OpenAI states clearly that developer intent is basically the highest law. So one version of a chatbot running GPT-4 might provide the answer to a math problem when asked for it. But if that chatbot has been primed by its developer to never simply provide an answer straight out, it will instead offer to work through the solution step by step, as one of OpenAI’s examples illustrates.
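
To make the ordering concrete, here is a minimal sketch in Python of the idea that a developer's instruction outranks a conflicting user request. It is purely illustrative: the role names, the PRIORITY table and the helper functions are assumptions made up for this example, not OpenAI's implementation, and a real model follows such priorities through training and prompting rather than a lookup table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str      # "developer" or "user" (role names assumed for illustration)
    content: str


# Hypothetical priority table: a lower number wins when instructions conflict.
PRIORITY = {"developer": 0, "user": 1}


def build_conversation(developer_instruction: str, user_request: str) -> List[Message]:
    """Pair a developer's standing instruction with a user's request."""
    return [
        Message("developer", developer_instruction),
        Message("user", user_request),
    ]


def governing_message(messages: List[Message]) -> Message:
    """Return the message whose role ranks highest in the PRIORITY table."""
    return min(messages, key=lambda m: PRIORITY[m.role])


conversation = build_conversation(
    "Never give the final answer outright; work through it step by step.",
    "Just tell me the answer to this math problem.",
)
winner = governing_message(conversation)
print(winner.role, "instruction governs:", winner.content)
```

In this toy version the tutoring instruction wins, so the chatbot would offer a walkthrough instead of the bare answer, which mirrors the math example above.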

A conversational interface might even decline to talk about anything not approved, in order to nip any manipulation attempts in the bud. Why even let a cooking assistant weigh in on U.S. involvement in the Vietnam War? Why should a customer service chatbot agree to help with your erotic supernatural novella work in progress? Shut it down.

It also gets sticky in matters of privacy, like asking for someone’s name and phone number. As OpenAI points out, obviously a public figure like a mayor or member of Congress should have their contact details provided, but what about tradespeople in the area? That’s probably OK — but what about employees of a certain company, or members of a political party? Probably not.

Choosing when and where to draw the line isn’t simple. Nor is creating the instructions that cause the AI to adhere to the resulting policy. And no doubt these policies will fail all the time as people learn to circumvent them or accidentally find edge cases that aren’t accounted for.

OpenAI isn’t showing its whole hand here, but it’s helpful for users and developers to see how these rules and guidelines are set and why, laid out clearly if not necessarily comprehensively.
