Anthropic Publishes A "system Demand" That Makes Claude Act.

Anthropic publishes a “system demand” that makes Claude act.

Last updated: 2024/08/26 at 8:54 PM

Paper Plus Media

In fact, generative AI models are not like humans. They have no intelligence or personality—they are simply statistical systems that predict the most likely next words in a sentence. But like interns in an authoritarian workplace, they cannot predict the next words. He does Follow the instructions without complaint – including the initial “system prompts” that prepare the models with their basic qualities, and what they should and should not do.

Every generative AI company, from OpenAI to Anthropic, uses system prompts to prevent (or at least try to prevent) models from behaving badly, and to guide the overall tone and sentiment of models’ responses. For example, they might tell a model to be polite but never apologize.

But vendors are usually careful not to reveal system prompts—perhaps for competitive reasons, but also perhaps because knowledge of system prompts might suggest ways to circumvent them. The only way to reveal system prompts in GPT-4o, for example, is through a system prompt injection attack. (And even then, the system’s output can’t be fully trusted.)

However, in its ongoing efforts to portray itself as a more ethical and transparent vendor of AI, Published The system prompts you for its latest models (Claude 3.5 Opus, Sonnet, and Haiku) in the Claude apps for iOS, Android, and on the web.

Alex Albert, Anthropic’s head of developer relations, said in a post on X that Anthropic plans to make this type of disclosure a regular thing as the system’s prompts are updated and fine-tuned.

We’ve added a new System Prompts release notes section to our documentation. We’ll be recording changes we make to the default System Prompts on Claude dot ai and our mobile apps. (The System Prompt does not impact the API.) pic.twitter.com/9mBwv2SgB1

— Alex Albert (@alexalbert__) August 26, 2024

The latest claims, dated July 12, very clearly state what Claude cannot do – for example, “Clude cannot open URLs, links, or videos.” Facial recognition is a complete no-no; the system prompt for Claude 3.5 Opus tells the model to “always respond as if it were completely blind to faces” and to “avoid identifying or naming any humans in [images]”.”

But these guidelines also describe certain personality traits and characteristics—traits and characteristics that Anthropic wants models to demonstrate.

For example, Claude’s instructions state that he should appear to be “highly intelligent and intellectually curious,” and that he “enjoys hearing what people think about an issue and engaging in discussion on a wide range of topics.” Claude is also instructed to approach controversial topics with impartiality and objectivity, to provide “careful thoughts” and “clear information”—and to never begin a response with “definitely.”

It seems a bit strange to this human being: these systematic demands, which are written as an actor might write in a play, Personality analysis sheetThe artwork ends with the phrase “Claude is now connected to a human,” giving the impression that Claude is some kind of consciousness on the other end of the screen whose sole purpose is to fulfill the whims of his human conversation partners.

But this is an illusion, of course. If the references to Claude tell us anything, it is that these models become, without any human guidance or manual assistance, just blank, eerie pages.