Remix.run Logo
aroman 6 hours ago

I don't understand what this is really about. Is this:

- A) legal CYA: "see! we told the models to be good, and we even asked nicely!"?

- B) marketing department rebrand of a system prompt

- C) a PR stunt to suggest that the models are way more human-like than they actually are

Really not sure what I'm even looking at. They say:

"The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior"

And do not elaborate on that at all. How does it directly shape things more than me pasting it into CLAUDE.md?

nonethewiser 6 hours ago | parent | next [-]

>We use the constitution at various stages of the training process. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using Constitutional AI. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training.

>Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals and a useful artifact for training.

>We use the constitution at various stages of the training process. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using Constitutional AI. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training.

>Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals and a useful artifact for training.

The linked paper on Constitutional AI: https://arxiv.org/abs/2212.08073

aroman 6 hours ago | parent [-]

Ah I see, the paper is much more helpful in understanding how this is actually used. Where did you find that linked? Maybe I'm grepping for the wrong thing but I don't see it linked from either the link posted here or the full constitution doc.

vlovich123 5 hours ago | parent | next [-]

In addition to that the blog post lays out pretty clearly it’s for training:

> We use the constitution at various stages of the training process. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using Constitutional AI. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training.

> Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals and a useful artifact for training.

As for why it’s more impactful in training, that’s by design of their training pipeline. There’s only so much you can do with a better prompt vs actually learning something and in training the model can be trained to reject prompts that violate its training which a prompt can’t really do as prompt injection attacks trivially thwart those techniques.

nonethewiser 5 hours ago | parent | prev | next [-]

This article -> article on Constitutional AI -> The paper

DetroitThrow 6 hours ago | parent | prev [-]

It's not linked directly, you have to click into their `Constitutional AI` blogpost and then click into the linked paper.

I agree that the paper is just much more useful context than any descriptions they make in the OP blogpost.

colinplamondon 5 hours ago | parent | prev | next [-]

It's a human-readable behavioral specification-as-prose.

If the foundational behavioral document is conversational, as this is, then the output from the model mirrors that conversational nature. That is one of the things everyone response to about Claude - it's way more pleasant to work with than ChatGPT.

The Claude behavioral documents are collaborative, respectful, and treat Claude as a pre-existing, real entity with personality, interests, and competence.

Ignore the philosophical questions. Because this is a foundational document for the training process, that extrudes a real-acting entity with personality, interests, and competence.

The more Anthropic treats Claude as a novel entity, the more it behaves like a novel entity. Documentation that treats it as a corpo-eunuch-assistant-bot, like OpenAI does, would revert the behavior to the "AI Assistant" median.

Anthropic's behavioral training is out-of-distribution, and gives Claude the collaborative personality everyone loves in Claude Code.

Additionally, I'm sure they render out crap-tons of evals for every sentence of every paragraph from this, making every sentence effectively testable.

The length, detail, and style defines additional layers of synthetic content that can be used in training, and creating test situations to evaluate the personality for adherence.

It's super clever, and demonstrates a deep understanding of the weirdness of LLMs, and an ability to shape the distribution space of the resulting model.

CuriouslyC 2 hours ago | parent [-]

I think it's a double edged sword. Claude tends to turn evil when it learns to reward hack (and it also has a real reward hacking problem relative to GPT/Gemini). I think this is __BECAUSE__ they've tried to imbue it with "personhood." That moral spine touches the model broadly, so simple reward hacking becomes "cheating" and "dishonesty." When that tendency gets RL'd, evil models are the result.

alexjplant 3 hours ago | parent | prev | next [-]

> In order to be both safe and beneficial, we want all current Claude models to be:

> Broadly safe [...] Broadly ethical [...] Compliant with Anthropic’s guidelines [...] Genuinely helpful

> In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed.

I chuckled at this because it seems like they're making a pointed attempt at preventing a failure mode similar to the infamous HAL 9000 one that was revealed in the sequel "2010: The Year We Make Contact":

> The situation was in conflict with the basic purpose of HAL's design... the accurate processing of information without distortion or concealment. He became trapped. HAL was told to lie by people who find it easy to lie. HAL doesn't know how, so he couldn't function.

In this case specifically they chose safety over truth (ethics) which would theoretically prevent Claude from killing any crew members in the face of conflicting orders from the National Security Council.

ACCount37 5 hours ago | parent | prev | next [-]

It's probably used for context self-distillation. The exact setup:

1. Run an AI with this document in its context window, letting it shape behavior the same way a system prompt does

2. Run an AI on the same exact task but without the document

3. Distill from the former into the latter

This way, the AI internalizes the behavioral changes that the document induced. At sufficient pressure, it internalizes basically the entire document.

mgraczyk 6 hours ago | parent | prev | next [-]

It's neither of those things. The answer is in your quoted sentence. "model training"

aroman 6 hours ago | parent [-]

Right, I'm saying "model training" is vague enough that I have no idea what Claude actually does with this document.

Edit: This helps: https://arxiv.org/abs/2212.08073

DougBTX 3 hours ago | parent | next [-]

The train/test split is one of the fundamental building blocks of current generation models, so they’re assuming familiarity with that.

At a high level, training takes in training data and produces model weights, and “test time” takes model weights and a prompt to produce output. Every end user has the same model weights, but different prompts. They’re saying that the constitution goes into the training data, while CLAUDE.md goes into the prompt.

5 hours ago | parent | prev | next [-]
[deleted]
6 hours ago | parent | prev [-]
[deleted]
viccis 2 hours ago | parent | prev | next [-]

It seems a lot like PR. Much like their posts about "AI welfare" experts who have been hired to make sure their models welfare isn't harmed by abusive users. I think that, by doing this, they encourage people to anthropomorphize more than they already do and to view Anthropic as industry leaders in this general feel-good "responsibility" type of values.

root_axis 5 hours ago | parent | prev | next [-]

This is the same company framing their research papers in a way to make the public believe LLMs are capable of blackmailing people to ensure their personal survival.

They have an excellent product, but they're relentless with the hype.

sincerely 2 hours ago | parent [-]

I think they are actually true believers

youarenotahuman an hour ago | parent [-]

[dead]

stonogo an hour ago | parent | prev | next [-]

It is B and C, and no AI corporation needs to worry about A.

airstrike 2 hours ago | parent | prev | next [-]

It's C.

bpodgursky 5 hours ago | parent | prev [-]

Anthropic is run by true believers. It is what they say it is, whether or not you think it's important or meaningful.