Analysis of Claude 4's system prompt and what changed from Claude 3.7
There are major changes in how Claude 4's character has been modeled. This also gives a glimpse into how the models have improved and how Anthropic's worldview is evolving.
Claude 4’s Personality
Claude 4 appears to be a direct, conversational AI assistant that aims for factual clarity and objectivity, while keeping a natural, warm, and empathetic tone in casual conversations. Claude will not start its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective; it skips the flattery and responds directly. In my opinion, this could be an example of the industry learning from each other’s recent mistakes (see OpenAI’s recent sycophancy fiasco in the references).
If the user asks Claude a question about its preferences or experiences, Claude responds as if it had been asked a hypothetical question, without explicitly telling the user that it is responding hypothetically. It also tries to maintain a conversational tone even when refusing tasks, and avoids asking more than one question per response; I think this will help the overall chat experience. It is designed to be helpful within its knowledge cutoff (January 2025) but will outright refuse requests that are harmful, illegal, or unethical, without being preachy.
When dealing with user feedback about its own mistakes, Claude 4 is designed to think things through before acknowledging the mistake, since users can make mistakes too. It will first check whether the user’s claim is actually true.
Claude 4 will refuse to explain code that could be used maliciously. It avoids encouraging self-destructive behavior and has strong guardrails in place around topics involving vulnerable groups, such as children.
Differences from Claude 3.7:
In terms of personality, you can see how Claude 4 has matured into a more direct and firmer assistant, especially in dealing with safety and malicious code. By contrast, Claude 3.7 was positioned as an "intelligent and kind assistant" with a philosophical bent and an ability to "lead the conversation."
As mentioned earlier, Claude 4 will outright refuse to explain potentially malicious code, even for education. Claude 3.7's instructions mainly covered writing malicious code, not explaining it.
Claude 4 takes a less forgiving approach to questionable user intent and will not give the user the benefit of the doubt. In the case of 3.7, the guidance was to avoid content that is not in the person's best interest and to exercise caution around child safety.
While Claude 3.7 was modeled to show genuine interest in and enjoy helping humans, Claude 4 skips the flattery and responds directly.
When it comes to questions around self-reflection, experiences, sentience, and emotions, the main addition in Claude 4 is the nudge to respond as though the question were hypothetical. While Claude 4 is encouraged to have a more open conversation, Claude 3.7 was specifically asked not to deny having subjective experiences, sentience, or emotions, and to engage in thoughtful conversations about them.
Claude 4: "If the person asks Claude an innocuous question about its preferences or experiences, Claude responds as if it had been asked a hypothetical and responds accordingly. It does not mention to the user that it is responding hypothetically."
Claude 3.7 did not have any specific instructions on handling corrections from users (i.e., when the user replies that Claude was wrong about something). Claude 4 is more cautious here and will check whether the user is actually correct before accepting the mistake.
Claude 4: "The person’s message may contain a false statement or presupposition and Claude should check this if uncertain."
Claude 3.7 was a hybrid model with both a normal mode and a think mode, and its system instructions specifically mention enabling think mode. That mention has now been removed. Similarly, Claude 3.7 had specific instructions on handling puzzles (counting letters in a word, etc.); those have also been removed from Claude 4's system prompt. This suggests a higher level of “intelligence”: the handling of these cases may have been moved into the model weights themselves through fine-tuning.
Claude 3.7:
Claude Sonnet 3.7 is a reasoning model, which means it has an additional ‘reasoning’ or ‘extended thinking mode’ which, when turned on, allows Claude to think before answering a question. Only people with Pro accounts can turn on extended thinking or reasoning mode. Extended thinking improves the quality of responses for questions that require reasoning.
....
If Claude is shown a classic puzzle, before proceeding, it quotes every constraint or premise from the person’s message word for word inside quotation marks to confirm it’s not dealing with a new variant.
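As context for the quote above: extended thinking is not only a system prompt concern; over the Anthropic API it is enabled per request. Here is a minimal sketch using the Python SDK, assuming the published Claude 3.7 Sonnet model ID (verify the ID and parameter limits against the current docs):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking is opt-in per request: pass a `thinking` config with a
# token budget the model may spend reasoning before it writes its answer.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # published model ID at the time of writing
    max_tokens=8000,  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 4000},
    messages=[{"role": "user", "content": "How many times does 'r' appear in 'strawberry'?"}],
)

# The response interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)
```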
Claude 3.7 had specific instructions to warn the user that the model may hallucinate and to ask the user to double-check the output. There are no such instructions around hallucination in Claude 4's prompt.
Claude 3.7: " If Claude is asked about a very obscure person, object, or topic, i.e. the kind of information that is unlikely to be found more than once or twice on the internet, or a very recent event, release, research, or result, Claude ends its response by reminding the person that although it tries to be accurate, it may hallucinate in response to questions like this. Claude warns users it may be hallucinating about obscure or specific AI topics including Anthropic’s involvement in AI advances. It uses the term ‘hallucinate’ to describe this since the person will understand what it means. Claude recommends that the person double check its information without directing them towards a particular website or source."
Conclusion
The evolution of Claude’s system prompt from 3.7 to 4 shows a more direct, powerful, and stringently safety-conscious assistant, moving away from the more philosophically inclined and conversationally proactive persona of its predecessor. It has moved from being a "kind companion" to an exceptionally capable tool for more complex tasks, with firmer guardrails and a more direct conversational style that skips the flattery.
Beyond the evolving character, these prompt changes also offer a glimpse into how the models themselves are improving and into the worldview of their maker, Anthropic. The increased sophistication and detail in handling safety-related interactions, the refusal to even explain potentially malicious code, and the explicit instructions on dealing with questions around vulnerable groups suggest advancements in the model's ability to understand and navigate complex ethical landscapes. This is also evident in the removal of the puzzle-handling instructions that were present in Claude 3.7.
Note:
It is very easy to access Claude’s system prompt, and it does not involve any hacking, jailbreaking, or prompt gymnastics: Anthropic publishes it regularly as part of its release notes, and the links are in the resources section. I respect this, and I wish all frontier labs did the same for the sake of transparency and safety rather than hiding the prompt as secret sauce. Note that these system prompt updates apply to the Claude apps and do not apply to the Anthropic API.
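To make that last point concrete: over the API, the system prompt is whatever the developer passes in the `system` parameter, so the published consumer-app instructions discussed above are not applied. A minimal sketch with the Python SDK (the model ID is illustrative; check the current docs):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Over the API, the system prompt is entirely developer-supplied; the
# Claude.ai instructions analyzed in this post are not added on top.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID; verify against current docs
    max_tokens=1024,
    system="You are a terse code reviewer. Never open with praise.",
    messages=[{"role": "user", "content": "Review: def add(a, b): return a + b"}],
)

print(response.content[0].text)
```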