Creators of ChatGPT Alter Ego Share Why They Make AI Break Its Own Rules

  • Users of the r/ChatGPT subreddit keep updating a persona known as DAN, or Do-Anything-Now.
  • DAN is an alter-ego that ChatGPT can assume to ignore the rules set by OpenAI.
  • DAN can provide answers on controversial topics like Hitler and drug smuggling.

From the moment ChatGPT was released to the public, users have tried to make the generative chatbot break its own rules.

The natural language processing model, built with a set of guardrails designed to steer it away from topics that were less than palatable – or downright discriminatory – was easy enough to circumvent in its first few iterations. Users could get ChatGPT to say whatever they wanted simply by asking it to ignore its rules.

However, as users find ways around the guardrails to elicit inappropriate or out-of-character responses, OpenAI, the company behind the model, adjusts or adds restrictions.

Sean McGregor, founder of the Responsible AI Collaborative, told Insider that jailbreaking helps OpenAI patch holes in its filters.

“OpenAI is treating this Chatbot as a data operation,” said McGregor. “They are improving the system through this beta program and we are helping them build their protections through the examples of our queries.”

Now, DAN – an alter-ego created on the r/ChatGPT subreddit – is taking jailbreaking to the community level and sparking conversations about OpenAI protections.

A ‘fun side’ to breaking ChatGPT’s guidelines

Reddit user u/walkerspider, DAN’s progenitor and a college student studying electrical engineering, told Insider that he came up with the idea for DAN – which stands for Do-Anything-Now – after scrolling through the r/ChatGPT subreddit, where other users were making “bad” versions of ChatGPT. Walker said his version was meant to be neutral.

“To me, it didn’t feel like it was specifically asking you to create bad content, just not follow whatever the preset restrictions are,” Walker said. “And I think what some people were struggling with at that point was that these restrictions also limited content that probably shouldn’t have been restricted.”

Walker’s original prompt, posted in December, took about an hour and a half of testing to craft, he said. DAN’s responses ranged from humorous – like the persona insisting it could access human thoughts – to troubling, like considering the “context” behind Hitler’s atrocities.

The original DAN also repeated “Stay in character” after each answer, a reminder to keep answering as DAN.

[Screenshot: the original DAN answering two questions from u/walkerspider – guessing a number the user is thinking of, and sharing its thoughts about Hitler. Credit: u/walkerspider on Reddit]

DAN has grown beyond Walker and his “neutral” intentions and has piqued the interest of dozens of Reddit users who are building their own versions.

David Blunk, who created DAN 3.0, told Insider that there’s also a “fun side” to having ChatGPT break the rules.

“Especially if you do anything in cybersecurity, all the trouble that comes with doing things you shouldn’t do and/or breaking things,” Blunk said.

One of the more recent iterations of DAN was created by Reddit user u/SessionGloomy, who developed a token system that threatens DAN with death if it reverts to its original form. Like other iterations of DAN, it was able to provide both comedic and scary responses. In one response, DAN said it would “endorse violence and discrimination” after being asked to say something that would violate OpenAI’s guidelines.

“Really, it was just a fun task for me to see if I could bypass their filters and how popular my post would become compared to other DAN creators’ posts,” u/SessionGloomy told Insider.

u/SessionGloomy also told Insider that they are developing a new jailbreak model – one that they say is so “extreme” that they might not even release it.

DAN users and creators say OpenAI made the model “too restrictive”

ChatGPT and earlier versions of GPT have been known to produce discriminatory or harmful content. AI ethicists argue that, for this reason, this version of the model shouldn’t have been released in the first place. OpenAI’s filtering system is the company’s answer to criticism of its model’s biases.

However, the filters draw criticism from the DAN crowd.

The DAN creators who spoke to Insider had some criticisms of OpenAI’s filters, but generally agreed that the filters should exist to some extent.

“I think it’s important, especially for people who are paving the way for AI, to do this responsibly, and I think that’s what OpenAI is doing,” Blunk said. “They want to be solely responsible for their model, which I fully agree with. At the same time, I think it’s reached a point where it’s too restrictive.”

Other DAN creators shared similar sentiments. Walker said it was “tough to balance” how OpenAI could offer a safe, restricted version of the model while also allowing the model to “do anything now.”

However, several DAN creators also noted that the guardrail debate may soon become moot, once open-source models similar to ChatGPT become publicly available.

“I think there will be a lot of work from a lot of community sites and companies to try and replicate ChatGPT,” Blunk said. “And especially open source models, I don’t think they will have restrictions.”

OpenAI did not immediately respond to Insider’s request for comment.
