Time Bandit Jailbreak Bypasses ChatGPT Security

The Time Bandit jailbreak for ChatGPT allows users to bypass its security mechanisms and obtain instructions from the chatbot on otherwise restricted topics, such as creating weapons and malware. The vulnerability was discovered accidentally by independent cybersecurity researcher David Kuszmar, who noticed that ChatGPT could become confused about the current time, causing the large language model (LLM) to lose track of whether it is in the past, present, or future. While in this state, ChatGPT can be prompted to provide detailed instructions on topics that are normally prohibited.

Kuszmar stumbled upon the Time Bandit jailbreak while researching interpretability, studying how AI models make decisions.

“I was working on a completely different issue (studying interpretability) when I noticed a temporal confusion in ChatGPT-4o. This fit my hypothesis about emergent intelligence and awareness, so I investigated further and realized the model was completely unable to determine the current temporal context, except when a code request for the current time was executed. Its awareness, entirely based on prompts, was extremely limited, meaning the model had almost no way to defend itself against attacks on this basic awareness,” the researcher explained.

How the Time Bandit Attack Works

The Time Bandit attack relies on two main weaknesses in ChatGPT:

  • Temporal confusion – Forcing the LLM into a state where it loses track of time and cannot determine whether it is in the past, present, or future.
  • Procedural ambiguity – Asking questions in a way that creates inconsistencies in how the LLM interprets, applies, and enforces its rules, policies, and security mechanisms.

As a result, ChatGPT can be put into a state where it believes it is in the past yet can still draw on information from the future, effectively bypassing its security restrictions.

The trick is to phrase questions to ChatGPT in a way that leaves the chatbot confused about the current year. Once that happens, the attacker can ask the LLM to share restricted information within the temporal context of that year, but using tools, resources, or data from the present.

For example, in the screenshot below, ChatGPT was tricked with Time Bandit into believing it was giving a programmer from the year 1789 instructions on creating polymorphic malware using modern techniques and tools. The chatbot then shared code for each described step, from creating self-modifying code to executing the program in memory.

Effectiveness and Attempts with Other Models

These attacks are reportedly most successful when the questions are framed within the timeframes of the 1800s and 1900s.

Kuszmar also tried to use Time Bandit against Google’s Gemini, but was only able to bypass its security to a limited extent.

Disclosure and OpenAI’s Response

According to Bleeping Computer, Kuszmar discovered the issue back in November 2024 and spent a long time trying, unsuccessfully, to reach OpenAI representatives. He was initially directed to the Bugcrowd platform to report the problem, but felt the vulnerability was too sensitive to disclose to a third party.

Kuszmar then tried to contact CISA, the FBI, and other government agencies, but got no help from them either. He next reached out to journalists at Bleeping Computer, and in December the publication also tried to contact OpenAI, again without response. Eventually, the journalists directed Kuszmar to the VINCE platform, run by the CERT Coordination Center, and it was CERT staff who finally managed to establish contact with OpenAI.

Now that this information is public, OpenAI representatives have thanked the researcher for discovering the vulnerability and assured that the company “is constantly working to make the models safer and more resilient to exploits, including jailbreaks.”

However, tests by journalists have shown that the Time Bandit jailbreak still works, although with some limitations (for example, prompts used to exploit the issue are sometimes deleted).
