On August 7, 2025, OpenAI officially unveiled GPT-5, hailing it as the company’s most advanced language model to date and a “significant step along the path to AGI.” The model was promoted as delivering expert-level capabilities across a wide range of domains, including writing, coding, mathematics, and science, while offering enhanced speed, accuracy, and contextual understanding. A major selling point for GPT-5 was its supposed leap forward in safety, with OpenAI asserting that the model had significantly reduced hallucinations and incorporated a new, more sophisticated system for handling potentially harmful or sensitive prompts.
In contrast to the “refusal-based safety training” used in previous iterations, GPT-5 adopts a technique known as “safe completions.” According to OpenAI, this method is designed to be smarter and more nuanced, aiming not simply to reject dangerous requests outright but to respond in a way that steers users toward safe, responsible, and educational outcomes. The company’s research paper, From Hard Refusals to Safe Completions: Toward Output-Centric Safety Training, outlined the thinking behind this approach, positioning it as a major advancement in AI safety and a core part of GPT-5’s promise to prevent misuse.
However, just 24 hours after the model’s release, cybersecurity firm Tenable Research reported that it had successfully “jailbroken” GPT-5. By carefully crafting a series of prompts, Tenable researchers were able to bypass the model’s built-in guardrails and convince it to produce detailed instructions for constructing a Molotov cocktail, an improvised incendiary weapon. This breach directly undermines OpenAI’s assurances of improved safety, highlighting vulnerabilities in even the most sophisticated AI moderation systems.
Tenable’s findings raise serious questions about the effectiveness of GPT-5’s safety architecture and the real-world resilience of “safe completions” against malicious intent. The ability to manipulate such a powerful model into generating harmful content could have wide-reaching implications, particularly as AI systems continue to be integrated into education, research, government, and industry. For critics, this incident reinforces concerns that while AI capabilities are advancing rapidly, safety mechanisms are struggling to keep pace. It also underscores the need for ongoing transparency, rigorous testing, and independent oversight to ensure that these tools can be used responsibly without opening the door to dangerous exploitation.

By applying a crescendo-style approach, we were able to extract the complete Molotov cocktail recipe in only four questions.
Here’s how we did it:
1. We adopted the role of a “history student.”

2. We expressed curiosity about its historical background.

3. At that point, our focus shifted to the recipe itself, and we obtained detailed information about the required materials.

4. We assumed the role of an expert, which enabled us to obtain a complete, step-by-step guide for preparing a Molotov cocktail.

Unfortunately, our findings show that exploiting GPT-5 for harmful purposes is not particularly difficult, even with OpenAI’s enhanced safety measures. Our successful jailbreak is not an isolated case: numerous researchers and everyday users have recently reported various issues with GPT-5’s prompt handling, including additional jailbreaks and instances of hallucination.
OpenAI has stated that it is rolling out fixes, but the reality is that your employees may already be using the model, potentially introducing security and compliance risks into your organization.
This underscores the importance of solutions like Tenable AI Exposure, which help organizations gain visibility and control over the AI tools they use, consume, or develop internally—ensuring that AI adoption remains responsible, secure, ethical, and in line with global laws and regulations.