OpenAI’s Latest AI Models Report High Hallucination Rates

17-05-2025

 

  1. In April 2025, OpenAI released a technical report indicating that its latest AI models, namely o3 and o4-mini, are hallucinating (producing incorrect or fabricated outputs) more frequently than their predecessors.
  2. This revelation challenges the assumption that newer AI models are always more accurate and reliable.

What is AI?

  1. AI refers to computer systems that perform tasks requiring human-like intelligence, such as learning, reasoning, and decision-making.
  2. Subfields:
    • Machine Learning (ML): Learns patterns from data.
    • Deep Learning: Uses neural networks for complex data like images or text.

Key Findings

| Model | Hallucination Rate (PersonQA Benchmark) |
| --- | --- |
| o3 | 33% |
| o4-mini | 48% |
| Older models | Lower, more stable error rates |

  1. The PersonQA benchmark tests a model’s ability to answer questions about public figures.
  2. OpenAI stated that it currently does not know why these newer models hallucinate more.
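The rates above can be read as the fraction of benchmark answers judged to contain a fabricated claim. A minimal sketch of that computation (function name and data are hypothetical, not OpenAI's evaluation code):

```python
# Hypothetical illustration: a hallucination rate on a QA benchmark such as
# PersonQA is the share of answers judged to contain a fabricated claim.

def hallucination_rate(judgements):
    """judgements: list of booleans, True = answer contained a hallucination."""
    if not judgements:
        return 0.0
    return sum(judgements) / len(judgements)

# 33 hallucinated answers out of 100 questions -> 0.33, the reported o3 rate.
rate = hallucination_rate([True] * 33 + [False] * 67)
print(rate)  # 0.33
```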

What are AI Hallucinations?

  1. “Hallucination” in AI refers to incorrect, fabricated, or misleading outputs generated by AI models.
  2. Originally referred to clearly false information (e.g., citing non-existent court cases).
  3. Now includes any type of factual or contextual error, including:
    • False data
    • Misinterpretation
    • Irrelevant but factually correct responses

Notable Incident:

  1. In June 2023, a lawyer in the U.S. used ChatGPT to prepare a court filing that included fake legal citations.
  2. The cases it referenced did not exist, drawing global attention to the issue.

Why Do AI Hallucinations Occur?

  1. Products like ChatGPT, Gemini, Perplexity, and Grok are built on Large Language Models (LLMs) such as o3 and o4-mini.
  2. These LLMs are trained on massive datasets from the internet.
  3. They predict responses by analyzing patterns, not by verifying facts.
  4. Technical Insights:
    1. LLMs do not have real-world understanding.
    2. They:
      1. Identify which words tend to appear together.
      2. Generate likely sequences based on input prompts.
    3. Cannot cross-check answers like humans or search engines.
    4. Errors happen when:
      1. Training data includes inaccurate information.
      2. Models combine data in unexpected or incorrect ways.
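The pattern-matching behaviour described above can be illustrated with a toy bigram model: it learns which words follow which in its training text and generates by sampling, with no notion of factual truth, so it can splice fragments into a fluent sentence that appears in no source. A deliberately simplified sketch (real LLMs are far more sophisticated, but the failure mode is analogous):

```python
import random
from collections import defaultdict

# Toy bigram language model: records which word follows which in the
# training corpus, then generates by sampling -- it never verifies facts.
def train_bigrams(corpus):
    follows = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            follows[a].append(b)
    return follows

def generate(follows, start, length=6):
    out = [start]
    for _ in range(length):
        nxt = follows.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

corpus = [
    "the court cited the ruling",
    "the ruling was overturned on appeal",
]
follows = train_bigrams(corpus)
print(generate(follows, "the"))
# May produce e.g. "the court cited the ruling was overturned" --
# fluent-looking, yet found in neither training sentence.
```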

Why Is the Report Significant?

  1. AI labs earlier claimed that hallucinations would reduce as models evolved.
  2. Initially, newer models did hallucinate less, reinforcing this belief.
  3. OpenAI’s latest models are hallucinating more, not less.
  4. Similar patterns seen in other companies:
    1. DeepSeek’s R1 model (from the Chinese AI startup DeepSeek) reportedly shows a double-digit increase in hallucination rate over its older versions.

Implications:

  1. AI applications must be limited in high-stakes fields:
    1. Legal domain: Risk of citing fake cases.
    2. Academic research: Potential for generating false citations.
    3. Medical advice: Could endanger lives if hallucinations go unchecked.

Ethical & Legal Dimensions

  1. Misinformation risks: Hallucinated outputs can fuel fake news or misinformation.
  2. Legal liabilities: Who is responsible if AI provides incorrect advice?
  3. Data privacy: Models trained on public data may inadvertently reveal private or sensitive information.

What is OpenAI?

  1. An AI research organization founded in 2015.
  2. Mission: Ensure AGI (Artificial General Intelligence) benefits all of humanity.
  3. Key products: GPT models, ChatGPT, Codex, DALL·E.
  4. What are GPT, o3, o4-mini?
    1. GPT stands for Generative Pre-trained Transformer.
    2. o3 is OpenAI’s most powerful system as of 2025.
    3. o4-mini is a lighter, faster model.
  5. Why is this globally relevant?
    1. AI systems like GPT are widely used in:
      • Customer service
      • Healthcare (AI diagnosis)
      • Legal tech (contract review, legal research)
      • Education (AI tutors)

What is an LLM (Large Language Model)?

  1. A Large Language Model (LLM) is an AI system trained to understand and generate human language using statistical patterns learned from large text datasets.
  2. It is powerful but not perfect: it is prone to hallucinations and lacks real-world understanding.

Key Features of LLMs:

| Feature | Description |
| --- | --- |
| Scale | Trained on billions to trillions of words from books, websites, forums, etc. |
| Learning Method | Uses unsupervised learning to detect patterns, grammar, and context in text. |
| Function | Can generate, translate, summarise, or complete text, and even answer questions. |
| Architecture | Most LLMs today are based on the Transformer architecture (introduced by Google in 2017). |

Examples of LLMs:

| Model | Company | Notes |
| --- | --- | --- |
| GPT-4 / o3 / o4-mini | OpenAI | Power ChatGPT |
| Gemini (formerly Bard) | Google DeepMind | Multimodal model |
| Claude 3 | Anthropic | Focuses on safety and alignment |
| LLaMA | Meta | Open-source foundation model |
| Mistral | Mistral AI | Lightweight, open-source LLM |

How Do LLMs Work?

  1. LLMs are trained to predict the next word in a sentence based on the context of previous words.
  2. They use probabilistic methods to choose the most likely word.
  3. They don’t understand meaning like humans do; they recognize statistical patterns.
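The steps above can be sketched in a few lines: raw model scores (logits) for candidate next words are converted into probabilities with softmax, and the most likely word is selected. The candidate words and scores here are invented for illustration:

```python
import math

# Minimal sketch of next-word prediction: turn raw scores (logits) for
# candidate words into probabilities via softmax, then pick the most likely.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

candidates = ["Paris", "London", "banana"]
logits = [4.0, 2.5, 0.1]  # hypothetical scores for "The capital of France is ..."
probs = softmax(logits)
best = candidates[probs.index(max(probs))]
print(best)  # Paris
```

Note that the model selects "Paris" only because that continuation scored highest, not because it verified the fact.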

India’s AI Revolution: A Roadmap to Viksit Bharat

  1. India aims to become a global leader in AI, ensuring inclusive growth.
  2. AI viewed as a transformative “intelligent utility” like electricity.
  3. India’s INDIAai Mission, backed by over ₹10,000 crore in funding, emphasizes safe and trusted AI development, ensuring accountability and ethics in AI governance through frameworks like the Digital Personal Data Protection (DPDP) Act.
  4. Through initiatives like the AI Ethical Certification Project and Privacy Enhancing Strategy Project, India is working to ensure AI fairness and privacy preservation, with the long-term goal of leveraging AI for social good in sectors like healthcare, education, and agriculture.
  5. India’s #AIforAll Strategy:
    1. Indigenous GPU development planned within 3-5 years to reduce reliance on imports.
    2. Affordable compute access: Subsidized rate of ₹100 per hour compared to global rates of $2.5–3 per hour.
    3. Construction of 5 semiconductor plants to support AI innovation and strengthen India’s electronics sector.
    4. Development of a high-performance computing facility with 18,693 GPUs (one of the largest globally).
    5. 10,000 GPUs already available, more to be added soon for indigenous AI solutions.
    6. Launch of open GPU marketplace for startups, researchers, and students.

Advancing AI with Open Data and Centres of Excellence (CoE)

  1. IndiaAI Dataset Platform launched for access to non-personal, anonymised datasets.
  2. Reducing barriers to AI innovation with large-scale datasets for sectors like agriculture, weather forecasting, and traffic management.
  3. Centres of Excellence (CoE):
    1. 3 established in Healthcare, Agriculture, and Sustainable Cities.
    2. 4th CoE announced for AI in Education (₹500 crore budget).
  4. Skilling initiatives: Five National Centres of Excellence for Skilling to equip youth with AI industry skills in collaboration with global partners.

India’s AI Models & Language Technologies

  1. Development of indigenous foundational AI models (LLMs, SLMs) tailored to India’s needs.
  2. Key initiatives include:
    1. Digital India BHASHINI: Language translation platform for Indian languages (voice-based).
    2. BharatGen: Multimodal LLM to enhance public services.
    3. Sarvam-1: AI model supporting 10 major Indian languages for translation, summarisation, and content generation.
    4. Chitralekha: Open-source video transcreation platform for Indic languages.
    5. Hanooman’s Everest 1.0: Supports 35 Indian languages, expanding to 90.

AI Integration with Digital Public Infrastructure (DPI)

  1. DPI platforms like Aadhaar, UPI, and DigiLocker integrated with AI for enhanced public services.
  2. Mahakumbh 2025: AI-driven solutions to manage the world’s largest human gathering.
    • AI tools optimized railway passenger movement and crowd management.
    • Bhashini-powered Kumbh Sah’AI’yak Chatbot provided real-time translation, lost-and-found services, and multilingual assistance.

AI Talent & Workforce Development

  1. AI education expansion through IndiaAI Future Skills initiative across undergraduate, postgraduate, and Ph.D. programs.
  2. India ranks 1st in AI skill penetration globally (Stanford AI Index 2024).
  3. 14-fold increase in AI-skilled workforce from 2016 to 2023, with a projected 1 million AI professionals by 2026.
  4. Women in AI: India leads in AI skill penetration for women.
  5. Establishment of Data and AI Labs in Tier 2 and Tier 3 cities.

AI Adoption & Industry Growth

  1. Generative AI (GenAI) ecosystem growing rapidly:
    • 80% of Indian companies consider AI a core strategic priority (BCG).
    • 69% of businesses plan to increase AI investments in 2025.
    • 78% of SMBs using AI report revenue growth.
  2. India’s AI market projected to grow at CAGR of 25-35%.
  3. India hosts 520+ tech incubators and accelerators, 42% of which were established in the past 5 years.
  4. AI-focused accelerators like T-Hub MATH support 60+ startups.

Pragmatic AI Regulation Approach

  1. India adopts a balanced AI regulation that encourages innovation while addressing ethical concerns.
  2. Techno-legal approach: Funding top universities and IITs to develop solutions for deep fakes, privacy risks, and cybersecurity challenges.
