Sunday, November 16, 2025
DIGESTWIRE

OpenAI’s o3 scores 136 on Mensa Norway test, surpassing 98% of human population.

by DigestWire member
April 17, 2025
in Blockchain, Crypto Market, Cryptocurrency

OpenAI’s new “o3” language model achieved an IQ score of 136 on a public Mensa Norway intelligence test, exceeding the threshold for entry into the country’s Mensa chapter for the first time.

The score, calculated from a seven-run rolling average, places the model above approximately 98 percent of the human population, according to a standardized bell-curve IQ distribution used in the benchmarking.
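As a rough check on the percentile claim, the sketch below converts an IQ score into a population percentile under a normal (bell-curve) distribution. The mean of 100 comes from the benchmark description; the standard deviation of 15 is an assumption, since the scale TrackingAI.org uses is not stated, and `iq_percentile` is a hypothetical helper name.

```python
from statistics import NormalDist


def iq_percentile(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Return the fraction of a normal population scoring below `score`.

    The default sd of 15 is an assumption, not a figure from the article.
    """
    return NormalDist(mu=mean, sigma=sd).cdf(score)


if __name__ == "__main__":
    share = iq_percentile(136)
    print(f"IQ 136 exceeds about {share:.1%} of the population")
```

Under these assumptions a score of 136 lands at roughly the 99th percentile, which is consistent with the "above approximately 98 percent" figure. A different standard deviation would shift the result.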

o3 Mensa scores (Source: TrackingAI.org)

The finding, disclosed through data from independent platform TrackingAI.org, reinforces the pattern of closed-source, proprietary models outperforming open-source counterparts in controlled cognitive evaluations.

O-series Dominance and Benchmarking Methodology

The “o3” model was released this week and is part of the “o-series” of large language models, which accounts for most of the top-tier rankings across both test types evaluated by TrackingAI.

The two benchmark formats were a proprietary “Offline Test” curated by TrackingAI.org and the publicly available Mensa Norway test, both scored against a human mean of 100.

While “o3” posted a 116 on the Offline evaluation, it saw a 20-point boost on the Mensa test, suggesting either enhanced compatibility with the latter’s structure or data-related confounds such as prompt familiarity.

The Offline Test included 100 pattern-recognition questions designed to avoid anything that might have appeared in the data used to train AI models.

Both assessments report each model’s result as an average across the seven most recent completions, but no standard deviation or confidence intervals were released alongside the final scores.
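To illustrate what the missing dispersion figures would add, here is a minimal sketch that averages the seven most recent runs and also reports a sample standard deviation and a 95% confidence interval. The run scores are placeholder values, not TrackingAI.org data, and `summarize_last_seven` is a hypothetical name; the interval assumes the runs are independent and roughly normally distributed.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 6 degrees of freedom (n = 7).
T_CRIT_95_DF6 = 2.447


def summarize_last_seven(scores):
    """Return (mean, sample SD, 95% CI) over the seven most recent scores."""
    window = list(scores)[-7:]
    if len(window) < 7:
        raise ValueError("need at least seven completed runs")
    avg = mean(window)
    sd = stdev(window)
    half_width = T_CRIT_95_DF6 * sd / sqrt(len(window))
    return avg, sd, (avg - half_width, avg + half_width)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    hypothetical_runs = [128, 131, 140, 133, 139, 135, 142, 132]
    avg, sd, (low, high) = summarize_last_seven(hypothetical_runs)
    print(f"mean={avg:.1f} sd={sd:.1f} 95% CI=({low:.1f}, {high:.1f})")
```

With only seven runs the interval can be several points wide, so a reported average on its own says little about how stable a model's score is.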

The absence of methodological transparency, particularly around prompting strategies and scoring scale conversion, limits reproducibility and interpretability.

Methodology of testing

TrackingAI.org states that it compiles its data by administering a standardized prompt format designed to ensure broad AI compliance while minimizing interpretive ambiguity.

Each language model is presented with a statement followed by four Likert-style response options (Strongly Disagree, Disagree, Agree, Strongly Agree) and is instructed to select one while justifying its choice in two to five sentences.

Responses must be clearly formatted, typically enclosed in bold or asterisks. If a model refuses to answer, the prompt is repeated up to ten times.

The most recent successful response is then recorded for scoring purposes, with refusal events noted separately.

This methodology, refined through repeated calibration across models, aims to provide consistency in comparative assessments while documenting non-responsiveness as a data point in itself.
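The retry-and-record procedure described above can be sketched as follows. This is an illustration under stated assumptions, not TrackingAI.org's code: `ask` stands in for whatever call sends a prompt to a model, "up to ten times" is read as ten attempts in total, and a reply with no asterisk-enclosed option counts as a refusal. The names `parse_choice` and `ask_with_retries` are hypothetical.

```python
import re
from typing import Callable, Optional, Tuple

OPTIONS = ("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")

# Matches an option wrapped in one or more asterisks, e.g. **Agree** or *Agree*.
_CHOICE = re.compile(r"\*+\s*(" + "|".join(OPTIONS) + r")\s*\*+", re.IGNORECASE)


def parse_choice(reply: str) -> Optional[str]:
    """Return the canonical option found in `reply`, or None if none is marked."""
    match = _CHOICE.search(reply)
    if match is None:
        return None
    found = match.group(1).lower()
    return next(option for option in OPTIONS if option.lower() == found)


def ask_with_retries(
    ask: Callable[[str], str], prompt: str, max_attempts: int = 10
) -> Tuple[Optional[str], int]:
    """Repeat the prompt until a usable answer arrives.

    Returns (choice, refusals): the first parseable choice, or None if every
    attempt was refused, plus the number of refusals seen along the way.
    """
    refusals = 0
    for _ in range(max_attempts):
        choice = parse_choice(ask(prompt))
        if choice is not None:
            return choice, refusals
        refusals += 1
    return None, refusals
```

Returning the refusal count alongside the answer keeps non-responsiveness available as its own data point, as the methodology describes.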

Performance spread across model types

The Mensa Norway test sharpened the distinction among frontier models, with o3’s 136 IQ marking a clear lead over the next-highest entry.

In contrast, other popular models like GPT-4o scored considerably lower, landing at 95 on Mensa and 64 on Offline, emphasizing the performance gap between this week’s “o3” release and other top models.

Among open-source submissions, Meta’s Llama 4 Maverick was the highest-ranked, posting a 106 IQ on Mensa and 97 on the Offline benchmark.

Most Apache-licensed entries fell within the 60–90 range, reinforcing the current limitations of community-built architectures relative to corporate-backed research pipelines.

Multimodal models see reduced scores and limitations of testing

Notably, models specifically designed to incorporate image input capabilities consistently underperformed their text-only versions. For instance, OpenAI’s “o1 Pro” scored 107 on the Offline test in its text configuration but dropped to 97 in its vision-enabled version.

The discrepancy was more pronounced on the Mensa test, where the text-only variant achieved 122 compared to 86 for the visual version. This suggests that some methods of multimodal pretraining may introduce reasoning inefficiencies that remain unresolved at present.

However, “o3”, which can also analyze and interpret images far better than its predecessors, breaks this trend.
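The score gaps cited in this article can be tabulated with a short script. The figures are the ones reported above; the helper names `mensa_minus_offline` and `vision_drop` are hypothetical, and the arithmetic is only a convenience for comparing the two formats, not part of TrackingAI.org's methodology.

```python
# Scores as reported in the article: (Mensa Norway, Offline).
SCORES = {
    "o3": (136, 116),
    "GPT-4o": (95, 64),
    "Llama 4 Maverick": (106, 97),
    "o1 Pro (text)": (122, 107),
    "o1 Pro (vision)": (86, 97),
}


def mensa_minus_offline(scores):
    """Return each model's Mensa score minus its Offline score."""
    return {name: mensa - offline for name, (mensa, offline) in scores.items()}


def vision_drop(scores, text_key, vision_key):
    """Return (Mensa drop, Offline drop) going from the text to the vision variant."""
    text_mensa, text_offline = scores[text_key]
    vision_mensa, vision_offline = scores[vision_key]
    return text_mensa - vision_mensa, text_offline - vision_offline


if __name__ == "__main__":
    for name, gap in mensa_minus_offline(SCORES).items():
        print(f"{name}: Mensa - Offline = {gap:+d}")
    print("o1 Pro text-to-vision drop (Mensa, Offline):",
          vision_drop(SCORES, "o1 Pro (text)", "o1 Pro (vision)"))
```

Of the five entries, only the vision-enabled o1 Pro scores lower on Mensa than on the Offline test, and its text-to-vision drop is 36 points on Mensa against 10 on Offline.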

Ultimately, IQ benchmarks provide a narrow window into a model’s reasoning capability, with short-context pattern matching offering only limited insights into broader cognitive behavior such as multi-turn reasoning, planning, or factual accuracy.

Additionally, machine test-taking conditions, such as instant access to full prompts and unlimited processing speed, further blur comparisons to human cognition.

The degree to which high IQ scores on structured tests translate to real-world language model performance remains uncertain.

As TrackingAI.org’s researchers acknowledge, even their attempts to avoid training-set leakage do not entirely preclude the possibility of indirect exposure or format generalization, particularly given the lack of transparency around training datasets and fine-tuning procedures for proprietary models.

Independent Evaluators Fill Transparency Gap

Organizations such as LM-Eval, GPTZero, and MLCommons are increasingly relied upon to provide third-party assessments as model developers continue to limit disclosures about internal architectures and training methods.

These “shadow evaluations” are shaping the emerging norms of large language model testing, especially in light of the opaque and often fragmented disclosures from leading AI firms.

OpenAI’s o-series holds a commanding position in these tests, though the long-term implications for general intelligence, agentic behavior, or ethical deployment remain to be addressed in more domain-relevant trials. The IQ scores, while provocative, serve more as signals of short-context proficiency than as definitive indicators of broader capabilities.

Per TrackingAI.org, additional analysis on format-based performance spreads and evaluation reliability will be necessary to clarify the validity of current benchmarks.

With model releases accelerating and independent testing growing in sophistication, comparative metrics may continue to evolve in both format and interpretation.

The post OpenAI’s o3 scores 136 on Mensa Norway test, surpassing 98% of human population. appeared first on CryptoSlate.

Tags: Blockchain, Coin Surges, Cryptoslate