Evidence · Not Marketing

PRISM & Quad vs
GPT-4o & Perplexity Pro

We ran 50 real founder briefs and 10 controlled queries through our engine and two leading AI tools. Here's what the data says about claim accuracy, hallucinations, and decision quality.

n = 50 real engagements + 10 controlled queries
Date: Q1–Q2 2026
Eval: Blind, multi-analyst

Self-conducted · External replication invited · Methodology published below

Headline numbers

89% claim accuracy vs 61% for GPT-4o

PRISM / Quad (ThriveFinity): 89% claim accuracy
Competitor (GPT-4o): 61% claim accuracy
Competitor (Perplexity Pro): 58% claim accuracy

Detailed results

Across six dimensions

Claim accuracy: PRISM / Quad 89% · GPT-4o 61% · Perplexity Pro 58%
% of verifiable claims correctly supported by primary sources

Hallucination rate: PRISM / Quad 4% · GPT-4o 23% · Perplexity Pro 19%
% of cited facts traceable to no real source (lower = better)

Source quality: PRISM / Quad 91% · GPT-4o 54% · Perplexity Pro 62%
% of citations from named, dated, accessible primary sources

Decision-change rate: PRISM / Quad 68% · GPT-4o 12% · Perplexity Pro 9%
% of test queries where analyst reversed initial position after reading report

Adversarial flag rate: PRISM / Quad 82% · GPT-4o 18% · Perplexity Pro 14%
% of weak/false claims correctly identified as problematic

Market size accuracy: PRISM / Quad 79% · GPT-4o 44% · Perplexity Pro 51%
TAM/SAM figure within ±20% of independently verified figure

The metric that matters most

Decision-change rate: 68% vs 12%

We gave analysts an initial position on each query, then showed them each tool's output. The PRISM / Quad report led them to revise that position 68% of the time, against 12% for GPT-4o and 9% for Perplexity Pro. This is the metric that converts "useful" into "essential."

[Chart: bar comparison of PRISM/Quad, GPT-4o, and Perplexity on the six dimensions listed above: claim accuracy, hallucination rate, source quality, decision-change rate, adversarial flag rate, and market size accuracy.]

How we ran it

Methodology

We believe every claim should be traceable. That includes our own benchmark. Here's exactly how we ran it.

1. Test corpus
50 real founder briefs from Q1–Q2 2026, ranging from SaaS to climate to consumer, plus 10 controlled test queries drawn from a standardised bank of verifiable market, competitive, and claim questions.
2. Blind evaluation
Each query was run independently through PRISM/Quad, GPT-4o, and Perplexity Pro. Evaluators received the outputs without source labels and rated accuracy, source quality, and adversarial coverage.
3. Ground-truth verification
Every factual claim was verified against primary sources, independently of the tools. Hallucinations were defined as citations to non-existent or misrepresented sources.
4. Decision-change measurement
Analysts were given an initial position on each query, then shown each tool's output. They reported whether the output caused them to revise that position.
5. Limitations noted
This benchmark is self-conducted. External replication is invited: contact us to receive the standardised query bank and evaluation rubric.
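
For readers who want to replicate the scoring, the blinding and rate calculations described in the steps above can be sketched roughly as follows. This is a minimal illustration, not our production tooling: the `Claim` record, the `blind`, `rates`, and `within_tolerance` helpers, and the "Output A/B/C" labels are all hypothetical names chosen for the example. It also assumes one record per claim and a single denominator for both rates, which simplifies the definitions given above.

```python
import random
from dataclasses import dataclass


@dataclass
class Claim:
    # Hypothetical record for one evaluated claim.
    tool: str         # which tool produced the claim
    supported: bool   # a primary source correctly supports the claim
    fabricated: bool  # cited source is non-existent or misrepresented


def blind(outputs: dict[str, str], seed: int) -> tuple[dict[str, str], dict[str, str]]:
    """Swap tool names for opaque labels in shuffled order.

    Returns (label -> output text, label -> tool name). Only the first
    mapping would be shown to evaluators; the second is the unblinding key.
    """
    tools = list(outputs)
    random.Random(seed).shuffle(tools)
    labels = [f"Output {chr(ord('A') + i)}" for i in range(len(tools))]
    blinded = {label: outputs[tool] for label, tool in zip(labels, tools)}
    key = dict(zip(labels, tools))
    return blinded, key


def rates(claims: list[Claim], tool: str) -> tuple[float, float]:
    """Return (claim accuracy, hallucination rate) for one tool, as fractions."""
    own = [c for c in claims if c.tool == tool]
    if not own:
        raise ValueError(f"no claims recorded for {tool}")
    accuracy = sum(c.supported for c in own) / len(own)
    hallucination = sum(c.fabricated for c in own) / len(own)
    return accuracy, hallucination


def within_tolerance(estimate: float, verified: float, tolerance: float = 0.20) -> bool:
    """True if a TAM/SAM estimate lies within ±tolerance of the verified figure."""
    return abs(estimate - verified) <= tolerance * abs(verified)
```

The fixed `seed` keeps the label assignment reproducible, so a replication team could regenerate the same blinding key rather than trusting a stored one.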

Limitations — read these before citing the benchmark

We run this business. We have an obvious interest in favourable numbers. We've tried to mitigate that with blind evaluation and published methodology, but you should weight these results accordingly.

  • This is a self-conducted benchmark. It has not been independently replicated or peer-reviewed.
  • The test corpus reflects our client base (predominantly UK/US, B2B SaaS and deep-tech). Results may differ in other segments.
  • GPT-4o and Perplexity Pro outputs vary by prompt. We used standardised prompts; exact results depend on prompt choice.
  • The hallucination rate figures are based on citations we could independently verify within 48 hours. Some citations may be correct but point to paywalled sources we could not access.
  • External research teams are invited to replicate. Contact us to receive the query bank and rubric.
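
To see why the 48-hour verification window matters, here is a small sketch of how a hallucination rate can shift depending on how unverifiable citations are treated. The function name `hallucination_bounds` and the counts in the usage lines are hypothetical, chosen only for illustration. They are not benchmark data and do not describe how our figures were computed.

```python
def hallucination_bounds(fabricated: int, verified_ok: int, unverifiable: int) -> tuple[float, float]:
    """Lower and upper bound on the hallucination rate, as fractions.

    Lower bound: every unverifiable citation is assumed to be genuine.
    Upper bound: every unverifiable citation is assumed to be fabricated.
    """
    total = fabricated + verified_ok + unverifiable
    if total == 0:
        raise ValueError("no citations to score")
    return fabricated / total, (fabricated + unverifiable) / total


# Hypothetical counts for illustration only, not benchmark data.
low, high = hallucination_bounds(fabricated=5, verified_ok=85, unverifiable=10)
print(f"hallucination rate between {low:.0%} and {high:.0%}")
```

The wider the gap between the two bounds, the more the reported rate depends on citations that could not be checked.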

See the difference on your own brief

The free Pulse tier takes 15 minutes and costs nothing. One brief, one verified output — you'll have a direct comparison data point in under an hour.