PRISM & Quad vs GPT-4 & Perplexity
We ran 50 real founder briefs and 10 controlled queries through our engine and two leading AI tools. Here's what the data says about claim accuracy, hallucinations, and decision quality.
Self-conducted · External replication invited · Methodology published below
89% claim accuracy vs 61% for GPT-4o
Across six dimensions
| Dimension | Definition | PRISM / Quad | GPT-4o | Perplexity Pro |
|---|---|---|---|---|
| Claim accuracy | % of verifiable claims correctly supported by primary sources | 89% | 61% | |
| Hallucination rate | % of cited facts traceable to no real source (lower = better) | | | |
| Source quality | % of citations from named, dated, accessible primary sources | | | |
| Decision-change rate | % of test queries where analyst reversed initial position after reading report | 68% | 12% | |
| Adversarial flag rate | % of weak/false claims correctly identified as problematic | | | |
| Market size accuracy | TAM/SAM figure within ±20% of independently verified figure | | | |
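
As a rough illustration of how the definitions in the table could be turned into numbers, here is a minimal Python sketch. The `Claim` record, its field names, and the helper functions are hypothetical and are not taken from the benchmark's actual tooling. It also assumes, for simplicity, that claim accuracy and hallucination rate are scored over the same list of claims, which the published definitions do not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One scored claim from a tool's report (hypothetical record shape)."""
    supported: bool  # a primary source correctly supports the claim
    traceable: bool  # the cited fact traces back to a real source


def pct(hits: int, total: int) -> float:
    """Percentage of hits in total; 0.0 when there is nothing to score."""
    return 100.0 * hits / total if total else 0.0


def claim_accuracy(claims: list[Claim]) -> float:
    """Share of claims correctly supported by primary sources."""
    return pct(sum(c.supported for c in claims), len(claims))


def hallucination_rate(claims: list[Claim]) -> float:
    """Share of claims whose citation traces to no real source (lower is better)."""
    return pct(sum(not c.traceable for c in claims), len(claims))


def market_size_accurate(reported: float, verified: float, tol: float = 0.20) -> bool:
    """True when the reported TAM/SAM is within ±tol of the verified figure."""
    return abs(reported - verified) <= tol * abs(verified)
```

Keeping each dimension as its own small function makes it easy for a replicating team to swap in a different denominator or tolerance without touching the other metrics.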
Decision-change rate: 68% vs 12%
We gave analysts an initial position on each query, then showed them each tool's output. After reading the PRISM / Quad report, they revised their position 68% of the time. After reading the GPT-4o output, they revised it 12% of the time. This is the metric that converts "useful" into "essential."
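The decision-change rate described here can be sketched as a simple before/after comparison. This is an illustrative version only: the function name, the string labels for positions, and the toy data are assumptions, not the benchmark's real rubric or results.

```python
def decision_change_rate(initial: list[str], final: list[str]) -> float:
    """Percentage of queries where the analyst's final position differs
    from the position they were given before reading the tool's output."""
    if len(initial) != len(final):
        raise ValueError("initial and final must cover the same queries")
    if not initial:
        return 0.0
    changed = sum(before != after for before, after in zip(initial, final))
    return 100.0 * changed / len(initial)


# Toy data (not benchmark results): four queries, three revised positions.
initial_positions = ["go", "go", "no-go", "go"]
final_positions = ["no-go", "go", "go", "no-go"]
rate = decision_change_rate(initial_positions, final_positions)
```

Running the same comparison once per tool, over the same queries and the same initial positions, is what makes the resulting percentages comparable.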
Methodology
We believe every claim should be traceable. That includes our own benchmark. Here's exactly how we ran it.
Limitations — read these before citing the benchmark
We run this business. We have an obvious interest in favourable numbers. We've tried to mitigate that with blind evaluation and published methodology, but you should weight accordingly.
- This is a self-conducted benchmark. It has not been independently replicated or peer-reviewed.
- The test corpus reflects our client base (predominantly UK/US, B2B SaaS and deep-tech). Results may differ in other segments.
- GPT-4o and Perplexity Pro outputs vary by prompt. We used standardised prompts; exact results depend on prompt choice.
- The hallucination rate figures are based on citations we could independently verify within 48 hours. Some citations may be correct but point to paywalled sources we could not access.
- External research teams are invited to replicate. Contact us to receive the query bank and rubric.
See the difference on your own brief
The free Pulse tier takes 15 minutes and costs nothing. One brief, one verified output — you'll have a direct comparison data point in under an hour.