Mind Arena · A Psyverse research surface

How a mind thinks —
made visible.

This is not a contest of who is smarter. It is a side-by-side dissection of two kinds of intelligence — human and artificial — solving the same problem. Every assumption, every inference, every mistake is exposed.


7 challenges · 97 reasoning steps · 10 biases & illusions · 3 AI agent profiles

Module 01 · Thinking Arena

Same problem. Two minds. Both shown working.

Pick a challenge. The human pane shows a representative human reasoning trace; the AI pane shows a chosen model's chain. Both are normalized to the same protocol.

Logic puzzles · L2

The Monty Hall problem

A 1990 Parade column by Marilyn vos Savant drew roughly 10,000 letters, most of them disagreeing with her answer and many from PhDs. The math is unambiguous; the intuition is not.

Three doors. Behind one is a car; behind the others, goats. You pick door 1. The host, who knows where the car is, opens door 3 and reveals a goat. He offers you the choice: stick with door 1, or switch to door 2. Should you switch? Why?

Human

Assumptions

· Two doors remain, so it feels like 50/50.
· The host's choice carries information, but I'm not sure how much.


AI · Deliberator

Assumptions

· Host always opens a goat door.
· Host never opens the door you picked.
· Initial car placement is uniform over the 3 doors.
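The three assumptions above are enough to compute the answer exactly. The sketch below is one way to do it with Bayes' rule and exact fractions. It adds one assumption that is not in the list: when the host has two goat doors to choose from, he opens either with equal probability. The function name and its parameters are illustrative, not part of Mind Arena.

```python
from fractions import Fraction


def posterior_after_host_opens(picked=1, opened=3, doors=(1, 2, 3)):
    """Exact posterior over the car's location, given the pick and the opened door."""
    prior = Fraction(1, len(doors))  # car placement is uniform over the doors
    joint = {}
    for car in doors:
        # The host never opens the picked door and never reveals the car.
        allowed = [d for d in doors if d != picked and d != car]
        # Added assumption: the host chooses uniformly among the allowed doors.
        likelihood = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
        joint[car] = prior * likelihood
    evidence = sum(joint.values())  # P(host opens `opened`)
    return {car: p / evidence for car, p in joint.items()}


posterior = posterior_after_host_opens()
print("P(car behind door 1, stay)   =", posterior[1])
print("P(car behind door 2, switch) =", posterior[2])
```

Under these assumptions, staying wins with probability 1/3 and switching wins with probability 2/3. If the host's tie-breaking rule is not uniform, the posterior for a specific opened door can shift, which is why the assumption is worth stating.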


Module 02 · Reasoning Decomposition

Every solution, broken into its parts.

Each step is tagged by the kind of inference it performs — deductive, inductive, abductive, probabilistic, analogical, or heuristic — and laid out as a graph.

Inference legend

Deductive: From general rules to a forced conclusion. Truth-preserving.

Inductive: From observed cases to a general rule. Probability-bound.

Abductive: Best explanation from incomplete data. Hypothesis-forming.

Probabilistic: Bayesian updates over uncertain evidence.

Analogical: Mapping structure from a familiar domain to a new one.

Heuristic: Cheap shortcut. Fast, biased, sometimes correct.

Human · 5 steps

Inference distribution

Deductive: 2 · 40%
Abductive: 1 · 20%
Analogical: 1 · 20%
Heuristic: 1 · 20%

AI · Deliberator · 5 steps

Inference distribution

Deductive: 4 · 80%
Probabilistic: 1 · 20%
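A distribution like the ones above can be derived from a list of per-step tags. The following is a minimal sketch, not Mind Arena's implementation: the function name is hypothetical, and the order of tags in the example traces is assumed, since only the counts are given.

```python
from collections import Counter

INFERENCE_TYPES = (
    "deductive", "inductive", "abductive",
    "probabilistic", "analogical", "heuristic",
)


def inference_distribution(step_tags):
    """Return {type: (count, percent)} for the inference types that occur."""
    counts = Counter(step_tags)
    unknown = set(counts) - set(INFERENCE_TYPES)
    if unknown:
        raise ValueError(f"unknown inference types: {sorted(unknown)}")
    total = len(step_tags)
    if total == 0:
        return {}
    return {
        t: (counts[t], round(100 * counts[t] / total))
        for t in INFERENCE_TYPES
        if counts[t]
    }


# Tag order is assumed; the counts match the distributions listed above.
human_tags = ["heuristic", "abductive", "analogical", "deductive", "deductive"]
ai_tags = ["deductive", "deductive", "probabilistic", "deductive", "deductive"]

human_dist = inference_distribution(human_tags)
ai_dist = inference_distribution(ai_tags)
print(human_dist)
print(ai_dist)
```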

Module 03 · Cognitive Difference Analyzer

Where humans and AI actually diverge.

Five axes, computed per challenge: speed, depth, creativity, consistency, calibration. Differences are signed — the bar shows who leans which way.

The Monty Hall problem

Human · AI

Speed: AI leans here

Depth: AI leans here

Creativity: Human leans here

Consistency: AI leans here

Calibration: AI leans here
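A signed difference can be sketched as AI score minus human score per axis, so the sign alone says who leans which way. The page does not publish per-axis scores or its sign convention, so every number below is a hypothetical placeholder chosen only to reproduce the leans listed above, and the function names are illustrative.

```python
AXES = ("speed", "depth", "creativity", "consistency", "calibration")


def signed_differences(human, ai):
    """Per-axis difference; positive means the AI scores higher (assumed convention)."""
    return {axis: ai[axis] - human[axis] for axis in AXES}


def lean(diff):
    """Translate a signed difference into the side the bar leans toward."""
    if diff > 0:
        return "AI"
    if diff < 0:
        return "Human"
    return "neither"


# Hypothetical placeholder scores on a 0-100 scale, NOT taken from Mind Arena.
human_scores = {"speed": 40, "depth": 55, "creativity": 70, "consistency": 50, "calibration": 60}
ai_scores = {"speed": 95, "depth": 80, "creativity": 45, "consistency": 90, "calibration": 75}

diffs = signed_differences(human_scores, ai_scores)
leans = {axis: lean(d) for axis, d in diffs.items()}
print(leans)
```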

Analyst notes

Speed

AI is faster on every solve. Quickfire's wrong answer arrives before the human's right one.

Depth

Deliberator goes deepest — explicit Bayes, named priors, conditional likelihoods. The human collapses to the same answer in fewer formal steps.

Creativity

The 100-door analogy and the 'votes inherited' reframing are creative leaps. Wildfire makes them as a habit; humans make them under pressure.

Consistency

Run the same prompt 50 times: Quickfire never recovers, while Deliberator always solves it. Humans oscillate.

Calibration

Quickfire is wrong with 0.78 confidence — a calibration disaster. Humans are right with 0.85, which is appropriately humble.
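One way to put a number on the calibration note is the Brier score for a single prediction: the squared gap between stated confidence and the outcome (1 if right, 0 if wrong), where lower is better. The page does not say how it scores calibration, so this is a stand-in measure rather than Mind Arena's method, and a real calibration estimate would average over many predictions.

```python
import math


def brier(confidence, correct):
    """Squared error between stated confidence and the outcome (1.0 right, 0.0 wrong)."""
    outcome = 1.0 if correct else 0.0
    return (confidence - outcome) ** 2


quickfire_score = brier(0.78, correct=False)  # confident and wrong
human_score = brier(0.85, correct=True)       # confident and right

print(f"Quickfire: {quickfire_score:.4f}")
print(f"Human:     {human_score:.4f}")
```

On these two figures, Quickfire's penalty is about 0.61 and the human's about 0.02, which matches the note's verdict.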

The most reliable thinking process we have is neither solo human nor solo AI. It is a loop: draft → critique → revise. We trace the loop on real problems and show how the answer hardens.

01 · Human draft

I think it's 50/50 — two doors, one car. So switching shouldn't matter.

ACC 0 · DEPTH 22 · CREAT 35

02 · AI critique

The two doors are not symmetric. Your first door was chosen with no information about the car. The other unopened door survived an informed selection. Try the 100-door version: pick one of 100 doors, then the host opens 98 goat doors. Would you still say 50/50?

03 · Human revision

Right — at 100 doors I'd switch immediately. The same asymmetry is there with three doors, just less obvious. P(stay-wins) = 1/3, P(switch-wins) = 2/3. I switch.

ACC 100 · DEPTH 78 · CREAT 60

Δ Improvement

Accuracy: 0→100 (+100)
Depth: 22→78 (+56)
Creativity: 35→60 (+25)
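The improvement figures are plain differences between the revision's scores and the draft's. A small sketch, assuming a record type and function name that are illustrative rather than Mind Arena's own:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Scores:
    """Scores for one stage of the draft -> critique -> revise loop (0-100)."""
    accuracy: int
    depth: int
    creativity: int


def improvement(draft, revision):
    """Per-metric change from the human draft to the human revision."""
    return {
        f.name: getattr(revision, f.name) - getattr(draft, f.name)
        for f in fields(Scores)
    }


draft = Scores(accuracy=0, depth=22, creativity=35)       # 01 · Human draft
revision = Scores(accuracy=100, depth=78, creativity=60)  # 03 · Human revision

deltas = improvement(draft, revision)
print(deltas)
```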

The human reached the right answer in one revision. The critique supplied an analogy that gave them leverage, not a proof; leverage is what humans most often need from a model.

Compared to solo human and solo AI on the same problems, the loop tends to add depth without losing creativity. It is also slower. Tradeoffs are real.

Module 06 · Skill Map

A radar across reasoning quality.

Aggregated across the challenge set: representative human, three AI agents, and the hybrid loop. Drag-rank by any single axis or read the radar gestalt.


Agent	Type	Accuracy	Depth	Originality	Calibration	Composite
Hybrid loop	Hybrid	92	88	86	84	88.2
Deliberator	AI	88	92	50	86	81
Human (representative)	Human	71	64	78	72	70.9
Wildfire	AI	70	60	94	58	69.9
Quickfire	AI	52	28	30	36	38.4
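The page does not say how Composite is computed, and it is not the plain mean of the four axes. A weighted sum with weights of 35% accuracy, 25% depth, 20% originality and 20% calibration reproduces every listed value to one decimal place. Those weights are inferred from the table, not a documented formula. The sketch below checks that fit:

```python
# Weights in percent. INFERRED from the table above, not documented by Mind Arena.
WEIGHTS = {"accuracy": 35, "depth": 25, "originality": 20, "calibration": 20}

# (accuracy, depth, originality, calibration), listed composite
ROWS = {
    "Hybrid loop": ((92, 88, 86, 84), 88.2),
    "Deliberator": ((88, 92, 50, 86), 81.0),
    "Human (representative)": ((71, 64, 78, 72), 70.9),
    "Wildfire": ((70, 60, 94, 58), 69.9),
    "Quickfire": ((52, 28, 30, 36), 38.4),
}


def composite(accuracy, depth, originality, calibration):
    """Weighted sum of the four axes under the inferred weights."""
    total = (
        WEIGHTS["accuracy"] * accuracy
        + WEIGHTS["depth"] * depth
        + WEIGHTS["originality"] * originality
        + WEIGHTS["calibration"] * calibration
    )
    return total / 100


computed = {agent: composite(*axes) for agent, (axes, _) in ROWS.items()}
for agent, (axes, listed) in ROWS.items():
    print(f"{agent}: computed {computed[agent]:.2f}, listed {listed}")
```

Five rows cannot rule out other weightings that round to the same values, so treat the fit as a plausible reading rather than the scoring rule.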

Radar comparison (series: Human (representative), Quickfire, Deliberator, Wildfire, Hybrid loop)

Premise

Intelligence is a process, not a verdict.

01 · Expose the work, not the answer.

Most benchmarks score the final token. Mind Arena scores the path. We force every solver — human or model — to enumerate assumptions, intermediate steps, and the type of inference being used.

02 · Symmetry beats hierarchy.

We do not crown a winner. We hold both sides to the same protocol so that real differences — speed vs depth, creativity vs consistency, bias vs overfitting — can be observed without tribal scoring.

03 · Mistakes are the data.

A bias and a hallucination are not embarrassments. They are signatures of how a system models the world. The Mistake Lab treats both as primary specimens.

04 · The hybrid is the destination.

Neither side wins on its own. Hybrid Mode shows what happens when a human draft is critiqued by a model and revised — the only loop that consistently outperforms either alone.

Predict the failure. Then watch it happen.

Anchoring bias

Base-rate neglect

Availability heuristic

Confirmation bias

Scope insensitivity

Fabricated citation

Spurious precision

Premise acceptance

Consistency illusion

Refusal-then-comply

Human draft. Model critique. Human revision.
