
Belief Agent

A study in how artificial agents should reason — and communicate — when they are uncertain and the information they are handed might be wrong.

Premise

Most models output answers. Belief Agent instead gives each agent an explicit probability distribution over the world, a measure of its own uncertainty — Shannon entropy — and a channel to talk to another agent. Then it stress-tests what happens when two agents with different-quality evidence have to agree.

The interesting failures aren't in the inference math. They're in the communication: how confident-but-wrong sources, ambiguous language, and naive memory quietly corrupt a system that would have done fine on its own.

The Setup

Each episode hides a true goal — one of three states, A, B, or C. Two agents try to infer it. A Sensor agent reads a noisy observation that points at the truth roughly 70% of the time. A Language agent receives natural-language clues — "not B", "either A or C" — that are frequently ambiguous or outright misleading.
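Here is a minimal sketch of how one episode might be generated. The page does not specify the generator, so everything here is an assumption:

- A wrong sensor reading picks one of the other two states uniformly.
- A clue is truthful with a hypothetical probability `clue_truthful`.
- A misleading clue rules out the true goal.

In a three-state world "not B" and "either A or C" carry the same information. Both are therefore represented by the tuple of states the clue still allows.

```python
import random

STATES = ("A", "B", "C")


def sample_episode(rng, sensor_acc=0.7, clue_truthful=0.6):
    """Hypothetical episode generator: hidden goal, one sensor reading, one clue.

    Returns (goal, sensor_obs, clue_text, allowed_states).
    """
    goal = rng.choice(STATES)
    others = [s for s in STATES if s != goal]

    # Sensor: points at the truth with probability sensor_acc,
    # otherwise at one of the two wrong states (assumed uniform).
    obs = goal if rng.random() < sensor_acc else rng.choice(others)

    # Clue: rules out one state. A truthful clue rules out a wrong state;
    # a misleading one rules out the true goal (assumed failure mode).
    excluded = rng.choice(others) if rng.random() < clue_truthful else goal
    allowed = tuple(s for s in STATES if s != excluded)

    # "not X" and "either Y or Z" encode the same constraint over three states.
    if rng.random() < 0.5:
        clue = f"not {excluded}"
    else:
        clue = f"either {allowed[0]} or {allowed[1]}"
    return goal, obs, clue, allowed


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_episode(rng))
```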

Both maintain a belief vector over {A, B, C}, update it with Bayes' rule, and broadcast a message each step. The catch: every message is weighted by the sender's precision, meaning how low its entropy is and therefore how confident it currently is. An uncertain agent's message should move you less than a confident one's.
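The update loop can be sketched as follows. Bayes' rule and Shannon entropy are standard. Two other choices are hypothetical, picked for illustration rather than taken from the project's documented formulas:

- The precision score is `1 - H / log 3`, which is 1 for a point mass and 0 for the uniform distribution.
- A message is folded in by raising the sender's belief to the power of its precision and then multiplying it in.

```python
import math

STATES = ("A", "B", "C")


def normalize(p):
    z = sum(p)
    return [x / z for x in p]


def bayes_update(belief, likelihood):
    """Bayes' rule over a discrete state space: posterior ∝ prior × likelihood."""
    return normalize([b * l for b, l in zip(belief, likelihood)])


def sensor_likelihood(obs, acc=0.7):
    """P(obs | goal): acc if obs is the goal, the rest split over the other states."""
    wrong = (1 - acc) / (len(STATES) - 1)
    return [acc if s == obs else wrong for s in STATES]


def entropy(p):
    """Shannon entropy in nats (0 · log 0 taken as 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def precision(p):
    """Hypothetical precision score: 1 for a point mass, 0 for the uniform belief."""
    return 1.0 - entropy(p) / math.log(len(p))


def receive(belief, message):
    """Fold in a sender's belief, tempered by the sender's precision (assumed rule).

    A precision near 0 makes message ** w close to 1 everywhere, so an
    uncertain sender barely moves the receiver.
    """
    w = precision(message)
    return normalize([b * m ** w for b, m in zip(belief, message)])


if __name__ == "__main__":
    prior = [1 / 3] * 3
    print(bayes_update(prior, sensor_likelihood("A")))
```

With this tempering, a near-uniform message gets an exponent close to 0. In that case it leaves the receiver almost unchanged.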

Inference Under Uncertainty

The first version used hard logical constraints. A clue like "either B or C" drove A's probability to exactly zero — and once Bayes multiplies a hypothesis by zero, nothing can revive it. A single misleading clue could permanently eliminate the correct answer, and two agents in conversation would reinforce each other straight into a confident, wrong consensus — a belief-collapse attractor.
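The collapse is easy to reproduce with plain Bayes' rule. In this toy, the likelihood values are assumptions. A single hard "either B or C" clue zeroes out A. Twenty sensor readings in A's favor afterwards cannot bring it back.

```python
def normalize(p):
    z = sum(p)
    return [x / z for x in p]


def bayes_update(belief, likelihood):
    return normalize([b * l for b, l in zip(belief, likelihood)])


# Toy likelihoods over (A, B, C); the values are illustrative assumptions.
HARD_EITHER_B_OR_C = [0.0, 1.0, 1.0]   # hard logical constraint: A is impossible
SENSOR_POINTS_AT_A = [0.7, 0.15, 0.15]


def collapse_demo(n_readings=20):
    belief = [1 / 3] * 3
    belief = bayes_update(belief, HARD_EITHER_B_OR_C)   # A drops to exactly 0
    for _ in range(n_readings):
        # Every later reading favors A, but 0 × anything is still 0.
        belief = bayes_update(belief, SENSOR_POINTS_AT_A)
    return belief


if __name__ == "__main__":
    print(collapse_demo())  # → [0.0, 0.5, 0.5]
```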

[Figure: Belief as gradient descent. Hard constraints deepen a wrong-attractor basin, and cascading communication rolls both agents into it. A 3-D loss surface and a top-down contour view show trajectories descending into the basins, with the wrong attractor and the true goal (A) marked.]

The fix was epistemic humility, encoded numerically: replace hard zeros with a small probability floor so no hypothesis is ever truly dead, soften "not" messages so they nudge rather than veto, and weight every incoming message by the sender's precision. Trusting confident sources more than uncertain ones did most of the work.
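Here is a sketch of the first two fixes, under stated assumptions:

- The floor is implemented as a mixture with the uniform distribution, with weight `eps` and a hypothetical value of 0.05.
- "Not X" scales X's likelihood by a hypothetical `1 - strength` instead of zeroing it.

Precision weighting of messages is left out to keep the example short. The sketch replays a misleading "either B or C" clue followed by sensor readings that point at A. This time the true goal recovers instead of staying dead.

```python
STATES = ("A", "B", "C")


def normalize(p):
    z = sum(p)
    return [x / z for x in p]


def with_floor(p, eps=0.05):
    """Probability floor, implemented here as a mixture with the uniform distribution.

    Every state keeps at least eps / len(p), so no hypothesis is ever exactly 0.
    """
    n = len(p)
    return [(1 - eps) * x + eps / n for x in p]


def soft_not(state, strength=0.6):
    """'not X' as a nudge: X's likelihood is scaled by (1 - strength), never zeroed."""
    return [1.0 - strength if s == state else 1.0 for s in STATES]


def update(belief, likelihood, eps=0.05):
    """Bayes' rule followed by the floor."""
    posterior = normalize([b * l for b, l in zip(belief, likelihood)])
    return with_floor(posterior, eps)


def recovery_demo(n_readings=3):
    belief = [1 / 3] * 3
    belief = update(belief, soft_not("A"))           # misleading "either B or C"
    for _ in range(n_readings):
        belief = update(belief, [0.7, 0.15, 0.15])   # sensor keeps pointing at A
    return belief


if __name__ == "__main__":
    for n in range(4):
        print(n, [round(x, 3) for x in recovery_demo(n)])
```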

The sharpest result came from the communication topology itself. The sensor stream is strong and compounds evidence over time; the language stream is weak and often misleading. Symmetric two-way fusion dragged the strong agent down toward the weak one. Letting the weak agent listen to the strong one — but not the reverse — kept the sensor intact while pulling the language agent up.
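The topology difference comes down to which agent runs the receive step. This one-step sketch reuses a hypothetical tempered-pooling rule, in which a message is raised to the sender's precision. It is only meant to show the mechanics:

- Under `"unidirectional"`, the Sensor's belief is returned untouched.
- Under `"bidirectional"`, a confidently misled Language agent drags the Sensor's belief down.

It does not reproduce the experiment's numbers.

```python
import math


def normalize(p):
    z = sum(p)
    return [x / z for x in p]


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def precision(p):
    # Hypothetical: 1 for a point mass, 0 for the uniform belief.
    return 1.0 - entropy(p) / math.log(len(p))


def receive(belief, message):
    # Assumed tempered pooling: multiply in the message raised to the sender's precision.
    w = precision(message)
    return normalize([b * m ** w for b, m in zip(belief, message)])


def exchange(sensor, language, topology):
    """One synchronous round of messages between S and L; returns (new_S, new_L)."""
    new_language = receive(language, sensor)     # L always listens to S
    if topology == "bidirectional":
        new_sensor = receive(sensor, language)   # S also listens to L
    elif topology == "unidirectional":
        new_sensor = list(sensor)                # S ignores L
    else:
        raise ValueError(f"unknown topology: {topology!r}")
    return new_sensor, new_language


if __name__ == "__main__":
    S = [0.80, 0.10, 0.10]   # sensor fairly sure of A (the truth)
    L = [0.10, 0.45, 0.45]   # language agent misled away from A
    for topo in ("bidirectional", "unidirectional"):
        s, l = exchange(S, L, topo)
        print(topo, [round(x, 3) for x in s], [round(x, 3) for x in l])
```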

[Figure: Strong → weak, one way: the topology that avoids contaminating the sensor stream. The Sensor agent (S) has observations that point at the truth ~70% of the time, with low entropy and high precision. It sends a precision-weighted message to the Language agent (L), which has noisy, ambiguous clues and high entropy and listens to S. S ignores L.]

Table: Joint accuracy (both agents correct) across noise levels, by communication topology

Noise   No comm   Bidirectional   Unidirectional
0.10    0.785     0.869           0.938
0.20    0.594     0.711           0.796
0.30    0.357     0.468           0.606
0.40    0.183     0.269           0.415
0.50    0.067     0.146           0.208

Unidirectional wins at every noise level tested (0.10 to 0.50).

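Under noise, unidirectional communication beat bidirectional fusion at every level in the table. Its advantage in joint accuracy was up to roughly 1.5× (0.415 vs. 0.269 at noise 0.40). It also improved on no communication by roughly 2–3× at the highest noise levels: 0.415 vs. 0.183 at 0.40, and 0.208 vs. 0.067 at 0.50. There was no sign of the herding that makes two agents agree more often than they're right.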
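The ratios behind these comparisons can be recomputed directly from the joint-accuracy table. The snippet just transcribes the published numbers and divides them. No new data is involved.

```python
# Joint accuracy transcribed from the table:
# noise -> (no comm, bidirectional, unidirectional)
JOINT_ACCURACY = {
    0.10: (0.785, 0.869, 0.938),
    0.20: (0.594, 0.711, 0.796),
    0.30: (0.357, 0.468, 0.606),
    0.40: (0.183, 0.269, 0.415),
    0.50: (0.067, 0.146, 0.208),
}


def improvement_ratios(table):
    """Return noise -> (unidirectional / bidirectional, unidirectional / no comm)."""
    return {
        noise: (uni / bi, uni / no_comm)
        for noise, (no_comm, bi, uni) in table.items()
    }


if __name__ == "__main__":
    for noise, (vs_bi, vs_none) in improvement_ratios(JOINT_ACCURACY).items():
        print(f"noise {noise:.2f}: {vs_bi:.2f}x vs bidirectional, {vs_none:.2f}x vs no comm")
```

Over the transcribed values, the unidirectional/bidirectional ratio runs from about 1.08× (noise 0.10) to about 1.54× (noise 0.40). The unidirectional/no-communication ratio runs from about 1.19× to about 3.10× (noise 0.50).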

The Memory Trap

Adding episodic memory, which lets an agent recall what it did in similar past situations, backfired. Naive memory scored each stored step by how similar its belief and cue were to the current ones, multiplied by the confidence recorded with it. It therefore learned the most frequent action, not the most useful one. Under noise it became a habit system, reflexively repeating common messages and ignoring current evidence. With memory on, the language agent was measurably less accurate than without it. In effect it was doing imitation learning of its own behavior, which is exactly how habits form.
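A hypothetical reconstruction of the naive scoring rule shows why frequency wins. Each stored step adds similarity × confidence to its action's score. Nothing records whether the episode succeeded, so an action stored eight times outvotes one stored twice. The similarity measure is an assumption: one minus total-variation distance, halved when the cue differs.

```python
from collections import defaultdict


def tv_similarity(p, q):
    """1 minus total-variation distance between two belief vectors (1 = identical)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def naive_recall(memory, belief, cue):
    """Hypothetical naive recall: score each action by sum(similarity x confidence).

    memory holds (belief, cue, action, confidence) tuples; nothing says
    whether the episode succeeded, so frequent actions pile up score.
    """
    scores = defaultdict(float)
    for past_belief, past_cue, action, confidence in memory:
        cue_match = 1.0 if past_cue == cue else 0.5   # assumed cue similarity
        scores[action] += tv_similarity(belief, past_belief) * cue_match * confidence
    return max(scores, key=scores.get)


if __name__ == "__main__":
    b = [0.4, 0.3, 0.3]
    memory = (
        [(b, "either A or C", "say C", 0.6)] * 8     # the habitual message
        + [(b, "either A or C", "say A", 0.6)] * 2   # the rarer message
    )
    print(naive_recall(memory, b, "either A or C"))  # → say C
```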

Three changes turned memory from a liability into a gain: label each stored step with whether the episode actually succeeded (so the agent imitates wins, not frequencies), gate retrieval on entropy (only lean on memory when genuinely uncertain), and give the agent a learned reliability model that tracks whether its language source has been truthful so far this episode — up-weighting it when consistent, down-weighting it when it contradicts the sensor stream.
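The three fixes can be sketched together. All names and constants here are hypothetical:

- `gate` sets the entropy threshold as a fraction of the maximum entropy.
- Failed steps are simply skipped at retrieval.
- Reliability is a per-episode Beta(a, b) count of clues that agreed or disagreed with the sensor's current best guess.

Reliability is used as an exponent, so the clue's likelihood flattens when the source has been contradicting the sensor.

```python
import math
from collections import defaultdict


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def tv_similarity(p, q):
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def gated_recall(memory, belief, cue, gate=0.8):
    """Success-labelled, entropy-gated recall (hypothetical sketch).

    memory holds (belief, cue, action, succeeded) tuples. Returns None when
    the agent is confident enough to act on current evidence alone.
    """
    if entropy(belief) < gate * math.log(len(belief)):
        return None                      # not uncertain enough: skip memory
    scores = defaultdict(float)
    for past_belief, past_cue, action, succeeded in memory:
        if not succeeded:
            continue                     # imitate wins, not frequencies
        cue_match = 1.0 if past_cue == cue else 0.5
        scores[action] += tv_similarity(belief, past_belief) * cue_match
    return max(scores, key=scores.get) if scores else None


class SourceReliability:
    """Per-episode Beta(a, b) estimate of how often the language source agrees with the sensor."""

    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b            # uniform Beta(1, 1) prior; create one per episode

    def observe(self, clue_allowed, sensor_best):
        # Consistent if the clue still allows the sensor's current best guess.
        if sensor_best in clue_allowed:
            self.a += 1
        else:
            self.b += 1

    @property
    def reliability(self):
        return self.a / (self.a + self.b)

    def weight_clue(self, likelihood):
        # Temper the clue: reliability 1 keeps it, reliability near 0 flattens it toward 1.
        r = self.reliability
        return [x ** r for x in likelihood]


if __name__ == "__main__":
    b = [0.4, 0.3, 0.3]
    memory = ([(b, "either A or C", "say C", False)] * 8
              + [(b, "either A or C", "say A", True)] * 2)
    print(gated_recall(memory, b, "either A or C"))                  # uncertain: consults memory
    print(gated_recall(memory, [0.9, 0.05, 0.05], "either A or C"))  # confident: skips memory

    rel = SourceReliability()
    rel.observe(("B", "C"), "A")         # clue excluded what the sensor favors
    rel.observe(("B", "C"), "A")
    print(rel.reliability, rel.weight_clue([0.4, 1.0, 1.0]))
```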

What It Shows

  1. Communication isn't neutral. It can contaminate a high-quality evidence stream as easily as it rescues a weak one.
  2. Trust should scale with the sender's uncertainty. Adaptive, precision-weighted trust beats any fixed weight.
  3. Under asymmetric reliability, unidirectional precision-weighted communication was more robust than symmetric fusion at every noise level tested.

These are the same failure modes that show up in multi-agent LLM systems, sensor fusion, and human committees: confident-but-wrong voices, herding, and memory that rewards the familiar over the correct. Belief Agent is a small, legible sandbox for reasoning about all three.