Human-in-the-Loop Oracle

The human-in-the-loop (HITL) oracle is a reference oracle implementation that routes contested claims to a human reviewer rather than an automated resolver. It implements the OraclePort duck-typed protocol in Python and is the pattern used by the mempill demo’s /review command.

HITL is appropriate when:

  • The claim concerns high-criticality facts where automated resolution is unsafe (e.g., medication allergies, financial status).
  • No automated oracle is available or confident enough.
  • The belief is Contested and the correct value must come from a human with domain knowledge.

The HITL oracle is stateless. It does not maintain its own queue. When the engine calls request_adjudication(), the oracle generates a correlation UUID and returns it immediately. The engine writes the full AdjudicationRequest (including both conflicting values) to the durable pending_adjudications table. The review UI reads from that table — not from the oracle object.
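
The statelessness described above can be sketched in a few lines. This is a minimal illustration, not the real adapter: the class name `SketchHumanOracle`, the `request` parameter, and the string return type are assumptions, and the actual HumanOracle and OraclePort signatures may differ.

```python
import uuid


class SketchHumanOracle:
    """Hypothetical stateless HITL oracle; not the real mempill adapter."""

    def request_adjudication(self, request: dict) -> str:
        # No queue and no in-memory dict: only mint a correlation handle.
        # Persisting the request is the engine's job, not the oracle's.
        return str(uuid.uuid4())


oracle = SketchHumanOracle()
request = {"incumbent_value": "Alice", "challenger_value": "Bob"}
handle_a = oracle.request_adjudication(request)
handle_b = oracle.request_adjudication(request)
```

Because the object keeps no attributes, discarding and recreating it loses nothing; every call yields a fresh handle even for identical requests.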

[Conflict detected by engine]
  oracle.request_adjudication() called
  HumanOracle generates handle_id (UUID), returns immediately
  Engine writes {handle_id, incumbent_value, challenger_value, ...}
    to pending_adjudications DB table (durable across restart)

[Out-of-band review]
  Human calls engine.list_pending_adjudications()
  Human inspects incumbent vs. challenger values
  Human picks challenger / incumbent / skip / abstain
  engine.submit_adjudication(handle_id, verdict)
  Disposition flips atomically (I9); handle consumed

When a human reviews a pending conflict, four choices are available:

| Choice         | Verdict sent     | Engine behavior |
|----------------|------------------|-----------------|
| challenger (c) | Affirm           | Challenger value wins; incumbent bounded to Superseded. Resolution carries External(ExternalFirstHand) provenance. |
| incumbent (i)  | Deny             | Incumbent confirmed; challenger bounded to Superseded. |
| skip (s)       | (none; deferred) | Handle left pending. No engine call. Reappears on next /review. |
| abstain (a)    | Unknown          | Undecidable; handle consumed; both claims remain Contested. Removed from the queue. |
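
As a rough sketch, the four choices could be translated into engine calls as follows. The helper `build_submission` and the `CHOICE_TO_VERDICT` mapping are hypothetical names introduced here for illustration; the payload keys mirror the submit_adjudication example elsewhere on this page, and the demo's actual /review code may be organized differently.

```python
from typing import Optional

# Keys and verdict names follow the choices listed above.
CHOICE_TO_VERDICT = {
    "c": "Affirm",   # challenger wins
    "i": "Deny",     # incumbent confirmed
    "a": "Unknown",  # abstain: both claims stay Contested
}


def build_submission(handle_id: str, choice: str) -> Optional[dict]:
    """Return a submit_adjudication payload, or None when the human skips."""
    key = choice.strip().lower()
    if key == "s":
        return None  # skip: no engine call, handle stays pending
    if key not in CHOICE_TO_VERDICT:
        raise ValueError(f"unknown choice: {choice!r}")
    return {
        "handle_id": handle_id,
        "verdict": CHOICE_TO_VERDICT[key],
        "evidence_provenance": {"type": "External", "kind": "ExternalFirstHand"},
    }
```

Returning None for skip keeps the "no engine call" rule explicit: the caller submits only when a payload exists.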

Human verdicts are submitted with evidence_provenance: External(ExternalFirstHand). This is intentional: the human is acting as an external authority resolving an ambiguous fact. The resolved claim carries maximum external weight — it can overturn future conflicting claims via the normal gate logic without re-escalating to the oracle.

UserAsserted provenance is for claims ingested directly via ingest_claim. ExternalFirstHand is for oracle-resolved claims, including HITL verdicts.

The review queue survives process restarts because the pending_adjudications table is stored in the SQLite file. HumanOracle itself is stateless — it holds no in-memory dict. After a restart:

  1. engine.list_pending_adjudications() reads the DB directly and returns all status='pending' rows.
  2. The review queue is fully intact — the human sees the same pending items as before the restart.
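
The restart behavior can be illustrated with plain sqlite3. The schema below is an assumption made for this sketch (only the table name and the handle_id, incumbent_value, challenger_value, and status columns are taken from this page); the real mempill schema is likely richer.

```python
import os
import sqlite3
import tempfile
import uuid

db_path = os.path.join(tempfile.mkdtemp(), "agent.db")

# "First process": the engine persists a pending adjudication.
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE pending_adjudications ("
    "handle_id TEXT PRIMARY KEY, incumbent_value TEXT, "
    "challenger_value TEXT, status TEXT)"
)
handle_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO pending_adjudications VALUES (?, ?, ?, 'pending')",
    (handle_id, "Alice", "Bob"),
)
conn.commit()
conn.close()  # simulate the process exiting

# "After restart": a fresh connection sees the same queue.
conn = sqlite3.connect(db_path)
rows = conn.execute(
    "SELECT handle_id, incumbent_value, challenger_value "
    "FROM pending_adjudications WHERE status = 'pending'"
).fetchall()
conn.close()
```

Nothing about the queue lives in the oracle object, so a new process only needs the database file to recover every pending item.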

import mempill
from mempill_demo.adapters.human_oracle import HumanOracle

# Wire the HITL oracle at construction time.
oracle = HumanOracle()
engine = mempill.open_oracle("/path/to/agent.db", oracle)

# Ingest a conflicting claim. An oracle is present, so it goes to QueuedForAdjudication.
resp = engine.ingest_claim({
    "agent_id": "my-agent",
    "subject": "acme:ceo",
    "predicate": "held_by",
    "value": "Bob",
    "provenance": {"type": "External", "kind": "UserAsserted"},
    "cardinality": "Functional",
    "confidence": {"value_confidence": 0.9, "valid_time_confidence": 0.0},
    "criticality": "High",
    "derived_from": [],
})
print(resp["disposition"])  # QueuedForAdjudication

# Review pending conflicts.
pending = engine.list_pending_adjudications(agent_id="my-agent")
for item in pending:
    print(f"{item['predicate']}: {item['incumbent_value']} vs {item['challenger_value']}")

    # Human decision: challenger wins.
    outcome = engine.submit_adjudication({
        "handle_id": item["handle_id"],
        "verdict": "Affirm",
        "evidence_provenance": {"type": "External", "kind": "ExternalFirstHand"},
    })
    print(outcome["disposition"])  # CommittedCheap (challenger won)

The mempill demo exposes a /review REPL command that drives the review loop interactively:

> /review
2 conflict(s) awaiting review:

[1] predicate: held_by
    Incumbent:  Alice
    Challenger: Bob
    Queued:     2026-06-24T10:15:00Z
[c]hallenger / [i]ncumbent / [s]kip-ask-later / [a]bstain? c
Resolved: CommittedCheap (Bob wins; Alice → Superseded)

[2] predicate: city
    Incumbent:  Berlin
    Challenger: Munich
    Queued:     2026-06-24T11:30:00Z
[c]hallenger / [i]ncumbent / [s]kip-ask-later / [a]bstain? s
Deferred — will ask again next /review

Review complete.

If you configure a default_adjudication_ttl on the engine, expired handles are swept by sweep_expired_adjudications(), which reverts them to Contested. The demo calls this on startup:

n = engine.sweep_expired_adjudications()
if n > 0:
    print(f"[startup] Reverted {n} expired adjudication(s) to Contested.")

With no TTL configured (the demo default), pending items stay in the queue indefinitely until a human resolves or abstains.
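
The expiry rule can be sketched in pure Python. This is a toy model under stated assumptions: the function `sweep_expired`, the `queued_at` and `state` fields, and the in-memory list are hypothetical stand-ins, and the engine's real sweep operates on its own storage.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sweep_expired(pending: list, ttl: Optional[timedelta], now: datetime) -> int:
    """Revert pending items older than ttl to Contested; return how many changed."""
    if ttl is None:
        return 0  # no TTL configured: items wait indefinitely
    reverted = 0
    for item in pending:
        if item["state"] == "pending" and now - item["queued_at"] > ttl:
            item["state"] = "Contested"
            reverted += 1
    return reverted


now = datetime(2026, 6, 25, 12, 0, tzinfo=timezone.utc)
queue = [
    {"handle_id": "h1", "state": "pending", "queued_at": now - timedelta(hours=26)},
    {"handle_id": "h2", "state": "pending", "queued_at": now - timedelta(hours=1)},
]
n_reverted = sweep_expired(queue, ttl=timedelta(hours=24), now=now)
```

With a 24-hour TTL only the 26-hour-old item is reverted; passing `ttl=None` models the demo default, where nothing ever expires.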

The mempill LangGraph agent supports /review at the REPL level (intercepted before graph.invoke()), so the review loop does not block or interrupt the conversation graph. The graph’s retrieve_memory node surfaces a banner when pending conflicts exist:

[2 unresolved conflict(s) pending. Type /review to resolve.]

The graph never calls submit_adjudication automatically — human judgment is required. Only the REPL-level /review command drives resolution.
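
A minimal sketch of this REPL-level interception, assuming two caller-supplied callables. The names `dispatch`, `pending_banner`, `run_review`, and `invoke_graph` are hypothetical; the banner text is copied from the example above.

```python
def pending_banner(count: int) -> str:
    """Banner text shown when conflicts are pending; empty string otherwise."""
    if count <= 0:
        return ""
    return f"[{count} unresolved conflict(s) pending. Type /review to resolve.]"


def dispatch(line: str, run_review, invoke_graph):
    """Route /review to the review loop; send everything else to the graph."""
    if line.strip() == "/review":
        return run_review()  # handled at the REPL; the graph is never invoked
    return invoke_graph(line)


calls = []
dispatch("/review", lambda: calls.append("review"), lambda text: calls.append("graph"))
dispatch("hello", lambda: calls.append("review"), lambda text: calls.append("graph"))
```

Keeping the check ahead of the graph call means a review session cannot end up as a turn in the conversation state.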

The HITL flow touches the following invariants:

  • I7 — Contested first-class: abstain → Unknown → both claims stay Contested; never silently picks the incumbent.
  • I9 — Atomic commit unit: submit_adjudication acquires the per-agent write lock; the resolution is atomic.
  • I4 — Provenance immutable: HITL verdicts write External(ExternalFirstHand) provenance at resolution time; immutable thereafter.
  • I1 — Non-destruction: incumbent and challenger are never deleted. After Deny, the challenger is Superseded but remains in history.
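
A toy model of how the three verdicts affect the two claims, illustrating I7 and I1. The function `apply_verdict` and the "Committed" status label for the winning claim are placeholders chosen for this sketch; Superseded and Contested are the names used on this page.

```python
def apply_verdict(incumbent: dict, challenger: dict, verdict: str) -> list:
    """Return both claims with updated status; neither is ever dropped."""
    inc, cha = dict(incumbent), dict(challenger)
    if verdict == "Affirm":
        inc["status"], cha["status"] = "Superseded", "Committed"
    elif verdict == "Deny":
        inc["status"], cha["status"] = "Committed", "Superseded"
    elif verdict == "Unknown":
        # Abstain never silently favors the incumbent (I7).
        inc["status"] = cha["status"] = "Contested"
    else:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return [inc, cha]


alice = {"value": "Alice", "status": "Contested"}
bob = {"value": "Bob", "status": "Contested"}
after_deny = apply_verdict(alice, bob, "Deny")
after_unknown = apply_verdict(alice, bob, "Unknown")
```

Every branch returns both claims, so the losing value remains available in history whichever verdict is chosen (I1).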