Semantic Walkthrough Interface
A spatial interface for bounded knowledge corpora where a semantic graph is rendered as stable architecture—movement constrains retrieval scope, speech edits intent, and every claim is inspectable through explicit provenance, uncertainty, and conflict rather than persuasive prose.
Problem: Text-First AI Breaks on Topology
Text chat works for linear queries, but it degrades when the user’s real task is to maintain a stable mental model of dependencies, competing hypotheses, and time-evolving evidence. The user ends up doing private state management: remembering what depends on what, which claims are uncertain, and where citations actually support the summary.
The Semantic Walkthrough targets those cases. It is not a general-purpose “better chatbot.” It is a world where structure is primary and prose is secondary.

Core Proposition: Navigation as a Retrieval Constraint
In a Semantic Walkthrough, movement is a control surface. Traversal changes the active scope, which changes retrieval, which changes what the AI is allowed to assert.
- Rooms are bounded scopes (paper, incident, component, claim, decision record).
- Corridors are typed edges with directionality (depends-on, caused-by, supports, contradicts).
- Districts are stable partitions (themes, subsystems, research clusters).
- Floors are abstraction tiers (overview → mechanisms → evidence).
Speech sets intent (“trace the failure chain,” “compare interpretations,” “filter to post-2023 sources”), while spatial selection binds the referents (“this node,” “that corridor,” “the fractured doorway”). The system’s job is to keep reasoning coupled to inspectable state.
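The coupling between movement and retrieval can be sketched in Python. This is a minimal sketch under an assumed rule: entering a room sets the active scope to that room plus the rooms one corridor away. The names `Claim`, `Session`, and `retrieve` are illustrative, and substring matching stands in for a real retriever.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    claim_id: str
    room_id: str  # the scope node (room) that owns this claim
    text: str


@dataclass
class Session:
    active_scope: set[str] = field(default_factory=set)  # room IDs currently in scope

    def enter_room(self, room_id: str, neighbors: set[str]) -> None:
        # Hypothetical rule: scope = the entered room plus its one-hop neighbors.
        self.active_scope = {room_id} | set(neighbors)


def retrieve(session: Session, claims: list[Claim], query: str) -> list[Claim]:
    """Return only in-scope matches; out-of-scope claims are never candidates."""
    q = query.lower()
    return [c for c in claims if c.room_id in session.active_scope and q in c.text.lower()]


claims = [
    Claim("c1", "incident-12", "Cache eviction spiked before the outage"),
    Claim("c2", "component-cache", "Cache TTL was lowered in the last release"),
    Claim("c3", "paper-7", "Cache warming reduces cold-start latency"),
]
session = Session()
session.enter_room("incident-12", {"component-cache"})
print([c.claim_id for c in retrieve(session, claims, "cache")])  # ['c1', 'c2']
```

Walking to a different room changes `active_scope`, so the same spoken query returns different evidence. That dependency is the control surface the proposition describes.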

Substrate Contract: The World is a View, Not a Simulation
The environment is a rendering of a semantic substrate. If the substrate cannot represent it, the world must not imply it.
Minimum viable substrate
- Entities: stable IDs, alias sets, and disambiguation metadata.
- Relations: typed edges (supports, contradicts, depends-on, similar-to, authored-by, cited-by).
- Provenance pointers: machine-addressable anchors into sources (URL + byte offsets, PDF page+bounding boxes, git commit hashes + file+line ranges, database record IDs).
- Uncertainty: calibrated scores per claim/edge plus explicit “unknown” and “conflict” states.
- Versioning: snapshot semantics so the world cannot silently rewrite history.
Invariant: Every visible object is traceable to a substrate artifact with provenance. If it cannot be cited, it cannot become solid geometry.
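A minimal sketch of the substrate records and the invariant as a render gate follows. The field names, the `ClaimState` values, and the anchor string formats are assumptions chosen for illustration, not a fixed schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class ClaimState(Enum):
    SUPPORTED = "supported"
    UNKNOWN = "unknown"
    CONFLICT = "conflict"


@dataclass(frozen=True)
class ProvenancePointer:
    kind: str    # e.g. "pdf", "git", "url", "db" (illustrative values)
    source: str  # document ID, repository, URL, or table name
    anchor: str  # e.g. "page=4;bbox=72,100,540,160" or "commit=<hash>;lines=10-24"


@dataclass
class Claim:
    claim_id: str
    entity_ids: tuple[str, ...]  # stable entity IDs the claim refers to
    confidence: float            # calibrated score in [0, 1]
    state: ClaimState = ClaimState.SUPPORTED
    provenance: list[ProvenancePointer] = field(default_factory=list)
    snapshot: int = 0            # substrate snapshot the claim belongs to

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def render_mode(claim: Claim) -> str:
    """Enforce the invariant: uncited claims never become solid geometry."""
    if not claim.provenance:
        return "outline"  # speculative only
    if claim.state is ClaimState.CONFLICT:
        return "fractured"
    if claim.state is ClaimState.UNKNOWN:
        return "fog"
    return "solid"
```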

Mapping Grammar v0: From Graph to Architecture (Spec, Not Metaphor)
The main design problem is deterministic grammar: a stable mapping from semantic structure to spatial structure that preserves user memory under updates.
Input: A scoped subgraph $G = (V, E)$ where each node/edge includes: type, timestamp, provenance pointers, and confidence $c \in [0, 1]$.
Partitioning → districts
- Produce districts $D$ via ontology grouping or community detection on typed edges.
- Stickiness rule: district membership is stable unless change exceeds a threshold $\tau$ (avoid re-clustering thrash).
- Each district has landmarks derived from high-centrality nodes and canonical labels.
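The stickiness rule can be sketched as follows, assuming change is measured as the fraction of node pairs whose same-district status flips (one minus the Rand index). This measure is label-invariant, so a clustering run that merely renames districts counts as zero change. The function names and the rule for placing newly added nodes are hypothetical.

```python
from itertools import combinations


def comembership_change(old: dict[str, str], new: dict[str, str]) -> float:
    """Fraction of node pairs (present in both maps) whose same-district status flipped.

    Label-invariant: renaming districts is not counted as change (1 - Rand index).
    """
    shared = sorted(old.keys() & new.keys())
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    flipped = sum((old[a] == old[b]) != (new[a] == new[b]) for a, b in pairs)
    return flipped / len(pairs)


def sticky_assign(old: dict[str, str], proposed: dict[str, str], tau: float) -> dict[str, str]:
    """Adopt a re-clustering only when it changes membership by more than tau."""
    if comembership_change(old, proposed) > tau:
        return dict(proposed)
    result = {}
    for node, label in proposed.items():
        if node in old:
            result[node] = old[node]  # keep the district the user already memorized
        else:
            # New node: join the old district of any peer it was clustered with.
            peer = next((m for m in proposed if m in old and proposed[m] == label), None)
            result[node] = old[peer] if peer is not None else label
    return result  # nodes deleted from the graph simply drop out
```

A relabel-only proposal keeps every existing district, and a new node inherits its peers' old district.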
Rooms
- Room-per-scope-node: papers, incidents, components, claims, hypotheses.
- Room radius $r$ scales with evidence volume and degree (bounded by caps to prevent cathedral rooms).
- Room entrances encode allowed filters (e.g., “only primary sources,” “only post-2023”).
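One way to realize the room rules is sketched below. It assumes logarithmic growth of radius with evidence count and degree, clamped to hypothetical caps `R_MIN` and `R_MAX`. Entrance filters are modeled as named predicates over evidence, and "post-2023" is interpreted, as an assumption, as published on or after 2024-01-01. All constants and names are illustrative.

```python
import math
from dataclasses import dataclass
from datetime import date

R_MIN, R_MAX = 2.0, 12.0  # hypothetical radius caps (world units)
ALPHA, BETA = 1.0, 0.5    # hypothetical weights for evidence volume and degree


def room_radius(evidence_count: int, degree: int) -> float:
    """Radius grows logarithmically with evidence and degree, clamped to [R_MIN, R_MAX]."""
    r = R_MIN + ALPHA * math.log1p(evidence_count) + BETA * math.log1p(degree)
    return min(max(r, R_MIN), R_MAX)


@dataclass(frozen=True)
class Evidence:
    source_id: str
    is_primary: bool
    published: date


# Entrance filters as named predicates (names and date cutoff are assumptions).
FILTERS = {
    "only primary sources": lambda e: e.is_primary,
    "only post-2023": lambda e: e.published >= date(2024, 1, 1),
}


def enter(evidence: list[Evidence], entrance: str) -> list[Evidence]:
    """Walking through an entrance applies its filter to the room's evidence."""
    keep = FILTERS[entrance]
    return [e for e in evidence if keep(e)]
```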
Corridors
- Each directed relation becomes a corridor with:
  - thickness $\propto c$
  - traversal cost $\propto (1 - c) + \lambda \cdot \text{missing-evidence-penalty}$, where $\lambda$ weights how strongly missing evidence discourages traversal
  - signage showing relation type and timestamp deltas
- Corridors cannot exist without a provenance pointer on the edge itself or on the claim it represents.
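The corridor rules can be sketched as follows. Each edge is assumed to carry a confidence, a tuple of provenance-pointer IDs, and a hypothetical missing-evidence penalty in [0, 1]. The names `LAMBDA`, `Edge`, `corridor`, and `cheapest_path` are illustrative. Because every traversal cost is nonnegative, Dijkstra's algorithm finds the lowest-cost chain, which is one plausible way to serve a request like "trace the failure chain".

```python
import heapq
from dataclasses import dataclass
from typing import Optional

LAMBDA = 0.5  # hypothetical weight on the missing-evidence penalty


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    rel: str                       # depends-on, caused-by, supports, contradicts, ...
    confidence: float              # c in [0, 1]
    provenance: tuple[str, ...]    # provenance pointer IDs on the edge or its claim
    missing_evidence: float = 0.0  # hypothetical penalty in [0, 1]


def corridor(edge: Edge, base_width: float = 1.0) -> Optional[dict]:
    """Map an edge to corridor geometry, or None if it has no provenance."""
    if not edge.provenance:
        return None  # no provenance pointer, no corridor
    return {
        "thickness": base_width * edge.confidence,
        "cost": (1.0 - edge.confidence) + LAMBDA * edge.missing_evidence,
        "signage": edge.rel,
    }


def cheapest_path(edges: list[Edge], start: str, goal: str) -> Optional[list[str]]:
    """Dijkstra over corridor traversal costs (all costs are nonnegative)."""
    graph: dict[str, list[tuple[float, str]]] = {}
    for e in edges:
        c = corridor(e)
        if c is not None:
            graph.setdefault(e.src, []).append((c["cost"], e.dst))
    dist = {start: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for cost, nxt in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

An unprovenanced edge never becomes a corridor, so it can neither be rendered nor traversed.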
Conflicts
- Edges of type contradicts instantiate paired mirror rooms or forked corridors.
- Reconciliation writes a new hypothesis node with provenance; it does not overwrite prior claims.
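The conflict rules amount to an append-only graph, sketched below under assumed names (`Node`, `AppendOnlyGraph`, `reconcile`). Contradictions are enumerated as mirrored pairs. Reconciliation adds a new hypothesis node that records which claims it derives from, and any attempt to overwrite an existing node is rejected.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str                    # "claim", "hypothesis", ...
    text: str
    provenance: tuple[str, ...]  # provenance pointer IDs
    derived_from: tuple[str, ...] = ()


@dataclass
class AppendOnlyGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, relation, dst)

    def add(self, node: Node) -> None:
        if node.node_id in self.nodes:
            raise ValueError(f"{node.node_id} exists; prior claims are never overwritten")
        self.nodes[node.node_id] = node

    def mirror_pairs(self) -> list[tuple[str, str]]:
        """Each contradicts edge is rendered as a pair of mirrored rooms."""
        return sorted((s, d) for s, rel, d in self.edges if rel == "contradicts")

    def reconcile(self, a: str, b: str, new_id: str, text: str,
                  provenance: tuple[str, ...]) -> Node:
        if (a, "contradicts", b) not in self.edges and (b, "contradicts", a) not in self.edges:
            raise ValueError("no contradiction to reconcile")
        if not provenance:
            raise ValueError("a reconciliation hypothesis needs its own provenance")
        node = Node(new_id, "hypothesis", text, provenance, derived_from=(a, b))
        self.add(node)
        return node
```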
Stability guarantee (behavioral): For edits affecting $\le k$ nodes, geometry updates must be bounded to the containing district and adjacent corridors. Global re-layout is prohibited except by an explicit user-triggered “replan” operation with a visible diff preview.
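The stability guarantee can be checked mechanically, for example in a layout regression test. This sketch assumes a node-to-district map and an undirected list of corridor endpoint pairs. Function names are hypothetical, and edits over `k` nodes are routed to an explicit replan rather than validated.

```python
def allowed_region(edited: set[str], district_of: dict[str, str],
                   corridors: list[tuple[str, str]]) -> tuple[set[str], set[tuple[str, str]]]:
    """Rooms in the districts containing the edited nodes, plus corridors touching them."""
    districts = {district_of[n] for n in edited}  # edited nodes must already have a district
    rooms = {n for n, d in district_of.items() if d in districts}
    adjacent = {c for c in corridors if c[0] in rooms or c[1] in rooms}
    return rooms, adjacent


def stability_violations(edited: set[str], changed_rooms: set[str],
                         changed_corridors: set[tuple[str, str]], district_of: dict[str, str],
                         corridors: list[tuple[str, str]], k: int) -> set:
    """Return geometry changes that escape the allowed region (empty set = guarantee holds)."""
    if len(edited) > k:
        raise ValueError("edit touches more than k nodes; requires an explicit replan")
    rooms, adjacent = allowed_region(edited, district_of, corridors)
    return (set(changed_rooms) - rooms) | (set(changed_corridors) - adjacent)
```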

Interaction Loop: XR Spatial Plane + 2D Precision Plane
VR/AR is strong for spatial memory and shared context, weak for precision, fatigue, and long reading. The system is therefore hybrid by design.
Spatial plane (XR)
- Navigation, topology comprehension, contradiction inspection, team walkthroughs.
- Spatial pointing binds referents without name lookup.
Precision plane (2D)
- Exact search, typing, bulk edits, long-form citation reading, dense metadata.
- Pinned panel or companion desktop/tablet view, always available.
- Any 2D selection can be “promoted” into XR as a room/node with the same entity ID.
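A shared selection keyed by entity ID is one way to keep both planes consistent. The sketch below uses a simple observer pattern. `SharedSelection` and `promote` are hypothetical names, and the room record is a placeholder dictionary rather than real engine geometry.

```python
from typing import Callable

Listener = Callable[[str, frozenset], None]


class SharedSelection:
    """A single selection keyed by entity ID, observed by both the XR and 2D planes."""

    def __init__(self) -> None:
        self._selected: set[str] = set()
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def select(self, entity_id: str, source: str) -> None:
        self._selected.add(entity_id)
        snapshot = frozenset(self._selected)
        for notify in self._listeners:
            notify(source, snapshot)  # every plane receives the same IDs


def promote(entity_id: str, xr_rooms: dict[str, dict]) -> dict:
    """Promote a 2D selection into XR, reusing the entity ID (never minting a new one)."""
    return xr_rooms.setdefault(entity_id, {"entity_id": entity_id, "geometry": "outline"})
```

Because promotion reuses an existing room when one is present, promoting the same selection twice yields the same object.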
Closed loop
- Speak intent: goal + constraints (investigate, compare, locate contradictions).
- System proposes actions: candidate rooms, edges to inspect, filters to apply.
- User selects in space or in 2D: nodes, corridors, evidence spans.
- Orchestrator executes tools: retrieval, graph ops, world diffs.
- User corrects: rename/merge/split/mark-uncertain as reversible writes to the substrate.
A strong system spends most cycles manipulating structured state (scope, filters, hypotheses, citations), not generating narrative.
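Reversible corrections can be modeled as an edit log in which reverting appends a compensating write instead of deleting history, which fits the snapshot semantics of the substrate. This sketch covers only rename and mark-uncertain. The `Substrate` class and its fields are hypothetical, and merge/split would need richer inverse operations.

```python
from dataclasses import dataclass, field


@dataclass
class Substrate:
    labels: dict[str, str]
    states: dict[str, str]
    log: list[tuple[str, str, str, str]] = field(default_factory=list)  # (table, entity, old, new)

    def _write(self, table: str, entity: str, new: str) -> int:
        values = getattr(self, table)
        self.log.append((table, entity, values[entity], new))
        values[entity] = new
        return len(self.log) - 1  # index of this write, usable for revert()

    def rename(self, entity: str, label: str) -> int:
        return self._write("labels", entity, label)

    def mark_uncertain(self, entity: str) -> int:
        return self._write("states", entity, "unknown")

    def revert(self, index: int) -> int:
        # Compensating write: history keeps both the edit and its reversal.
        # (A real system would also check for intervening writes to the same field.)
        table, entity, old, _new = self.log[index]
        return self._write(table, entity, old)
```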
Trust Layer: Making Uncertainty and Conflict Unavoidable
Spatial objects feel authoritative. This interface must actively resist false solidity.
- Citations as objects: claims have evidence panels with directly inspectable source spans.
- Uncertainty fields: low confidence is rendered as volumetric fog and desaturation; the user can resolve it by demanding additional sources or by downgrading the claim state to unknown.
- Conflict geometry: contradictions are fractures, mirrored rooms, or forked corridors; the UI forces explicit comparison rather than blended summaries.
- Temporal patina: recency and staleness are rendered as material aging; snapshots are selectable layers.
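The trust-layer mappings reduce to small transfer functions from substrate values to material parameters. This sketch assumes a confidence threshold `THETA` below which fog appears and an exponential staleness curve with a half-life `HALF_LIFE_DAYS`. Both constants are hypothetical tuning values.

```python
THETA = 0.7             # hypothetical confidence below which fog appears
HALF_LIFE_DAYS = 180.0  # hypothetical staleness half-life


def fog_density(confidence: float, state: str = "supported") -> float:
    """0 = clear, 1 = opaque. Claims in the unknown state are fully fogged regardless of score."""
    if state == "unknown":
        return 1.0
    if confidence >= THETA:
        return 0.0
    return (THETA - confidence) / THETA  # linear ramp from clear at THETA to opaque at 0


def desaturation(confidence: float) -> float:
    """Fraction of color removed from the claim's geometry."""
    return 1.0 - confidence


def patina(age_days: float) -> float:
    """Material aging in [0, 1): reaches 0.5 after one half-life."""
    return 1.0 - 0.5 ** (age_days / HALF_LIFE_DAYS)
```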
Related Evidence (Non-Exhaustive)
Immersive analytics has already tested graph interaction beyond 2D. Work comparing VR and 2D knowledge-graph exploration (e.g., Neo4j-based “knowledge atlas” studies such as Grapho, 2025/2026) and studies contrasting desktop and VR for collaborative sensemaking (Frontiers in Virtual Reality, 2025) suggest XR can improve spatial understanding and shared context, but can lose on precision and fatigue. This project’s bounded-scope design, hybrid precision plane, and strict latency budgets are designed around those trade-offs, not against them.
Evaluation: Proving It’s Not 3D Theater
The system only earns its existence if it wins on constrained, measurable regimes—not on novelty.
Baselines
- Text chat + retrieval + citations.
- 2D graph tools with search, filters, and provenance overlays.
Hypothesized win conditions
- Better topology memory and dependency-path accuracy.
- Faster contradiction detection and evidence inspection.
- Higher shared understanding in collaborative walkthroughs.
Likely loss conditions
- High-precision selection, dense editing, prolonged reading, and long sessions without breaks.
Pilot task families
- Literature triage: identify strongest supporting and contradicting evidence for a claim.
- Project onboarding: build a correct dependency model of a codebase/system and surface risk clusters.
- Root-cause analysis: trace failure chains and propose interventions with provenance.
Metrics
- Time-to-answer; number of reversals (“revise” events).
- Structure recall and map accuracy (post-task topology tests).
- Error rate: uncited claims accepted, contradictions missed, wrong dependency paths.
- Interaction cost: steps, voice turns, edit operations, latency distributions.
- Gaze-derived measures: revisits to uncertainty regions, fixation time on evidence panels vs summaries, scanpath entropy during contradiction tasks (as a proxy for search strategy).
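One common way to operationalize scanpath entropy is the Shannon entropy of the distribution of transitions between areas of interest (AOIs). The sketch below computes that measure together with a revisit count for a single region. AOI labels come from the instrumentation, and the choice of transition entropy rather than stationary entropy is an assumption.

```python
import math
from collections import Counter


def transition_entropy(aoi_sequence: list[str]) -> float:
    """Shannon entropy (bits) of the AOI-to-AOI transition distribution.

    Higher values indicate more scattered search. Repeated fixations on the
    same AOI count as self-transitions.
    """
    transitions = Counter(zip(aoi_sequence, aoi_sequence[1:]))
    total = sum(transitions.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in transitions.values())


def revisits(aoi_sequence: list[str], region: str) -> int:
    """Number of returns to a region after leaving it (e.g. an uncertainty region)."""
    entries = sum(
        1 for i, a in enumerate(aoi_sequence)
        if a == region and (i == 0 or aoi_sequence[i - 1] != region)
    )
    return max(entries - 1, 0)
```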
Passing threshold: faster and more correct under constraints, with fewer uncited acceptances.
Technical Implementation Notes (Prototype-Grade)
A minimal build is four layers with explicit diff boundaries:
- World engine: Unity/Unreal/WebXR; deterministic layout; LOD; world updates as streaming diffs.
- Intent parser: streaming speech to structured intents (scope, filter, compare, traverse, annotate).
- Reasoning orchestrator: tool-using agent constrained by “no-solid-without-citation” and snapshot semantics; emits graph ops + proposed UI diffs.
- Semantic store: graph DB + vector index; provenance-pointer schema; versioned snapshots.
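The intent parser's output can be sketched as a structured record whose deictic phrases ("this node", "that corridor") are bound to spatial selections. The binding rule here is an assumption: the n deictic references in an utterance bind, in order, to the last n selections. `Verb`, `Intent`, and `bind` are illustrative names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verb(Enum):
    SCOPE = "scope"
    FILTER = "filter"
    COMPARE = "compare"
    TRAVERSE = "traverse"
    ANNOTATE = "annotate"


DEICTIC = {"this", "that", "these", "those", "here"}


@dataclass
class Intent:
    verb: Verb
    referents: list[str]  # resolved entity IDs or explicit names
    constraints: dict[str, str] = field(default_factory=dict)


def bind(verb: Verb, spoken_refs: list[str], selection: list[str],
         constraints: Optional[dict[str, str]] = None) -> Intent:
    """Resolve deictic phrases to the most recent spatial selections, preserving order."""
    is_deictic = [ref.split()[0].lower() in DEICTIC for ref in spoken_refs]
    n = sum(is_deictic)
    if n > len(selection):
        raise ValueError("more deictic references than spatial selections")
    recent = iter(selection[len(selection) - n:])  # the last n selections, oldest first
    referents = [next(recent) if d else ref for ref, d in zip(spoken_refs, is_deictic)]
    return Intent(verb, referents, dict(constraints or {}))
```

Explicit names pass through unchanged. A real system would resolve them against entity alias sets.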
Latency budgets
- Stream partial speech and partial diffs; show speculative outlines that only become solid when citations resolve.
- Prefetch nearby subgraphs based on proximity and intent.
- Graceful degradation is not a mode switch; it is continuity: the 2D precision plane remains authoritative when XR frame budgets fail.
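Proximity-based prefetch can be sketched as a breadth-first expansion from the current room, optionally restricted to the relation types named in the active intent. The adjacency format, the hop budget, and the name `prefetch_order` are assumptions, and hop distance stands in for whatever proximity measure the world engine actually uses.

```python
from collections import deque
from typing import Optional


def prefetch_order(adj: dict[str, list[tuple[str, str]]], current: str, max_hops: int,
                   intent_relations: Optional[set[str]] = None) -> list[str]:
    """Nodes within max_hops of the current room, nearest first.

    adj maps node -> [(relation, neighbor)]. If intent_relations is given, only
    corridors of those types are followed (e.g. {"depends-on"} while tracing dependencies).
    """
    hops = {current: 0}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for rel, nxt in adj.get(node, []):
            if intent_relations is not None and rel not in intent_relations:
                continue
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return sorted((n for n in hops if n != current), key=lambda n: (hops[n], n))
```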
Failure Modes and Trade-offs
- Spatial overload: 3D amplifies complexity; enforce bounded districts, scoped rooms, and aggressive progressive disclosure.
- Layout drift: unstable mapping destroys memory; prioritize stability constraints over optimal clustering.
- False solidity: geometry implies truth; enforce provenance gates and render uncertainty as unavoidable.
- Device friction: VR fatigue and AR FOV limits are real; design for short sessions and hybrid workflows.
- Ontology debt: relation taxonomies and identity resolution become product-critical; plan for versioning and migration early.
Outcome: A Decision Space (If It Earns It)
If successful, this becomes a decision instrument: teams can walk a problem, inspect evidence, preserve competing hypotheses, and converge faster than text alone. If it fails to outperform baselines, it still yields durable artifacts: mapping grammars, hybrid interaction patterns, and evaluation protocols for when spatial interfaces help and when they harm.
Generation Prompts
Image Prompt: Photorealistic VR scene of a minimalist semantic library-city interior: off-white matte ceramic corridors leading into a circular room, floating translucent nodes with subtle non-text glyph IDs (no readable paragraphs), emissive fiber-optic graph arcs overhead, wall-mounted evidence plaques with tiny highlighted source spans, volumetric fog pockets showing uncertainty, one fractured doorway indicating contradiction. Studio HDR lighting, sharp focus, wide-angle 24–28mm, no bokeh, no cyberpunk clutter, 8k.
Video Prompt: 12–15 second smooth first-person glide through a bright corridor into a circular room; overhead emissive graph arcs animate as the viewer approaches a floating node pedestal; an evidence panel unfolds from the wall showing a highlighted citation span; the camera pans to a forked hallway with a fractured threshold marking contradiction and faint fog on the weaker path. Crisp studio lighting, steady motion, no motion blur.
3D Model Prompt: Real-time modular kit for Unity/Unreal: circular room module, corridor segments, doorframe filter gates, node pedestals, spline-based emissive edge fibers, wall evidence plaque panels, and volumetric fog meshes for uncertainty. Materials: matte ceramic architecture, glass-like UI panels, emissive fibers. Clean topology, consistent scale, LODs, lightmap-ready UV2, performant draw calls, neutral pivot placement.
Constraints & Non-Goals
- The 3D world must be structurally coupled to a semantic substrate (entity IDs, typed relations, retrieval scopes, citations); no decorative environments.
- Navigation must remain cognitively bounded via scoped rooms, landmarks, and reversible actions; no infinite labyrinths or open-world sprawl.
- Trust is first-class: uncertainty, provenance, and conflicts are always inspectable in-world and never collapsed into a single persuasive narrative by default.
- Latency budgets are enforced for speech→reasoning→world updates; a hybrid precision plane (2D) must remain usable at all times, not only as a failure fallback.
Feasibility Gradient
A prototype is implementable with current components: streaming speech I/O, an LMM/LLM orchestrator, a graph store with vector retrieval, and a real-time engine (Unity/Unreal/WebXR). The hard risks are semantic anchoring (stable concept identity, disambiguation, and versioning), interaction design that measurably outperforms 2D chat and graph tools on specific tasks, and end-to-end latency on consumer devices. Feasibility is highest on bounded corpora (a single organization or domain) with strict scope control and explicit provenance pointers (e.g., PDF coordinates, git commit hashes). Open-world knowledge remains research-grade because of grounding drift, safety constraints, and unreliable citations.
Next Actions
- Specify the substrate contract v0: entity identity rules, relation taxonomy, provenance pointer formats (URL+byte offsets/PDF coords/git hashes), uncertainty calibration, and snapshot semantics.
- Implement mapping grammar v0 as code: deterministic partitioning, room/corridor instantiation, and explicit stability bounds under local edits (insert/delete/merge/split).
- Build the hybrid loop: XR spatial plane + pinned 2D precision plane with shared selection state; add reversible graph edits and instrumentation (gaze/path/intent/latency).
- Run a crossover evaluation against text chat (RAG+citation) and 2D graph tools on three tasks; include topology-memory tests and gaze-derived measures tied to uncertainty inspection.
Restricted Layer
Restricted materials would include the full mapping grammar spec and implementation (parameters, stability proofs-of-behavior, layout regression tests), agent policies for disambiguation and navigation planning, latency engineering playbooks (streaming diffs, caching, prefetch, prediction), the evaluation harness with instrumentation and scoring rubrics (including gaze metrics), and a commercialization decision matrix (device targets, privacy posture, and enterprise knowledge-base integrations).
Last updated: February 24, 2026