Concentric Neural Sphere

Design Intent: Locality Across Scale, Not Coordinates

Hierarchical spatial tasks are nested: environments contain regions, regions contain objects, objects contain parts. A grid biases locality in Cartesian coordinates; a transformer biases global mixing. The concentric sphere imposes a third bias: locality across scale bands. The outer shells carry high-detail, high-bandwidth signals; inner shells are forced to form low-bandwidth commitments.

The claim is narrow: for tasks whose causal structure is coarse-to-fine, radial compression with adjacency-dominant wiring can reduce active compute while improving generalization to larger scales.

overview-orbit

Substrate Specification: Shell Discretization Must Be Deterministic

A “sphere” is only meaningful if discretization is fixed and reproducible.

Shells: (S_0 \dots S_k), where (S_0) is peripheral input and (S_k) is the core readout.
Per-shell node placement: choose one deterministic equal-area method and lock it:
- Fibonacci sphere sampling (simple, stable across resolutions), or
- Icosahedral subdivision (structured, mesh-friendly).
Lateral neighborhoods: within shell (S_i), define neighbors via k-nearest geodesic distance (k fixed per shell, optionally scaling with shell resolution).
Topology fingerprints: every run logs the discretization method, random seed, degree distribution per shell, edge length distribution, and wormhole fraction. The topology itself becomes a comparable artifact.

This removes “it worked because of implementation details” as an escape hatch.

propagation-mechanism

Connectivity Contract: Allowed Edges, Budgeted Exceptions

The sphere is not a visual metaphor; it is a constraint that can be violated only at explicit cost.

Default edge rules

Radial adjacency (primary): edges primarily connect (S_i \leftrightarrow S_{i+1}). This is the architectural invariant.
Lateral edges (optional): within-shell edges only inside the geodesic k-NN neighborhood to prevent silent densification.
Wormholes (optional, budgeted): skip edges across non-adjacent shells are permitted only under a hard count budget and/or Pareto penalty.

If wormholes are repeatedly selected by evolution, that is not failure—it is evidence about which task classes demand long-range dependencies and where the constraint breaks.

cutaway-section

Computation: Radial Message Passing as Controlled Depth

Computation is expressed as propagation across shells. Depth is not “number of layers” but “number of radial transitions.”

Modes

Radial feedforward: one outward-to-inward sweep; cheapest and clearest baseline.
Bidirectional predictive flow (optional): inner shells send priors outward; outer shells return residuals inward. This tests iterative refinement without collapsing into unconstrained recurrence.
Sparse activation routing (optional): per-input gating activates a subset of edges; compute is measured as E_active, not merely parameter count.

Readout strategy

Core readout: decisions/actions emitted from (S_k).
Auxiliary heads: intermediate shells may predict coarse targets (global map hints, object counts, low-res occupancy) to improve credit assignment through the bottleneck. These heads are ablated; they are not allowed to become hidden shortcuts.

Optimization: NEAT-Derived Structure Search, SGD for Weights

Pure neuroevolution over both topology and weights is compute-prohibitive for the scope. The pragmatic stance is hybrid.

Topology search

Borrow NEAT’s efficiency levers (Stanley & Miikkulainen, 2002):
- Incremental complexification: start minimal (radial adjacency only) and grow.
- Speciation: prevent early extinction of structurally novel candidates.
Mutations operate on: shell sizes, lateral k, wormhole budget, directionality, gating choices, and (optionally) learned shell membership.

Weight training

Each candidate receives short-horizon SGD training under a locked schedule.
Weight inheritance is applied when mutations are small to reduce evaluation variance.
Multi-fidelity evaluation screens cheaply first (few steps/low-res) before full training.

Fitness and reporting

Two experimental regimes:
1. Hard-budget mode: reject candidates exceeding ((E_{active}\le E_{max},\ \text{hops}\le H_{max})).
2. Pareto mode: evolve a trade-off front over task performance vs compute (E_active, hop depth, latency).
A single deployment point is selected by a fixed policy (e.g., “max accuracy subject to latency < X ms”), not by post-hoc preference.

Benchmarks: Where This Bias Should Win (and Where It Should Lose)

The suite is chosen to separate structural advantage from tuning luck.

Hierarchical spatial reasoning

Multi-scale mazes: local decisions + global goal inference; hold out larger maze sizes for scale generalization.
Procedural scene-graph queries: nested entities (room→furniture→parts) with compositional questions; hold out deeper nesting than trained.
3D occupancy / SDF: coarse geometry guides fine detail; test generalization to higher resolution volumes.

Embodied control (optional)

Partial observability settings where peripheral observation is high-dimensional and action is low-dimensional; measure stability under sensor dropout.

Counter-benchmarks

Tasks intentionally lacking hierarchical spatial structure (e.g., IID classification with no multi-scale benefit). The sphere should not dominate; this guards against narrative-driven interpretation.

Primary metrics

Sample efficiency, held-out scale generalization.
Compute: E_active per sample, max-hop depth, and measured latency on a specified hardware target.
Robustness: noise/occlusion/dropout sensitivity.

Baselines + Search Controls: Prevent False Attribution

Baselines must match both capacity and budget.

Model-family baselines

CNN/U-Net style spatial models for grid-like tasks.
Transformers for global mixing.
Unconstrained GNNs for expressive graph computation.
MLPs as a minimal control.

Search-method baselines (non-negotiable)

Random search over the same shell generator parameterization (same budgets, same training).
A sparsity-first training baseline (dynamic/adaptive sparse connectivity) to test whether gains come from the radial constraint rather than sparsity alone.

The outcome must answer: “Is the sphere doing real work, or is this just sparse training plus luck?”

Ablations: The Scientific Core

Ablations are defined to make it hard to lie.

adjacency-only vs +lateral edges
wormhole budget sweep (0 → permissive)
feedforward vs bidirectional predictive flow
dense activation vs sparse routing (measure E_active)
fixed topology vs evolved topology vs random search
strict shell assignment vs learned/soft shell membership

Every run exports topology fingerprints and a compute report, so comparisons are structural, not just numerical.

Interpretability: Topology Fingerprints as the Artifact

The product is not a single checkpoint; it is a rulebook extracted from evolution.

recurrent microcircuits at shell boundaries (gating, pooling, arbitration)
bottleneck archetypes (single core vs multi-core committees)
degree distributions per shell and how they shift by task class
wormhole placement statistics (which shell pairs, which functional roles)
lesion tests: remove shells/edge classes; quantify collapse modes and compute savings

A practical deliverable is a “motif library” a builder can graft into other sparse or hierarchical systems.

Trade-offs and Failure Modes

Bottleneck brittleness: strict adjacency can block long-range dependencies; wormhole frequency quantifies the minimum long-range requirement.
Iterative refinement vs compute advantage: bidirectional flow may improve accuracy but erase latency gains; Pareto reporting prevents cherry-picking.
Generator dependence: discretization choice can silently alter inductive bias; locking and logging discretization is mandatory.
Search variance: multi-fidelity evaluation and weight inheritance reduce noise, but negative results are still valid if boundaries are sharp.

Decision-Grade Success Criteria

Any of the following is a “ship” outcome:

A Pareto regime where the sphere matches/exceeds baseline accuracy at lower E_active/latency and better scale generalization.
A negative result with clear boundaries: which task families break adjacency and the quantified wormhole requirement.
Transferable motifs that survive grafting into non-spherical sparse architectures.

Generation Prompts

Image Prompt High-contrast scientific visualization of a concentric neural sphere: 6 translucent nested shells on a matte off-white infinity background, thousands of small emissive nodes, adjacency-dominant fiber bundles linking neighboring shells, sparse faint wormhole arcs crossing multiple shells, radial vector arrows indicating outer-to-inner propagation, crisp volumetric lighting, no depth-of-field, research-diagram precision, ultra-detailed 8k render.

Video Prompt 12–15 second slow orbital camera around a translucent concentric neural sphere on a matte off-white infinity background; pulses of light propagate from the outer shell inward in discrete waves, adjacency fibers ignite sequentially and converge into a bright core decision cluster; rare wormhole flashes arc across shells as brief highlights. Clean lab lighting, crisp focus across the full object, no depth-of-field, restrained timing.

3D Model Prompt Watertight 3D model of a concentric neural sphere with 6 nested shells, each shell a thin geodesic lattice populated by small node beads; connection fibers are separate spline meshes primarily linking adjacent shells with a configurable sparse set of long wormhole fibers. Materials: translucent shell glass, emissive nodes, satin-finish fibers. Organized hierarchy, consistent scale, optimized topology for real-time rendering.