Most MirrorNeuron problems fall into a small number of categories: missing or unhealthy dependencies (Redis, OpenShell), cluster networking and configuration errors, job manifest or sandbox failures, and monitor connectivity. Work through the relevant section below to identify and fix the problem you are seeing.

Installation issues

Redis not available
Symptoms
  • Runtime tests fail immediately on startup.
  • mirror_neuron run ... hangs without output or exits with a connection error.
Diagnosis
Check whether the Redis container is running and accepting connections:
docker ps
docker exec mirror-neuron-redis redis-cli ping
A healthy Redis responds with PONG. If the container is missing or not responding, continue to the fix below.
Fix
Remove any stale container and start a fresh one:
docker rm -f mirror-neuron-redis 2>/dev/null || true
docker run -d --name mirror-neuron-redis -p 6379:6379 redis:7
Run docker exec mirror-neuron-redis redis-cli ping again to confirm Redis is available before retrying your command.
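Redis can take a few seconds to accept connections after docker run, so a bounded polling loop avoids retrying your command too early. A minimal sketch, assuming the container name used above:

```shell
# Poll Redis until it answers PONG, giving up after a bounded number
# of attempts. The container name matches the docker run command above.
wait_for_redis() {
  attempts=0
  max=${1:-10}
  while [ "$attempts" -lt "$max" ]; do
    if docker exec mirror-neuron-redis redis-cli ping 2>/dev/null | grep -q PONG; then
      echo "redis ready"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 1
  done
  echo "redis not responding after $max attempts" >&2
  return 1
}
```

Call wait_for_redis 30 after starting the container, then retry your original command once it reports ready.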
OpenShell gateway not reachable
Symptoms
  • Transport errors or “connection reset by peer” when jobs start.
  • Jobs fail before any worker code runs.
  • openshell status reports the gateway as unreachable.
Diagnosis
openshell status
openshell sandbox list
If the gateway shows as stopped or reports errors, proceed to the fix.
Fix
Destroy the stale gateway and start a clean one:
openshell gateway destroy --name openshell
openshell gateway start
openshell status
Confirm the gateway is running before submitting jobs again.
Leftover sandboxes from previous runs
Symptoms
  • Long provisioning delays even for small jobs.
  • Repeated benchmark runs get progressively slower.
  • openshell sandbox list shows many old sandbox entries.
Fix
Clean up leftover sandboxes from previous runs. The command below removes sandboxes whose names start with prime-worker-:
NO_COLOR=1 openshell sandbox list \
  | awk 'NR>1 && index($1, "prime-worker-")==1 {print $1}' \
  | xargs -I{} openshell sandbox delete {}
Adjust the prefix pattern if your workflow uses a different naming convention.
If provisioning latency is still high after cleanup, also check gateway health with openshell status. A stale gateway state can add overhead independent of sandbox count.

Cluster issues

Nodes cannot connect to each other
Symptoms
  • The runtime exits at startup with a :nodistribution error.
  • Nodes cannot reach each other even when IP addresses are correct.
Diagnosis
Check whether epmd (the Erlang port mapper daemon) is running and whether port 4369 is reachable:
epmd -names
nc -vz 127.0.0.1 4369
Fix
  • Start epmd if it is not running: epmd -daemon
  • Pin the Erlang distribution port range to avoid random port allocation:
    export ERL_AFLAGS="-kernel inet_dist_listen_min 4370 inet_dist_listen_max 4370"
    export MIRROR_NEURON_DIST_PORT="4370"
    
  • Verify that your firewall allows traffic on port 4369 and the distribution port you chose.
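After pinning the distribution port, a quick loop confirms that both epmd and the pinned port are reachable. A sketch, assuming the 4370 pin above (nc flags are the common BSD/GNU form):

```shell
# Check epmd (4369) and the pinned distribution port (4370) on a host;
# run the same check from the peer box with its address substituted.
check_dist_ports() {
  host=${1:-127.0.0.1}
  for port in 4369 4370; do
    if nc -z "$host" "$port" 2>/dev/null; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

Run check_dist_ports 192.168.4.35 from the other machine to verify firewall rules in both directions.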
HTTP API port conflict
Symptoms
  • Error log: Running MirrorNeuron.API.Router with Bandit at http failed, port 4000 already in use
  • Startup fails with ** (EXIT) shutdown: failed to start child: :listener
Cause
MirrorNeuron’s HTTP API binds to port 4000 by default. If you run two nodes on the same machine, or if your Erlang distribution --bind port is also set to 4000, the second process fails to start.
Fix
Override the HTTP API port for the second node:
export MIRROR_NEURON_API_PORT=4001
Make sure this value differs from your Erlang distribution port (e.g. 4370); the two must not overlap.
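To see which process already holds the port before moving the API, query the listener directly. The lsof invocation below is the common Linux/macOS form (on minimal Linux hosts, ss -ltnp is an alternative):

```shell
# Show whichever process is bound to TCP port 4000, if any.
lsof -nP -iTCP:4000 -sTCP:LISTEN 2>/dev/null || echo "nothing listening on port 4000"
```

If the listener turns out to be a stale MirrorNeuron process rather than a second node, stop it instead of changing the port.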
Node name already in use
Symptoms
  • Error: the name mn1@... seems to be in use
  • eaddrinuse on startup without an obvious port conflict.
Fix
A previous runtime process is still registered under that node name. Stop it before starting again:
./mirror_neuron node list
Identify the stale process, stop it, and retry. Avoid starting the same node twice on the same box.
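When node list does not show the stale process, searching the process table for the BEAM instance holding the name usually finds it. A sketch; mn1 is taken from the error message, so substitute your node name (pgrep -a is the GNU form):

```shell
# Find any leftover BEAM process registered under the stale node name.
find_stale_node() {
  name=${1:-mn1}
  pgrep -af beam 2>/dev/null | grep "$name" || echo "no beam process matching $name"
}
```

Once identified, stop the process with kill and its PID, then retry the start.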
Jobs run on only one node
Symptoms
  • Both nodes appear in the cluster view, but executor activity only shows on one box.
  • Jobs complete but remote box capacity is idle.
Possible causes
  • Jobs are too small to benefit from remote distribution.
  • The remote bundle failed to sync to the second node.
  • A stale CLI or control node is interfering with routing.
  • One node has significantly less executor pool capacity configured.
Diagnosis
Inspect cluster membership and live job distribution:
./mirror_neuron node list

./mirror_neuron monitor \
  --box1-ip 192.168.4.29 \
  --box2-ip 192.168.4.35 \
  --self-ip 192.168.4.29
Confirm both nodes are visible and check whether executor capacity is balanced.
Split-brain is not a concern here: Redis acts as the single arbiter for leader election and job ownership, so the partition that can reach Redis retains control.

Job execution issues

Bundle fails validation
Symptoms
  • mirror_neuron run exits early with a validation error.
  • Error message references a missing field, wrong type, or unknown agent type.
Diagnosis
Run the validator directly to get a detailed error message:
./mirror_neuron validate path/to/your/bundle
Common causes and fixes
  • Missing required top-level fields (name, agents, edges).
  • Agent type is not one of the built-in primitives: router, executor, aggregator, sensor.
  • An edge references an agent ID that does not exist in the agents list.
  • Payload files referenced in the manifest are missing from the payloads/ directory.
Fix each reported error, then re-run validate until it passes before submitting.
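For reference, a hypothetical minimal bundle that satisfies the checks above might look like this. The field names (name, agents, edges) and agent types come from the validator errors listed here; the JSON format and file paths are assumptions, so match them to your actual bundle layout:

```shell
# Hypothetical minimal manifest covering the required fields; the JSON
# shape and the manifest.json path are assumptions about bundle layout.
mkdir -p bundle/payloads
cat > bundle/manifest.json <<'EOF'
{
  "name": "smoke-test",
  "agents": [
    {"id": "router-1", "type": "router"},
    {"id": "exec-1", "type": "executor"}
  ],
  "edges": [
    {"from": "router-1", "to": "exec-1"}
  ]
}
EOF
```

Then run ./mirror_neuron validate bundle and fix whatever the validator still reports.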
Executor tasks fail with sandbox errors
Symptoms
  • Jobs start but individual executor tasks fail with sandbox errors.
  • Error logs include transport errors, connection reset, or sandbox startup failures.
Diagnosis
Check gateway health and current sandbox state:
openshell status
openshell sandbox list
Review recent job events for the failing job:
./mirror_neuron events <job_id>
Fix
The executor retries transient sandbox failures automatically with backoff. If failures persist:
  1. Reset the OpenShell gateway (see OpenShell gateway not reachable above).
  2. Clean up stale sandboxes.
  3. Re-submit the job.
If worker code succeeds on one box but fails on another, check that both machines run the same Python version: python3 --version. Syntax errors from version mismatches are a common cross-box failure mode.
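A small guard on each box catches interpreter drift before it surfaces as a mid-job syntax error. A sketch; 3.11 below is an example target, not a MirrorNeuron requirement:

```shell
# Compare the local interpreter against an expected major.minor version;
# run the same check on every box that executes worker code.
expected="3.11"
actual=$(python3 --version 2>&1 | awk '{print $2}')
case "$actual" in
  "$expected".*) echo "python $actual matches $expected.x" ;;
  *)             echo "python $actual does not match $expected.x" >&2 ;;
esac
```

Running this on both boxes before a large submission is cheaper than debugging a cross-box failure afterwards.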
Jobs stuck in pending
Symptoms
  • A job is submitted successfully but stays in pending state indefinitely.
  • No executor activity appears in the monitor.
Possible causes
  • All executor pool slots are occupied by a previous job.
  • The job coordinator failed to start and was not yet rescheduled by Horde.
  • Redis is unavailable, preventing the coordinator from reading job state.
Diagnosis
./mirror_neuron node list
./mirror_neuron agent list <job_id>
Also verify Redis is healthy:
docker exec mirror-neuron-redis redis-cli ping
If the coordinator node is healthy and Redis is up, wait briefly for Horde to reschedule the coordinator. If it does not recover, restart the runtime on the affected node.

Monitor issues

Old jobs cluttering the monitor
Symptoms
  • The monitor view is cluttered with jobs from previous runs.
  • It is hard to distinguish active jobs from historical ones.
Fix
Filter to running jobs only:
./mirror_neuron monitor --running-only
To permanently remove old job metadata from Redis, delete the relevant records manually. Use node list and event history to confirm a job is truly complete before deleting.
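A scan-and-delete sketch for that manual cleanup. The key pattern below is hypothetical; inspect your Redis keyspace first (for example with redis-cli --scan) and substitute the prefix your MirrorNeuron version actually uses:

```shell
# Delete job records matching an assumed key prefix; <job_id> is a
# placeholder, and the "job:" namespace is a guess -- verify it first.
redis="docker exec mirror-neuron-redis redis-cli"
$redis --scan --pattern 'job:<job_id>:*' | while read -r key; do
  $redis del "$key"
done
```

Using --scan rather than KEYS keeps the deletion incremental and avoids blocking Redis on a large keyspace.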
Build output corrupting the monitor
Symptoms
  • JSON output from the monitor includes Elixir compiler messages.
  • Terminal rendering is corrupted or hard to read.
Fix
Use the checked-in wrapper instead of running mix run directly:
./mirror_neuron monitor
The wrapper starts the application in a cleaner mode that suppresses build-time output before rendering the monitor UI.
Monitor shows no data
Symptoms
  • Monitor starts but shows no nodes or agents.
  • All metrics read as zero despite jobs being active.
Diagnosis
Confirm the monitor is pointing at the correct node addresses:
./mirror_neuron monitor \
  --box1-ip 192.168.4.29 \
  --box2-ip 192.168.4.35 \
  --self-ip 192.168.4.29
Also run a direct node check to confirm the cluster is reachable:
./mirror_neuron node list
If nodes do not appear, work through the cluster issues section above.

Useful diagnostic commands

When you are not sure where a problem originates, run these commands to get a quick picture of cluster and job health:
./mirror_neuron node list
./mirror_neuron events <job_id>
./mirror_neuron agent list <job_id>
./mirror_neuron monitor
openshell status
openshell sandbox list
epmd -names
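When triaging, the list above can be wrapped in a small sweep that runs everything and flags failures instead of stopping at the first one. A sketch; the relative paths assume you run it from the MirrorNeuron checkout:

```shell
# Run each diagnostic in turn, marking failures but always continuing,
# so one broken dependency does not hide the state of the others.
health_sweep() {
  for cmd in \
    "./mirror_neuron node list" \
    "openshell status" \
    "openshell sandbox list" \
    "epmd -names" \
    "docker exec mirror-neuron-redis redis-cli ping"
  do
    echo "== $cmd"
    $cmd 2>&1 || echo "-> FAILED: $cmd"
  done
}
```

Any line marked FAILED points you at the matching section above.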