Installation issues
Problem: Redis is not running
- Runtime tests fail immediately on startup.
- mirror_neuron run ... hangs without output or exits with a connection error.
A healthy Redis container answers docker exec mirror-neuron-redis redis-cli ping with PONG. If the container is missing or not responding, continue to the fix below.
Fix
Remove any stale container and start a fresh one:
Run docker exec mirror-neuron-redis redis-cli ping again to confirm Redis is available before retrying your command.
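The PONG check can also be scripted, for example as a pre-flight step before a test run. The following is a minimal sketch, not part of the MirrorNeuron tooling: the function name redis_ping is hypothetical, and it assumes Redis is published on 127.0.0.1:6379 without authentication (a server that requires a password replies with an error, which this sketch reports as unavailable).

```python
import socket


def redis_ping(host="127.0.0.1", port=6379, timeout=2.0):
    """Return True if a Redis server at host:port answers PING with PONG.

    Host and port defaults are assumptions; adjust them to your container.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts "inline commands": a bare command line ending in CRLF.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, reset, or timed out: treat Redis as not running.
        return False
    # A healthy, unauthenticated server replies with the simple string "+PONG".
    return reply.startswith(b"+PONG")
```

Calling redis_ping() returns False both when nothing is listening and when the server does not reply in time, which matches the two symptoms listed for this problem.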
Problem: OpenShell gateway is not reachable
- Transport errors or “connection reset by peer” when jobs start.
- Jobs fail before any worker code runs.
- openshell status reports the gateway as unreachable.
Problem: Stale sandboxes cause slow provisioning
- Long provisioning delays even for small jobs.
- Repeated benchmark runs get progressively slower.
- openshell sandbox list shows many old sandbox entries.
prime-worker-:
Cluster issues
Problem: :nodistribution error on startup
- The runtime exits at startup with a :nodistribution error.
- Nodes cannot reach each other even when IP addresses are correct.
Check whether epmd (the Erlang port mapper daemon) is running and whether port 4369 is reachable:
- Start epmd if it is not running: epmd -daemon
- Pin the Erlang distribution port range to avoid random port allocation:
- Verify that your firewall allows traffic on port 4369 and the distribution port you chose.
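The reachability check for port 4369 can be done from any box that has Python, without an Erlang installation. The sketch below sends epmd's NAMES request (a two-byte length followed by the byte 110) and parses the reply, which is roughly what epmd -names does. The function name epmd_names is hypothetical, and the parsing assumes the usual "name <node> at port <n>" line format.

```python
import socket
import struct


def epmd_names(host="127.0.0.1", port=4369, timeout=2.0):
    """Return {node_name: distribution_port} from epmd, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # NAMES_REQ: 2-byte big-endian length (1) followed by request tag 110.
            conn.sendall(struct.pack(">HB", 1, 110))
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:  # epmd closes the socket after sending the reply
                    break
                chunks.append(data)
    except OSError:
        # Refused, filtered by a firewall, or timed out.
        return None
    raw = b"".join(chunks)
    if len(raw) < 4:
        return None
    names = {}
    # The first 4 bytes are epmd's own port number; node lines follow as text.
    for line in raw[4:].decode("utf-8", "replace").splitlines():
        parts = line.split()
        if (len(parts) == 5 and parts[0] == "name"
                and parts[2:4] == ["at", "port"] and parts[4].isdigit()):
            names[parts[1]] = int(parts[4])
    return names
```

A result of None points at epmd or the firewall. An empty dictionary means epmd is reachable but no node is registered, and a populated one shows which distribution port each node actually received.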
Problem: Invalid challenge reply (cookie mismatch)
Problem: Port 4000 already in use (eaddrinuse)
- Error log: Running MirrorNeuron.API.Router with Bandit at http failed, port 4000 already in use
- Startup fails with ** (EXIT) shutdown: failed to start child: :listener
The HTTP API listens on port 4000 by default. If you run two nodes on the same machine, or if your Erlang distribution --bind port is also set to 4000, the second process fails to start.
Fix
Override the HTTP API port for the second node:
Also make sure the HTTP API port differs from the Erlang distribution port (for example, 4370). The two must not overlap.
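Both conditions (a port that is already taken, and an API port that collides with the distribution port) can be checked before starting a node. This is a minimal sketch with hypothetical helper names; it probes by attempting to bind, so run it on the machine where the node will start.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
    return True


def check_node_ports(api_port, dist_port, host="0.0.0.0"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if api_port == dist_port:
        problems.append("API port and distribution port are the same")
    for label, port in (("API", api_port), ("distribution", dist_port)):
        if not port_is_free(port, host):
            problems.append(f"{label} port {port} is already in use")
    return problems
```

For example, check_node_ports(4000, 4370) on a box where the first node is already running would be expected to report that API port 4000 is in use.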
Problem: Node name already in use
- Error: the name mn1@... seems to be in use
- eaddrinuse on startup without an obvious port conflict.
Problem: Cluster forms but work only runs on one box
- Both nodes appear in the cluster view, but executor activity only shows on one box.
- Jobs complete but remote box capacity is idle.
- Jobs are too small to benefit from remote distribution.
- The remote bundle failed to sync to the second node.
- A stale CLI or control node is interfering with routing.
- One node has significantly less executor pool capacity configured.
Job execution issues
Problem: Manifest validation errors
- mirror_neuron run exits early with a validation error.
- Error message references a missing field, wrong type, or unknown agent type.
- Missing required top-level fields (name, agents, edges).
- Agent type is not one of the built-in primitives: router, executor, aggregator, sensor.
- An edge references an agent ID that does not exist in the agents list.
- Payload files referenced in the manifest are missing from the payloads/ directory.
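The first three causes above can be approximated with a small local check on the parsed manifest. This is a sketch under stated assumptions, not the runtime's own validator: it assumes each agent is an object with id and type keys and each edge has from and to keys, which the text above does not confirm. It does not check payload files, because how the manifest references them is not described here.

```python
REQUIRED_FIELDS = ("name", "agents", "edges")
BUILT_IN_TYPES = {"router", "executor", "aggregator", "sensor"}


def validate_manifest(manifest):
    """Return a list of error strings for a manifest dict (empty if none found).

    The key names "id", "type", "from", and "to" are assumptions.
    """
    errors = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in manifest
    ]
    known_ids = set()
    for agent in manifest.get("agents") or []:
        agent_id = agent.get("id")
        known_ids.add(agent_id)
        if agent.get("type") not in BUILT_IN_TYPES:
            errors.append(
                f"agent {agent_id!r} has unknown type {agent.get('type')!r}"
            )
    for edge in manifest.get("edges") or []:
        for end in ("from", "to"):
            if edge.get(end) not in known_ids:
                errors.append(f"edge references unknown agent {edge.get(end)!r}")
    return errors
```

Loading the manifest with json.load and printing each returned error gives a quick first pass before submitting the job.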
validate until it passes before submitting.
Problem: Sandbox execution failure
- Jobs start but individual executor tasks fail with sandbox errors.
- Error logs include transport errors, connection reset, or sandbox startup failures.
- Reset the OpenShell gateway (see OpenShell gateway not reachable above).
- Clean up stale sandboxes.
- Re-submit the job.
Confirm that every box runs the same Python version with python3 --version. Syntax errors from version mismatches are a common cross-box failure mode.
Problem: Job is stuck in pending
- A job is submitted successfully but stays in pending state indefinitely.
- No executor activity appears in the monitor.
- All executor pool slots are occupied by a previous job.
- The job coordinator failed to start and was not yet rescheduled by Horde.
- Redis is unavailable, preventing the coordinator from reading job state.
Monitor issues
Problem: Monitor shows too many old or completed jobs
- The monitor view is cluttered with jobs from previous runs.
- It is hard to distinguish active jobs from historical ones.
Check the node list and event history to confirm a job is truly complete before deleting.
Problem: Monitor output contains build noise or garbage characters
- JSON output from the monitor includes Elixir compiler messages.
- Terminal rendering is corrupted or hard to read.
mix run directly:
Problem: Monitor cannot connect to the cluster
- Monitor starts but shows no nodes or agents.
- All metrics read as zero despite jobs being active.