The Docker Container Is Running, the Port Is Dead: A Complete Walkthrough of a Silent Host-Bind Failure

Published: 2026-06-13

TL;DR:

docker ps says everything is fine. docker inspect’s HostConfig.PortBindings clearly says 203.0.113.14:3001:3000. But on the host, no docker-proxy is listening, the iptables nat/DOCKER chain has no DNAT rule, and NetworkSettings.Networks and Ports are both empty {}. This “container alive, port dead” state is what happens when libnetwork silently rolls back the endpoint creation because the target interface wasn’t up at attach time — and Docker doesn’t bother to tell you that the port publish never actually happened. The fix is a single 30-second command: docker network connect <net> <ctr>.

The Failure Scene: docker ps looks fine, but the port is unreachable

Figure 1: A textbook “half-alive” state — the container process is running, HostConfig still has the port binding, but no process on the host is listening on 3001.

1. Background: A Seemingly Impossible Outage

I run a small AI gateway on a public-facing server, with a stack of services managed by Docker Compose: new-api, dockhand, headroom, and a few others. Their compose.yaml looks like this:

ports:
  - "203.0.113.14:3001:3000"   # new-api
  - "203.0.113.14:3003:3000"   # dockhand
  - "203.0.113.14:3005:3000"   # headroom

That is, every container’s host port is bound to one fixed address: 203.0.113.14, a secondary IP configured on an auxiliary interface. Every external path (firewall whitelist, DNS, peer routing) keys off this IP.

On an unremarkable afternoon, I tried to reach http://203.0.113.14:3001/ from another node on the same network. I expected the new-api login page. I got:

curl: (7) Failed to connect to 203.0.113.14 port 3001
      after 0 ms: Couldn't connect to server

Not just 3001. I tried 3003 and 3005 too:

nc -z -v 203.0.113.14 3001   → Connection refused
nc -z -v 203.0.113.14 3003   → Connection refused
nc -z -v 203.0.113.14 3005   → Connection refused

Three ports were dead at the same time. This didn’t look like one container crashing on its own — it looked like a systemic problem on the host side.

But the weird part was docker ps:

NAME       STATUS                      PORTS
new-api    Up 4 hours                  ─
dockhand   Up 4 hours (healthy)        ─
headroom   Up 4 hours (healthy)        ─

The PORTS column is empty — no port mappings shown at all. Up 4 hours means they weren’t just freshly started. The containers were alive; they just couldn’t be reached from outside.

2. Investigation: Digging the “Invisible” State Out of the Host

Containers aren’t dead, ports aren’t mapped, and no host process is listening — this kind of “silent failure” is the worst category to debug. I dug the host state out in this order.

2.1 First, Curl Inside the Container to Rule Out a Dead Process

docker exec new-api curl -sS -m 5 -o /dev/null \
  -w 'HTTP %{http_code}\n' http://127.0.0.1:3000/
# HTTP 200

Port 3000 inside the container is fine. The process is fine; the suspicion falls entirely on the host side.

2.2 Is Anything Listening on the Host?

ss -tlnp | grep -E '3001|3003|3005'
# (empty)

Nothing is listening. This is a critical signal: with the default userland-proxy setting, every published port gets a docker-proxy child process that holds the host-side listening socket, so it should show up in ss. It’s not there. The proxy never started.

2.3 Are There Any NAT Rules in iptables?

iptables -t nat -L DOCKER -n -v
# Chain DOCKER (2 references)
#  pkts bytes target prot opt source destination
#  (no rules)

The DOCKER chain in the nat table is where Docker inserts DNAT rules for port mapping. It’s also empty.

Two key locations empty at the same time tell us: the port-mapping chain was skipped somewhere along the way.
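The two host-side checks from 2.2 and 2.3 fold naturally into a pair of helpers. This is a minimal sketch: the function names has_listener and has_dnat are my own, and it assumes the usual column layout of ss -tln and the rule syntax printed by iptables -t nat -S DOCKER. Both read from stdin, so they work on saved output as well as on a live host.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Docker).

# has_listener PORT: succeeds if `ss -tln` output on stdin contains a
# LISTEN socket whose local address ends in :PORT.
has_listener() (
  awk -v p="$1" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }'
)

# has_dnat PORT: succeeds if `iptables -t nat -S DOCKER` output on stdin
# contains a DNAT rule for destination port PORT.
has_dnat() (
  grep -E -- "--dport $1 .*-j DNAT" >/dev/null
)
```

On a live host that would be ss -tln | has_listener 3001 and iptables -t nat -S DOCKER | has_dnat 3001. If both fail while the container answers on its internal port, you are looking at the state described in this post.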

2.4 Look Back at the “Two Fields” in `docker inspect`

docker inspect new-api --format '{{json .NetworkSettings}}'
# {
#   "SandboxID": "",
#   "SandboxKey": "",
#   "Ports": {},
#   "Networks": {},
#   "NetworkID": "",
#   "EndpointID": "",
#   "MacAddress": "",
#   "IPAddress": ""
# }

— Networks: {} and Ports: {} both empty.

But on the other side:

docker inspect new-api --format '{{json .HostConfig}}' | head -c 400
# {"Binds":["/docker/newapi_data/data:/data:rw"],
#  "NetworkMode":"newapi_data_default",
#  "PortBindings":{"3000/tcp":[
#     {"HostIp":"203.0.113.14","HostPort":"3001"}]},
#  "RestartPolicy":{"Name":"always",...}}

PortBindings is still there — “I want to map the container’s 3000 to 203.0.113.14:3001.” That intent is written clearly.

NetworkMode is still there — “I want to attach to the newapi_data_default bridge.”

HostConfig remembers everything, but NetworkSettings has already “forgotten.” This is the classic “intent vs. reality” mismatch — Docker wrote what it should do in HostConfig, but what it actually did is blank in NetworkSettings.
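The intent-versus-reality comparison can be reduced to two numbers per container. The sketch below is deliberately crude and the names classify_publish and check_container are hypothetical: it only compares how many entries each map has, which is enough to catch the fully empty NetworkSettings seen here but would not notice a single binding missing among several.

```shell
#!/bin/sh
# classify_publish INTENT ACTUAL
#   INTENT = number of entries in HostConfig.PortBindings
#   ACTUAL = number of entries in NetworkSettings.Ports
classify_publish() {
  if [ "$1" -eq 0 ]; then
    echo "nothing-requested"
  elif [ "$2" -eq 0 ]; then
    echo "half-alive"      # intent recorded, nothing applied
  else
    echo "published"
  fi
}

# check_container NAME: pull both counts from docker inspect and classify.
check_container() (
  counts=$(docker inspect "$1" --format \
    '{{len .HostConfig.PortBindings}} {{len .NetworkSettings.Ports}}') || exit 1
  set -- $counts           # split "INTENT ACTUAL" into $1 and $2
  classify_publish "$1" "$2"
)
```

In this incident, check_container new-api would be fed the counts 1 and 0.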

2.5 Did the Upstream Bridge Even Know About the Container?

docker network inspect newapi_data_default --format '{{json .Containers}}'
# {}

On the newapi_data_default bridge, Containers: {} — the bridge thinks no container is attached to it. But the container itself is running, can curl itself, and hasn’t restarted.

This means the attach action between the container and the bridge it declared never actually took effect — and Docker never explicitly told the user.

The Half-Alive Map: The Gap Between HostConfig and NetworkSettings

Figure 2: HostConfig on the left still has the full port mapping and network name. NetworkSettings on the right is empty {}. Intent is preserved, reality is lost.

2.6 Sweep the Other Containers — This Isn’t Isolated

for c in $(docker ps --format '{{.Names}}'); do
  # quote "$c" so an unusual name can't be word-split or globbed
  nets=$(docker inspect "$c" --format '{{len .NetworkSettings.Networks}}')
  ports=$(docker inspect "$c" --format '{{len .NetworkSettings.Ports}}')
  printf '%-25s | networks=%s | ports=%s\n' "$c" "$nets" "$ports"
done

# dockhand   | networks=0 | ports=0
# new-api    | networks=0 | ports=0
# easytier   | networks=1 (host) | ports=0   ← this one is fine, it uses host network
# headroom   | networks=0 | ports=0

Three non-host-network containers all show networks=0, ports=0 simultaneously. What they have in common: each one’s HostConfig.NetworkMode points to its own user-defined bridge, and each publishes on 203.0.113.14.

At this point the root-cause direction is locked in: the attach between each container and the bridge network it declared never succeeded.

Five-Minute Diagnostic Checklist: Six Steps from Symptom to Root Cause

Figure 4: The six steps above compressed into a copy-paste checklist. Next time you hit a “container alive, port dead” mystery, 30 seconds is enough to localize.

3. Root Cause: A Silently Rolled-Back Endpoint Creation

Two questions left to answer:

Why is the state inconsistent? (HostConfig has it, NetworkSettings doesn’t)
Why did it hit three containers at the same time?

3.1 The Complete Publish Chain — Which Link Is Missing

For Docker to “expose” a container’s port, this chain has to run end-to-end:

The Full Publish Chain and the Broken Links

Figure 3: The full path from a VPN peer to new-api 0.0.0.0:3000. The two ✕ marks are the two links that broke in this incident.

Roughly:

Step 1. Inside the container: the new-api process listens on 0.0.0.0:3000 (fine);
Step 2. libnetwork: create the network sandbox, attach the container to the newapi_data_default bridge, assign it an internal IP like 172.20.0.2;
Step 3. Port mapping: for each HostIp:HostPort, call StartProxy to launch the docker-proxy child process;
Step 4. iptables: insert a DNAT 203.0.113.14:3001 → 172.20.0.2:3000 rule in the DOCKER chain of the nat table;
Step 5. Report-back: populate NetworkSettings.Ports and NetworkSettings.Networks.

Step 5 is the “commit” step. Empty Ports: {} and Networks: {} mean it never ran: somewhere in steps 2 to 4 the attach failed and libnetwork rolled back what it had done so far. (Step 1 is ruled out, because the in-container curl returned 200.) The rollback clears NetworkSettings.Networks and NetworkSettings.Ports but leaves HostConfig.PortBindings untouched, which is exactly the state observed.
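Walking the chain in order and stopping at the first failure is mechanical enough to write down. The sketch below assumes you have already collected an ok or fail verdict for each of the five steps (for example from the checks in section 2). The function name and the step labels are my own shorthand, not Docker terminology.

```shell
#!/bin/sh
# first_broken_link S1 S2 S3 S4 S5
# Each argument is "ok" or "fail" for the matching step of the publish
# chain; prints the label of the first step that is not "ok", or "none".
first_broken_link() {
  for name in container-listener network-attach docker-proxy dnat-rule networksettings; do
    if [ "${1:-fail}" != "ok" ]; then
      echo "$name"
      return 0
    fi
    shift
  done
  echo "none"
}
```

For this incident the inputs are ok fail fail fail fail, which points at the network attach rather than at the proxy or the iptables rule. Those two are casualties of the same rollback, not separate faults.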

3.2 This Is a Long-Recognized Class of Bug

This isn’t speculation. Moby’s issue tracker has a stack of tickets with almost-identical titles:

moby/moby#9818 “Container port not expose; neither iptables rules added nor userland proxy started” — a 2014 issue whose title is exactly this incident’s symptoms.
moby/moby#44137 “docker network connect removes/resets dynamically published/exposed ports” — explicitly states that under certain conditions, docker network connect can “reset” port mappings.
moby/moby#52480 writes it directly: “the conflicting port goes silently unbound… HostConfig.PortBindings is preserved but never applied.” — intent preserved, reality abandoned. A textbook case.

Why didn’t docker-proxy start? Why weren’t iptables rules inserted? Why is NetworkSettings empty? It’s all the local rollback libnetwork does when sandbox creation fails.

3.3 The Common Cause: “What” Came Up Before Docker?

The key question: why did three non-host-network containers break at the same time? Because at the moment Docker tried to attach them, the interface they were supposed to bind to didn’t exist yet.

On my box, 203.0.113.14 is a secondary address on a side NIC: not the interface’s primary IP, but an address added on top of it with ip addr add. The interface itself does not exist at boot:

It’s created by a userland TUN service that runs after the system is up;
Once the TUN service is up, the TUN interface comes up and 203.0.113.14 gets attached;
Between these two events there’s an unavoidable race window — Docker’s daemon starts earlier than the TUN.

When Docker starts the container and tries to attach the bridge network, the target HostIp (i.e. 203.0.113.14) doesn’t exist on the host at that moment. In daemon/libnetwork/portmapper/proxy_linux.go, Moby’s StartProxy() does not verify that HostIP is bound to a host interface — it just passes the address straight to the docker-proxy child process as the -host-ip argument.

// proxy_linux.go (excerpt)
cmd := reexec.Command("docker-proxy",
    "-host-ip", p.Binding.HostIP,
    "-host-port", p.Binding.HostPort,
    "-container-ip", p.Binding.IP,
    "-container-port", strings.ToLower(p.Port),
)

Whether the proxy fails to start or iptables’ setChildHostIP decides the binding isn’t usable, libnetwork just rolls back the endpoint creation when it gets the error. The rollback clears the NetworkSettings fields for this attempt, but it doesn’t touch HostConfig, and it doesn’t report to the user that the port mapping never took effect.

The host process doesn’t exit, the container process doesn’t exit, the daemon keeps running — but the port never actually gets exposed from that moment on. Nobody raises an error.

The same class of problem appears in WireGuard, Tailscale, and eBPF-based VPN scenarios where the userland TUN tunnel comes up late. In moby/moby#39559 there’s literally a comment: “I’m moving the whole server into Docker with compose. Container A sets up WireGuard (using --net=host), and container B runs a DNS server on the IP that container A configured — but the daemon starts before WireGuard, so the port never binds.”

3.4 The One-Sentence Root Cause

When Docker attaches a container to a user-defined bridge network, if the configured HostIp is not yet bound to any host interface at that exact moment, libnetwork silently rolls back the sandbox creation — docker-proxy doesn’t start, no iptables DNAT rule gets inserted, NetworkSettings.Networks/Ports get cleared, but HostConfig.PortBindings still says “I want this.” The container process is running, but the port is invisible to the outside world.
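Since the daemon does not check this for you, a pre-flight check is easy to bolt on. The sketch below uses hypothetical helper names (addr_assigned, host_ips_of) and assumes the one-line-per-address format of ip -o -4 addr show. It matches the address exactly, so 203.0.113.14 is not confused with 203.0.113.140.

```shell
#!/bin/sh
# Hypothetical helpers; a pre-flight sketch, not something Docker does itself.

# addr_assigned IP: reads `ip -o -4 addr show` output on stdin and succeeds
# only if IP is assigned to some interface (exact match).
addr_assigned() (
  awk -v want="$1" '
    {
      for (i = 1; i < NF; i++)
        if ($i == "inet") {
          split($(i + 1), parts, "/")   # drop the /prefix length if present
          if (parts[1] == want) found = 1
        }
    }
    END { exit !found }'
)

# host_ips_of CONTAINER: print every HostIp requested in HostConfig.PortBindings
# (bindings without a HostIp print an empty line).
host_ips_of() {
  docker inspect "$1" --format \
    '{{range $port, $bindings := .HostConfig.PortBindings}}{{range $bindings}}{{.HostIp}}{{"\n"}}{{end}}{{end}}'
}
```

Feeding each address from host_ips_of new-api through ip -o -4 addr show | addr_assigned tells you, before starting or reconnecting anything, whether the bind has any chance of succeeding.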

4. The Fix: One Command Brings the “Half-Alive” Back

Once you know the root cause, the fix is two steps.

4.1 The Stopgap: `docker network connect`

The most direct command re-runs libnetwork’s attach flow, this time with the TUN interface already up:

docker network connect newapi_data_default new-api
docker network connect dockhand_data_default dockhand
docker network connect headroom_default headroom

For each one you run, NetworkSettings refills from {} to the full structure, the docker-proxy child process gets spawned, and the iptables DOCKER chain gains the matching DNAT rule.

Side note: docker network connect is an official command. The docs say: “You can connect a container to one or more networks. The networks need not be the same type.” (docs.docker.com) — that’s literally the action we need to redo.

After the fix:

ss -tlnp | grep 3001
# LISTEN 0  4096  203.0.113.14:3001  docker-proxy

curl -sS -m 5 -o /dev/null \
  -w 'HTTP %{http_code}\n' http://203.0.113.14:3001/
# HTTP 200

It’s back.

4.2 The Durable Fix: Make Docker Wait for the TUN

The one-liner above is firefighting. It will recur on the next reboot or the next TUN reconnect. To keep the host out of this state, fix the boot order.

Option A: systemd dependency (the cleanest)

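Add an After= clause to docker.service (preferably in a drop-in created with systemctl edit docker.service, so a package upgrade doesn’t overwrite it) to make systemd order Docker after the TUN service: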

[Unit]
After=network-online.target tun-up.service
Wants=network-online.target

After= only orders unit startup, and network-online.target alone won’t catch “userland TUN still handshaking”, so pair it with a small wait-online unit that polls ip -4 addr show.
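For the polling part, here is a minimal sketch of what such a wait-online script could look like. The function names and the timeout argument are my own. The timeout matters because a unit that polls forever blocks everything ordered after it if the TUN never comes up. The address probe is split into its own function so it can be swapped out.

```shell
#!/bin/sh
# list_addrs: the probe, kept separate so it can be replaced or stubbed.
list_addrs() { ip -o -4 addr show; }

# wait_for_addr IP TIMEOUT_SECONDS: poll about once per second until IP is
# assigned to an interface; give up with a non-zero status after the timeout.
wait_for_addr() (
  want=$1
  remaining=$2
  # -F matches literally, so dots in the address are not regex wildcards;
  # the trailing "/" or " " prevents prefix matches such as .14 vs .140.
  while ! list_addrs | grep -F -e "inet ${want}/" -e "inet ${want} " >/dev/null; do
    [ "$remaining" -gt 0 ] || exit 1
    remaining=$((remaining - 1))
    sleep 1
  done
)
```

A oneshot unit could call wait_for_addr 203.0.113.14 120 from its ExecStart script and be ordered before docker.service. The unit wiring itself is left out here because it depends on how your TUN service is packaged.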

Option B: Drop HostIp from compose, bind 0.0.0.0 (the most aggressive)

Change:

ports:
  - "203.0.113.14:3001:3000"

to:

ports:
  - "3001:3000"

Docker doesn’t pick an interface — it tries to bind on all available addresses. The cost is that you lose the “only this IP can reach me” isolation — you have to enforce that in the host firewall instead.
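If you go this route, one way to claw back the isolation is a rule in the DOCKER-USER chain. The sketch below only prints the command instead of running it, and it rests on several assumptions: the iptables firewall backend, IPv4 only, and traffic arriving through the forwarded DNAT path. Connections made from the host itself do not traverse that chain. Because packets in DOCKER-USER have already been through DNAT, the rule matches the original destination via conntrack. The function name restrict_rule is hypothetical.

```shell
#!/bin/sh
# restrict_rule HOST_IP HOST_PORT: print (do not execute) an iptables command
# that drops forwarded traffic to published port HOST_PORT unless the client
# originally connected to HOST_IP.
restrict_rule() {
  printf 'iptables -I DOCKER-USER -p tcp -m conntrack --ctorigdstport %s ! --ctorigdst %s -j DROP\n' \
    "$2" "$1"
}
```

Review the printed line before applying it. Test from an outside node against both the allowed IP and another address of the host.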

Option C: A boot-time probe + reconnect (the most robust belt-and-suspenders)

A small script that waits for the interface, then reconnects every container that lost its network:

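#!/usr/bin/env bash
set -euo pipefail
# wait until 203.0.113.14 shows up; the dots are escaped and the address must
# be followed by "/" or a space, so 203.0.113.140 does not match. Redirecting
# instead of grep -q avoids a SIGPIPE failure under pipefail.
until ip -o -4 addr show | grep -E 'inet 203\.0\.113\.14[/ ]' >/dev/null; do
  sleep 1
done
# reconnect every container that's still in the half-alive state
for c in new-api dockhand headroom; do
  # skip containers whose network attach already succeeded
  [ "$(docker inspect "$c" --format '{{len .NetworkSettings.Networks}}')" -eq 0 ] || continue
  net=$(docker inspect "$c" --format '{{.HostConfig.NetworkMode}}')
  docker network connect "$net" "$c" || true
done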

Hook it into systemd with After=docker.service.

Stopgap vs Durable Fix: Reconnect vs Boot-Order Rework

Figure 4: The left column (“stopgap”) is the 30-second docker network connect. The right column (“durable”) is making sure the interface is always ready before Docker tries to attach. They aren’t mutually exclusive — combine them for the best result.

4.3 Before vs. After (the Same `docker inspect` Command)

Before vs After: docker inspect and ss side-by-side

Figure 5: Before the fix, NetworkSettings is full of empty fields and ss shows no docker-proxy. After a single docker network connect, both fields fill up, the proxy appears, and curl returns 200.

5. Q&A

Q1. How do I tell apart “the container process is dead” from “the port isn’t published” in one glance?

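A: Hitting <container-name>:<port> from the host doesn’t work: container names resolve only through Docker’s embedded DNS inside a user-defined network, and the host isn’t part of that. The cleanest one-liner is to curl from inside the container (assuming the image ships curl):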

docker exec <ctr> curl -sf 127.0.0.1:<port> && echo OK || echo DEAD

OK → the process is fine, suspect the host side (continue with this post);
DEAD → the process is gone, look at docker logs <ctr> instead — this isn’t a port-mapping problem.

Q2. Will `docker restart` Fix It?

A: Sometimes, but not reliably. In my testing, if the host IP is already up (the system has been running for a while), docker restart re-runs the sandbox creation and the port usually comes back. But if the host IP isn’t up yet, restart won’t make Docker “wait” — it goes through the same failure path and lands in the same half-alive state. So restart is a coin-flip; docker network connect is targeted treatment.

Q3. Why Doesn’t `docker port <ctr>` Show Anything?

A: docker port reads the NetworkSettings.Ports field. That field is empty in this failure mode — docker port showing empty is part of the symptom, not a diagnostic. To see the intent, use docker inspect <ctr> --format '{{json .HostConfig.PortBindings}}'.

Q4. Do I Have to Reconnect Each Container One by One?

A: Yes. Each container needs its own docker network connect; there’s no batch subcommand. In practice three commands take 30 seconds. To save a few keystrokes, a for loop is enough:

for c in new-api dockhand headroom; do
  net=$(docker inspect "$c" --format '{{.HostConfig.NetworkMode}}')
  docker network connect "$net" "$c" || true
done
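If you would rather not hard-code the container names, the loop can discover its own targets. This is a sketch with hypothetical function names. It treats "zero attached networks while NetworkMode names a connectable network" as the half-alive signal. It skips host, none, default and container:... modes, where docker network connect has nothing sensible to attach to.

```shell
#!/bin/sh
# half_alive_from_lines: stdin has one "NAME NETWORKS MODE" line per container;
# prints "NAME MODE" for each container with zero attached networks whose
# NetworkMode looks like a named network.
half_alive_from_lines() {
  awk '$2 == 0 && $3 != "host" && $3 != "none" && $3 != "default" && $3 !~ /^container:/ { print $1, $3 }'
}

# reconnect_half_alive: build those lines from docker inspect and reconnect.
reconnect_half_alive() {
  docker ps --format '{{.Names}}' |
    while read -r c; do
      printf '%s ' "$c"
      docker inspect "$c" --format '{{len .NetworkSettings.Networks}} {{.HostConfig.NetworkMode}}'
    done |
    half_alive_from_lines |
    while read -r name net; do
      docker network connect "$net" "$name" || true
    done
}
```

Calling reconnect_half_alive after the TUN address is present has the same effect as the three explicit commands, and it leaves healthy containers alone.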

Q5. Is `network-online.target` Really Enough for the Durable Fix?

A: Not on its own. network-online.target is reached once the wait-online helper of systemd-networkd or NetworkManager considers the configured network up. It doesn’t observe userland TUN/WireGuard interfaces on its own. In production, write a small oneshot systemd unit that runs ip -4 addr show | grep <vpn-ip> in a loop, and only start docker.service after that succeeds. That’s far more reliable than trusting the target.

Q6. Are There Other Failure Modes That Land in This Same “Half-Alive” State?

A: Yes — but they all share the same entry point: “the host IP was unreachable at the moment of attach”. A few more:

Floating IP / elastic IP drifted, and Docker didn’t notice;
Bonded NIC switched master, and the old master’s IP got released;
DHCP renewal failed, IP was temporarily reclaimed and re-issued;
network_mode: host combined with -p — older Docker versions handled this inconsistently.

The diagnostic logic is identical: check HostConfig for “I want to do this”, check NetworkSettings for “I actually did this”. Mismatch = a rollback trace from libnetwork.

Q7. Did Docker 28+ Fix This Bug?

A: As of writing, Moby’s issue tracker has #51758 “PortBindings shows binding but NetworkSettings.Ports is empty” still being filed, with the matching fix in PR #52480 waiting to land. This is a long-standing corner case — understanding it beats waiting for it to be fixed.

6. Summary

The investigation path of this incident boils down to three things:

Look at the two fields of docker inspect separately — HostConfig is “what I intend to do”, NetworkSettings is “what I actually did”. A mismatch = half-alive.
Use ss and iptables -t nat -L DOCKER to verify in reverse: no docker-proxy process + no DNAT rule, together with an empty NetworkSettings, is the signature of this scenario.
docker network connect is the minimum stopgap action. The durable fix is changing the boot order so Docker runs strictly after the userland TUN.

Burn ss -tlnp and iptables -t nat -L DOCKER into muscle memory. Next time you hit a “container alive, port dead” mystery, 30 seconds to localize, three commands to recover.

References

Moby source · daemon/libnetwork/portmapper/proxy_linux.go — StartProxy passes HostIP straight to docker-proxy as -host-ip
Moby source · daemon/libnetwork/portmappers/nat/mapper_linux.go — setChildHostIP and the binding decision
Moby source · daemon/libnetwork/portmapperapi/api.go — PortBinding / StopProxy lifecycle
moby/moby #9818 — Container port not expose; neither iptables rules added nor userland proxy started — same-symptom ticket from 2014
moby/moby #39559 — Container does not start (--restart always) at boot if port bind fails — closest known ticket to the TUN/WireGuard case
moby/moby #44137 — docker network connect removes/resets dynamically published/exposed ports — port mapping reset under certain conditions
moby/moby #51758 — PortBindings shows binding but NetworkSettings.Ports is empty — same class of issue still being filed in 2025
moby/moby #52480 — connectToNetwork: keep Networks on failure — the matching fix PR
Docker Docs — Networking overview (engine/network) — official description of -p and firewall rules
Docker Docs — Docker with iptables (engine/network/firewall-iptables) — authoritative reference for the iptables DOCKER chain
Docker Docs — Port publishing and mapping — details on the HostIP form of -p
Docker Docs — docker network connect reference — the official “attach a container to a network” command
Docker Blog — Docker Engine v28: Hardening Container Networking by Default — context for behavior changes in v28
systemd docs — systemd.unit(5), After=/Wants= relationships