Thou shall only ping once

I’ve been working on a cross-platform VMM library. Part of that involves a userspace NAT implementation that lets guest VMs access the network without requiring root privileges or TUN/TAP devices. Running the network stack in userspace also lets us enforce network policies on the guest, restricting which hosts and ports it can access.

We have integration tests that boot minimal Linux VMs and run commands via the console. One of those tests pings the gateway to verify networking works. It was using ping -c 1 with a FIXME: “Using -c 1 because -c 3 or higher causes the test to hang.” I decided to figure out why.

Symptoms

ping -c 1 10.0.2.2 works fine. The reply comes back, ping exits, test passes.

ping -c 2 10.0.2.2 sends the first ping, receives the reply, then… nothing. The VM just sits there. No second ping is ever sent. No timeout. The process appears frozen.

This affected both pings to the gateway (handled by smoltcp) and pings to external hosts (routed through ICMP NAT). Same behavior for both.

Down the wrong path

My first instinct was that something was wrong with our networking code. I added debug logging everywhere. ICMP echo replies were being received and delivered to the guest. The network stack was doing its job.

But after that first reply, the guest simply stopped sending packets.

Diving into busybox

I was using Claude Code to help debug this. At some point I asked it to investigate how busybox’s ping works. It spawned a subagent to research the source code and came back with a report.

The key finding: busybox ping uses alarm() to schedule subsequent pings. After receiving a reply, it calls alarm(1) to fire a SIGALRM in one second, then blocks on recvfrom(). When the signal fires, recvfrom() returns with EINTR and ping sends the next packet.

If alarm() isn’t working, ping blocks forever.
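The mechanism can be illustrated with a small probe. This is a sketch in Python, not busybox's actual C code, and the names `alarm_interrupts_recv` and `AlarmFired` are made up for illustration. Python transparently retries interrupted system calls, so the handler raises an exception to stand in for the EINTR return that C code would see. A socket timeout is added as a safety net so the probe reports failure instead of hanging the way ping did.

```python
import signal
import socket


class AlarmFired(Exception):
    """Raised from the SIGALRM handler to break out of a blocking call."""


def _on_alarm(signum, frame):
    # Stand-in for EINTR: Python would otherwise retry the interrupted call.
    raise AlarmFired()


def alarm_interrupts_recv(alarm_seconds=1, safety_timeout=3.0):
    """Return True if SIGALRM interrupts a blocking recvfrom() in time."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # A connected socket pair where the peer never sends anything,
    # so recvfrom() has nothing to return on its own.
    reader, writer = socket.socketpair()
    reader.settimeout(safety_timeout)  # safety net so the probe cannot hang
    try:
        signal.alarm(alarm_seconds)    # ask the kernel for SIGALRM later
        reader.recvfrom(1024)          # waits until signal or timeout
        return False                   # data arrived unexpectedly
    except AlarmFired:
        return True                    # the alarm fired and interrupted us
    except socket.timeout:
        return False                   # the alarm never fired
    finally:
        signal.alarm(0)                # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
        reader.close()
        writer.close()
```

On a working kernel the call returns True after roughly one second. On a kernel where the alarm never fires, it would presumably return False once the safety timeout expires.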

Testing our hypothesis

To confirm, I ran this inside the VM:

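# Start a two-packet ping in the background
ping -c 2 10.0.2.2 &
# Give the first echo reply time to arrive
sleep 2
# Manually deliver the SIGALRM that alarm() should have scheduled
kill -ALRM $!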

Both pings completed. The first ping goes out immediately, and ping then blocks on recvfrom(), waiting for either a reply or a signal. The manual SIGALRM interrupts the syscall and triggers the second ping. So signal delivery worked fine; alarm() simply never scheduled the signal.

Root cause

Our test VMs use minimal Linux kernels built from tinyconfig. I asked Claude Code whether a tinyconfig kernel would support alarm(). After investigating, it found that we were missing timer support.

It turns out tinyconfig leaves the relevant timer options disabled, so they have to be enabled explicitly:

TICK_ONESHOT = true;
HIGH_RES_TIMERS = true;
POSIX_TIMERS = true;
ARM_ARCH_TIMER = true;  # ARM64 only

Without these, alarm() accepts the syscall but never fires the signal. Four lines of config, and ping -c 3 works.
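A test suite could guard against this kind of regression by checking the generated kernel .config before booting. The following is a minimal sketch, assuming the standard `CONFIG_`-prefixed .config format. The function name `missing_timer_options` is hypothetical, and the option list is simply the four lines above.

```python
# Options taken from the config fragment above (kernel .config adds "CONFIG_").
REQUIRED = ["TICK_ONESHOT", "HIGH_RES_TIMERS", "POSIX_TIMERS"]
ARM64_ONLY = ["ARM_ARCH_TIMER"]


def missing_timer_options(config_text, arm64=False):
    """Return the timer options that are not set to 'y' in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if name.startswith("CONFIG_") and value == "y":
            enabled.add(name[len("CONFIG_"):])
    wanted = REQUIRED + (ARM64_ONLY if arm64 else [])
    return [opt for opt in wanted if opt not in enabled]
```

An empty result means every option on the list is built in; anything else names the options still to enable.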

Lessons learned

AI is good at the tedious research you’d skip.

The breakthrough came from a subagent reading busybox’s ping implementation. I wouldn’t have done that manually. I’d have kept staring at my own code. Having Claude Code go off, study the source, and come back with “it uses alarm() to schedule pings” was the turning point.

Fresh sessions can act as a second pair of eyes.

I also asked a separate Claude Code instance for ideas. Without our debugging context, it approached the problem differently and suggested things like PID 1 signal semantics and SA_RESTART behavior. Sometimes missing context is a feature.

Three hours of debugging. Four lines of config.