The Life and Death of a Feature: Four Rewrites of Pier's Per-Process Network Monitor
A Pier postmortem: block buffering, the CRLF grapheme, a spin-looping nettop, a private framework that crashes on sight — and how to do the math on deleting a feature.
Pier 1.2.0 shipped a feature: per-process network speed, meaning real-time upload/download rates for every process, Activity Monitor style. Three releases later, in 1.2.3, I ripped it out by the roots. This is the full autopsy: four rewrites, each of them the “reasonable next step” at the time, which strung together form a road that only ever went deeper.
Version one: nettop + Pipe, numbers that update every 30 seconds
The most naive approach: keep a resident nettop child process and parse its output. After shipping, the numbers ticked roughly every 30 seconds.
The cause is an old C standard library rule: when stdout is connected to a pipe, it is block-buffered (typically 4 KB or more) and flushes only when a block fills; it is line-buffered only when connected to a terminal. nettop was producing data every second, and all of it sat in its own process’s stdio buffer until enough had accumulated to fill a block.
The fix is an equally old Unix trick: call openpty to create a pseudo-terminal pair and attach the child’s stdout to the slave end. The stdio isatty() check inside nettop now sees a terminal, so stdout switches back to line buffering and each frame is emitted immediately.
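A minimal sketch of that pty trick in Swift, assuming Foundation's Process is used to launch the child. The helper name launchOnPTY is hypothetical and is not Pier's actual code; it only shows the shape of the technique: open a pty pair, hand the slave end to the child as stdout, and read from the master end.

```swift
import Foundation

/// Hypothetical helper: launches `executable` with stdout attached to the
/// slave side of a new pty and returns a handle for reading the master side.
func launchOnPTY(executable: String,
                 arguments: [String]) throws -> (process: Process, output: FileHandle) {
    var master: Int32 = -1
    var slave: Int32 = -1
    // openpty fills in both descriptors; the nil arguments keep default settings.
    guard openpty(&master, &slave, nil, nil, nil) == 0 else {
        throw POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO)
    }
    // The parent no longer needs its copy of the slave once the child is spawned.
    defer { close(slave) }

    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    // The child's stdout is now a terminal, so its stdio picks line buffering.
    process.standardOutput = FileHandle(fileDescriptor: slave, closeOnDealloc: false)

    do {
        try process.run()
    } catch {
        close(master)
        throw error
    }
    return (process, FileHandle(fileDescriptor: master, closeOnDealloc: true))
}
```

A caller would read from the returned output handle (for example through readabilityHandler) and should expect pty-style line endings on whatever comes back.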
But the pty immediately introduced a second bug, buried deeper: in pty mode, lines end with \r\n, and Swift’s String follows Unicode’s extended grapheme cluster rules — the two bytes of CRLF count as a single Character. So firstIndex(of: "\n") found no newline anywhere in the output ("\r\n" != "\n"), and the parse loop consumed 0 lines per pass. The symptom was identical to block buffering: data doesn’t move. Two bugs with the same symptom and unrelated causes, covering for each other during debugging. The fix was predicate matching:
// firstIndex(of: "\n") // CRLF is a single grapheme — never matches
buffer.firstIndex(where: { $0.isNewline }) // covers \n, \r\n, and \r
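The grapheme behavior is easy to reproduce in isolation. The sketch below uses a hypothetical drainLines helper (not Pier's actual parse loop) to show both the failed lookup and the predicate-based version pulling complete lines out of an accumulating buffer.

```swift
/// Hypothetical helper: removes every complete line from `buffer` and returns
/// them, leaving any trailing partial line in place for the next read.
func drainLines(from buffer: inout String) -> [String] {
    var lines: [String] = []
    // Character.isNewline is true for "\n", "\r", and the single grapheme "\r\n".
    while let end = buffer.firstIndex(where: { $0.isNewline }) {
        lines.append(String(buffer[..<end]))
        buffer.removeSubrange(...end) // drop the line together with its terminator
    }
    return lines
}

var sample = "frame 1\r\nframe 2\r\npartial"

// "\r\n" is one Character, and it does not compare equal to "\n".
let naiveMatch = sample.firstIndex(of: "\n") // nil

let frames = drainLines(from: &sample)
// frames holds the two complete lines; sample still holds "partial".
```

One caveat for real streams: if a read ends exactly between the \r and the \n, the two halves arrive as separate newline characters and produce an empty line, so a parser built this way should tolerate blank lines.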
Version two: nettop eats 1.4 cores all by itself
Line buffering fixed, numbers real-time. Then Activity Monitor slapped me across the face: the resident nettop -L 0 was spin-looping — a known bug — continuously burning about 1.4 cores. A resource monitor had become the biggest resource consumer on the machine. This feature needed a new implementation or a funeral.
Version three: dlopen a private framework — exquisite, and short-lived
How is Activity Monitor both accurate and cheap? It uses the private system framework NetworkStatistics. So I took the classic wrong turn:
dlopen("/System/Library/PrivateFrameworks/NetworkStatistics.framework/NetworkStatistics")
→ dlsym: NStatManagerCreate / NStatManagerAddAllTCP / NStatManagerAddAllUDP / ...
Subscribe to all TCP/UDP sockets and track cumulative bytes per source UUID; query every 2 seconds to trigger callbacks and diff against the previous snapshot to get rates; then apply a layer of EWMA smoothing (rate = 0.5 × new + 0.5 × old) so bursty traffic doesn’t make the numbers jump around. As engineering, it was nearly perfect: 0% CPU when idle, and data that matched Activity Monitor digit for digit.
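The diff-and-smooth step is independent of where the byte counters come from. Here is a minimal sketch of it, assuming snapshots arrive as a dictionary from a source identifier to cumulative bytes; the RateTracker type and its String keys are hypothetical stand-ins, not Pier's actual data model.

```swift
/// Hypothetical tracker: turns cumulative byte counters into smoothed
/// bytes-per-second rates using rate = alpha * new + (1 - alpha) * old.
struct RateTracker {
    let alpha: Double
    private var lastBytes: [String: UInt64] = [:]
    private var smoothed: [String: Double] = [:]

    init(alpha: Double = 0.5) { self.alpha = alpha } // 0.5 matches the formula above

    /// `snapshot` maps a source ID to cumulative bytes; `interval` is the
    /// number of seconds since the previous snapshot.
    mutating func update(snapshot: [String: UInt64], interval: Double) -> [String: Double] {
        var next: [String: Double] = [:]
        for (source, total) in snapshot {
            // A source seen for the first time has no previous counter, so no rate yet.
            guard let previous = lastBytes[source] else { continue }
            // Clamp to zero if the counter went backwards (for example, a reset).
            let delta = total >= previous ? total - previous : 0
            let instant = Double(delta) / interval
            // Seed the average with the first measured rate.
            let old = smoothed[source] ?? instant
            next[source] = alpha * instant + (1 - alpha) * old
        }
        lastBytes = snapshot // sources missing from the snapshot are forgotten
        smoothed = next
        return next
    }
}
```

With a 2-second interval, a source that moves 2000 bytes and then goes quiet reads 1000 B/s, then 500 B/s, then 250 B/s: the decay is what keeps a single burst from making the display flicker.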
Worth recording honestly: every local judgment along the way held up — “the system itself uses it, so the capability exists,” “if dlopen fails we can degrade gracefully.” The one variable I never weighed: when a private framework’s behavior drifts across OS versions, I have no recourse whatsoever.
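For reference, the "degrade gracefully" half of that reasoning looks roughly like the sketch below. The resolveSymbols helper is hypothetical; it only resolves symbol addresses and deliberately does not call anything, because the real function signatures are undocumented and are not reproduced here.

```swift
import Foundation

/// Hypothetical helper: returns the address of every requested symbol, or nil
/// if the library or any one symbol is missing, so the caller can hide the feature.
func resolveSymbols(inLibrary path: String,
                    names: [String]) -> [String: UnsafeMutableRawPointer]? {
    guard let handle = dlopen(path, RTLD_LAZY) else { return nil }
    var resolved: [String: UnsafeMutableRawPointer] = [:]
    for name in names {
        guard let symbol = dlsym(handle, name) else {
            dlclose(handle)
            return nil
        }
        resolved[name] = symbol
    }
    // The handle is intentionally left open so the addresses stay valid.
    return resolved
}
```

Note what this guard does and does not cover: it catches a framework or symbol that has disappeared, but a symbol that still resolves and then misbehaves at call time passes the check untouched.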
Death sentence: guaranteed crash on macOS 15
The crash reports came in: on macOS 15.0/15.1, the framework’s internal query call hits a dangling pointer at objc_retain — the framework’s own memory management bug, retaining an already-freed object, a textbook use-after-free that crashes on contact and cannot be avoided from the calling side (the crash site is inside their framework). It’s a known Apple issue, but a private framework has no API contract — nobody owes you a fix. And one layer further out: there’s no promise that future OS versions will even keep allowing dlopen/dlsym of private frameworks at all.
The options I listed, and how they scored:
- Disable the feature per OS version — leaves behind a ghost toggle where “some systems have it, some don’t,” the most expensive form of existence for support;
- Wait for Apple to fix it — mortgages your release cadence to a team that doesn’t know you exist;
- Degrade back to nettop — back to 1.4 cores, which is not a fix;
- Delete the feature — the only option that shrinks the maintenance surface.
I chose deletion, and deleted clean: ProcessNetMonitor removed entirely, along with the VPN status banner it had dragged in (which read tunnel interface traffic). The process list went back to two sort keys, CPU and memory; the only network metric left is machine-wide upload/download (which uses public APIs — zero risk).
Postmortem
The full path: buffering problem in a public tool → pty to fix buffering → the tool’s own performance bug → private API to fix performance → private API stability unfixable → delete the feature. Locally optimal at every step; globally, a sunk-cost slide. In hindsight, the moment version two revealed the nettop spin-loop was when this meeting should have happened: is this feature’s value worth a maintenance cost that will only ever go up?
Three conclusions to take away:
- Private APIs aren’t a risky shortcut — they’re debt. The interest payment is that every major OS release puts your product’s stability back on trial, and you have no bargaining power.
- The bar for keeping a feature isn’t whether you can build it — it’s the failure budget. I built it, and it was exquisite. But a $9 menu bar utility carrying “guaranteed crash on new OS” for a nice-to-have number is a losing trade no matter how you run the math.
- Feature removal should be surgical. Version-gated degradation, hiding it in advanced settings, leaving half the code around “for later” — all of it just transfers the decision cost to your future self. Everything lives on in git history; you can excavate whenever you need to.
Pier’s network metrics are now machine-wide only. Since that shipped, not one user has asked where the feature went.