Three App Freezes, One Culprit: Don't Block Swift's Cooperative Thread Pool

Since Flick shipped, I’ve fixed three bugs that looked nothing alike: the whole app freezing during a scan (v1.3.0), analysis never finishing on iPad (v1.3.2), and progress stuck at 0 for iCloud-only photo libraries (v1.3.2). The fixes were nearly identical, because the root cause was the same every time: blocking Swift concurrency’s cooperative thread pool.

First, what this pool actually is

Unless it is isolated to the main actor or a custom executor, async work in Swift concurrency runs on a cooperative thread pool. Two properties make that pool fundamentally different from GCD:

Its width is fixed at the CPU core count. A 6-core device gets 6 threads, and the runtime will never spin up an extra one for you — that’s the whole point, it’s how thread explosion is avoided by design.
Scheduling is cooperative. Tasks only yield their thread at an await. If a task blocks synchronously (a lock, a semaphore, synchronous I/O), that thread is held hostage — the runtime won’t notice and won’t backfill.

The corollary is brutal: once as many tasks as there are cores block at the same time, every piece of async work that needs a pool thread starves together. That includes the work behind SwiftUI .task modifiers, network callback continuations, and background work in unrelated modules. No crash, no error, near-zero CPU. Everything asynchronous just stops. That’s also why it’s so hard to diagnose: the crash logs are empty, because nothing crashed.
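To make the difference between suspending and blocking concrete, here is a minimal sketch. It is not from Flick; the function name measureSuspendedWaits and the numbers are my own illustration. It launches far more child tasks than any device has cores, and each one suspends with Task.sleep. Because a suspended task hands its thread back, the whole batch should finish in roughly the time of a single wait.

```swift
import Foundation

/// Hypothetical helper: launches `taskCount` child tasks that each suspend
/// for `seconds`, and returns the wall-clock time the whole batch took.
func measureSuspendedWaits(taskCount: Int, seconds: Double) async -> Double {
    let start = Date()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<taskCount {
            group.addTask {
                // Task.sleep suspends at the await, so the thread returns to
                // the pool and can run other tasks while this one waits.
                try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            }
        }
        // Leaving the closure waits for every child task to finish.
    }
    return Date().timeIntervalSince(start)
}
```

Swapping Task.sleep for a blocking Thread.sleep(forTimeInterval:) would hold one pool thread per wait, so the batch would run in waves limited by the pool width. That is the starvation described above, in miniature.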

Round one: synchronous PhotoKit requests park all 6 threads

Symptom: after tapping “rescan,” images stopped loading on the swipe screen and the pending-delete list turned into gray placeholders.

The image-fetching code wrapped a PhotoKit request in withCheckedContinuation with isSynchronous = true. Synchronous mode was not a mistake — it was deliberate: PhotoKit’s async mode with .opportunistic can invoke the callback multiple times for one request (low-res first, then high-res), and resuming a checked continuation twice is an instant runtime crash. Synchronous mode guarantees exactly one callback, keeping the continuation safe.

What I hadn’t priced in: a synchronous request blocks the calling thread until the image comes back — including the entire iCloud download. During a scan, 4 analysis workers plus a .task for every visible thumbnail were all issuing these requests on the cooperative pool. Six threads, drained in seconds.

The fix is a bridging pattern worth memorizing: move the blocking wait onto a dedicated concurrent GCD queue (GCD can over-provision threads, so blocking it doesn’t hurt the cooperative pool), and bridge back to the async world with a continuation:

private let blockingRequestQueue = DispatchQueue(
  label: "photo.blocking", qos: .userInitiated, attributes: .concurrent)

func requestImage(...) async -> UIImage? {
  await withCheckedContinuation { cont in
    blockingRequestQueue.async {
      let opts = PHImageRequestOptions()
      opts.isSynchronous = true          // blocks a GCD thread, not the cooperative pool
      manager.requestImage(...) { image, _ in cont.resume(returning: image) }
    }
  }
}

The task on the cooperative pool goes from “blocked inside PhotoKit” to “suspended at an await” — and suspension yields the thread, which is exactly what the contract demands.

Round two: synchronous Vision compute, a textbook deadlock on real hardware

Symptom: analysis never finished on iPad. The device logs handed me the cleanest deadlock evidence I’ve ever seen: 4 “compute start”, 0 “compute done”.

Performing a VNGenerateImageFeaturePrintRequest (through VNImageRequestHandler’s perform) is synchronous computation, and the first version ran it directly on task group children, which means cooperative threads again. Analysis concurrency was set to 4, so on an iPad with fewer cores, 4 cooperative threads dove into Vision and never came back out, while the task that consumes results and advances the pipeline had no thread left to run on. Producers hold the whole pool waiting for the consumer to make room; the consumer waits for producers to give up a thread. Circular wait established, deadlock confirmed.

The fix is isomorphic to round one: a dedicated computeQueue takes the synchronous work, and the cooperative pool only schedules and collects results. In AnalysisCoordinator.swift I left a self-deprecating comment: “Same lesson as the PhotoKit blockingRequestQueue fix.” Same trap — first time dressed as I/O, second time dressed as CPU-bound work, and I failed to recognize it either way.
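Here is a rough sketch of what that shape can look like, with Vision swapped out for a stand-in so the block runs anywhere. computeQueue is the name from the fix; runOnComputeQueue, slowSynchronousCompute, and analyzeAll are hypothetical names for illustration, not Flick’s actual code, and the concurrency limit of 4 is left out for brevity.

```swift
import Foundation

// Dedicated concurrent GCD queue. Blocking its threads does not consume
// cooperative pool threads.
let computeQueue = DispatchQueue(
    label: "example.compute", qos: .userInitiated, attributes: .concurrent)

/// Hypothetical stand-in for synchronous, CPU-bound work such as a Vision request.
func slowSynchronousCompute(_ input: Int) throws -> Int {
    Thread.sleep(forTimeInterval: 0.01)   // simulates blocking computation
    return input * input
}

/// Runs synchronous work on computeQueue while the calling task stays
/// suspended at an await instead of blocking a pool thread.
func runOnComputeQueue<T: Sendable>(
    _ work: @escaping @Sendable () throws -> T
) async throws -> T {
    try await withCheckedThrowingContinuation { continuation in
        computeQueue.async {
            do {
                let value = try work()
                continuation.resume(returning: value)
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}

/// The task group only schedules work and collects results.
func analyzeAll(_ inputs: [Int]) async throws -> [Int: Int] {
    try await withThrowingTaskGroup(of: (Int, Int).self) { group in
        for input in inputs {
            group.addTask {
                let value = try await runOnComputeQueue {
                    try slowSynchronousCompute(input)
                }
                return (input, value)
            }
        }
        var results: [Int: Int] = [:]
        for try await (input, value) in group {
            results[input] = value
        }
        return results
    }
}
```

One caveat on the design: GCD will add threads for blocked work only up to its own limit, so a real pipeline should still bound how many calls are in flight at once.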

Round three: iCloud download stalls, a continuation that never resumes

Symptom: users with “Optimize iPhone Storage” enabled saw analysis stuck at 0.

This time “just wait longer” couldn’t save you: with network access allowed, a synchronous request that hits a stalled iCloud download means PhotoKit’s callback may never arrive. The continuation never resumes, the thread is held forever, and the scan loop wedges as a whole. The first two rounds were chronic congestion; this one was a permanent disappearance.

The fix had a policy layer and a mechanism layer:

Policy: analysis never needed the original image, because feature extraction only uses a 256px version. Switch to an .opportunistic async request and resume with the first usable image: full-size for local photos, the local thumbnail for photos whose originals live only in iCloud. Either way it takes milliseconds. Never wait for a download.
Mechanism: .opportunistic calls back multiple times, so pair it with a ResumeOnce guard (an atomic flag; callbacks after the first are dropped), plus a hard timeout as a backstop for the pathological “callback never comes” case — on timeout, resume with nil. Better to skip one photo than take down the whole pipeline with it.
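A minimal sketch of the mechanism layer, with PhotoKit swapped for a generic callback source so the block stands alone. ResumeOnce is the name from the fix, but this body is my reconstruction rather than Flick’s code, and it uses an NSLock instead of an atomic flag to stay within Foundation. firstResult and its start parameter are hypothetical.

```swift
import Foundation

/// Wraps a continuation so that only the first resume goes through.
/// Later calls find nil and are silently dropped.
final class ResumeOnce<T: Sendable>: @unchecked Sendable {
    private let lock = NSLock()
    private var continuation: CheckedContinuation<T, Never>?

    init(_ continuation: CheckedContinuation<T, Never>) {
        self.continuation = continuation
    }

    func resume(returning value: T) {
        lock.lock()
        let pending = continuation
        continuation = nil            // claim the continuation exactly once
        lock.unlock()
        pending?.resume(returning: value)
    }
}

/// Hypothetical bridge for a callback API that may fire several times or never.
/// Returns the first delivered value, or nil once `timeout` seconds pass.
func firstResult<T: Sendable>(
    timeout: TimeInterval,
    start: (@escaping @Sendable (T) -> Void) -> Void
) async -> T? {
    await withCheckedContinuation { (continuation: CheckedContinuation<T?, Never>) in
        let once = ResumeOnce(continuation)
        // Hard timeout as a backstop for the case where no callback ever arrives.
        DispatchQueue.global().asyncAfter(deadline: .now() + timeout) {
            once.resume(returning: nil)
        }
        // Every callback funnels through the guard; only the first one wins.
        start { value in once.resume(returning: value) }
    }
}
```

With this shape, a source that calls back twice yields its first value, and a source that never calls back yields nil after the timeout, so the caller can skip that item and move on.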

Appendix: a close cousin — a canceled task buries the new one

Around the same time there was a darkly comic bug (v1.2.0): every group cover on the grouping screen rendered black. A canceled analysis task still ran its cleanup finalize() — which set isRunning to false and runTask to nil, except those fields now belonged to the newly started task. The old task performed last rites on the new one, the new start() spawned another analysis loop, and the two loops fought over PhotoKit concurrently — the IPC storm starved every thumbnail request.

One-line fix: if !Task.isCancelled { await finalize() }. The lesson files under the same theme: in Swift concurrency, a task’s lifecycle hooks must check whether the task is still the incumbent before they touch shared state.
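Task.isCancelled works as an incumbency check only as long as every replaced task is also canceled. A more explicit variant, sketched here under my own assumptions, tags each run with a token and lets finalize touch shared state only while that token is still current. The Coordinator actor and its generation counter are hypothetical; only isRunning, runTask, start, and finalize echo names from the bug.

```swift
/// Hypothetical coordinator that guards cleanup with a generation token.
actor Coordinator {
    private(set) var isRunning = false
    private var generation = 0
    private var runTask: Task<Void, Never>?

    func start(work: @escaping @Sendable () async -> Void) {
        runTask?.cancel()              // retire the previous run, if any
        generation += 1
        let myGeneration = generation  // token owned by this run
        isRunning = true
        runTask = Task {
            await work()
            await self.finalize(ifCurrent: myGeneration)
        }
    }

    private func finalize(ifCurrent token: Int) {
        // A stale run must not reset state that now belongs to a newer run.
        guard token == generation else { return }
        isRunning = false
        runTask = nil
    }
}
```

If start is called twice in a row, the first run still reaches its finalize, but its token no longer matches, so isRunning and runTask stay with the second run.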

The rules

Never block-wait on the cooperative thread pool — synchronous APIs, locks, semaphores, implicit download waits, all of it counts.
When you must call a synchronous API, bridge with “GCD queue + continuation,” so the cooperative pool’s task is parked at an await instead of inside someone else’s library.
For APIs that may call back multiple times, put a resume-once guard in front of the continuation; for APIs that may never call back, add a hard timeout.
A canceled task must not run cleanup logic that mutates shared state it may no longer own.
The signature of this failure class: global async standstill + zero crash logs + near-idle CPU. See that trio, suspect a drained pool first, and go find who’s waiting synchronously on a cooperative thread.

Swift concurrency’s fixed-width thread pool is a performance feature, and it’s also a contract: a task is either running or yielding. Three incidents’ worth of tuition buys one sentence — the contract doesn’t grant exceptions for good excuses (fear of double-resume, needing synchronous semantics). Excuses get solved with bridging, not papered over with blocking.