Posted by u/skibidi-toaleta-2137
PSA: Claude Code has two cache bugs that can silently 10-20x your API costs — here's the root cause and workarounds
I spent the past few days reverse-engineering the Claude Code standalone binary (228MB ELF, Ghidra + MITM proxy + radare2) and found two independent bugs that cause prompt cache to break, silently inflating costs by 10-20x. Posting this so others can protect themselves. ## Bug 1: Sentinel replacement in standalone binary breaks cache when conversation discusses billing internals **Issue:** anthropics/claude-code#40524 The standalone Claude Code binary (the one you get from `claude.ai/install.sh` or `npm install -g`) contains a **native-layer string replacement** baked into Anthropic's custom Bun fork. It's injected into the Zig HTTP header builder function — the same function that builds `Content-Length`, `User-Agent`, etc. On every API request to `/v1/messages`, if the `anthropic-version` header is present, it searches the JSON request body for `cch=00000` (the billing attribution sentinel) and replaces `00000` with a 5-char hex derived from hashing the body. This happens **after** `JSON.stringify` but **before** TLS encryption — completely invisible from JavaScript. **When does this cause problems?** The replacement targets the **first** occurrence in the body. Since `messages[]` comes before `system[]` in the serialized JSON, if your conversation history contains the literal sentinel (e.g., from reading the CC bundle source, discussing billing headers, or having it in your CLAUDE.md), the sentinel in messages gets replaced instead of the one in `system[0]`. This changes your messages content every request → cache prefix broken → full cache rebuild (~$0.04-0.15 per request depending on context size). In normal usage (not discussing CC internals), only `system[0]` is affected, and since it has `cache_control: null`, it doesn't impact caching. **Workaround:** Run Claude Code via `npx @anthropic-ai/claude-code`* instead of the standalone binary. The replacement mechanism exists only in the custom Bun fork compiled into the standalone — the npm package running on standard Bun/Node has no replacement. Confirmed experimentally: same JS, same bytecode, zero replacement on npx. *- Do not blindly use that command, verify what it does (it is safe, but you should check nonetheless) ## Bug 2: `--resume` ALWAYS breaks cache (since v2.1.69) **Issue:** anthropics/claude-code#34629 Every `--resume` causes a full cache miss on the entire conversation history. Only the system prompt (~11-14k tokens) is cached; everything else is `cache_creation` from scratch. This is a ~10-20x cost increase on the resume request. **Root cause:** In v2.1.69, Anthropic introduced `deferred_tools_delta` — a new system-reminder attachment listing tools available via ToolSearch. On a **fresh session**, these attachments (deferred tools + MCP instructions + skills list, ~13KB) are injected into `messages[0]` alongside the AU$ user context. On **resume**, they're appended at the **end** of messages (`messages[N]`) while `messages[0]` contains only the AU$ context (~352B). This creates three independent cache-breaking differences: 1. `messages[0]`: 13KB (4 reminders) vs 352B (1 reminder) — completely different prefix 2. `system[0]` billing hash: changes because `cc_version` suffix is computed from chars at positions 4, 7, 20 of the first user message (which IS the system-reminder, not the actual user prompt) 3. `cache_control` breakpoint position: moves from `messages[0]` to `messages[last]` `deferred_tools_delta` does not exist in v2.1.68 (`grep -c 'deferred_tools_delta' cli.js` → 0 in 2.1.68, 5 in 2.1.69). Without it, `messages[0]` was identical on fresh and resumed sessions → cache hit. **Subsequent turns after resume cache normally** — the one-time miss is only on the first request after resume. **Workaround:** There's no external workaround for this one. Pinning to v2.1.68 works (as the original issue reporter found) but you lose 60+ versions of features. An invasive patch to the npm package's `cli.js` could theoretically reorder the attachment injection on resume, but that's fragile across updates. ## Cost impact For a large conversation (~500k tokens): - **Bug 1** (when triggered): ~155k tokens shift from `cache_read` ($0.03/MTok) to `cache_creation` ($0.30/MTok) = **~$0.04 per request**, every request - **Bug 2** (every resume): ~500k tokens as `cache_creation` = **~$0.15 one-time per resume** - Combined (discussing CC internals + resuming): up to **$0.20+ per request** ## Methodology Full details in the GitHub issues, but briefly: MITM proxy (mitmproxy addon capturing all API payloads), Ghidra reverse engineering of the standalone ELF to locate the replacement code in the Zig HTTP header builder, `Bun.hash()` to identify all header name hashes, npm package comparison across versions 1.0.0–2.1.87, and controlled experiments with fresh sessions → resume → consecutive resumes with payload diffing. PS. Co-written by claude code, obviously PPS. Claude code has special 1h TTL of cache, or at least mine has, so any request should be cached correctly. Except extra usage, it has 5 minutes TTL. PPPS. Apparently downgrading to 2.1.30 also works. Verification script: https://gitlab.com/treetank/cc-diag/-/raw/c126a7890f2ee12f76d91bfb1cc92612ae95284e/test_cache.py (please read it before executing)
More from r/ClaudeAI
You're right to push back.
Taught Claude to talk like a caveman to use 75% less tokens.
Opus tryna be TOO human
Opus 4.7 single handedly proved ijustvibecodedthis.com right.