
Data race: process-global xoshiro_s PRNG and private_heap_* counters mutated outside SQLCIPHER_MUTEX_MEM — heap corruption under concurrent keyed-connection use #598

@josefguenther


Summary

SQLCipher 4.7+ routes all cipher-context and key-buffer allocations through a process-global "private heap," and scrubs freed memory with a process-global xoshiro256++ PRNG. The private-heap free-list traversal is correctly guarded by SQLCIPHER_MUTEX_MEM, but two pieces of shared mutable state that the same code touches are accessed outside that mutex:

  1. xoshiro_s[4] (the scrub PRNG state) is mutated by xoshiro_next() with no synchronization.
  2. The private_heap_* statistics counters (private_heap_used, private_heap_alloc, private_heap_allocs, private_heap_hwm, private_heap_overflow, private_heap_overflows) are updated after sqlite3_mutex_leave(SQLCIPHER_MUTEX_MEM) in both sqlcipher_malloc and sqlcipher_free.

Because every keyed connection shares this single global heap/PRNG, any two threads that concurrently allocate or free a codec context race these globals. The simplest trigger is concurrent open/close of keyed connections, but the same sqlcipher_malloc/sqlcipher_free traffic also fires on the deferred read- and write-context key derivation (sqlcipher_codec_key_derive → sqlcipher_cipher_ctx_copy, on first use of each context) and on ATTACH ... KEY, so a steady-state workload that mixes opens, attaches, and first reads/writes across threads triggers it too. In a build compiled with SQLITE_THREADSAFE=1, this manifests as heap corruption and intermittent SIGSEGV/SIGBUS inside sqlcipher_malloc / sqlcipher_free / sqlcipher_shield / sqlcipher_page_cipher, typically at the next allocation, free, or page-codec operation.

This is latent for typical usage (one keyed connection, low thread concurrency) but reliably triggered by workloads that open/close/key many keyed connections across a thread pool.

Affected versions / environment

  • SQLCipher: the racy machinery (private heap + sqlcipher_shield + freed-memory scrub) was introduced in the 4.7.0 memory redesign — it is absent in 4.6.1 (the last pre-4.7 release) and present, unchanged, through the current release 4.16.0 (verified by diffing src/sqlcipher.c at the v4.14.0 / v4.15.0 / v4.16.0 tags — byte-for-byte identical; the 4.16.0 "allocation issue" changelog entry was an unrelated LibTomCrypt stack-allocation + logging change). Reproduced and TSan-confirmed on 4.14.0 community (CIPHER_VERSION_NUMBER 4.14.0, SQLite base 3.51.3), the version the Rust libsqlite3-sys 0.38 binding vendors.
  • Crypto provider: OpenSSL 3.x (SQLCIPHER_CRYPTO_OPENSSL). Provider-independent — the racy code is in the core memory subsystem, not the provider.
  • Threading: SQLITE_THREADSAFE=1 (the static mutexes are real, not no-ops). Connections opened with SQLITE_OPEN_NOMUTEX, which is the rusqlite default open flag.
  • Build: bundled/amalgamation compile (observed via Rust libsqlite3-sys 0.38 / rusqlite 0.40, but this is pure C — not Rust-specific).
  • Platform observed: macOS arm64. The race is platform-independent; arm64 pointer authentication just makes the corrupted-pointer dereference fault promptly (KERN_INVALID_ADDRESS ... (possible pointer authentication failure)).

The racy code (src/sqlcipher.c; line numbers at the current release v4.16.0, unchanged since 4.7.0)

  • xoshiro_s: static volatile uint64_t xoshiro_s[4] (decl line 205); xoshiro_next() (lines 211–224) mutates it with no lock. It is called from xoshiro_randomness(), which runs on every freed-memory scrub:
    • inside sqlcipher_free (under the MEM mutex), and
    • inside the overflow fallback sqlcipher_internal_free (scrub at line 881), which runs after the MEM mutex is released, and
    • during heap/shield init.
      Concurrent calls from different threads tear xoshiro_s.
  • private_heap_* counters: in sqlcipher_malloc the free-list mutation is inside sqlite3_mutex_enter/leave(SQLCIPHER_MUTEX_MEM), but the private_heap_overflow/overflows/used/hwm/alloc/allocs updates happen at lines 950–961, after sqlite3_mutex_leave at line 944. The same holds in sqlcipher_free (private_heap_used -= … at line 1023, after the mutex leave at line 1013). Both are unsynchronized read-modify-writes of shared globals, executed on every alloc and free.

(In the amalgamated sqlite3.c that language bindings build from, these live around sqlite3.c:109625–110400. There is no fixed upstream version to upgrade to: the machinery is identical at the current v4.16.0.)
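The shape of the counter race described above can be modelled in a few lines. This is only an illustrative sketch, not SQLCipher's source: the pthread mutex stands in for SQLCIPHER_MUTEX_MEM, and every model_* name is hypothetical. It shows the free-list work done under the lock and the statistics updated after the unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are SQLCipher's real definitions. */
static pthread_mutex_t model_mem_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays SQLCIPHER_MUTEX_MEM */
static volatile size_t model_heap_used = 0;   /* plays private_heap_used */
static volatile size_t model_heap_allocs = 0; /* plays private_heap_allocs */
static size_t model_free_bytes = 4096;        /* crude stand-in for the free list */

/* Returns 1 if the request was served from the model heap, 0 otherwise. */
static int model_malloc(size_t sz) {
  int served = 0;
  assert(sz > 0);

  pthread_mutex_lock(&model_mem_mutex);
  if (sz <= model_free_bytes) {   /* free-list work: correctly guarded */
    model_free_bytes -= sz;
    served = 1;
  }
  pthread_mutex_unlock(&model_mem_mutex);

  if (served) {
    /* RACE when called from several threads: these are read-modify-write
       operations on shared globals performed after the unlock, so two
       threads can read the same old value and one update is lost. */
    model_heap_used += sz;
    model_heap_allocs++;
  }
  return served;
}
```

Called from a single thread the model behaves as expected; calling model_malloc from two or more threads under -fsanitize=thread should flag the two counter updates, which mirrors the class of report described in this issue.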

This is one instance of a broader pattern: volatile treated as thread-safety

A source audit of src/sqlcipher.c (the offsets below are from the amalgamated sqlite3.c) found this is not an isolated slip but a repeated assumption: ~20 process-global mutables are declared volatile and accessed without a guarding mutex. volatile in C provides neither atomicity nor a happens-before edge, so each is a data race when touched from multiple threads. By severity:

  • Hot path (crash-class) — this issue: xoshiro_s + the private_heap_* counters (~109625, 109724–109731), touched on every sqlcipher_malloc/sqlcipher_free.
  • Lifecycle / teardown (crash/UAF-class): sqlcipher_log_file (FILE*, ~109697) is fclose'd and reassigned in sqlcipher_set_log with no mutex while sqlcipher_log reads it and calls fprintf; default_provider (~109690) and sqlcipher_shield_mask (~109704) are freed + NULLed in sqlcipher_extra_shutdown (~110022/110039) with no provider mutex while codec_ctx_init / sqlcipher_shield read them on other threads.
  • Config (torn/stale-value): default_flags, hmac_salt_mask, default_kdf_iter, default_page_size, default_plaintext_header_size, default_hmac_algorithm, default_kdf_algorithm (~109679–109685) and sqlcipher_mem_security_on/_executed (~109686/109687) — written by cipher_default_* / cipher_memory_security PRAGMAs under only the issuing connection's db->mutex, read by other connections' codec_ctx_init / the wrapped allocator with no shared lock.

The private-heap/PRNG pair surfaces first because it sits on the per-page/lifecycle hot path; the others are the same defect on colder paths (logging/teardown/config).
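To make the volatile point concrete, here is a small standalone sketch (hypothetical names, unrelated to SQLCipher's code) that increments a volatile counter and a C11 _Atomic counter from several threads. The volatile increment is a deliberate data race kept in for illustration; only the atomic counter is guaranteed to reach the expected total.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_THREADS 4
#define DEMO_ITERS   100000

static volatile uint64_t volatile_counter = 0; /* volatile only: still racy */
static _Atomic uint64_t atomic_counter = 0;    /* race-free */

static void *demo_worker(void *arg) {
  (void)arg;
  for (int i = 0; i < DEMO_ITERS; i++) {
    volatile_counter++;                   /* deliberate race: separate load, add, store */
    atomic_fetch_add(&atomic_counter, 1); /* one indivisible read-modify-write */
  }
  return NULL;
}

/* Runs the workers and returns the final value of the atomic counter. */
static uint64_t demo_run(void) {
  pthread_t threads[DEMO_THREADS];

  volatile_counter = 0;
  atomic_store(&atomic_counter, 0);

  for (int i = 0; i < DEMO_THREADS; i++) {
    int rc = pthread_create(&threads[i], NULL, demo_worker, NULL);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < DEMO_THREADS; i++) {
    pthread_join(threads[i], NULL);
  }
  return atomic_load(&atomic_counter);
}
```

After demo_run() the atomic counter equals DEMO_THREADS * DEMO_ITERS. The volatile counter may or may not, depending on scheduling, and ThreadSanitizer reports its increment as a data race either way.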

Reproduction

Minimal C-level repro (no application code needed):

  1. Open N keyed connections (sqlite3_open + PRAGMA key = '...'), N ≥ ~8.
  2. From a thread pool (≥ 8 threads), in a tight loop, concurrently open and close keyed connections (and/or run trivial statements). Distinct DB files per thread is sufficient — a shared DB file is not required, which confirms the race is in the process-global heap/PRNG, not in any per-database state.
  3. Build with -fsanitize=thread (ThreadSanitizer), SQLITE_THREADSAFE=1.

Within seconds, ThreadSanitizer reports data races. Without TSan, a long enough run intermittently SIGSEGVs/SIGBUSes inside sqlcipher_*.

We built the bundled C with -fsanitize=thread and confirmed:

  • ~1674 races in a short run, the large majority naming xoshiro_s, the remainder naming the private_heap_* counters.
  • Race stacks root at sqlite3_close (close path → sqlcipher_codec_ctx_free → sqlcipher_free) and sqlite3_key_v2 (open path → sqlcipherCodecAttach → sqlcipher_codec_ctx_init → sqlcipher_malloc).

Representative real (non-TSan) faulting stacks observed:

# open path
sqlcipher_malloc
sqlcipher_codec_ctx_init
sqlcipherCodecAttach
sqlite3_key_v2
sqlite3Pragma
... sqlite3_prepare_v3

# close path
sqlcipher_free
sqlcipher_codec_ctx_free
sqlite3FreeCodecArg
sqlite3PagerClose
sqlite3BtreeClose
... sqlite3_close

# steady-state page codec (collateral, after heap is corrupted)
sqlcipher_shield
sqlcipher_page_hmac / sqlcipher_page_cipher
sqlite3Codec
... sqlite3_step

Faults present as SIGSEGV/SIGBUS at garbage / non-canonical addresses (the classic "freed/overwritten then dereferenced" signature).

Notes

  • SQLite-level serialization does not cover it. Setting SQLITE_OPEN_FULLMUTEX and/or sqlite3_config(SQLITE_CONFIG_SERIALIZED) does not prevent the race — these globals live outside SQLite's own mutex framework.
  • PRAGMA cipher_memory_security is unrelated (and can only be enabled, not disabled); the racy private-heap/scrub path is always-on in the open-source build and is not gated by that pragma.

Suggested fix

Small and shouldn't affect the hot path (the relevant mutex is already taken microseconds away):

  1. private_heap_* counters: move the stat bookkeeping to before sqlite3_mutex_leave(SQLCIPHER_MUTEX_MEM) in sqlcipher_malloc/sqlcipher_free (or make the counters _Atomic).
  2. xoshiro_s: serialize all xoshiro_randomness() calls under SQLCIPHER_MUTEX_MEM (including the sqlcipher_internal_free fallback, which currently scrubs outside it), or make the PRNG state thread-local, or use atomics. Thread-local is the most contention-friendly.
  3. More broadly: apply the same to the whole volatile-global set above — guard each global's reads + writes with the appropriate existing SQLCIPHER_MUTEX_* (the provider mutex already exists for default_provider; a config mutex would cover the default_*/log set), or make the scalars _Atomic. volatile is not a substitute for either.
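Fixes 1 and 2 could look roughly like the sketch below. It is a model under stated assumptions, not a patch against src/sqlcipher.c: the pthread mutex stands in for SQLCIPHER_MUTEX_MEM, all fix_* and tl_* names are hypothetical, and the per-thread seed is a placeholder (a real implementation would presumably seed each thread from the crypto provider's random source). The generator body follows the public xoshiro256++ reference algorithm.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define FIX_THREADS 4
#define FIX_ITERS   10000

/* Hypothetical stand-ins; none of these are SQLCipher's real definitions. */
static pthread_mutex_t fix_mem_mutex = PTHREAD_MUTEX_INITIALIZER;
static size_t fix_heap_used = 0;   /* only touched while holding fix_mem_mutex */
static size_t fix_heap_allocs = 0; /* only touched while holding fix_mem_mutex */

/* Fix 1: stat bookkeeping happens before the mutex is released. */
static void fix_account_alloc(size_t sz) {
  pthread_mutex_lock(&fix_mem_mutex);
  /* ... free-list work would go here ... */
  fix_heap_used += sz;
  fix_heap_allocs++;
  pthread_mutex_unlock(&fix_mem_mutex);
}

static void fix_account_free(size_t sz) {
  pthread_mutex_lock(&fix_mem_mutex);
  assert(fix_heap_used >= sz);
  fix_heap_used -= sz;
  pthread_mutex_unlock(&fix_mem_mutex);
}

/* Fix 2: one xoshiro256++ state per thread, so scrubbing needs no lock. */
static _Thread_local uint64_t tl_state[4];
static _Thread_local int tl_seeded = 0;

static uint64_t rotl64(uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

/* splitmix64, used only to expand the placeholder seed into four words. */
static uint64_t splitmix64(uint64_t *x) {
  uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

static uint64_t tl_xoshiro_next(void) {
  uint64_t *s = tl_state;
  if (!tl_seeded) {
    /* ASSUMPTION: placeholder seed taken from the thread-local address. */
    uint64_t seed = (uint64_t)(uintptr_t)&tl_state;
    for (int i = 0; i < 4; i++) s[i] = splitmix64(&seed);
    tl_seeded = 1;
  }
  const uint64_t result = rotl64(s[0] + s[3], 23) + s[0];
  const uint64_t t = s[1] << 17;
  s[2] ^= s[0];
  s[3] ^= s[1];
  s[1] ^= s[2];
  s[0] ^= s[3];
  s[2] ^= t;
  s[3] = rotl64(s[3], 45);
  return result;
}

/* Overwrites a buffer with bytes from the calling thread's own PRNG state. */
static void fix_scrub(void *p, size_t n) {
  volatile unsigned char *b = p;
  for (size_t i = 0; i < n; i++) b[i] = (unsigned char)tl_xoshiro_next();
}

static void *fix_worker(void *arg) {
  unsigned char buf[64];
  (void)arg;
  for (int i = 0; i < FIX_ITERS; i++) {
    fix_account_alloc(sizeof buf);
    fix_scrub(buf, sizeof buf);
    fix_account_free(sizeof buf);
  }
  return NULL;
}

/* Runs the workers; returns the total number of accounted allocations. */
static size_t fix_run(void) {
  pthread_t threads[FIX_THREADS];
  for (int i = 0; i < FIX_THREADS; i++) {
    int rc = pthread_create(&threads[i], NULL, fix_worker, NULL);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < FIX_THREADS; i++) pthread_join(threads[i], NULL);
  return fix_heap_allocs;
}
```

With both changes the counters stay exact under concurrency and no PRNG state is shared, so a ThreadSanitizer build of this model should report nothing. The thread-local variant trades a few words of state per thread for not lengthening the critical section.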

Impact

This is a memory-safety data race in a security library, reachable from ordinary multi-threaded use of keyed connections (any app that opens/closes/keys several encrypted connections across threads). It corrupts a process-global heap shared by all connections, so the resulting crash can surface far from the racing thread. Happy to share the ThreadSanitizer log and a trimmed standalone reproducer.
