Data race: process-global `xoshiro_s` PRNG and `private_heap_*` counters mutated outside `SQLCIPHER_MUTEX_MEM` — heap corruption under concurrent keyed-connection use #598

## Summary

SQLCipher 4.7+ routes all cipher-context and key-buffer allocations through a process-global "private heap," and scrubs freed memory with a process-global xoshiro256++ PRNG. The private-heap free-list traversal is correctly guarded by `SQLCIPHER_MUTEX_MEM`, but two pieces of shared mutable state that the same code touches are accessed **outside** that mutex:

1. **`xoshiro_s[4]`** (the scrub PRNG state) is mutated by `xoshiro_next()` with **no** synchronization.
2. The **`private_heap_*` statistics counters** (`private_heap_used`, `private_heap_alloc`, `private_heap_allocs`, `private_heap_hwm`, `private_heap_overflow`, `private_heap_overflows`) are updated **after** `sqlite3_mutex_leave(SQLCIPHER_MUTEX_MEM)` in both `sqlcipher_malloc` and `sqlcipher_free`.

Because every keyed connection shares this single global heap/PRNG, any two threads that **concurrently allocate or free a codec context** race these globals. The simplest trigger is concurrent **open/close** of keyed connections, but the same `sqlcipher_malloc`/`sqlcipher_free` traffic also fires on the **deferred read- and write-context key derivation** (`sqlcipher_codec_key_derive` → `sqlcipher_cipher_ctx_copy`, on first use of each context) and on `ATTACH ... KEY` — so a steady-state workload that mixes opens, attaches, and first reads/writes across threads triggers it too. In a build compiled with `SQLITE_THREADSAFE=1`, this manifests as heap corruption and intermittent `SIGSEGV`/`SIGBUS` inside `sqlcipher_malloc` / `sqlcipher_free` / `sqlcipher_shield` / `sqlcipher_page_cipher`, typically at the next allocation, free, or page-codec operation.

This is latent for typical usage (one keyed connection, low thread concurrency) but reliably triggered by workloads that open/close/key **many** keyed connections across a thread pool.

## Affected versions / environment

- **SQLCipher:** the racy machinery (private heap + `sqlcipher_shield` + freed-memory scrub) was introduced in the **4.7.0** memory redesign — it is absent in 4.6.1 (the last pre-4.7 release) and present, **unchanged, through the current release 4.16.0** (verified by diffing `src/sqlcipher.c` at the v4.14.0 / v4.15.0 / v4.16.0 tags — byte-for-byte identical; the 4.16.0 "allocation issue" changelog entry was an unrelated LibTomCrypt stack-allocation + logging change). Reproduced and TSan-confirmed on **4.14.0 community** (`CIPHER_VERSION_NUMBER 4.14.0`, SQLite base 3.51.3), the version the Rust `libsqlite3-sys 0.38` binding vendors.
- **Crypto provider:** OpenSSL 3.x (`SQLCIPHER_CRYPTO_OPENSSL`). Provider-independent — the racy code is in the core memory subsystem, not the provider.
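- **Threading:** `SQLITE_THREADSAFE=1` (static mutexes are real, not no-ops). Connections opened with `SQLITE_OPEN_NOMUTEX`, which is the default open flag set in `rusqlite`; SQLite's own default under `SQLITE_THREADSAFE=1` is serialized mode.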
- **Build:** bundled/amalgamation compile (observed via Rust `libsqlite3-sys 0.38` / `rusqlite 0.40`, but this is pure C — not Rust-specific).
- **Platform observed:** macOS arm64. The race is platform-independent; arm64 pointer authentication just makes the corrupted-pointer dereference fault promptly (`KERN_INVALID_ADDRESS ... (possible pointer authentication failure)`).

## The racy code (`src/sqlcipher.c`; line numbers at the current release **v4.16.0**, unchanged since 4.7.0)

- **`xoshiro_s`** — `static volatile uint64_t xoshiro_s[4]` (decl line **205**); `xoshiro_next()` (lines **211–224**) mutates it with no lock. It is called from `xoshiro_randomness()`, which runs on every freed-memory scrub:
  - inside `sqlcipher_free` (under the MEM mutex), **and**
  - inside the overflow fallback `sqlcipher_internal_free` (scrub at line **881**), which runs **after** the MEM mutex is released, **and**
  - during heap/shield init.
  Concurrent calls from different threads tear `xoshiro_s`.
- **`private_heap_*` counters** — in `sqlcipher_malloc` the free-list mutation is inside `sqlite3_mutex_enter/leave(SQLCIPHER_MUTEX_MEM)`, but the `private_heap_overflow/overflows/used/hwm/alloc/allocs` updates happen at lines **950–961**, **after** `sqlite3_mutex_leave` at line **944**. Same in `sqlcipher_free` (`private_heap_used -= …` at line **1023**, after the mutex leave at line **1013**). An unsynchronized read-modify-write of shared globals on every alloc and free.
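The ordering problem in the counters bullet can be modeled in a few lines. This is a minimal sketch, not SQLCipher's code: every `model_*` name is hypothetical, a pthread mutex stands in for `SQLCIPHER_MUTEX_MEM`, and the free-list work is elided. It only contrasts bookkeeping done after the unlock with bookkeeping done before it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT SQLCipher's real symbols. */
static pthread_mutex_t model_mem_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays SQLCIPHER_MUTEX_MEM */
static size_t model_heap_used;   /* plays private_heap_used   */
static size_t model_heap_allocs; /* plays private_heap_allocs */
static size_t model_heap_hwm;    /* plays private_heap_hwm    */

/* Shape described above: the lock is released, then shared counters change.
 * Two threads here can lose updates; shown for contrast, never called below. */
void model_account_racy(size_t sz) {
    pthread_mutex_lock(&model_mem_mutex);
    /* ... free-list search and block split would happen here ... */
    pthread_mutex_unlock(&model_mem_mutex);
    model_heap_used += sz; /* unsynchronized read-modify-write */
    model_heap_allocs += 1;
    if (model_heap_used > model_heap_hwm) model_heap_hwm = model_heap_used;
}

/* Same work with the bookkeeping moved before the unlock. */
void model_account_locked(size_t sz) {
    pthread_mutex_lock(&model_mem_mutex);
    /* ... free-list search and block split would happen here ... */
    model_heap_used += sz;
    model_heap_allocs += 1;
    if (model_heap_used > model_heap_hwm) model_heap_hwm = model_heap_used;
    pthread_mutex_unlock(&model_mem_mutex);
}

static void *model_worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) model_account_locked(16);
    return NULL;
}

size_t model_used(void) { return model_heap_used; }

/* Runs the locked variant on nthreads threads and returns the alloc count. */
size_t model_run_locked(int nthreads, int iters) {
    pthread_t tid[64];
    assert(nthreads > 0 && nthreads <= 64);
    model_heap_used = model_heap_allocs = model_heap_hwm = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, model_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tid[i], NULL);
    return model_heap_allocs; /* safe to read: all workers have been joined */
}
```

Calling `model_run_locked(4, 10000)` from a `main` gives an exact count because every update happens while the mutex is held. Pointing the worker at `model_account_racy` instead is the variant a `-fsanitize=thread` build would be expected to flag.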

*(In the amalgamated `sqlite3.c` that language bindings build from, these live around `sqlite3.c:109625`–`110400`. There is no fixed upstream version to upgrade to — the machinery is identical at the current v4.16.0.)*

## This is one instance of a broader pattern: `volatile` treated as thread-safety

A source audit of `src/sqlcipher.c` (the offsets below are from the amalgamated `sqlite3.c`) found this is not an isolated slip but a **repeated assumption**: ~20 process-global mutables are declared `volatile` and accessed without a guarding mutex. `volatile` in C provides neither atomicity nor a happens-before edge, so each is a data race when touched from multiple threads. By severity:

- **Hot path (crash-class) — this issue:** `xoshiro_s` + the `private_heap_*` counters (~109625, 109724–109731), touched on every `sqlcipher_malloc`/`sqlcipher_free`.
- **Lifecycle / teardown (crash/UAF-class):** `sqlcipher_log_file` (`FILE*`, ~109697) is `fclose`'d and reassigned in `sqlcipher_set_log` with no mutex while `sqlcipher_log` reads it and calls `fprintf`; `default_provider` (~109690) and `sqlcipher_shield_mask` (~109704) are freed + NULLed in `sqlcipher_extra_shutdown` (~110022/110039) with no provider mutex while `codec_ctx_init` / `sqlcipher_shield` read them on other threads.
- **Config (torn/stale-value):** `default_flags`, `hmac_salt_mask`, `default_kdf_iter`, `default_page_size`, `default_plaintext_header_size`, `default_hmac_algorithm`, `default_kdf_algorithm` (~109679–109685) and `sqlcipher_mem_security_on`/`_executed` (~109686/109687) — written by `cipher_default_*` / `cipher_memory_security` PRAGMAs under only the issuing connection's `db->mutex`, read by other connections' `codec_ctx_init` / the wrapped allocator with no shared lock.

The private-heap/PRNG pair surfaces first because it sits on the per-page/lifecycle hot path; the others are the same defect on colder paths (logging/teardown/config).
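For the scalar globals, one alternative to a mutex is C11 atomics. The sketch below assumes a C11 toolchain with `<stdatomic.h>`; the `stat_*` names are hypothetical and do not exist in SQLCipher. It shows why a plain `volatile` counter is not enough: the add must be a single atomic read-modify-write, and a high-water mark needs a compare-exchange loop because it depends on the value another thread may have just written.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters. `volatile uint64_t` would only force each access to
 * hit memory; it would not make `+=` indivisible or order it across threads. */
static _Atomic uint64_t stat_used;
static _Atomic uint64_t stat_hwm;

void stat_add(uint64_t sz) {
    /* Atomic read-modify-write; relaxed order is enough for a statistic. */
    uint64_t now = atomic_fetch_add_explicit(&stat_used, sz, memory_order_relaxed) + sz;
    uint64_t seen = atomic_load_explicit(&stat_hwm, memory_order_relaxed);
    /* Raise the high-water mark only if ours is larger; on failure the
     * compare-exchange reloads `seen` with the current value and we retry. */
    while (seen < now &&
           !atomic_compare_exchange_weak_explicit(&stat_hwm, &seen, now,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
    }
}

void stat_sub(uint64_t sz) {
    atomic_fetch_sub_explicit(&stat_used, sz, memory_order_relaxed);
}

uint64_t stat_get_used(void) { return atomic_load(&stat_used); }
uint64_t stat_get_hwm(void) { return atomic_load(&stat_hwm); }

static void *stat_worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        stat_add(16); /* models an allocation */
        stat_sub(16); /* models the matching free */
    }
    return NULL;
}

/* Runs paired add/sub on nthreads threads, then returns the final usage. */
uint64_t stat_run(int nthreads, int iters) {
    pthread_t tid[64];
    assert(nthreads > 0 && nthreads <= 64);
    atomic_store(&stat_used, 0);
    atomic_store(&stat_hwm, 0);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, stat_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tid[i], NULL);
    return stat_get_used();
}
```

Atomics suit independent scalars such as the statistics and the `default_*` values. They do not help the pointer-lifetime cases (`sqlcipher_log_file`, `default_provider`, `sqlcipher_shield_mask`), where a reader can still hold a pointer that another thread frees; those need a lock or a reference scheme.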

## Reproduction

Minimal C-level repro (no application code needed):

1. Open **N** keyed connections (`sqlite3_open` + `PRAGMA key = '...'`), N ≥ ~8.
2. From a thread pool (≥ 8 threads), in a tight loop, concurrently **open and close** keyed connections (and/or run trivial statements). **Distinct DB files per thread are sufficient**; a shared DB file is *not* required, which confirms the race is in the process-global heap/PRNG, not in any per-database state.
3. Build with `-fsanitize=thread` (ThreadSanitizer), `SQLITE_THREADSAFE=1`.

Within seconds, ThreadSanitizer reports data races. Without TSan, a long enough run intermittently `SIGSEGV`s/`SIGBUS`es inside `sqlcipher_*`.

We built the bundled C with `-fsanitize=thread` and confirmed:

- **~1674 races in a short run**, the large majority naming **`xoshiro_s`**, the remainder naming the **`private_heap_*`** counters.
- Race stacks root at **`sqlite3_close`** (close path → `sqlcipher_codec_ctx_free` → `sqlcipher_free`) and **`sqlite3_key_v2`** (open path → `sqlcipherCodecAttach` → `sqlcipher_codec_ctx_init` → `sqlcipher_malloc`).

Representative real (non-TSan) faulting stacks observed:

```
# open path
sqlcipher_malloc
sqlcipher_codec_ctx_init
sqlcipherCodecAttach
sqlite3_key_v2
sqlite3Pragma
... sqlite3_prepare_v3

# close path
sqlcipher_free
sqlcipher_codec_ctx_free
sqlite3FreeCodecArg
sqlite3PagerClose
sqlite3BtreeClose
... sqlite3_close

# steady-state page codec (collateral, after heap is corrupted)
sqlcipher_shield
sqlcipher_page_hmac / sqlcipher_page_cipher
sqlite3Codec
... sqlite3_step
```

Faults present as `SIGSEGV`/`SIGBUS` at garbage / non-canonical addresses (the classic "freed/overwritten then dereferenced" signature).

## Notes

- **SQLite-level serialization does not cover it.** Setting `SQLITE_OPEN_FULLMUTEX` and/or `sqlite3_config(SQLITE_CONFIG_SERIALIZED)` does **not** prevent the race — these globals live outside SQLite's own mutex framework.
- `PRAGMA cipher_memory_security` is unrelated (and can only be enabled, not disabled); the racy private-heap/scrub path is always-on in the open-source build and is not gated by that pragma.

## Suggested fix

The fix is small and should not measurably affect the hot path, because `SQLCIPHER_MUTEX_MEM` is already held immediately before each racy access:

1. **`private_heap_*` counters:** move the stat bookkeeping to *before* `sqlite3_mutex_leave(SQLCIPHER_MUTEX_MEM)` in `sqlcipher_malloc`/`sqlcipher_free` (or make the counters `_Atomic`).
2. **`xoshiro_s`:** serialize all `xoshiro_randomness()` calls under `SQLCIPHER_MUTEX_MEM` (including the `sqlcipher_internal_free` fallback, which currently scrubs outside it), **or** make the PRNG state thread-local, **or** use atomics. Thread-local is the most contention-friendly.
3. **More broadly:** apply the same to the whole `volatile`-global set above — guard each global's reads + writes with the appropriate existing `SQLCIPHER_MUTEX_*` (the provider mutex already exists for `default_provider`; a config mutex would cover the `default_*`/log set), or make the scalars `_Atomic`. `volatile` is not a substitute for either.
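The thread-local option in item 2 could look roughly like the following. This is a sketch under stated assumptions, not a patch: the `scrub_*` names are hypothetical, `_Thread_local` requires C11, and the splitmix64 seeding from a caller-supplied integer is only for illustration. A real change would seed each thread's state from the crypto provider's random source, the same way the existing global state is seeded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One PRNG state per thread, so concurrent scrubs never share memory. */
static _Thread_local uint64_t scrub_state[4];

static uint64_t scrub_rotl(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* splitmix64, used here only to expand a seed into four state words. */
static uint64_t scrub_splitmix64(uint64_t *x) {
    uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* ASSUMPTION: illustrative seeding; production code would use real entropy.
 * The state must not be all zero, which this expansion avoids in practice. */
void scrub_seed(uint64_t seed) {
    for (int i = 0; i < 4; i++) scrub_state[i] = scrub_splitmix64(&seed);
}

/* xoshiro256++ step, operating on the calling thread's state only. */
uint64_t scrub_next(void) {
    uint64_t *s = scrub_state;
    const uint64_t result = scrub_rotl(s[0] + s[3], 23) + s[0];
    const uint64_t t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = scrub_rotl(s[3], 45);
    return result;
}

/* Overwrites n bytes at buf with PRNG output, 8 bytes at a time. */
void scrub_fill(void *buf, size_t n) {
    unsigned char *p = buf;
    while (n > 0) {
        uint64_t r = scrub_next();
        size_t k = n < sizeof r ? n : sizeof r;
        memcpy(p, &r, k);
        p += k;
        n -= k;
    }
}
```

The trade-off is that every thread that frees memory needs its state seeded before first use, for example lazily behind a thread-local "seeded" flag. In exchange, the scrub in the overflow fallback no longer needs to reacquire `SQLCIPHER_MUTEX_MEM`.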

## Impact

This is a memory-safety data race in a security library, reachable from ordinary multi-threaded use of keyed connections (any app that opens/closes/keys several encrypted connections across threads). It corrupts a process-global heap shared by all connections, so the resulting crash can surface far from the racing thread. Happy to share the ThreadSanitizer log and a trimmed standalone reproducer.

