Reliability and failure model¶
This page is the reference for what liel takes responsibility for, and where
the failure model intentionally stops.
liel is not a server database. Its durability path is designed around a narrow
model: one writer process writing one local .liel file, with explicit
commit() calls.
For the feature catalogue, see feature list. For the byte-level format, see format spec. For why this scope is intentional, see product trade-offs.
1. Short version¶
Use liel when this model fits:
- One process owns writes to a
.lielfile. - Writes are grouped and finalized with
commit()orwith db.transaction():. - The file lives on a normal local filesystem.
- If the process crashes, reopening the file should return to the last committed state or replay a complete WAL.
Do not rely on liel for:
- Multiple processes mutating the same
.lielfile at the same time. - Server-style concurrent writer workloads.
- Filesystems that do not preserve ordinary write and fsync ordering.
- Keeping changes that were never committed.
2. What is guaranteed¶
2.1 Committed data is the unit of durability¶
commit() is the durability boundary. Before commit(), changes belong to the
current GraphDB session and are not guaranteed to survive a process crash.
After a successful commit(), those changes are intended to survive crash and
reopen.
Recommended pattern:
with db.transaction():
db.add_node(["Task"], title="write docs")
db.add_node(["Decision"], text="keep the API small")
For larger imports, commit in explicit batches sized for the WAL reservation:
pending = 0
for row in rows:
db.add_node(["Record"], **row)
pending += 1
if pending >= 20_000:
db.commit()
pending = 0
if pending:
db.commit()
2.2 Interrupted commits are recovered on open¶
liel uses a fixed-location, page-level WAL. On commit, modified pages are
written to the WAL, the WAL is fsynced, pages are copied to their canonical
locations, and the data file is fsynced.
If the process stops while the WAL is still live, the next open replays complete WAL entries back into the data file. The byte layout is specified in format spec ยง6.
2.3 Double-open inside one process is rejected¶
Within one Python process, opening the same .liel path twice for writing
returns AlreadyOpenError.
db1 = liel.open("memory.liel")
db2 = liel.open("memory.liel") # AlreadyOpenError
Close the previous handle, or let its with block exit, before reopening.
2.4 Corrupt files fail closed¶
Files that cannot be interpreted safely should not be read silently.
Examples:
- Invalid magic bytes.
- Header checksum mismatch.
- Unsupported layout metadata or format version.
- Truncated files.
- WAL entries that fail validation.
- Invalid property, slot, or extent references.
These surface as GraphDBError subclasses such as CorruptedFileError.
3. What is not guaranteed¶
3.1 Multi-process writers are rejected¶
When another process tries to open the same .liel file for writing, liel
uses a <file>.lock/ directory to reject the second writer. This is not
concurrent write support; it is fail-closed protection against file corruption.
Recommended pattern:
- Put one process in charge of writes to a
.lielfile. - If several tools or agents need to update it, route writes through an MCP server, application service, job queue, or another owner process.
- Do not let each producer open and mutate the same
.lielfile directly.
If a writer process crashes and leaves .lock/ behind, the next open() checks
the PID stored in owner.json. If the owner is clearly dead, liel reclaims the
stale lock with rename -> delete -> recreate. If the owner is alive or cannot
be checked safely, open() fails with AlreadyOpenError.
3.2 Uncommitted changes may be lost¶
If the process exits before commit(), those changes are outside the durability
contract. Reopening the file returns to the last committed state.
Use rollback() when you want to discard in-session changes explicitly.
3.3 Network and sync filesystems are outside the comfort zone¶
liel relies on the local filesystem preserving ordinary write and fsync
ordering. Cloud sync folders, network filesystems, virtual filesystems, and
aggressive antivirus or backup tools can change the real failure behavior.
For durable application state, prefer a normal local disk. Copy the .liel file
for backup only after the writer has closed or after coordinating explicitly
with the writer process.
3.4 WAL capacity is finite¶
The current WAL reservation is fixed at 4 MiB. A single transaction that would exceed that reservation returns a transaction error before it can be committed.
Split very large imports into batches. This is intentional: it keeps recovery small and predictable.
4. Failure modes table¶
| Failure | Expected behavior |
|---|---|
Process exits before commit() |
Uncommitted changes are lost; reopen returns to the last committed state |
| Process exits before WAL write | Reopen returns to the last committed state |
| Process exits after WAL fsync but before data-page apply | Next open replays the WAL |
| Process exits while applying data pages | Next open replays the remaining WAL |
| WAL entry is truncated or fails CRC | Recovery accepts complete entries only; invalid tail is not applied |
| Header checksum mismatch | CorruptedFileError |
| Unsupported format version | Explicit compatibility error or CorruptedFileError |
| Same file opened twice in one process | AlreadyOpenError |
| Same file opened by two writer processes | Lock directory rejects the second writer with AlreadyOpenError |
.lock/ remains after writer crash |
Next open() reclaims it if the owner PID is dead |
| Transaction exceeds WAL reservation | Transaction error; retry with smaller batches |
5. Operational recommendations¶
- Keep one writer process per
.lielfile. - Group bulk inserts inside
with db.transaction():or explicit batch commits. - Avoid very frequent tiny commits.
- Do not write directly to
.lielfiles inside cloud sync folders. - Copy a
.lielfile for backup only after the writer has closed, or after coordinating with the writer process. - For MCP integration, route writes through one MCP server or owner process rather than letting multiple clients open the file directly.
6. Compatibility policy¶
6.1 API compatibility¶
liel is published as Development Status :: 4 - Beta during the current
0.x series. The Beta compatibility surface is Python-first:
liel.openGraphDBlifecycle, CRUD, traversal, transaction, QueryBuilder, merge, maintenance, and metadata methods documented in the Python guideNode,Edge,NodeQuery,EdgeQuery,Transaction, andMergeReport- the main exception classes under
GraphDBError
Breaking changes may still happen before 1.0, but changes to this surface
should be recorded in the changelog with migration notes.
Rust internals and helper APIs that are not documented in the Python guide may
change more freely.
6.2 On-disk format compatibility¶
The canonical .liel file format lives in format spec.
During 0.x, breaking format changes may happen, but they must be explicit:
the format version should be checked, unsupported future formats should fail
closed, and release notes should say whether existing
files remain readable or need migration.
An older liel should not silently read an unknown future format version.
Unsupported formats fail closed.
7. How to describe reliability¶
Good short description:
lielis a portable external-brain store built on a graph engine that uses a page-level WAL to recover committed writes under a single-writer, single-file, local-filesystem model.
Avoid saying:
lielis a server-grade, fully concurrent database.
The accurate claim is narrower and stronger: liel is not trying to make
concurrent writers safe; it is trying to keep local single-file graph memory
recoverable and easy to reason about.