fix: add asyncio.Lock to Context to prevent concurrent file write errors on Windows #1438
howardpen9 wants to merge 1 commit into MoonshotAI:main
…ors on Windows Fixes MoonshotAI#1429 — On Windows, multiple coroutines opening the same session file concurrently triggers PermissionError: [Errno 13] due to OS-level file locking. Adding an asyncio.Lock serializes all file I/O operations within each Context instance, preventing the race condition.
  self._history.extend(messages)

- async with aiofiles.open(self._file_backend, "a", encoding="utf-8") as f:
-     for message in messages:
-         await f.write(message.model_dump_json(exclude_none=True) + "\n")
+ async with self._file_lock:
+     async with aiofiles.open(self._file_backend, "a", encoding="utf-8") as f:
+         for message in messages:
+             await f.write(message.model_dump_json(exclude_none=True) + "\n")
🔴 In-memory state modifications in append_message and update_token_count are outside the file lock, causing potential file-memory divergence
The PR adds self._file_lock to serialize file access, but append_message modifies self._history at line 210 before acquiring the lock, and update_token_count modifies self._token_count at line 219 before acquiring the lock. Meanwhile, clear() and revert_to() both reset these same fields inside the lock (e.g., self._history.clear() at context.py:202, self._token_count = 0 at context.py:203). If a concurrent coroutine holds the lock (e.g., clear() is doing file I/O inside the lock), append_message can execute self._history.extend(messages) and then block on the lock. When clear() then runs self._history.clear() (still under the lock), it undoes the extend. After clear() releases the lock, append_message acquires it and writes the message to the now-empty file — leaving the file with a message that the in-memory history doesn't contain.
The same inconsistency applies to update_token_count at line 219. Compare this with clear() and revert_to(), which correctly place all state mutations inside the lock. The in-memory mutations should also be moved inside self._file_lock for consistency.
Hunk under review:
  self._history.extend(messages)
- async with aiofiles.open(self._file_backend, "a", encoding="utf-8") as f:
-     for message in messages:
-         await f.write(message.model_dump_json(exclude_none=True) + "\n")
+ async with self._file_lock:
+     async with aiofiles.open(self._file_backend, "a", encoding="utf-8") as f:
+         for message in messages:
+             await f.write(message.model_dump_json(exclude_none=True) + "\n")
Suggested change:
  async with self._file_lock:
      self._history.extend(messages)
      async with aiofiles.open(self._file_backend, "a", encoding="utf-8") as f:
          for message in messages:
              await f.write(message.model_dump_json(exclude_none=True) + "\n")
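The interleaving described in this comment can be reproduced with a toy model. The sketch below is not the project's code: `ToyContext`, `append_unsafe`, `append_safe`, and `race` are hypothetical names, and `asyncio.sleep` plus blocking standard-library writes stand in for the `aiofiles` calls. It only shows how mutating the history before acquiring the lock lets `clear()` undo the in-memory change while the file write still happens afterwards.

```python
import asyncio
import os
import tempfile


class ToyContext:
    """Hypothetical stand-in for Context; not the project's real class."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._history: list[str] = []
        self._file_lock = asyncio.Lock()

    async def clear(self) -> None:
        async with self._file_lock:
            await asyncio.sleep(0.01)  # simulated file I/O; lets other coroutines run
            open(self._path, "w").close()  # truncate the backing file
            self._history.clear()

    async def append_unsafe(self, line: str) -> None:
        self._history.append(line)  # mutated BEFORE the lock is held
        async with self._file_lock:
            with open(self._path, "a", encoding="utf-8") as f:
                f.write(line + "\n")

    async def append_safe(self, line: str) -> None:
        async with self._file_lock:
            self._history.append(line)  # mutated together with the file write
            with open(self._path, "a", encoding="utf-8") as f:
                f.write(line + "\n")

    def file_lines(self) -> list[str]:
        with open(self._path, encoding="utf-8") as f:
            return f.read().splitlines()


async def race(method_name: str) -> tuple[list[str], list[str]]:
    """Run clear() concurrently with one append; return (history, file lines)."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        ctx = ToyContext(path)
        clearing = asyncio.create_task(ctx.clear())
        await asyncio.sleep(0)  # let clear() take the lock and start its "I/O"
        await getattr(ctx, method_name)("hello")
        await clearing
        return list(ctx._history), ctx.file_lines()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("unsafe:", asyncio.run(race("append_unsafe")))
    print("safe:  ", asyncio.run(race("append_safe")))
```

In this setup the unsafe variant ends with an empty in-memory history and one line on disk, while the safe variant keeps both in agreement.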
          await tmp_f.write(chunk)
      await aiofiles.os.replace(tmp_path, self._file_backend)

  self._system_prompt = prompt
🟡 write_system_prompt sets self._system_prompt after releasing the lock, creating a window for state inconsistency
At line 104, self._system_prompt = prompt is set after the async with self._file_lock: block exits. This is the opposite ordering from append_message (which mutates state before the lock), but both are wrong: neither is atomic with the file write. If clear() runs between the lock release at line 102 and line 104, it will set self._system_prompt = None inside its lock block (context.py:205), and then this line will overwrite it to prompt — leaving in-memory state saying there's a system prompt while the file was cleared.
For consistency with clear() and revert_to() (which correctly put all mutations inside the lock), self._system_prompt = prompt should be moved inside the async with self._file_lock: block.
Suggested change:
- self._system_prompt = prompt
+     self._system_prompt = prompt
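A minimal sketch of the suggested ordering, with the assignment kept inside the locked block. `ToyContext`, `consistent`, and `demo` are hypothetical names, and the standard-library file calls stand in for the `aiofiles` operations in the PR. Whether the window described in the comment can actually be hit depends on whether anything awaits between the lock release and the assignment; keeping the assignment under the lock removes that question and matches `clear()`.

```python
import asyncio
import os
import tempfile
from typing import Optional


class ToyContext:
    """Hypothetical stand-in for Context; not the project's real class."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._system_prompt: Optional[str] = None
        self._file_lock = asyncio.Lock()

    async def write_system_prompt(self, prompt: str) -> None:
        async with self._file_lock:
            await asyncio.sleep(0)  # simulated file I/O
            with open(self._path, "w", encoding="utf-8") as f:
                f.write(prompt + "\n")
            self._system_prompt = prompt  # updated while the lock is still held

    async def clear(self) -> None:
        async with self._file_lock:
            await asyncio.sleep(0)  # simulated file I/O
            open(self._path, "w").close()  # truncate the backing file
            self._system_prompt = None

    def consistent(self) -> bool:
        """True when the prompt on disk matches the in-memory prompt."""
        with open(self._path, encoding="utf-8") as f:
            on_disk = f.read().strip() or None
        return on_disk == self._system_prompt


async def demo() -> tuple[Optional[str], bool]:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        ctx = ToyContext(path)
        # Three operations contend for the same lock and run one at a time.
        await asyncio.gather(
            ctx.write_system_prompt("a"),
            ctx.clear(),
            ctx.write_system_prompt("b"),
        )
        return ctx._system_prompt, ctx.consistent()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```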
Summary
Fixes #1429

On Windows, multiple async coroutines within the same Kimi CLI process can open the session context file (context.jsonl) concurrently, triggering PermissionError: [Errno 13] due to OS-level file locking semantics.

This PR adds an asyncio.Lock to the Context class to serialize all file I/O operations on the backing file, preventing concurrent access conflicts.

Changes
- Add self._file_lock = asyncio.Lock() to Context.__init__
- Guard file I/O with async with self._file_lock: in the following methods:
  - write_system_prompt: write/prepend system prompt
  - checkpoint: append checkpoint record
  - revert_to: rotate + rewrite file
  - clear: rotate + truncate file
  - append_message: append message records
  - update_token_count: append usage record
- restore() is read-only and called once at init, so it does not need the lock

Why asyncio.Lock is sufficient
The root cause is multiple coroutines within the same process racing to open the same file. Since all coroutines share one event loop, asyncio.Lock (not threading.Lock) is the correct synchronization primitive. This serializes file access without blocking the event loop.

Test plan
PermissionError
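
The serialization argument in the description can be checked with a small standalone sketch. It assumes nothing about the real Context class: `guarded_write` and `ticker` are hypothetical coroutines, and `asyncio.sleep` stands in for awaiting file I/O. It counts how many coroutines are inside the locked section at once and confirms that an unrelated coroutine keeps making progress while others wait for the lock.

```python
import asyncio


async def main() -> tuple[int, int]:
    lock = asyncio.Lock()
    active = 0       # coroutines currently inside the guarded section
    max_active = 0   # highest overlap observed
    ticks = 0        # progress made by an unrelated coroutine

    async def guarded_write() -> None:
        nonlocal active, max_active
        async with lock:
            active += 1
            max_active = max(max_active, active)
            await asyncio.sleep(0.001)  # stand-in for awaiting file I/O
            active -= 1

    async def ticker() -> None:
        nonlocal ticks
        for _ in range(5):
            await asyncio.sleep(0.001)  # never touches the lock
            ticks += 1

    await asyncio.gather(ticker(), *(guarded_write() for _ in range(10)))
    return max_active, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

At most one coroutine is ever inside the guarded section, and the ticker completes all of its iterations, which is the behavior the description relies on. Note that this only covers coroutines in one event loop; it says nothing about other threads or processes touching the same file.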