feat: add is_finished method to RequestQueueClient interface#1982
feat: add is_finished method to RequestQueueClient interface#1982Mantisus wants to merge 6 commits into
is_finished method to RequestQueueClient interface#1982Conversation
janbuchar
left a comment
There was a problem hiding this comment.
Looks sound, the only thing I'm mildly concerned about is the removal of the is_empty return value caching — I'd feel better if I knew some historical context and why it's safe to do this.
| # If we have a cached value, return it immediately. | ||
| if self._is_empty_cache is not None: | ||
| return self._is_empty_cache | ||
|
|
||
| state = self._state.current_value | ||
|
|
||
| # If there are in-progress requests, return False immediately. | ||
| if len(state.in_progress_requests) > 0: | ||
| self._is_empty_cache = False | ||
| return False | ||
|
|
||
| # If we have a cached requests, check them first (fast path). | ||
| if self._request_cache: | ||
| for req in self._request_cache: | ||
| if req.unique_key not in state.handled_requests: | ||
| self._is_empty_cache = False | ||
| return False | ||
| self._is_empty_cache = True | ||
| return len(state.in_progress_requests) == 0 | ||
|
|
There was a problem hiding this comment.
Are you sure it's safe to remove all this?
There was a problem hiding this comment.
My bad. I was sure that, in the current implementation of atomic_write, it guarantees that the metadata file is always valid. However, I just double-checked this, and on Windows, there is still a risk of creating an invalid metadata file that would cause the queue to fail.
There was a problem hiding this comment.
Updated with cache logic
vdusek
left a comment
There was a problem hiding this comment.
Thanks! Could you please implement apify/apify-sdk-python#987 as well?
That way, we can verify that it works on the platform too.
I'd like this to be part of SDK v4, built on top of Crawlee v1.8 with this included.
Sure. |
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
Description
is_finishedmethod to theRequestQueueClientinterface. It's not a breaking change. The base class defaultsis_finishedtois_emptywhen it isn't overridden. Splits the queue predicates.is_emptymeans there are no requests available to fetch.is_finishedmeans all requests are fully processed (nothing to fetch and nothing in progress).is_finishedin all built-inRequestQueueClient.is_emptyin all built-inRequestQueueClientto match the new meaning.AutoscaledPool. It now skips scheduling new tasks when no requests are available to fetch. For example, while the only request is being processed.Issues
is_finishedmethod to theStorageClientinterface #1966Testing
RequestQueuethat checks theis_emptyandis_finishedvalues through the queue lifecycle. It covers all built-in clients.AutoscaledPooldoesn't schedule new tasks while the only request is being processed.