feat!: Implement RequestManagerTandem, remove add_request from RequestList, accept any iterable in RequestList constructor#777
Merged
Conversation
janbuchar
commented
Dec 3, 2024
RequestProviderTandem, remove addRequest from RequestList
RequestProviderTandem, remove addRequest from RequestListRequestSourceTandem, remove addRequest from RequestList
Pijukatel
reviewed
Dec 4, 2024
Pijukatel
reviewed
Dec 4, 2024
vdusek
requested changes
Dec 6, 2024
vdusek
left a comment
Collaborator
There was a problem hiding this comment.
Based on the tests, it seems to be working, and I like how it's used. Good job. However, I have a few comments (& maybe concerns)...
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
vdusek
reviewed
Dec 16, 2024
|
|
||
| ## Processing requests from multiple sources | ||
|
|
||
| In some cases, you might need to combine requests from multiple sources, most frequently from a static list of URLs and a <ApiLink to="class/RequestQueue">`RequestQueue`</ApiLink>, where the queue takes care of persistence and retrying failed requests. |
Collaborator
There was a problem hiding this comment.
Maybe mention and add the link to the RequestList as well?
janbuchar
commented
Dec 17, 2024
vdusek
reviewed
Dec 19, 2024
vdusek
left a comment
Collaborator
There was a problem hiding this comment.
Great, thanks! Only a few nits...
And before merging, could you please:
- Update the PR title to match the new version.
- Mention the breaking changes in the upgrading guide.
- We will also need to update the Request storage and Result storage guides to reflect these changes. It doesn't have to happen now, but could you at least open an issue for that?
RequestSourceTandem, remove addRequest from RequestListRequestManagerTandem, remove add_request from RequestList
RequestManagerTandem, remove add_request from RequestListRequestManagerTandem, remove add_request from RequestList, accept any iterable in RequestList constructor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
BasicCrawler(...any further)requestListSourcesinput from the user, which may be pretty complex (regexp-based extraction from remote URL lists), and which is usually parsed usingapify.RequestList.open. At the same time, the Actor wants to use the built inRequestQueue.Breaking changes
RequestListdoes not support.drop(),.reclaim_request(),.add_request()andadd_requests_batched()anymoreRequestManagerTandemwith aRequestQueueshould be used for this use case,await list.to_tandem()can be used as a shortcutRequestProviderinterface has been renamed toRequestManagerand moved to thecrawlee.request_loaderspackageRequestListhas been moved to thecrawlee.request_loaderspackageBasicCrawler.get_request_providermethod has been renamed toBasicCrawler.get_request_managerand it does not accept theidandnamearguments anymorerequest_providerparameter ofBasicCrawler.__init__has been renamed torequest_managerTODO