- Fetching requests from RequestQueue is sometimes very slow and can get stuck for a while.
- I turned on logging and reproduced the issue with the following code:
import asyncio
import logging

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

logging.basicConfig(level=logging.INFO)


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        await context.enqueue_links(strategy='same-hostname')
        data = {
            'request_url': context.request.url,
            'soup_url': context.soup.url,
            'soup_title': context.soup.title.string if context.soup.title else None,
        }
        await context.push_data(data)

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
- In the logs, there are many lines like this:
INFO:crawlee.storages.request_queue:Waiting for 9.988466 for queue finalization, to ensure data consistency.
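To put a number on how much of the run is spent in these waits, something like the following could be added to the reproduction script. It is only a minimal sketch using the standard `logging` module: `WaitTotalHandler` and `wait_handler` are hypothetical names, the logger name is read off the `INFO:<name>:` prefix of the log line above, and the pattern assumes the message keeps exactly that wording. The log message does not state the unit of the duration, so the total is reported as a bare number.

```python
import logging
import re

# Pattern copied from the wording of the log line above (assumed to stay stable).
WAIT_PATTERN = re.compile(r'Waiting for (\d+(?:\.\d+)?) for queue finalization')


class WaitTotalHandler(logging.Handler):
    """Hypothetical helper: counts and sums the waits reported in log messages."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.count = 0
        self.total = 0.0

    def emit(self, record: logging.LogRecord) -> None:
        # Only messages matching the finalization-wait wording are counted.
        match = WAIT_PATTERN.search(record.getMessage())
        if match:
            self.count += 1
            self.total += float(match.group(1))


wait_handler = WaitTotalHandler()
# Logger name taken from the 'INFO:<name>:' prefix of the log line above.
logging.getLogger('crawlee.storages.request_queue').addHandler(wait_handler)
```

After `asyncio.run(main())` returns, printing `wait_handler.count` and `wait_handler.total` shows how many waits were logged and their summed duration, which can be compared against the total run time.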
Questions
- Is this behavior correct?
- Is the waiting period necessary?
- Is it necessary for memory storage as well?