When using EnqueueStrategy.SAME_HOSTNAME, I noticed it does not work properly on non-www URLs.
In the debugger I noticed that origin is passed to _check_enqueue_strategy, but origin is taken from context.request.loaded_url if available.
So every URL that is checked mismatches because of the difference in hostname.

I tested this with multiple URLs, with and without the www prefix, and got the same behaviour.

Changing the line to origin = context.request.url fixes this issue, but I have no idea what implications this would have on the other code.
I use the PlaywrightCrawler in my code with context.enqueue_links.
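
To illustrate the mismatch outside the crawler, here is a minimal standalone sketch. It does not use the library's code: same_hostname is a hypothetical stand-in for the check, the URLs are made-up examples, and it assumes the site redirects the non-www URL to the www one while the links on the page still point at the non-www hostname.

```python
from urllib.parse import urlparse


def same_hostname(origin: str, target: str) -> bool:
    """Hypothetical stand-in for the check: exact hostname equality."""
    return urlparse(origin).hostname == urlparse(target).hostname


# Assumed example values, not taken from a real crawl.
request_url = "https://example.com/"      # URL that was enqueued
loaded_url = "https://www.example.com/"   # URL after an assumed redirect
link = "https://example.com/about"        # link found on the page

# origin taken from loaded_url: hostnames differ, so the link is rejected.
matches_with_loaded_url = same_hostname(loaded_url, link)

# origin taken from the request URL: hostnames are equal, so the link passes.
matches_with_request_url = same_hostname(request_url, link)

print(matches_with_loaded_url)
print(matches_with_request_url)
```

Under these assumptions the first comparison fails and the second succeeds, which matches what I see in the debugger.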