PlaywrightCrawler extract_links does not respect strategy #1212

Description

@phughesion-h3

I was crawling a test site hosted locally (at localhost).

My PlaywrightCrawler subclass needs to attach extra user_data to each request, so I extract the links and build each request manually:

# Inside the request handler: extract links, expecting only same-origin URLs back
new_requests = await context.extract_links(strategy='same-origin')
for new_request in new_requests:
    print(f"[LINK] Extracted link: {new_request.url}")
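For context, `strategy='same-origin'` is expected to keep only links whose scheme, host, and port all match the page being crawled. The sketch below is a simplified stdlib model (not Crawlee's implementation) of that check next to the looser 'same-hostname' and the unfiltered 'all', to make clear which links each should let through. `allowed` and `_origin` are hypothetical names, and 'same-domain' is deliberately left out because it needs a public-suffix list to be modeled correctly.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Ports implied when a URL omits one, so "http://localhost" matches "http://localhost:80".
DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> tuple[str, str, int | None]:
    """(scheme, host, port) triple; urlsplit already lowercases scheme and host."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname or "", port


def allowed(strategy: str, base_url: str, url: str) -> bool:
    """Simplified model of a link-filtering strategy (hypothetical, not Crawlee's code)."""
    if strategy == "all":
        return True
    if strategy == "same-hostname":
        # Host only: scheme and port are ignored.
        return _origin(url)[1] == _origin(base_url)[1]
    if strategy == "same-origin":
        # Scheme, host and port must all match.
        return _origin(url) == _origin(base_url)
    raise ValueError(f"strategy not modeled in this sketch: {strategy!r}")


if __name__ == "__main__":
    base = "http://localhost"
    links = ["http://localhost/page", "https://localhost/", "https://www.youtube.com/"]
    for link in links:
        verdicts = {s: allowed(s, base, link) for s in ("all", "same-hostname", "same-origin")}
        print(link, verdicts)
```

Under this model a YouTube link is rejected by both 'same-hostname' and 'same-origin' when crawling from http://localhost, so only 'all' should let it through.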

I start the crawl at http://localhost, yet the crawler ends up crawling YouTube because my site contains a link to YouTube. With strategy='same-origin', extract_links should have filtered that link out.
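Until `extract_links` honors `strategy`, one possible workaround is to re-check each extracted URL against the current page's origin before attaching user_data and enqueueing it. A minimal sketch using only `urllib.parse`, assuming "same origin" means matching scheme, host, and port with the default ports 80/443 filled in; `origin_of` and `filter_same_origin` are hypothetical helpers, not part of Crawlee.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Ports implied by the scheme when a URL doesn't state one explicitly.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int | None]:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else _DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname or "", port


def filter_same_origin(base_url: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that share base_url's origin, preserving order."""
    base = origin_of(base_url)
    return [u for u in urls if origin_of(u) == base]


if __name__ == "__main__":
    extracted = [
        "http://localhost/about",
        "http://localhost:80/contact",
        "https://www.youtube.com/watch?v=abc",
        "https://localhost/secure",
    ]
    print(filter_same_origin("http://localhost", extracted))
    # ['http://localhost/about', 'http://localhost:80/contact']
```

Inside the request handler, the equivalent guard is to skip any `new_request` for which `origin_of(new_request.url) != origin_of(context.request.url)` before attaching user_data. This only works around the symptom; it does not fix `extract_links` itself.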

Metadata

Assignees

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
