Skip to content

feat: add time_remaining_secs property to MIGRATING event data#940

Merged
fnesveda merged 4 commits into
masterfrom
feat/migrating-event-data-time-remaining
Jan 29, 2025
Merged

feat: add time_remaining_secs property to MIGRATING event data#940
fnesveda merged 4 commits into
masterfrom
feat/migrating-event-data-time-remaining

Conversation

@fnesveda

Copy link
Copy Markdown
Member

The Apify platform now sends a timeRemainingSecs property in the MIGRATING event data, this adds it to the event data model.

@fnesveda fnesveda added adhoc Ad-hoc unplanned task added during the sprint. t-core-services Issues with this label are in the ownership of the core services team. labels Jan 29, 2025
@fnesveda fnesveda added this to the 108th sprint - Platform team milestone Jan 29, 2025
@fnesveda fnesveda requested a review from vdusek January 29, 2025 10:44
@fnesveda fnesveda self-assigned this Jan 29, 2025
@vdusek vdusek self-requested a review January 29, 2025 10:50
Comment thread src/crawlee/events/_types.py Outdated

# The remaining time in seconds before the migration is forced and the process is killed
# Optional because it's not present when the event handler is called manually
time_remaining_secs: Annotated[float | None, Field(alias='timeRemainingSecs')]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use datetime/time types rather than float + secs suffix identifier?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for durations, we usually do timedelta

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very Python-specific, it would be hard to keep parity between JS and Python there.

Plus this way (with seconds) it's easier to use with AutoscaledPool:

Actor.on('migrating', ({ timeRemainingSecs }) => {
    await crawler.autoscaledPool.pause(0.9 * timeRemainingSecs);
    await persistStateSomehow();
    await Actor.reboot();
});

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you want it as timedelta in Python, I can add it, it shouldn't be hard. Only the property names will be slightly different between JS and Python, but I guess it's fine.

@vdusek vdusek Jan 29, 2025

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please. We use timedelta for all time-related fields. For Pydantic models, we have a custom type called timedelta_ms, which includes a custom validator and serializer. You can see it in use e.g. here.

And the property will only differ in Python’s internal code, the env var name can remain unchanged.
Edit: ah, of course, sorry...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an env var, it's data in an Actor event, and it's not internal, since it's something that users will be working with.

I'll rename the field exposed to users to time_remaining, which will be a timedelta, and internally it will be parsed from the timeRemainingSecs property of the received event data.

@vdusek vdusek self-requested a review January 29, 2025 10:54
Comment thread src/crawlee/events/_types.py Outdated

@vdusek vdusek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fnesveda fnesveda merged commit b44501b into master Jan 29, 2025
@fnesveda fnesveda deleted the feat/migrating-event-data-time-remaining branch January 29, 2025 14:28
@fnesveda fnesveda added the validated Issues that are resolved and their solutions fulfill the acceptance criteria. label Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-core-services Issues with this label are in the ownership of the core services team. validated Issues that are resolved and their solutions fulfill the acceptance criteria.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants