feat: degrade gracefully in offline / enterprise environments #337

Open
gtsiolis wants to merge 5 commits into devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a from devx-703-offline-graceful-degradation

gtsiolis commented Jun 23, 2026

Member

Stacked on #325. This PR's base is the image-override branch, so its diff shows offline changes only; it auto-retargets to main when #325 merges. The local-image fallback integration test relies on the image= field introduced in #325.

What

Makes lstk start degrade gracefully in offline / enterprise environments that cannot reach Docker Hub or the license server (offline/air-gapped, corporate proxy, or TLS interception) — without an explicit flag.

  • Image pull: if PullImage fails but the image is already present locally (new runtime.ImageExists), lstk warns "using the local image" and starts the local image instead of failing.
  • License pre-flight: validateLicense now distinguishes a definitive server rejection (*api.LicenseError, e.g. HTTP 403/400 — still fatal) from a transport-level failure (any other error — offline/proxy/cert). On a transport failure it skips the pre-flight check and lets the container validate its own bundled license at startup.
  • runtime.PullImage always closes its progress channel, even when ImagePull fails early, so the local-image fallback path does not leak the progress goroutine.
  • Context cancellation is propagated during the license pre-flight so Ctrl+C aborts cleanly.
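
The image-pull fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `imageRuntime` interface, `ensureImage`, `fakeRuntime`, `startOffline`, and the image name are hypothetical stand-ins, and the real `runtime.PullImage` / `runtime.ImageExists` signatures may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// imageRuntime is a hypothetical, trimmed-down stand-in for lstk's runtime;
// only the two calls named in this PR are modelled, with assumed signatures.
type imageRuntime interface {
	PullImage(ctx context.Context, image string) error
	ImageExists(ctx context.Context, image string) (bool, error)
}

// ensureImage pulls the image and, if the pull fails, falls back to a copy
// that is already present locally. It returns true when the fallback was used.
func ensureImage(ctx context.Context, rt imageRuntime, image string) (bool, error) {
	pullErr := rt.PullImage(ctx, image)
	if pullErr == nil {
		return false, nil
	}
	exists, err := rt.ImageExists(ctx, image)
	if err != nil || !exists {
		// No local copy to fall back to: the pull failure stays fatal.
		return false, fmt.Errorf("failed to pull %s: %w", image, pullErr)
	}
	fmt.Printf("warning: could not pull %s, using the local image\n", image)
	return true, nil
}

// errOffline simulates the kind of transport error an air-gapped host sees.
var errOffline = errors.New("dial tcp: no route to host")

// fakeRuntime simulates a host whose pulls always fail.
type fakeRuntime struct {
	local map[string]bool
}

func (f fakeRuntime) PullImage(context.Context, string) error { return errOffline }

func (f fakeRuntime) ImageExists(_ context.Context, image string) (bool, error) {
	return f.local[image], nil
}

// startOffline runs ensureImage against the fake runtime for a made-up image.
func startOffline(imagePresent bool) (bool, error) {
	const image = "example/image:1.0" // hypothetical image name
	rt := fakeRuntime{local: map[string]bool{image: imagePresent}}
	return ensureImage(context.Background(), rt, image)
}

func main() {
	usedLocal, err := startOffline(true)
	fmt.Println("used local image:", usedLocal, "err:", err)

	_, err = startOffline(false)
	fmt.Println("err without local image:", err)
}
```

Wrapping the original pull error with `%w` keeps the root cause visible when no local copy exists.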

Scope

Second of the two PRs split out of DEVX-703. The custom image override half ships separately in #325. This PR is offline graceful-degradation only.

Note: this addresses the truly offline case, where network requests fail. The separate slow-network / demo case raised in #team-cli (a working-but-slow link where the pull succeeds but takes 20 min) is a different problem: graceful degradation never triggers there because nothing fails. That is being tracked as its own piece of work (--offline to skip the pull when the image is local, plus a cancellable pull).

Tests

  • Unit (internal/container): TestPullImages_FallsBackToLocalImageWhenPullFails, TestPullImages_FailsWhenPullFailsAndImageMissing, TestValidateLicense_ContinuesWhenServerUnreachable, TestValidateLicense_FailsOnServerRejection.
  • Integration: local-image fallback and license-server-unreachable E2E coverage (added on this branch).

Refs DEVX-703

gtsiolis self-assigned this Jun 23, 2026
gtsiolis added the semver: patch and docs: needed (Pull request requires documentation updates) labels Jun 23, 2026
gtsiolis force-pushed the devx-703-offline-graceful-degradation branch from 5428042 to 79b63e1 Compare June 23, 2026 19:28
gtsiolis changed the base branch from main to devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a June 23, 2026 19:28
gtsiolis force-pushed the devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a branch 2 times, most recently from fd896c8 to 696f298 Compare June 25, 2026 10:52
gtsiolis force-pushed the devx-703-offline-graceful-degradation branch 2 times, most recently from 658808b to 70643a3 Compare June 25, 2026 11:07
gtsiolis marked this pull request as ready for review June 25, 2026 11:07
gtsiolis requested a review from a team as a code owner June 25, 2026 11:07
gtsiolis commented Jun 25, 2026

Member Author

Note to reviewers: this is a stacked PR against #325. Cc @anisaoshafi

gtsiolis and others added 5 commits June 26, 2026 14:06
Enterprise environments that cannot reach Docker Hub or the license
server (offline/air-gapped, proxy or TLS interception) hit two hard
failures: image pulls and license validation. Rather than gate this
behind an explicit flag, container.Start now degrades gracefully when an
internet request fails:

- Image pull: if PullImage fails but the image is already available
  locally (via the new runtime.ImageExists), lstk warns and uses the
  local image instead of failing.
- License pre-flight: validateLicense distinguishes a definitive server
  rejection (*api.LicenseError, still fatal) from a transport-level
  failure (offline/proxy/cert), skipping the check on transport failure
  and letting the container validate its own bundled license.

runtime.PullImage always closes its progress channel, even when
ImagePull fails early, so the local-image fallback does not leak the
progress goroutine. Context cancellation is propagated during the
license pre-flight so Ctrl+C aborts cleanly.
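
Why the early close matters can be shown with a small, self-contained sketch. `pullImage`, `progress`, and `pullAndCount` are hypothetical names, and the channel shape is an assumption rather than the real `runtime.PullImage` signature.

```go
package main

import (
	"errors"
	"fmt"
)

// progress is a hypothetical progress update.
type progress struct{ Status string }

// pullImage is a hypothetical stand-in for runtime.PullImage. The deferred
// close runs on every return path, including the early failure.
func pullImage(failEarly bool, ch chan<- progress) error {
	defer close(ch)
	if failEarly {
		return errors.New("image pull failed before any progress was reported")
	}
	ch <- progress{Status: "downloading"}
	ch <- progress{Status: "done"}
	return nil
}

// pullAndCount drains the progress channel in a goroutine and returns how
// many updates it saw. If pullImage skipped the close on failure, the
// goroutine would block in its range loop forever.
func pullAndCount(failEarly bool) (int, error) {
	ch := make(chan progress)
	done := make(chan int)
	go func() {
		n := 0
		for range ch { // exits only once ch is closed
			n++
		}
		done <- n
	}()
	err := pullImage(failEarly, ch)
	return <-done, err
}

func main() {
	n, err := pullAndCount(true)
	fmt.Println("updates:", n, "err:", err)
}
```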

Refs DEVX-703

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
Adds end-to-end coverage for the two graceful-degradation paths that were
previously only exercised by mock-based unit tests:

- TestStartFallsBackToLocalImageWhenPullFails: tags a real LocalStack
  image under an unpullable name so the pull fails but the image exists
  locally, and asserts lstk warns and starts the local image.
- TestStartContinuesWhenLicenseServerUnreachable: points the license
  endpoint at a closed server so the pre-flight fails at the transport
  level, and asserts lstk skips the check and the container still starts.

Both require Docker and a valid LOCALSTACK_AUTH_TOKEN (the container must
activate to become healthy), mirroring TestStartCommandSucceedsWithValidToken.
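
One common way to get a "closed server" in Go tests is to start an `httptest` server, record its URL, and close it before use. The sketch below shows that technique in isolation; it is an assumption about how such a test could be built, not a copy of the integration tests here, and `closedServerURL` and `reachable` are made-up helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// closedServerURL returns the URL of a server that has already been shut
// down, so requests to it should fail at the transport level.
func closedServerURL() string {
	srv := httptest.NewServer(http.NotFoundHandler())
	url := srv.URL
	srv.Close()
	return url
}

// reachable reports whether a GET completes with any HTTP response at all.
func reachable(url string) bool {
	resp, err := http.Get(url)
	if err != nil {
		return false // connection-level failure, no HTTP verdict
	}
	resp.Body.Close()
	return true
}

func main() {
	fmt.Println("closed server reachable:", reachable(closedServerURL()))
}
```

A closed server produces a connection error rather than an HTTP status, which is what exercises the transport-failure path instead of the rejection path.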

Refs DEVX-703

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
On Ctrl+C mid-pull, the cancelled context made the rt.ImageExists
local-image probe fail, so the start surfaced a misleading "Failed to
pull" error and emitted a spurious start-error telemetry event. Guard the
pull-failure path with ctx.Err() so a user cancel propagates cleanly,
mirroring the existing license pre-flight handling.
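
The guard can be pictured as follows. This is a hedged sketch: `handlePullFailure`, `cancelledStartErr`, and `isCancel` are hypothetical names, and the probe function only mimics how an `ImageExists` call might fail once the context is cancelled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// handlePullFailure decides what a failed pull means. localProbe is a
// hypothetical stand-in for the ImageExists check.
func handlePullFailure(ctx context.Context, pullErr error, localProbe func(context.Context) (bool, error)) error {
	// Check for cancellation first: a cancelled context would also make the
	// probe fail and be misreported as a pull failure.
	if err := ctx.Err(); err != nil {
		return err
	}
	if ok, err := localProbe(ctx); err == nil && ok {
		return nil // fall back to the local image
	}
	return fmt.Errorf("failed to pull image: %w", pullErr)
}

// cancelledStartErr simulates Ctrl+C arriving mid-pull.
func cancelledStartErr() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	probe := func(ctx context.Context) (bool, error) { return false, ctx.Err() }
	return handlePullFailure(ctx, errors.New("pull interrupted"), probe)
}

// isCancel reports whether err stems from a user cancel.
func isCancel(err error) bool { return errors.Is(err, context.Canceled) }

func main() {
	fmt.Println("cancel propagated:", isCancel(cancelledStartErr()))
}
```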

Also documents the known limitation that an HTTP error response from the
license server (5xx, or 407 from a proxy) is still treated as a definitive
verdict rather than degrading.

Refs DEVX-703

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
A pinned image that is already present locally is not pulled (pullImages),
so the CLI license pre-flight is now skipped for it too: the redundant
network round-trip would otherwise block an entirely offline start, and the
container validates its own bundled license at startup. This is symmetric
with the existing skip-pull behaviour for local pinned images.

tryPrePullLicenseValidation gains the runtime so it can probe ImageExists;
a probe error is non-fatal and falls through to the pre-flight check.
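
The decision described in this commit could be expressed as a small predicate. The sketch below is an assumption about the shape of that logic: `shouldRunPreflight` and the three probe helpers are hypothetical, and the real `tryPrePullLicenseValidation` may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// shouldRunPreflight reports whether the CLI license pre-flight should run.
// probe is a hypothetical stand-in for the ImageExists check.
func shouldRunPreflight(pinned bool, probe func() (bool, error)) bool {
	if !pinned {
		return true // unpinned images are pulled, so the pre-flight still runs
	}
	exists, err := probe()
	if err != nil {
		return true // probe error is non-fatal: fall through to the pre-flight
	}
	return !exists // pinned and local: skip the pull and the pre-flight
}

// Probe helpers simulating the three possible ImageExists outcomes.
func probeLocal() (bool, error)   { return true, nil }
func probeMissing() (bool, error) { return false, nil }
func probeBroken() (bool, error)  { return false, errors.New("docker daemon unreachable") }

func main() {
	fmt.Println("pinned, local:", shouldRunPreflight(true, probeLocal))
	fmt.Println("pinned, missing:", shouldRunPreflight(true, probeMissing))
	fmt.Println("pinned, probe error:", shouldRunPreflight(true, probeBroken))
	fmt.Println("unpinned, local:", shouldRunPreflight(false, probeLocal))
}
```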
… path

The two offline start tests started a real container under an isolated
t.TempDir() HOME, so the container's root-owned volume files (e.g.
server.test.pem.key) could not be removed by t.TempDir cleanup, failing in
CI. Use the real (inherited) HOME like every other fresh-start container
test; config stays isolated via --config.

Adds TestStartSkipsPullAndLicenseCheckWhenImageIsLocal covering the #325
review's success path: a pinned configured image found locally starts with
no pull and no CLI license check (asserted via a license server that fails
the test if contacted).
gtsiolis force-pushed the devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a branch from 696f298 to d059449 Compare June 26, 2026 11:22
gtsiolis force-pushed the devx-703-offline-graceful-degradation branch from 70643a3 to 61c278d Compare June 26, 2026 11:22
Labels

docs: needed Pull request requires documentation updates semver: patch
