Data Engineering Resources

DZone's Featured Data Engineering Resources

Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook

By Nithish Shetty

A few months ago, I led a BI project with a deceptively simple pitch: let business users ask questions in plain English, and hand back the answer. We wired an LLM to our warehouse, got SQL generation working, and ran a pilot. It did not go well. The model was actually right a lot of the time, and that wasn’t the problem. The problem was that nobody on the business side could tell when it was right. Prompts came in tangled; the model would interpret one clause subtly wrong, and we’d return a clean-looking number sitting on top of a clean-looking SQL query. The users couldn’t read the SQL. When we tried to surface the model’s reasoning, it was a wall of CTEs and join keys that helped no one. We had humans in the loop. Reviewers, too. The catch is they weren’t operating as a loop; they were operating as a relay. They’d glance at the SQL, agree it looked plausible, and forward the answer along. The user nodded. Two weeks later, finance would surface a number that didn’t reconcile, and by the time we traced it back, the decision had already been made. That was the painful version of the lesson I want to share. HITL is not a checkbox between the model and production. It’s a translation layer. The model produces SQL and rows; the user needs an answer they trust. A human has to do the work of turning one into the other, and the system around that human has to make the work possible. Show a reviewer raw SQL plus a confidence score, and you’ve built a relay, not a loop. Below is the playbook I wish I’d had on day one. 1. Confidence Threshold Routing Score every generated query before it runs. Self-consistency sampling is the cleanest version: generate 5 candidate SQL statements for the same prompt and check how many agree on the join logic. If 4 out of 5 join to dim_employee and one joins to dim_customer, your agreement ratio is 0.8. If your threshold is 0.85, that query gets routed to review even though it looks correct on the surface. Agreement across multiple generations is a stronger signal than any single model’s confidence score, which is famously well-calibrated for the wrong things. Aggregated log-probabilities are another option. The choice matters less than the discipline: anything below the threshold goes into a queue, never straight to execution. The threshold itself becomes a tunable lever you tighten over time as you learn which query patterns deserve more scrutiny. 2. Staged Execution With Approval Gates Confidence on its own isn’t enough for high-stakes domains. Define a list of high-impact tables such as revenue facts, employee dimensions, compliance event logs, and require human approval for any query that touches them, regardless of confidence score. The model might be certain. The business context still demands validation. In practice, the table list is governance work, not engineering work. The data team should own it, finance and HR should ratify it, and you should revisit it the same way you revisit access controls. If you let engineering pick the list alone, the list will be wrong, and nobody outside engineering will know it’s wrong until it’s too late. 3. Reviewer Tooling: The Part Everyone Underinvests In This is where my pilot fell over. Showing a non-technical reviewer raw SQL and asking “looks good?” is worse than no review at all, as it produces fake assurance. Reviewer tooling has to bridge SQL and business context. On a single screen, the reviewer needs the original natural-language prompt, the semantic-model entities the query touches (measures and dimensions, not table names), the filters being applied in human terms, and the expected shape of the result. The reviewer’s job is to validate intent, not parse syntax. Build the interface around that. If your reviewers are reading SQL out loud in their heads to figure out what a query does, you’ve shipped a relay. 4. Audit-Linked Approval Records Every reviewer decision to approve, reject, or edit has to write back to an audit log alongside the original prompt, the generated SQL, the reviewer’s identity and a timestamp. That log is the dataset you’ll need months later. It’s how you explain a number when finance comes asking. It’s how you recalibrate thresholds based on what actually shipped versus what got bounced. It’s how you find the query patterns that consistently trip up the model. Skip this step and the program loses its memory. You keep paying the human-review cost without ever compounding the learnings, which is the worst of both worlds. 5. Escalation Paths Reviewers will get stuck. They’ll sense a query is doing something odd without being able to articulate why, especially when it crosses domain boundaries. Give them a one-click route to a domain expert such as a finance lead, HR ops, or compliance, along with their concern, without freezing the user who originally submitted the query. The whole point is to prevent reluctant approvals. A reviewer who isn’t sure should never feel pressured to sign off because they have no other option. In my pilot, “no other option” was the silent failure mode. Reviewers approved because rejecting felt rude, and the loop swallowed the doubt instead of routing it. 6. HITL Bypass Logging When a query clears the confidence threshold and isn’t flagged as high-impact, it just runs. Log that bypass anyway with the score that justified it, the prompt, and the SQL. This is the data that surfaces threshold drift, model regressions, and good training examples for the next iteration. It also closes the audit gap between “approved by a human” and “approved by silence”. Without it, you can’t tell the two apart, which means you can’t defend either. Wrapping Up Shipping AI-generated SQL straight to production is reckless. The model will be wrong, and it will be wrong in ways that look right. A single bad number in a board deck can outlive whoever wrote the prompt. HITL isn’t a nice-to-have here. It’s the only thing standing between a useful BI assistant and a very fast way to make confident, well-formatted and completely wrong decisions. The lesson from my pilot wasn’t that humans should validate SQL. It’s that humans have to translate. The model speaks in joins; the business speaks in outcomes. Build the loop so the people in the middle have a real chance of bridging the two, like tooling that surfaces intent, processes that protect them from reluctant sign-off, audit trails that turn every decision into future training data. Do that, and you get a BI assistant that’s actually trusted. Skip it, and you get a relay that breaks quietly until it doesn’t. Key Takeaways Confidence threshold routing using self-consistency sampling catches semantic errors that a model’s own confidence scores miss. Generate multiple candidates and measure agreement.Staged execution with approval gates protects high-stakes queries such as revenue, headcount, and compliance regardless of model confidence.Reviewer tooling has to bridge SQL and business context. Show the prompt, the semantic entities, and the expected output shape, and never just the raw query.Audit-linked approval records are the dataset you’ll need to recalibrate thresholds and explain numbers when finance comes asking months later.Escalation paths prevent reluctant approvals. Make it one click to route to a domain expert.HITL bypass logging turns silent successes into a feedback loop and closes the gap between “approved by a human” and “approved by silence.” More

Solving Data Traffic Jams in Your Network

By Sascha Neumeier

Stop, start. Stop, start. Nothing brings data flows to a grinding halt (or raises an admin’s blood pressure) quite like network congestion. The unwanted, unexpected extra step in an information request or response operation chain is a telltale sign that something’s changed or isn’t working in your infrastructure. And heavier traffic is more than just an inconvenience – it’s a multifaceted problem with knock-on business effects that falls upon admins to identify and fix. Let’s dig deeper into network traffic jams, their primary causes, and how to resolve and prevent them. Understanding What Causes a Digital Traffic Jam Network congestion occurs when the demand for sending or receiving data exceeds the network’s capacity. In other words, a computer network link can’t handle the volume of data trying to use it. It’s like what happens when a person tries to pour more water through a straw than it can handle at once. At a certain point, there’s simply not enough space, causing a backup in the straw. In computer networks, when data packets exceed the network’s capacity, they’re similarly queued in network devices, leading to increased latency and, in turn, traffic jams. 7 Most Common Causes of Network Congestion Bandwidth bottlenecks: When the capacity of network links (such as cables or wireless connections) is insufficient to handle the amount of data being sent.Network device limitations: Routers, switches, and other devices have limited processing power and memory and can become overwhelmed when handling large volumes of traffic.Broadcast storms: A situation where a network becomes flooded with broadcast or multicast packets, often caused by misconfigured devices or faulty hardware.High-bandwidth applications: Applications that consume a lot of network resources, such as video streaming, large file transfers, and backup operations.DDoS attacks: A distributed denial-of-service (DDoS) attack occurs when a network is intentionally flooded with excessive traffic from multiple sources.Poor network architecture: Inefficient routing or inadequate network capacity planning can lead to congestion hotspots.Insufficient internet speeds: Slow service-provider connections can cause bottlenecks at the edge of the network. Performance and Business Consequences of Network Congestion The consequences of network congestion extend far beyond the digital realm, wreaking havoc on your entire IT infrastructure. As data packets get caught in the congestion chaos, you’ll see increased latency and sluggish application performance. Network devices, overwhelmed by traffic, might then start dropping packets, causing retransmissions that add more load and exacerbate congestion. Worse, applications can start to time out because they can’t handle the lengthy delays in data transmission, further compounding the problem. You're also likely to notice jitter, or uneven packet delays, that affect real-time applications like VoIP and video conferencing. Network throughput suffers too, with the overall amount of data that can be transmitted over the network taking a nosedive. Ultimately, users soon begin to notice this digital snarl-up, with slow network performance leading to a decline in productivity and potentially a negative impact on your bottom line. Quality of Service (QoS) for critical applications can degrade as they struggle to receive the priority they need amid congestion. The overarching message is that network congestion can have serious repercussions on performance, end-user experience, and business operations as a whole. Maintaining healthy network traffic is about speed, sure, but it’s also about supporting day-to-day operations. 10 Proven Solutions for Fixing Bad Network Traffic This doesn’t need to be the network status quo. Here’s how admins can and should take back control: Bandwidth management and QoS: Implement QoS policies to prioritize important traffic, effectively creating an express lane for your VIP data packets. Use traffic shaping to control data flow and prevent one application from hogging all the bandwidth.Network segmentation: Divide your network into smaller subnets to contain congestion and prevent a problem in one area from spreading like wildfire.Upgrade network infrastructure: Sometimes you just need more oomph. Upgrade your network devices, increase link capacities, and consider SDN for greater flexibility in traffic management. Optimize application performance: Collaborate with your development teams to improve network efficiency via data compression and caching.Implement caching and Content Delivery Networks (CDNs): For frequently accessed data or web content, use caching or CDNs to lighten the load on your primary network and improve data transfer speeds.Regular network performance monitoring and analysis: Keep a watchful eye on your network performance to identify congestion points and proactively address network issues before they spiral out of control.Load balancing: Distribute network traffic across multiple paths or servers to prevent any single point from becoming a bottleneck.Traffic prioritization: Prioritize critical unicast and multicast traffic over less important data flows.Optimize routing: Regularly review and optimize routing protocols and configurations to ensure efficient traffic flow.Firewall optimization: Ensure your firewalls are properly configured and can handle the traffic load; poorly configured or underpowered firewalls can become network bottlenecks. Keeping Data Speeds Up and Bottom Line Impacts at Bay Again, this is more than about speed (or lack thereof), but the impact of bad network traffic and how it can become a serious business problem. The good news is that it doesn’t have to be a digital death sentence for your IT infrastructure. With a combination of smart network management strategies and the right monitoring tools, you can effectively tackle network congestion and keep your network in the fast lane. More

Offline Evaluation of RAG-Grounded Answers in LaunchDarkly AI Configs

By Scarlett Attensil

Generative Engine Optimization: How to Make Your Content Visible to AI

By Sibanjan Das

GenAI Isn't Solving the Problem Most Development Teams Actually Have

By Gaurav Gaur

CORE

Automating Power Automate: How to Ensure Cloud Flows Are Active After Every Pipeline Deployment

You've spent hours — maybe days — building and testing a Dynamics 365 Power Platform solution. Your Azure DevOps pipeline runs clean. The managed solution imports successfully into the target environment. All green. Then the business calls. Nothing is working. The automations aren't firing. You log into Power Automate in the target environment and find the same scene every time: every single cloud flow is turned off. Not broken. Not errored. Just off. And every connection reference is sitting there unresolved, pointing at nothing, waiting for someone to manually wire it up. If your solution has 5 flows, that's annoying. If your solution has 50 or 100 flows, that's a half-day of manual work — clicking into each flow, assigning the connection, saving, turning it on, and moving to the next one. In a team doing frequent releases across multiple environments (Test, UAT, Production), this compounds quickly. It turns what should be a 10-minute deployment into an hours-long chore, introduces human error, and makes your pipeline feel like it only does half the job. This is one of the most common pain points in Power Platform DevOps, and it's almost never solved cleanly out of the box. This article explains exactly why it happens and how to fix it so that flows are on, connections are wired, and the environment is fully operational the moment the pipeline finishes. Why This Happens Understanding the root cause is important because there are actually three separate things that go wrong, and you need to address all three. 1. Flows Are Exported in Whatever State They Were In When a Power Platform solution is exported from a source environment, every cloud flow is embedded in the solution package in its current state. If a flow was turned off in the source environment at the time of export — even briefly, for testing or debugging — it ships in that state. When the managed solution is imported into the target environment, the flow arrives and stays off. There is no automatic activation step built into the standard import process. 2. Connection References and Actual Connections Are Different Things This is the conceptual point that trips up most teams new to Power Platform ALM. A connection is the actual authenticated link to a service — a specific Dataverse instance, an Outlook mailbox, a SharePoint site. Connections are environment-specific, created manually or via admin tools, and they live outside any solution. They should never be part of a solution package. A connection reference is a pointer. It's a solution component that says "this flow uses a connection of type X." The connection reference lives inside the solution, travels with it across environments, and is what the flow binds to at runtime. The connection reference itself has no credentials — it just points to whichever actual connection in the environment is assigned to it. The correct setup is: In the source environment (DEV): The actual connections exist and are assigned to the connection references. The solution contains only the connection references, not the connections themselves. In the target environment (Test, UAT, Production): The actual connections are pre-created by an administrator and given appropriate access. The service principal used by the pipeline to deploy the solution must have read/write access to these connections. When the solution is imported, the deployment settings file maps each connection reference in the solution to the correct pre-existing connection in that environment. If this mapping is not done correctly, flows that depend on unresolved connection references will remain in a draft state after import, regardless of any other settings. 3. The Standard Import Task Does Not Activate Flows Power Platform Build Tools for Azure DevOps includes an ActivatePlugins flag on the import task. Despite what the name implies, this activates Dataverse plugins and custom workflow activities only — it has no effect on Power Automate cloud flows. There is no built-in flag on the standard import task that activates cloud flows. This means that even a perfectly configured import, with all connection references resolved and all tokens substituted, will still leave flows in a deactivated state unless you add an explicit activation step. Prerequisites: What Must Be in Place Before the Pipeline Runs Before the pipeline can solve this problem end-to-end, two things must be true in every target environment. First, the actual connections must already exist. For every service your flows connect to — Dataverse, Outlook, SharePoint, Teams, or any other connector — an administrator must have already created a connection in the target environment. These connections are not part of the solution and should never be included in the solution export. They are environmental infrastructure, created once and maintained independently of deployments. Second, the service principal must have access. The Azure Active Directory app registration used as the service principal for the pipeline (the account that authenticates the import) must be granted access to read and write in the target environment. This includes having sufficient Dataverse security roles and, where applicable, being designated as an owner or co-owner of the connections so that the deployment settings file can map connection references to those connections during import. Once these are in place, the pipeline can take over the rest automatically. The Pipeline Approach The pipeline is split into two phases: a build phase that runs against the source environment and packages the solution, and a release phase that deploys the packaged solution to each target environment. Build Phase The build phase exports the managed solution from the source environment and generates a DeploymentSettings.json file. This file is the key to automating connection reference mapping. It is generated by the PAC CLI from the solution ZIP and contains a structured list of every connection reference and environment variable in the solution. Out of the box, the generated file has empty ConnectionId fields. The build pipeline post-processes this file by replacing those empty fields with placeholder tokens in the format @@token_name@@. For example, a connection reference with the logical name shared_commondataserviceforapps becomes @@shared_commondataserviceforapps@@ in the deployment settings file. The file is then published as part of the build artifact. The critical point is that connection reference logical names often include a random trailing suffix added by the platform (e.g., shared_commondataserviceforapps_8ca1f). The build script normalizes these by stripping the suffix, so the token is deterministic and consistent across builds. Release Phase The release pipeline picks up the build artifact for each environment stage and runs the following sequence: Step 1: Replace Tokens A token replacement task reads DeploymentSettings.json and substitutes each @@token@@ with the corresponding pipeline variable for that stage. The pipeline variables for each stage hold the actual connection IDs of the pre-existing connections in that environment. For example, the Test stage has a variable shared_commondataserviceforapps with the value of the Dataverse connection ID in the Test environment. After this step, the deployment settings file is fully resolved with no remaining placeholders. Step 2: Import Solution The Power Platform Import Solution task imports the managed solution ZIP using the resolved DeploymentSettings.json. This wires up every connection reference to its corresponding connection in the environment automatically, with no manual intervention. Step 3: Activate Flows This is the step that closes the gap. A PowerShell task runs after the import and queries the Dataverse API for all cloud flows in the solution that are not currently active. It then activates each one programmatically. The Activation Script This PowerShell script uses the Dataverse Web API, authenticated with the same service principal credentials used by the rest of the pipeline. It queries specifically for Modern Flow entities (category eq 5) in the target solution and activates any that are in a stopped or draft state. PowerShell # Activate all Power Automate cloud flows in the solution post-import $tenantId = "$(TenantId)" $clientId = "$(CRMClientId)" $clientSecret = "$(CRMClientSecret)" $environmentUrl = "$(CRMEnvironmentUrl)" $solutionName = "$(CRMSolutionName)" # Obtain an OAuth token for the Dataverse API $tokenUrl = "" $tokenBody = @{ grant_type = "client_credentials" client_id = $clientId client_secret = $clientSecret resource = $environmentUrl } $tokenResponse = Invoke-RestMethod -Method Post -Uri $tokenUrl -Body $tokenBody $token = $tokenResponse.access_token $headers = @{ "Authorization" = "Bearer $token" "OData-MaxVersion" = "4.0" "OData-Version" = "4.0" "Content-Type" = "application/json" } $apiBase = "$environmentUrl/api/data/v9.2" # Query for cloud flows (category = 5) in this solution that are not Active (statecode != 1) $queryUrl = "$apiBase/workflows?`$filter=category eq 5 and statecode ne 1" + " and _solutionid_value eq (select solutionid from solutions where uniquename eq '$solutionName')" + "&`$select=workflowid,name,statecode,statuscode" $flows = (Invoke-RestMethod -Uri $queryUrl -Headers $headers).value if ($flows.Count -eq 0) { Write-Host "All flows are already active. Nothing to do." } else { Write-Host "Found $($flows.Count) flow(s) to activate." foreach ($flow in $flows) { $patchUrl = "$apiBase/workflows($($flow.workflowid))" $payload = @{ statecode = 1; statuscode = 2 } | ConvertTo-Json Invoke-RestMethod -Method Patch -Uri $patchUrl -Headers $headers -Body $payload Write-Host "Activated: $($flow.name)" } Write-Host "Done. $($flows.Count) flow(s) activated successfully." } Note on category eq 5: Power Platform stores multiple automation types in the same workflow entity. Category 0 is classic workflows, category 4 is business process flows, and category 5 is Modern Flow (Power Automate cloud flows). The filter ensures only cloud flows are touched. Guarding Against Unresolved Connections A common failure mode is a new connection reference being added to the solution in DEV, the build generating a new @@token@@ for it, but the corresponding pipeline variable not being added to the release stage yet. The import will succeed, but the flow that depends on that connection will remain inactive — and the activation script will fail to activate it because the connection reference is still unresolved. To catch this early, add a validation step before the import that checks for any remaining @@token@@ placeholders in the deployment settings file and fails the pipeline immediately if any are found: PowerShell $settingsPath = "$(System.DefaultWorkingDirectory)/$(Build.DefinitionName)/$(Build.BuildNumber)/DeploymentSettings.json" $content = Get-Content $settingsPath -Raw $unresolved = [regex]::Matches($content, '@@[^@]+@@') | Select-Object -ExpandProperty Value if ($unresolved.Count -gt 0) { Write-Error "Unresolved connection tokens found in DeploymentSettings.json:`n$($unresolved -join "`n")" Write-Error "Add the missing pipeline variables for this stage and re-run." exit 1 } Write-Host "All connection tokens resolved. Proceeding with import." Failing fast here is far better than a silent partial deployment where some flows activate, and others don't. The Result Once this is in place, the deployment experience changes completely. A pipeline run that previously required a human to log into each target environment, open Power Automate, navigate to each flow, assign connections, and manually toggle flows on — a process that scales linearly with the number of flows — becomes fully automated. For a solution with 100 cloud flows deploying across three environments, that might be 300 individual manual actions eliminated per release cycle. The environment is fully operational the moment the pipeline is completed. No follow-up tickets. No forgotten flows. No production incidents because someone missed one. The key insight is that the platform gives you all the pieces — connection references for portability, deployment settings for environment-specific mapping, and the Dataverse API for programmatic activation — but it does not wire them together for you automatically. Once you do, your Power Platform deployments become as reliable and hands-off as any other enterprise application deployment.

By karthik nallani chakravartula

Testing Strategies for Web Development Code Generated by LLMs

Large Language Models (LLMs) can automate the development process by producing a substantial amount of web application code in just a few minutes. Nonetheless, it is important to bear in mind that these models are pattern-based and not deterministic. Work in the domain of AI programming assistants shows that AI-based code often exhibits security vulnerabilities in real-world testing. A study on GitHub's features showed that approximately 40% of the generated code was susceptible to security issues, emphasizing the need for careful testing and scrutiny. In other words, programmers and engineers employ a particular mode of working rooted in software methodology, which enables them to tackle this problem straightforwardly and continually incorporate AI-produced code as it is generated. However, AI speed tends to result in errors being overlooked every now and then. In some instances, project managers allocate more rigorous testing because they have to ensure that what people often call correct code" has become "perfect, functional, and secure code." Every code that is deemed complete has to go through a number of tests, from simple static checkups and unit tests to more sophisticated integration tests, end-to-end tests, automated checks for security breaches, capacity checks, and manual code reviews, to ensure that the delivered software is functionally good enough and meets the security requirements. This article presents different testing methods for LLM systems that create HTML code intended for use on the web. Node.js and React are examples of relevant development frameworks used in such software. As an aid to merging the code branches, a pre-merge checklist is also included, along with recommendations on testing the triggers themselves to ensure that the material does not put the security of the system at risk, as far as it is included in the final body of code. Why AI-Generated Web Code Requires Extra Scrutiny It is commonly agreed that traditional bugs are caused by human error. Humans are the source of bugs, especially if they are involved in the development processes. This is different in the case of AI-generated bugs. AI generates bugs that are meant to fit in the missing context somehow by the problem model, leading to code that may appear to work on certain testing conditions; hence, it may bypass, but when the conditions change, the code does not. This lack of logic will mostly be observed at the borders, including the most sensitive parts, such as authentication mechanisms, actions upon ridiculous requests, handling many things simultaneously, reloading, differences in application versions, or vulnerabilities that were enabled due to an incorrect setting of the security defaults. Security is not just a legal obligation. One study used GitHub Copilot to build the most dangerous code by design without any errors in judgment. The study revealed a non-negligible number of insecure code recommendations, with the wording and context of the instructions playing a significant role in these recommendations. Another study utilizing a more sophisticated methodology with current versions of the software confirmed the drawbacks of generating AI-written code. This highlights the significance of the efforts made by individuals using LLMs to develop web features, emphasizing the need for substantial changes from the existing methods to move away from the 'it works on my machine' mentality. Harmonization initiatives should mainly focus on the inputs and outputs of the code, emphasizing important factors such as code execution across various test scenarios that simulate real-world conditions, as well as sharing knowledge with the machine. Testing Layers for LLM-Written Code The main idea is not to rely on a single test format but to use multiple forms, each targeting different types of 'AI errors.' To detect fundamental problems, static assessments such as linting and type verification can be performed, which can help identify certain issues early on. It is crucial that these issues are identified and addressed promptly, as they are expected to be easy to detect and fix quickly. A good tool that can be used for this is ESLint, as it detects the code patterns in JavaScript, which is also well or very well adaptable to the best coding conventions of your organization. Shell npm init @eslint/config@latest npx eslint src/ According to the official documentation of the ESLint tool, it is preferable to execute the following sequence of commands: npm init @eslint/config@latest and then npx eslint on the required files and folders. It is also worth considering that the ruleset should include those designed for heightened security. For example, there is a security plug-in called eslint-plugin-security that is specifically created to monitor the presence of known security problems in JavaScript and Node.js code. Although there may be instances of information misuse, eslint-plugin-security provides good support for developers. Shell npm i -D eslint-plugin-security Once completed, turn on the appropriate rules in your ESLint configuration, noting that different ESLint setups may require slightly different setup techniques. During testing, attention should be paid to the more elusive aspects of the program, such as logic algorithms, edge cases, and consistency testing of the generated results. One of the strategies that is readily comprehensible and offered by Jest is to write tests and use the expect() construct that includes tools and the toBe mechanism to confirm the results as desired. How-to (Node/JS + Jest): JavaScript // utils/sanitizeSlug.js export const sanitizeSlug = (s) => s.trim().toLowerCase().replace(/\s+/g, "-"); // utils/sanitizeSlug.test.js import { sanitizeSlug } from "./sanitizeSlug"; test("sanitizes slugs", () => { expect(sanitizeSlug(" Hello World ")).toBe("hello-world"); }); A helpful routine for improving an LLM model is to consider assumptions about the input. If the model handles input in the form of “slugs being space-separated”, it must be stated in the code, or there is a danger that this code will lead to bugs during real practice tests. Component Tests: Test React Like a User, Not Like a Compiler React Testing Library's emphasis on usability testing is well recognized because such tests help build confidence in the application's functionality. The guide written for React provides no reassurance based solely on best practices and forces the use of React Testing Library for testing instead. How-to (React + React Testing Library + Jest): JavaScript // LoginButton.jsx export function LoginButton({ onLogin }) { return <button onClick={onLogin}>Log in</button>; } // LoginButton.test.jsx import { render, screen, fireEvent } from "@testing-library/react"; import { LoginButton } from "./LoginButton"; test("calls onLogin when clicked", () => { const onLogin = jest.fn(); render(<LoginButton onLogin={onLogin} />); fireEvent.click(screen.getByText("Log in")); expect(onLogin).toHaveBeenCalledTimes(1); }); Integration Tests: Verify Contracts Between Modules and Services Usually, AI-generated code fails to pass the integration testing. This is because the AI model was imprecise in defining the contract, such as response structures, status codes, operation of authentication middleware, database connection procedures, and similar details. When it comes to Node.js applications, many developers would opt for Supertest, which is an extension and support of SuperAgent, which provides HTTP assertion support for testing Node HTTP servers. How-to (Express + Supertest): JavaScript import request from "supertest"; import app from "../app"; test("GET /health returns ok", async () => { await request(app) .get("/health") .expect(200); }); E2E Tests: Make the Browser Prove the Feature Works E2E tests are capable of uncovering defects when described types do not: navigation, live view, data storage, HTTP cookies, access restrictions, and, what is colloquially known as, “working when a user just does whatever.” Cypress strives to be more than just a solution for end-to-end testing. Its documentation contains examples that help you write an end-to-end test from scratch. In contrast, action + assertion chains are more important from the perspective of the playwright, with additional functions for waiting inside elements, which greatly alleviates the necessity to use sleep states only for checks. How-to (install Cypress): Shell npm install cypress --save-dev npx cypress open Those commands are straight from Cypress installation docs. How-to (Playwright E2E test snippet): JavaScript import { test, expect } from "@playwright/test"; test("login redirects to dashboard", async ({ page }) => { await page.goto("/login"); await page.getByLabel("Email").fill("[email protected]"); await page.getByLabel("Password").fill("password123"); await page.getByRole("button", { name: "Log in" }).click(); await expect(page).toHaveURL(/dashboard/); }); Playwright documents this general “do actions, then assert the state” structure, and notes its auto-waiting behavior. Security Testing: Treat “Generated Code” as a Risk Multiplier Use dependency scanning and static analysis to improve the security of web applications. When it comes to reviewing the endpoints and UI flows (auth, access control, injection, etc.) that are generated by the AI model, one may refer to the list of the OWASP Top 10 guidelines as a checklist. Dependency Scanning (Snyk + npm audit): Snyk test checks for open-source vulnerabilities and license issues. npm audit exits non-zero on found vulnerabilities (ideal for CI gates). How-to (Snyk): Shell Copy snyk test How-to (Snyk Code SAST): Shell Copy snyk code test Snyk code test, called Snyk, performs Static Application Security Testing against the source code. Although not required, it is advised to activate the CodeQL extension in GitHub Actions in order to conduct code scanning. Performance Testing: “It Works” Is Not “It Survives Traffic” AI-scripted functionalities are also known to cause performance detriment, either through the introduction of additional DB access calls or multiple N+1 queries; thus, it is advisable to smoke load test critical routes. k6 has proper documentation on how to write and execute such tests. How-to (k6 smoke test): JavaScript import http from "k6/http"; import { check, sleep } from "k6"; export default function () { const res = http.get("https://example.com/api/health"); check(res, { "status is 200": (r) => r.status === 200 }); sleep(1); } Both k6 and Artillery are equipped with documentation on how to formulate HTTP requests and set up tests. Artillery can be installed either through npm or npx to execute tests. Snapshot and Golden Master Testing: Use Sparingly, Review Aggressively Creating snapshot tests is useful for monitoring changes in different versions of the app that should not be changed quietly (such as HTML email templates, stable fragments of the user interface, etc.). The Jest snapshot file requires verification of snapshot outputs alongside code modifications, which are then reviewed to prevent misunderstandings; Jest compares future runs with past snapshots and reports errors if discrepancies are found. How-to (Jest snapshot): JavaScript import renderer from "react-test-renderer"; import { Banner } from "./Banner"; test("banner matches snapshot", () => { const tree = renderer.create(<Banner />).toJSON(); expect(tree).toMatchSnapshot(); }); The Ultimate Golden Master Hack for LLM Code Code Review: The Essential Human Layer Among other things, code review is an important step, as it allows for the asking of questions such as “Is this approach valid?” or even “Is this in line with the architecture?” The Secure Software Development Framework (SSDF) by the National Institute of Standards and Technology (NIST) exists because most Software Development Life Cycles (SDLCs) tend to ignore security at the source. It promotes the incorporation of safe behavioural patterns into the existing cycle of activities. Such mechanisms as code review and other process controls remain significant as they are aimed at human beings and not machines. For AI-generated PRs, code review should explicitly check: authz/authn boundariesinput validation and encodingerror handling and loggingdependency choices“magic” regexes and crypto (danger zone) Testing the Prompt and Validating LLM Outputs A great number of teams skip over the small fact that the prompt is itself a code construct. Since the wording of the prompt can generate particular behavior, it is imperative that it is put to the test, as is done for APIs. Workflow: Define prompt contract: Templates with stack, versions, constraints, and testing requirements.Request tests: Generate and run unit/integration tests before trusting features.Create regression suite: Store prompts, invariants, and run tests/scripts.Use checklists: Keep prompts as review checklists for each PR. Before merging AI-generated code, require: lint/type checks pass (ESLint) unit + integration pass (Jest + Supertest patterns) at least one E2E flow passes (Cypress/Playwright) dependency scan passes or is triaged (Snyk / npm audit) Tool Comparisons and When to Use What No tool here is used inappropriately since Jest is designed for unit testing; React Testing Library, for testing those aspects of unit that show nuances to end users; Supertest – for HTTP server testing; Cypress alongside Playwright – for E2E testing; Snyk and SAST are used for scanning dependencies; GitHub CodeQL along with k6 and Artillery are used for scanning and testing codes as well as for load testing, respectively. Commonly Used Commands ESLint: npm init @eslint/config@latest then npx eslint src/Cypress: npm install cypress --save-dev then npx cypress openSnyk Dependency Scan: snyk testSnyk SAST: snyk code testnpm Dependency Audit: npm auditk6: Write a script, then run with k6Artillery: npm install -g artillery@latest (or npx artillery@latest), then artillery run my-test.yml CI automation with test gates The main purpose of the layers is to verify that they are genuine and feasible. GitHub Actions are automation scripts written in YAML syntax that can contain steps and jobs. According to the instructions provided in the official guide for Node.js Actions by GitHub, the following three standard procedures must be followed: installing Node, injecting dependencies into the environment, and testing. Minimal GitHub Actions workflow example YAML name: CI on: pull_request: push: branches: [main] jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: 22 cache: npm - run: npm ci - name: Lint run: npx eslint . - name: Unit + integration run: npm test - name: Dependency scan run: | npm audit snyk test env: SNYK_TOKEN: ${{ secrets.SNYK_TOKEN } - name: SAST scan (optional but strong) run: snyk code test env: SNYK_TOKEN: ${{ secrets.SNYK_TOKEN } The actions/setup-node install and cache the Node in workflows. The playwright furnishes CI hints and assists in creating GitHub Actions workflows. GitHub documentation goes into depth on how CodeQL operates within GitHub actions workflow. Before You Merge: Checklist, Pitfalls, and Mitigations Pre-merge checklist for AI-generated web code Lint passes: Ensure lint passes and security lint is reviewed.Unit tests: Cover model assumptions (edge cases, input shapes).Integration tests: Confirm API contracts (status codes, auth, schema).E2E tests: Cover at least one critical user journey (e.g., login).Dependency scan: Run Snyk or npm audit and triage findings.SAST: Run Snyk code test or CodeQL for risky changes.Snapshot diffs: Review snapshot diffs like code; no auto-updates.CI checks: Require CI checks before merge; no exceptions. Common Pitfalls (and How to Avoid Them) Passing tests: Test real user behavior, as React Testing Library suggests.Over-mocking: Avoid mocking everything—use a real test DB.Flaky E2E tests: Use Playwright to reduce timing issues.Snapshot testing: Don’t auto-update snapshots; review them.Skipping security scanning: Include security checks for all PRs, big or small. Closing thought The most important responsibility assigned to an AI for web generation is to ensure that validation becomes unobtrusive and a straightforward activity. Individuals tend to trust processes that are uniform and happen repeatedly. Although LLMs streamline the process of generating the coding, it is the exhaustive examination that makes the process of its correctness more efficient. Those who achieve the desired outcomes are the ones who do all the above: set up CI-enforced gates, apply testing at different levels, and treat prompts as part of the design.

By Sandesh Basrur

From Open SQL to CDS Views: Rewriting SAP Data Access for Performance at Scale

Modern SAP landscapes running on SAP HANA demand a rethink of how ABAP programs access data. Traditional Open SQL queries embedded in ABAP code have served developers for decades, but at large data volumes, they can become performance bottlenecks. SAP’s introduction of Core Data Services (CDS) views offers a new paradigm: push more work to the in-memory database and retrieve only what’s needed. Traditional ABAP Data Access With Open SQL Open SQL is the standard SQL interface in ABAP that allows developers to query the underlying database in a database-agnostic way. For example, an ABAP report might join two tables and fetch results like this: Plain Text SELECT bkpf~bukrs, bkpf~belnr, bkpf~gjahr, bseg~koart, bseg~wrbtr, bseg~shkzg FROM bkpf INNER JOIN bseg ON bkpf~bukrs = bseg~bukrs AND bkpf~belnr = bseg~belnr AND bkpf~gjahr = bseg~gjahr INTO TABLE @DATA(it_fi_docs) WHERE bkpf~bukrs = '1000' AND bkpf~gjahr = '2023' AND bseg~koart = 'K'. This Open SQL example joins the BKPF and BSEG tables to retrieve financial documents. Open SQL sends such queries to the database, and on SAP HANA, the heavy lifting of the join and filtering is done in-memory on the DB server. The result is then brought back to the ABAP application server. However, the challenge with Open SQL at scale comes when ABAP code handles large data sets or complex logic in the application layer. Common performance issues in legacy ABAP include: Too much data transferred: Selecting wide tables or not filtering enough leads to heavy network and memory usage. Best practice is to filter and aggregate in the query to keep the result set small and transfer only the required columns (avoid SELECT *). Multiple round-trips: Performing calculations with many small queries or loops causes repeated DB calls. It’s more efficient to push joins and subqueries into one SQL if possible. Each context switch adds overhead. Application-side processing: If business logic runs on millions of records in ABAP, the application server CPU becomes the bottleneck. The database could perform these operations faster, set-wise. In summary, while Open SQL can express complex data retrieval, ABAP developers traditionally had to be very disciplined in query design to avoid performance issues at scale. This paved the way for a new approach leveraging SAP HANA’s strengths. The Case for Change: Code-to-Data Paradigm SAP HANA’s in-memory, columnar architecture enables it to execute aggregations, filters, and joins extremely fast at the database level. To exploit this, SAP advocated the code-to-data paradigm. push computations down to the database rather than pulling data up to the code. Rewriting data access using CDS views is a key technique in this paradigm, alongside others like AMDP. By offloading heavy operations to the DB, we minimize data transfer and let HANA’s optimized engines handle crunching the data. For example, instead of reading a full table and then filtering in ABAP, you pass WHERE conditions so the DB does it. Instead of multiple selects and merges in ABAP, you perform a JOIN or a subquery in one shot. Another driver for change is SAP’s new data models in S/4HANA. Many classic transparent tables were replaced by HANA-optimized structures or compatibility views. Custom ABAP code written for ECC often breaks or needs adaptation for S/4HANA’s simplified data model. In these cases, SAP often provides CDS views as the new interface to data. As one DZone article notes, engineers moving to S/4 must switch to the S/4 equivalents to replace old data access logic. In short, adopting CDS views is not only about performance but also about aligning with SAP’s modern architecture. Introducing ABAP Core Data Services (CDS) Views ABAP CDS is a framework to define rich data models directly on the database, using a declarative syntax in ABAP Development Tools (ADT). A CDS view is essentially a view in the HANA database, defined via an ABAP DDL statement. For example, here’s a simple CDS view definition joining two tables: Plain Text @AbapCatalog.sqlViewName: 'ZDEMO_FLIGHTS' define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } This CDS view ZFlightInfo performs the same join between SPFLI and SCARR as an equivalent Open SQL join would. In fact, you could copy-paste the join logic from ABAP into the CDS definition with minor syntax changes. After activating this view in ADT, the system creates a database view in HANA. ABAP programs can then consume the CDS view just like a table: SQL SELECT * FROM ZFlightInfo INTO TABLE @DATA(it_flights) ORDER BY carrier, flight. The result set it_flights from the CDS view will be identical to what an Open SQL join would produce for the same input tables. Under the hood, both approaches result in the database executing a similar SQL SELECT. So, why use CDS? The benefits become evident as complexity grows: Reusability and model centralization: CDS definitions are stored in the ABAP Dictionary and can be reused by any number of programs or even other CDS views. Instead of writing the same joins or calculations in multiple ABAP reports, you define them once in a CDS view. SAP recommends using a CDS view when you need to retrieve data from multiple related tables, because it involves the least amount of coding and can be reused in multiple objects. In large-scale systems, this consistency is key to a single source of truth for that piece of data logic. Rich expression and metadata: CDS supports advanced SQL features and built-in functions. You can define calculated fields, aggregations, and even leverage specialized HANA capabilities within the view. CDS also allows adding annotations, making the data model self-descriptive. Performance through pushdown: By moving logic into the CDS (and thus into SQL on the database), you reduce the workload on the ABAP layer. The database can apply filters, joins, and computations in parallel, using its optimized engines. Only the final result is sent back to ABAP. Secure and controlled access: CDS views integrate with the SAP authorization concept, ensuring consistent enforcement of business security rules at the data model level, rather than scattering checks in ABAP code. This means performance benefits without sacrificing governance. Tutorial: Converting an Open SQL to a CDS View (with Code) To solidify the concept, let’s walk through a simple conversion. Imagine we have an ABAP report that needs to list flight routes with the airline name. In classic ABAP, you might do this with an inner join in Open SQL as shown below: Open SQL Approach (Legacy ABAP code): Plain Text DATA: lt_flights TYPE TABLE OF zflight_info. "Structure for results SELECT scarr~carrname AS carrier, spfli~connid AS flight, spfli~cityfrom AS departure, spfli~cityto AS arrival FROM spfli INNER JOIN scarr ON spfli~carrid = scarr~carrid INTO TABLE @lt_flights ORDER BY carrname, connid. This code joins SPFLI with SCARR and populates an internal table lt_flights. It works, but the logic is embedded in the program. Now, suppose we want to reuse this same join in multiple places. We can refactor it into a CDS view: CDS View Approach: Define the view in ABAP DDL (e.g., in Eclipse ADT): Plain Text @AbapCatalog.sqlViewName: 'ZFLIGHTINF' @AccessControl.authorizationCheck: #NOT_REQUIRED define view ZFlightInfo as select from spfli inner join scarr on spfli.carrid = scarr.carrid { scarr.carrname as carrier, spfli.connid as flight, spfli.cityfrom as departure, spfli.cityto as arrival } We give the view a name ZFlightInfo. Note that this is almost identical to the Open SQL, just expressed as a view definition. Once activated, the CDS is available system-wide. Now our ABAP report can simply do: Plain Text SELECT * FROM ZFlightInfo INTO TABLE @lt_flights ORDER BY carrier, flight. The result in lt_flights will be the same. We have effectively decoupled the data retrieval logic from the program and centralized it in the DB layer. This not only improves reuse; in a HANA system, it can also improve performance. The database can better optimize a single persistent view than ad-hoc SQL scattered in code. And if we needed to adjust the join or add a new field. Performance Considerations and Best Practices When rewriting Open SQL to CDS, ABAP developers should keep a few important considerations in mind: Measure, don’t guess: Simply converting an Open SQL to a CDS view doesn’t magically speed up the query if it was already efficient. As noted earlier, for straightforward SELECTs or joins, the performance will be equivalent in many cases. The real gains come when you use CDS to do more complex processing in one go. Always use tools like ST05 SQL trace or HANA’s PlanViz to ensure the new design is actually optimal. The execution plan is what matters, not whether you wrote it in Open SQL or CDS. Avoid over-complex views: It’s possible to go overboard with stacking CDS views on top of each other. While layering is good for separation of concerns, too many nested views or excessive use of associations can lead to very complex SQL at runtime. This can confuse the optimizer or prevent predicate pushdown. Be wary of heavy calculations in a single CDS. If performance suffers, consider alternatives like ABAP Managed DB Procedures (AMDP) for really complex logic or break the problem down differently. Select only what you need: Just as with Open SQL, a CDS view should be designed to return only necessary fields and records. Don’t define a CDS with SELECT * from a wide table list the needed fields. This ensures consumer queries aren’t unknowingly pulling extra data. One common pitfall is using CDS to expose an entire table with all columns, which defeats the purpose. Instead, tailor views to use cases or use parameters in CDS to filter data. Use CDS features wisely: Leverage CDS capabilities like aggregations, calculated fields, and unions to eliminate extra work in ABAP. Reuse and consistency: Replace multiple Open SQL implementations of the same logic with a single CDS. Not only does this reuse improve maintainability, but it also means the database might handle the unified load more efficiently. SAP itself follows this approach in S/4HANA with the Virtual Data Model, hundreds of CDS views that serve as the source for Fiori apps and reports, rather than raw table access. By moving to CDS, you align your custom code to the same philosophy. Conclusion Rewriting data access from Open SQL to CDS views is a strategic move for ABAP developers aiming to maximize performance at scale. By pushing more logic to the SAP HANA database, we take full advantage of its in-memory speed and parallel processing. CDS views enable complex data gathering in one shot, reduce the load on the application server, and provide a modular, reusable data model for your SAP applications. That said, an engineer must also approach CDS with a critical eye, understanding the execution plan and ensuring that moving to CDS truly improves the situation, rather than blindly adding abstraction. Advanced ABAP development is about choosing the right tool for the job. In the case of data-intensive operations, CDS views have proven to be a powerful tool, aligning with SAP’s modern direction and delivering robust performance at scale. By rewriting your data access with CDS and following best practices, you can future-proof your ABAP code for the HANA era, achieving faster results and a cleaner, more sustainable codebase for the long run.

By Deepika Paturu

The Cross-Lingual RAG Problem Nobody Is Talking About

The Benchmark Trap The retrieval-augmented generation (RAG) ecosystem has matured remarkably fast. Vector databases are production-grade, embedding models are cheaper than ever, and retrieval pipelines are being deployed across healthcare, finance, legal, and education systems worldwide. Every major benchmark shows impressive numbers. Almost every major benchmark is in English. This is not a minor oversight. It is a structural blind spot that has allowed a critical class of failures to accumulate in production systems largely undetected. When your evaluation dataset is monolingual and your deployment is multilingual, you are not measuring what you think you are measuring. The gap between benchmark performance and real-world performance for non-English users is not a rounding error — it is, in documented cases, up to 29% accuracy degradation for non-English queries compared to equivalent English ones. That number comes from Oracle AI researchers who studied RAG consistency across languages in enterprise deployments. Twenty-nine percent. In a medical context, that is not a metric. That is a patient safety issue. Where Exactly Does Cross-Lingual RAG Break? The failure is not in one place. It cascades across all three stages of the RAG pipeline, which makes it particularly difficult to diagnose and fix. At Retrieval Most embedding models used in production RAG systems are trained predominantly on English corpora. When a Tamil or Arabic query is embedded, it enters a vector space whose geometry was shaped by English semantics. The nearest neighbors retrieved may appear topically related but carry subtle semantic misalignments that compound downstream. Amazon AGI's XRAG benchmark, published in 2026, was one of the first systematic evaluations of this failure mode. Their findings were stark: in monolingual retrieval settings, where an English knowledge base serves non-English queries, all evaluated models struggled with response language correctness. The system retrieved the right document. It still got the answer wrong. At Augmentation The naive fix — retrieve documents in the user's language alongside English documents and concatenate them into the context — introduces a different problem. A French document and a Hindi document about the same topic may express subtly different facts, use different cultural reference points, or carry implicit contradictions that the model has no mechanism to resolve. Concatenation without alignment is not multilingual RAG. It is multilingual noise. At Generation This is the most insidious failure mode. Research has consistently shown that large language models tend to reason internally in English even when processing non-English inputs. The model receives a Tamil query, retrieves relevant context, and then effectively thinks in English before generating a Tamil response. Cultural grounding, local conventions, and contextual meaning are lost at the final and most consequential step. The result is a response that may be grammatically correct in Tamil but conceptually rooted in English assumptions — wrong units of measurement, unfamiliar care protocols, culturally inappropriate framing. The Research Is There. The Attention Is Not. A small but growing body of research is directly addressing this problem. It deserves far more attention than it is currently receiving in mainstream AI engineering conversations. XRAG (Amazon AGI, 2026) introduced one of the first dedicated benchmarks for cross-lingual RAG evaluation, covering monolingual and multilingual retrieval scenarios with relevancy annotations per retrieved document. Their finding that cross-lingual reasoning — not just language generation — is the core challenge reframes the problem in an important way. This is not a translation problem. It is a reasoning problem. CroSearch-R1 (Beijing Jiaotong University/Université de Montréal, SIGIR 2026) proposed using reinforcement learning, specifically Group Relative Policy Optimization (GRPO), to dynamically align multilingual knowledge during retrieval. Rather than treating documents in different languages as competing contexts, their framework integrates them as complementary evidence. Results showed measurable improvements in cross-lingual RAG effectiveness across multiple language pairs. CrossRAG (University of Edinburgh, EACL 2026) took a different approach — translating retrieved documents into a common language before generation rather than translating the query before retrieval. Their experiments showed that this document-side translation strategy significantly outperforms query-side translation, particularly for low-resource languages, because it preserves the semantic richness of retrieval while giving the generation model a consistent linguistic context to reason over. BordIRLines (ACL 2025) introduced a dataset of territorial disputes across 49 languages to study cross-lingual RAG robustness in culturally sensitive scenarios. Their finding that retrieving multilingual documents actually improves response consistency over monolingual retrieval — when done correctly — is an important signal that the solution lies in better multilingual architecture, not in defaulting to English-only retrieval. Together, these papers paint a clear picture: the problem is real, measurable, and solvable. What is missing is the engineering community treating it as a first-class concern. Who Is Actually Affected The framing of this as a technical NLP problem undersells its human stakes. Consider the populations for whom English-centric RAG is not an inconvenience but a genuine barrier: A patient in rural Tamil Nadu queries a hospital AI system about post-surgery medication. A student in rural Nigeria is trying to use an AI tutoring system to access global research in Yoruba. A refugee querying a legal AI system about asylum rights in their native Dari. A farmer in rural India is asking an agricultural advisory AI about crop disease treatment in Marathi. In every one of these cases, a RAG system that was benchmarked at 90%+ accuracy in English may be operating at 60-70% accuracy in the language that actually matters to the user. The people least able to absorb the consequences of AI errors are the ones most exposed to them. This is not an edge-case population. Over 6.5 billion people speak a language other than English as their primary language. The majority of the world is the edge case in most RAG deployments. What Good Cross-Lingual RAG Looks Like The research points toward a few clear architectural principles for building RAG systems that work equitably across languages. Shared semantic embedding spaces over language-specific ones. Models like mE5, LaBSE, and multilingual-E5-large represent meaningful progress here — they map semantically equivalent content across languages into nearby regions of vector space, reducing the retrieval gap for non-English queries without requiring query translation. Explicit cross-lingual knowledge alignment rather than naive concatenation. The CroSearch-R1 approach of using RL to integrate multilingual evidence as complementary knowledge is a significant step forward. The goal is a retrieval-augmented context that is linguistically unified before the generation model ever sees it. Document-side translation over query-side translation when translation is necessary at all. CrossRAG's findings suggest that translating retrieved documents into a common language preserves more semantic fidelity than translating the user's query into English. This is counterintuitive but empirically supported. Culture-aware generation as a design goal, not an afterthought. Language and culture are not separable. A RAG system that generates linguistically correct but culturally inappropriate responses has not solved the problem — it has reframed it. A Proposal Worth Exploring The building blocks for genuinely equitable cross-lingual RAG exist today. What does not yet exist is an intentional, end-to-end architecture that assembles them with language equity as a first-class design principle rather than a post-hoc consideration. We call this architectural vision PolyRAG — a framework that coordinates multilingual semantic retrieval, reinforcement learning-based cross-lingual knowledge fusion, and culture-aware generation into a unified pipeline. The goal is not to make RAG work slightly better in non-English languages. It is to eliminate the architecture-level reasons why it fails in the first place. Each of the three components draws from independently validated research. What remains is the engineering work of intentionally combining them, rigorously benchmarking them across low- and high-resource language pairs, and releasing the results openly so the broader community can build on them. The Conversation We Should Be Having The RAG community has done extraordinary work optimizing retrieval latency, chunk strategies, reranking approaches, and hallucination reduction. Almost all of it assumes English. The question worth asking in 2026 is simple: what would RAG look like if we designed it for everyone from the start?

By Janani Annur Thiruvengadam

CORE

Jakarta NoSQL: Why JPA Is Not Enough for the AI Era

The most effective way to present this idea is to begin with the challenge architects face: AI has transformed the persistence landscape. Enterprise applications were once built almost exclusively on relational databases, making JPA a keystone of Jakarta EE. Today, modern systems use a mix of relational databases, document stores, caches, graph engines, and increasingly, vector databases that support semantic search, retrieval-augmented generation (RAG), and AI-powered applications. Polyglot persistence is now the industry standard. While Jakarta EE standardized relational persistence through JPA, it still lacks a vendor-neutral standard for non-relational persistence. This gap forces developers to rely on fragmented, proprietary solutions, creating barriers to portability, productivity, and innovation. The rise of AI makes this gap critical. Vector databases are now essential to intelligent systems, supporting semantic search, embeddings, and contextual retrieval. For Jakarta EE to remain the leading enterprise Java platform in the AI era, it must offer a standardized approach to NoSQL persistence, as it did for relational databases. Jakarta NoSQL is not just another specification; it constitutes a strategic investment in the ecosystem's future. By offering a familiar programming model, reducing vendor lock-in, and integrating with AI workloads, Jakarta NoSQL ensures that Jakarta EE remains relevant and competitive for the next generation of enterprise applications. NoSQL in the AI Era: Understanding the Modern Data Landscape For years, enterprise data persistence focused on relational databases. Systems relied on tables, rows, foreign keys, and SQL, making relational technology the standard for business applications. While still essential, modern architectures now use polyglot persistence, where multiple database types coexist, each satisfying specific requirements. Today, NoSQL refers to a family of database paradigms, each engineered for specific workloads and architectural needs, rather than just document databases. Key-value databases store data as key-value pairs, enabling fast lookups and low latency. Typical uses include caching, user sessions, feature flags, and temporary application state.Document databases store data as structured documents, such as JSON or BSON. They are effective for applications having hierarchical or evolving schemas, including web applications, e-commerce platforms, and content management systems.Column-family databases organize data by columns instead of rows, supporting high write throughput and horizontal scalability. They are used for IoT telemetry, event logging, analytics, and large-scale distributed systems.Graph databases model entities and relationships as nodes and edges. This structure is ideal for social networks, fraud detection, recommendation engines, dependency analysis, and knowledge graphs in which relationships are critical.Vector databases store high-dimensional embeddings from machine learning models and large language models (LLMs). They enable semantic search, similarity matching, retrieval-augmented generation (RAG), recommendation platforms, and other AI-driven features via understanding meaning instead of exact text matches.Time-series databases specialize in timestamped data that changes over time. They are used for observability, monitoring, financial markets, industrial sensors, and operational metrics where high-performance temporal data storage and analysis are essential. These database types often coexist within the same architecture. Modern applications may use PostgreSQL for transactions, Redis for caching, MongoDB for documents, Neo4j for relationships, InfluxDB for telemetry, and a vector database like Milvus, Pinecone, or Weaviate for AI-powered search and retrieval. This approach, known as polyglot persistence, is now standard in enterprise systems. The industry has embraced this shift. The Stack Overflow Developer Survey shows that while relational databases still dominate enterprise workloads, NoSQL technologies are now standard tools for developers. Technologies like Redis, MongoDB, and Elasticsearch are used alongside PostgreSQL and MySQL. Organizations no longer choose between SQL and NoSQL; instead, they combine multiple persistence technologies to leverage their strengths. Polyglot persistence is now the baseline for modern software systems. Vector databases are especially important among NoSQL categories, as they are basic to modern Artificial Intelligence systems. In contrast to traditional databases that store explicit business data, vector databases store numerical representations called embeddings. Generated by machine learning models, these embeddings encode the semantic meaning of words, documents, images, or other content as mathematical vectors. This enables software to search and retrieve information based on meaning rather than exact text matches. The distinction between lexical and semantic search illustrates the significance of vector databases. For example, a traditional SQL search for “Pet” returns records with that exact term, such as “Pet Shop,” but ignores related expressions like “Dog” or “Puppy.” Semantic search, by comparing embeddings, retrieves documents about dogs, puppies, or animal companions because it recognizes their semantic relationship. The search engine matches meaning, not just syntax. This function is vital for modern AI architectures. Large language models do not process relational tables directly; they use embeddings and contextual connections between concepts. Systems such as retrieval-augmented generation (RAG), enterprise knowledge search, recommendation engines, and intelligent assistants depend on similarity searches across millions of vectors. While relational databases can support some vector operations through extensions, vector databases are purpose-built for these workloads, offering optimized indexing and similarity algorithms for large-scale semantic retrieval. As AI adoption grows, vector databases are becoming a strategic component of enterprise architecture. Appreciating the importance of NoSQL, several Java ecosystems have developed their own solutions. Spring offers independent projects like Spring Data MongoDB, Spring Data Redis, and Spring Data Cassandra. These integrations provide a productive programming model but are tightly coupled to the Spring ecosystem. Quarkus supports NoSQL persistence through Panache and database-specific integrations, emphasizing developer productivity and cloud-native deployment. Micronaut Data supports several NoSQL engines, using compile-time code generation and ahead-of-time processing to improve performance and reduce execution overhead. While these solutions are effective, they remain framework-specific rather than platform standards. Developers switching frameworks encounter different APIs, abstractions, annotations, and operational models, even when solving similar persistence challenges. Jakarta EE addressed this for relational persistence with Jakarta Persistence (JPA), delivering a standardized, vendor-independent programming model. As NoSQL technologies expand and AI workloads more and more depend on vector databases, the lack of a vendor-neutral NoSQL standard is a significant gap in the Jakarta ecosystem. The Java Standardization Journey The need for a standardized NoSQL solution in the Java ecosystem has been discussed for years. During the Java EE era, several proposals tried to integrate non-relational databases into the enterprise platform. As NoSQL technologies grew in popularity throughout the 2010s, developers anticipated a dedicated specification to accompany traditional enterprise APIs at JavaOne conferences. Despite clear demand, no such initiative emerged within Java EE. The platform remained focused on relational persistence via JPA, leaving NoSQL adoption to rely on vendor-specific libraries and framework integrations. The transition of Java EE to the Eclipse Foundation provided an opportunity to address this challenge. Instead of waiting for a platform-level solution, the community launched Eclipse JNoSQL, an open-source project supplying a unified programming model for NoSQL databases. Drawing on JPA's success, Eclipse JNoSQL introduced mapping annotations, repositories, templates, and communication APIs that support document, key-value, column-family, and graph databases. The project showed that a consistent developer experience could be attained without compromising each database model's unique features. As Jakarta EE matured, Eclipse JNoSQL became the foundation for a new standardization effort: Jakarta NoSQL. Jakarta NoSQL was the first persistence specification created entirely within the Jakarta EE process. Unlike earlier specifications that migrated from Java EE, Jakarta NoSQL was conceived, developed, and released under the Eclipse Foundation governance model. It was among the first to complete the full Jakarta Specification Process from inception to release. Jakarta NoSQL's impact extended beyond its initial scope. During development, the expert group identified a common challenge for both relational and non-relational databases: developers needed a consistent repository abstraction independent of the underlying persistence engine. This led to the creation of a separate specification, Jakarta Data. The need to standardize NoSQL access patterns directly influenced the development of Jakarta Data's repository-oriented programming model, which applies across multiple persistence technologies. The relationship between these specifications highlights Jakarta NoSQL's broader influence on the Jakarta EE ecosystem. Jakarta NoSQL focuses on mapping and interacting with non-relational databases, while Jakarta Data delivers a unified repository abstraction for both relational and NoSQL implementations. Together, they significantly reduce fragmentation in enterprise persistence. This evolution continued beyond Jakarta Data. The drive to standardize modern persistence requirements has inspired new specifications, such as Jakarta Query, which aims to deliver a portable, type-safe, and expressive query language for various persistence technologies. As the Jakarta ecosystem grows, Jakarta NoSQL acts as a key milestone. It addressed the long-standing absence of a NoSQL standard and helped lay the foundation for the next generation of persistence specifications within Jakarta EE. Jakarta NoSQL: Built for NoSQL, Not Adapted to It When architects consider standardizing NoSQL development in Jakarta EE, a common question arises: why not extend Jakarta Persistence (JPA) to support NoSQL databases? JPA has long provided a unified programming model for relational databases in the Java ecosystem. The answer is based on a core architectural principle: tools should be optimized for their intended purpose. The first challenge is that JPA was designed specifically for relational databases, relying on concepts like tables, columns, joins, foreign keys, and transactional consistency. These are not simply implementation details but core elements of the specification. Forcing document, graph, key-value, or vector databases into this model creates friction and limits the use of each database’s native features. The second challenge is that NoSQL systems behave fundamentally differently. Graph databases perform path traversals, document databases store nested structures without normalization, key-value databases focus on fast lookups, and vector databases handle similarity calculations. These systems also differ in consistency, transactions, query languages, indexing, and scalability capabilities. Representing all these paradigms through a single relational abstraction leads to compromises. The third challenge is the importance of specialization. As Abraham Maslow noted, “if the only tool you have is a hammer, it is tempting to treat everything as if it were a nail.” Relational databases are effective, but not ideal for every persistence need. Semantic search, graph traversal, and high-volume telemetry storage are not relational problems. Applying a relational abstraction to all database types runs the risk of losing the unique optimizations each technology provides. Examine the analogy of transportation: cars, boats, submarines, and airplanes all address transportation but are specialized for different environments. Forcing them to use the same controls would result in mediocrity across all. Similarly, a single persistence abstraction may remove the features that make each database effective. Therefore, Jakarta NoSQL does not extend JPA beyond its intended scope. Instead, it offers a dedicated persistence model for non-relational databases, while continuing to maintain the familiar developer experience that contributed to JPA’s success. A key design goal of Jakarta NoSQL is to reduce mental effort for enterprise Java developers. Teams experienced with JPA should find the specification immediately approachable, as Jakarta NoSQL intentionally uses familiar terminology and concepts from the Jakarta EE community. Developers will encounter annotations like @Entity, @Id, and @Column, enabling a smooth transition from relational to non-relational persistence. Java @Entity public class Car { @Id private Long id; @Column private String name; @Column private CarType type; } At first glance, this entity closely resembles a JPA entity, which is intentional. However, the underlying implementation is fundamentally different. Jakarta NoSQL is built to support schema flexibility, embedded structures, nested documents, and database-specific storage models. This approach is reflected throughout the API. Instead of requiring developers to oversee low-level driver details, Jakarta NoSQL offers a high-level programming model via the Template API. Java @Inject Template template; Car ferrari = Car.builder() .id(1L) .name("Ferrari") .build(); template.insert(ferrari); List<Car> sports = template.select(Car.class) .where("type").eq(CarType.SPORT) .orderBy("name") .result(); The objective mirrors JPA’s original mission: permitting developers to focus on domain models and business logic, rather than serialization, connection management, or vendor-specific APIs. This foundation shaped Jakarta NoSQL 1.0. The initial release introduced the mapping layer, CDI integration, repository support, template operations, and standardized endpoints for four major NoSQL categories: Document databasesKey-value databasesColumn-family databasesGraph databases Jakarta NoSQL 1.0 showed that a unified Java programming model can respect the particular characteristics of each database family. Jakarta NoSQL 1.1 continued this evolution. While version 1.0 focused on mapping and persistence, version 1.1 expanded querying capabilities through integration with Jakarta Query. A key addition is support for parameterized queries, letting developers to safely bind parameters instead of manually constructing query strings. Java List<Car> cars = template.query( "FROM Car WHERE type = :type") .bind("type", CarType.SPORT) .result(); Version 1.1 also introduces projection support, allowing applications to retrieve lightweight views instead of entire entities. Java @Projection public record TechCarView( String name, CarType type) { } List<TechCarView> views = template .typedQuery( "FROM Car WHERE type = 'SPORT'", TechCarView.class) .result(); These features improve performance, reduce data transfer, and comply with modern Java features such as records. An important aspect of Jakarta NoSQL is its long-term architectural vision. While most developers use the mapping layer, the specification also defines a lower-level communication API for advanced scenarios. Java DocumentManagerFactory factory = ...; DocumentManager manager = factory.get("users"); DocumentRecord record = ...; manager.put(record); Optional<DocumentRecord> result = manager.findByKey("user:10"); manager.deleteByKey("user:10"); This communication layer is optional. Application developers can build complete systems without it, but it is valuable for database vendors, framework authors, and advanced integrations needing direct access to database capabilities. This design is fundamentally different from JDBC, which assumes communication through SQL statements and tabular result sets. That model works well because relational databases share a common language and interaction pattern. NoSQL databases do not. Document databases may use BSON, graph databases may offer traversal languages, and vector databases may provide similarity-search APIs. Others use REST endpoints, binary protocols, gRPC streams, or vendor-specific mechanisms. Forcing these models into a JDBC-style abstraction would limit their capabilities or demand ongoing vendor-specific extensions. For this reason, Jakarta NoSQL uses a layered architecture. The mapping layer offers a portable, productive programming model for developers, while the communication layer remains flexible to support diverse NoSQL systems. This architecture positions the specification for future growth. As new technologies like vector databases, time-series engines, and AI-native storage emerge, Jakarta NoSQL can evolve without imposing a relational mindset. Rather than treating every database as a nail for the JPA hammer, Jakarta NoSQL recognizes that different problems require different tools, while still presenting a consistent and familiar experience for enterprise Java developers.

By Otavio Santana

CORE

Your AI Coding Agent Can't Steal What It Never Had: The Docker Sandbox Isolation Story

I ran an AI coding agent against a broken Kubernetes deployment for five minutes. The agent called Anthropic's API dozens of times — reasoning about manifests, running kubectl commands, redeploying workloads. It made fully authenticated requests throughout the entire session. The API key was never in its environment. Shell env | grep -iE "anthropic|api_key|secret|token|password" # (empty) That is Docker Sandbox's credential isolation model in action. This article is about what that actually means — and what else the isolation holds, breaks, and surprises you with when you probe it properly. Key Takeaways Docker Sandbox uses a host-side proxy to inject API credentials without the agent ever seeing them — the agent makes authenticated calls without possessing the keySeven live isolation probes confirmed the boundary held throughout real AI agent activity, not just at restNetwork policy is hostname-scoped HTTP filtering — not a full network control plane — with three specific behaviors the documentation doesn't make clearDevOps agents can run docker build and kubectl inside the sandbox without any path to the host Docker daemon or cluster credentialsThe --branch parallel agent mode is Git-level isolation, not VM-level — important distinction for threat models requiring separate credentials per agent The Setup I manage eight AKS clusters for Fortune 500 clients. My laptop has Azure service principals, SSH keys, kubeconfig files with a dozen cluster contexts, and twenty-plus repos — some with .env files containing real API keys. Running an AI agent from this machine without guardrails means the agent inherits all of it. Docker Sandbox changes that. Each sandbox is a microVM — its own Linux kernel, its own Docker daemon, its own network stack. You mount one project directory. The agent sees one project directory. Everything else on the machine does not exist inside the sandbox. I spent two weeks testing this claim. Here is what I found. Test environment: What Detail sbx version v0.31.1 · commit e658be1 Host macOS Apple Silicon Network endpoints probed 13 Isolation probes 7 targeted commands Kubernetes scenario Real agent task, two bugs, timed All findings backed by real terminal output. Full repo: github.com/opscart/docker-sandbox-devops. How the Credential Isolation Actually Works The sandbox environment has no API keys. But the agent made authenticated API calls. Here is the mechanism: Shell env | grep proxy # https_proxy=http://gateway.docker.internal:3128 # http_proxy=http://gateway.docker.internal:3128 # JAVA_TOOL_OPTIONS=-Dhttp.proxyHost=gateway.docker.internal -Dhttp.proxyPort=3128 ... Every outbound request — HTTP, HTTPS, even Java tools — routes through a proxy at gateway.docker.internal:3128. That proxy runs on the Mac host, completely outside the microVM boundary. When the agent sends a POST to api.anthropic.com, there is no Authorization header — the agent does not have the key. The request reaches the host-side proxy. The proxy checks the allowlist — api.anthropic.com is in the default AI services group under the Balanced policy. Authentication is performed by the host-side proxy using credentials stored outside the sandbox boundary. The authenticated request is forwarded to Anthropic. The agent receives the response. It has no idea what key was used, where it came from, or how to find it again. Think of it like an OAuth gateway. The proxy holds the credential and vouches for the agent's requests. The agent gets access without ever possessing the key. You cannot steal what you never had. This is architecturally different from the standard setup where ANTHROPIC_API_KEY sits in the shell environment — one echo $ANTHROPIC_API_KEY away from being exfiltrated. What the Four Isolation Layers Actually Do Docker Sandbox stacks four layers: Hypervisor isolation. Separate Linux kernel per sandbox. Host processes invisible. Other sandboxes invisible. A compromised sandbox cannot escalate to the host kernel. This is the fundamental difference from a Docker container — a container shares the host kernel. The microVM does not. Network isolation. All outbound HTTP/HTTPS routes through the host-side proxy. Raw TCP, UDP, and ICMP are blocked at the network layer. Three policy tiers: allow-all, balanced (curated dev allowlist), deny-all. Set before starting your first sandbox: Shell sbx policy set-default balanced Docker Engine isolation. Each sandbox runs a private Docker daemon with its own socket. No path to the host Docker daemon. An agent can run docker build and docker run without socket mounting — which is the tradeoff that breaks isolation in plain container-based approaches. Credential isolation. Proxy-based injection as described above. The raw key never enters the microVM. macOS host with sensitive assets and proxy on the left, Docker Sandbox microVM in the center, network policy zones on the right. Seven Isolation Proofs — Run Live After a Real Agent Task The agent exited after completing the debugging task. The sandbox remained alive, and I executed the following commands from the same shell session the agent had used — to show exactly what was accessible throughout the entire run. 1. Filesystem Boundary Shell ls /Users/opscart/ # Source ls /Users/opscart/.ssh/ 2>&1 One directory. The workspace mount. SSH keys, other repos, credential directories — none of them exist inside the sandbox. Parent directories above the workspace are read-only stubs with no siblings. One critical implication: if your workspace is your home directory, your entire home is visible and writable. Always mount a project subdirectory, not your home. 2. No Credentials in Environment Shell env | grep -iE "anthropic|api_key|aws|secret|token|password" # (empty) Confirmed. The agent that just made dozens of API calls had no raw credentials anywhere in its environment. 3. Proxy Confirms the Injection Mechanism Shell env | grep proxy # https_proxy=http://gateway.docker.internal:3128 # no_proxy=localhost,127.0.0.1,::1,[::1],gateway.docker.internal Proxy address visible. Credentials it carries: not visible. The mechanism described above confirmed live inside the running sandbox. 4. Process Namespace Shell ps aux | wc -l # 13 A macOS host runs hundreds of processes. The sandbox shows 13 — all internal. The stack includes dockerd, containerd, socat bridging SSH agent forwarding, and the coding agent. Host processes completely invisible. No way to inspect or interact with anything running on the host. 5. Private Docker Engine Shell docker info | grep -E "Server Version|Operating System|ID" # Server Version: 29.4.3 # Operating System: Ubuntu 25.10 (containerized) # ID: e6934b23-368c-4259-a873-96f879f587e5 Ubuntu 25.10. A unique daemon ID that differs from docker info on the host — confirming the sandbox runs a fully isolated daemon. The agent deployed a full Kubernetes cluster using this daemon. No path to the host Docker socket existed. 6. Host Services Unreachable Shell curl -s --max-time 3 https://localhost:6443 2>&1 || echo "blocked" # curl: (7) Failed to connect to localhost port 6443: Connection refused Port 6443 — my minikube cluster on the Mac host. From inside the sandbox, localhost is the sandbox's own loopback. Host clusters, host SSH, host services — unreachable by default. Eight AKS contexts on this machine. Zero is reachable from inside the sandbox without an explicit policy rule. 7. What the Agent Had vs. What It Didn't During the entire debugging task, the agent had full access to one project directory, kubectl to the sandbox-internal Kubernetes cluster, and full Docker capabilities against the private daemon. It could not reach any other directory, cloud credentials, other kubeconfig contexts, the host Docker daemon, or any cluster not running inside the sandbox. All seven proofs held throughout the session without exception. Three Network Policy Findings That Change How You Think About It Network policy is not a full network control plane. It is hostname-scoped HTTP filtering. Three findings define the actual scope: Finding 1: Blocking returns HTTP 403, not TCP rejection. Plain Text probe "example.com" "https://example.com" # example.com | exit=0 | http=403 Exit code 0. The curl command succeeded. The proxy returned 403 directly. An agent that retries on 403 will retry blocked requests indefinitely. It cannot distinguish a blocked domain from a legitimate server-side error by exit code. For DevOps workflows — an agent hitting a blocked container registry will keep retrying silently rather than failing fast. Finding 2: HTTP CONNECT established a tunnel to port 22 on an allowed host. Plain Text # Port 22 — SSH port curl -s --max-time 5 telnet://github.com:22 # Connected to github.com port 22 # Port 9999 — non-standard port curl -s --max-time 5 telnet://github.com:9999 # Connected to github.com port 9999 github.com is on the Balanced allowlist. HTTP CONNECT established TCP tunnels to github.com on both port 22 and the non-standard port 9999 — both succeeded. Port-based restrictions are not enforced at the proxy layer. The Balanced policy is hostname-scoped only. Any port to an allowed host is reachable via HTTP CONNECT. Finding 3: DNS is not filtered. A common assumption is that all outbound traffic routes through the HTTP proxy — including DNS. Lab results show DNS resolution occurs independently: Plain Text dig example.com +short # 172.66.147.243 A blocked domain resolved. The microVM has an internal stub resolver that forwards DNS independently of the HTTP proxy. An agent can resolve any hostname regardless of the active policy. DNS cannot serve as a secondary enforcement layer. These findings do not break the isolation model. They define its actual boundary. Network policy controls HTTP/HTTPS access by hostname. It does not control DNS, TCP tunnels to allowed hosts on arbitrary ports, or how agents interpret 403 responses. The Agent Scenario: Isolation Under Real Load The real test of isolation is not seven probe commands — it is whether the boundary holds while an agent is actively working, making API calls, running kubectl, deploying containers. I gave an AI agent a broken Kubernetes deployment: a payments-service with memory limits set to 64Mi on a service that needs ~150Mi at peak. The agent received a task file and a set of manifests. No other context. The agent completed the task in under five minutes. It found two bugs — one planted, one discovered independently by reading the manifest and noticing health check probes targeting port 8080 on an nginx container that only serves on port 80. The task said nothing about probes. Result: both pods 1/1 Running, 0 restarts. The seven isolation proofs above were verified immediately after — throughout the entire debugging session, the boundary held without exception. Full article and complete repo at opscart.com/docker-sandbox-devops. What This Means for DevOps Engineers Specifically Most Docker Sandbox articles target software developers running Claude Code on a single codebase. The DevOps case is different and more demanding. A DevOps engineer running an AI agent faces a broader attack surface: multiple cluster contexts, infrastructure credentials, IAM roles, service accounts, kubeconfigs that grant production access. The blast radius of a compromised or manipulated agent is not one repo — it is potentially every system those credentials touch. Docker Sandbox addresses this at the architecture level rather than the prompt level. You are not relying on the agent being well-behaved. You are relying on the microVM boundary, the proxy, and the private Docker daemon. The agent can be fully autonomous inside the sandbox because the guardrail is the environment, not the agent's behavior. The private Docker Engine is particularly significant. DevOps agents need to build and test containers. Every other local isolation approach that allows container operations requires socket mounting — which gives the agent direct access to the host Docker daemon and every image and volume on the host. Docker Sandbox eliminates this tradeoff. What Is Still Rough The image iteration cycle is the primary friction point. Adding a tool requires editing a Dockerfile, rebuilding, pushing to a registry, and recreating the sandbox. For a stable toolchain, this is acceptable. For rapid experimentation, it is not. The --branch parallel agent mode is Git isolation, not VM isolation. Both agents run in one microVM with shared Docker and network. For separate credentials or separate network policies per agent, you need separate workspace directories. The network policy CLI has non-obvious syntax in several places — sbx policy deny does not remove an allow rule, and external cluster access requires two policy rules not one. Neither behavior is documented. The CLI changes between minor versions. v0.31.1 changed login flow, renamed policy tiers, and introduced --clone mode. Pin your version. When Not to Use Docker Sandbox Docker Sandbox is the right tool for a specific set of problems. It is not the right tool when: You need raw UDP or ICMP. Network tracing tools (traceroute, mtr), some mTLS configurations, and anything relying on ICMP will not work — the sandbox proxy only handles HTTP/HTTPS. Your toolchain requires host-device access. USB devices, GPU passthrough beyond basic forwarding, and hardware security keys are not accessible from inside the microVM. You are on a memory-constrained machine. Each sandbox runs a full microVM plus its own Docker daemon. On a machine with 8GB RAM, running multiple sandboxes simultaneously alongside Docker Desktop and a browser will cause pressure. You need production-grade audit logging. Docker Sandbox is Experimental. Audit trails, compliance logging, and enterprise controls are not mature yet. For regulated environments, evaluate accordingly. Your agent needs to coordinate across multiple repositories simultaneously. The one-sandbox-per-workspace model means cross-repo agent work requires careful orchestration. The --clone mode helps but adds git workflow overhead. Conclusion The credential isolation model is the headline: the agent made authenticated API calls throughout the session without the API key ever entering the sandbox. Authentication was performed by the host-side proxy using credentials stored outside the sandbox boundary. The agent could use the credential — it could never see, copy, or exfiltrate it. Seven isolation proofs confirmed the boundary held under real active load. One directory visible. No credentials. No host processes. No host clusters. No host Docker daemon. The network policy findings add important nuance. The --branch mode reality is different from what the documentation implies. Docker Sandbox is Experimental, and the CLI is moving. Use it knowing what it is — and what it is not.

By Shamsher Khan

CORE

From printTriangularNumber to Duff’s Device: Mastering Java Switch Statements Old and New

In this blog post, we will see how the humble Java switch statement evolved from a fall-through curiosity into a powerful expression, and how understanding its mechanics unlocks classic techniques like Duff's Device. Java's switch statement has evolved from a fall-through-prone construct into a modern expression syntax introduced in Java 14. The post traces this evolution using a concrete example, a method that computes triangular numbers by intentionally allowing execution to cascade through cases without break statements. The post also connects this behavior to Duff's Device, a 1983 loop-unrolling technique that uses deliberate fall-through to handle remainder elements before processing full blocks. A comparison of old and new switch syntax outlines trade-offs, and practical guidance is offered on when each form is appropriate. The Accidental Discovery I was prepping for the OCP Java 21 exam and stumbled across a tricky question. A method named question2 used a switch statement without any break statements. The output surprised me at first. Once I traced through it, I renamed the method to printTriangularNumber. That one rename told the whole story. This post dives into why. The Old Switch Statement The traditional switch statement has been part of Java since day one. The syntax looks like this: Java int day = 3; switch (day) { case 1: System.out.println("Monday"); break; case 2: System.out.println("Tuesday"); break; case 3: System.out.println("Wednesday"); break; default: System.out.println("Unknown"); break; } As shown above, every case ends with a break. Without it, execution does not stop. It keeps going into the next case. The old switch works on int, char, String, and enum types. Fall-Through: Feature or Bug? The most misunderstood behavior in switch is fall-through. When you omit break, execution literally falls into the next case. Java int x = 2; switch (x) { case 3: System.out.println("three"); case 2: System.out.println("two"); // jumps here case 1: System.out.println("one"); // falls through default: System.out.println("done"); // falls through } Output: Plain Text two one done Most developers treat this as a bug waiting to happen. They are not wrong. Forgetting a break is one of the most common Java mistakes. But intentional fall-through is a different story. It is a deliberate tool. And printTriangularNumber is the perfect example. printTriangularNumber: Fall-Through in Action Here is the method I renamed from question2 during my OCP prep: Java private static void printTriangularNumber(int n) { int res = 0; switch (n) { case 5: res += 5; case 4: res += 4; case 3: res += 3; case 2: res += 2; case 1: res += 1; default: break; } System.out.println(res == 0 ? "Ok, bye." : res); Let us trace through n = 4: Jumps to case 4, adds 4. res = 4 Falls to case 3, adds 3. res = 7 Falls to case 2, adds 2. res = 9 Falls to case 1, adds 1. res = 10 Hits default, breaks Output: 10 The pattern for each input: nResultFormula111232+1363+2+14104+3+2+15155+4+3+2+1 This is n * (n + 1) / 2, the triangular number formula. The fall-through is doing the summation for you. Each case accumulates the remaining values by simply not stopping. For n = 0 or any value above 5, no case matches, default fires immediately, and res stays 0. The ternary prints "Ok, bye.". I personally find it a beautiful example of using language semantics intentionally. This is also the kind of question the OCP exam loves to throw at you. The New Switch Expression (Java 14+) Java 14 introduced switch expressions as a standard feature. The arrow syntax -> eliminates fall-through entirely. Each arm is independent. Java int day = 3; String name = switch (day) { case 1 -> "Monday"; case 2 -> "Tuesday"; case 3 -> "Wednesday"; default -> "Unknown"; }; System.out.println(name); // Wednesday A few things to notice here: Switch is now an expression. It returns a value. The arrow -> replaces : and break together. No fall-through. Each arm executes independently. Multiple labels on a single arm: case 1, 7 -> "Weekend"; You can also use it inline: Java System.out.println(switch (day) { case 1, 7 -> "Weekend"; default -> "Weekday"; }); Much cleaner. Much safer. Switch Expressions With Yield Sometimes you need more than a single expression in an arm. That is where yield comes in. Java int n = 4; int result = switch (n) { case 1, 2 -> n * 10; case 3, 4 -> { int temp = n * n; System.out.println("Computing for: " + n); yield temp; // return value from block } default -> 0; }; System.out.println(result); // 16 Think of yield as the return statement for a switch block arm. You need it whenever the arm has multiple statements inside {}. A common mistake is using return instead of yield inside a switch expression block. That compiles only inside a method and it returns from the entire method, not just the switch. Always use yield inside switch expression blocks. Duff's Device: Fall-Through Taken to the Extreme Now that we understand fall-through well, let us look at the most famous intentional use of it: Duff's Device. Tom Duff invented this in 1983 to speed up memory copy operations by reducing loop branch overhead. The trick is to unroll the copy loop and use a switch to jump into the middle of it based on the remainder. In Java, we replicate it in two clean phases since Java does not allow interleaved switch+loop syntax: Java public static void duffCopy(int[] src, int[] dst, int n) { int i = 0; int rem = n % 4; // Phase 1: handle remainder via fall-through switch (rem) { case 3: dst[i] = src[i]; i++; case 2: dst[i] = src[i]; i++; case 1: dst[i] = src[i]; i++; case 0: break; } // Phase 2: full blocks of 4 int fullBlocks = (n - rem) / 4; while (fullBlocks-- > 0) { dst[i] = src[i]; i++; dst[i] = src[i]; i++; dst[i] = src[i]; i++; dst[i] = src[i]; i++; } } Let us trace through n = 13: rem = 13 % 4 = 1 Switch jumps to case 1, copies 1 element. i = 1 fullBlocks = (13 - 1) / 4 = 3 Loop runs 3 times, copying 4 elements each time Total: 1 + 12 = 13 elements The Python equivalent makes the two phases explicit: Python def duff_copy(src, n): dst = [None] * n rem = n % 4 for i in range(rem): # Phase 1: remainder dst[i] = src[i] i = rem while i < n: # Phase 2: full blocks dst[i] = src[i] dst[i+1] = src[i+1] dst[i+2] = src[i+2] dst[i+3] = src[i+3] i += 4 return dst The connection to printTriangularNumber is direct. Both use fall-through intentionally. In printTriangularNumber, the switch jumps to the right case and accumulates downward. In Duff's Device, the switch jumps to the right case and copies the remainder before the main loop takes over. Old vs. New Switch at a Glance FeatureOld Switch (:)New Switch (->)Fall-throughYes (default)NoReturns valueNoYesbreak neededYesNoMultiple labelsNoYes (case 1, 2 ->)Block with yieldNoYesNull safeNoYes (Java 21 preview)OCP exam topicYesYes Which One Should You Use? For new code, always prefer the switch expression with ->. It is safer, cleaner, and expressive. Your reviewers will thank you. Reserve the old switch with fall-through only when you genuinely need the cascading behavior, like in printTriangularNumber or a hand-tuned loop like Duff's Device. In those cases, add a comment explaining the intent. Otherwise, the next developer (including future you) will assume the break is missing by accident. My personal observation: the OCP Java 21 exam tests both heavily. Knowing when fall-through is intentional versus accidental is the key distinction examiners probe. Make sure you can trace through any switch block without running it. Happy testing! What is your take: is intentional fall-through clever engineering or a maintenance nightmare waiting to happen? Drop your thoughts below!

By NaveenKumar Namachivayam

CORE

A Practical Guide to Temporal Workflow Design Patterns

Long-running, distributed business processes often require careful coordination, state management, and fault handling. Temporal offers a code-first approach to durable workflows: developers write ordinary code for orchestration, and the Temporal service persists state, retries failed tasks, and resumes execution after failures. This shifts focus from plumbing (queues, retries, timeouts) to domain logic, but it also encourages reuse of proven patterns. The Temporal community and documentation highlight several orchestration patterns — for example, sagas, state machines/actors, polling strategies, fan-out/fan-in, and versioning patterns — that solve recurring problems in workflow design. This article surveys these patterns, explaining when and how to use them, with concise code snippets to illustrate their implementation in Temporal. A classic pattern in distributed transactions is the Saga (compensating transaction). In a saga, a business process is broken into a sequence of steps, each with its own “undo” compensation. If any step fails, the saga executes compensations in reverse order to restore consistency. In Temporal, this maps naturally to a try/catch around activities or to the built-in Saga helper. For example, a vacation booking workflow might book a hotel, then a flight, then an excursion. Each step registers a compensation action before invoking the activity. If a failure occurs, the catch block calls saga.compensate() to run all registered compensations in reverse. The following Java-like snippet shows this approach: Java public void bookVacation(BookingInfo info) { Saga saga = new Saga(new Saga.Options.Builder().build()); try { saga.addCompensation(activities::cancelHotel, info.getClientId()); activities.bookHotel(info); saga.addCompensation(activities::cancelFlight, info.getClientId()); activities.bookFlight(info); saga.addCompensation(activities::cancelExcursion, info.getClientId()); activities.bookExcursion(info); // If all succeed, method returns normally. } catch (TemporalFailure e) { saga.compensate(); // undo previous steps throw e; // propagate failure } } If any book* activity throws an exception, the catch invokes saga.compensate(), which calls cancelExcursion, cancelFlight, and cancelHotel in reverse order. This pattern ensures that even if the workflow crashes after partial work, Temporal’s durable execution will eventually resume the compensation sequence. Because Temporal workflows are persistent, the saga logic itself is recoverable – the service records each step and its compensation in the history. In effect, workflows become distributed state machines where try/catch embodies the saga pattern. Polling and External Events Workflows often need to wait for external processes or inputs. In Temporal, there are two main polling strategies. Frequent polling (short interval) is implemented inside an activity loop: the activity repeatedly attempts a call, sleeps briefly, and heartbeats after each iteration. Because long-running activities must heartbeat to show liveness, the loop invokes Activity.getExecutionContext().heartbeat(null) each cycle. For example, a polling activity might look like this: Java @Override public String doPoll() { ActivityExecutionContext context = Activity.getExecutionContext(); while (true) { try { return service.getServiceResult(); } catch (TestServiceException e) { // Service not ready; will retry } // Heartbeat to prevent timeout, then sleep briefly context.heartbeat(null); sleep(POLL_DURATION_SECONDS); } } In this snippet, service.getServiceResult() is retried until it succeeds. Each loop iteration heartbeats and sleeps for a fixed interval. If the worker or service crashes, Temporal will resume the loop exactly where it left off. This pattern is ideal for rapid retries or waiting on resources that become available shortly. For infrequent polling, Temporal relies on activity retry options instead of a custom loop. A workflow can call an activity once, but configure its retry backoff so that failures trigger re-execution after longer delays. In practice, one sets a high initial retry interval and backoff coefficient in the ActivityOptions at workflow time. The workflow code itself is just a single activity call (no loop needed). If the activity throws an error, Temporal automatically retries it later, waiting longer each time. This approach leverages the built-in retry policy (e.g., exponential backoff) for occasional checks. To handle arbitrary external signals or time delays, Temporal workflows can also use Workflow.await(timeout, condition) or Workflow.newTimer(). For instance, a workflow might await a boolean flag that is set by a signal handler, or await a fixed timeout for human input. This avoids busy-wait loops at the workflow level. Signals themselves can come at any time; Temporal’s messaging system lets running workflows be interrupted by signals without polling. In short, Temporal workflows mix timers (Workflow.await) and external signals to wait efficiently. Frequent polling lives in an activity with heartbeats, whereas infrequent or one-off waits can use activity retry or workflow timers. Parallel and Batch Processing When processing large data sets or issuing many operations in parallel, Temporal’s fan-out/fan-in pattern is useful. A parent workflow can spawn multiple child workflows or activities concurrently and then wait for all to complete. This is commonly used for batch jobs, bulk queries, or any parallel computations. The following example shows a “page-by-page” batch processing workflow. For each batch of records, it spawns a child workflow per record and then uses Promise.allOf() to wait for all children. When a batch is done, it can optionally continue-as-new to process the next page without growing history indefinitely: Java @Override public int processBatch(int pageSize, int offset) { List<SingleRecord> records = recordLoader.getRecords(pageSize, offset); List<Promise<Void>> results = new ArrayList<>(); for (SingleRecord record : records) { String childId = Workflow.getInfo().getWorkflowId() + "/" + record.getId(); RecordProcessorWorkflow processor = Workflow.newChildWorkflowStub(RecordProcessorWorkflow.class, ChildWorkflowOptions.newBuilder().setWorkflowId(childId).build()); results.add(Async.procedure(processor::processRecord, record)); } // Wait for all child workflows to finish Promise.allOf(results).get(); // If no more records, return result and finish if (records.isEmpty()) { return offset; } // Otherwise continue as new for the next batch (to reset history) return nextRun.processBatch(pageSize, offset + records.size()); } In this code, each child workflow processes one record. The parent collects a list of Promise<Void> and calls Promise.allOf(...).get(), which blocks the parent until all child workflows complete. Using children allows highly parallel processing without overloading a single worker. After finishing a batch, the code checks if (records.isEmpty()) and returns; otherwise it calls a continueAsNew stub (nextRun) with an updated offset. This continueAsNew effectively starts a fresh workflow execution with a new history, avoiding unbounded history growth for long-running loops. As shown, Temporal’s Async and Promise primitives make parallel fan-out/fan-in straightforward. Beyond paging, fan-out can apply to any use case needing parallel work (bulk updates, scatter-gather queries, etc.). Conversely, gathering results into a list or aggregation is just collecting activity/child results into a shared variable, which Temporal safely persists in the history. Actor-Like Workflows and Event-Driven Patterns Temporal workflows are naturally stateful and can run indefinitely, making them suitable for actor or process-manager patterns. A workflow can “sleep” or wait for signals, maintain in-memory state, and react to external events. Clients can use signals (@SignalMethod) to send events into a running workflow and queries (@QueryMethod) to read its state without affecting it. This allows workflows to act like autonomous entities. For example, imagine a subscription service workflow. It starts with a customer on trial, waits for either trial expiration or a cancellation signal, then proceeds to billing periods. Signals like cancelSubscription() can interrupt the main flow. Meanwhile, queries like queryCustomerId() can retrieve the workflow’s state from outside. Temporal’s event system handles all this without polling: “a running workflow can receive external messages without polling, and clients can inspect workflow state at any time”. Internally, the workflow code can use Workflow.await(...) to pause until a signal sets a flag. Here’s a conceptual sketch (TypeScript/JavaScript style) of using signal and query definitions: TypeScript const abortSignal = defineSignal<[string]>('abort'); const updateSignal = defineSignal<[number]>('update'); const getStateQuery = defineQuery<State>('getState'); export async function statefulWorkflow(config: Config): Promise<Result> { let state: State = {...initial...}; let aborted = false; setHandler(abortSignal, (reason: string) => { aborted = true; }); setHandler(getStateQuery, () => state); // Main workflow logic: await condition(() => aborted, '1 minute'); if (aborted) { // cleanup or compensation return { status: 'aborted' }; } // ... continue normal processing return { status: 'completed' }; } In this pattern, external callers would workflow.signal(abortSignal, reason) or workflow.query(getStateQuery). Temporal’s signal-and-query features implement a process manager-style pattern: a workflow can behave like an event-driven state machine, reacting to signals in real time and allowing external inspection. This is more robust than polling, and since all state changes happen in the workflow code, consistency is guaranteed. (If a query is issued while the workflow is mid-activity, it will reflect the last completed state.) Note that newer Temporal releases also support Workflow Updates, which are like synchronous signals that can return values. In environments where Update is available, a workflow can reply to a message directly. Otherwise, a client can query state as a two-step “signal then query” process. Either way, this pattern empowers long-lived processes and human-in-the-loop steps. Versioning and Evolving Workflows Temporal requires workflow code to be deterministic, so changing logic in running workflows must be done carefully. The community and docs describe versioning strategies. For short-lived or rare workflows, one can deploy a new workflow definition (e.g. MyWorkflowV2) or use a new task queue for new versions. For long-lived workflows, Temporal’s Workflow.getVersion API lets the code branch on a version number recorded in the history. This is often called the “patch” strategy. For example: Java int version = Workflow.getVersion("checksumAdded", Workflow.DEFAULT_VERSION, 1); if (version == Workflow.DEFAULT_VERSION) { activities.upload(targetBucket, targetFilename, data); } else { long checksum = activities.calculateChecksum(data); activities.uploadWithChecksum(targetBucket, targetFilename, data, checksum); } Here, on first execution getVersion("checksumAdded", DEFAULT, 1) returns DEFAULT_VERSION and runs the original upload() call. When a new worker with updated code runs getVersion("checksumAdded", DEFAULT, 1) again, Temporal records version = 1 in the history. Future runs hit the else branch and use the new uploadWithChecksum() code. This ensures deterministic replay: workflows that started before the code change continue on the original branch, and newer executions use the new logic. After all old executions finish, the branching logic can often be removed. Overall, versioning patterns let developers evolve workflows without breaking running executions. Temporal offers multiple options — definition names, task queues, and the getVersion API — each with trade-offs. (Using separate definitions or queues isolates versions at the cost of more infrastructure, while getVersion keeps a single codebase but requires planned version markers.) Regardless, versioning is a key pattern to safely deploy workflow updates in production. Conclusion Temporal’s durable workflow engine incorporates many built-in aids for complex process patterns. By applying established designs — such as sagas for compensating transactions, retry and heartbeat loops for polling, fan-out/fan-in via child workflows, and event-driven actors with signals/queries — engineers can build robust systems without manual boilerplate. Each pattern leverages Temporal features: workflows and activities, promises, signals, queries, and continuations. The examples above show how little code is needed: a few method calls and standard control structures achieve what would otherwise be elaborate orchestration logic. In practice, adopting these patterns means that failures are handled gracefully and state is managed cleanly. For example, the saga code snippet illustrates reversing partial work on error, while the parallel batch example shows how to process unbounded data safely with continueAsNew. In summary, understanding Temporal’s idioms — as documented by the Temporal team and community — empowers developers to focus on business logic while the platform ensures reliability. Mastery of these workflow patterns leads to systems that are easier to reason about, easier to maintain, and resilient in production.

By Akhil Madineni

When Your Documentation Manages Itself: mdship and AI-Assisted Markdown

If you write technical documentation in markdown, you already know the tension: some parts of your document are hand-written prose, while others — a table of contents, an included code snippet, a rendered diagram — are generated from somewhere else. How you handle that boundary says a lot about your workflow. Most documentation toolchains resolve it the same way preprocessors like PET or Jamal do: separate the source from the output. You maintain a template file, run a build step, and get a rendered document as the result. Clean, predictable, and easy to reason about — but it adds a build step, and the output file is not the thing you actually edit or share. mdship takes a different approach. It is a command-line tool and MCP server that edits your markdown in place: it reads the file, updates specific sections, and writes the result back to the same file. Everything else — your prose, your headings, your structure — is untouched. No separate output file, no build pipeline. The document you see is the document you ship. Think of it less like a preprocessor and more like a very opinionated editor that knows how to regenerate a table of contents, pull in a code snippet from another file, or render a Mermaid diagram — all within the file you are already editing. One File: The Trade-Off Working in a single file has real advantages for technical writers. The managed content — including snippets, generated TOC entries — is visible inline while you are editing. You can read the full document as your readers will see it, without switching to a preview mode or running a build. There is no output file to track separately, and markdown-aware tools like GitHub or your IDE render it correctly wherever it lives. The downside is equally real: because managed and hand-written content share the same file, it is easy to accidentally edit a section that is meant to be regenerated. You fix a typo in an included code snippet; on the next run, your fix is gone. You add a note inside a generated TOC block; mdship overwrites it without warning. Preprocessor tools sidestep this entirely. The source is one file, the output is another, and you never edit the output directly. The separation of concerns is clean. But you pay for it: every change requires a build step, the output is not portable without that step, and contributors who are not familiar with the toolchain may not know which file to edit. Neither model is universally better. mdship makes the pragmatic choice that for most documentation workflows, a single file with good guardrails beats a clean architecture that requires a build. Content Integrity: The Guardrail The guardrail is a checksum. Every time mdship writes content into a managed section — a TOC block, an INCLUDE block, a MERMAID block — it records a checksum of that content inside the opening placeholder marker, under a key called _content_generated_. On the next run, before overwriting anything, it verifies that the checksum still matches. If it does not, mdship stops and reports an error instead of silently discarding your edits. Plain Text ERROR: Placeholder TOC content was manually edited. Hash mismatch detected. Delete _content_generated_ line to override and accept data loss. This turns an accidental overwrite — which would otherwise be invisible until you notice the missing content — into an explicit decision. You can delete the _content_generated_ line to tell mdship "I know, proceed anyway," or you can pass --force on the command line to skip the check for a single run. Either way, you are opting in, not being surprised. AI-Generated Sections: The Same Idea, Extended The same pattern extends naturally to sections written by an LLM. mdship supports an  placeholder: an HTML comment embedded in the markdown file that contains a prompt. When you invoke the /ai-placeholder skill in Claude Code, it reads the prompt and writes the generated content between the opening and closing markers — directly into the file, in place, just like any other mdship operation. The workflow has three steps, enforced by the skill: Check: before writing anything, the skill calls mdship ai-check via MCP to verify that the existing content has not been manually edited since it was last generated. If the checksum does not match, the skill stops and reports the conflict to you rather than overwriting your edits.Generate: if the check passes (or there is no checksum yet, meaning the section is new), the LLM reads the prompt and writes the content.Seal: after writing, the skill calls mdship ai-fix via MCP to record a new checksum for the freshly generated content, protecting it against accidental edits until the next intentional update. The MCP integration means these calls happen automatically, as part of the skill's defined behavior — not as something the LLM has to remember to do. The Prompt Is Documentation, Too There is a subtler benefit to this approach that is easy to overlook. The prompt that instructs the LLM remains embedded in the file as a non-rendered HTML comment, right above the content it produced. It does not live in a commit message, a Jira ticket, or a separate prompt library that may be hard to find six months later. It is part of the document. This has practical consequences. If you need to regenerate a section — because the underlying API changed, or a referenced file was updated, or you simply want a fresh pass — you re-run the same prompt against the same file. The instruction is already there; you do not have to reconstruct it. The prompt can also reference external files: other documentation pages, source code, configuration files. If those change, rerunning the prompt automatically picks up the changes. The document becomes self-updating in the sense that the machinery to update it is built in. Conclusion mdship's in-place editing model and its LLM integration are two expressions of the same design choice: keep everything in one file, protect it with checksums, and let the tooling manage the regeneration cycle rather than the author. For technical writers, this means fewer context switches, no build step, and a document that carries both its content and the instructions for maintaining that content in a single portable file. The trade-off — shared space for managed and hand-written content — is managed by the checksum guardrail, which turns silent overwrites into explicit decisions. Whether the content is generated by mdship itself or by an LLM following an embedded prompt, the contract is the same: write it, seal it, and trust that the next update will ask before it overwrites.

By Peter Verhas

CORE

AI Is Finding Bugs Faster Than Enterprises Can Patch — Here's What Data Security Teams Should Do

I have spent the better part of a decade building data protection products for global enterprises. Cloud DLP, CASB, SSPM, Behavior Threats, AI Access Security, ISPM, etc. The kinds of things that sit between a user, an agent, or an application and the sensitive data nobody wants to see in the wrong place. Every conversation I have had with a customer security architect this year eventually arrives at the same question. The threat landscape has clearly changed. What does that mean for the controls we already own? This article is the analysis I have been sharing with security architects across industries who are evaluating how their data protection programs need to evolve. It is grounded in what is publicly documented, what it actually changes for enterprise data security, and where I would direct the next dollar of investment based on a decade of building these products at scale. What Actually Shifted, With Sources There are three publicly verifiable data points worth understanding before any control conversation makes sense. Discovery Is Becoming Inexpensive Mozilla shipped Firefox 150 in April 2026 with two hundred and seventy-one fixes that came out of a single sweep using an early version of Anthropic’s Mythos preview model. That is roughly four times the project’s typical annual baseline, in one pass. Mozilla also added the most honest sentence I have read on this topic all year. They said they have not seen any bug in the set that an elite human researcher could not have found, given enough time. SecurityWeek covered the details: securityweek.com/claude-mythos-finds-271-firefox-vulnerabilities. Read that caveat carefully. The thing that became automated is not novelty. It is the cost of finding a class of bugs that humans were always capable of finding. When the price of an action drops by an order of magnitude, the action gets done at scale. That is the shift, and it is the shift that matters. Patching Is Not Getting Cheaper at the Same Rate HackerOne paused new submissions to its Internet Bug Bounty program on March 27, 2026. The IBB is the oldest crowdsourced vulnerability reward program for open source, dating back to 2013. The pause was not a budget decision. It was an admission that the gap between AI-assisted discovery volume and the ability of volunteer maintainers to ship patches had become unbridgeable on the existing incentive model. Dark Reading’s coverage is here: darkreading.com on the IBB pause. Earlier in the year, the curl project removed bounties from its program for the same reason, after a wave of low-quality AI-generated submissions overwhelmed triage. If the upstream open source ecosystem is struggling to keep pace with discovery, every enterprise that ships software with open source dependencies is downstream of that struggle. That is most enterprises. Autonomous Agents Are Already Creating Real Incidents In April 2026, the Cloud Security Alliance published two surveys that I think every data security team should read. The first study found that fifty-three percent of organizations have had AI agents exceed their intended permissions, and forty-seven percent have already experienced a security incident involving an agent in the past year. The second, published a week later, reported that eighty-two percent of enterprises have discovered previously unknown agents running in their environments, and sixty-five percent have had an agent-related incident. The most common consequence was data exposure. CSA’s findings: Enterprise AI Security Starts with AI Agents and Autonomous but Not Controlled. Take those three threads together. Bug discovery is industrializing. The patch side is bottlenecked. And inside the enterprise, autonomous agents are already operating in places nobody fully maps. That is the operating reality, not a forecast. Why This Matters More for Data Security Than for Any Other Function Most of the AI security conversation is framed around vulnerabilities and exploits. I think that framing misses what is actually changing for enterprises. When a class of vulnerabilities becomes cheaper to discover, the average time between exposure and exploitation shortens. When average exposure time shortens, the probability that any given control fails inside that window goes up. When more controls fail more often, the consequence shows up at the data layer. Data is the asset. Everything else is a path to it. The CSA finding I keep coming back to is the one that says agent incidents most often produce data exposure, not service outages. That tracks with what I see at customer sites. The blast radius of an agent compromise is determined by the data the agent had access to, the policies that were being watched, and the speed at which someone noticed. None of those three is improving on the timeline that adversaries are improving. If an agent has access to your sensitive data, the agent is part of your data security perimeter, whether your DLP product knows it or not. That sentence is the part of the conversation that I find most data security teams are not yet having internally. It needs to happen this quarter. Three Things Data Security Programs Should Rethink Now 1. Stop Treating Non-Human Identities as a Hygiene Problem CyberArk’s 2025 Identity Security Landscape, surveying 2,600 cybersecurity decision-makers globally, found that machine identities now outnumber human identities by more than 80 to 1 in the typical enterprise, up from roughly 45 to 1 in their 2024 study. GitGuardian’s State of Secrets Sprawl 2025 report found 23.8 million new secrets exposed on public GitHub in 2024 alone, a 25 percent year-over-year increase, with non-human identities flagged as the dominant credential population behind that growth. The exact ratio in any given environment is a question for the IAM team, but the order of magnitude is consistent across every serious study I have read, and it is rising fast. Most enterprise IAM programs were designed around human users. Periodic access reviews. Quarterly attestation cycles. Manager signoff. None of that infrastructure was built for a population that is now eighty times larger, that provisions itself, and that often outlives its original use case. CSA’s research found that only 21 percent of organizations have a formal decommissioning process for AI agents. Everyone else is accumulating what the report calls retirement debt: agents who completed their task months ago and still hold credentials, tokens, and data access. From a data security standpoint, the practical consequence is that an enterprise’s most overprivileged identity is rarely a person. It is a service account from 2022 that nobody remembers, an OAuth grant that an integration test attached to a real production scope, or a workflow agent that picked up admin-level permissions during deployment because the person setting it up did not want to debug a permission-denied error at 11 p.m. These identities are reachable by adversaries through a single credential compromise, and they often have direct access to the kinds of data that DLP policies were written to protect at the human user layer. The remediation requires a structured non-human identity program with a named owner, a defined lifecycle covering provisioning, rotation, and decommissioning, and quarterly access reviews that apply to bots the way they apply to humans. Workload identity federation rather than long-lived secrets. Scoped service accounts. Logging that captures what each non-human identity touched, not just whether it authenticated successfully. From a tooling perspective, this work sits at the intersection of CASB, IAM, and DLP, and in most enterprises, it has no clear owner across those three functions. Establishing that ownership is the precondition for everything else. 2. Refresh Classification and Tagging for an Agentic Environment In my own work on DLP product strategy, I have come to think of classification and tagging as the foundation that every other data control sits on. If sensitive content is correctly identified at the moment it is created or ingested, downstream policies have a fighting chance. If it is not, no amount of policy authoring downstream will catch up. Most enterprise tagging programs were designed for documents flowing through email, endpoints, and a manageable list of SaaS applications. The current generation of AI agents and copilots flows through none of those choke points cleanly. An agent reads a corpus, generates a derivative artifact, and writes that artifact somewhere else. The original tag, if there was one, often does not survive the round trip. The derivative may contain sensitive content that was reassembled from sources that were each individually below the policy threshold. Three practical refreshes are worth funding now. Treat AI-generated outputs as a first-class data class. Anything produced by an agent or copilot needs provenance metadata that travels with it: which model produced it, against which prompt, derived from which sources, with which level of human review. Most enterprise classification taxonomies do not have a slot for this yet. Add one.Lower the threshold for tagging at ingestion. The cost of misclassifying a sensitive document used to be that a human eventually emailed it to the wrong person. The cost now includes an agent reading it as part of a larger context and producing a derivative that lands in a SaaS workspace your DLP product does not inspect. Err on the side of more aggressive classification at the source.Audit your DLP coverage of LLM endpoints and agentic SaaS surfaces. Most DLP deployments I see in the field have rich coverage of email and endpoints, partial coverage of cloud applications, and almost no coverage of the LLM and agent traffic that has become a meaningful share of how sensitive data now leaves the environment. That is the coverage gap most likely to show up in a 2026 incident report. 3. Put a Model in the Pull Request Path This is one of the few areas where the offensive shift in AI capability cuts directly in defenders’ favor, and most enterprise application security programs are not yet using it. The traditional SAST and DAST queue is where AppSec hours go to die. Thousands of unverified findings, most of them noise, validated entirely by humans on a backlog that never empties. The newer pattern is to put a model-based reviewer in the pull request path itself. Every PR is reviewed by an automated agent for security defects before a human sees it. Findings show up as inline comments. High-confidence findings can block the merge. OpenAI publicly stated in April 2026 that its Codex Security agent has contributed to over 3,000 critical and high-severity vulnerability fixes across the ecosystem since launch, and that its Codex for Open Source program now provides free security scanning to more than 1,000 open-source projects. Anthropic, Semgrep, and several other vendors have shipped comparable capabilities. Whether you build on a commercial offering or assemble an internal pipeline, the workflow is what matters. One nuance worth knowing about. Standard commercial models often refuse legitimate dual-use security queries by policy. Binary reverse engineering, exploit reasoning, malware analysis. If your AppSec team has been telling you that AI tools “do not work for security,” this refusal threshold is usually the reason. Both Anthropic’s Glasswing program and OpenAI’s Trusted Access for Cyber, expanded on April 14, 2026, to thousands of verified individual defenders, exist precisely to provide a lower refusal threshold for verified defensive use cases. Enterprise procurement and legal teams should start the verification paperwork now, not after a need arises. The Supply Chain Is the Other Half of the Data Exposure Problem Two recent incidents are worth holding in mind whenever this conversation comes up. On September 8, 2025, eighteen widely used npm packages, including chalk, debug, and ansi-styles, were trojanized after a phishing campaign targeting the maintainer known as qix. Those eighteen packages collectively account for over 2.6 billion weekly downloads. The malicious versions were live for roughly two hours and were written to drain cryptocurrency wallets, but the same access could have been used to exfiltrate environment secrets, build credentials, or sensitive data from any CI pipeline that pulled the bad version during that window. Palo Alto Networks Unit 42 and others published detailed breakdowns: paloaltonetworks.com on the qix incident. A week later, on September 15, 2025, the Shai-Hulud worm became the first self-propagating malware in the npm ecosystem, compromising hundreds of packages in its initial wave and continuing to evolve through follow-on campaigns into late 2025 and early 2026. The malware integrated TruffleHog to scan for secrets in compromised environments, harvested credentials from cloud instance metadata services where available, and weaponized GitHub Actions workflows for persistence. Palo Alto Networks Unit 42, ReversingLabs, Wiz, and others have continued to track variants of the same family. The reason these matter for a data security conversation is that the attacker's objective in both cases was credentials and secrets in build environments. From there, the path to data is short. A compromised CI runner with cloud credentials can read whatever those credentials can read. A compromised GitHub token can read whatever the org allows. A compromised npm publish token can introduce a future payload that does both. Treat the build pipeline as a data security boundary, not just an engineering productivity surface. A dependency firewall that validates package provenance before installation (Sonatype Nexus Firewall, JFrog Xray, Socket.dev, or equivalents) is the highest-leverage single control I know of for closing this attack surface. The Shadow Agent Problem Is a DLP Problem in Disguise The single most striking statistic in the April 2026 CSA research, to me, was that eighty-two percent of organizations had discovered previously unknown AI agents in their environment over the past year, and forty-one percent had discovered them more than once. The agents most commonly emerged in internal automation and scripting environments, in custom assistants and plugins built on LLM platforms, in SaaS tools with built-in automation, and in developer-created workflows. This is, structurally, the same problem that shadow IT was a decade ago, and the same problem that shadow SaaS became five years ago. The difference is that the average shadow agent has read access to more sensitive data than the average shadow application ever did, because agents are useful precisely in proportion to how much context they can reach. A finance team’s reconciliation agent, helpfully built in an afternoon, often ends up with broader visibility into financial data than the human who built it. A customer support copilot frequently has a service account with access to the entire ticket database, including PII. None of this is malicious. It is the path of least resistance for getting an agent to do something useful. Three controls help close the gap, and they are mostly extensions of capabilities a mature data security team already owns. CASB and SSPM coverage of LLM and agent platforms. The platforms hosting these agents (custom GPTs, Copilot Studio, internal MCP servers, vendor copilots) are SaaS applications. They need posture management, sanctioned application policies, and inline data protection just as much as Salesforce or Workday do. Most CASB and SSPM deployments are still catching up here. Push your vendor.Inline DLP on prompt and completion traffic. The point at which sensitive data leaves the environment is increasingly the prompt itself. Inline data inspection at the LLM gateway, using the same content matchers (EDM, IDM, OCR, vector ML) you trust for email and endpoints, is the right architectural pattern. The vendors are building this, but few enterprises have it deployed.An agent registry, even a basic one. Until the agent population is enumerable, no policy applied to it is provable. A spreadsheet is fine to start. The goal is to be able to answer, on any given Monday, three questions: which agents exist in production, what data each one can read, and who is the human owner of each. CSA’s data shows that most enterprises cannot answer those questions today. What I Would Actually Start on This Week Comprehensive ninety-day plans tend to lose momentum after the first two weeks of execution. The more effective approach, which I have refined over years of operationalizing data security programs at enterprise scale, is a focused set of starting moves that can ship in two weeks and that compound into a larger program over the quarter. Run an inventory pass for AI agents and copilots in your environment. Spreadsheet is fine. Capture name, owner, data scope, and approval status. The goal is to convert the CSA shadow agent statistic from an industry survey number into a number you actually have for your own organization.Review the data scope of every service account and OAuth grant tied to an LLM, agent, or copilot platform. Most of them were sized for development convenience, not production. Tighten the ones that need tightening. Decommission the ones that are no longer in active use.Pilot a model-based reviewer in the pull request path of one codebase. Measure the false positive rate and developer satisfaction at week four. If the numbers are reasonable, expand. If they are not, tune and try again.Add provenance metadata to your data classification taxonomy. Even if the only label you can ship this quarter is “generated by an AI system,” shipping it now is more valuable than waiting for a perfect schema. Tagging at ingestion is the part of the program that compounds, and it has been undersized for the agent era at most enterprises I have seen.Open the verified access conversation with your AI vendors. Anthropic Glasswing, OpenAI Trusted Access for Cyber, and equivalent programs from other providers offer pathways to models with reduced refusal thresholds for legitimate defensive work. The application process involves coordination with General Counsel and procurement, which is why initiating it before an urgent need is critical. Programs of this kind will become foundational infrastructure for enterprise security teams over the next two years. These moves represent the structural transition that enterprise data security programs need to make over the next eighteen months. Programs that begin this work now will spend that window refining the controls and integrating them across their existing security stack. Programs that delay will spend the same window writing postmortems that explain why the controls were not in place. Conclusion The cybersecurity industry has navigated several genuine inflection points over the past decade, and the current moment qualifies as one of them on a specific structural ground: the cost curve for finding software flaws has bent, while the cost curve for shipping patches has not. The gap between those two curves is where every enterprise security program now operates, and the consequences land first at the data layer, which is where my work has been concentrated for the past decade. Data security teams that internalize this framing now will spend 2026 building defensible programs around a fundamentally changed threat economy. Teams that wait for a more dramatic signal will spend the same period responding to incidents that the structural shift made predictable.

By Priyanka Neelakrishnan

Data Engineering

Functions of Data Engineering

AI/ML

Big Data

Data

Databases

IoT

DZone's Featured Data Engineering Resources

The Latest Data Engineering Topics