10 min read

Written by protopia

Sensitive Data, Open Models, and the AI Factory’s Inference Privacy Layer

Protopia AI Stained Glass Transform (SGT) is now available for NVIDIA Nemotron 3 Super and Nemotron 3 Nano Omni. As an integrated data-privacy layer, SGT is designed to minimize plaintext exposure in the inference path, so an enterprise’s most sensitive workloads can run on multi-tenant AI factory infrastructure where supported.

Nemotron 3 models provide leading accuracy and high throughput, along with open weights, published recipes, enterprise-ready agents, and model customization for control, trust, and AI specialization.

However, processing sensitive data on shared infrastructure risks plaintext exposure across the serving stack, such as in logs, memory, and caches. To protect this information (customer records, claims, clinical notes, source code, and other sensitive data), business units often revert to isolated, single-tenant capacity, which erodes the ROI that justified the AI investment in the first place.

You can see this in how enterprises describe their own deployments. In healthcare, PHI is pinned to on-prem inference while general workloads run elsewhere. In financial services, a bank carves its AI factory into separate slices for each line of business (wealth management, the retail bank, the commercial bank) so that no group’s sensitive data shares infrastructure with another’s. Federal programs stand up parallel sovereign and air-gapped deployments to keep sensitive telemetry off multi-tenant infrastructure. In manufacturing, the most sensitive design and process IP is kept out of the shared AI environment altogether, so the workflows that would benefit from it most never get to use it.

Each of these is a reasonable answer to a real constraint. Each also narrows the aperture, shrinking the set of places the open model can run to the few the data owner fully controls and leaving that infrastructure well below the utilization it was justified on. That idle, walled-off capacity, bought but unusable because sensitive workloads cannot share it, is the isolation tax. Protopia built Stained Glass Transform to help address that constraint.

NOW AVAILABLE: SGT FOR NVIDIA NEMOTRON 3

Protopia AI Stained Glass Transform (SGT) is now available for NVIDIA Nemotron 3 Super and Nemotron 3 Nano Omni. Nemotron is built for agentic work, and SGT for Nemotron lets those agentic workloads run on multi-tenant AI factories while avoiding plaintext exposure of the data they reason over.

Why NVIDIA Nemotron

The Nemotron family of models is built for the enterprise AI factory, with leading accuracy and efficiency for agentic workloads. It also reframes the AI factory itself as a value-creation engine rather than only a cost center: the more you buy, the more you make. That only holds when the factory is highly utilized. SGT lets high-value, sensitive workloads run on the same multi-tenant factory built to generate tokens and produce revenue for the whole organization, instead of in siloed, under-utilized carveouts. The high-value agentic work is no longer stranded behind an isolation tax and can start contributing to business value.

The Nemotron 3 (Ultra, Super, Nano) family provides a highly capable, open-weight reasoning core for enterprise agentic systems. Released under the OpenMDW License, it allows teams to match model size to task (from narrow, real-time workflows to complex, multimodal reasoning), customize the model, and deploy long-running agentic workflows on their own infrastructure.

Enterprises are looking to scale agentic AI across their environments, including workflows that depend on proprietary data. With NVIDIA Nemotron and Protopia’s Stained Glass Transform, they can extend open-model deployments to more of those workloads while using AI factory infrastructure more efficiently.

Nemotron’s combination of open weights, right-sized models, and customization already shows up across industries with high-value work and sensitive data, where deployments cluster into four patterns:

In regulated production at scale, payments, lending, and insurance teams run Nemotron 3 reasoning models for claims processing, document extraction, and transaction enrichment, collapsing brittle multi-step parsing into single-model agents.

In security and high-assurance SaaS, cybersecurity and enterprise software vendors embed Nemotron 3 Super in their agent platforms for investigation, triage, and tool-calling at production tempo.

Where proprietary IP sits at the center of the workflow, semiconductor, biotech, retail, and industrial firms build multi-agent systems that reason over design libraries, formulations, and supply chain models, where model control can make deeper customization feasible.

And at the source of multimodal work, document workflows in financial services and insurance combine scanned forms, photos, and handwriting, while manufacturing inspection pulls camera and sensor feeds into the same agent loop, with Nemotron 3 Nano Omni handling vision and text in a single model rather than stitching together per-modality services.

What SGT for Nemotron Delivers

SGT converts sensitive inputs into a protected, stochastic representation before they leave the data owner’s trust boundary. With SGT, the Nemotron serving environment (its logs, caches, GPU memory, and observability tooling) is designed not to hold raw data, and there is no decoder downstream, so the intended protection extends across the operational surfaces where sensitive inputs would otherwise persist during inference. The protection is also not limited to the base model: because the SGT is built for Nemotron itself, an enterprise that fine-tunes Nemotron can fine-tune the existing SGT alongside it, so a customized model can run on multi-tenant infrastructure with similar protection.

For the buyer, three things change.

First, where Nemotron runs is no longer constrained by data sensitivity. The same enterprise agentic workflow’s inference requests can run on the on-prem enterprise AI factory, in a sovereign region, in a hybrid split, or in the cloud, chosen on infrastructure economics and platform design rather than on what data each environment is allowed to touch.

Second, on infrastructure the enterprise operates, the isolation tax is reduced. Sensitive workloads that used to require their own carveout can share the same multi-tenant capacity as every other workload, which improves utilization toward the level the purchase was planned around. Where the enterprise instead consumes capacity it does not operate, the same change removes the dedicated-tenancy premium it would otherwise pay for isolation.
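The utilization argument can be made concrete with a small sketch. Every number here is a made-up assumption for illustration, not a measurement from Protopia or NVIDIA: a 64-GPU factory is either split into an oversubscribed shared pool plus a mostly idle sensitive carveout, or run as one pool serving the same total demand.

```python
from typing import List, Tuple

# Each pool is (demand, capacity), both in GPU-equivalents.
# All numbers below are hypothetical and chosen only for illustration.
Pool = Tuple[float, float]


def utilization(pools: List[Pool]) -> float:
    """Fraction of total capacity doing useful work.

    A pool can serve at most its own capacity, and idle capacity in
    one pool cannot absorb overflow demand from another pool.
    """
    total_capacity = sum(capacity for _, capacity in pools)
    total_served = sum(min(demand, capacity) for demand, capacity in pools)
    return total_served / total_capacity


# Isolated: a 48-GPU shared pool that is oversubscribed, plus a
# 16-GPU carveout reserved for a sensitive workload that needs only 4.
isolated = [(56, 48), (4, 16)]

# Shared: the same 64 GPUs and the same total demand in one pool.
shared = [(60, 64)]

isolated_util = utilization(isolated)
shared_util = utilization(shared)
print(f"isolated: {isolated_util:.1%}, shared: {shared_util:.1%}")
```

Under these assumed figures the isolated layout serves 52 of 64 GPU-equivalents (about 81%) while 8 units of demand go unmet, and the shared layout serves 60 of 64 (about 94%). The gap is the idle carveout capacity that overflow demand is not allowed to use.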

Third, sensitive and multimodal inputs become usable with the open model. PII, PHI, financial records, source code, and proprietary content can be sent into Nemotron 3 Super inference while avoiding plaintext exposure beyond the trust boundary. With Nano Omni the same holds for images and video carrying sensitive content.

Open weights and open recipes are how enterprises take control of their AI deployments. What has been missing is a way to use the most sensitive data with these models while avoiding plaintext exposure in the multi-tenant AI factory, and without giving up the ROI. SGT for Nemotron 3 closes that gap.

Where This Goes Next: “Useful” Agentic AI

Nemotron 3 Super is built for advanced reasoning in multi-agent systems, and long-running enterprise agents are where sensitive data and inference meet constantly. SGT plugs into NVIDIA OpenShell and NeMo Agent workflows through Protopia AI’s SafeCLAW, a subagent that performs the stochastic transform locally so only the protected representation ever leaves for inference. A sensitive agentic workflow can then route to a Nemotron endpoint on a multi-tenant AI factory like any other call. That pattern, more secure agent workflows handling sensitive work on multi-tenant infrastructure, is the subject of our companion piece, “Useful Agentic AI Has Arrived”.
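The flow described above (transform locally, then send only the protected representation to the same endpoint as any other call) can be sketched as follows. This is a minimal illustration, not SafeCLAW’s or SGT’s actual interface: `AgentRequest`, `Outbound`, `prepare`, `placeholder_transform`, and the endpoint URL are all hypothetical names, and the placeholder transform has no privacy properties of its own.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://nemotron.example.internal/v1"


@dataclass(frozen=True)
class AgentRequest:
    text: str
    sensitive: bool


@dataclass(frozen=True)
class Outbound:
    endpoint: str
    payload: Any      # plaintext string or a protected representation
    protected: bool


def placeholder_transform(text: str) -> dict:
    """Stand-in for the local transform. It has NO privacy properties
    and is not SGT; it only marks where that step would run."""
    return {"kind": "protected-representation", "length": len(text)}


def prepare(req: AgentRequest, transform: Callable[[str], Any]) -> Outbound:
    """Build what leaves the trust boundary for one agent call."""
    if req.sensitive:
        # Transform locally so the plaintext is never placed in the payload.
        return Outbound(ENDPOINT, transform(req.text), protected=True)
    return Outbound(ENDPOINT, req.text, protected=False)


note = AgentRequest("Patient reports chest pain since Tuesday.", sensitive=True)
out = prepare(note, placeholder_transform)
print(out.endpoint, out.protected)
```

The design point the sketch tries to capture is that sensitive and non-sensitive calls share one endpoint; what differs is only whether the payload was transformed before leaving the caller’s side.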

What It Took to Build

Building an SGT for a 120B open model on a million-token-context architecture, on the timeline we had, came down to three teams working in parallel. NVIDIA provided expertise on NVIDIA GB200 NVL72 infrastructure, and this platform expanded what was feasible in the time available. It let us run longer contexts with less parallelism, so training ran faster, and the systems held up through the runs without hardware-related crashes.

Dedicating a rack to evaluation mattered just as much: its scale and layout let us stand up many vLLM and Stained Glass Proxy deployments at once and test a wide range of generation-based metrics in days rather than the weeks the same work would take on prior-generation hardware. HPE Services took on the integration of Protopia AI SafeCLAW into NVIDIA OpenShell, giving the agentic harness they build for their enterprise customers a simple route to putting AI factories to work. Protopia’s research and engineering teams adapted the SGT training methodology to Nemotron’s hybrid architecture and delivered the SGTs in record time.

The result is an SGT for Nemotron 3 Super and Nemotron 3 Nano Omni that is designed to preserve the open model’s accuracy and performance while removing plaintext exposure during inference.

Enterprises need to scale agent deployments without fragmenting policy, governance, or architecture. Agents run within governed OpenShell workspaces, while SafeCLAW enforces policy-based routing of sensitive interactions to sovereign inference endpoints aligned with the Stained Glass Transform Proxy. HPE AI Services help organizations maintain a consistent user experience and end-to-end protection of sensitive data by anchoring access control at the inference layer.

See It This Summer

Stained Glass Transform for Nemotron 3 Super and Nano Omni is here. The fastest way to put your most sensitive data to work with an open model is to start with SGT early access.

Request early access

We’re rolling out inference privacy for Nemotron live at HPE Discover, June 15 to 18 in Las Vegas. Stop by Booth 2034 to see SGT for Nemotron Super running on a multi-tenant AI factory and to talk through what it takes to stand it up on your own infrastructure.

You’ll have another chance to meet the team at the RAISE Summit in Paris this July, where we’ll be on the floor and ready to dig into what open models on sensitive data look like in production.

SGT for NVIDIA Nemotron 3

brittany
June 16, 2026