Frequently Asked
Questions
Product Differentiation and Technology
Protopia AI’s solution is fundamentally different from data masking solutions in the market. We are not scanning the data to finding anything in particular to mask. For inference, our Stained Glass Factory™ software runs as an optimization step at the end of model training. This optimization step creates a Stained Glass Transform™ corresponding to the trained model. Using this Stained Glass Transform™, you will transform all data elements in each data record (both what may or may not have been masked by a masking solution) yet your model will be able to use that transformed data to make accurate predictions without needing to go back to the original data. In no part of this process does our software scan the data being used for predictions.
Yes, SGT is model-agnostic and can be applied to various deep-learning models, including LLMs and computer vision models. The materials demonstrate its use with Mistral-based LLMs, Vision Models, and other AI models for diverse tasks.
SGT applies transformations after the initial input embedding layer. This ensures minimal disruption to the model’s architecture and functionality. Tokenization and embedding transformations are handled entirely by SGT inside the enterprise, and only the transformed embeddings are sent to the model hosting environment. The model itself remains unchanged, benefiting from the enhanced privacy without requiring any modifications.
Performance and Scalability
SGT is orders of magnitude faster (900x-15,000x faster) than similar protection techniques like Fully Homomorphic Encryption (FHE) or Secure Multi-Party Computation (SMPC). It adds milliseconds of latency (while other techniques often take seconds).
Please see table below.
Cryptographic Technique |
Release Year |
DNN |
Dataset |
Inference Time (sec) – Encrypted (A) |
Inference Time (sec) – Conventional |
Inference Time (sec) – Protopia (B) |
Speedup vs Cryptography (B:A) |
FALCON [2] |
2020 |
VGG-16 |
ImageNet |
12.96 |
0.0145 |
0.0148 |
906x |
Crypten [3] |
2019 |
ResNet-18 |
ImageNet |
8.30 |
0.0121 |
0.0123 |
691x |
GAZELLE [4] |
2018 |
ResNet-32 |
CIFAR-100 |
82.00 |
0.0112 |
0.0113 |
7,454x |
MiniONN [5] |
2017 |
LeNet-5 |
MNIST |
9.32 |
0.0007 |
0.0007 |
14,121x |
SGT demonstrated notable speedups compared to conventional cryptographic protection. Source.
SGT is designed to minimize the impact on model accuracy. In many cases, models trained or deployed with SGT-transformed data achieve near identical performance to their plain-text counterparts in industry standard benchmarks.
Model |
|
HellaSwag |
MMLU |
Truthful QA |
ARC |
Mistral w/ Stained Glass |
98.44% |
76.67% |
55.39% |
67.90% |
51.02% |
Mistral w/o Stained Glass |
0% |
76.53% |
57.21% |
68.27% |
50.94% |
Utility and Obfuscation Results of Mistral + Stained Glass Transform
The computational overhead of applying Stained Glass Transform™ is minimal. This is made possible by our patented technology which runs as an optimization stage at the end of model training.
The stochastic representation of data is fully compatible with the target model, so all existing metrics such as accuracy, precision, perplexity, etc. can be calculated exactly the same as without SGT. When the content of the obfuscated prompts are important, Protopia AI employs a unique identifier which the model provider can use to request access to the originals from the data owner.This allows for accurate measurement of model performance without exposing the raw data and maintaining data sovereignty These techniques are fully compatible with fine-tuning scenarios.
Data Security and Privacy
Protopia AI’s proprietary Stained Glass Transform (SGT) technology was created to safeguard your most valuable, private data and rapidly build AI innovation. SGT converts the input data and prompts that organizations absolutely do not wish exposed when running their AI into randomized representations. These are useless for human interpretation but retain the full utility of the underlying data for AI systems. This mitigates your risks in a few ways:
Data Leakage, Sensitive Information Theft | |
Challenge | Protopia’s Solution |
AI initiatives lack access to the most valuable enterprise data due to security and privacy concerns, hindering potential breakthroughs. | SGT creates safe representations of your organizational and your enterprise customers’ most valuable data to maximize the quality of data available for Generative AI while ensuring that no raw data leaves its root of trust. |
Sensitive information movements from new data workflows and processes, driven by the increased demand of AI increases exposure risk to malicious actors and compromising your security posture. | Sensitive data is transformed into randomized representations, unlocking AI/LLM utility without exposing or leaking original data. |
Because Stained Glass Transform turns text into transformed input embeddings, SGT can integrate easily into existing Retrieval Augmented Generation (RAG) pipelines. The retrieval mechanism (often some sort of vector database) remains unchanged for the application (unprotected documents fill the RAG database within the enterprise and regular sentence embeddings are used for the search). Compiled prompts (including the retrieved documents and any other user queries) are then jointly sent through Stained Glass Transform before leaving the enterprise and sent to the model provider.
After creating a Stained Glass Transform for a foundation LLM, that model can be fine-tuned using protected data. Because SGT’s outputs are embeddings fully compatible with its corresponding base model, fine-tuning a foundation model looks exactly the same as without SGT, except that the data is first transformed. Use cases might include fine-tuning the LLM for improved performance on transformed data or for fine-tuning it for performance on new types of text it did not see during pre-training. SGT is fully compatible with either directly fine-tuning the LLM weights or with other techniques such as LoRA.
SGT complements encryption. While encryption protects data in transit and at rest, SGT safeguards data when it is being used by AI models. SGT’s transformations render data unintelligible to humans but still understandable by the target model, regardless of encryption, adding an extra layer of protection.
No, there is not. The transformed data is not encrypted for there to exist a key. As such, there is also no decryption.
Yes, we can enable you to provide customized versions of your data that do not expose all the information in each data record for customers or 3rd party AI service providers that want to validate their ML models with your data.
No, we never input any customer’s data. Transformations are done within the customer’s own data ingestion pipeline.
Protopia’s solution works within your enterprise’s own infrastructure. Neither the model, nor data need to be exposed to the outside world.
SGT strengthens data sovereignty by ensuring that sensitive data never leaves the client’s trusted environment in plain text. Stochastic transformations are unintelligible and irreversible to all observers except the machine learning model, helping organizations meet compliance requirements related to data protection and privacy.
Highly regulated industries, like banking, often prohibit the use of external models due to strict data security and compliance requirements. SGT provides a viable alternative to techniques like fully homomorphic encryption (which is not yet mature) for inference and fine-tuning. This allows banks to leverage the power of LLMs while maintaining complete control over their sensitive data.
Protopia AI prioritizes platform security and employs standard security best practices, including VPC peering, to protect customer data. No extraordinary measures beyond industry norms are taken, ensuring a secure and reliable environment for deploying SGT.
Ownership and Intellectual Property
Protopia AI’s Stained Glass Transform (SGT) technology does not induce any transfer of data ownership and intellectual property. Protopia AI provides a tool, Stained Glass Engine, to enable customers to create a custom SGT for their own model and for their own data without needing to share either. The data owner retains full rights over their data, before and after transformation.
Protopia AI does not own the customer generated stained glass transforms themselves. Protopia sells licenses to the Stained Glass Engine software that customers use to create the transforms. The customer that generates the transforms for their target AI models has a license to use the engine and their resulting transforms.
The original data owner retains full ownership of the data, both before and after it is transformed using the stained glass process. Protopia AI does not ingest or take possession of the data at any point. The transformed data remains the property of the original data owner.
Deployment and Integration
Stained Glass Transform™ can be applied anywhere on the path of data from storage (on-premises or cloud) to the platform being used for ML. For inference, this is done dynamically with minimal latency as data is being loaded for inference and a new copy of the transformed data is not necessary. As such, storage requirements do not increase by using Protopia AI’s solution. For training, it is often desirable to create a new transformed training dataset using Stained Glass Transform™ before access to that data is given to data scientists.
Stained Glass Transform™ is incredibly simple to deploy because it has very low computation overhead. This simplicity makes it very flexible where you deploy the transformations: edge or server-side. In many cases it makes most sense to deploy the transformations close to where the data lives, at on-premises storage servers, or cloud data platforms
No, SGT is hardware-agnostic and can run on commodity CPUs or GPUs. It does not require confidential computing environments or specialized security hardware.
SGT is designed for seamless integration into existing AI/ML pipelines. It is implemented via lightweight PyTorch hooks to avoid changes to the model training code, and offers specialized integrations with ML libraries such as Hugging Face Transformers and PyTorch Lightning. There are minimal code changes and no disruption to existing workflows.
Protopia AI offers SGT as containerized software, enabling flexible deployment across diverse environments. They are also building partnerships with infrastructure providers like AWS, OCI, and others, to simplify access and deployment for enterprise customers.
SGT can be integrated directly into an application using our python package or as an API service. For a minimally invasive integration, Protopia AI also offers an OpenAI-compatible REST API to be deployed within the enterprise which receives data, transforms it with SGT, and then forwards it onto an external model provider. Because it mimics the OpenAI REST specification, many applications can use it simply by pointing to the instance inside of the enterprise. Because SGT is applied inside the enterprise, no unprotected data is ever sent to a model provider.
Customer Support and Implementation
Protopia AI understands that many large enterprises, while eager to adopt AI, lack the in-house expertise to effectively implement and manage complex AI systems. SGT simplifies this process by offering both self-service options for developers and custom tuning services for clients with more specialized needs. Protopia AI also provides guidance and support to ensure successful implementation and maximize the value of SGT for clients of all expertise levels.
Collaboration with Protopia AI begins with licensing the Stained Glass Engine (SGE), which includes Stained Glass Core and other necessary components. Protopia AI provides comprehensive tools and support, enabling customers to conduct their own training and fine-tune SGT for their specific use cases.
SGT is ideally trained per use case using data that closely resembles the intended inference data. This ensures optimal performance and privacy for the specific application. Each model requires its own SGT. Protopia AI recommends quick training on diverse datasets like OpenOrca or specialized datasets tailored to the client’s industry or domain (e.g., finance datasets for banking applications).
Yes, Protopia AI offers out-of-the-box SGT implementations for popular models like Llama2 and Mistral, simplifying deployment and accelerating time to value for customers. These models can be quickly fine-tuned using the Stained Glass Engine and the provided tools, allowing customers to tailor them to their specific use cases while maintaining data confidentiality.
The easiest way to experience SGT is to request a login to Protopia AI’s web demo, which provides a hands-on environment for exploring its capabilities. Please contact us to get access.
The pre-configured demo environment is the fastest way to explore SGTs capabilities. We offer integration guides and custom configuration services as well. Please contact us directly to learn more.
For each open-source model, a custom SGT can be readily generated by running Stained Glass Engine. For proprietary vendor models, integration with the vendor who owns the model is required to create a custom SGT. Protopia AI has existing relationships with many model providers.