Hardware-encrypted inference.
No logs. No limits.
An encrypted AI gateway for every model. Drop-in OpenAI SDK replacement with zero data retention.
Integration
Secure computation, end-to-end
Inference on leading LLMs runs inside verifiably secure runtimes, powered by Intel TDX and NVIDIA Confidential Computing architectures.
```js
import OpenAI from "openai"

const zima = new OpenAI({
  baseURL: "https://www.zima.chat/api/v1",
  apiKey: process.env.ZIMA_KEY,
})

const response = await zima.chat.completions.create({
  model: "qwen3-next-80B-a3B-instruct",
  messages: [{ role: "user", content: "..." }],
})

// encrypted in hardware. zero data retained.
```
Hardware-grade protection for every inference request.
Infrastructure
Inference you can count on
Hardware Encryption
Silicon-level TEE enclaves protect every inference request. Data encrypted in memory, sealed from host infrastructure.
Zero Retention
No prompts, outputs, or metadata survive past completion. Memory cryptographically wiped after every request.
OpenAI Compatible
Drop-in replacement for the OpenAI SDK. Switch your base URL, keep your existing code and tooling.
100+ Models
OpenAI, Anthropic, Google, Meta, and Mistral through a single encrypted endpoint. One key, zero vendor sprawl.
Architecture
Verified Inference Path
From SDK request to protected output, without storing prompts or responses.
Models
Model Pricing
Provider costs passed through at provider list rates. Zero markup.
| Model | Provider | Input / 1M | Output / 1M |
|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 |
| Gemini 2.5 Pro | Google | $1.25 | $5.00 |
| Llama 3.1 70B | Meta | $0.50 | $0.75 |
| Mistral Large | Mistral | $2.00 | $6.00 |
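Because rates are passed through per token, the cost of a request is simple arithmetic against the table above. A minimal sketch, assuming illustrative shorthand keys for the models (not guaranteed Zima model IDs):

```typescript
// Estimate per-request cost from the pass-through rates above (USD per 1M tokens).
// Model keys here are illustrative shorthand, not guaranteed Zima model IDs.
const RATES: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "claude-sonnet-4": { input: 3.0, output: 15.0 },
  "gemini-2.5-pro": { input: 1.25, output: 5.0 },
  "llama-3.1-70b": { input: 0.5, output: 0.75 },
  "mistral-large": { input: 2.0, output: 6.0 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  if (!r) throw new Error(`unknown model: ${model}`);
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}

// A 12,000-token prompt with an 800-token completion on GPT-4o:
// 12,000 x $2.50/1M + 800 x $10.00/1M = $0.030 + $0.008 = $0.038
```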
Your prompts never leave the enclave
Intel TDX + NVIDIA confidential compute — data encrypted in memory, sealed from host infrastructure.
Even we can't read it.
“We evaluated six AI gateways. Zima was the only one where our security team signed off without a single objection.”
Sarah Chen
VP Engineering, Meridian Health
Frequently asked questions
Security, compatibility, and pricing.
Does Zima store my prompts or outputs?
No. Requests exist only for the duration of execution inside the enclave. No prompts, outputs, metadata, or logs are persisted. This is enforced by hardware, not policy.
What happens if attestation fails?
Access is blocked immediately. If the system cannot cryptographically verify that the runtime environment is running approved, unmodified software, no data is processed.
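The fail-closed behavior described above can be sketched as a simple gate. The types, field names, and measurement values below are hypothetical and purely illustrative; real TEE attestation verifies a hardware-signed quote against approved runtime measurements, and Zima performs this check server-side.

```typescript
// Conceptual sketch of a fail-closed attestation gate. Types, field names,
// and measurement values are hypothetical placeholders.
type AttestationReport = {
  quoteSignatureValid: boolean; // did the hardware-signed quote verify?
  runtimeMeasurement: string;   // hash of the software actually loaded
};

// Hashes of approved, unmodified runtime builds (placeholder value).
const APPROVED_MEASUREMENTS = new Set(["sha256:approved-build-v1"]);

function mayProcessData(report: AttestationReport): boolean {
  // Fail closed: any verification failure blocks data from being processed.
  return (
    report.quoteSignatureValid &&
    APPROVED_MEASUREMENTS.has(report.runtimeMeasurement)
  );
}
```

The key property is that the default answer is "no": data is processed only when every check passes.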
Is Zima a drop-in replacement for the OpenAI SDK?
Yes. Zima uses the chat/completions format. Point your existing OpenAI SDK at our base URL and it works — same request shape, same response shape, zero retention.
Do all models run inside hardware enclaves?
Every model in the catalog is labeled with its security tier. TEE-hosted models run entirely inside hardware enclaves. Proxied models are routed through encrypted transit but execute on the provider’s infrastructure.
How is pricing structured?
Per-token, per-model, pay-as-you-go. No platform fees, no hidden tiers, no commitments. Pricing is displayed on every model card before you send a request.
Get Started
Start encrypting your inference today.
Point the OpenAI SDK at Zima, keep model choice and pricing visibility, and move sensitive traffic into attested hardware.