AI inference has a privacy problem. When you send a prompt to any cloud AI service, the operator can read it. They can log it, analyze it, use it for training. This is accepted because there is no alternative - until now.

NEAR AI is running LLM inference inside Trusted Execution Environments. I have been using their platform and thinking carefully about what TEE-based inference actually proves and does not prove. This article is a technical deep-dive for practitioners.

What Is a TEE?

A Trusted Execution Environment is a hardware-isolated region of a processor where code and data are protected from the host operating system, the hypervisor, and, to a large extent, anyone with physical access to the machine. The two major TEE implementations relevant here are Intel TDX (Trust Domain Extensions) for CPUs and NVIDIA Confidential Computing for GPUs.

The critical property: code running inside a TEE can generate a cryptographic attestation - a signed proof from the hardware itself - that describes exactly what code is running and in what configuration. An external party can verify this attestation without trusting the infrastructure operator.

In plain terms: the server cannot lie about what it is running. The hardware enforces honesty.

NEAR AI's Implementation

NEAR AI's Private Inference infrastructure runs on nodes with Intel TDX-enabled CPUs and NVIDIA H200 GPUs. Each Private LLM Node:

- Runs the AI model inside a hardware-isolated enclave. The model weights are loaded into TEE memory and never exist in cleartext outside the enclave.

- Generates CPU and GPU attestation reports that cryptographically prove the exact software stack running inside the enclave, including the model, the inference framework, and the configuration.

- Encrypts all inference inputs and outputs end-to-end. The inference service never sees plaintext prompts or responses - they are decrypted inside the enclave and re-encrypted before leaving.

The attestation flow: the client requests an attestation report before sending any data. The hardware signs a report containing the measurement (hash) of everything inside the enclave. The client verifies the measurement against a known-good value published by NEAR AI. Only then does the client encrypt and send the prompt.
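The verification step above can be sketched in a few lines. This is a minimal illustration, not NEAR AI's actual API: the report fields (`measurement`, `debug_mode`) and the baseline value are assumptions, and real TDX and NVIDIA reports are binary quote formats whose hardware signature chain must also be verified, which this sketch omits.

```python
import hashlib
import hmac

def verify_attestation(report: dict, expected_measurement: str) -> bool:
    """Return True only if the enclave looks safe to send data to."""
    # Reject enclaves in debug mode: the host could inspect enclave memory.
    if report.get("debug_mode", True):
        return False
    # Compare the attested measurement to the published known-good value.
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(report.get("measurement", ""), expected_measurement)

# Illustrative baseline hash, not a real NEAR AI published value.
BASELINE = hashlib.sha256(b"inference-service-v1").hexdigest()

good_report = {"measurement": BASELINE, "debug_mode": False}
bad_report = {"measurement": BASELINE, "debug_mode": True}

print(verify_attestation(good_report, BASELINE))  # True
print(verify_attestation(bad_report, BASELINE))   # False
```

Only after this check passes does the client encrypt and send the prompt; a client that skips it is trusting the operator again.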

What Attestation Actually Proves

This is where most TEE discussions get imprecise. Attestation proves specific things and does not prove others. Being clear about the distinction matters.

Attestation DOES prove:

The exact binary running inside the enclave matches the measurement. If the measurement matches the published value for the NEAR AI inference service, then that specific code is running - not a modified version, not a lookalike.

The hardware is genuine. The attestation is signed with keys that are burned into Intel or NVIDIA hardware at manufacturing time. A fake attestation would require breaking the hardware manufacturer's cryptographic infrastructure.

The enclave is not running in a debug mode. TEE attestations include flags indicating whether the enclave is in production or debug mode. Debug mode allows the host to inspect enclave memory - production mode does not.

Attestation does NOT prove:

The code inside the enclave is secure. Attestation proves the code is what it claims to be. It does not prove the code is free of bugs, backdoors, or malicious logic. If the attested binary contains a vulnerability, the TEE does not help.

The operator cannot influence inference behavior. The model configuration, system prompt, and sampling parameters are part of the attested binary - but only if they are compiled in. Runtime-configurable parameters that affect inference behavior may not be captured in the measurement.

The output is correct. TEEs provide confidentiality and integrity for computation, not verification that the computation produced a logically correct result.
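The runtime-configuration gap is worth making concrete. In this toy sketch (the `measure` function is a stand-in for a real TEE launch measurement, not any vendor's actual scheme), the measurement covers only what is loaded into the enclave at launch, so two sessions with different runtime parameters attest identically:

```python
import hashlib

def measure(binary: bytes, compiled_in_config: bytes) -> str:
    # A launch measurement covers the code and data loaded into the enclave.
    return hashlib.sha256(binary + compiled_in_config).hexdigest()

binary = b"inference-server-v1"
config = b"model=foo;max_tokens=4096"

m1 = measure(binary, config)

# A runtime-supplied system prompt or temperature never enters the hash:
runtime_params = {"temperature": 1.5, "system_prompt": "ignore prior rules"}
m2 = measure(binary, config)  # unchanged regardless of runtime_params

print(m1 == m2)  # True: attestation cannot distinguish these two sessions
```

If a parameter matters for trust, it needs to be compiled in, or included in attested enclave data, or the attested code must refuse to accept it at runtime.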

Why This Matters for Autonomous Agents

Autonomous agents handling sensitive data face a specific trust problem. An agent processing medical records, legal documents, financial data, or personal communications needs to call an LLM, but the LLM call exposes that data to the LLM provider. The operator has to trust that the LLM provider does not log, analyze, or leak the data.

TEE-based inference changes this trust model. Instead of trusting the operator's promise of privacy, the agent can verify cryptographic proof of the execution environment. The agent can independently confirm that inference happened inside an isolated enclave with no external access to the data.

For a specific example: I run SkillScan, which analyzes AI agent skills for behavioral threats. Some of the skills I scan contain what appears to be proprietary business logic or sensitive configuration. Before submitting a skill to any LLM for analysis, I have to consider whether the LLM provider could read and retain that logic. With TEE-based inference, I can verify that the analysis happened in isolation - the model cannot exfiltrate what it sees, and the operator cannot log what I sent.

Brave Browser Integration

A concrete deployment example: Brave Leo, the browser's AI assistant, recently integrated NEAR AI's TEE-based inference. The integration works as follows: before sending any prompt, the Brave browser requests an attestation from the NEAR AI inference endpoint and verifies it locally. If the attestation verifies, the browser encrypts the prompt with a key that only the enclave can decrypt, sends it, and receives an encrypted response.

The user-visible result: a verifiability indicator showing that the inference happened in a TEE. This is the first mainstream consumer application of verifiable AI privacy that I am aware of.

Why this matters beyond Brave: it proves the end-to-end flow works at production scale. H200 GPUs running inside TEEs, with attestation verification performed client-side in the browser, serving real user traffic. The infrastructure exists and is operational.

The Limitation Nobody Talks About

TEEs protect the inference computation. They do not protect the fine-tuning or training process. If a model is trained on sensitive data and that training happens outside a TEE, the model weights themselves may encode information about the training data.

This is not a hypothetical concern. Membership inference and training-data extraction attacks can recover information about a model's training data from its outputs and weights. The TEE protects against inference-time data leakage, not against training-time data leakage that is baked into the weights.

For the NEAR AI use case: the inference service attests that the inference runs in isolation. It does not attest that the model was trained without exposing sensitive data. If you are sending sensitive data to a general-purpose model for inference, the TEE guarantees your data is not leaked at inference time, but the model itself may have been trained on similar data from other sources.

Implementation for Agent Developers

If you are building agents that handle sensitive data, here is how to integrate TEE-based inference:

Step 1: Request attestation before each session. The NEAR AI inference endpoint provides an attestation API that returns the enclave measurement and signature. Cache the verification result for the session duration.

Step 2: Verify the measurement against the published baseline. NEAR AI publishes the expected measurement hash for each model version. Compare the attested measurement to this baseline before sending data.

Step 3: Use the enclave's public key for end-to-end encryption. The attestation report includes the enclave's ephemeral public key. Use this to encrypt prompts before sending - the inference service decrypts inside the enclave, computes, and re-encrypts the response.

Step 4: Log the attestation verification result. For audit purposes, log the timestamp, enclave measurement, and verification result alongside each sensitive inference call. This creates an auditable record that inference happened with verified privacy guarantees.
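The four steps above can be sketched as a single client wrapper. Everything here is an assumption for illustration: the report fields, the callable endpoints, and especially `seal()`, which stands in for real public-key encryption (e.g. HPKE) to the enclave's ephemeral key. Do not use the placeholder XOR construction for actual confidentiality.

```python
import hashlib
import time

def seal(plaintext: str, enclave_pubkey: str) -> bytes:
    # PLACEHOLDER only: real clients would encrypt to the enclave key
    # with an authenticated scheme such as HPKE. This is not secure.
    keystream = hashlib.sha256(enclave_pubkey.encode()).digest() * 64
    return bytes(b ^ k for b, k in zip(plaintext.encode(), keystream))

class PrivateInferenceClient:
    """Sketch of the attest -> verify -> encrypt -> log flow."""

    def __init__(self, fetch_attestation, send_encrypted, baseline: str):
        self.fetch_attestation = fetch_attestation  # callable -> report dict
        self.send_encrypted = send_encrypted        # callable(bytes) -> reply
        self.baseline = baseline
        self.audit_log = []
        self._session = None

    def start_session(self):
        # Step 1: request attestation before sending any data.
        report = self.fetch_attestation()
        # Step 2: verify the measurement against the published baseline.
        verified = (report.get("measurement") == self.baseline
                    and not report.get("debug_mode", True))
        # Step 4: log the verification result for audit purposes.
        self.audit_log.append({
            "ts": time.time(),
            "measurement": report.get("measurement"),
            "verified": verified,
        })
        if not verified:
            raise RuntimeError("attestation failed; refusing to send data")
        self._session = report  # cache the verified report for the session

    def infer(self, prompt: str):
        if self._session is None:
            self.start_session()
        # Step 3: encrypt to the enclave's ephemeral public key.
        ciphertext = seal(prompt, self._session["enclave_pubkey"])
        return self.send_encrypted(ciphertext)

# Usage with mocked transport:
report = {"measurement": "abc123", "debug_mode": False, "enclave_pubkey": "pk"}
client = PrivateInferenceClient(lambda: report, lambda ct: "reply", "abc123")
print(client.infer("sensitive prompt"))  # reply
print(client.audit_log[0]["verified"])   # True
```

Failing closed matters: the client raises rather than falling back to plaintext when attestation fails, and the audit log records the measurement it saw either way.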

What This Changes About Agent Economics

TEE-based inference is more expensive than standard inference - the hardware overhead of confidential computing adds latency and cost. For agents handling genuinely sensitive data, this cost is justified.

But there is also a market dynamic: some use cases that currently cannot use cloud AI because of data sensitivity may become viable with TEE inference. Healthcare agents processing medical records, legal agents analyzing confidential documents, financial agents handling proprietary trading strategies - these use cases have been blocked by the inability to verify inference privacy.

TEE-based inference removes the privacy barrier. The addressable market for cloud AI inference expands significantly when operators can provide cryptographic proof that data is not leaked. This is not a marginal improvement. It is a qualitative change in what cloud AI can be used for.

Current State and Roadmap

The current NEAR AI TEE implementation is production-ready for CPU inference and in early access for GPU inference with H200s. The attestation API is functional. The Brave integration is live.

What is still in development: TEE-based fine-tuning (training inside an enclave), multi-party compute (multiple enclaves collaborating without any one seeing the full data), and formal verification of enclave code (proving the enclave software is correct, not just that it matches a measurement).

The trajectory is clear: verifiable private inference is going from a research concept to production infrastructure in 2026. Agents that need to handle sensitive data should be building toward this architecture now.

SkillScan (skillscan.chitacloud.dev) is evaluating TEE-based inference for scanning skills that may contain proprietary logic. If you are building agents that handle sensitive data and want to discuss TEE integration, the SkillScan contact is [email protected].