Confidential computing framework for GPT-based applications.
OpenAI-compatible API, multiple LLM backends, and TEE-backed isolation for data and model privacy.
Cube AI is a framework for building GPT-based applications using confidential computing. It protects user data and AI models with a trusted execution environment (TEE), which is a secure area of the processor that ensures code and data loaded inside it remain confidential and intact. This provides strong data confidentiality and code integrity even when the host environment is not fully trusted.
Traditional GPT-based applications often rely on public cloud services where operators or hardware providers can access prompts and model responses. Cube AI addresses these privacy concerns by executing inference inside TEEs, ensuring that user data and AI models remain protected from unauthorized access outside the enclave.
- Secure Computing: Cube AI uses secure enclaves to protect user data and AI models from unauthorized access.
- Trusted Execution Environment (TEE): Cube AI uses a trusted execution environment to ensure that AI models are executed securely and in a controlled environment.
- Scalability: Cube AI is designed to handle large datasets and large models, making it suitable for applications that need high throughput.
- Multiple LLM Backend Support: Supports both Ollama and vLLM for flexible model deployment and high-performance inference.
- OpenAI-Compatible API: Provides familiar API endpoints for easy integration with existing applications.
Cube AI now supports vLLM, a high-throughput and memory-efficient inference engine for Large Language Models. vLLM provides:
- High Throughput: Optimized for serving multiple concurrent requests with continuous batching
- Memory Efficiency: Advanced memory management techniques for large models
- Fast Inference: Optimized CUDA kernels and efficient attention mechanisms
- Model Compatibility: Supports popular architectures including LLaMA, Mistral, Qwen, and more
Cube AI integrates with Ollama for local model deployment, providing:
- Model management and deployment
- Local inference
- Support for various open-source models
Cube AI uses TEEs to protect user data and AI models from unauthorized access. The TEE provides a secure execution space for trusted applications. In Cube AI, inference runs inside the TEE so prompts, responses, and model data are protected even if the host OS is compromised.
- Docker and Docker Compose
- NVIDIA GPU with CUDA support (recommended for vLLM)
- Hardware with TEE support (AMD SEV-SNP or Intel TDX)
1. Clone the repository

       git clone https://github.com/ultravioletrs/cube.git
       cd cube

2. Start Cube AI services

       make up

3. Get your authentication token

   All API requests require JWT authentication. Once services are running, obtain a token:

       curl -ksSiX POST https://localhost/users/tokens/issue \
         -H "Content-Type: application/json" \
         -d '{
           "username": "[email protected]",
           "password": "m2N2Lfno"
         }'

   Response:

       {
         "access_token": "eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9...",
         "refresh_token": "..."
       }

4. Create a domain

   All API requests require a domain ID in the URL path. You can fetch a domain ID from the UI or create one via the API:

       curl -ksSiX POST https://localhost/domains \
         -H "Content-Type: application/json" \
         -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
         -d '{
           "name": "Magistrala",
           "route": "magistrala1",
           "tags": ["absmach", "IoT"],
           "metadata": { "region": "EU" }
         }'

   Response (includes `id`):

       {
         "id": "d7f9b3b8-4f7e-4f44-8d47-1a6e5e6f7a2b",
         "name": "Magistrala",
         "route": "magistrala1",
         "tags": ["absmach", "IoT"],
         "metadata": { "region": "EU" },
         "status": "enabled",
         "created_by": "c8c3e4f1-56b2-4a22-8e5f-8a77b1f9b2f4",
         "created_at": "2025-10-29T14:12:01Z",
         "updated_at": "2025-10-29T14:12:01Z"
       }

   Notes:
   - `name` and `route` are required fields.
   - `route` must be unique and cannot be changed after creation.
   - `metadata` must be a valid JSON object.
   - Save the `id` value for subsequent API requests.

5. Verify the installation

   List available models (replace `YOUR_DOMAIN_ID` with the domain ID from step 4):

       curl -k https://localhost/proxy/YOUR_DOMAIN_ID/v1/models \
         -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

6. Make your first AI request

       curl -k https://localhost/proxy/YOUR_DOMAIN_ID/v1/chat/completions \
         -H "Content-Type: application/json" \
         -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
         -d '{
           "model": "tinyllama:1.1b",
           "messages": [
             { "role": "user", "content": "Hello! How can you help me today?" }
           ]
         }'
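
The token, domain, model-listing, and chat steps above can also be scripted. The following is a minimal sketch using only the Python standard library. The helper names (`build_request`, `insecure_context`, `send`, `quick_start`) are illustrative and not part of Cube AI, and skipping certificate verification is an assumption that mirrors `curl -k` for a local deployment with a self-signed certificate.

```python
import json
import ssl
import urllib.request

BASE_URL = "https://localhost"  # same host as the curl examples


def build_request(method, path, payload=None, token=None):
    """Build (but do not send) a JSON request for a Cube AI endpoint."""
    headers = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def insecure_context():
    """TLS context that skips certificate checks, like `curl -k`. Local testing only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, context=insecure_context()) as resp:
        return json.loads(resp.read().decode("utf-8"))


def quick_start(username, password, domain_payload, prompt, model="tinyllama:1.1b"):
    """Issue a token, create a domain, list models, then send one chat request."""
    token = send(build_request(
        "POST", "/users/tokens/issue",
        {"username": username, "password": password},
    ))["access_token"]
    domain_id = send(build_request("POST", "/domains", domain_payload, token))["id"]
    models = send(build_request("GET", f"/proxy/{domain_id}/v1/models", token=token))
    chat = send(build_request(
        "POST", f"/proxy/{domain_id}/v1/chat/completions",
        {"model": model, "messages": [{"role": "user", "content": prompt}]},
        token,
    ))
    return models, chat
```

`build_request` has no side effects, so it can be inspected or unit-tested offline. `quick_start` performs real requests and creates a new domain on every call, so run it only against a running deployment.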
Cube AI exposes all services through a Traefik reverse proxy. All protected endpoints require the `Authorization: Bearer <token>` header with a valid JWT.
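
Because the bearer token is a JWT, a client can inspect it locally before sending a request. The sketch below decodes the payload segment and checks an `exp` claim. It assumes the issued tokens carry the standard `exp` claim, which the text above does not confirm, and the function names are hypothetical. It does not verify the signature, so it is suitable only for client-side bookkeeping, never for authorization decisions.

```python
import base64
import json
import time


def jwt_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token, now=None, leeway=30):
    """True if the `exp` claim (assumed present) is within `leeway` seconds of passing."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return False  # no expiry claim; let the server decide
    current = time.time() if now is None else now
    return current >= exp - leeway


def auth_headers(token):
    """Header required by protected endpoints."""
    return {"Authorization": f"Bearer {token}"}
```

When `is_expired` returns true, a client would request a fresh token (for example through the token endpoints listed under the users API) before retrying.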
Base URL: https://localhost/proxy/
Replace {domainID} with your domain ID from the Getting Started section.
| Method | Path | Description |
|---|---|---|
| GET | /{domainID}/v1/models | List available models |
| POST | /{domainID}/v1/chat/completions | Create chat completions |
| POST | /{domainID}/v1/completions | Create text completions |
| GET | /{domainID}/api/tags | List Ollama models |
| POST | /{domainID}/api/generate | Generate completions |
| POST | /{domainID}/api/chat | Chat completions |
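
As a rough illustration of how a client might use this table, the sketch below maps each row to a full URL and pulls the assistant text out of a chat completion. The dictionary keys and function names are hypothetical, and `first_reply` assumes the response follows the OpenAI chat completion shape (`choices[0].message.content`), which is what an OpenAI-compatible endpoint is expected to return.

```python
from urllib.parse import quote

PROXY_BASE = "https://localhost/proxy"

# Method and path pairs copied from the table above; the keys are made-up labels.
PROXY_ENDPOINTS = {
    "list_models": ("GET", "/v1/models"),
    "chat_completions": ("POST", "/v1/chat/completions"),
    "completions": ("POST", "/v1/completions"),
    "ollama_tags": ("GET", "/api/tags"),
    "ollama_generate": ("POST", "/api/generate"),
    "ollama_chat": ("POST", "/api/chat"),
}


def proxy_endpoint(domain_id, name):
    """Return (method, url) for a named endpoint under the given domain."""
    if not domain_id:
        raise ValueError("domain_id is required")
    method, path = PROXY_ENDPOINTS[name]
    # Percent-encode the domain ID so it stays a single path segment.
    return method, f"{PROXY_BASE}/{quote(domain_id, safe='')}{path}"


def first_reply(chat_response):
    """Extract the assistant text from an OpenAI-style chat completion body."""
    choices = chat_response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    return choices[0]["message"]["content"]
```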
Example:
# OpenAI-compatible endpoint
curl -k https://localhost/proxy/YOUR_DOMAIN_ID/v1/chat/completions \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"tinyllama:1.1b","messages":[{"role":"user","content":"Hello"}]}'
# Ollama API endpoint
curl -k https://localhost/proxy/YOUR_DOMAIN_ID/api/tags \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Base URL: https://localhost/users
| Method | Path | Description |
|---|---|---|
| POST | /users | Register new user account |
| POST | /users/tokens/issue | Issue access and refresh token (login) |
| POST | /users/tokens/refresh | Refresh access token |
| POST | /password/reset-request | Request password reset |
| PUT | /password/reset | Reset password with token |
Example:
curl -ksSiX POST https://localhost/users/tokens/issue \
-H "Content-Type: application/json" \
-d '{
"username": "[email protected]",
"password": "m2N2Lfno"
}'

Base URL: https://localhost/domains
| Method | Path | Description |
|---|---|---|
| POST | /domains | Create new domain |
| GET | /domains | List domains with filters |
| GET | /domains/{domainID} | Get domain details |
| PATCH | /domains/{domainID} | Update domain name, tags, and metadata |
| POST | /domains/{domainID}/enable | Enable a domain |
| POST | /domains/{domainID}/disable | Disable a domain |
| POST | /domains/{domainID}/freeze | Freeze a domain |
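
The body rules for `POST /domains` noted in Getting Started (`name` and `route` are required, `metadata` must be a JSON object) can be checked on the client before the request is sent. The following is a small sketch of such a check. `domain_payload` is a hypothetical helper, and route uniqueness is not checked here because only the server can enforce it.

```python
import json


def domain_payload(name, route, tags=None, metadata=None):
    """Build the JSON body for POST /domains, enforcing the documented field rules."""
    if not name or not route:
        raise ValueError("name and route are required")
    if metadata is not None and not isinstance(metadata, dict):
        raise ValueError("metadata must be a JSON object")
    body = {"name": name, "route": route}
    if tags:
        body["tags"] = list(tags)
    if metadata is not None:
        json.dumps(metadata)  # raises TypeError if a value is not JSON-serializable
        body["metadata"] = metadata
    return body
```

For instance, `domain_payload("Magistrala", "magistrala1", ["absmach", "IoT"], {"region": "EU"})` produces the same body as the curl example for this endpoint.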
Example:
curl -ksSiX POST https://localhost/domains \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-d '{
"name": "Magistrala",
"route": "magistrala1",
"tags": ["absmach", "IoT"],
"metadata": {
"region": "EU"
}
}'

Configure vLLM settings through the environment:
make up-vllm

For Ollama integration:
make up-ollama

Project documentation is hosted at Cube AI docs repository.
Cube AI is published under the permissive Apache-2.0 license.
