TwinHermes

Your always-on agent, hosted on infrastructure you control.

TwinHermes is the infrastructure-as-code that deploys Nous Research's self-hosted Hermes Agent onto a hardened, always-on GCP VM — reachable by chat, SSH, an OpenAI-compatible API, and a web dashboard, with every secret kept on your own box.

Deployed and operating. Live at https://hermes.twindevs.ai and on Telegram as @TwinHermesGCP_bot. This is an internal Twindevs deployment, not a public sign-up product.

Open the portal How privacy works

Why it exists

What you get.

Self-hosted, not someone else's cloud agent — the Hermes Agent runs on a GCP VM you provision and own. API keys, OAuth tokens, skills, and memories all live on the box (chmod 600), never in a third-party agent platform.
One agent, four ways to reach it — the same persistent agent answers over Telegram and Discord, a direct SSH/CLI session, an OpenAI-compatible HTTPS API, and a login-gated web dashboard.
Reproducible from scratch — the entire deployment is captured as Terraform plus a handful of provisioning scripts, so the VM, network, firewall, static IP, and gateway service can be rebuilt the same way every time.
Provider-agnostic with a built-in fallback chain — routes across GPT-5.5, Claude, Grok, Gemini, OpenRouter, and NVIDIA NIM, so one provider being down or rate-limited doesn't take the agent offline.
Low, predictable cost — runs on a single e2-medium VM at roughly $25-35/month for infrastructure; the autonomous background loop is routed to free models to keep always-on cost near zero.

Capabilities

Built for control, not lock-in.

OpenAI-compatible API endpoint

Caddy serves a public HTTPS API at hermes.twindevs.ai/v1/* (plus a /health check), guarded by a mandatory bearer token. Point any OpenAI-compatible client at it and talk to the agent the way you'd call any chat-completions endpoint.

Always-on chat over Telegram and Discord

Message the agent from your phone via @TwinHermesGCP_bot. New devices get a pairing code that the operator approves on the box, and a DM allowlist controls who can talk to it. Discord is configured through the same gateway setup.

Login-gated web dashboard

A browser dashboard at the root domain redirects to /login and, once signed in, exposes sessions, chat, model config, logs, and channels with live WebSocket updates. It binds 0.0.0.0:9119 behind its own form-login and signs an HttpOnly session cookie, so live features keep working through the proxy.

SSH workstation access via IAP

Connect from anywhere with GCP Identity-Aware Proxy tunneling — no public-IP allowlist to maintain. Firewall opens port 22 only to the IAP range (35.235.240.0/20); your Google identity tunnels in. On the box you run the hermes CLI directly (status, doctor, model, logs).

Model routing with a fallback chain

config.yaml sets GPT-5.5 (via Codex OAuth) as the primary and falls back through claude-opus-4-8, grok-4.3, and a Gemini-class model. Auxiliary roles route background/loop work and fast tasks to cheaper or free models. The routing is documented in the repo and re-tunable with the hermes CLI.

Autonomous background worker

Hermes runs a turn-triggered learning loop that can author skills and write memories, plus hermes cron for scheduled prompts and hermes webhook for event-driven runs. Subagent delegation handles multi-step work one job at a time. Background work is pinned to free models to keep cost near zero.

Terraform-provisioned, systemd-run

Terraform stands up the project network, subnet, firewall rules, static regional IP, and the e2-medium Debian 12 VM. Provisioning scripts install the Hermes gateway as unprivileged systemd user services (hermes-gateway, hermes-dashboard) so it stays up 24/7 across reboots.

Secrets kept local and least-privilege by default

API server and dashboard bind to loopback and are reached only through Caddy. The agent runs as an unprivileged hermes user, .env and auth.json are chmod 600, dangerous-command approval is manual, and nothing secret is committed — env files are gitignored and live only on the box.

How it works

Connect, organize, operate.

Provision the infrastructure: a bootstrap script enables the required GCP APIs and checks billing and service-account roles, then `terraform apply` creates the VM, network, firewall (with IAP), and static IP.
Configure the VM: a script pushes the .env, config.yaml, and Caddyfile to the box and installs the Hermes gateway as systemd user services.
Bring the providers online: the operator runs one-time OAuth bootstraps (GPT-5.5/Codex, Claude Max, SuperGrok) over the IAP tunnel and drops in API keys for the remaining providers.
Wire up access: point Cloudflare DNS at the static IP (gray-cloud) so Caddy auto-issues TLS, then set up messaging — add bot tokens and approve device pairing for Telegram and Discord.
Use it from anywhere: chat from your phone, curl the OpenAI-compatible API with the bearer token, sign in to the web dashboard, or SSH into the box to drive the hermes CLI directly — all hitting the same persistent agent.

Who it's for. Operators and small teams who want a persistent, self-hosted AI agent on infrastructure they own rather than a managed third-party agent service · Developers who need an OpenAI-compatible endpoint backed by a real agent (memory, skills, tools) instead of a bare model API · Privacy-conscious users who require that API keys, OAuth tokens, conversation sessions, and learned memories stay on their own VM · Power users who want to reach one agent from multiple surfaces — phone chat, terminal, API, and a browser dashboard

FAQ

Questions, answered.

Is TwinHermes the AI model itself?

No. TwinHermes is the deployment and operations layer — Terraform, provisioning scripts, config, and docs — for Nous Research's Hermes Agent. The actual inference is fully remote: the agent calls provider APIs and OAuth subscriptions (OpenAI/Codex, Anthropic, xAI, Gemini, OpenRouter, NVIDIA NIM). The VM runs the gateway, sessions, and tools, not the models.

Where do my keys and data live?

On your own VM. Secrets sit in gitignored local env files and on the box at /home/hermes/.hermes/.env (chmod 600). The API server and dashboard bind to loopback and are only reachable through Caddy. The repo commits no real secret values.

How is access secured?

The only public ports are 80/443 served by Caddy with auto-TLS. The API requires a bearer token; the dashboard enforces its own form-login. SSH is reached through GCP Identity-Aware Proxy — port 22 is open only to the IAP range, and your Google identity tunnels in, so there's no public-IP allowlist to maintain. The agent runs as an unprivileged user with manual approval required for dangerous commands.

What does it cost to run?

The design estimates roughly $25-35/month for infrastructure (an e2-medium VM, a static IP, and egress). Inference is billed separately to your own provider subscriptions and keys, and the autonomous background loop is routed to free models to keep the always-on cost near zero.

Can I use a different model or change the routing?

Yes. The provider routing and fallback chain are defined in config.yaml and are documented as re-tunable via the hermes CLI. The repo ships an operator-ranked default (GPT-5.5 primary, then Claude, Grok, and Gemini-class fallbacks), but the exact models and order are meant to be adjusted. Note that exact model IDs and a couple of provider details (for example the Nous Portal mechanism and a Gemini key value) are flagged in the design as items to validate at configuration time.

The rest of the suite

One privacy standard, five tools.

TwinMail

Inbox at the speed of intent.

Learn more
TwinContacts

Make your contacts trustworthy — and keep them that way.

Learn more
TwinVault

Your household's accounts, credentials, and security posture — in one local vault.

Learn more
TwinSystem

One repo for the whole smart home.

Learn more

Start with TwinHermes.

Privacy-first by default. Your data stays yours.

Open the portal See pricing