OpenClaw Part 2: Voice AI, 1Password, and Real Costs


Three weeks later: voice is live, secrets are in a vault, and the small-business cost model is clearer

Three weeks later

In the first article I documented the installation of OpenClaw on a DigitalOcean VPS: hardening, Google Workspace, and persistent memory. At the end I listed three items on the roadmap: voice, a dashboard, and more automation.

Three weeks later, voice is live. This article documents what was implemented, what the test actually cost, and what a similar setup could look like for a small business evaluating a voice AI workflow.

1Password: secrets in a vault

In the first article, credentials lived as JSON files with restrictive permissions (chmod 700/600). It worked, but it was fragile: a rotated secret meant manually editing files on the server.

The solution was 1Password with its CLI (op) and a dedicated Service Account.

The implementation

CLI: op v2.33.1 installed on the VPS.

Service Account: a dedicated service account whose access is limited to the “OpenClaw” vault. The service account token (OP_SERVICE_ACCOUNT_TOKEN) is injected from /opt/openclaw.env and referenced in the env block of openclaw.json.

“OpenClaw” vault: stores the secrets Nova needs - the ElevenLabs API key, the Twilio Account SID, Auth Token, and phone number.

--reveal flag: required to access sensitive fields. Without this flag, op returns references, not values. It is an additional layer of explicit intent.
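A minimal Python sketch of how a script on the VPS could pull one of these secrets at runtime instead of reading a JSON file. It assumes the `op` CLI is on the PATH and `OP_SERVICE_ACCOUNT_TOKEN` is already exported (for example from /opt/openclaw.env). It uses `op read`, which prints the value behind a secret reference of the form `op://vault/item/field`. The item and field names (`ElevenLabs`, `credential`) and the helper names `secret_ref` and `read_secret` are hypothetical placeholders, not the actual layout of the vault or part of OpenClaw.

```python
import os
import subprocess

VAULT = "OpenClaw"  # vault name from the article


def secret_ref(item: str, field: str, vault: str = VAULT) -> str:
    """Build a 1Password secret reference: op://vault/item/field."""
    return f"op://{vault}/{item}/{field}"


def read_secret(item: str, field: str, vault: str = VAULT) -> str:
    """Resolve a secret with `op read`, authenticated by the service account token.

    Item and field names passed in are hypothetical; match them to the real vault.
    """
    if not os.environ.get("OP_SERVICE_ACCOUNT_TOKEN"):
        raise RuntimeError("OP_SERVICE_ACCOUNT_TOKEN is not set")
    result = subprocess.run(
        ["op", "read", secret_ref(item, field, vault)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if op cannot resolve the reference
    )
    return result.stdout.strip()  # drop the trailing newline op prints
```

For example, `read_secret("ElevenLabs", "credential")` would return the key as a string, or raise `subprocess.CalledProcessError` if `op` cannot resolve the reference. Nothing is written to disk, so rotating the secret in the vault is enough.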

Why it matters

An AI assistant with access to your terminal, email, and calendar already handles secrets - API keys, OAuth tokens, service credentials. The question is not whether to protect them, but how.

With 1Password:

Voice: Twilio + ElevenLabs + OpenAI

This is the part that turns the assistant from text into something that sounds human on the phone.

Voice architecture

The full pipeline:

  1. Incoming call - Twilio receives the call at the assigned number.
  2. Webhook - Twilio sends the call to the webhook endpoint configured on the VPS.
  3. DNS - A dedicated subdomain points to the VPS, and Caddy reverse proxies to 127.0.0.1:3334.
  4. STT - OpenAI transcribes audio to text.
  5. LLM - OpenAI generates the response used by OpenClaw during the call.
  6. TTS - ElevenLabs converts the response to speech (voice: Sarah, model eleven_multilingual_v2).
  7. Response - Audio returns to the caller via Twilio.
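The seven steps above can be read as one loop per conversational turn. The Python sketch below models only steps 4 to 7 as three swappable callables. `VoicePipeline` and `handle_turn` are hypothetical names, not OpenClaw's internal API, and the stand-in lambdas exist only so the flow can be exercised without OpenAI, ElevenLabs, or Twilio credentials.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One conversational turn: caller audio in, synthesized reply audio out.

    Each stage is a plain callable, so the STT, LLM, and TTS providers
    (OpenAI and ElevenLabs in this setup) can be swapped independently.
    """

    transcribe: Callable[[bytes], str]  # step 4: STT
    respond: Callable[[str], str]  # step 5: LLM
    synthesize: Callable[[str], bytes]  # step 6: TTS

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.transcribe(caller_audio)
        reply = self.respond(text)
        # Step 7: the returned audio is what would be streamed back via Twilio.
        return self.synthesize(reply)


# Stand-in stages for a dry run without any provider credentials.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Here `pipeline.handle_turn(b"hello")` returns `b"You said: hello"`. Keeping each stage behind its own function is what makes it possible to replace one provider without touching the rest of the turn.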

Voice security

Inbound allowlist: only authorized numbers can call. Everything else is rejected. This is not an open call center - it is a personal assistant with phone access.
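A minimal sketch of the inbound allowlist as a Python function that a webhook handler could call before anything else happens. It assumes the handler receives Twilio's form parameters as a dict, where `From` carries the caller's number, and that unknown callers are answered with TwiML's `<Reject>` verb. The numbers in `ALLOWED_CALLERS` are fictional placeholders, and this is an illustration rather than how OpenClaw implements its own allowlist.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical allowlist in E.164 format (Twilio passes the caller as "From").
ALLOWED_CALLERS: Set[str] = {"+15551230001", "+15551230002"}

REJECT_TWIML = '<?xml version="1.0" encoding="UTF-8"?><Response><Reject/></Response>'


def normalize_e164(number: str) -> str:
    """Strip spaces, dashes, and parentheses, keeping digits and a leading '+'."""
    return re.sub(r"[^\d+]", "", number)


def screen_inbound(
    params: Dict[str, str], allowlist: Set[str] = ALLOWED_CALLERS
) -> Optional[str]:
    """Return reject TwiML for callers not on the allowlist, or None to continue."""
    caller = normalize_e164(params.get("From", ""))
    if caller in allowlist:
        return None  # authorized: hand the call to the voice pipeline
    return REJECT_TWIML  # everything else is rejected before any AI cost is incurred
```

Rejecting at this point also keeps unknown callers from consuming STT, LLM, and TTS requests.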

Documented bug

During implementation, the voice_call tool did not work inside the Docker sandbox. The workaround was to configure the main agent explicitly (see GitHub issue #56367).

Real costs: data from the March 28 voice test

In the first article I estimated $25-70/month for the base infrastructure. The following data is only from the voice test session, not the general operating cost of OpenClaw:

Service | Cost | Detail
OpenAI (STT + responses) | $0.10 | 1,209 tokens, 68 requests logged during the test
ElevenLabs (TTS) | $0.08 | 786 characters, 48 sec of audio, 13 requests
Twilio (voice calls) | $4.13 | 16 voice transactions during the test
Twilio (phone number) | $2.30/mo | Fixed monthly cost
Total variable (session) | $4.31 | Excluding the phone number's monthly fee

What the test cost

The cleanest number from this first run is the full variable cost of the session: $4.31.

Twilio accounted for most of it. OpenAI and ElevenLabs were small at this scale. Across 16 voice transactions, Twilio averaged about $0.26 per transaction during the test session.
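The session arithmetic can be reproduced in a few lines of Python from the figures in the table above, which makes the per-transaction average easy to recompute as more test data comes in. Only the three variable line items are included; the monthly phone number fee is left out, as in the table.

```python
# Variable costs from the March 28 voice test (USD), copied from the table above.
session_costs = {
    "OpenAI (STT + responses)": 0.10,
    "ElevenLabs (TTS)": 0.08,
    "Twilio (voice calls)": 4.13,
}
twilio_transactions = 16

total = round(sum(session_costs.values()), 2)
twilio_share = session_costs["Twilio (voice calls)"] / total
twilio_avg = session_costs["Twilio (voice calls)"] / twilio_transactions

print(f"Total variable cost: ${total:.2f}")  # $4.31
print(f"Twilio share of total: {twilio_share:.0%}")  # 96%
print(f"Twilio average per transaction: ${twilio_avg:.3f}")  # $0.258
```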

That is the more honest way to frame it. I do not yet have enough volume to claim a polished production average cost per completed call, but I do have enough data to say that the variable cost per transaction is measured in cents, not dollars.

ElevenLabs pricing reference

Model | Public reference pricing
Flash/Turbo TTS | ~$0.06-0.08 / 1K characters
Multilingual v2/v3 TTS | ~$0.12-0.17 / 1K characters
Scribe STT | ~$0.22-0.40 / hour

Source: elevenlabs.io/pricing and elevenlabs.io/pricing/api
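To turn the per-1K-character reference rates into an estimate for a given amount of speech, a small Python helper is enough. The rates are copied from the table above and are public list-price references; actual billing depends on the ElevenLabs plan, so treat the output as a rough range. `tts_cost_range` and the model keys are illustrative names, not part of any ElevenLabs SDK.

```python
from typing import Dict, Tuple

# Reference rates in USD per 1,000 characters (low, high), from the table above.
TTS_RATES_PER_1K: Dict[str, Tuple[float, float]] = {
    "flash_turbo": (0.06, 0.08),
    "multilingual_v2_v3": (0.12, 0.17),
}


def tts_cost_range(characters: int, model: str) -> Tuple[float, float]:
    """Estimate the (low, high) TTS cost in USD for a number of characters."""
    low, high = TTS_RATES_PER_1K[model]
    thousands = characters / 1000
    return round(low * thousands, 4), round(high * thousands, 4)
```

At these reference rates, 10,000 characters of Multilingual v2/v3 output works out to roughly $1.20-1.70, and the same text on Flash/Turbo to roughly $0.60-0.80.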

Updated monthly cost

Component | Monthly cost
VPS (DigitalOcean) | $12-24
OpenAI (responses + STT) | $10-50
Twilio (number + calls) | $5-15
ElevenLabs (TTS) | $2-10
Google APIs | Free (within quotas)
1Password (Service Account) | Included in business plan
Estimated total | $30-100/month

This is the cost of my personal implementation - single user, low volume, research use.

How much does it cost to set up a voice AI service for a small business?

Everything above is my personal test. The practical question is different: if a small business wanted something similar, what would the monthly operating cost look like?

Voice traffic in small businesses

According to industry data, small businesses answer only 37.8% of inbound calls (AMBS Call Center, citing 411 Locals). A receptionist can handle between 50 and 100 calls per day (AMBS Call Center, citing LiveAgent). For a business with 15 to 25 employees, a planning volume of 50 to 100 daily calls is conservative.

Monthly operating costs

Component | Plan | Monthly cost | Source
VPS (DigitalOcean) | Basic 2 vCPU, 4 GB RAM | $24 | digitalocean.com/pricing
LLM / responses | OpenAI or another supported provider | $30-80 | Varies by provider
STT (OpenAI) | Transcription API | $10-25 | platform.openai.com
TTS (ElevenLabs) | Starter $5/mo to Pro $99/mo | $5-99 | elevenlabs.io/pricing
Voice (Twilio) | Inbound $0.0085/min, outbound $0.0140/min, plus number $1.15/mo | $25-60 | twilio.com/voice/pricing
Secrets (1Password) | Teams Starter Pack, up to 10 users | $20 | 1password.com/pricing
Email & productivity (Google Workspace) | Business Standard, $14/user/mo (annual) x 20 users | $280 | workspace.google.com/pricing
OpenClaw | Open-source, free | $0 | docs.openclaw.ai
Total monthly infrastructure | | $394-588/month |
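A rough way to sanity-check the $25-60 Twilio line is to multiply the planning volume by an assumed call length. The Python sketch below assumes a 3-minute average call, 22 business days per month, and inbound-only traffic at the $0.0085/min rate from the table. The call length and day count are hypothetical planning inputs, not measured data.

```python
# Only the rates and the 50-100 calls/day planning volume come from the article.
CALLS_PER_DAY = (50, 100)  # planning volume from the traffic section
AVG_MINUTES_PER_CALL = 3  # hypothetical average call length
BUSINESS_DAYS_PER_MONTH = 22  # hypothetical
INBOUND_RATE_PER_MIN = 0.0085  # Twilio inbound voice, USD/min
NUMBER_FEE_PER_MONTH = 1.15  # Twilio phone number, USD/month


def monthly_twilio_cost(calls_per_day: int) -> float:
    """Inbound minutes times the per-minute rate, plus the number fee."""
    minutes = calls_per_day * AVG_MINUTES_PER_CALL * BUSINESS_DAYS_PER_MONTH
    return round(minutes * INBOUND_RATE_PER_MIN + NUMBER_FEE_PER_MONTH, 2)


low, high = (monthly_twilio_cost(c) for c in CALLS_PER_DAY)
print(f"Estimated Twilio cost: ${low:.2f}-{high:.2f}/month")  # $29.20-57.25/month
```

Under those assumptions the estimate lands inside the table's $25-60 range; longer calls or outbound traffic would push it higher.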

In this proof of concept, OpenAI handled transcription and language responses end to end.

If the business already pays for Google Workspace and 1Password, the incremental voice AI stack drops to roughly $94-288/month.
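Both totals can be checked by summing the low and high ends of each range. This Python sketch copies the ranges from the small-business table and leaves out 1Password and Google Workspace for the incremental figure.

```python
# Monthly ranges (low, high) in USD, copied from the small-business table above.
components = {
    "VPS": (24, 24),
    "LLM / responses": (30, 80),
    "STT": (10, 25),
    "TTS": (5, 99),
    "Twilio": (25, 60),
    "1Password": (20, 20),
    "Google Workspace": (280, 280),
    "OpenClaw": (0, 0),
}
already_paid = {"1Password", "Google Workspace"}


def total(ranges):
    """Sum the low ends and the high ends of an iterable of (low, high) pairs."""
    lows, highs = zip(*ranges)
    return sum(lows), sum(highs)


full = total(components.values())
incremental = total(v for k, v in components.items() if k not in already_paid)
print(f"Full stack: ${full[0]}-{full[1]}/month")  # $394-588/month
print(f"Incremental voice AI: ${incremental[0]}-{incremental[1]}/month")  # $94-288/month
```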

Implementation cost (one-time)

OpenClaw is open-source and free, but putting it into production still requires a professional: server hardening, DNS, reverse proxy, integrations with Google Workspace, Twilio, ElevenLabs, and 1Password, plus voice configuration, testing, and documentation.

Item | Estimate
DevOps/systems engineer rate (freelance) | $60-100/hour
Estimated implementation hours | 30-50 hours
Total implementation cost | $1,800-5,000

Sources: average US freelance DevOps rate of $60-100/hour (ZipRecruiter, Upwork).

Market reference

A 2022 Gartner forecast, cited publicly by Business Standard, estimated that conversational AI deployments would reduce contact center labor costs by $80 billion by 2026. Vendor summaries still put a voice AI interaction at around $0.40, versus $7-12 for a human-handled call (Ringly.io, citing Teneo.ai).

Vendor-reported cases

A dental practice with 40 daily calls automated appointment scheduling and reported $36,000 in annual operational savings, with a 2.9-month payback. An HVAC company eliminated its external answering service ($800/month) and captured revenue that had previously been lost after hours, reporting $48,000 annually in recovered value (P0STMAN, 2025).

In both cases, voice AI did not replace staff - it absorbed repetitive work the existing team could not keep up with.

Lessons learned (Part 2)

Plaintext secrets are technical debt. If your AI assistant has access to external APIs, those tokens should be in a vault from day one. Not after the first scare.

Voice changes the dynamic. A text assistant is a tool. An assistant that answers the phone with a natural voice is perceived as a service. The difference is not technical - it is about user expectation.

Open-source in production requires tolerance for ambiguity. Open issues, temporary workarounds, versions that break things. If you need everything to work on day one, use a SaaS. If you want total control, accept the cost of being your own support team.

Unit economics matter more than headlines. Not the total monthly cost, not the API line item - the important question is what each useful interaction costs. In this early test, the full variable session cost was $4.31, and Twilio averaged about $0.26 across 16 voice transactions. That is not a production benchmark yet, but it is enough to show that the economics are already understandable.

Author’s note

This analysis is the product of personal research and a proof of concept. I’ve been working with Microsoft Teams (https://www.microsoft.com/en-us/microsoft-teams/small-medium-business), DIDs, and virtual SBCs since 2019, running cloud PBX in production. What I’m testing here is the next step: how artificial intelligence can improve and automate what already works.


This article is the second part of the OpenClaw series. The first part covers installation, hardening, and Google Workspace: Implementing OpenClaw: A Self-Hosted AI Assistant.

By: Cesar Rosa Polanco - Based on a real case, with editorial support from artificial intelligence.
