Three weeks later
In the first article I documented the installation of OpenClaw on a DigitalOcean VPS: hardening, Google Workspace, and persistent memory. At the end I mentioned three things on the roadmap: voice, dashboard, and more automation.
Three weeks later, voice is live. This article documents what was implemented, what the test actually cost, and what a similar setup could look like for a small business evaluating a voice AI workflow.
1Password: secrets in a vault
In the first article, credentials lived as JSON files with restrictive permissions (chmod 700/600). It worked, but it was fragile: a rotated secret meant manually editing files on the server.
The solution was 1Password with its CLI (op) and a dedicated Service Account.
The implementation
CLI: op v2.33.1 installed on the VPS.
Service Account: a dedicated service account, with exclusive access to the “OpenClaw” vault. The service account token (OP_SERVICE_ACCOUNT_TOKEN) is injected from /opt/openclaw.env and referenced in the env block of openclaw.json.
“OpenClaw” vault: stores the secrets Nova needs - the ElevenLabs API key, the Twilio Account SID, Auth Token, and phone number.
--reveal flag: required to access sensitive fields. Without this flag, op returns references, not values. It is an additional layer of explicit intent.
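As a rough sketch of how a process on the VPS can resolve those secrets, here is a small helper that prefers an already-injected environment variable and falls back to the 1Password CLI. The vault, item, and field names are illustrative, not the actual vault layout:

```python
import os
import subprocess

def get_secret(env_var: str, op_ref: str) -> str:
    """Return a secret from the environment if present, otherwise ask the 1Password CLI.

    op_ref is a 1Password secret reference like "op://OpenClaw/ElevenLabs/credential".
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # `op read` resolves a secret reference to its value; it relies on
    # OP_SERVICE_ACCOUNT_TOKEN being set so the Service Account can reach the vault.
    # (The interactive `op item get --fields ...` path additionally needs --reveal
    # to print the value instead of a reference.)
    return subprocess.run(
        ["op", "read", op_ref],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
```

In practice the environment-variable path covers the common case (values injected from /opt/openclaw.env), while the CLI path means a rotated secret is picked up without editing any file on the server.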
Why it matters
An AI assistant with access to your terminal, email, and calendar already handles secrets - API keys, OAuth tokens, service credentials. The question is not whether to protect them, but how.
With 1Password:
- Secrets are rotated from the 1Password dashboard, without touching the server.
- The Service Account has access only to the vault it needs, nothing else.
- There is an audit trail for every access.
- If the server is compromised, secrets are not sitting in plaintext on disk.
Voice: Twilio + ElevenLabs + OpenAI
This is the part that turns the assistant from text into something that sounds human on the phone.
Voice architecture
The full pipeline:
- Incoming call - Twilio receives the call at the assigned number.
- Webhook - Twilio sends the call to the webhook endpoint configured on the VPS.
- DNS - A dedicated subdomain points to the VPS, and Caddy reverse proxies to 127.0.0.1:3334.
- STT - OpenAI transcribes audio to text.
- LLM - OpenAI generates the response used by OpenClaw during the call.
- TTS - ElevenLabs converts the response to speech (voice: Sarah, model eleven_multilingual_v2).
- Response - Audio returns to the caller via Twilio.
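To make the Twilio leg of the pipeline concrete, here is the shape of the TwiML a webhook returns to play synthesized audio back to the caller. This is only an illustration of the response format, not how OpenClaw's voice plugin is actually implemented (the real setup streams audio); the URL is hypothetical:

```python
from xml.sax.saxutils import escape

def twiml_play(audio_url: str) -> str:
    """Build a minimal TwiML document telling Twilio to play an audio file.

    In the pipeline above, audio_url would point at the speech ElevenLabs
    generated for the LLM's reply.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Play>{escape(audio_url)}</Play></Response>"
    )
```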
Voice security
Inbound allowlist: only authorized numbers can call. Everything else is rejected. This is not an open call center - it is a personal assistant with phone access.
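A minimal sketch of that allowlist check, assuming the authorized numbers are a hard-coded set (in the real setup the list would live in configuration, and the numbers here are made up):

```python
# Hypothetical allowlist; the real one lives in the OpenClaw configuration.
ALLOWED_CALLERS = {"+15551234567", "+15557654321"}

def accept_call(from_number: str) -> bool:
    """Accept only callers on the explicit inbound allowlist; reject everyone else."""
    return from_number in ALLOWED_CALLERS
```

Twilio passes the caller's number as the `From` parameter of the webhook request, so the check runs before any STT or LLM work is done.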
Documented bug
During implementation, the voice_call tool did not work inside the Docker sandbox. The workaround was to configure the main agent explicitly (GitHub issue #56367).
Real costs: data from the March 28 voice test
In the first article I estimated $25-70/month for the base infrastructure. The following data is only from the voice test session, not the general operating cost of OpenClaw:
| Service | Cost | Detail |
|---|---|---|
| OpenAI (STT + responses) | $0.10 | 1,209 tokens, 68 requests logged during the test |
| ElevenLabs (TTS) | $0.08 | 786 characters, 48 sec audio, 13 requests |
| Twilio (voice calls) | $4.13 | 16 voice transactions during the test |
| Twilio (phone number) | $2.30/mo | Fixed monthly cost |
| Total variable (session) | $4.31 | Excluding phone number monthly fee |
What the test cost
The cleanest number from this first run is the full variable cost of the session: $4.31.
Twilio accounted for most of it. OpenAI and ElevenLabs were small at this scale. Across 16 voice transactions, Twilio averaged about $0.26 per transaction during the test session.
That is the more honest way to frame it. I do not yet have enough volume to claim a polished production average cost per completed call, but I do have enough data to say that the variable cost is still measured in cents, not dollars.
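The session numbers above reduce to two lines of arithmetic, using the figures from the cost table:

```python
# Variable costs from the March 28 voice test session (USD)
openai_usd = 0.10        # STT + responses
elevenlabs_usd = 0.08    # TTS
twilio_calls_usd = 4.13  # voice calls
twilio_transactions = 16

session_total = openai_usd + elevenlabs_usd + twilio_calls_usd  # ~$4.31
per_transaction = twilio_calls_usd / twilio_transactions        # ~$0.26
```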
ElevenLabs pricing reference
| Model | Public reference pricing |
|---|---|
| Flash/Turbo TTS | ~$0.06-0.08 / 1K characters |
| Multilingual v2/v3 TTS | ~$0.12-0.17 / 1K characters |
| Scribe STT | ~$0.22-0.40 / hour |
Source: elevenlabs.io/pricing and elevenlabs.io/pricing/api
Updated monthly cost
| Component | Monthly cost |
|---|---|
| VPS (DigitalOcean) | $12-24 |
| OpenAI (responses + STT) | $10-50 |
| Twilio (number + calls) | $5-15 |
| ElevenLabs (TTS) | $2-10 |
| Google APIs | Free (within quotas) |
| 1Password (Service Account) | Included in business plan |
| Estimated total | $30-100/month |
This is the cost of my personal implementation - single user, low volume, research use.
How much does it cost to set up a voice AI service for a small business?
Everything above is my personal test. The practical question is different: if a small business wanted something similar, what would the monthly operating cost look like?
Voice traffic in small businesses
According to industry data, small businesses answer only 37.8% of inbound calls (AMBS Call Center, citing 411 Locals). A receptionist can handle between 50 and 100 calls per day (AMBS Call Center, citing LiveAgent). For a business with 15 to 25 employees, a planning volume of 50 to 100 daily calls is conservative.
Monthly operating costs
| Component | Plan | Monthly cost | Source |
|---|---|---|---|
| VPS (DigitalOcean) | Basic 2 vCPU, 4 GB RAM | $24 | digitalocean.com/pricing |
| LLM / responses | OpenAI or another supported provider | $30-80 | Varies by provider |
| STT (OpenAI) | Transcription API | $10-25 | platform.openai.com |
| TTS (ElevenLabs) | Starter $5/mo - Pro $99/mo | $5-99 | elevenlabs.io/pricing |
| Voice (Twilio) | Inbound $0.0085/min, outbound $0.0140/min + number $1.15/mo | $25-60 | twilio.com/voice/pricing |
| Secrets (1Password) | Teams Starter Pack up to 10 users | $20 | 1password.com/pricing |
| Email & productivity (Google Workspace) | Business Standard $14/user/mo (annual) x 20 users | $280 | workspace.google.com/pricing |
| OpenClaw | Open-source, free | $0 | docs.openclaw.ai |
| Total monthly infrastructure | | $394-588/month | |
In this proof of concept, OpenAI handled transcription and language responses end to end.
If the business already pays for Google Workspace and 1Password, the incremental voice AI stack drops to roughly $94-288/month.
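That incremental figure is just the table total minus the two line items an established business typically already pays for:

```python
# Totals from the monthly operating cost table (USD/month)
total_low, total_high = 394, 588
workspace = 280  # Google Workspace, Business Standard x 20 users
secrets = 20     # 1Password Teams Starter Pack

incremental_low = total_low - workspace - secrets    # ~$94
incremental_high = total_high - workspace - secrets  # ~$288
```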
Implementation cost (one-time)
OpenClaw is open-source and free, but putting it into production still requires a professional: server hardening, DNS, reverse proxy, integrations with Google Workspace, Twilio, ElevenLabs, and 1Password, plus voice configuration, testing, and documentation.
| Item | Estimate |
|---|---|
| DevOps/systems engineer rate (freelance) | $60-100/hour |
| Estimated implementation hours | 30-50 hours |
| Total implementation cost | $1,800-5,000 |
Sources: average freelance DevOps rate in the US $60-100/hour (ZipRecruiter, Upwork).
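The range in the implementation table is simply the two corner products of the rate and hour estimates:

```python
rate_low, rate_high = 60, 100    # USD/hour, freelance DevOps (ZipRecruiter, Upwork)
hours_low, hours_high = 30, 50   # estimated implementation hours

cost_low = rate_low * hours_low      # $1,800
cost_high = rate_high * hours_high   # $5,000
```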
Market reference
A 2022 Gartner forecast, cited publicly by Business Standard, estimated that conversational AI deployments would reduce contact center labor costs by $80 billion by 2026. Vendor summaries still put a voice AI interaction around ~$0.40 versus $7-$12 for a human-handled call (Ringly.io, citing Teneo.ai).
Vendor-reported cases
A dental practice with 40 daily calls automated appointment scheduling and reported $36,000 in annual operational savings, with a 2.9-month payback. An HVAC company eliminated its external answering service ($800/month) and captured revenue that had previously been lost after hours, reporting $48,000 annually in recovered value (P0STMAN, 2025).
In both cases, voice AI did not replace staff - it absorbed repetitive work the existing team could not keep up with.
Lessons learned (Part 2)
Plaintext secrets are technical debt. If your AI assistant has access to external APIs, those tokens should be in a vault from day one. Not after the first scare.
Voice changes the dynamic. A text assistant is a tool. An assistant that answers the phone with a natural voice is perceived as a service. The difference is not technical - it is about user expectation.
Open-source in production requires tolerance for ambiguity. Open issues, temporary workarounds, versions that break things. If you need everything to work on day one, use a SaaS. If you want total control, accept the cost of being your own support team.
Unit economics matter more than headlines. Not the total monthly cost, not the API line item - the important question is what each useful interaction costs. In this early test, the full variable session cost was $4.31, and Twilio averaged about $0.26 across 16 voice transactions. That is not a production benchmark yet, but it is enough to show that the economics are already understandable.
Author’s note
This analysis is the product of personal research and proof of concept. I’ve been working with [Microsoft Teams](https://www.microsoft.com/en-us/microsoft-teams/small-medium-business), DIDs, and virtual SBCs since 2019, with cloud PBX in production. What I’m testing here is the next step: how artificial intelligence can improve and automate what already works.
This article is the second part of the OpenClaw series. The first part covers installation, hardening, and Google Workspace: Implementing OpenClaw: A Self-Hosted AI Assistant.
By: Cesar Rosa Polanco - Based on a real case, with editorial support from artificial intelligence.