When someone says “I use AI at work,” they are often talking about three distinct layers: the model, the rules the model ships with from the factory, and the platform that connects it to your data. Understanding those layers helps move from using AI to using it with better judgment. This article shares Anthropic’s model family through real cases from my own infrastructure, and explains when Microsoft Copilot may make more sense than calling Claude directly.
First layer: the model
Anthropic maintains four model tiers. The most useful choice depends on the task, not on the prestige of the name. Here are the four, each with a real case from my own work.
Haiku: the automation model
What it is. The fastest and cheapest model in the family. It trades reasoning depth for speed and minimal cost.
Real case. My server runs a skill that publishes articles to a family member’s blog automatically, triggered by cron, with no human intervention. It also runs a daily stock alert: it checks 10 tickers through a financial API, applies a buy rule, and notifies via Slack and email, Monday through Friday at the same time. Neither task requires creativity. They require obedience.
The lesson I learned the hard way. I tested the publishing skill with a model from another company, and it fabricated outputs and ignored the skill’s written rules. Haiku, with explicit instructions (including telling it exactly which instruction file to read), has been more consistent. For volume automation, discipline can matter more than power.
Use it for: repetitive cron tasks, classification, notifications, pipelines where cost per run matters because it runs hundreds of times.
Less suited for: architecture decisions, analysis with many variables, or any task where shallow reasoning can produce a costly mistake.
Sonnet: the workhorse
What it is. The balanced tier. Enough reasoning for serious professional work, at a cost that lets you use it every day without thinking twice.
Real case. I built a skill that takes PDF invoices and extracts structured JSON data using the Claude API: vendor, date, amounts, line items. First test on a real invoice: 98% extraction confidence. Cost: between 1 and 2 US cents per invoice. Doing that by hand takes minutes per document; with Sonnet, seconds and cents. That same model tier is the one I use to draft technical documentation, translate articles on this blog, and process reports.
Use it for: data extraction from documents, writing and translation, file analysis, code of normal complexity. 80% of real daily work falls here.
Less suited for: the extremes. If the task is trivial and runs at volume, Haiku is cheaper. If the task is a complex decision with interconnected pieces, it may fall short.
Opus: the judgment model
What it is. The deep reasoning model. Slower and more expensive, designed for problems where the cost of being wrong far exceeds the cost of inference.
Real case. I use it in three concrete scenarios: security audits of my server (permission reviews, exposed services, firewall configuration), the full migration of my agent platform from one system to another, and architecture sessions where a bad initial decision means redoing weeks of work. In the May audit, Opus found an exposed metrics dashboard that earlier sessions with smaller models had missed. That single finding justified the cost of every session that month.
Use it for: audits, migrations, architecture design, debugging problems where the cause is not obvious, decisions with long-term consequences.
Less suited for: routine tasks. Paying Opus prices to extract data from an invoice is usually an unnecessary expense.
Fable 5: the long horizon
Update (June 13, 2026): the evening before this article was published, the US government issued an export control directive and Anthropic suspended access to Fable 5 and Mythos 5 for all customers. Other models are unaffected. Anthropic considers this a misunderstanding and says it is working to restore access. I am keeping this section as a reference for where this model tier fits, with the caveat that it is not available today.
What it is. The new tier, above Opus. Released on June 9, 2026, it is the first publicly available version of the Mythos class, the model Anthropic had restricted since April to cybersecurity and critical infrastructure partners. According to Anthropic’s official announcement, the longer and more complex the task, the larger Fable 5’s lead over its other models, and it can work autonomously for longer than any prior Claude.
Real case. I do not have one yet: at the time of writing, the model has just launched. But I have a clear place in mind for the first test: the major upgrade of my agent platform, which I have deliberately postponed because it is a multi-hour job with many dependent steps, where the agent must read official documentation, verify each step, and avoid improvising. That is very close to the task profile this tier appears designed for.
Use it for: long-horizon agentic work: agents that operate for hours or days on a complete project, large migrations, deep research with many dependent steps.
Less suited for: simpler tasks. If Opus solves your problem, Fable is probably more than you need. On access, at the time of writing it was included in paid plans through June 22, 2026, with usage credits after that; that condition was overtaken by the suspension described in the note above.
There is another enterprise caveat: Fable 5 does not operate under zero data retention; Anthropic requires 30-day data retention on the API for this model. If you work with regulated data, that condition matters as much as model capability.
The general rule
My practical rule would be to start with the smallest model that can do the job and move up only when it fails. It is easy to do the opposite: use the biggest available model for everything, pay more than needed in many cases, and not necessarily get a better result.
Second layer: the factory rules
A useful nuance is worth keeping in mind here. Even with a carefully written prompt, clear instructions, or a system built on top, some behaviors may be applied consistently because they are trained into the model’s weights and, in some cases, reinforced by external classifiers that inspect traffic outside the prompt. They are not designed to be switched off from the prompt, and those classifiers may operate outside that prompt layer.
Fable 5 is a recent example. When its classifiers detect a request related to three areas (cybersecurity, biology and chemistry, and distillation, meaning attempts to extract the model’s capabilities to train competing models), the response is automatically handled by Claude Opus 4.8, and the user is informed when it happens. In the apps, fallback is automatic with a notification; in the API, according to Anthropic’s Fable and Mythos documentation, a declined request returns a refusal response; the developer can handle fallback server-side, client-side, or manually. Anthropic’s early data indicates that more than 95% of sessions involve no fallback at all. The version without these classifiers, Mythos 5, exists only under restricted access for approved organizations.
Two important caveats are worth adding. First, robustness: Anthropic reports that an external bug bounty produced no universal jailbreak in over 1,000 hours of testing, but a government AI safety institute managed to elicit improper responses with specific techniques in the first days. The key word is “universal”: there is no single trick that switches everything off, but it is not an infallible system either. Second, false positives: in the first days, the classifiers have proven conservative and have blocked innocent requests; Anthropic has acknowledged this and is tuning them.
In practical terms: when you build your own agent on the API, you control the system prompt, the tools, and the memory. You do not control the model’s values or its classifiers. That separation is part of the design.
Third layer: the wrapper
The same model behaves differently depending on who drives the inference. Claude on claude.ai, Claude in Claude Code, and raw Claude on the API are the same engine with three different drivers: different system prompt, different tools, different context.
One detail that helps illustrate this separation is that Anthropic publishes the system prompts used by claude.ai and its mobile apps, and clarifies on that same page that those instructions do not apply to the API. An API call carries no built-in system prompt: the developer writes their own. The wrapper layer is not necessarily hidden; it can be a documented, auditable layer.
In my case, the wrapper is my own self-hosted agent: I decide which tools it has, which memory it loads, and which channels it uses. A lot of control over the driving layer, in exchange for maintaining it myself.
So where does Copilot fit?
Microsoft Copilot is not a model. It is an orchestration system that drives models from other providers. Microsoft defines it as “model diverse by design”: instead of betting on a single model, it built a system that uses leading models from OpenAI and Anthropic, and Claude is now available in Copilot’s mainline chat through the Frontier program.
The difference with building your own wrapper is not only about power; it is also about connection and governance. Copilot rests on Work IQ, the knowledge layer that connects the models to your organization’s data, protected with Enterprise Data Protection. According to Microsoft’s official documentation, Copilot respects the identity and permissions model, inherits sensitivity labels, applies retention policies, and supports auditing of interactions. Your AI sees what you can see in SharePoint, Outlook, and Teams. For many companies, that control is not a minor detail; it is often part of the requirement.
But that same design has a consequence worth considering: Copilot also inherits badly configured permissions. If your SharePoint has had folders shared with the whole company for years, Copilot will find them with speed and consistency that manual review often cannot match. Microsoft’s own deployment guidance instructs remediating oversharing before enabling Copilot: applying sensitivity labels, removing excessive or anonymous access, and rescoping sharing links. Copilot does not replace prior governance: it makes gaps more visible.
A special mention goes to Copilot Cowork: Microsoft brought the technology platform behind Claude Cowork into Microsoft 365 Copilot. The result is an agent for long-running, multi-step work inside Microsoft 365: you describe the outcome you want, it creates a plan, reasons across your files and tools, and moves the work forward with visible progress.
The recommendation
My practical recommendation is this.
If your organization already lives in Microsoft 365, Copilot is usually the more natural option. Not because the advantage is in the model (Copilot also uses leading models like GPT and Claude), but because the connection to your data and the governance are already solved. If you build your own infrastructure, self-hosted agents, or products, Claude directly via API usually makes more sense: more control of the wrapper, more responsibility on you.
How to enable it (no technical steps)
If you are a user and want Copilot with the latest models and Cowork, this is not something you turn on yourself. It requires a Microsoft 365 Copilot license per user and your organization’s enrollment in the Frontier program, which is separate from the preview release channel. The concrete action is to contact your IT administrator and ask for two things: a Microsoft 365 Copilot license and Frontier enrollment. The rest is the administrator’s job, not yours.
Maybe the better question is not “which AI is best?” It may be this: what task am I trying to solve, what data does it need to see, and who governs that connection?
By: Cesar Rosa Polanco - Written from a real experience, with artificial intelligence used as an editorial support tool.