The setup
People tend to talk about AI cost in two useless ways. One group says it is basically free because a single prompt costs fractions of a cent. The other group waves around a surprise bill like it proves the whole category is broken.
The useful answer is in the middle. AI agents cost money. Usually not terrifying money. You need to know the shape of the cost before you automate something customer-facing.
Here is a real breakdown from a small customer support agent we run for a service business. It handled 4,200 conversations in a month. Most were simple: appointment questions, price range requests, order status, location info, and basic troubleshooting.
Token costs
The average conversation used 1,900 input tokens and 420 output tokens across retrieval, classification, and answer generation. That matters because input and output are priced differently.
For 4,200 conversations, the agent processed about 7.98 million input tokens and 1.76 million output tokens. With the model mix we used, that came to $31.20 for language generation. Could it be cheaper? Yes. Could it be more expensive if you send the entire knowledge base into every prompt like a maniac? Also yes.
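If you want to run this estimate for your own workflow, the arithmetic is a one-screen function. The per-million-token rates below are illustrative placeholders, not the actual model mix behind the $31.20 figure; plug in your provider's prices.

```python
# Illustrative rates only -- substitute your provider's actual pricing.
INPUT_RATE = 2.50    # $ per 1M input tokens (assumption)
OUTPUT_RATE = 10.00  # $ per 1M output tokens (assumption)

def monthly_token_cost(conversations, avg_in, avg_out,
                       input_rate=INPUT_RATE, output_rate=OUTPUT_RATE):
    """Estimate monthly generation cost from per-conversation averages."""
    total_in = conversations * avg_in
    total_out = conversations * avg_out
    cost = total_in / 1e6 * input_rate + total_out / 1e6 * output_rate
    return total_in, total_out, cost

# 4,200 conversations at 1,900 input / 420 output tokens each
# gives 7.98M input and 1.76M output tokens.
total_in, total_out, cost = monthly_token_cost(4200, 1900, 420)
```

The token totals are fixed by your traffic; the dollar figure swings with the rates, which is exactly why the model mix matters.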
The boring cost-control trick is context discipline. Retrieve the three useful chunks. Do not paste the company handbook into every call. Nobody needs the 2021 holiday policy to answer "Do you offer weekend appointments?"
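Context discipline is easy to enforce in code. A minimal sketch, assuming your retriever already returns scored chunks with token counts (the names and thresholds here are made up for illustration):

```python
def build_context(chunks, scores, k=3, budget_tokens=1200):
    """Keep only the k best-scoring chunks that fit a token budget.

    `chunks` is a list of (text, token_count) pairs and `scores` are
    relevance scores from your retriever -- both hypothetical shapes.
    """
    ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
    kept, used = [], 0
    for score, (text, tokens) in ranked[:k]:
        if used + tokens > budget_tokens:
            break  # stop before blowing the input-token budget
        kept.append(text)
        used += tokens
    return "\n\n".join(kept)
```

A hard cap like `budget_tokens` is crude, but it makes the worst-case input cost of every call a known number instead of a surprise.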
Search and storage
The agent used a vector database for knowledge retrieval. The knowledge base had 184 documents, mostly short help articles and internal notes. Storage was tiny. Query volume was not huge. Monthly vector database cost: $12.80.
You can avoid a separate vector database for very small projects, but retrieval usually pays for itself once the agent needs to answer from more than a handful of documents. Good retrieval also reduces token costs because you send less context into the model.
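For a knowledge base the size of this one, 184 documents, you genuinely can skip the vector database: a brute-force scan over in-memory embeddings is fast enough. A minimal sketch, assuming you already have embedding vectors from some model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Brute-force nearest neighbours -- fine for a few hundred docs."""
    scored = sorted(enumerate(doc_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

Once the corpus grows into the thousands of documents, or query volume climbs, a managed vector database earns its $12.80.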
Hosting
The agent ran on a small serverless setup with a queue for retries. Hosting was $18.00 for the month. The queue added $2.40. We could shave this down, but the point was boring reliability, not winning a cheapest-possible architecture contest.
If your workflow runs only when a form is submitted, hosting might be lower. If you are doing voice calls, real-time transcription, or long-running tasks, it will be higher. Measure the workflow you actually have.
Monitoring
Monitoring cost $9.00. That covered logs, error alerts, and a small review queue. This is the line item people skip, then later wonder why nobody noticed the bot had been apologizing for the weather for two days.
A production agent needs review. Not necessarily a full command center with blinking lights. Just enough visibility to know when confidence drops, when handoffs spike, and when customers ask questions your knowledge base does not cover.
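The three signals above (confidence, handoffs, knowledge-base misses) can be checked with something this simple. The field names and thresholds are invented for illustration; tune them against your own baseline:

```python
def check_health(window):
    """Flag basic health signals over a batch of recent conversations.

    `window` is a list of dicts with hypothetical keys:
    'confidence', 'handed_off', 'answered_from_kb'.
    """
    alerts = []
    n = len(window)
    if n == 0:
        return alerts
    avg_conf = sum(c["confidence"] for c in window) / n
    handoff_rate = sum(c["handed_off"] for c in window) / n
    miss_rate = sum(not c["answered_from_kb"] for c in window) / n
    if avg_conf < 0.6:
        alerts.append("average confidence dropped below 0.6")
    if handoff_rate > 0.25:
        alerts.append("handoff rate above 25%")
    if miss_rate > 0.15:
        alerts.append("too many questions outside the knowledge base")
    return alerts
```

Run it on a rolling window from your logs and page a human when it returns anything. That is the whole $9.00 line item.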
Final number
Language generation: $31.20. Vector database: $12.80. Hosting and queue: $20.40. Monitoring: $9.00. Total: $73.40/month for 4,200 conversations.
That is about 1.75 cents per conversation. The old manual process took roughly three minutes per conversation, which is about 210 hours a month across 4,200 conversations. Even if you value admin time at only $20/hour, that is $4,200 of time against a $73.40 bill.
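For anyone who wants to check that arithmetic, here it is in a few lines:

```python
conversations = 4200
total_cost = 31.20 + 12.80 + 20.40 + 9.00        # $73.40/month
per_conversation = total_cost / conversations     # ~$0.0175, i.e. 1.75 cents
minutes_saved = conversations * 3                 # old manual process
labor_value = minutes_saved / 60 * 20             # 210 hours at $20/hr = $4,200
```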
The conclusion is not that AI is always cheap. It is that the bill is understandable if you design the system with restraint. Count the conversations. Estimate the tokens. Budget for search, hosting, and monitoring. Then decide whether the task is worth automating. Magic is not a budget category.