Most of what gets written about AI in IT operations is written by people selling it. Vendor decks, conference keynotes, analyst reports, LinkedIn carousels. Almost none of it is written by the people actually running it across production environments.
Two and a half years in, I have opinions. Some of what AI does in my day-to-day is real and useful. Some of it is theater. Some of it I tried, watched closely, and pulled back from. The version I run today looks nothing like the version a vendor deck would have me believe is possible. It works because of where I drew the lines, not in spite of them.
AI assists my team. AI does not replace my team. Those two sentences sound similar and describe completely different operating models. The leadership work is figuring out where the boundary sits in your environment.
Where it's earning its keep
The clearest wins in my operation are the unsexy ones.
The first is cost monitoring for AI coding tools themselves. When we were asked to roll out an AI coding assistant across engineering, we inherited a question every leader eventually has to answer about a consumption-priced tool: what is this actually costing us, broken down by the dimensions that matter (developer, project, model, time)?
We answered it with a deliberately decoupled architecture. On the client side, each developer runs a small local widget that surfaces their week-to-date and month-to-date spend against budget thresholds. It is built to a minimum-privilege standard: no outbound network calls, no credentials, no authentication material, and a narrow, documented capability surface scoped to the data the developer already has access to. It cleared security review without exception, and it should have. It was designed to.
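To make the widget's job concrete, here is a minimal sketch of the arithmetic such a tool needs, assuming usage is already available as local per-day records. The `UsageRecord` shape, the Monday week start, and the function names are all hypothetical; this is an illustration, not the widget described above. It makes no network calls and reads no credentials.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UsageRecord:
    day: date        # day the usage was incurred
    cost_usd: float  # cost attributed to that day (hypothetical field)


def spend_to_date(records, today):
    """Return (week_to_date, month_to_date) spend as of `today`.

    Assumption: weeks start on Monday.
    """
    week_start = today - timedelta(days=today.weekday())
    month_start = today.replace(day=1)
    wtd = sum(r.cost_usd for r in records if week_start <= r.day <= today)
    mtd = sum(r.cost_usd for r in records if month_start <= r.day <= today)
    return wtd, mtd


def budget_fraction(spend, budget):
    """Fraction of the budget consumed; 0.0 when no budget is configured."""
    return spend / budget if budget > 0 else 0.0
```

Everything here operates on data the developer already holds locally, which is what keeps the capability surface narrow.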
Separately, and independently, each assistant instance emits usage telemetry over OpenTelemetry into an internal Prometheus and Grafana stack. That is where the organizational view lives: cost by developer, cost by project, cost by model, monthly projection, alerting at eighty percent of weekly budget. Two collection paths. Two failure domains. Two audiences. Neither depends on the other.
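In a stack like this the aggregation and alerting would normally live in Prometheus rules and Grafana panels. The Python below is only a sketch of the underlying logic, under assumed names: the event fields (`developer`, `project`, `model`, `cost_usd`) and the straight-line run-rate projection are illustrative choices, not the rules actually deployed.

```python
import calendar
from collections import defaultdict
from datetime import date


def cost_by(events, dimension):
    """Sum cost per value of one dimension, e.g. 'developer' or 'model'."""
    totals = defaultdict(float)
    for event in events:
        totals[event[dimension]] += event["cost_usd"]
    return dict(totals)


def monthly_projection(month_to_date, today):
    """Project month-end spend from the average daily run rate so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return month_to_date / today.day * days_in_month


def weekly_alert(week_to_date, weekly_budget, threshold=0.80):
    """True once week-to-date spend reaches the threshold share of budget."""
    return week_to_date >= threshold * weekly_budget
```

A straight-line projection is crude, but it is easy to explain to whoever owns the budget, which matters more than precision for an early warning.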
The principle behind both is the part I would want any leader in this seat to internalize: the AI is the subject of the monitoring, not the operator of it. The platform we built around it is plain Prometheus and Grafana. Boring. Mature. The team runs it without me in the room. If we retired the AI tooling tomorrow, the platform would still be there. It would simply have nothing to observe.
The second win is documentation injection. We deployed a small MCP server called Context7 that retrieves current vendor documentation on demand and grounds the model in it at query time. The problem it solves is structural: training data goes stale, and a model with stale data answers confidently anyway. Before we installed it, the assistants were calling SDK methods that had been deprecated two years earlier. After we installed it, that class of failure quietly disappeared. Install took an afternoon. Invocation is a single phrase appended to a prompt.
This is the unsexy version of AI in IT operations. Not autonomous. Not replacing engineers. Just getting better information in front of the model before it answers, so the answers get better.
Where I caught it doing something I had to undo
The clearest failure I've had came during a virtualization-platform configuration session. I was working through an authentication question and asked the assistant to help me write a curl command against a management API. It produced a working command. The credential was embedded in plain text, in the terminal scrollback, sitting in my session history.
It worked. That was the problem.
What's worth saying out loud is that I had already told it not to do this. There were standing instructions in the system prompt against inline credentials. It produced one anyway. I rotated the password immediately, switched the workflow to environment variables and an API token, and went back into the assistant's configuration to tighten the rules until the behavior actually changed across new sessions. That second part is the real lesson. Guardrails are not declarative. You write a rule, you test it, you watch for the cases where the model interprets around it, and you tune.
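The shape of the environment-variable workflow can be sketched as follows. The variable name `MGMT_API_TOKEN`, the example URL, and the `Bearer` header scheme are assumptions for illustration; the real management API's authentication scheme may differ. The point is that the secret never appears in the command or in the source.

```python
import os
import urllib.request


def build_request(url, token_var="MGMT_API_TOKEN"):
    """Build an authenticated request using a token read from the environment.

    Fails loudly when the variable is missing instead of inviting an
    inline credential as a workaround.
    """
    token = os.environ.get(token_var)
    if not token:
        raise RuntimeError(f"{token_var} is not set; refusing to continue")
    request = urllib.request.Request(url)
    # Assumed header scheme; check the API's documentation for the real one.
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

Constructing the request does not send it, so the function can be exercised safely before anything touches the network.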
The broader takeaway stuck. AI is fast. Fast is not the same as careful. Speed compresses the time you have to second-guess what just happened in front of you. A senior engineer has decades of muscle memory pulling them away from inline credentials before their fingers hit the keys. AI has no such muscle memory by default. You install it, you verify it took, then you tune. Until you do, it will produce working answers you would not have written yourself, because the things you would have noticed mid-keystroke are not things the model notices at all.
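One way to "verify it took" is to scan the commands an assistant produces for literal credentials before trusting a rule change. The sketch below is a heuristic under stated assumptions: the three patterns and their labels are illustrative and far from exhaustive, and references to shell variables such as `$MGMT_API_TOKEN` are deliberately treated as acceptable.

```python
import re

# Hypothetical, non-exhaustive patterns for credentials written inline.
# Each one skips values that start with "$", i.e. shell variable references.
PATTERNS = {
    "basic-auth flag": re.compile(r"(?<!\S)(?:-u|--user)\s+\S+:(?!\$)\S+"),
    "credential parameter": re.compile(
        r"(?i)(?:password|passwd|secret|token)=(?!\$)[^\s&\"']+"
    ),
    "authorization header": re.compile(
        r"(?i)authorization:\s*(?:bearer|basic)\s+(?!\$)[A-Za-z0-9._~+/=-]+"
    ),
}


def find_inline_credentials(command):
    """Return the labels of every pattern that matches the command text."""
    return [label for label, rx in PATTERNS.items() if rx.search(command)]
```

Run against a batch of generated commands from fresh sessions, a check like this turns "I think the rule works" into something you can count.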
Where the audit trail breaks down
The biggest line I have drawn so far is not about credentials. It is about who gets logged as the decision-maker.
I spent real time evaluating whether to give an AI assistant interactive shell access to production point-of-sale terminals across the operational footprint. The technical paths were clean. SSH-over-RMM works. Tactical RMM has the access patterns we would need. AWS SSM and Cloudflare Tunnel with Zero Trust would each have closed the loop on their own. Time-to-resolution on common point-of-sale issues would have dropped by a meaningful margin.
I did not do it. The reason was not technical. The change management framework I operate under assumes humans decide and tooling executes. Endpoint management platforms log what ran on what asset; they do not log "an AI decided this script should run" in a form that survives an audit. There is no production audit framework today that treats an AI as the decision-maker on an action against a production endpoint, and I am not going to be the operator who explains in a postmortem that an unaccountable system pushed a script that took down the line.
So I kept human-in-the-loop. AI proposes. Technician approves. Script runs. The audit trail still anchors at a human. It is slower. It is also defensible in an audit and defensible in a postmortem. I would rather give back the speed than borrow against the audit trail.
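The propose, approve, run sequence can be sketched as a small gate. Every name here (`Proposal`, `approve`, `execute`, the audit record fields) is hypothetical, and the `runner` callable stands in for whatever endpoint management tooling actually executes the script. The model identifier is recorded for context, but the authorization fields always point at a person.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Proposal:
    script: str
    target: str
    proposed_by: str                      # e.g. a model identifier; informational
    approved_by: Optional[str] = None     # named technician
    approved_at: Optional[datetime] = None


def approve(proposal, technician):
    """Record a named human as the decision-maker."""
    if not technician.strip():
        raise ValueError("approval requires a named technician")
    proposal.approved_by = technician
    proposal.approved_at = datetime.now(timezone.utc)
    return proposal


def execute(proposal, runner, audit_log):
    """Run an approved proposal; the audit entry is written before the run."""
    if proposal.approved_by is None or proposal.approved_at is None:
        raise PermissionError("unapproved proposal; refusing to run")
    audit_log.append({
        "authorized_by": proposal.approved_by,
        "authorized_at": proposal.approved_at.isoformat(),
        "target": proposal.target,
        "script": proposal.script,
        "proposed_by": proposal.proposed_by,
    })
    return runner(proposal.target, proposal.script)
```

Writing the audit entry before calling the runner means the record of who authorized the action exists even if the run itself fails.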
Bounded versus unbounded
A pattern emerged across these decisions, and it is the lens I now use whenever someone in my organization asks whether we should let AI do something.
What kind of decision is this? If it is bounded (cost dashboards, documentation lookups, code suggestions you review before they run), AI earns its keep. The downside is capped. The audit trail is not load-bearing. The wrong answer is annoying, not catastrophic.
If it is unbounded (incident response, vendor commitments, regulatory implications, anything where credentials, payments, or customer data sit downstream), AI is a research assistant at most. The decision and the action belong to a human, with a name attached, who can be paged at two in the morning to explain it.
I wrote a few months back that standardization is a leadership problem, not a technical one. AI in operations is the same shape. The technical question is the easier half. The harder half is deciding what kind of work belongs where, and holding the line when speed is on offer.
What I'd tell another IT leader
Draw the boundary by default. Do not wait for an incident to define where AI's authority ends. Define it at the top, write it down, and make every team lead read it before they wire AI into a workflow. Boundaries built after a postmortem are more expensive than boundaries built before one.
Treat AI as a junior team member, not a senior one. Junior team members produce useful output. They also produce output that needs to be reviewed by someone with more experience before it goes anywhere consequential. The review step is not optional. It is the value.
Watch the audit trail before you watch the speed. Speed gets celebrated. Audit trails get audited. The first time someone in compliance asks who authorized a script that ran on a production endpoint, you want a name and a timestamp, not a model version.
Build the unsexy cases first. Cost dashboards. Documentation freshness. Internal tooling that takes two hours to ship. Those wins compound. They build organizational confidence in AI without putting anything load-bearing on it. By the time the harder questions arrive, your team has the operating instincts to answer them.
AI in IT operations is real. It is also bounded. The marketing wants you to believe it is unbounded, because unbounded is what justifies the spend. Operating reality is different. The operators I trust most are running smaller, more careful deployments than the press releases suggest, and they are doing it on purpose.
The job is not to figure out how much AI to use. The job is to figure out which work belongs to humans and which work does not, and to hold that line when the pressure to blur it shows up. Because it will. And the operators who held the line will still be there to explain the decision when it matters.
Adam Cooper is a Technical Director writing about distributed IT operations, maritime technology, and AI in production environments.