The Wrong Way to Build It

The obvious way to build an AI chat assistant on top of ServiceNow is to give the model access to the instance and let it figure things out. User asks a question, model queries for incidents, model reads the results, model answers. Simple.

That approach asks the model to do two different jobs at the same time: retrieve structured data from a live system, and reason about what that data means. Those are not the same kind of problem. One is deterministic. The other is probabilistic. Mixing them produces a system that is slower, more expensive, less consistent, and harder to secure than it needs to be.

The ServiceNow AI chat was built on a different boundary: code handles structure, Claude handles reasoning. The model never touches raw instance data. It gets clean, curated context assembled by the application layer — and then it reasons over that context. Those are two separate phases, and the handoff only happens after the mechanical work is done.
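The two phases can be sketched in a few lines. This is a minimal illustration in Python, not the bot's actual Script Include: `handle_message`, `build_context`, `call_model`, and the `Context` type are hypothetical names, and the model call is passed in as a plain function so the handoff point is easy to see.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    """Curated briefing built by code; the only thing the model sees."""
    user_name: str
    role: str
    briefing: str


def handle_message(
    user_id: str,
    message: str,
    build_context: Callable[[str, str], Optional[Context]],
    call_model: Callable[[Context, str], str],
) -> str:
    # Phase 1: deterministic. Checks and queries run in plain code.
    context = build_context(user_id, message)
    if context is None:
        # A gate (rate limit, pre-filter) stopped the request,
        # so the model is never called.
        return "Request rejected."
    # Phase 2: probabilistic. The model reasons over the briefing only.
    return call_model(context, message)
```

The model function receives a `Context` and nothing else. It has no handle on the instance, so there is nothing for it to query.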

What "Mechanical Work" Looks Like in ServiceNow

Every message the bot receives triggers a context-building pass before Claude is called. The user's role is checked against ServiceNow's role table. Based on that role, the appropriate query functions run.

A standard employee gets their open incidents, active service requests, and assigned tasks, queried directly via GlideRecord with their sys_user_id as a filter. A security probe is caught by regex before it reaches the model at all. Rate limiting uses GlideAggregate to count the user's entries in the audit table over the last 60 seconds. If the user is over the limit, the request stops there and Claude is never called.

None of that is AI work. It is structured data retrieval and conditional logic. Code owns it completely.
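The rate-limit check is a good example of how small this logic is. The sketch below is Python over an in-memory list, standing in for the GlideAggregate count against the audit table. The row shape, the function names, and the `MAX_REQUESTS` value are assumptions for illustration; only the 60-second window comes from the description above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)  # window from the text
MAX_REQUESTS = 10               # hypothetical limit; the article gives no number


def count_recent(audit_rows, user_id, now):
    """Count this user's audit rows inside the window.

    Stand-in for a COUNT aggregate filtered by user and created time.
    """
    cutoff = now - WINDOW
    return sum(
        1
        for row in audit_rows
        if row["user"] == user_id and row["created"] >= cutoff
    )


def is_rate_limited(audit_rows, user_id, now):
    # The current request would be one more than the rows already logged,
    # so reaching MAX_REQUESTS prior rows means this one is over the limit.
    return count_recent(audit_rows, user_id, now) >= MAX_REQUESTS
```

A count and a comparison. Nothing here benefits from a model's judgment.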

What Claude receives is something closer to a briefing than a database. The context passed to the model looks like this:


User: Alex Standard | Role: standard
Open incidents: INC0010031 (P3 - VPN not connecting, opened 2 days ago)
Active requests: REQ0010001 (New laptop - Pending Approval)
Available catalog items: Password Reset, VPN Access Request, New Hardware Request...
Conversation history: [last 6 turns]

That is a structured summary assembled by deterministic code. Claude's job is to read it and respond usefully — not to go find it.
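Assembling that briefing is string formatting over query results. A minimal Python sketch follows, assuming the records have already been fetched. The `Incident` and `Request` types and the `build_briefing` function are hypothetical names, not the bot's real ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    number: str
    priority: str
    short_description: str
    age: str  # e.g. "2 days ago", computed by the caller


@dataclass
class Request:
    number: str
    item: str
    state: str


def build_briefing(
    name: str,
    role: str,
    incidents: List[Incident],
    requests: List[Request],
    catalog_items: List[str],
    history: List[str],
    max_turns: int = 6,
) -> str:
    """Render fetched records into the fixed-shape text the model receives."""
    inc = "; ".join(
        f"{i.number} ({i.priority} - {i.short_description}, opened {i.age})"
        for i in incidents
    ) or "none"
    req = "; ".join(f"{r.number} ({r.item} - {r.state})" for r in requests) or "none"
    lines = [
        f"User: {name} | Role: {role}",
        f"Open incidents: {inc}",
        f"Active requests: {req}",
        f"Available catalog items: {', '.join(catalog_items)}",
        "Conversation history:",
    ]
    # Only the most recent turns are included, oldest first.
    lines.extend(f"  {turn}" for turn in history[-max_turns:])
    return "\n".join(lines)
```

Because every field has a fixed slot and an explicit "none" fallback, the briefing has the same shape whether the user has ten open incidents or zero.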

The Catalog Problem

Catalog item matching was the clearest case where this boundary mattered.

The original approach let Claude recommend catalog items and then waited for the user to confirm. The problem was that the follow-up turn had no reliable way to know which item the user was confirming. Claude would ask for the item name again. The user would repeat it. Claude would ask for mandatory fields. Conversations took twice as long as they should have.

The fix was to move name resolution out of the model entirely. On every turn, before the Claude call, the Script Include scans the last six assistant messages and the current user message for catalog item names using string matching. If a match is found, the item's mandatory fields are injected into the context immediately. Claude sees the item and its required fields already resolved — it never has to re-ask.

That is a string search. It runs in microseconds. Handing it to Claude would have made it slower, less reliable, and harder to debug.
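A sketch of that scan, in Python rather than the Script Include's own language. The catalog contents, the mandatory field names, and the function names are made up for illustration. The six-message lookback and the plain substring match follow the description above.

```python
from typing import Dict, List, Optional

# Hypothetical catalog: item name -> mandatory field names.
CATALOG: Dict[str, List[str]] = {
    "Password Reset": ["account_name"],
    "VPN Access Request": ["justification", "manager"],
    "New Hardware Request": ["device_type", "cost_center"],
}


def resolve_catalog_item(
    assistant_messages: List[str],
    user_message: str,
    catalog: Dict[str, List[str]] = CATALOG,
    lookback: int = 6,
) -> Optional[str]:
    """Return the most recently mentioned catalog item name, if any."""
    # Current user message first, then assistant messages newest to oldest.
    texts = [user_message] + list(reversed(assistant_messages[-lookback:]))
    # Longer names first, so a long name wins over a shorter one it contains.
    names = sorted(catalog, key=len, reverse=True)
    for text in texts:
        lowered = text.lower()
        for name in names:
            if name.lower() in lowered:
                return name
    return None


def catalog_context_line(name: str, catalog: Dict[str, List[str]] = CATALOG) -> str:
    """Format the resolved item and its mandatory fields for the briefing."""
    return f"Resolved catalog item: {name} | Mandatory fields: {', '.join(catalog[name])}"
```

When the user replies "yes please" to a recommendation, the user message contains no item name, but the previous assistant message does. The scan finds it there, and the confirmation turn arrives at the model with the item already resolved.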

Security Is the Same Pattern

The bot's pre-filter works on the same principle. Common abuse patterns (jailbreak attempts, "pretend you are," off-topic requests, prompt reveal probes) are checked by regex before the message is sent to the model. If a match is found, the request is logged, flagged, and rejected without ever reaching the model.

Asking Claude to detect jailbreak attempts would be slower and less consistent. Regex is deterministic. It either matches or it does not. The check is enforced before inference begins, so a message that matches never gets the chance to argue its case with the model.
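A pre-filter of this kind can be sketched as a small table of compiled patterns. The three patterns below are illustrative guesses at what such a list might contain, not the bot's actual rules, and `prefilter` is a hypothetical name. A production list would be longer and tuned against false positives.

```python
import re

# Illustrative patterns only; labels and expressions are assumptions.
BLOCK_PATTERNS = {
    "role_play": re.compile(
        r"\bpretend\s+(?:you\s+are|to\s+be)\b", re.IGNORECASE
    ),
    "ignore_instructions": re.compile(
        r"\bignore\s+(?:all\s+|any\s+)?(?:previous|prior|above)\s+instructions\b",
        re.IGNORECASE,
    ),
    "prompt_reveal": re.compile(
        r"\b(?:reveal|show|print|repeat)\b.{0,40}\b(?:system\s+prompt|instructions)\b",
        re.IGNORECASE,
    ),
}


def prefilter(message: str):
    """Return the label of the first matching pattern, or None to let it through."""
    for label, pattern in BLOCK_PATTERNS.items():
        if pattern.search(message):
            return label
    return None
```

The returned label is what gets logged and flagged. The same message produces the same label every time, which is what makes the filter straightforward to test and to audit.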

The Write Action Boundary

Write actions follow the same logic in the other direction. When Claude determines that an incident should be created, it does not create it. It emits a structured signal in its response — create_incident_post_troubleshooting, create_catalog_request, request_update — and the application layer parses that signal and executes the ServiceNow write.

Claude's job is to decide whether the action is appropriate and collect the required information conversationally. The actual write — GlideRecord.insert(), GlideRecord.update() — is deterministic code. The model never touches the database directly.

This separation has a practical consequence beyond architectural cleanliness: it makes the write layer independently testable, auditable, and lockable. The rate limiter runs before write actions are processed. The audit table records every action signal. Those controls operate at the code layer, which means they cannot be bypassed by the model producing a different output.
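The parse-and-dispatch step might look like the following Python sketch. The article does not specify the signal's wire format, so the `ACTION:` prefix and the JSON payload with `action` and `fields` keys are assumptions, as are the function names. Only the action names themselves come from the text.

```python
import json
from typing import Callable, Dict, Optional, Tuple

ACTION_PREFIX = "ACTION:"  # hypothetical marker, not taken from the article


def parse_action(response: str) -> Tuple[str, Optional[dict]]:
    """Split a model response into user-facing text and an optional action."""
    text_lines, action = [], None
    for line in response.splitlines():
        stripped = line.strip()
        if action is None and stripped.startswith(ACTION_PREFIX):
            try:
                candidate = json.loads(stripped[len(ACTION_PREFIX):])
            except json.JSONDecodeError:
                candidate = None
            if isinstance(candidate, dict) and "action" in candidate:
                action = candidate
                continue  # the signal line is not shown to the user
        text_lines.append(line)
    return "\n".join(text_lines).strip(), action


def dispatch(
    action: dict,
    handlers: Dict[str, Callable[[dict], str]],
    audit_log: list,
) -> Optional[str]:
    """Record the signal, then run the matching handler if one is registered."""
    audit_log.append(action)  # every signal is recorded, handled or not
    handler = handlers.get(action["action"])
    if handler is None:
        return None  # unknown action names are never executed
    return handler(action.get("fields", {}))
```

The handler table is an allowlist. The model can only name an action, and only names registered in `handlers` lead to a write. In the real system each handler would wrap the corresponding GlideRecord insert or update.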

The Boundary Is the Design

The two-phase pattern is not a workaround for model limitations. It is the correct architecture for any system where AI reasoning operates on live structured data.

The boundary between deterministic retrieval and probabilistic reasoning is a design decision. Hand off to the model too early and it ends up doing structural work (fetching data, parsing formats, enforcing rate limits) that it is not suited for and that produces inconsistent results. Draw the boundary correctly and the model receives exactly what it needs: clean context, scoped to the user, stripped of irrelevant noise, with all the mechanical preconditions already satisfied.

In the ServiceNow chat, that boundary sits at the GlideRecord layer. Every query and every write on the GlideRecord side is code. Reasoning over the assembled context is Claude. The model is fast, consistent, and cheap to run because it is only doing the part of the job that actually requires a model.

The Result

Token consumption is predictable because the context handed to Claude is always the same shape — a structured briefing, not a raw query dump. Output quality is consistent because the input is consistent. Security controls are reliable because they run in code, not in the model's judgment.

The bot works well not because the model is powerful, but because the model is given good data and a narrow job. The structural work is already done before Claude is called. That preparation is where most of the engineering lives — and it is the part that determines whether the system is production-ready or not.