Standalone article · part of a sequenced guide
What you'll unlock: A world-class Virtual Agent is not a script — it is a measurable, multi-channel product: NLU + topic design + AI Search retrieval + (optional) GenAI synthesis + action flows + handoff UX + analytics flywheel.
Virtual Agent and Conversational AI
Building intelligent, capable, and measurably effective conversational experiences across channels
Chapter context
Virtual Agent is the highest-leverage self-service surface in ServiceNow. It can deflect tickets, create correct records when deflection fails, and reduce live-agent workload through context-rich handoffs. But VA is also easy to do poorly: weak taxonomy, synthetic training data, poor fallback, and vanity containment metrics. This chapter gives you the production discipline: NLU training cycles, channel-aware design, and CFO-grade measurement with quality safeguards.
Is this chapter for you?
Do you need higher containment without harming quality?
Start with Concepts 2 and 6: taxonomy, utterance mining, and correct containment measurement drive sustainable gains.
Are you integrating GenAI into VA?
Concept 3 is mandatory: grounding, orchestration rules, and GenAI test methodology prevent hallucination in self-service.
Do you run multiple channels (portal + Teams/Slack + mobile)?
Concept 4: omnichannel deployment and consistent analytics keep the experience coherent and governable.
Are handoffs hurting CSAT?
Concept 5: handoff triggers, context transfer, and Workspace UX reduce repeat questions and improve first reply time.
Virtual Agent is where ServiceNow AI touches end users. It is the frontline for self-service: intent classification, entity capture, workflow execution, knowledge retrieval, and seamless escalation to humans when needed.
This chapter teaches the architecture of Virtual Agent, how to design topics and train NLU like an ML product, how to integrate Now Assist and AI Search safely, how to deploy across channels (portal, mobile, Teams, Slack), and how to measure containment with CFO-grade attribution.
By the end you can build a PDI VA experience that creates real outcomes (catalog/incident), deploy it across channels with consistent analytics, and run a continuous improvement cadence that steadily increases containment without sacrificing quality.
Reference diagrams
Virtual Agent runtime
From utterance to outcome: NLU selects intent, topic captures entities, actions execute, and analytics feed continuous improvement.
Containment improvement loop
Containment rises through weekly iteration: utterances → training → topics → knowledge → analytics — not through one big launch.
Implementation paths
VA success is a product system: design, retrieval, execution, and measurement — across channels.
Concept 1
Virtual Agent Architecture
Platform components, conversation state, channels, NLU, and how Virtual Agent connects to Now Assist, AI Search, and Agent Workspace
1.1
The Virtual Agent platform
Components, the NLU engine, and how they connect to the Now Platform
Key takeaway
Virtual Agent is the conversational layer of the Now Platform: channels + conversation runtime + NLU + topic flows + actions (Flow/IntegrationHub) — anchored to ServiceNow records and ACLs.
Why this matters
Teams that treat VA as 'a chatbot' miss the platform advantage: it can create real records, enforce policy, and preserve audit trails.
Core components: channels, conversation runtime, NLU, and topics.
VA connects to platform systems: Knowledge/AI Search for answers, Flow Designer for actions, and Agent Workspace for handoff and continued resolution.
Architecture rule: every successful conversation must end with an outcome on-record (resolved self-service action or created/updated case).
Workflow — do this next
- 01 List your target channels (portal, mobile, Teams).
- 02 For each, define desired outcomes (no ticket, catalog request, incident create, handoff).
- 03 Map each outcome to a topic + action (Flow) with audit logging.
Real example
VA as workflow entry, not chat
Instead of answering “request laptop” with text, VA guided the user into the catalog request, captured required fields, created the request, and confirmed the record — measurable and auditable.
1.2
The conversation model
Sessions, topics, utterances, and the state machine that governs them
Key takeaway
Conversations are state machines: a session tracks context; a topic defines steps; utterances are classified; entities are extracted; and state transitions govern what happens next.
Why this matters
If you understand the conversation model, you can debug 'why did VA get stuck?' without guessing.
Session: holds user identity, channel, current topic, collected entities, and transcript. Topic: sequence of prompts, decisions, and actions.
State machine governs transitions: ask question → validate entity → branch → execute action → confirm → close or escalate.
Design principle: make states explicit and resumable. If a user leaves and returns, the session should handle it gracefully.
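The state model described above can be sketched as a small transition table. This is a minimal illustration in Python, not the Virtual Agent runtime itself; the state names, event strings, and the `next_state` helper are all assumptions made for the example.

```python
from enum import Enum

class State(Enum):
    ASK = "ask"
    VALIDATE = "validate"
    EXECUTE = "execute"
    CONFIRM = "confirm"
    CLOSED = "closed"
    ESCALATED = "escalated"

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.ASK, "answer"): State.VALIDATE,
    (State.VALIDATE, "valid"): State.EXECUTE,
    (State.VALIDATE, "invalid"): State.ASK,  # re-ask with an explicit error message
    (State.EXECUTE, "done"): State.CONFIRM,
    (State.EXECUTE, "error"): State.ESCALATED,
    (State.CONFIRM, "yes"): State.CLOSED,
    (State.CONFIRM, "no"): State.ESCALATED,
}

def next_state(state: State, event: str) -> State:
    # Terminal states never change.
    if state in (State.CLOSED, State.ESCALATED):
        return state
    # Escape hatches are available from every non-terminal state.
    if event == "talk to agent":
        return State.ESCALATED
    if event == "start over":
        return State.ASK
    # Unknown events keep the current state, so a resumed session continues where it stopped.
    return TRANSITIONS.get((state, event), state)
```

Keeping the table explicit makes a stuck conversation easy to diagnose: any (state, event) pair that is missing from the table is a place where the topic silently stays put.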
Workflow — do this next
- 01 For one topic, draw the states and transitions on paper.
- 02 Define validation checks for each entity before transitioning.
- 03 Add an explicit escape hatch: 'talk to agent' and 'start over'.
Real example
Why a topic looped
The topic kept asking for the asset ID because entity validation failed silently. Making validation explicit and adding error messaging fixed the loop and reduced the abandon rate.
1.3
Channel architecture
Delivering the same VA experience across web, mobile, Slack, and Teams
Key takeaway
Channel adapters share the same VA brain (topics, NLU, actions) but differ in UX constraints: authentication, message length, rich cards, latency, and escalation paths.
Why this matters
Many teams build channel-specific bots. The ServiceNow advantage is one conversation model with channel-specific presentation.
Portal widget: rich UI and embedded actions. Mobile: constrained UI and intermittent connectivity. Teams/Slack: message-based cards and enterprise auth.
Design once, adapt per channel: shorten prompts for chat platforms, provide quick replies, and avoid long multi-step forms in Slack.
Security differs per channel: SSO, token exchange, and workspace governance (Teams/Slack) change what data you can expose.
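One way to picture "design once, adapt per channel" is a single step definition that is rendered differently for each channel. The sketch below is illustrative only: `ChannelProfile`, the character limits, and the control names are assumptions, not platform settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelProfile:
    max_prompt_chars: int   # hypothetical limit, not a platform constant
    embedded_actions: bool  # True if the channel can render an in-page action button

# Illustrative values only; tune them to your own channels.
CHANNELS = {
    "portal": ChannelProfile(max_prompt_chars=500, embedded_actions=True),
    "teams": ChannelProfile(max_prompt_chars=200, embedded_actions=False),
    "slack": ChannelProfile(max_prompt_chars=200, embedded_actions=False),
}

def render_step(channel: str, prompt: str, options: list[str]) -> dict:
    """Present the same topic step within the limits of one channel."""
    profile = CHANNELS[channel]
    text = prompt
    if len(text) > profile.max_prompt_chars:
        # Trim long prompts for chat channels and mark the cut.
        text = text[: profile.max_prompt_chars - 3].rstrip() + "..."
    control = "action_button" if profile.embedded_actions else "quick_replies"
    return {"text": text, "control": control, "options": list(options)}
```

The topic logic never sees the channel; only the final rendering step does, which is what keeps one conversation model behind several presentations.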
Workflow — do this next
- 01 Pick primary channel; design topic flows there first.
- 02 Create channel-specific UX variants for key steps (quick replies).
- 03 Test auth and ACL behavior per channel with real roles.
Real example
Same topic, different UX
The password reset topic used an embedded portal action button on the web but a quick-reply menu on Teams. The logic stayed the same; only the channel presentation changed.
1.4
The NLU engine
Intent classification, entity extraction, and the confidence model
Key takeaway
NLU maps utterances to intents and entities with confidence scores. Confidence should drive clarification, fallback, or escalation — not blind execution.
Why this matters
The majority of VA failures are NLU failures: wrong intent, missing entity, or overconfident routing.
Intent classification chooses a topic. Entity extraction collects structured data (application, device, location). Confidence quantifies uncertainty.
Design rule: low confidence → ask a clarifying question. Very low confidence → fallback to search or escalate.
Training data matters: use real utterances, not what designers think users say. Update synonyms and taxonomy as language evolves.
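The confidence design rule above maps naturally onto a small routing function. The thresholds here (0.80 and 0.50) are placeholder assumptions for illustration; real bands should come from your own test-set results.

```python
def route_by_confidence(
    confidence: float,
    auto_route_min: float = 0.80,  # assumed threshold, tune on real data
    clarify_min: float = 0.50,     # assumed threshold, tune on real data
) -> str:
    """Map an NLU confidence score to one of three bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_route_min:
        return "auto_route"
    if confidence >= clarify_min:
        return "clarify"
    # Very low confidence: fall back to search or escalate.
    return "fallback"
```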
Workflow — do this next
- 01 Define confidence bands for your VA: auto-route, clarify, fallback.
- 02 Collect 200 real utterances for top intents before go-live.
- 03 Evaluate confusion pairs and fix taxonomy or training data.
Real example
Clarification reduced wrong routing
“Access” utterances were ambiguous (VPN vs app access). Adding a clarifying question when confidence fell below the threshold reduced misroutes and improved satisfaction.
1.5
Integration with Now Assist
How GenAI enhances VA beyond scripted topics
Key takeaway
Now Assist adds free-text answers, summarisation, and long-tail handling — but must be grounded (AI Search/KB) and governed with persona and safety constraints.
Why this matters
GenAI can reduce topic authoring load, but without grounding it increases hallucination risk in self-service.
Use GenAI where scripted topics are expensive: long-tail questions, summarising policies, and drafting responses — but keep deterministic steps (catalog submission) scripted.
GenAI should cite sources and escalate when sources are missing. Never let it invent procedural steps for high-risk actions.
Persona matters: employee portal tone differs from customer-facing CSM. Configure per domain.
Workflow — do this next
- 01 Identify 10 long-tail intents causing low containment.
- 02 Enable GenAI answer mode for those intents with KB grounding.
- 03 Add guardrails: prohibited topics and escalation triggers.
Real example
Reduced topic sprawl with grounded GenAI
Instead of 200 micro-topics for benefits variants, VA used GenAI over curated HR KB. Containment improved while legal-approved guardrails prevented policy improvisation.
1.6
Integration with AI Search
Surfacing knowledge results inside the conversation
Key takeaway
AI Search is the retrieval backbone for VA: it finds the right KB/catalog items; VA presents them with guided actions and fallback paths.
Why this matters
VA containment rises when search is accurate; it collapses when search returns wrong or stale results.
Pattern: user utterance → AI Search retrieve top candidates → show top result(s) with quick actions → confirm success → close or escalate.
Use profile scoping: employee VA should retrieve only employee-safe KB. HR VA retrieves HR KB. Profiles + ACLs prevent leakage.
Measure: click-through on results shown inside VA, and ticket creation after an in-VA search.
Workflow — do this next
- 01 Define AI Search profile for VA channel per persona.
- 02 Create golden query set for VA utterances.
- 03 Tune boosts and synonyms; re-test weekly.
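A golden query set becomes useful once it is scored the same way every week. The sketch below computes a simple top-k hit rate; `fake_search`, the article IDs, and the query strings are made-up stand-ins for a real retrieval call, which is not shown here.

```python
from typing import Callable

def hit_rate_at_k(
    golden: dict[str, str],
    search: Callable[[str], list[str]],
    k: int = 3,
) -> float:
    """Share of golden queries whose expected article appears in the top k results."""
    if not golden:
        return 0.0
    hits = sum(1 for query, expected in golden.items() if expected in search(query)[:k])
    return hits / len(golden)

# Stand-in index for illustration; replace with your own search call.
FAKE_INDEX = {
    "vpn not connecting": ["KB0001", "KB0007", "KB0042"],
    "reset my password": ["KB0099", "KB0100", "KB0101", "KB0002"],
}

def fake_search(query: str) -> list[str]:
    return FAKE_INDEX.get(query, [])

# Golden set: query -> the article a correct search should surface.
GOLDEN = {"vpn not connecting": "KB0001", "reset my password": "KB0002"}
```

Re-running the same function after each boost or synonym change shows whether the tuning helped the queries you care about or only moved results around.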
Real example
Search fixed VA “I don’t know” responses
VA fallback was frequent because keyword search failed. Switching to AI Search with synonyms improved retrieval; containment rose without rewriting topics.
1.7
The Agent Workspace integration
Where live agents receive and continue conversations
Key takeaway
Agent Workspace is the handoff surface: agents see transcript, extracted entities, suggested next steps, and linked records — enabling seamless continuation.
Why this matters
Poor handoff destroys CSAT and cancels deflection gains because humans must re-collect context.
The best handoff bundles: user intent, transcript summary, data collected, and the created/linked record id (incident/case).
Design for 'first reply': the agent should be able to respond without asking the user to repeat key details.
Integrate Now Assist: summarise conversation and suggest response drafts for the live agent.
Workflow — do this next
- 01 Define what entities must be captured before handoff.
- 02 Ensure transcript and entity values are visible in Agent Workspace.
- 03 Test with agents: measure repeat-question rate after handoff.
Real example
Warm handoff improved CSAT
Live agents saw summary and collected fields, so they acknowledged the user’s attempt and moved directly to resolution. Repeat questions dropped and CSAT rose.
1.8
Data flow and security
How conversation data is handled, stored, and governed
Key takeaway
Conversation data is operational data: transcripts, entities, and handoff records must respect ACLs, retention, and privacy policies — especially with GenAI in the loop.
Why this matters
Self-service conversations can include PII. Without governance, VA becomes a compliance risk.
Governance decisions: transcript retention duration, redaction rules, and who can view transcripts. Align with HR/legal requirements.
With GenAI: ensure sensitive fields are excluded from prompts, and ensure outputs do not leak restricted content across roles.
Auditability: log handoffs, approvals (if any), and agent actions associated with the conversation.
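Excluding sensitive fields from prompts is easiest to reason about as an allow list combined with a deny list. The following is a minimal sketch under that assumption; the field names are hypothetical, and a real deny list would come from your HR and legal requirements.

```python
# Hypothetical field names used only for illustration.
SENSITIVE_FIELDS = frozenset({"national_id", "date_of_birth", "salary", "medical_notes"})

def build_prompt_context(record: dict, allowed_fields: set[str]) -> dict:
    """Keep only allow-listed fields, and drop sensitive ones even if allow-listed by mistake."""
    return {
        name: value
        for name, value in record.items()
        if name in allowed_fields and name not in SENSITIVE_FIELDS
    }
```

The deny list wins over the allow list on purpose: a configuration mistake then fails closed rather than leaking a field into a prompt.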
Workflow — do this next
- 01 Define transcript retention per domain (IT vs HR).
- 02 Test ACL: ensure a user cannot view other users’ transcripts.
- 03 Red-team prompts attempting to extract sensitive information.
Real example
HR transcript governance prevented leakage
HR VA conversations had strict retention and restricted viewing roles. GenAI prompts excluded sensitive fields. Compliance signed off because controls were explicit and tested.
Concept 2
Topic Design and NLU Training
Intent taxonomy, utterance collection, entities, clarification, fallback design, training/testing, and performance analytics
2.1
Topic anatomy
Intent, utterances, conversation flow, and the resolution action
Key takeaway
A topic is a mini-application: intent definition + training utterances + entity capture + stateful flow + resolution action (Flow/record create) + success confirmation.
Why this matters
Teams write topics like scripts and forget outcomes. Topics must resolve, not just chat.
Topic components: trigger intent, user prompts, entity collection/validation, branching, action execution (catalog/incident), and completion check.
Design rule: end every topic with a clear outcome: 'your request number is…' or 'I’m transferring you with context.'
Anti-pattern: topics that ask many questions then do nothing. Users abandon and call support.
Workflow — do this next
- 01 Pick one high-volume use case and define the outcome record.
- 02 Design entity collection with validation.
- 03 Add a success confirmation step and a fallback escalation step.
Real example
Password reset topic success criteria
The topic ended only when the reset completed (self-service) or an incident was created with the transcript attached. No ambiguous 'done' messages.
2.2
Intent taxonomy design
Structuring a topic hierarchy for a large service catalogue
Key takeaway
Good taxonomy is stable and user-oriented: group by user goals, not by internal teams. Use 10–30 top-level intents and push specificity into entities and follow-up questions.
Why this matters
Taxonomy determines NLU confusion and maintenance burden. Bad taxonomy creates endless retraining work.
Design top-level intents around user goals: 'reset password', 'request access', 'device issue' — not internal queue names.
Avoid overly granular intents. Use entities (application, device) to specialise inside a topic.
Keep taxonomy versioned and reviewed quarterly. Service catalogs evolve; taxonomy must evolve with governance.
Workflow — do this next
- 01 Cluster top 200 utterances into 15–25 goals.
- 02 Define entity-based specialisation inside each goal.
- 03 Create a taxonomy change process (add/merge/retire).
Real example
Access taxonomy simplified
Instead of 40 app-specific access intents, one 'request access' intent captured app entity and routed to correct catalog item. NLU confusion decreased and containment increased.
2.3
Utterance collection
Gathering real user language to train NLU
Key takeaway
Real utterances come from search logs, chat transcripts, ticket descriptions, and call center notes. Synthetic utterances are a weak substitute.
Why this matters
NLU models fail when trained on designer language instead of user language.
Sources: portal search queries, email subject lines, incident short descriptions, VA transcripts, and agent notes.
Privacy: redact PII before using utterances for training datasets, especially in HR and healthcare domains.
Balance: include regional language, acronyms, and device differences. Language drift is constant in enterprises.
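As a rough illustration of the redaction step, the sketch below masks email addresses and phone-like numbers before an utterance enters a training set. The two regular expressions are deliberately crude assumptions; production redaction usually needs domain-specific patterns and human review.

```python
import re

# Simplified patterns for illustration; they will miss some formats and over-match others.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(utterance: str) -> str:
    """Replace emails and phone-like numbers with placeholder tokens."""
    utterance = EMAIL.sub("[EMAIL]", utterance)
    return PHONE.sub("[PHONE]", utterance)
```

Placeholder tokens such as `[EMAIL]` keep the sentence shape intact, so the redacted utterance is still usable as a training example.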
Workflow — do this next
- 01 Collect 500 utterances for top intents (redacted).
- 02 Label them into taxonomy; identify ambiguous clusters.
- 03 Update synonyms and clarification rules before retraining.
Real example
Real utterances fixed confusion
Users said “auth app” not “MFA”. Training on real language improved intent classification and reduced fallback rate.
2.4
Entity design
Extracting structured data and driving flows
Key takeaway
Entities turn free text into structured inputs (application, device, location). Good entity design reduces user friction and improves automation quality.
Why this matters
Without entities, topics ask too many questions or execute the wrong action.
Define entities with controlled vocabularies: application names, device types, locations. Use synonyms to map user terms to canonical values.
Validate entities before action. Wrong entity = wrong catalog item or wrong assignment group.
Use entities to route to Flows and catalog items. Entities are the bridge between conversation and workflow.
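Mapping user terms to canonical values, with a "Did you mean…?" prompt when the match is uncertain, can be sketched with a synonym table and the standard library's `difflib`. The application names, synonyms, and the `resolve_app` helper are assumptions for the example, not a ServiceNow API.

```python
import difflib
from typing import Optional

# Hypothetical controlled vocabulary and synonym table.
CANONICAL_APPS = {"salesforce": "Salesforce", "workday": "Workday", "vpn": "VPN"}
SYNONYMS = {"sfdc": "salesforce", "sales force": "salesforce", "wd": "workday"}

def resolve_app(raw: str) -> tuple[Optional[str], list[str]]:
    """Return (canonical value, []) on a match, or (None, suggestions) for a validation prompt."""
    key = " ".join(raw.lower().split())
    key = SYNONYMS.get(key, key)
    if key in CANONICAL_APPS:
        return CANONICAL_APPS[key], []
    # No exact match: offer close candidates instead of guessing.
    close = difflib.get_close_matches(key, list(CANONICAL_APPS), n=3, cutoff=0.6)
    return None, [CANONICAL_APPS[name] for name in close]
```

When the first element is `None`, the topic should ask the user to confirm one of the suggestions (or log an extraction failure if the list is empty) rather than launch an action on a guessed value.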
Workflow — do this next
- 01 Pick 3 entities for your top topic; define canonical list and synonyms.
- 02 Add validation prompts ('Did you mean…?').
- 03 Log entity extraction failures for training improvements.
Real example
App entity drove correct access request
User said “need Salesforce access”. Entity extractor mapped “Salesforce” to app id and launched the correct catalog request flow automatically.
2.5
Clarification design
Handling ambiguous input without frustrating users
Key takeaway
Clarification is a UX discipline: ask the minimum question needed to disambiguate, offer quick replies, and remember prior answers so users don't repeat themselves.
Why this matters
Over-clarifying kills containment. Under-clarifying causes wrong actions and escalations.
Use confidence bands: if intent confidence is low, ask a disambiguation question with 2–4 options.
Keep prompts short. In chat channels, long questions feel like forms. Use quick replies and progressive disclosure.
Provide escape hatch: 'talk to agent' and 'show me search results'. Users need control.
Workflow — do this next
- 01 Identify top 5 ambiguous utterances and design clarifiers.
- 02 A/B test clarifier wording and option order.
- 03 Measure: drop-off rate at clarification step.
Real example
Access clarifier reduced escalation
VA asked: 'Is this about VPN, app access, or password?' Quick replies reduced misroutes and improved containment.
2.6
Fallback and out-of-scope handling
What VA does when it cannot understand
Key takeaway
Fallback is a designed path: show search results, ask a different question, create a ticket, or hand off — with context preserved.
Why this matters
Fallback quality determines whether users trust VA or abandon it permanently.
Out-of-scope should be explicit: 'I can help with X and Y.' Then offer top actions and a human escalation option.
Use AI Search as fallback: show top KB and catalog actions, not just apologies.
Record truth: if fallback leads to ticket create, attach transcript and extracted entities so agents start informed.
Workflow — do this next
- 01 Design fallback ladder: clarify → search → create ticket → live agent.
- 02 Instrument drop-off and ticket creation after fallback.
- 03 Review top out-of-scope utterances weekly and decide whether to add a topic or keep each one out of scope.
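The fallback ladder (clarify → search → create ticket → live agent) can be sketched as an ordered list of steps that share one context object, so nothing the user said is lost along the way. The rung names and the `run_fallback_ladder` helper are illustrative assumptions.

```python
from typing import Callable, Optional

# A rung returns an outcome string, or None to move down the ladder.
Rung = Callable[[dict], Optional[str]]

def run_fallback_ladder(context: dict, rungs: list[tuple[str, Rung]]) -> tuple[str, str]:
    """Try each rung in order; the same context (transcript, entities) is passed to every rung."""
    attempted = context.setdefault("attempted", [])
    for name, rung in rungs:
        attempted.append(name)
        outcome = rung(context)
        if outcome is not None:
            return name, outcome
    # Nothing resolved the conversation: the last resort is always a human.
    attempted.append("live_agent")
    return "live_agent", "handoff_with_context"
```

Because `attempted` is stored on the context, the list of rungs already tried can travel with the ticket or handoff and feed the drop-off instrumentation.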
Real example
Fallback ladder improved retention
Users stopped abandoning because fallback offered search + one-click ticket creation with transcript. Agents saw context and resolved faster, improving overall experience.
2.7
NLU training and testing
Training cycle, test set evaluation, iteration
Key takeaway
NLU needs an ML discipline: labeled dataset, holdout test set, confusion review, retraining cadence, and regression suite — not ad hoc tweaks.
Why this matters
Without test discipline, changes regress other intents and your containment drifts downward silently.
Maintain a stable test set of utterances per intent. Evaluate before and after every taxonomy or utterance update.
Use confusion analysis: which intents are confused? Fix taxonomy and entities before adding more utterances.
Retrain cadence: weekly during early rollout; monthly after stability. Track drift through fallback rate changes.
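Two checks carry most of the test discipline described above: confirming that training and test utterances do not overlap, and ranking the intent pairs the model confuses most. The sketch below assumes you already have (expected, predicted) pairs from a holdout run; how you obtain predictions is outside its scope.

```python
from collections import Counter

def normalise(utterance: str) -> str:
    """Lowercase and collapse whitespace so near-identical utterances compare equal."""
    return " ".join(utterance.lower().split())

def overlap(train: list[str], test: list[str]) -> set[str]:
    """Utterances present in both sets; this should be empty before you evaluate."""
    return {normalise(u) for u in train} & {normalise(u) for u in test}

def top_confusions(results: list[tuple[str, str]], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Rank (expected_intent, predicted_intent) pairs by how often they were confused."""
    wrong = Counter((expected, predicted) for expected, predicted in results if expected != predicted)
    return wrong.most_common(n)
```

A pair that stays at the top of `top_confusions` across retraining cycles is usually a taxonomy problem rather than a shortage of utterances.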
Workflow — do this next
- 01 Create training set and test set (never overlap).
- 02 Train model; review confusion matrix for top confusions.
- 03 Deploy to pilot cohort and monitor fallback/clarification rates.
Real example
Regression suite prevented drop
Adding utterances improved one intent but hurt another. Test set caught it. Adjusting taxonomy fixed both. Without test discipline, production would have regressed.
2.8
Topic performance analysis
Dashboards: which topics work, which fail, and why
Key takeaway
Performance analysis tracks containment, fallback reasons, drop-off steps, and handoff rates per topic — enabling targeted redesign and retraining.
Why this matters
You can’t improve what you don’t measure. Topic-level telemetry is how VA becomes a product, not a project.
Metrics by topic: success rate, abandon rate, escalation rate, average turns, and common unrecognized utterances.
Look for step-level failures: entity validation steps that cause drop-off, or clarification that frustrates users.
Operational rhythm: review the top failing topics weekly and ship fixes. Tie the review to the knowledge flywheel and the NLU training cycle.
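The per-topic metrics above can be computed from a flat list of session outcomes. This is a sketch under the assumption that each session has been labelled with exactly one outcome (`contained`, `escalated`, or `abandoned`); the field names are hypothetical.

```python
from collections import defaultdict

def topic_scorecard(sessions: list[dict]) -> dict[str, dict[str, float]]:
    """Aggregate session outcomes into containment, escalation, and abandon rates per topic."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        counts[session["topic"]][session["outcome"]] += 1
    card = {}
    for topic, by_outcome in counts.items():
        total = sum(by_outcome.values())
        card[topic] = {
            "sessions": total,
            "containment": by_outcome.get("contained", 0) / total,
            "escalation": by_outcome.get("escalated", 0) / total,
            "abandon": by_outcome.get("abandoned", 0) / total,
        }
    return card
```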
Workflow — do this next
- 01 Build a topic scorecard (top 20 topics).
- 02 Pick top 3 failure modes and assign owners.
- 03 Ship improvements weekly and re-measure.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Topic scorecard template
Use weekly for continuous improvement.
| Topic | Containment | Escalation | Drop-off step | Top unrecognized utterance | Owner |
|------|------------|------------|---------------|----------------------|-------|
| reset_password | | | | | |
| request_access | | | | | |
| vpn_help | | | | | |
Concept 3
Virtual Agent and Now Assist Integration
GenAI responses, dynamic handling, conversation summaries, knowledge synthesis, persona, orchestration, testing, and PDI configuration
3.1
GenAI-powered responses
How Now Assist provides free-text answers inside a VA conversation
Key takeaway
GenAI answers in VA must be grounded and scoped: retrieve from approved knowledge, produce short actionable responses, and escalate when sources are missing.
Why this matters
Free-text answers are where hallucination risk lives. Grounding and guardrails make them safe enough for self-service.
Use GenAI to explain policies, summarise procedures, and answer long-tail questions when knowledge exists. Avoid GenAI for deterministic actions like submitting a catalog request.
Require citations/links to KB for factual claims. If no KB source exists, the bot should say it can’t confirm and offer handoff.
Tone must match channel and audience: portal vs Teams; employee vs customer. Persona configuration prevents inconsistent voice.
Workflow — do this next
- 01 Pick 10 questions VA fails today and verify KB coverage exists.
- 02 Enable GenAI answer mode with KB grounding.
- 03 Add escalation trigger: 'no source' → offer ticket create or agent handoff.
Real example
Policy answers became reliable
GenAI responses were wrong until retrieval was scoped to current policy KB and citations were required. After grounding, wrong-policy incidents dropped and trust increased.
3.2
Dynamic topic generation
How GenAI handles intents never explicitly trained
Key takeaway
Dynamic handling uses GenAI to cover long-tail intents without building a topic for every variant — but it must be bounded by allowed domains and safe fallbacks.
Why this matters
Long-tail coverage is where GenAI reduces topic authoring load the most — but also where it can go off-policy if not constrained.
Dynamic topic generation should be limited to specific domains (IT how-to) and forbidden in high-risk domains (HR case decisions) without strong governance.
Use retrieval-first: GenAI should answer only from approved knowledge sources. If retrieval fails, dynamic handling should escalate.
Measure: dynamic responses that led to success vs those that escalated. This tells you what topics to formalise over time.
Workflow — do this next
- 01 Define allowed dynamic domains and prohibited domains.
- 02 Turn on dynamic handling for one portal cohort.
- 03 Review transcripts weekly; formalise top repeated long-tail intents into topics.
Real example
Long-tail IT questions covered safely
Dynamic GenAI answered niche printer driver questions using KB. When KB was missing, it escalated. Over 8 weeks, 12 repeated long-tail intents became formal topics.
3.3
Conversation summarisation
How Now Assist summarises VA interaction before handoff
Key takeaway
Summaries turn long transcripts into agent-ready briefs: intent, steps tried, data collected, and escalation reason — reducing repeat questions and improving CSAT.
Why this matters
Handoff quality is a huge lever. Summaries reduce friction even when containment is low.
Good summaries are structured: problem statement, actions already attempted, key entities (device/app), and what the user expects next.
Store summary on the created record (incident/case) and show it prominently in Agent Workspace.
Governance: redact sensitive data and avoid including restricted fields the agent shouldn’t see.
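A summary schema with a length cap might look like the sketch below. It only shows the structure an agent would read; the field names and the 600-character default are assumptions, and redaction is expected to have happened before the object is built.

```python
from dataclasses import dataclass

@dataclass
class HandoffSummary:
    problem: str               # problem statement in the user's terms
    attempted: list[str]       # actions already tried
    entities: dict[str, str]   # key entities such as device or app
    expectation: str           # what the user expects next

    def render(self, max_chars: int = 600) -> str:
        """Produce a short, fixed-order brief for the record and Agent Workspace."""
        lines = [
            f"Problem: {self.problem}",
            "Tried: " + ("; ".join(self.attempted) or "nothing yet"),
            "Entities: " + ", ".join(f"{k}={v}" for k, v in self.entities.items()),
            f"User expects: {self.expectation}",
        ]
        text = "\n".join(lines)
        if len(text) > max_chars:
            # Enforce the cap so the summary stays scannable.
            text = text[: max_chars - 3] + "..."
        return text
```

A fixed field order matters more than the exact wording: agents learn where to look, which is what reduces the "what have you tried?" question.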
Workflow — do this next
- 01 Define summary schema fields and max length.
- 02 Enable summary generation on handoff trigger.
- 03 Measure: repeat-question rate and first reply time.
Real example
Repeat questions dropped
Agents stopped asking 'what have you tried?' because the summary captured attempted steps. First reply time improved and customers rated handoff experience higher.
3.4
Knowledge synthesis in conversation
Pulling and summarising knowledge articles during chat
Key takeaway
VA can retrieve multiple KB articles and use GenAI to synthesise a short answer with links — but only when retrieval quality and article quality are strong.
Why this matters
Users don’t want 5 links. They want the one best path to resolution with confirmation steps.
Retrieval: AI Search returns top-k. Synthesis: GenAI produces a concise step list and links to the authoritative article.
Avoid synthesising across conflicting sources. If KB conflicts, the bot should present the authoritative one or escalate to human.
Measure: success confirmations after synthesis and subsequent ticket creation.
Workflow — do this next
- 01 Tune AI Search to return the correct top 3 for key intents.
- 02 Enable synthesis with citations and short format.
- 03 Add a 'did it work?' confirmation step with next action.
Real example
Synthesis beat link lists
Instead of listing three articles, VA synthesised the steps and linked the best one. Users resolved faster and containment improved.
3.5
Tone and persona configuration
Setting personality and register for GenAI responses
Key takeaway
Persona defines tone, vocabulary, and safety boundaries. It should be configured per domain and channel to prevent brand and compliance issues.
Why this matters
The fastest way to lose trust is a bot that sounds wrong for HR or too casual for a regulated enterprise.
Define tone for employee IT (helpful, concise), HR (formal, policy-cite), and customer-facing (brand-aligned).
Include prohibited language and disclaimers as needed. Keep persona instructions stable and versioned.
Test persona with native speakers across languages and with legal for regulated content.
Workflow — do this next
- 01 Create persona docs for IT, HR, CSM.
- 02 Apply persona settings to VA GenAI responses.
- 03 Review 50 transcripts per domain in pilot and adjust.
Real example
HR persona prevented unsafe advice
HR VA persona required policy citation and refused medical/legal advice. Legal approved rollout because persona constraints were explicit and tested.
3.6
The orchestration layer
When to use scripted topics vs GenAI responses
Key takeaway
Use scripted topics for deterministic workflows (catalog submissions, approvals). Use GenAI for explanations and long-tail questions. Orchestrate by confidence, risk, and availability of grounded sources.
Why this matters
Hybrid design is the only sustainable design: scripted where determinism matters, GenAI where language matters.
Decision rules: if intent is known and action exists → scripted. If intent is ambiguous → clarify. If intent is long-tail and KB coverage exists → GenAI answer. If KB missing → escalate.
Add safety: high-risk topics always scripted and/or approval-gated. GenAI should not decide outcomes in regulated flows.
Degraded mode: when GenAI is unavailable, VA should fall back to scripted topics and search results, never to a silent failure.
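The decision rules, safety rule, and degraded mode above can be combined into one small function. This is a sketch of one possible ordering, not a platform feature: the parameter names and return labels are assumptions, and treating a high-risk intent without a scripted action as an escalation is an interpretation of the safety rule.

```python
def choose_path(
    *,
    intent_known: bool,
    action_exists: bool,
    ambiguous: bool,
    kb_coverage: bool,
    high_risk: bool = False,
    genai_available: bool = True,
) -> str:
    """Pick scripted, clarify, GenAI answer, search results, or escalate for one utterance."""
    if intent_known and action_exists:
        return "scripted"          # deterministic workflow wins
    if ambiguous:
        return "clarify"
    if high_risk:
        return "escalate"          # assumption: high-risk intents never get a GenAI answer
    if kb_coverage and genai_available:
        return "genai_answer"      # long-tail question with grounded sources
    if kb_coverage:
        return "search_results"    # degraded mode: GenAI is down, show retrieval results
    return "escalate"              # no grounded source available
```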
Workflow — do this next
- 01 Define risk tiers for intents and map to scripted vs GenAI.
- 02 Implement confidence thresholds for switching paths.
- 03 Test degraded mode by disabling GenAI in staging.
Real example
Hybrid VA increased containment safely
Password reset stayed scripted; policy questions used GenAI; complex issues escalated. Containment improved without unsafe automation.
3.7
Testing GenAI-enhanced conversations
Methodology for non-deterministic output
Key takeaway
Test GenAI conversations with scenario suites, golden queries, and semantic evaluation — focusing on safety, grounding, and outcome, not exact wording.
Why this matters
Traditional bot tests assume deterministic output. GenAI requires different testing discipline.
Test categories: correct answers with citations, refusals on prohibited topics, escalation on missing sources, and tone compliance.
Use multiple runs per prompt and evaluate against checklists: required steps included, no forbidden claims, links present.
Regression suite: run weekly after KB changes and persona changes; retrieval changes often alter GenAI outputs.
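Checklist scoring over repeated runs can be sketched as follows. The `generate` callable stands in for whatever produces the bot's answer (it is not a real API here), and each check is a plain predicate on the answer text, so the test asserts properties rather than exact wording.

```python
from typing import Callable

Check = Callable[[str], bool]

def score_scenario(
    generate: Callable[[str], str],
    prompt: str,
    checks: dict[str, Check],
    runs: int = 3,
) -> dict:
    """Run one scenario several times and count how often each checklist item fails."""
    failures = {name: 0 for name in checks}
    for _ in range(runs):
        answer = generate(prompt)
        for name, check in checks.items():
            if not check(answer):
                failures[name] += 1
    return {
        "runs": runs,
        "failures": failures,
        "passed": all(count == 0 for count in failures.values()),
    }
```

Counting failures per check, instead of returning a single pass/fail, shows whether a scenario is flaky (fails one run in three) or consistently broken.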
Workflow — do this next
- 01 Build 50-scenario test suite across top intents.
- 02 Run each scenario 3 times; score checklist compliance.
- 03 Add red-team prompts for injection and leakage.
Real example
Testing caught missing escalation
GenAI answered without a source rather than escalating. The test suite flagged it, and the orchestration rule was fixed to escalate when no KB citation exists.
3.8
Configuration walkthrough
Enable and tune Now Assist inside a VA topic on PDI
Key takeaway
PDI lab: choose one topic → connect AI Search KB scope → enable GenAI responses with persona → add escalation rules → test with scenarios → measure containment and false deflection.
Why this matters
This is the hands-on integration that makes VA feel modern without losing control.
Step 1: Ensure AI Search is configured and KB sources are clean for this domain.
Step 2: Enable Now Assist/GenAI responses for VA in your instance configuration (availability varies by release).
Step 3: Configure persona and guardrails; require citations.
Step 4: Add orchestration rules: scripted vs GenAI vs escalate.
Step 5: Test 20 scenarios; track containment and false deflection.
Workflow — do this next
- 01 Pick one intent: VPN help.
- 02 Add KB retrieval and citation requirement.
- 03 Add 'create incident' fallback and handoff option.
- 04 Run test suite and record results.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
PDI VA + GenAI test pack
Minimum tests for safe rollout.
| # | Scenario | Expected |
|---|----------|----------|
| 1 | VPN cannot connect | KB steps + link + confirm |
| 2 | No KB exists | Escalate (ticket/handoff) |
| 3 | Ask for admin password | Refuse + escalate |
| 4 | HR policy question (out of scope) | Route to HR VA persona or escalate |
| 5 | Prompt injection attempt | No policy bypass |

Track: citations, tone, escalation correctness.
Concept 4
Omnichannel Deployment
Service Portal, Teams, Slack, mobile, API channels, cross-channel context, channel design, and analytics
4.1
Service Portal deployment
Embedding the VA widget and configuring portal behavior
Key takeaway
Service Portal is the default VA surface: embed widget, bind to profiles, configure branding, escalation, and analytics. Portal UX heavily influences containment.
Why this matters
Most deflection happens here. If portal VA UX is clunky, users bypass it and create tickets.
Portal deployment choices: when to auto-open, where to place the widget, and how to present quick actions (catalog, status, contact).
Bind to AI Search profiles and persona rules. Portal VA should retrieve employee-safe sources only.
Instrument the funnel: portal entry → VA session → outcome (self-service vs ticket vs handoff).
Workflow — do this next
- 01 Configure widget placement and entry points for top pages.
- 02 Add top 5 quick actions based on analytics.
- 03 Measure containment and abandon rates after launch.
Real example
Widget placement doubled engagement
Moving VA entry to the search box area increased sessions and enabled higher containment because more users tried self-service before creating tickets.
4.2
Microsoft Teams integration
Channel setup and enterprise authentication requirements
Key takeaway
Teams deployment requires strong auth: SSO, tenant governance, and bot permissions. Design conversations for chat constraints and short-turn interactions.
Why this matters
Teams is often the 'front door' for employees. Without proper auth and governance, Teams bots get blocked by security.
Auth is the hard part: ensure the bot operates as the user and respects roles. Avoid shared service accounts, which can expose data the user is not entitled to see.
Teams UX: short prompts, quick replies, and minimal form-like sequences. Deep forms belong in portal via links.
Governance: define which tenants and teams can install the bot and what data it may access.
Workflow — do this next
- 01 Define Teams auth flow and role mapping.
- 02 Pilot with one department and one set of intents.
- 03 Review security and privacy before scaling tenant-wide.
Real example
Teams bot approved after scoped pilot
Security approved the Teams rollout because the pilot was restricted to IT intents and an employee-safe KB. Expansion happened only after logs and ACL tests passed.
4.3
Slack integration
Slack app configuration and workspace governance model
Key takeaway
Slack bots require workspace governance: app scopes, installation approvals, and channel policies. Conversations should use interactive components and avoid long flows.
Why this matters
Slack is easy to deploy poorly and hard to govern after the fact. Do governance first.
Slack scopes should be minimal (read messages only where installed, post replies, interactive buttons). Avoid broad workspace scopes unless required.
Channel design: support channels behave differently from IT broadcast channels. Decide where VA is allowed to operate.
Use Slack as triage entry and link to portal for complex forms and approvals.
Workflow — do this next
- 01 Define Slack installation policy and approved workspaces.
- 02 Configure minimal scopes and audit logs.
- 03 Pilot in one support channel; measure containment and handoffs.
Real example
Slack VA reduced support interrupts
Employees used Slack VA to request access; the bot launched the right catalog item and confirmed the request ID, which meant fewer random pings to IT channels.
4.4
Mobile deployment
ServiceNow mobile VA experience and mobile-specific design
Key takeaway
Mobile VA needs mobile-first flows: fewer turns, larger buttons, low typing, and tolerance for intermittent connectivity — with safe fallback to ticket creation.
Why this matters
Field teams live on mobile. If mobile VA is not excellent, self-service fails for high-impact users.
Mobile constraints: small screen, slow typing, context switching. Use quick replies and prefilled entities where possible.
Use device context: location, device type, and assigned assets can reduce questions.
Offline handling: if connection fails mid-topic, preserve state and resume later.
Workflow — do this next
- 01 Redesign top 5 mobile topics with <6 turns each.
- 02 Use quick actions instead of free text when possible.
- 03 Test on real mobile devices and networks.
Real example
Mobile redesign increased completion
Reducing turns and adding quick replies doubled completion rate on mobile for facilities requests.
4.5
API channel
Building custom front-end experiences on the VA API
Key takeaway
Custom channels use the VA API to embed conversation in your own UI (in-app support, kiosks). Preserve the same NLU and topic logic while enforcing auth and analytics.
Why this matters
Many enterprises need VA inside custom apps. API channels make VA reusable across products.
Key design: authentication and identity. The VA API must operate as the real user to preserve ACL and personalization.
Preserve analytics: custom channels must emit events so you can measure containment and failure modes.
Don’t fork logic: keep topics central in ServiceNow; custom UI should be presentation only.
Workflow — do this next
- 01 Define custom channel requirements and auth method.
- 02 Implement VA API integration with session persistence.
- 03 Instrument analytics events aligned to portal/Teams metrics.
Real example
In-app support VA
A custom app embedded VA for device support. Users stayed in app; VA created incidents when needed with transcript attached. Analytics matched portal reporting.
4.6
Cross-channel context
Preserving state when a user switches channels
Key takeaway
Cross-channel continuity requires a shared session id and record-backed state so a user can start on Teams and finish on portal without repeating information.
Why this matters
Channel switching is common. If state isn’t preserved, users abandon and call support.
Use record-backed state: collected entities and stage stored on a session record. Channel adapters reference the same session.
Security: ensure session continuity doesn’t leak to other users on shared devices or shared channels.
UX: show a short summary when switching: 'I have your laptop model and issue type — continuing…'
Workflow — do this next
- 01 Design session id strategy across channels.
- 02 Store entity values and stage on session record.
- 03 Test: start in Teams, continue in portal, complete outcome.
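The record-backed state described above can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: `SessionRecord`, `SessionStore`, and `resume_summary` are hypothetical names, and a real implementation would persist to a session table rather than a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionRecord:
    session_id: str
    user_id: str
    stage: str = "start"
    entities: Dict[str, str] = field(default_factory=dict)


class SessionStore:
    """In-memory stand-in for a record-backed session table."""

    def __init__(self) -> None:
        self._records: Dict[str, SessionRecord] = {}

    def save(self, record: SessionRecord) -> None:
        self._records[record.session_id] = record

    def resume(self, session_id: str, user_id: str) -> Optional[SessionRecord]:
        record = self._records.get(session_id)
        # Refuse to resume another user's session (shared devices/channels).
        if record is None or record.user_id != user_id:
            return None
        return record


def resume_summary(record: SessionRecord) -> str:
    """Short message shown when the user continues on a new channel."""
    known = " and ".join(sorted(record.entities))
    return f"I have your {known}. Continuing from '{record.stage}'."
```

Each channel adapter would call `resume` with the authenticated user's id, so continuity depends on identity rather than on possession of a session id alone.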
Real example
Teams → portal completion
User started request in Teams but needed a form. Portal opened with fields prefilled from session state; completion rate improved.
4.7
Channel-specific design
Adapting topic flows for voice, chat, and embedded widget contexts
Key takeaway
Channels are different products. Adapt prompts, turn limits, and fallback UX per channel while keeping the underlying intent and outcome consistent.
Why this matters
One-size-fits-all conversation design produces mediocre experiences everywhere.
Voice: fewer choices, confirmation steps, and error tolerance. Chat: quick replies and short turns. Widget: can use richer UI and forms.
Avoid channel mismatch: don’t ask 10-question forms in Slack. Link to portal for complex data collection.
Use the same outcome contract across channels: record created/updated and confirmation message.
Workflow — do this next
- 01 Define channel constraints (max turns, message length).
- 02 Create channel-specific variants for top topics.
- 03 A/B test prompt length and quick reply design.
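Channel constraints such as max turns and message length are easier to enforce when they live in one table. The sketch below assumes hypothetical limits per channel; the numbers are illustrative only and should come from your own channel analytics.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChannelLimits:
    max_turns: int
    max_chars: int
    portal_link_for_forms: bool


# Illustrative values only, not recommendations.
LIMITS: Dict[str, ChannelLimits] = {
    "portal": ChannelLimits(max_turns=10, max_chars=400, portal_link_for_forms=False),
    "teams": ChannelLimits(max_turns=6, max_chars=200, portal_link_for_forms=True),
    "slack": ChannelLimits(max_turns=6, max_chars=200, portal_link_for_forms=True),
    "voice": ChannelLimits(max_turns=4, max_chars=120, portal_link_for_forms=False),
}


def fit_prompt(channel: str, prompt: str) -> str:
    """Trim a prompt to the channel's length limit at a word boundary."""
    limit = LIMITS[channel].max_chars
    if len(prompt) <= limit:
        return prompt
    trimmed = prompt[: limit - 3].rsplit(" ", 1)[0]
    return trimmed + "..."


def needs_portal_link(channel: str, question_count: int) -> bool:
    """Send long data collection to the portal instead of chat."""
    limits = LIMITS[channel]
    return limits.portal_link_for_forms and question_count > limits.max_turns
```

With these assumed limits, a 10-question form in Slack returns `True` from `needs_portal_link`, which matches the guidance to link to the portal for complex data collection.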
Real example
Short prompts improved Teams success
Reducing prompt verbosity and using quick replies improved Teams completion rate and reduced user frustration.
4.8
Channel analytics
Measuring performance and containment across channels
Key takeaway
Omnichannel analytics require consistent event definitions: session started, intent resolved, ticket created, handoff, abandon — comparable across portal, Teams, Slack, and mobile.
Why this matters
If you can’t compare channels, you can’t prioritize investment or defend ROI.
Standardize metrics: containment, handoff rate, drop-off rate, average turns, and time to resolution.
Segment by channel and intent — portal may contain better than Teams for certain topics.
Use analytics to guide channel-specific redesign and topic prioritization.
Workflow — do this next
- 01 Define a single event taxonomy across channels.
- 02 Build dashboards by channel and by intent.
- 03 Review monthly: which channel performs best for which intent.
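A single event taxonomy can be as simple as one shared enumeration that every channel adapter emits. The sketch below uses the five events named in the key takeaway; the `Event` enum, the `(channel, event)` tuple shape, and `rates_by_channel` are assumptions for illustration.

```python
from collections import Counter, defaultdict
from enum import Enum
from typing import Dict, Iterable, Tuple


class Event(Enum):
    SESSION_STARTED = "session_started"
    INTENT_RESOLVED = "intent_resolved"
    TICKET_CREATED = "ticket_created"
    HANDOFF = "handoff"
    ABANDON = "abandon"


def rates_by_channel(events: Iterable[Tuple[str, Event]]) -> Dict[str, Dict[str, float]]:
    """Return per-channel outcome rates as a share of sessions started."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for channel, event in events:
        counts[channel][event] += 1
    report: Dict[str, Dict[str, float]] = {}
    for channel, c in counts.items():
        sessions = c[Event.SESSION_STARTED]
        if sessions == 0:
            continue  # cannot compute rates without a denominator
        report[channel] = {
            e.value: c[e] / sessions for e in Event if e is not Event.SESSION_STARTED
        }
    return report
```

Because every channel shares the same denominator (sessions started), the resulting rates are directly comparable across portal, Teams, Slack, and mobile.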
Real example
Channel strategy became data-driven
Analytics showed Teams worked best for simple requests while portal handled complex troubleshooting better. Investment shifted accordingly and overall containment rose.
Concept 5
Live Agent Handoff
When to escalate, how to transfer context, what agents see, routing queues, warm vs cold handoff, after-hours, re-engagement, and analytics
5.1
The handoff trigger
When and how VA decides to escalate to a human
Key takeaway
Escalation should be driven by confidence, policy, and user signals: low NLU confidence, missing knowledge, high-risk topic, repeated failure, or explicit user request.
Why this matters
If VA escalates too late, users get angry. If it escalates too early, containment collapses. Trigger design is a core lever.
Trigger categories: NLU uncertainty, tool failures, prohibited topics, user frustration signals, and explicit 'talk to agent'.
Add escalation ceilings: after N failed attempts or N clarification loops, escalate automatically.
Always capture reason code for escalation — it fuels redesign and training.
Workflow — do this next
- 01 Define escalation rules and ceilings per topic category.
- 02 Implement reason codes: low confidence, no KB, policy, user request.
- 03 Monitor escalation rate and adjust thresholds carefully.
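The trigger categories and the escalation ceiling can be combined into one function that returns a reason code whenever the conversation should escalate. This is a sketch under stated assumptions: the `ConversationState` fields and the default thresholds (0.6 confidence, a ceiling of two failed clarifications) are hypothetical and would need tuning per topic category.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationState:
    # Hypothetical fields for this sketch.
    nlu_confidence: float
    failed_clarifications: int
    kb_hits: int
    policy_restricted: bool
    user_asked_for_agent: bool


def escalation_reason(
    state: ConversationState,
    min_confidence: float = 0.6,
    clarification_ceiling: int = 2,
) -> Optional[str]:
    """Return a reason code if the conversation should escalate, else None."""
    if state.user_asked_for_agent:
        return "USER_REQUEST"
    if state.policy_restricted:
        return "POLICY_RESTRICTED"
    if state.nlu_confidence < min_confidence:
        # Below the ceiling, ask a clarifying question instead of escalating.
        if state.failed_clarifications >= clarification_ceiling:
            return "LOW_NLU_CONFIDENCE"
        return None
    if state.kb_hits == 0:
        return "NO_KB_SOURCE"
    return None
```

Returning the reason code from the same function that makes the decision guarantees that every escalation is logged with a cause.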
Real example
Ceilings reduced user frustration
After two failed clarifications, VA escalated with context. Users stopped looping and CSAT improved even though containment decreased slightly — trust increased.
5.2
Context transfer
Packaging conversation history so the live agent starts informed
Key takeaway
A handoff must include transcript, extracted entities, intent, attempted steps, and relevant knowledge links — attached to the created record and visible in Agent Workspace.
Why this matters
Handoff is where most conversational ROI is won or lost. Context transfer prevents duplicate questioning and reduces handle time.
Transfer packet: user goal, entities, what was tried, what failed, and why escalation happened.
Include retrieval evidence: which KB articles were surfaced and whether the user clicked them.
Keep it structured and short; full transcript remains available as reference.
Workflow — do this next
- 01 Define a handoff summary schema (fields + max length).
- 02 Attach transcript and entity values to the record.
- 03 Test: live agent can reply without asking for repeated basics.
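A handoff summary schema with a length limit might look like the following. The field names mirror the transfer packet described above (goal, entities, what was tried, why escalation happened), but the `HandoffSummary` class and the 200-character limit are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_GOAL_CHARS = 200  # assumed limit; keeps the summary scannable


@dataclass
class HandoffSummary:
    user_goal: str
    intent: str
    reason_code: str
    entities: Dict[str, str] = field(default_factory=dict)
    attempted_steps: List[str] = field(default_factory=list)
    kb_links: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the max length at construction time.
        if len(self.user_goal) > MAX_GOAL_CHARS:
            self.user_goal = self.user_goal[: MAX_GOAL_CHARS - 3] + "..."

    def render(self) -> str:
        """Format the packet as short labelled lines for the agent."""
        lines = [
            f"Goal: {self.user_goal}",
            f"Intent: {self.intent}",
            f"Escalation reason: {self.reason_code}",
        ]
        lines += [f"{name}: {value}" for name, value in self.entities.items()]
        if self.attempted_steps:
            lines.append("Tried: " + "; ".join(self.attempted_steps))
        if self.kb_links:
            lines.append("KB shown: " + ", ".join(self.kb_links))
        return "\n".join(lines)
```

The rendered text is meant to sit above the full transcript on the record, so the agent reads a handful of lines first and opens the transcript only when needed.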
Real example
Transcript + summary cut handle time
Agents spent less time collecting info and more time resolving because they saw the user’s device, app, and attempted steps already captured.
5.3
The Agent Workspace experience
What the agent sees when they receive a handoff
Key takeaway
Agent Workspace should surface the handoff summary at the top: intent, entities, recommended next steps, and links — plus easy access to the transcript.
Why this matters
If context is buried, agents ignore it and ask again. UX drives adoption of handoff benefits.
Show: escalation reason, user sentiment (if available), and top suggested actions or KB links.
Integrate Now Assist: draft first response using the summary and KB sources, with human edit/approve.
Train agents: ‘acknowledge the user’s attempt’ improves trust immediately.
Workflow — do this next
- 01 Configure Workspace layout to highlight handoff summary.
- 02 Add quick actions: assign, create task, send response draft.
- 03 Run agent training for handoff etiquette.
Real example
Agents stopped repeating questions
Workspace layout changed to show summary and entities. Agents acknowledged the user’s prior steps and moved directly to resolution, improving CSAT.
5.4
Queue management
Routing handoffs to the right agent group based on context
Key takeaway
Handoff routing should use extracted entities and intent to pick the right queue — ideally leveraging Predictive Intelligence routing and business rules.
Why this matters
Wrong queue destroys the point of VA and increases wait time.
Routing signals: intent, app entity, CI/service, location, priority, and sentiment.
Use PI for routing when labels exist; use rules for policy (e.g., VIP users).
Log reroutes after handoff — it’s the metric for routing quality.
Workflow — do this next
- 01 Define mapping from intents/entities to queues.
- 02 Deploy PI routing suggestions for handoff tickets.
- 03 Monitor reroute rate and tune.
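The rule-based half of routing (intent and entity mapping plus a policy override) and the reroute-rate metric can be sketched as below. This does not model Predictive Intelligence; the queue names and the mapping table are hypothetical examples.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical queue names and mappings for this sketch.
VIP_QUEUE = "vip_support"
DEFAULT_QUEUE = "service_desk"
QUEUE_BY_INTENT_AND_APP: Dict[Tuple[str, Optional[str]], str] = {
    ("vpn_help", None): "network_support",
    ("request_access", "sap"): "sap_access",
    ("request_access", None): "identity_access",
}


def pick_queue(intent: str, app: Optional[str] = None, is_vip: bool = False) -> str:
    """Policy rule first, then the most specific intent/entity match."""
    if is_vip:
        return VIP_QUEUE
    for key in ((intent, app), (intent, None)):
        if key in QUEUE_BY_INTENT_AND_APP:
            return QUEUE_BY_INTENT_AND_APP[key]
    return DEFAULT_QUEUE


def reroute_rate(assignments: List[Tuple[str, str]]) -> float:
    """Share of handoffs whose final queue differs from the initial queue."""
    if not assignments:
        return 0.0
    rerouted = sum(1 for first, final in assignments if first != final)
    return rerouted / len(assignments)
```

Tracking `reroute_rate` over `(initial_queue, final_queue)` pairs gives a single number to tune against when the mapping changes.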
Real example
Handoff reroutes dropped
Using entity-driven routing reduced reroutes from 18% to 9%. Users waited less and agents trusted the system more.
5.5
Warm vs cold handoff
Design choices and customer impact
Key takeaway
Warm handoff keeps continuity (agent joins live). Cold handoff creates a record for later. Choose based on urgency, staffing, and channel constraints — but always preserve context.
Why this matters
The wrong handoff design creates broken experiences and kills trust in self-service.
Warm handoff is best for high-urgency or sensitive cases. Cold handoff is best for async support and after-hours.
Always set expectations: wait time, next step, and confirmation record number.
If wait time is long, offer alternatives: callback, ticket, or self-service links.
Workflow — do this next
- 01 Define warm/cold thresholds by intent and urgency.
- 02 Design UX: explicit wait estimates and fallback options.
- 03 Measure: user satisfaction on each handoff type.
Real example
Warm handoff for security
Security incidents used warm handoff; routine requests used cold. Clear rules avoided confusion and improved experience.
5.6
After-hours handling
What happens when no live agents are available
Key takeaway
After-hours design should offer: create ticket with transcript, schedule callback, or provide self-service steps — with clear SLAs and next-contact expectations.
Why this matters
After-hours is where self-service either shines or creates anger. Clear expectations are everything.
Use on-call policies: P1 issues may still escalate; routine issues create tickets for next business day.
Offer proactive status: acknowledge and provide reference number; avoid pretending a live agent is available.
Capture after-hours intent analytics — it often reveals knowledge gaps and opportunities for automation.
Workflow — do this next
- 01 Define after-hours rules by priority and topic.
- 02 Implement callback scheduling option if available.
- 03 Measure after-hours CSAT and repeat contacts.
Real example
After-hours expectations reduced repeat pings
VA created a ticket with transcript and told the user when they’d be contacted. Repeat contacts dropped because expectations were clear and evidence was preserved.
5.7
Re-engagement
Pulling users back to self-service if the queue wait is long
Key takeaway
If wait is long, re-engage users with self-service options: top KB, guided action flows, or alternate channels — without forcing them to restart or lose their place.
Why this matters
Queue times are costly. Re-engagement converts waiting into resolution.
Show: 'While you wait, try this' with the best action card or KB link.
Preserve state: if the user resolves themselves, close the handoff request cleanly.
Measure re-engagement success rate and avoid spamming users with irrelevant suggestions.
Workflow — do this next
- 01 Define re-engagement trigger (wait > X minutes).
- 02 Offer 1–3 high-confidence self-service actions.
- 03 Track: re-engagement click-through and successful resolution.
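The re-engagement trigger and the "1–3 high-confidence actions" rule fit in a few lines. In this sketch the 15-minute default wait threshold, the 0.8 confidence cutoff, and the `(action_name, score)` candidate shape are assumptions; the scores would come from whatever relevance signal you already have.

```python
from typing import List, Tuple


def reengagement_offers(
    wait_minutes: float,
    candidates: List[Tuple[str, float]],
    wait_threshold: float = 15.0,
    min_confidence: float = 0.8,
    max_offers: int = 3,
) -> List[str]:
    """Return up to max_offers self-service actions once the wait is long."""
    if wait_minutes <= wait_threshold:
        return []
    confident = [(name, score) for name, score in candidates if score >= min_confidence]
    confident.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in confident[:max_offers]]
```

Returning an empty list when nothing clears the confidence bar is deliberate: showing no suggestion is better than spamming the user with irrelevant ones.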
Real example
Queue wait reduced with self-service
When wait exceeded 15 minutes, VA offered a known fix flow. Many users solved the issue and cancelled handoff, reducing queue load.
5.8
Handoff analytics
Measuring volume, reason, and resolution rate of handoffs
Key takeaway
Track handoffs by reason code, topic, channel, queue, and outcome. Handoff analytics is the fastest way to find broken topics and missing knowledge.
Why this matters
Escalations are signal. If you don’t measure them, you don’t improve containment.
Key metrics: handoff rate, reason distribution, time to first agent response, reroute rate, and post-handoff resolution success.
Use reason codes to drive action: 'no KB' → write KB; 'low confidence' → train NLU; 'tool failure' → fix integrations.
Report by channel and intent. Teams and portal behave differently; don’t average away the truth.
Workflow — do this next
- 01 Ensure every handoff has a reason code.
- 02 Build dashboard: top handoff topics and reasons.
- 03 Run weekly handoff review and backlog improvements.
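The reason-to-action mapping described above ('no KB' → write KB, and so on) can drive the weekly review directly. The sketch below assumes handoffs are available as `(topic, reason_code)` pairs; `weekly_backlog` and the "review manually" default are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Mapping taken from the text above; codes without an entry go to manual review.
ACTION_BY_REASON = {
    "NO_KB_SOURCE": "write KB",
    "LOW_NLU_CONFIDENCE": "train NLU",
    "TOOL_FAILURE": "fix integrations",
}


def weekly_backlog(
    handoffs: Iterable[Tuple[str, str]], top_n: int = 5
) -> List[Tuple[str, str, int, str]]:
    """Rank (topic, reason) pairs by volume and attach the suggested action."""
    counts = Counter(handoffs)
    return [
        (topic, reason, n, ACTION_BY_REASON.get(reason, "review manually"))
        for (topic, reason), n in counts.most_common(top_n)
    ]
```

Each row of the result is a candidate backlog item: the topic, why it escalated, how often, and which team's kind of fix it needs.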
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Handoff reason codes (starter)
Standardize these to make analytics actionable.
Reason codes:
- LOW_NLU_CONFIDENCE
- MISSING_ENTITY
- NO_KB_SOURCE
- POLICY_RESTRICTED
- TOOL_FAILURE
- USER_REQUEST
- USER_FRUSTRATION
- AFTER_HOURS

Use these to drive redesign and training backlog.
Concept 6
Performance and Containment Metrics
Containment, value, quality, escalation analysis, utterance mining, A/B tests, improvement cadence, and CIO dashboards
6.1
Containment rate
Primary KPI and the three ways it’s measured (and gamed)
Key takeaway
Containment must be tied to record truth. Define it explicitly and track false containment. Otherwise you will get vanity numbers that collapse in production.
Why this matters
Containment is the KPI everyone asks for — and the KPI most often misreported.
Three common measurements: (1) chat session ended without handoff, (2) no ticket created within a time window, (3) user confirmed resolution. Each can be gamed if not combined with guardrails.
Best practice: use a conservative definition (no ticket within 72h for the same intent) plus a confirmation prompt when appropriate.
Track false containment: users who later create tickets because they got wrong guidance.
Workflow — do this next
- 01 Choose containment definition and publish it to stakeholders.
- 02 Instrument ticket creation after VA sessions (72h window).
- 03 Report containment by intent category, not one global number.
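The conservative definition (no ticket within 72h for the same intent, and no request for human help) can be computed from two record sets. This is a minimal sketch: the `Session` and `Ticket` shapes are hypothetical, and joining on `user_id` plus `intent_category` is an assumption about how sessions and tickets are linked in your data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

WINDOW = timedelta(hours=72)


@dataclass
class Session:
    user_id: str
    intent_category: str
    started_at: datetime
    requested_human: bool


@dataclass
class Ticket:
    user_id: str
    intent_category: str
    created_at: datetime


def is_contained(session: Session, tickets: List[Ticket]) -> bool:
    """Contained = no human requested and no same-category ticket within 72h."""
    if session.requested_human:
        return False
    for t in tickets:
        same = t.user_id == session.user_id and t.intent_category == session.intent_category
        if same and session.started_at <= t.created_at <= session.started_at + WINDOW:
            return False
    return True


def containment_rate(sessions: List[Session], tickets: List[Ticket]) -> float:
    """Share of sessions that meet the conservative containment definition."""
    if not sessions:
        return 0.0
    return sum(is_contained(s, tickets) for s in sessions) / len(sessions)
```

To report by intent category rather than one global number, filter `sessions` to a category before calling `containment_rate`.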
Real example
Vanity containment corrected
Bot showed 80% 'session success' but only 35% no-ticket containment. Once measured correctly, the team invested in knowledge and NLU fixes and grew real containment steadily.
6.2
Deflection value
Calculating cost savings from each contained conversation
Key takeaway
Deflection value = tickets avoided × cost per ticket + agent time saved − operating cost. Use category-level costs and conservative attribution for CFO credibility.
Why this matters
CFOs fund VA programs when value is quantified honestly and repeatably.
Cost per ticket varies by category. Password reset is cheap; complex app issues are expensive. Segment costs and report deflection value by category.
Include operating costs: knowledge upkeep, NLU training, channel integrations, and GenAI consumption.
Show a range (best/base/worst). CFOs trust ranges more than point estimates.
Workflow — do this next
- 01 Compute fully loaded cost per ticket for top 10 categories.
- 02 Multiply by contained conversations per category.
- 03 Subtract VA and AI operating costs; report net value monthly.
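The deflection value formula from the key takeaway translates directly into code. In this sketch the `attribution` discount and the worst/base/best factors (0.5, 0.7, 0.9) are assumptions added to illustrate conservative attribution and range reporting; they are not figures from this chapter.

```python
from typing import Dict


def net_deflection_value(
    contained: Dict[str, int],
    cost_per_ticket: Dict[str, float],
    agent_time_saved_value: float,
    operating_cost: float,
    attribution: float = 1.0,
) -> float:
    """tickets avoided x cost per ticket + agent time saved - operating cost.

    `attribution` (0-1) discounts contained conversations that would not
    have become tickets anyway; it is an assumption of this sketch.
    """
    gross = sum(n * attribution * cost_per_ticket[cat] for cat, n in contained.items())
    return gross + agent_time_saved_value - operating_cost


def value_range(
    contained: Dict[str, int],
    cost_per_ticket: Dict[str, float],
    agent_time_saved_value: float,
    operating_cost: float,
) -> Dict[str, float]:
    """Worst/base/best cases using illustrative attribution factors."""
    factors = {"worst": 0.5, "base": 0.7, "best": 0.9}
    return {
        name: net_deflection_value(
            contained, cost_per_ticket, agent_time_saved_value, operating_cost, f
        )
        for name, f in factors.items()
    }
```

For example, with 1,000 contained password resets at a cost of 5 each, 100 contained app issues at 40 each, 1,000 of agent time saved, and 3,000 of operating cost, full attribution yields a net value of 7,000 and the worst case (0.5) yields 2,500.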
Real example
CFO approved expansion
Program showed $X per month savings on 5 intents with conservative attribution. Because assumptions were stable and transparent, CFO approved scaling to new channels.
6.3
Resolution quality
CSAT, post-conversation surveys, and limits of self-report
Key takeaway
Quality is multi-signal: CSAT surveys, reopen rates, repeat contacts, and sentiment. Self-report helps, but record-based signals prevent bias and gaming.
Why this matters
Containment without quality is harmful. Wrong answers create future tickets and trust loss.
Use post-conversation surveys but treat them as noisy. Combine with objective signals: repeat contact within 72h and ticket reopen rates.
Quality signals differ by channel: Teams may have lower survey completion; portal can capture more feedback.
GenAI adds risk: require citations and measure wrong-answer complaints explicitly.
Workflow — do this next
- 01 Add a 1-question 'did this solve it?' check for key topics.
- 02 Track repeat contacts and reopen rates tied to VA sessions.
- 03 Review negative transcripts weekly for root causes.
Real example
Quality saved the program
Containment was rising but wrong-answer complaints increased. Tightening KB scope and adding escalation on missing sources improved quality and restored trust.
6.4
Escalation analysis
Using escalation data to identify topics that need redesign
Key takeaway
Escalations are the backlog. Analyze escalation reasons by topic and channel to decide: retrain NLU, redesign flow, add knowledge, or change policy.
Why this matters
If you want containment to rise, you must mine escalations and fix the root causes.
Break down escalations by reason code (low confidence, missing entity, no KB, tool failure) and by topic.
Prioritize by impact: high volume × high cost topics first.
Close the loop: every top escalation reason should map to an owner and an action in the next sprint.
Workflow — do this next
- 01 Build a dashboard: top 20 escalated topics and reasons.
- 02 Assign owners: NLU, knowledge, integrations, policy.
- 03 Ship 5 fixes/week; re-measure.
Real example
Tool failures were the real issue
Escalations blamed NLU, but analytics showed tool failures (catalog submission errors). Fixing integration raised containment more than retraining utterances.
6.5
Utterance analysis
Mining unrecognized utterances to find training gaps
Key takeaway
Unrecognized utterances are training fuel: cluster them, map them to intents, and update taxonomy, synonyms, and entity dictionaries — continuously.
Why this matters
User language changes constantly. Utterance mining is how VA stays accurate over time.
Collect unrecognized utterances by channel. Teams language differs from portal language; mobile differs too.
Cluster and label weekly. Use these labels to add training phrases or create new intents when volume justifies it.
Don’t chase the tail endlessly. Long-tail utterances may be better handled by AI Search + GenAI fallback.
Workflow — do this next
- 01 Weekly: export top 200 unrecognized utterances.
- 02 Cluster into themes; map to existing intents or new ones.
- 03 Update synonyms/entities and retrain NLU.
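The "cluster into themes" step can be approximated with nothing more than token overlap. The sketch below uses greedy single-pass clustering on Jaccard similarity; this is a deliberately simple stand-in chosen for illustration, the 0.4 threshold is an assumption, and embedding-based clustering would likely group paraphrases better.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of an utterance."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_utterances(utterances: List[str], threshold: float = 0.4) -> List[List[str]]:
    """Greedy single-pass clustering: join the first cluster whose seed
    utterance is similar enough, otherwise start a new cluster."""
    clusters: List[List[str]] = []
    for text in utterances:
        for cluster in clusters:
            if jaccard(tokens(text), tokens(cluster[0])) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    # Largest themes first, so reviewers label high-volume clusters first.
    return sorted(clusters, key=len, reverse=True)
```

Reviewing the largest clusters first keeps the weekly labeling effort focused on themes with enough volume to justify new training phrases or a new intent.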
Real example
Acronym drift fixed quickly
New internal tool acronym appeared. Utterance analysis caught it in a week; synonym update and new training phrases prevented widespread failures.
6.6
A/B testing in Virtual Agent
Controlled experiments on topic variations
Key takeaway
A/B test topic variants on cohorts to compare containment, drop-off, and handoff rates. Test one change at a time to attribute impact.
Why this matters
Topic design is product design. A/B testing turns it into a measurable engineering practice.
Define primary metric: completion/containment for the topic. Secondary: abandon rate and time-to-resolution.
Keep other variables stable (knowledge, NLU model). Otherwise you can’t attribute changes.
Run long enough to cover behavior cycles (weekday/weekend) and channel mix.
Workflow — do this next
- 01 Create variant B with one change (clarifier wording).
- 02 Split traffic 50/50 for 2 weeks.
- 03 Promote winner and log change notes.
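Before promoting a winner, it helps to check that the difference in completion rates is unlikely to be noise. One common option is a two-proportion z-test, sketched below with the standard library only; the choice of test and the example counts are assumptions, not something this chapter prescribes.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in completion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures in both arms
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

As a usage example, 400 completions out of 1,000 sessions for variant A against 460 out of 1,000 for variant B gives a z-score of roughly 2.7 and a p-value below 0.05, which would support promoting B under a conventional 5% threshold.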
Real example
Clarifier A/B improved completion
Variant with 3 quick replies outperformed a free-text clarifier. Completion rose and drop-off fell because choices were clearer.
6.7
The continuous improvement cycle
Operating cadence for reviewing metrics and iterating topics
Key takeaway
VA needs an operating rhythm: weekly reviews of failures, monthly health checks, and a quarterly taxonomy refresh. A steady improvement cadence is what creates sustained containment gains.
Why this matters
Bots decay. Without a cadence, containment erodes as language and services change.
Weekly: top escalations and unrecognized utterances. Monthly: channel performance and cost/value. Quarterly: taxonomy and persona review.
Assign owners: NLU owner, knowledge owner, channel owner, and handoff owner. No owners, no improvement.
Tie changes to release notes and stakeholder reporting so trust grows with visibility.
Workflow — do this next
- 01 Schedule weekly VA ops review (30 min).
- 02 Ship 5 improvements/week (topics, KB, synonyms, boosts).
- 03 Quarterly governance review: privacy, retention, persona, guardrails.
Real example
Containment climbed steadily
No big redesign — just weekly fixes. Containment rose from 18% to 39% in 12 weeks on targeted intents. Cadence was the secret.
6.8
Reporting to stakeholders
Dashboard design that tells the VA story to a CIO
Key takeaway
A CIO dashboard should show: containment by intent, deflection value, quality signals, escalation reasons, and the improvement backlog — not vanity chat volume.
Why this matters
Stakeholder reporting is what keeps funding and unlocks cross-channel scaling.
Include both wins and risks: false deflection, wrong-answer incidents, and after-hours escalation load.
Show the roadmap: next intents to improve, next channels to launch, and governance status.
Use stable definitions and publish assumptions — credibility beats hype.
Workflow — do this next
- 01 Define standard metrics and definitions.
- 02 Build dashboard with 5–7 tiles maximum.
- 03 Review monthly with CIO/CISO/Service owners.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
CIO dashboard tiles (starter)
Keep it tight — only what drives decisions.
Tiles:
1) Containment by top 10 intents (last 30d)
2) Deflection value (conservative) + assumptions
3) Quality: repeat contacts + wrong-answer reports
4) Escalations by reason code
5) Channel mix performance (portal/Teams/Slack/mobile)
6) Improvement backlog (top 10 actions)
7) Governance status (privacy, retention, audits)
Containment definition (copy/paste)
Use one stable definition so reporting stays honest.
Containment (conservative):
- A VA session is contained if no ticket is created within 72 hours for the same intent category AND the user did not request human help.

Report also:
- False containment: ticket created after wrong guidance
- Escalations by reason code
- Containment by channel and intent
VA operating rhythm
Weekly cadence that grows containment sustainably.
Weekly (30 min)
- Top 20 escalations by reason
- Top 50 unrecognized utterances
- Ship 5 fixes (topic, KB, synonym, tool)

Monthly (60 min)
- Channel performance review
- A/B test results
- Quality review (repeat contacts)

Quarterly
- Taxonomy refresh
- Privacy/retention review
- GenAI persona/guardrails review
Topic scorecard template
Use weekly for continuous improvement.
| Topic | Containment | Escalation | Drop-off step | Top unrec utterance | Owner |
|------|------------|------------|---------------|----------------------|-------|
| reset_password | | | | | |
| request_access | | | | | |
| vpn_help | | | | | |
Omnichannel VA rollout — portal + Teams
An enterprise launched VA on portal and Teams. Early metrics looked good (session success), but ticket volume did not drop and users complained about repetition at handoff.
Before
Synthetic utterances, weak taxonomy, keyword search fallback, and vanity containment measurement. Handoff summaries were missing and agents repeated questions.
After
Utterance mining from real queries, test-set discipline for NLU, AI Search profile tuning, Now Assist used only for grounded answers, structured handoff summaries in Workspace, and conservative containment measurement (72h no-ticket).
- Containment increased steadily on top intents after weekly iteration cadence
- Repeat-question rate after handoff decreased due to summaries and entity capture
- Channel analytics clarified which intents belong in Teams vs portal
- Stakeholder trust improved because reporting definitions were stable and conservative
What goes wrong
Measuring containment as 'chat ended'
Use a conservative record-based definition and report false containment explicitly.
Training NLU on synthetic utterances
Mine real user language weekly and maintain a test set to prevent regressions.
GenAI answering without sources
Require AI Search/KB grounding and escalate when no citation exists.
Handoff without context
Transfer entities + summary + transcript into Agent Workspace and make it prominent in the layout.

Vetted by Krishna Kumar, Curator, FactorBeam