What you'll unlock: Production Cowork is 60–80% automated prep with explicit human gates — reliability and governance are features, not paperwork.

Tool guide · Chapter 5 of 7

Production-Grade Cowork Operations

Running Cowork as a serious operational system — reliability, governance, and the discipline that makes automation trustworthy

Chapter context

Your Cowork pilots work on your machine. Then someone asks, 'Can we bet the Monday board pack on this?' Chapter 5 is the answer: production-grade operations. Without this chapter, automation stays a hero skill; with it, Cowork becomes operational infrastructure that finance and security can endorse.

Is this chapter for you?

Will stakeholders act on output without re-checking sources?

Production readiness gate (§1.1) + validation Skill (§1.2) mandatory.

Does workflow touch confidential data or external sends?

T3 approval + data classification (§2.2–2.4) before schedule.

Are multiple people creating Skills?

Team RBAC + CoE review (§2.6, §3.5) before shared production.

Does leadership ask for ROI on AI spend?

Token ledger (§1.7) + ROI worksheet (§3.6) from month one.

Chapters 1–4 built Skills, schedules, pipelines, and connectors. Chapter 5 is how you run Cowork as production infrastructure, trustworthy enough for leadership and finance to depend on. You will implement validation, alerting, review queues, minimum access, change management, and the organisational patterns that compound over years.

Reference diagrams

Production workflow gates

Nothing reaches production without validation; uncertainty routes to review.

Skill run (TAR execute, transform) → Validate (deterministic QA, gate) → Review (T2 queue, human) → Approve (flag / T3, governance) → Emit (prod path, deliver)

Governance layers

Access, data class, and audit are stacked before unattended schedules.

Min access (per-Skill scope, security) · HITL tier (T0–T3, approval) · Change mgmt (version + regression, quality) · Audit (run_id trail, accountability)

Implementation paths

Trust comes from gates and logs, not from hoping the model behaved.

Concept 1

Reliability & Quality Control

Making Cowork workflows reliable enough to run without supervision — the engineering discipline for automated operations

1.1

The reliability requirements for automated Skills

What you need to be true before a Skill runs without review

Key takeaway

Production gate: five clean runs, defined acceptance tests, failure policy, owner, and HITL tier — all documented before unattended schedule.

Why this matters

Teams schedule on enthusiasm; production requires a checklist security and finance would sign.

Before unattended production, verify: (1) TAR spec signed off. (2) Test matrix passed (Ch 2 §1.8). (3) Output validation Skill attached. (4) Alert on failure configured. (5) Rollback path documented. (6) Token estimate within budget. (7) HITL tier explicit.

Reliability is tiered: T0 auto-deliver only for low-stakes internal transforms; T1 notify; T2 staging review (default); T3 approval flag before external write.

Label workflow specs NATIVE vs PATTERN so reviewers know what the product guarantees vs what your team implements.

Workflow — do this next

01 Complete PRODUCTION_READINESS checklist (artifact).
02 Stakeholder sign-off on HITL tier.
03 Enable schedule only after checklist green.
04 Re-run checklist on any Skill version bump.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Production readiness gate

[ ] TAR spec + owner
[ ] Test matrix 5/5
[ ] QA/validation Skill attached
[ ] Failure alerts configured
[ ] HITL tier documented
[ ] Token estimate approved
[ ] Rollback procedure written
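If the readiness checklist is kept in a machine-readable form, a small guard can refuse to enable an unattended schedule until every item is green. This is a minimal sketch, assuming the checklist is stored as a dict of item name to bool; the item keys and the `can_enable_schedule` helper are hypothetical, not part of Cowork.

```python
# Hypothetical machine-readable keys mirroring the readiness checklist.
READINESS_ITEMS = (
    "tar_spec_and_owner",
    "test_matrix_5_of_5",
    "validation_skill_attached",
    "failure_alerts_configured",
    "hitl_tier_documented",
    "token_estimate_approved",
    "rollback_procedure_written",
)


def missing_items(checklist: dict) -> list:
    """Return checklist items that are absent or still unticked."""
    return [item for item in READINESS_ITEMS if not checklist.get(item, False)]


def can_enable_schedule(checklist: dict) -> bool:
    """Allow an unattended schedule only when every item is green."""
    return not missing_items(checklist)


if __name__ == "__main__":
    draft = {item: True for item in READINESS_ITEMS}
    draft["rollback_procedure_written"] = False
    print(can_enable_schedule(draft))  # False
    print(missing_items(draft))        # ['rollback_procedure_written']
```

Running the same guard on every Skill version bump keeps the gate from becoming a one-time ceremony.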

1.2

Output validation

How to verify that a Skill produced acceptable output — the quality check that runs after the Skill

Key takeaway

Validation is deterministic code-style checks on outputs — schema, counts, reconciliations — not 'ask Claude if it looks good.'

Why this matters

LLM self-review of its own output is circular; production needs measurable gates.

VALIDATE_* Skills run after the primary transform: required files exist, JSON schema matches, totals reconcile to source within tolerance, banned patterns are absent. They emit VALIDATION.json, and the pipeline blocks downstream steps on FAIL.

Separate validation from narrative QA — numbers first, prose second. Golden fixtures from human baseline become regression oracles.

Workflow — do this next

01 List 5–10 deterministic checks per workflow.
02 Implement VALIDATE Skill or script gate.
03 Block emit/notify on FAIL.
04 Log validation results in manifest.

Ready-to-use artifacts

Validation check template

checks:
  - name: schema_match
  - name: row_count_delta < 0.2
  - name: revenue_sum_matches_source
  - name: no_pii_in_output
on_fail: stop_pipeline + notify
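The check template above can be turned into a deterministic gate. This is a minimal sketch under stated assumptions: the Skill output is already parsed into a dict with a `rows` list, each row carries a numeric `revenue` field, the row-count delta is measured against the previous run, and the single banned pattern is an illustrative SSN-like regex. None of these names come from Cowork itself.

```python
import json
import re
from pathlib import Path

# Illustrative PII pattern only; extend with your own banned patterns.
BANNED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]


def required_files_exist(run_dir: Path, names: list) -> bool:
    """Every expected output file must be present in the run folder."""
    return all((run_dir / name).is_file() for name in names)


def validate(output: dict, required_keys: set, source_revenue: float,
             prior_rows: int, tolerance: float = 0.01,
             max_row_delta: float = 0.2) -> dict:
    """Deterministic checks on a parsed Skill output; no model call involved."""
    rows = output.get("rows", [])
    revenue = sum(row.get("revenue", 0.0) for row in rows)
    row_delta = abs(len(rows) - prior_rows) / max(prior_rows, 1)
    text = json.dumps(output)
    checks = {
        "schema_match": required_keys.issubset(output),
        "row_count_delta": row_delta < max_row_delta,
        # Relative tolerance against the source system total.
        "revenue_sum_matches_source":
            abs(revenue - source_revenue) <= tolerance * abs(source_revenue),
        "no_pii_in_output": not any(p.search(text) for p in BANNED_PATTERNS),
    }
    return {"status": "PASS" if all(checks.values()) else "FAIL", "checks": checks}


def write_validation(report: dict, run_dir: Path) -> Path:
    """Persist VALIDATION.json so the pipeline router can block on FAIL."""
    path = run_dir / "VALIDATION.json"
    path.write_text(json.dumps(report, indent=2))
    return path
```

A pipeline step would call `validate`, write the report, and stop before emit/notify whenever `status` is FAIL. Keeping the checks as plain code is what makes them repeatable; the same inputs always produce the same verdict.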

1.3

Failure alerting

How to know immediately when a Skill fails rather than discovering it hours later

Key takeaway

Fail loud, succeed quiet: failure alerts are immediate (and paged for critical workflows); successes are bundled into a digest.

Why this matters

Discovering failure at standup means the automation already failed the business once.

Alert payload: workflow name, run_id, failed Skill + version, error class, partial outputs path, owner @mention. Severity: P1, P2, or P3, assigned per workflow.

No alert on success except daily digest. Test alerts monthly — muted channels kill programs silently.

Workflow — do this next

01 Map workflows to P1/P2/P3.
02 Wire failure → correct channel per tier.
03 Include run_id in every alert.
04 Monthly fire drill: simulate failure.

Ready-to-use artifacts

Failure alert template

[P2] Cowork FAIL: {workflow}
run_id: {id} | Skill: {name}_v{n}
error: {class} — {message}
partial: {paths}
owner: @{handle}
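A sketch of rendering the alert template above and routing it by severity, with nothing sent on success. The channel names in `CHANNELS` and the `Failure` record are hypothetical; wire the result of `route_alert` into whatever notification connector you already use.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical severity-to-channel routing; replace with your own channels.
CHANNELS = {"P1": "#cowork-p1-page", "P2": "#cowork-ops", "P3": "#cowork-digest"}


@dataclass
class Failure:
    severity: str
    workflow: str
    run_id: str
    skill: str
    version: int
    error_class: str
    message: str
    owner: str
    partial_paths: list = field(default_factory=list)


def format_alert(f: Failure) -> str:
    """Follow the field order of the failure alert template."""
    return "\n".join([
        f"[{f.severity}] Cowork FAIL: {f.workflow}",
        f"run_id: {f.run_id} | Skill: {f.skill}_v{f.version}",
        f"error: {f.error_class}: {f.message}",
        f"partial: {', '.join(f.partial_paths) or 'none'}",
        f"owner: @{f.owner}",
    ])


def route_alert(status: str, failure: Optional[Failure]) -> Optional[Tuple[str, str]]:
    """Fail loud, succeed quiet: only failed runs produce an immediate alert."""
    if status != "FAIL" or failure is None:
        return None
    return CHANNELS[failure.severity], format_alert(failure)
```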

1.4

Graceful degradation

Skills that produce partial output gracefully rather than failing silently when inputs are incomplete

Key takeaway

Degraded success is explicit — manifest flags missing inputs; output sections say 'unavailable' — never fabricate or omit silently.

Why this matters

Silent omission on missing Gmail or CSV reads produced board-ready lies — degradation must be visible.

Pattern: per-input try/fetch → on failure, append the input to degraded_inputs[] in the manifest → the primary Skill still runs on the available data → the Result template includes a Data completeness section.

Do not mark a run SUCCESS if a required input is missing; use PARTIAL status and route it to the review queue.

Workflow — do this next

01 Classify inputs required vs optional.
02 Required missing → FAIL or PARTIAL+review.
03 Optional missing → degrade with flag.
04 Add a completeness section to the Result template.

Real example

Monday brief with missing CRM export

manifest: degraded_inputs: [crm_export]. Brief includes Finance + Slack sections; CRM section states 'export not found — pipeline metrics omitted.' Status PARTIAL → review queue.
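The degradation pattern can be sketched as a fetch loop that records failures instead of hiding them. This is a minimal sketch, assuming each input is fetched by a zero-argument callable and that required vs optional inputs are known up front; the function names, the `route` field, and treating a run with no usable inputs at all as FAIL are illustrative choices, not Cowork behaviour.

```python
from typing import Callable, Dict, Set, Tuple


def fetch_inputs(fetchers: Dict[str, Callable[[], object]],
                 required: Set[str]) -> Tuple[dict, dict]:
    """Fetch each input independently and build a manifest that names gaps."""
    data: dict = {}
    degraded: list = []
    for name, fetch in fetchers.items():
        try:
            data[name] = fetch()
        except Exception:
            degraded.append(name)  # record the gap; never drop it silently
    if not data:
        status = "FAIL"            # assumption: nothing usable at all
    elif required & set(degraded):
        status = "PARTIAL"         # required input missing: never SUCCESS
    else:
        status = "SUCCESS"         # optional gaps are flagged, not fatal
    route = {"FAIL": "stop", "PARTIAL": "review_queue", "SUCCESS": "normal"}[status]
    manifest = {"status": status, "degraded_inputs": degraded, "route": route}
    return data, manifest


def completeness_section(manifest: dict, all_inputs: list) -> str:
    """Data completeness block for the Result template."""
    lines = ["## Data completeness"]
    for name in all_inputs:
        state = "unavailable" if name in manifest["degraded_inputs"] else "available"
        lines.append(f"- {name}: {state}")
    return "\n".join(lines)
```

With the Monday brief example, a finance and Slack fetch that succeed plus a CRM fetch that raises `FileNotFoundError` yields `degraded_inputs: ["crm_export"]` and, if the CRM export is marked required, a PARTIAL status routed to the review queue.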

1.5

The review queue

Designing workflows so uncertain outputs land in a human review queue rather than going straight to a destination

Key takeaway

Review queue = folder or ticket list with review.md, confidence scores, and SLA — default destination for T2 workflows.

Why this matters

Uncertainty is normal; routing it to humans is design, not failure.

Route to queue when: validation WARN (not FAIL), classifier confidence < threshold, sensitivity keywords hit, or PARTIAL degraded status. Queue item: output files + REVIEW_BRIEF.md (what to check, 60-second scan guide).

Queue SLA: P1 items 4h, P2 24h. Stale queue alerts ops lead. Approve → move to prod path + approval.flag; reject → archive + feedback to Skill owner.
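The routing rules and SLAs above can be sketched as two small functions. This sketch assumes validation results are the strings PASS/WARN/FAIL, that T2 and T3 workflows always land in the queue by default, and that the SLA table only covers the P1 and P2 tiers named above; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Queue SLAs from the text: P1 items 4h, P2 items 24h.
SLA = {"P1": timedelta(hours=4), "P2": timedelta(hours=24)}


def route_output(validation: str, confidence: float, threshold: float,
                 sensitivity_hit: bool, run_status: str, hitl_tier: str) -> str:
    """Decide where a run's output goes: stop, review_queue, or prod."""
    if validation == "FAIL" or run_status == "FAIL":
        return "stop"
    if (validation == "WARN" or confidence < threshold or sensitivity_hit
            or run_status == "PARTIAL" or hitl_tier in ("T2", "T3")):
        return "review_queue"
    return "prod"


def is_stale(created_at: datetime, priority: str, now: datetime) -> bool:
    """A queue item older than its SLA should alert the ops lead."""
    return now - created_at > SLA[priority]
```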

Workflow — do this next

01 Create ~/Cowork/review_queue/{workflow}/.
02 Define routing rules in pipeline router.
03 REVIEW_BRIEF template per workflow.
04 Weekly queue hygiene meeting.

Ready-to-use artifacts

REVIEW_BRIEF.md template

# Review — {workflow} / {run_id}
## 60-second check
- [ ] Key number: ___
## Confidence / flags
## Approve → move to {prod_path}

1.6

Skill regression testing

How to verify that a Skill still performs correctly after you modify it — the regression check before redeployment

Key takeaway

Golden fixtures + diff against expected outputs — rerun full test matrix on every version bump before schedule update.

Why this matters

Small TAR edits can break column order or totals; regression catches the drift before Monday 6am.

Maintain REGRESSION_FIXTURES/ per Skill: anonymised inputs + expected outputs (json/md). On vN+1: run all fixtures; diff key fields; human spot-check if diff tool flags change. Block promotion if golden test fails.

Keep a changelog in SKILL_SPEC: what changed, why, and which fixtures were updated. For pipelines, run regression across the entire DAG when any node bumps its major version.

Workflow — do this next

01 Capture golden fixtures at v1 promotion.
02 Automate diff on key fields where possible.
03 Regression gate in change management (Ch 5 §2.5).
04 Archive old fixtures when schema versions sunset.

Ready-to-use artifacts

Regression run log

| Skill version | fixture | key_fields_match | reviewer | date |
|---------------|---------|------------------|----------|------|
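A sketch of the regression gate: diff the key fields of each golden fixture against the new version's output and block promotion on any mismatch. It assumes a hypothetical layout of REGRESSION_FIXTURES/<case>/input.json and expected.json, and a `run_skill` callable that returns the new version's parsed output; both are placeholders for however you actually invoke the Skill.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def diff_key_fields(expected: dict, actual: dict, key_fields: List[str],
                    tolerance: float = 0.0) -> Dict[str, tuple]:
    """Return {field: (expected, actual)} for every key field that differs."""
    diffs = {}
    for name in key_fields:
        exp, act = expected.get(name), actual.get(name)
        numeric = isinstance(exp, (int, float)) and isinstance(act, (int, float))
        # Numbers compare within tolerance; everything else must match exactly.
        if (abs(exp - act) > tolerance) if numeric else (exp != act):
            diffs[name] = (exp, act)
    return diffs


def run_fixtures(fixture_dir: Path, run_skill: Callable[[dict], dict],
                 key_fields: List[str]) -> Dict[str, dict]:
    """Run every fixture through the new version; an empty dict means promote."""
    failures = {}
    for case in sorted(p for p in fixture_dir.iterdir() if p.is_dir()):
        inputs = json.loads((case / "input.json").read_text())
        expected = json.loads((case / "expected.json").read_text())
        diffs = diff_key_fields(expected, run_skill(inputs), key_fields)
        if diffs:
            failures[case.name] = diffs
    return failures
```

Each non-empty result is a row for the regression run log with `key_fields_match` set to no, and a reason for a human spot-check before promotion.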

1.7

Token usage monitoring

Tracking how much each Skill costs in tokens — the cost visibility that prevents budget surprises

Key takeaway

Per-workflow token ledger — estimate from manual runs, alert on 2× weekly average, finance digest monthly.

Why this matters

Unmonitored schedules discover budget at invoice — too late for trust.

Track tokens per run_id, per Skill, per workflow, and per week, using Cowork run history exported to TOKEN_LEDGER.csv. Compare actual vs estimate and investigate outliers (duplicate runs, missing pagination caps).

Cost controls: model routing, max_files, chunking, and condition triggers instead of cron runs that fire with nothing to process. The ROI section (§3.6) uses token cost as an input.

Workflow — do this next

01 Record baseline tokens from 5 manual runs.
02 Set weekly budget per workflow.
03 Alert at 80% and 120% of budget.
04 Monthly finance review with ledger export.

Ready-to-use artifacts

TOKEN_LEDGER.csv columns

week, workflow, run_count, tokens_total, tokens_per_run, budget, variance_pct
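A sketch of building one TOKEN_LEDGER.csv row and applying the alert rules from this section (80% and 120% of weekly budget, plus a spike above 2× the weekly average). It assumes per-run token counts are already available from run history; defining `variance_pct` as the percentage over or under budget is an assumption, since the column list does not define it.

```python
import csv
import io
from statistics import mean
from typing import List, Optional

COLUMNS = ["week", "workflow", "run_count", "tokens_total",
           "tokens_per_run", "budget", "variance_pct"]


def ledger_row(week: str, workflow: str, run_tokens: List[int], budget: int) -> dict:
    total = sum(run_tokens)
    runs = len(run_tokens)
    return {
        "week": week,
        "workflow": workflow,
        "run_count": runs,
        "tokens_total": total,
        "tokens_per_run": round(total / runs) if runs else 0,
        "budget": budget,
        # Assumed definition: percent over (+) or under (-) the weekly budget.
        "variance_pct": round((total - budget) / budget * 100, 1),
    }


def budget_alert(row: dict) -> Optional[str]:
    """Apply the 80% / 120% budget thresholds to one ledger row."""
    used = row["tokens_total"] / row["budget"]
    if used >= 1.2:
        return "ALERT: at or above 120% of weekly budget"
    if used >= 0.8:
        return "WARN: at or above 80% of weekly budget"
    return None


def is_spike(tokens_this_week: int, previous_weeks: List[int]) -> bool:
    """Flag a week that exceeds 2x the average of prior weeks."""
    return bool(previous_weeks) and tokens_this_week > 2 * mean(previous_weeks)


def to_csv(rows: List[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```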

1.8

The production Skill audit

The periodic review of your running Skills — what to check, how often, and what to do when a Skill is underperforming

Key takeaway

Quarterly audit every production Skill: still needed, still passing regression, owner active, permissions minimal, token trend OK.

Why this matters

Skill libraries rot — duplicates, stale schedules, and zombie workflows erode trust and budget.

Audit checklist per Skill: last 10 runs success rate, validation pass rate, queue rejection rate, token trend, incident count, owner, downstream dependents. Outcomes: keep, tune, deprecate, merge duplicate.

Underperforming: freeze schedule → root cause (Task, Action, Result, input drift) → fix version → regression → re-enable. Kill Skills unused for 90 days unless compliance-mandated.

Workflow — do this next

01 Export SKILL_INDEX + run stats quarterly.
02 Score each Skill keep/tune/kill.
03 Assign owners for tune items.
04 Document deprecations with migration path.

Ready-to-use artifacts

Skill audit scorecard

| Skill | success% | validation% | tokens/wk | incidents | action |
|-------|----------|-------------|-----------|-----------|--------|
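A sketch of scoring one scorecard row. The 90-day unused rule and the compliance exception come from this section; the 90% success and validation thresholds are hypothetical defaults to tune, and merge-duplicate decisions need a cross-Skill comparison this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SkillStats:
    name: str
    last_run: date
    runs: int                 # e.g. the last 10 runs
    successes: int
    validation_passes: int
    compliance_mandated: bool = False


def audit_action(s: SkillStats, today: date, min_rate: float = 0.9) -> str:
    """Return keep, tune, or kill for one Skill."""
    if today - s.last_run > timedelta(days=90) and not s.compliance_mandated:
        return "kill"
    success_rate = s.successes / s.runs if s.runs else 0.0
    validation_rate = s.validation_passes / s.runs if s.runs else 0.0
    if success_rate < min_rate or validation_rate < min_rate:
        return "tune"
    return "keep"
```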

Concept 2

Governance & Safety

The oversight structures that make autonomous operation safe — what a professional Cowork deployment looks like

2.1

The principle of minimum access

Giving Cowork access to only the systems and files it genuinely needs for each Skill

Key takeaway

Scope per Skill, not per machine — narrow folders, connector allowlists, and model tools are the default.

Why this matters

Broad access simplifies day-one demos and complicates day-90 incident response.

Minimum access matrix: each Skill lists read paths, write paths, connectors, channels. No Skill gets 'whole Drive' or 'all Gmail.' Quarterly permission review removes paths no longer referenced in TAR specs.

Keep separate sandbox and production profiles.

Workflow — do this next

01 Document allowlists in every TAR spec.
02 Remove unused paths from Cowork settings.
03 Quarterly access review with IT.
04 Deny by default for new connectors.

Ready-to-use artifacts

Skill access matrix

| Skill | read | write | connectors | HITL |
|-------|------|-------|------------|------|
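One way to enforce the access matrix is a deny-by-default lookup before any read, write, or connector call. This is a minimal sketch, assuming a hypothetical `ACCESS_MATRIX` dict keyed by Skill name and Python 3.9+ for `PurePath.is_relative_to`. It normalises `..` segments lexically but does not resolve symlinks, which a real check on disk would also need (for example with `Path.resolve()`).

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical access matrix entry; one per production Skill.
ACCESS_MATRIX = {
    "FINANCE_MEMO_v3": {
        "read": ["/Cowork/inputs/finance"],
        "write": ["/Cowork/staging/finance_memo"],
        "connectors": ["gmail"],
    },
}


def is_allowed(skill: str, action: str, target: str) -> bool:
    """Deny by default: unknown Skills, actions, and paths are refused."""
    entry = ACCESS_MATRIX.get(skill)
    if entry is None:
        return False
    if action == "connector":
        return target in entry.get("connectors", [])
    # Collapse "../" segments so a path cannot climb out of its allowlisted root.
    path = PurePosixPath(posixpath.normpath(target))
    return any(path.is_relative_to(root) for root in entry.get(action, []))
```

Because `is_relative_to` compares whole path components, a sibling folder such as `/Cowork/inputs/finance_private` is not accidentally covered by the `/Cowork/inputs/finance` allowlist entry.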

2.2

Sensitive data handling in Cowork

What to never put in a Skill, how to handle credentials, and the data hygiene for automated workflows

Key takeaway

Never in Skills: secrets, full customer PII dumps, or instructions to exfiltrate data. Credentials live in the connector store only.

Why this matters

Skill files copy to wikis, git, and logs — treat them as published documentation.

Data classes: public, internal, confidential, restricted. Restricted workflows require T3 approval, encrypted staging if policy demands, retention limits on raw/. Redact logs; manifest records counts not content for mail/message bodies.

Prohibited in TAR: API keys, passwords, SSN patterns in examples, 'send all files to external URL.' Train teams on prompt injection via fetched docs (Ch 2 §1.7).
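A sketch of the quarterly secret scan over a folder of Skill files. The patterns are illustrative examples (the AWS access key ID prefix, SSN-shaped numbers, password or API-key assignments, PEM private-key headers), not a complete ruleset; a dedicated secret scanner will catch far more.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Illustrative patterns only; not a complete secret-detection ruleset.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credential_assignment": re.compile(r"(?i)\b(password|passwd|api[_-]?key)\s*[:=]\s*\S+"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_skills(root: Path,
                suffixes: Tuple[str, ...] = (".md", ".txt", ".json", ".yaml", ".yml"),
                ) -> Dict[str, List[Tuple[int, str]]]:
    """Scan every Skill file under root; an empty dict means no findings."""
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```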

Workflow — do this next

01 Classify each production workflow data tier.
02 Scan Skills for secret patterns quarterly.
03 Redact manifest logs for confidential runs.
04 Incident playbook if PII leaked to wrong folder.

Ready-to-use artifacts

Cowork data hygiene rules

- No secrets in Skills
- Staging retention per class
- Restricted → T3 + encrypted staging if required
- Manifest: metadata only for mail bodies

2.3

Output destinations and access control

Where Skill outputs go and who can see them — the data governance of automated output

Key takeaway

Three-tier paths: sandbox, staging (review), production — ACLs match human access; automation does not bypass folder permissions.

Why this matters

Cowork writing board packs to a world-readable folder is a governance failure, not a Skill bug.

Map destinations: who can read, who can write, who gets notified. Production paths require validation PASS + HITL tier met. Shared drives: use service account with explicit ACL, not personal home directory.

Version outputs by run_id or date — never overwrite sole copy. Deprecation: archive old reports, do not delete without retention policy check.
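A sketch of the never-overwrite rule: build the output path from date and run_id, and open the file in exclusive-create mode so a second write to the same path fails loudly instead of replacing the sole copy. The folder layout is a hypothetical example.

```python
import tempfile
from datetime import date
from pathlib import Path


def versioned_output_path(base: Path, report: str, run_id: str,
                          run_date: date, suffix: str = ".md") -> Path:
    """Example result: staging/weekly_report/2024-01-08_r-042.md"""
    return base / report / f"{run_date.isoformat()}_{run_id}{suffix}"


def write_once(path: Path, content: str) -> Path:
    """Mode "x" raises FileExistsError rather than overwriting an earlier run."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("x", encoding="utf-8") as fh:
        fh.write(content)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = versioned_output_path(Path(tmp), "weekly_report", "r-042", date(2024, 1, 8))
        write_once(target, "first run")
        try:
            write_once(target, "second run")
        except FileExistsError:
            print("refused to overwrite", target.name)
```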

Workflow — do this next

01 Define path taxonomy in OPS_WIKI.
02 ACL review per production destination.
03 Automated writes only to staging until promoted.
04 Audit random sample of output ACLs quarterly.

2.4

The human approval gate

Designing Skills with an explicit approval step before output is delivered or action is taken

Key takeaway

Approval = explicit artifact (approval.flag, ticket ID, button in review queue) — not 'someone probably saw the Slack message.'

Why this matters

Ambiguous approval is how automated emails and customer-facing docs slip out.

HITL tiers (Ch 1): T0 none (rare), T1 notify, T2 review queue (default), T3 dual approval for external/customer/restricted. SEND_* Skills check approval/{id}.flag exists and signer != Skill runner identity.

Approval audit: who approved, when, run_id, output hash. Revoke approval.flag on output edit — require re-approval.

Workflow — do this next

01 Assign HITL tier per workflow in SOP.
02 Implement approval.flag pattern for writes/sends.
03 Log approvals in manifest.
04 Legal sign-off for T3 customer comms.

Ready-to-use artifacts

approval.flag spec

{
  "run_id": "...",
  "approver": "user@company.com",
  "approved_at": "ISO8601",
  "output_hash": "sha256..."
}
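A sketch of the check a SEND_* Skill could run before any external write, following the approval.flag spec above: the flag must exist, match the run_id, be signed by someone other than the runner identity, and carry the SHA-256 of the exact output being sent, so any edit after approval invalidates it. Storing `output_hash` as a bare hex digest is an assumption; the spec only shows a placeholder.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def load_flag(flag_path: Path) -> Optional[dict]:
    """No flag file means no approval."""
    return json.loads(flag_path.read_text()) if flag_path.is_file() else None


def approval_is_valid(flag: Optional[dict], output: bytes,
                      run_id: str, runner_identity: str) -> bool:
    if flag is None or flag.get("run_id") != run_id:
        return False
    # The approver must be a person other than the identity running the Skill.
    if flag.get("approver") in (None, "", runner_identity):
        return False
    # Hashing the current bytes means any post-approval edit needs re-approval.
    return flag.get("output_hash") == hashlib.sha256(output).hexdigest()
```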

2.5

Skill change management

How to update a running Skill safely without disrupting downstream workflows

Key takeaway

Version bump → regression → staged rollout → schedule pointer update — never edit production Skill in place without version suffix.

Why this matters

In-place edits break audit trail and leave pipelines on unknown behaviour mid-week.

Change flow: PR to TAR spec → implement SKILL_vN+1 → regression fixtures → parallel run in shadow mode (write sandbox only) → compare outputs → flip workflow to new version → monitor 3 runs → deprecate vN.

Breaking handoff schema: bump schema_version; support N-1 for one sprint; notify downstream Skill owners.
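A sketch of a downstream reader that accepts the current handoff schema and the previous one for the transition sprint, then refuses anything older. The v1→v2 change (renaming `rev` to `revenue`) is a made-up example of a breaking change.

```python
import json

CURRENT_SCHEMA = 2
SUPPORTED = {CURRENT_SCHEMA, CURRENT_SCHEMA - 1}  # N and N-1 during the sprint


def upgrade_v1_to_v2(doc: dict) -> dict:
    """Hypothetical breaking change: v2 renamed "rev" to "revenue"."""
    out = dict(doc)
    out["revenue"] = out.pop("rev")
    out["schema_version"] = 2
    return out


def read_handoff(raw: str) -> dict:
    """Parse a handoff manifest, upgrading N-1 and rejecting older versions."""
    doc = json.loads(raw)
    version = doc.get("schema_version")
    if version not in SUPPORTED:
        raise ValueError(f"unsupported schema_version {version}; supported: {sorted(SUPPORTED)}")
    if version == CURRENT_SCHEMA - 1:
        doc = upgrade_v1_to_v2(doc)
    return doc
```

When the sprint ends, dropping N-1 is a one-line change to `SUPPORTED`, which makes the deprecation explicit rather than accidental.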

Workflow — do this next

01 Never edit v1 in place; create v2.
02 Shadow run 3 cycles before cutover.
03 Update SKILL_INDEX version column.
04 Announce breaking changes in #ops.

Ready-to-use artifacts

Skill change management checklist

[ ] TAR PR reviewed
[ ] Version _vN+1 created
[ ] Regression pass
[ ] Shadow runs OK
[ ] Workflow pointer updated
[ ] 3 prod runs monitored

2.6

Cowork for teams

The governance model for shared Skills and shared connectors — who can create, modify, and schedule

Key takeaway

RBAC: creators draft Skills, owners approve production, schedulers bind triggers, admins manage connectors — four roles minimum.

Why this matters

Shared Cowork without roles becomes nobody's problem until something breaks.

Roles: Creator, Owner, Scheduler, Admin. Keep the shared library in git or Drive; production Skills change via PR, not direct UI edits.

Team connectors use service identity; personal OAuth only for individual sandboxes.
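The four roles can be encoded as a permission map. This is a minimal sketch: the action names are hypothetical labels for the duties listed in this section, and the schedule-change rule encodes the requirement that schedule changes need owner approval.

```python
from enum import Enum
from typing import Set


class Role(Enum):
    CREATOR = "creator"
    OWNER = "owner"
    SCHEDULER = "scheduler"
    ADMIN = "admin"


# Hypothetical action labels for each role's duty.
PERMISSIONS = {
    Role.CREATOR: {"draft_skill"},
    Role.OWNER: {"approve_production"},
    Role.SCHEDULER: {"bind_trigger"},
    Role.ADMIN: {"manage_connectors"},
}


def can(roles: Set[Role], action: str) -> bool:
    return any(action in PERMISSIONS[role] for role in roles)


def can_change_schedule(roles: Set[Role], owner_approved: bool) -> bool:
    """Schedulers bind triggers, but only after the Skill owner approves."""
    return can(roles, "bind_trigger") and owner_approved
```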

Workflow — do this next

01 Define roles in COWORK_GOV.md.
02 Assign owner per production Skill.
03 Require owner approval for schedule changes.
04 Quarterly role recertification.

2.7

Audit logging for Cowork

Maintaining a record of what Cowork did, when, and what it produced — the accountability trail

Key takeaway

Correlate run history + manifest + connector audit + approval flags — answer 'what happened?' in one run_id lookup.

Why this matters

Regulators and security ask for trails; building them after an incident is too late.

Log retention per policy (e.g. 90d run metadata, 1y approval records). Export monthly AUDIT_BUNDLE for compliance. Include: trigger, Skills versions, paths read/written, connector calls, validation result, approver.

Platform caveat: Cowork activity may not yet appear in org Compliance API or all enterprise audit logs. Your AUDIT_BUNDLE is the authoritative trail today alongside connector vendor logs.
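A sketch of the one-lookup goal: a standard manifest record plus a function that joins manifests, approval flags, and connector events on run_id. The field names follow the "Include" list above; the assumption that connector events carry a run_id is optimistic, since vendor logs often have to be correlated by timestamp instead.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Manifest:
    run_id: str
    trigger: str
    skill_versions: Dict[str, str]
    paths_read: List[str] = field(default_factory=list)
    paths_written: List[str] = field(default_factory=list)
    connector_calls: List[str] = field(default_factory=list)
    validation: str = "UNKNOWN"
    approver: Optional[str] = None


def lookup(run_id: str, manifests: List[Manifest], approvals: List[dict],
           connector_events: List[dict]) -> dict:
    """Answer 'what happened?' for one run_id across all audit sources."""
    return {
        "manifest": next((asdict(m) for m in manifests if m.run_id == run_id), None),
        "approvals": [a for a in approvals if a.get("run_id") == run_id],
        "connector_events": [e for e in connector_events if e.get("run_id") == run_id],
    }


if __name__ == "__main__":
    m = Manifest("r-9", "cron:mon-06:00", {"FINANCE_MEMO": "v3"}, validation="PASS")
    print(json.dumps(lookup("r-9", [m], [], []), indent=2))
```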

Workflow — do this next

01 Standardise manifest schema across workflows.
02 Monthly audit export to compliance folder.
03 Verify Compliance API coverage quarterly (Ch 7 §2.7).
04 Test run_id lookup drill quarterly.
05 Align retention with legal.

Ready-to-use artifacts

Monthly AUDIT_BUNDLE contents

run_history_export.csv
manifests/{month}/
approval_flags/
connector_audit_summary.json

2.8

The Cowork security review

The periodic assessment of your Cowork configuration for security risks, stale permissions, and unnecessary access

Key takeaway

Semi-annual security review: permissions, connectors, offboarded users, Skill content scan, incident retrospective.

Why this matters

Automation attack surface grows with every connector and scheduled write.

Review agenda: filesystem scope vs SKILL_INDEX, connector OAuth scopes, revoked employees, Skills with web/MCP tools, staging retention, failed auth attempts, injection near-misses. Output: findings + remediation owners + dates.

Pair with IT security; use same checklist as Ch 4 §1.8 connector audit. Red findings block new production schedules until resolved.
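Two parts of this review lend themselves to a script: finding access still held by people who are no longer active, and enforcing the rule that unresolved red findings block new production schedules. This sketch assumes you can export Cowork users, active employees, and per-user connector grants as plain sets; that export format is hypothetical.

```python
from typing import Dict, List, Set


def access_findings(cowork_users: Set[str], active_employees: Set[str],
                    connector_grants: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """List users who still hold Cowork access or connector grants after leaving."""
    return {
        "revoke_cowork": sorted(cowork_users - active_employees),
        "revoke_connector_grants": sorted(
            user for user, grants in connector_grants.items()
            if grants and user not in active_employees),
    }


def blocks_new_schedules(findings: List[dict]) -> bool:
    """Any unresolved red finding blocks new production schedules."""
    return any(f["severity"] == "red" and not f["resolved"] for f in findings)
```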

Workflow — do this next

01 Schedule semi-annual review on calendar.
02 Run automated secret scan on Skill repo.
03 Verify offboarding revoked Cowork access.
04 Publish findings to leadership.

Ready-to-use artifacts

Cowork security review checklist

[ ] Filesystem scope minimal
[ ] Connector scopes current
[ ] Offboarded users revoked
[ ] No secrets in Skills
[ ] Staging retention compliant
[ ] Incident log reviewed

Concept 3

Scaling Your Cowork Operation

Growing from a few helpful Skills to a comprehensive operations layer — the scaling principles and the organisational discipline

3.1

The Cowork inventory

Mapping every recurring task in your role to a potential Cowork Skill — the audit that reveals the full automation opportunity

Key takeaway

Full inventory = every recurring task with frequency, minutes, inputs, automate-now/next/never — the master backlog for your COO layer.

Why this matters

Ad hoc Skill sprawl misses high-ROI work; inventory reveals the 80% you have not touched.

Audit sources: calendar repeats, email templates, weekly reports, standup rituals, close checklists. Columns: task, cadence, mins/run, data source, Skill exists?, priority score. Revisit quarterly as roles change.

Cross-reference Ch 3 §1.8 schedule inventory and Ch 2 SKILL_INDEX — one master COWORK_INVENTORY.md.

Workflow — do this next

01 Block 2h to list all recurring tasks from the last 90 days.
02 Score frequency × minutes.
03 Mark automate-now vs human-only.
04 Feed top 10 into roadmap.

Ready-to-use artifacts

COWORK_INVENTORY.md

| Task | Cadence | min/run | Source | Skill? | Priority | Status |
|------|---------|---------|--------|--------|----------|--------|
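Scoring frequency × minutes is simple arithmetic once cadence is expressed as runs per month. In this sketch, approximating weekly as 4 runs per month is a simplification, and the task names are invented examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    runs_per_month: float     # weekly ~ 4, daily (workdays) ~ 20
    minutes_per_run: float
    skill_exists: bool = False

    @property
    def priority_score(self) -> float:
        """Frequency x minutes: manual minutes spent per month."""
        return self.runs_per_month * self.minutes_per_run


def top_candidates(tasks: List[Task], n: int = 10) -> List[Task]:
    """Highest-scoring tasks that do not yet have a Skill."""
    open_tasks = [t for t in tasks if not t.skill_exists]
    return sorted(open_tasks, key=lambda t: t.priority_score, reverse=True)[:n]
```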

3.2

The automation priority matrix

Which tasks to automate first based on frequency, time cost, and automation reliability — the sequencing framework

Key takeaway

Prioritise high-frequency × high-minutes × high-reliability — quick wins build trust for harder workflows.

Why this matters

Automating rare complex judgment first kills programs; sequencing matters as much as tooling.

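Matrix axes: impact (frequency × minutes) and feasibility (how reliably the task can be automated). Q1 (high impact, high feasibility): automate now. Q2 (high impact, low feasibility): invest in Skill design. Q3 (low impact, high feasibility): batch later. Q4 (low impact, low feasibility): never automate.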

Reliability score: can you write acceptance tests? If no, defer until yes.
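The 2×2 plus the acceptance-test rule can be written as a lookup. This sketch assumes impact and feasibility are scored on a 0–1 scale with 0.5 as the high/low cutoff; both the scale and the cutoff are hypothetical.

```python
from typing import Tuple

ACTIONS = {
    (True, True): ("Q1", "automate now"),
    (True, False): ("Q2", "invest in Skill design"),
    (False, True): ("Q3", "batch later"),
    (False, False): ("Q4", "never automate"),
}


def classify(impact: float, feasibility: float, can_write_acceptance_tests: bool,
             cutoff: float = 0.5) -> Tuple[str, str]:
    """Place a task on the 2x2; no acceptance tests means defer."""
    quadrant, action = ACTIONS[(impact >= cutoff, feasibility >= cutoff)]
    if quadrant != "Q4" and not can_write_acceptance_tests:
        action = "defer until acceptance tests exist"
    return quadrant, action
```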

Workflow — do this next

01 Plot inventory items on 2×2.
02 Pick 3 Q1 items for next sprint.
03 One connector workflow max per quarter early on.
04 Replot after each production launch.

Real example

Q1 vs Q4

Q1: weekly CSV normalise (weekly, 45 min, schema stable). Q4: negotiate enterprise contract (yearly, judgment-heavy) — human-only.

3.3

Skill interdependency mapping

Understanding which Skills depend on others and how to manage the dependency graph as it grows

Key takeaway

Dependency graph: nodes = Skills, edges = handoff manifests — version bumps ripple; map before you change.

Why this matters

Upstream schema change without downstream update breaks pipelines silently at join step.

Maintain DEPENDENCY_GRAPH.md or diagram: FINANCE_INGEST → NORMALISE → VARIANCE → MEMO. Tag each edge with schema_version. Before deprecating Skill, list dependents and migration plan.

Critical path: workflows on board-week calendar — change freeze 7 days before. Non-critical Skills can iterate faster.

Workflow — do this next

01 Draw graph for all production pipelines.
02 Label schema_version on edges.
03 Change freeze policy for critical path.
04 Update graph on every new pipeline.

Ready-to-use artifacts

Dependency graph notation

INGEST_v2 --[manifest v1]--> ANALYSE_v3 --[metrics.json]--> REPORT_v1
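The notation above is regular enough to parse, which makes the "list dependents before deprecating" step mechanical. This sketch assumes one chain per line in exactly the `A --[contract]--> B` form; the parser and traversal are illustrative, not a Cowork feature.

```python
import re
from collections import deque
from typing import Dict, List, Tuple

# The lookahead captures the target without consuming it, so chained edges share a node.
EDGE_RE = re.compile(r"(\S+) --\[([^\]]+)\]--> (?=(\S+))")

Edges = Dict[str, List[Tuple[str, str]]]


def parse_chain(line: str) -> Edges:
    """Turn one notation line into {skill: [(downstream_skill, contract)]}."""
    edges: Edges = {}
    for src, contract, dst in EDGE_RE.findall(line):
        edges.setdefault(src, []).append((dst, contract))
        edges.setdefault(dst, [])
    return edges


def downstream(skill: str, edges: Edges) -> List[str]:
    """Every Skill that transitively consumes this Skill's output (BFS order)."""
    seen, order, queue = set(), [], deque([skill])
    while queue:
        for child, _contract in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Before bumping INGEST_v2's handoff schema, `downstream("INGEST_v2", ...)` lists both ANALYSE_v3 and REPORT_v1 as owners to notify.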

3.4

Cowork for a team operation

Scaling from personal automation to a shared operational layer — the Skills and governance that work at team scale

Key takeaway

Shared layer = central SKILL_INDEX, shared connectors, named owners, team review queue — not everyone's laptop with different paths.

Why this matters

Personal Cowork genius does not scale; shared library and ops machine do.

Team patterns: dedicated ops host or VM always on for schedules; shared staging/review paths; Slack #cowork-ops for failures; weekly 15min triage rotation. Personal sandboxes fork shared Skills with PATH_MAPPING.
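One way to let personal sandboxes fork shared Skills is to write paths in Skills as logical tokens and resolve them through a per-user PATH_MAPPING. In this sketch, the `{STAGING}`/`{INPUTS}` token syntax and the mapping contents are hypothetical, not a Cowork format.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical per-user mapping; the shared ops host would carry its own copy.
PATH_MAPPING = {
    "{STAGING}": "/Users/ana/Cowork/sandbox/staging",
    "{INPUTS}": "/Users/ana/Cowork/sandbox/inputs",
}


def resolve(logical: str, mapping: Dict[str, str]) -> PurePosixPath:
    """Replace a leading logical token with this machine's real root."""
    for token, root in mapping.items():
        if logical == token or logical.startswith(token + "/"):
            return PurePosixPath(root + logical[len(token):])
    raise KeyError(f"no PATH_MAPPING entry for {logical!r}")
```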

Avoid N duplicate Skills: merge them into one FINANCE_MEMO_v3 for all analysts and parameterise the client folder in Context.

Workflow — do this next

01 Stand up shared ops environment.
02 Consolidate duplicate Skills.
03 Rotate triage duty weekly.
04 Single source of truth for SKILL_INDEX.

3.5

The Cowork centre of excellence

Building a small internal capability for Skill development, quality assurance, and library management

Key takeaway

CoE = 1–2 operators + playbook — TAR reviews, regression gates, library hygiene — not a 20-person platform team.

Why this matters

Without CoE, quality variance across creators erodes trust; with CoE, teams ship Skills faster safely.

CoE services: TAR review office hours, regression fixture templates, connector allowlist maintenance, quarterly audits, training onboarding (§3.7). Metrics: production Skills count, success rate, time-to-production for new Skills.

CoE does not own every workflow — domain owners own Skills; CoE owns standards and gates.

Workflow — do this next

01 Name CoE lead (often ops or chief of staff).
02 Publish standards doc from this playbook.
03 Weekly 30min office hours for Skill authors.
04 Track CoE metrics monthly.

3.6

Measuring Cowork ROI

How to quantify the time, cost, and quality impact of your Cowork automation — the business case for continued investment

Key takeaway

ROI = (hours saved × loaded rate) − token cost − ops overhead — track monthly per workflow, not vibes.

Why this matters

Finance funds what you measure; hero stories do not survive budget season.

Hours saved: (manual baseline mins − review mins) × runs/month ÷ 60. Quality: error rate, missed deadline count, queue rejection rate. Cost: TOKEN_LEDGER + ops host + CoE time. Report a simple dashboard to leadership quarterly.

Include intangible: faster Monday standup, fewer Sunday nights — but anchor on measurable hours first.
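The ROI formula from this section can be computed per workflow per month. In this sketch, the field names are illustrative, costs are assumed to be in the same currency as the loaded hourly rate, and the example numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkflowMonth:
    manual_mins: float    # baseline minutes per run before automation
    review_mins: float    # human review minutes per run after automation
    runs: int             # runs this month
    loaded_rate: float    # fully loaded cost per hour
    token_cost: float     # from TOKEN_LEDGER, in currency
    infra_cost: float     # share of the ops host
    ops_overhead: float   # CoE and triage time, in currency

    @property
    def hours_saved(self) -> float:
        return (self.manual_mins - self.review_mins) * self.runs / 60

    @property
    def value(self) -> float:
        return self.hours_saved * self.loaded_rate

    @property
    def cost(self) -> float:
        return self.token_cost + self.infra_cost + self.ops_overhead

    @property
    def roi(self) -> float:
        """Net monthly value: hours saved x loaded rate minus all running costs."""
        return self.value - self.cost


if __name__ == "__main__":
    brief = WorkflowMonth(manual_mins=90, review_mins=15, runs=4, loaded_rate=90.0,
                          token_cost=60.0, infra_cost=40.0, ops_overhead=50.0)
    print(brief.hours_saved, brief.value, brief.cost, brief.roi)  # 5.0 450.0 150.0 300.0
```

Subtracting review minutes (not zero) from the manual baseline keeps the number honest: T2 workflows still cost human time, and the ROI should reflect it.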

Workflow — do this next

01 Baseline manual time before automation.
02 Track review time post-automation.
03 Subtract token, infra, and ops overhead cost.
04 Quarterly ROI slide for leadership.

Ready-to-use artifacts

Cowork ROI worksheet

hours_saved_mo = (manual_mins - review_mins) * runs_mo / 60
value_mo = hours_saved_mo * loaded_rate
cost_mo = token_cost_mo + infra_mo + ops_overhead_mo
ROI_mo = value_mo - cost_mo

3.7

Cowork onboarding for new team members

Using the Skill library as a way to transfer operational knowledge — the documentation and training pattern

Key takeaway

Onboarding = read SKILL_INDEX → sandbox golden tests → shadow triage → own one Skill — ops knowledge in artifacts, not tribal chat.

Why this matters

New hires historically relearned Monday rituals by osmosis; Skill library is the curriculum.

Week 1: playbook Ch 1–2, run manual workflows, read TAR specs for team Skills. Week 2: fix one documentation gap, pass one regression fixture. Week 3: triage rotation. Week 4: propose automate-next from inventory.

Pair with PATH_MAPPING for their sandbox, and use the standardised share bundles (Ch 2 §3.8).

Workflow — do this next

01 Create ONBOARDING.md linking SKILL_INDEX.
02 Assign buddy + one sandbox workflow.
03 Golden test completion = gate to prod access.
04 30-day feedback to CoE.

Ready-to-use artifacts

Cowork onboarding checklist

[ ] Read playbook Ch 1-2
[ ] Run 3 team workflows manually
[ ] Pass golden test in sandbox
[ ] 1 triage shift
[ ] Update one TAR doc

3.8

The fully automated operation

What it looks like when Cowork is running a significant portion of your operational workload — the vision and the discipline that makes it possible

Key takeaway

Full automation is 60–80% prep and review, not 100% unattended — humans at gates, machines on cadence, audit always on.

Why this matters

Pursuing 100% unattended sets up disappointment; realistic vision sustains multi-year programs.

Mature state: morning triage 15 min; Sunday nights gone; board inputs staged Friday; failures rare and loud; SKILL_INDEX 30+ Skills with owners; ROI positive; security reviews clean. Humans decide; Cowork prepares.

Discipline never ends: regression on change, quarterly audits, inventory refresh, CoE standards, no secret bypass of HITL for 'just this once.'

Workflow — do this next

01 Define your 12-month maturity targets.
02 Measure % ops hours on review vs manual prep.
03 Celebrate gates working, not zero humans.
04 Annual retrospective: playbook update.

Real example

Mature ops lead Monday

6:15 triage dashboard — all green. 6:20 skim brief + weekly report in review queue — 2 flags, approve. 6:35 standup. Cowork ran 6 workflows overnight; human time 20 min not 90.

Ready-to-use artifacts

Production readiness gate

Checklist before unattended production schedule.

Test matrix · validation · alerts · HITL · token budget · rollback

COWORK_INVENTORY.md

Master map of recurring tasks to Skills.

Task · cadence · minutes · Skill · priority · status

Cowork ROI worksheet

Monthly value vs token, infra, and ops overhead cost.

hours_saved × loaded_rate − token cost − infra − ops overhead

Series B — Cowork as approved ops infrastructure

A 60-person company ran successful Cowork pilots but security blocked company-wide rollout after an over-scoped connector incident.

Before

No validation gates, personal OAuth on production, Skills edited in place, no ROI data, hero-dependent triage.

After

Chapter 5 playbook adoption: production readiness gate, team RBAC, CoE office hours, quarterly security review, ROI dashboard — security approved shared ops host.

Production incidents → 3 in Q1 to 0 in Q2 after validation gates
Security review → passed with min-access matrix
Documented ROI → $42k annualised value vs $8k token+ops cost
Onboarding → new ops hire productive in 2 weeks via SKILL_INDEX

What goes wrong

Scheduling without validation — confident wrong numbers reach leadership.

§1.1 readiness gate + §1.2 deterministic validation before emit.

Editing production Skills in place — no regression, unknown behaviour.

§2.5 version bump + shadow runs + regression fixtures.

Everyone admin on connectors — security shutdown.

§2.1 min access + §2.6 RBAC + semi-annual §2.8 review.

Automation sprawl — 40 Skills, 12 duplicates, no owners.

§1.8 quarterly audit + §3.1 inventory + §3.5 CoE consolidation.

Vetted by Krishna Kumar, Curator, FactorBeam
