Data as of: 2026-03-27
TL;DR
- Agentic AI goes far beyond chatbots: it plans, executes multi-step tasks, uses external tools and APIs, maintains memory, and self-corrects—all with minimal human prompting.
- The California market is setting the pace nationally: Salesforce (San Francisco) processed over 3.2 trillion Agentforce tokens in fiscal Q3 2026, achieving an 84% support-case resolution rate with only 2% of requests requiring human escalation.
- Santa Monica-based Bonsai Health engaged 235,000+ patients and scheduled over 36,000 medical appointments autonomously; Palo Alto’s Penguin AI targets 90–120 day ROI from agentic healthcare workflows.
- Key risks include hallucinations, privacy leaks, cost spikes, and California’s new CPPA Automated Decision-Making Technology (ADMT) rules (finalized July 2025), which impose opt-out rights and transparency obligations on systems that substantially replace human decisions.
- Start with one high-volume, well-scoped workflow that has clear success metrics and an accessible API surface—then instrument it before you scale.
1. What Is Agentic AI? Definition and Taxonomy
Conventional AI tools wait to be asked. You provide a prompt, the model returns an answer, and the exchange ends. Agentic AI systems operate differently: they receive a goal, decompose it into subtasks, decide what actions to take, call external tools and APIs, observe the results, update their plans, and iterate until the objective is met—or until a human checkpoint stops them.
The five capabilities that separate agentic systems from ordinary large language model (LLM) assistants are:
Planning and decomposition. The agent breaks a complex goal into a sequence of smaller steps, reasoning about dependencies and ordering.
Tool use and integrations. Agents can call web search, code execution environments, CRMs, databases, email clients, calendar APIs, or any service exposed via a function interface. Importantly, they choose which tool to use and when.
Memory and state. Short-term memory (the active context window) stores intermediate results. Longer-term memory—external vector databases or structured stores—lets agents recall past interactions or organizational knowledge across sessions.
Multi-agent coordination. Large workflows can be decomposed across specialized sub-agents: a planning agent, a research agent, a writing agent, a QA agent. A coordinator orchestrates them, collects results, and synthesizes a final output.
Evaluation loops and self-correction. Agents review their own outputs against a goal or rubric, detect failures, and retry with a different approach—without waiting for a human to notice the error.
When to use agentic AI vs. simpler automation: A useful suitability checklist asks five questions. Is the workflow multi-step with conditional branching? Does it require pulling data from multiple systems? Does the “right” action vary meaningfully with context? Is there a clear, measurable success criterion? And is there a safe way to intervene if the agent makes an error? If the answer to most questions is yes, agentic AI is worth evaluating. If the task is linear and rule-based with a fixed decision tree, conventional RPA or a simple prompt-chained LLM will be cheaper and more reliable.

2. Why Now? Market Signals and Technical Readiness
Three converging forces made 2025–2026 the breakout window for production agentic AI.
Model capability crossing a threshold. Reasoning models released in 2025—including Google’s Gemini 2.5/3 series, OpenAI’s o-family, and Anthropic’s Claude—can reliably decompose goals, plan over many steps, and recover from errors. Gartner reported a 1,445% surge in enterprise inquiries about multi-agent AI systems from Q1 2024 to Q2 2025. [Source: Gartner data cited by Kala Bio/Globe Newswire, March 2026, globenewswire.com]
Cost-performance inflection. API pricing for frontier models dropped sharply through 2024–2025, bringing per-task costs into ranges where automation produces positive ROI on workflows that previously required human labor.
Market size and investment. The global agentic AI market is projected to grow from roughly $5–28 billion in 2024 (estimates vary by scope) to $103–196 billion by 2034, at compound annual growth rates in the 35–42% range. [Source: Precedence Research via California Management Review, August 2025, cmr.berkeley.edu; Devcom analysis, February 2026, devcom.com] How we know: Multiple independent research firms converge on high-growth projections; the specific figures vary because scope definitions differ, but directional consensus is strong.
Enterprise adoption evidence. A 2025 Gravitee survey found that approximately 72% of medium and large enterprises were already using agentic AI, with an additional 21% planning adoption within two years. [Source: Gravitee survey data cited in Devcom, February 2026] Deloitte predicted that 25% of companies using generative AI would launch agentic pilots in 2025, growing to 50% by 2027. [Source: California Management Review, August 2025, cmr.berkeley.edu] That said, a McKinsey “State of AI” survey found that only a minority of businesses had agents fully scaled company-wide; most remained in experimentation mode. [Source: Fortune / McKinsey, December 2025, fortune.com] The honest picture: early adopters with well-scoped pilots are seeing strong results; organization-wide deployment remains hard.
3. How California Companies Are Using Agentic AI: Case Studies
Case Study 1 — Salesforce Agentforce (San Francisco, SaaS/CRM)
Business problem (baseline): Salesforce’s own customer support portal, help.salesforce.com, handled hundreds of thousands of monthly customer service interactions that required routing, research, and resolution—many of them repetitive.
Agentic AI solution: Salesforce deployed Agentforce, its own multi-agent platform, on the support portal. Agents autonomously interpret customer queries, retrieve relevant documentation via retrieval-augmented generation (RAG), execute tool calls against Salesforce’s data graph, and either resolve cases or escalate to human agents with a pre-populated context summary.
Governance and safeguards: Agentforce includes a built-in Trust Layer for data privacy, bias mitigation, and hallucination reduction. The 2% escalation rate implies the system correctly identifies the cases it cannot resolve.
Measured impact: Agentforce handled 380,000 conversations at an 84% full resolution rate, with only 2% requiring human escalation. By Q1 FY2026, it had handled over 750,000 requests, cutting overall support case volume by 7% year-over-year. Agentforce ARR surpassed $500 million in Q3 FY2026, up 330% year-over-year, making it Salesforce’s fastest-growing product ever. Across external customers reporting results, Agentforce users collectively reported over $100 million in annualized cost savings. [Source: Salesforce Q4 FY2025 earnings release, February 2025; Salesforce Q3 FY2026 earnings release, December 2025; Salesforce Agentforce Metrics page, salesforce.com/agentforce/metrics]
“Everyone knows that they want to bring AI in, become more productive, become more efficient, become elevated. But they all know now they’ve got to become Agentic Enterprises.”
— Marc Benioff, Chair and CEO of Salesforce, Q3 FY2026 earnings call, December 2025
Case Study 2 — Bonsai Health (Santa Monica, Healthcare)
Business problem (baseline): Specialty medical practices lacked capacity to proactively reach patients for follow-ups, preventive care reminders, and appointment scheduling. Staff were occupied with reactive call handling.
Agentic AI solution: Bonsai Health deployed specialty-trained AI agents that integrate with EHR and practice management systems. Agents proactively identify patients overdue for care, initiate outreach (via text or other channels), answer questions, and schedule appointments—without routing through a call center.
Governance and safeguards: Agents escalate to staff for clinical questions and out-of-scope requests. HIPAA-compliant data handling is required given the healthcare context.
Measured impact: Since launch, Bonsai engaged more than 235,000 patients across over 100 healthcare groups and specialty practices, scheduling over 36,000 appointments. The company secured $7 million in seed funding led by Bonfire Ventures in September 2025. [Source: Healthcare Innovation Group, September 2025, hcinnovationgroup.com]
Case Study 3 — Assort Health (San Francisco, Healthcare Operations)
Business problem (baseline): Medical call centers were bottlenecked by high volumes of routine patient inquiries—scheduling, lab results routing, prescription renewals, referrals—leaving patients on hold and staff stretched.
Agentic AI solution: Assort Health’s agentic platform integrates with EHR and practice management workflows. Specialty-specific agents handle end-to-end patient engagement across touchpoints, resolving routine requests autonomously without requiring call center staff.
Governance and safeguards: The platform routes clinical and safety-sensitive queries to human staff. The company’s CEO explicitly frames the agents as removing barriers to care, not replacing clinical judgment.
Measured impact: Assort Health closed a $76 million Series B in September 2025, signaling strong commercial traction. The company reports improvements across care navigation, lab test follow-up, prescription renewal, and referral workflows. [Source: Healthcare Innovation Group, September 2025, hcinnovationgroup.com; Assort Health press release]
Case Study 4 — Penguin AI (Palo Alto, Healthcare/Fintech Operations)
Business problem (baseline): Prior authorizations, claims adjudication, risk adjustment, and medical records summarization consumed large portions of payer and provider operations staff time, with high error rates.
Agentic AI solution: Penguin AI built purpose-specific small language models (SLMs) for healthcare workflows—prior auth, risk adjustment, and claims processing—and deployed them as out-of-the-box agents on its platform. The SLM approach targets higher accuracy than general-purpose LLMs for domain-specific decisions.
Governance and safeguards: Task-specific models reduce hallucination risk compared to general models applied off-shelf. The platform targets 90–120-day time to positive ROI as a design constraint, which implies deliberately bounded scope.
Measured impact: The company has raised $29.7 million in venture funding. [Source: Healthcare Innovation Group, September 2025, hcinnovationgroup.com]
“We built our own small language models for prior auth, risk adjustment, and claims adjudication, and then we give you our agents out of the box. That’s what a platform is supposed to do. It’s supposed to give you what you need so you can get to ROI in 90 to 120 days.”
— Fawad Butt, Founder and CEO, Penguin AI [Source: Healthcare Innovation Group, September 2025]
Case Study 5 — Google DeepMind (Mountain View, Technology / Multi-Sector)
Business problem (baseline): Google’s own data center scheduling required continuous optimization of compute allocation across global infrastructure at a scale and speed humans cannot manually manage.
Agentic AI solution: Google DeepMind’s AlphaEvolve, an evolutionary coding agent built on the Gemini model, was applied to algorithmic and scheduling optimization. The agent iteratively generates, evaluates, and refines solutions across domains.
Measured impact: AlphaEvolve developed a new heuristic for data center scheduling that recovered on average 0.7% of Google’s worldwide compute resources—a meaningful efficiency gain at Google’s scale. Tested on 50 open mathematical problems, AlphaEvolve matched state-of-the-art algorithms in 75% of cases and found improved solutions in 20% of cases. [Source: Google DeepMind published research, May 2025; Google Blog, January 2026, blog.google]
Case Study 6 (Lesson Learned) — Enterprise Agentic AI Pilot (Composite)
A pattern reported across multiple 2025 deployments—including early Agentforce customers and Fortune 500 pilots documented by Deloitte and McKinsey—illustrates what goes wrong when pilots are under-scoped on governance.
What went wrong: Teams launched agents on customer-facing workflows without sufficient evaluation frameworks. Agents encountered edge cases—ambiguous queries, data that was out of date in the retrieval store, or tool calls that returned errors—and either hallucinated plausible-sounding but wrong answers, or entered retry loops that generated unexpected API costs.
Specific documented case: Safari365, an early Agentforce customer, initially faced hallucinations, missing guardrails, and technical gaps. Through hands-on leadership involvement and iterative improvement, the company ultimately exceeded its 15% efficiency target, achieving over 30% efficiency gains. [Source: Salesforce Agentforce Metrics page, salesforce.com/agentforce/metrics]
Mitigations applied:
- Constrain agent scope to one well-defined workflow before expanding.
- Instrument every tool call and LLM response before go-live; set alert thresholds for unexpected patterns.
- Build evaluation sets (sets of test cases with known correct answers) and run them before every deployment change.
- Establish a human review queue for the edge-case “long tail” before the agent can handle it autonomously.
- Set a hard cost cap on token spend per session to contain runaway retry loops.
4. Implementation Playbook for California Companies
Opportunity Selection
The agent suitability checklist for your specific workflow:
- Is the objective specific and measurable?
- Does the workflow touch at least two external systems via APIs?
- Does context (customer type, product version, account status) meaningfully change the right response?
- Is cycle time or volume high enough that automation yields material ROI?
- Is there a defined escalation path for the agent to hand off to a human?
A “yes” to four or five of these suggests the workflow is a strong candidate.
Data and Integrations
Most agentic workflows rely on four layers: a retrieval store (vector database with your internal documents, FAQs, or policies), structured data sources (CRM, ERP, ticketing system), action endpoints (APIs that let the agent write back—update a record, send a notification, place an order), and an audit log. Under California CPRA/CCPA, each layer that touches personal information of California residents requires a data processing agreement with vendors, a data minimization review, and—for decision-relevant ADMT uses—disclosure to affected individuals.
Architecture Patterns
Single-agent with tool use is the right starting point for most companies: one LLM orchestrator with three to five tool functions. It is easier to debug and cheaper to run. Graduate to multi-agent architecture (a coordinator plus specialized sub-agents) when individual tasks become complex enough that a single model context window is insufficient, or when parallelism matters for latency.
Human-in-the-loop approval gates are non-negotiable for any action with significant financial, medical, legal, or reputational consequences. Design the gate as a default “on” that you can relax after you’ve accumulated evidence of reliability.
Build vs. Buy Decision Matrix
| Factor | Lean toward building | Lean toward buying |
|---|---|---|
| Time to value | > 6 months acceptable | Need results in < 90 days |
| Data sensitivity | Cannot share with third-party | Acceptable with DPAs |
| Workflow specificity | Highly unique | Common (support, coding, HR) |
| Change velocity | Stable process | Rapidly evolving |
| Engineering capacity | Strong ML/LLM team | Limited AI engineering |
Tooling Landscape (Vendor-Neutral Descriptions)
Model providers: major frontier model APIs (Anthropic, Google, OpenAI, Meta open-source) plus smaller specialized models for domain tasks. Orchestration frameworks: graph-based agent orchestration libraries (e.g., LangGraph-pattern tools) that manage state machines and tool routing. Vector databases: embedding stores for RAG (several California-based vendors and major cloud offerings exist). Observability and guardrails: agent trace logging, latency/cost dashboards, prompt injection defense layers, and output evaluation frameworks. Policy engines: rule-based filters that block specific output categories before they reach users.
Security, Privacy, and CCPA Compliance
California’s CPPA finalized ADMT regulations in July 2025. Systems that use personal data to “replace or substantially replace human decision-making” in employment, finance, housing, or education contexts must provide opt-out rights to California consumers and disclose how the system was used in decisions affecting them. Violations can trigger enforcement by the California Attorney General and, under AB 316 (effective 2025), companies cannot assert as a defense that the AI autonomously caused harm. [Source: CDF Labor Law, July 2025; IAPP California Legislative Wrap-Up, 2025]
Practical steps: implement data minimization before personal information enters any AI pipeline; use least-privilege access so agents can only read and write to systems they need; maintain full audit trails of every tool call and LLM decision; conduct a data protection impact assessment for any agentic workflow that touches sensitive personal information; and review vendor DPAs for data residency commitments.
Staffing and RACI at a Glance
You need five competencies on any serious agentic AI team: an LLM engineer who understands prompt engineering, function calling, and RAG architecture; a platform/DevOps engineer for deployment, cost controls, and monitoring; a domain SME who knows the target workflow deeply enough to write evaluation test cases; a security/privacy reviewer familiar with CPRA obligations; and a product manager who owns the definition of success and the human-in-the-loop design.
30-60-90 Day Timeline
Days 1–30 (Discovery and pilot design): Map the target workflow end-to-end; identify all required tool integrations; build a golden evaluation set of 50–100 test cases; select model and orchestration approach; get legal sign-off on data handling.
Days 31–60 (Pilot build and evaluation): Deploy to an internal or shadow environment; run evaluation sets daily; instrument all costs and latencies; identify failure modes and add guardrails; conduct a human review of a random sample of agent outputs.
Days 61–90 (Limited production and expansion plan): Launch to a small cohort of real users with human oversight; measure KPIs vs. baseline; document lessons and cost model; prepare the business case for scale.
5. ROI Model and KPIs
Simple ROI formula:
Annualized savings = (Volume of tasks automated per year) × (Average human labor cost per task) × (Automation rate)
Less: Annual AI infrastructure and labor cost (engineering, monitoring, governance)
Worked example (customer support automation, stated assumptions):
- 100,000 support interactions per year
- Average fully-loaded labor cost per human-handled interaction: $12
- Assumed automation rate: 70% (the agent fully resolves 70,000 interactions)
- Gross labor savings: $840,000/year
- Annual AI costs (API tokens, engineering overhead, monitoring): $180,000
- Net annual savings: ~$660,000
- Payback from a $200,000 implementation investment: approximately 4 months
Sensitivity: If automation rate drops to 50%, net savings fall to roughly $420,000. If API costs rise 2x, the model still works. The biggest risk is lower-than-expected automation rates, which usually trace to insufficient retrieval data quality or an under-scoped tool set.
KPIs to track from day one:
- Automation (deflection) rate: share of interactions resolved without human intervention
- Cycle time: end-to-end time per completed task
- Resolution quality: accuracy rate vs. golden test set, customer satisfaction score
- Cost per task: total AI infrastructure cost divided by tasks completed
- Human escalation rate: the “fail” signal; sustained escalation above 10–15% usually means the agent’s scope is too broad
- Safety incidents: policy violations, data exposure events, or hallucinations surfaced in production
6. Risks, Limitations, and Governance
Hallucination in agentic settings is more dangerous than in chatbots. A single wrong fact in a chatbot answer is embarrassing. A hallucinated tool argument in an agentic system can trigger an irreversible action—sending a wrong email, placing an incorrect order, or deleting a record. Mitigation: validate all agent-generated parameters before executing write operations; use structured output schemas; and maintain a human gate for any high-consequence action.
Prompt injection. If agents read content from external sources (emails, documents, web pages), adversarial content in those sources can attempt to redirect the agent’s behavior. Mitigation: treat all external content as untrusted; use a separate parsing step that never executes instructions found in retrieved content.
Cost spikes. Agents can enter retry loops that generate enormous token spend in minutes. Mitigation: set hard per-session and per-day token budget caps; alert on anomalous spend in real time.
CPPA enforcement risk. The California Privacy Protection Agency issued over $100 million in enforcement actions in 2024 and finalized ADMT rules in July 2025. [Source: anonym.legal CPRA analysis, 2025] Any agentic system that makes or substantially influences decisions about California consumers in employment, finance, housing, or education contexts must build opt-out mechanisms and audit trails before launch—not after.
California regulatory watchlist for 2026: AB 316 (no “the AI did it” defense in civil liability); Transparency in Frontier AI Act (safety testing and transparency reports for large models); the ADMT opt-out regulations (effective December 2025 or January 2026); and AB 2013 (training data disclosure requirements effective January 2026). The state legislature reconvened in January 2026 with 22+ more AI bills under consideration. [Source: IAPP California 2025 legislative wrap-up; Kronenberger Rosenfeld, November 2025]
7. 90-Day Action Plan
Weeks 1–2: Identify your top three candidate workflows using the suitability checklist. Pick the one with the highest volume, clearest success metric, and lowest risk if something goes wrong.
Weeks 3–4: Map the workflow in detail. Document every system involved, every decision point, and every exception that currently routes to a human. Get IT and legal aligned on data handling.
Weeks 5–8: Build the pilot. Start with the core “happy path” (the 70–80% of cases that follow a predictable pattern). Wire up two to three essential tools. Write your evaluation test set before you write your first prompt.
Weeks 9–10: Internal evaluation. Run the agent against the test set daily. Shadow-mode the agent alongside live human work—compare outcomes without showing users the agent’s responses yet.
Weeks 11–12: Limited live deployment. A small cohort (10–20% of volume). Measure every KPI. Review a random sample of agent outputs weekly. Document what broke and why.
Post-90 days: Scale incrementally. Add scope only after the current scope is stable. Build the business case for the next workflow based on documented results from the first.
FAQ
Where should a California company start?
Pick the highest-volume, most repetitive workflow that has a clear “right answer” for most cases and accessible APIs. Customer support triage, contract clause extraction, and prior authorization processing are common first wins.
What does it cost to get started?
A minimal pilot—one workflow, one agent, internal testing—can be built for $50,000–$150,000 in engineering labor plus $5,000–$20,000/month in API costs at moderate volume. Enterprise platform solutions (Salesforce Agentforce, Google Vertex AI Agent Builder) may reduce build time but add per-seat or per-token licensing costs.
Build or buy?
Buy or use a managed platform if you need results in 90 days and your workflow is common (support, HR, coding). Build if your workflow is unique, your data is too sensitive for third-party processing, or you need deep integration with proprietary systems.
How do we handle CCPA/CPRA compliance?
Minimize personal data before it enters any AI pipeline. For workflows that substantially replace human decisions affecting Californians, build an opt-out mechanism and an audit trail before go-live. Get a CPRA-aware DPA from every AI vendor. Conduct a Privacy Risk Assessment for high-stakes use cases.
How do we measure quality, not just volume?
Build a golden evaluation set (test cases with known correct answers) before deployment. Run it on every code change. In production, sample a random 1–5% of agent outputs for human review weekly.
When should we not use agentic AI?
When the task is fully deterministic and rule-based (use RPA instead), when errors have irreversible high-stakes consequences and human oversight is not technically feasible, when personal data cannot be legally processed by AI systems, or when the volume is too low to justify the implementation investment.
What skills do we actually need in-house?
At minimum: one LLM engineer, one domain expert who owns the evaluation set, and someone accountable for compliance. You can contract out platform engineering and security review, but domain knowledge and eval ownership must stay internal.
References
- Salesforce Q4 FY2025 Earnings Release. Salesforce, February 26, 2025. salesforce.com
- Salesforce Q3 FY2026 Earnings Release. Salesforce, December 3, 2025. salesforce.com
- Salesforce Q1 FY2026 Earnings Release. Salesforce, May 28, 2025. salesforce.com
- Salesforce Agentforce Metrics Page. Salesforce, accessed March 2026. salesforce.com/agentforce/metrics
- “Adoption of AI and Agentic Systems: Value, Challenges, and Pathways.” California Management Review / Ankit Chopra, August 15, 2025. cmr.berkeley.edu
- “2025 was the year of agentic AI. How did we do?” Fortune / John Kell, December 15, 2025. fortune.com
- “Venture Capitalists See Big Opportunity for Agentic AI in Healthcare.” Healthcare Innovation Group, September 30, 2025. hcinnovationgroup.com
- “2025: The State of AI in Healthcare.” Menlo Ventures / Morning Consult survey, October 21, 2025. menlovc.com
- “How and Why 70% of Healthcare Companies Are Implementing AI.” AI Magazine / NVIDIA survey, February 2026. aimagazine.com
- “California Finalizes AI Regulations for Automated Decision-Making Technology.” CDF Labor Law LLP, July 2025. cdflaborlaw.com
- “California Finalizes Groundbreaking Regulations on AI, Risk Assessments, and Cybersecurity.” Ogletree, October 16, 2025. ogletree.com
- “2025 California AI Law Updates.” Kronenberger Rosenfeld LLP, November 6, 2025. kr.law
- “California 2025 Legislative Wrap-Up.” IAPP, 2025. iapp.org
- California Department of Justice Office of the Attorney General Legal Advisory on AI. California DOJ, 2025. oag.ca.gov
- “California DOJ Attorney General AI Legal Advisory.” Anonym.legal CPRA analysis, 2025. anonym.legal
- “Google’s Year in Review: 8 Areas with Research Breakthroughs in 2025.” Google Blog, January 7, 2026. blog.google
- “AlphaEvolve and data center scheduling.” Google DeepMind, citing May 2025 publication. Wikipedia / DeepMind
- “Kala Bio Launches AI Agent Revolution.” Globe Newswire, March 11, 2026. globenewswire.com
- “Building the Agentic Enterprise: Salesforce News and Stories That Shaped 2025.” Salesforce, December 23, 2025. salesforce.com
- “Struggling to Get AI Agents to Work? This Google Research Could Help.” Fortune, January 2026. fortune.com
- “2026 Playbook: Agentic AI Adoption in California Tech.” Landbase, January 19, 2026. landbase.com
- “AI in Healthcare Investment Trends.” Qubit Capital, January 2026. qubit.capital
