The year 2025 marks a fundamental turning point in corporate IT. While the years 2023 and 2024 were characterized by euphoria about the generative capabilities of artificial intelligence (AI), we are now facing a harsh reality that we at Dreher Consulting refer to as the "Pilot Purgatory".
The data is clear: although 88% of companies say they are using AI in at least one business function, almost two-thirds remain in the experimental or pilot phase.1
The financial discrepancy is even more alarming: only a minority of companies have been able to demonstrate a measurable impact on EBIT (earnings before interest and taxes).1 The reason for this is not technological immaturity, but a fundamental misunderstanding of the economic mechanisms at work. The first wave of AI adoption focused on "co-pilots" - tools that assist humans. This led to individual productivity gains, but rarely reduced structural costs, as humans remained "in the loop" and the time gained was often absorbed by new tasks (Parkinson's law).
This report serves as a strategic blueprint to break this deadlock.
We analyze the technological paradigm shift from Generative AI (content creation) to Agentic AI (autonomous execution). AI agents capable of autonomously planning and executing complex, multi-step workflows offer, for the first time, the possibility of decoupling the marginal cost of a transaction from human working time.3

Based on first-principles thinking and a MECE (Mutually Exclusive, Collectively Exhaustive) analysis, we show how companies can make the leap from isolated pilots to profitable, scalable AI operations.
A sober look at the current market landscape reveals a significant gap between activity and results. McKinsey data shows that while the use of AI in organizations has exploded, scaling has stagnated. Only about a third of organizations have begun to roll out AI solutions across the enterprise.3 The overwhelming majority of projects remain stuck in silos - trapped between proof-of-concept (PoC) and production.
The reasons for this are structural:
- Lack of process integration: AI is often treated as a technological "add-on" instead of an occasion to fundamentally rethink processes ("re-wiring").
- Data fragility: Pilots work on cleansed test data but fail on the complexity and messiness of real production data.5
- Lack of ambition: While "high performers" use AI to develop new business models, many companies limit themselves to incremental efficiency gains that do not justify the high implementation costs.1
To realize the cost reduction potential, we need to define the technology precisely. We are in transition from passive models to active agents.
Table 1: Technological evolution and economic implication
| Characteristic | Generative AI (GenAI) | Agentic AI (autonomous agents) |
| --- | --- | --- |
| Core function | Creation & summarization | Planning & execution |
| Trigger | Human prompt ("Write an email") | System event or objective ("Process all invoices") |
| Context | Session-based (short-term) | Persistent storage & access to company data (RAG) |
| Interaction | Chat interface | API calls & system integrations |
| Economic lever | Personal productivity (soft ROI) | Labor substitution (hard ROI) |
Agentic AI is distinguished by three core competencies: Perception of data from live systems, Reasoning to break down complex goals into subtasks, and Action through direct system intervention.1 Only through this autonomy is it possible to take humans out of the critical path of transaction processing and thus realize real cost reductions.
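As an illustration of this perceive-reason-act cycle, here is a minimal Python sketch. All three functions are hypothetical stubs: in a real agent, perceive would query live systems via API, reason would be an LLM planning call, and act an authenticated system write.

```python
def perceive(event):
    # Perception: read the triggering event and pull live context from
    # source systems (in a real agent: CRM/ERP lookups via API). Stubbed.
    return {"event": event, "context": {"invoice_id": "INV-001", "status": "open"}}

def reason(state, goal):
    # Reasoning: break the goal into explicit subtasks (in a real agent:
    # an LLM planning call with tool definitions). Stubbed as a fixed plan.
    return ["validate input", "match against purchase order", "post result"]

def act(subtask, state):
    # Action: execute one subtask against a target system (in a real
    # agent: an authenticated API write). Stubbed as a log line.
    print(f"executing: {subtask} for {state['context']['invoice_id']}")

def run_agent(event, goal):
    state = perceive(event)              # 1. Perception
    for subtask in reason(state, goal):  # 2. Reasoning
        act(subtask, state)              # 3. Action

run_agent("invoice_received", "Process all invoices")
```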
In order to generate "profit", we need to understand the unit economics of AI: a shift from fixed costs (personnel) to variable costs (compute/tokens), whose share per transaction tends towards zero at scale.
In traditional operating models, transaction volume correlates linearly with labor costs. To process twice as many customer inquiries, you need - ceteris paribus - twice as many staff. Agentic AI breaks this linearity. After the initial investment in training and integration, the costs scale logarithmically. The marginal costs of an additional transaction only correspond to the inference costs (tokens) and the API fees.
Analysis shows that AI agents can reduce the cost per transaction by 90-95% compared to onshore workers and 50-70% compared to offshore workers.7

A key concept for communicating with the CFO is the "J-curve". AI projects rarely deliver immediate ROI.
- Investment phase ("valley of tears"): High expenditure on data cleansing, model training, and infrastructure. ROI is negative.
- Learning phase (human-in-the-loop): The agent is productive but requires intensive human supervision. Efficiency is low because employees must both do the work and correct the AI.
- Scaling phase: The agent's reliability increases and the deflection rate (degree of autonomy) grows from 20 % to 80 %. Here the curve crosses the break-even line and profit compounds.8
Companies that cancel projects during the learning phase because "it's quicker to do it yourself" never realize the profit. They pay the set-up costs without reaping the harvest.
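The J-curve can be made tangible with a simple cash-flow model. The sketch below is purely illustrative: the setup cost, volumes, per-ticket costs, and the linear deflection ramp are assumed numbers, not benchmarks.

```python
# Stylized J-curve: cumulative net value of an agent rollout.
# All figures are illustrative assumptions, not benchmarks.
SETUP_COST = 100_000          # one-off investment (euros)
TICKETS_PER_MONTH = 8_000
HUMAN_COST_PER_TICKET = 8.0   # fully loaded
AI_COST_PER_TICKET = 1.0      # inference + API fees
SUPERVISION_COST = 5_000      # monthly human-in-the-loop overhead

cumulative = -SETUP_COST      # investment phase: ROI starts negative
for month in range(1, 25):
    # learning phase: deflection ramps from 20 % towards 80 %
    deflection = min(0.8, 0.2 + 0.05 * month)
    automated = TICKETS_PER_MONTH * deflection
    savings = automated * (HUMAN_COST_PER_TICKET - AI_COST_PER_TICKET)
    cumulative += savings - SUPERVISION_COST
    if cumulative >= 0:
        print(f"break-even in month {month}")  # scaling phase begins
        break
```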
At Dreher Consulting, we are introducing the LCOAI (Levelized Cost of AI) metric, analogous to the levelized cost of electricity in the energy industry. It calculates the total cost per useful output over the life cycle of the system.10
$$ LCOAI = \frac{\text{Development costs} + \sum(\text{Inference costs}) + \sum(\text{Maintenance costs})}{\text{Number of successfully automated transactions}} $$
This formula forces you to be honest: an agent that is only used 500 times a year is often more expensive than a human. An agent that handles 500,000 transactions is unbeatably cheap. Volume is the key to amortizing fixed costs.
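Applying the formula to the two volume scenarios makes the point concrete. All input figures below (development cost, inference cost per transaction, maintenance, life cycle) are assumptions for illustration:

```python
def lcoai(dev_costs, inference_cost_per_txn, maintenance_per_year,
          transactions_per_year, years=3):
    """Levelized Cost of AI: total lifecycle cost per successfully
    automated transaction (undiscounted, for simplicity)."""
    total_txns = transactions_per_year * years
    total_cost = (dev_costs
                  + inference_cost_per_txn * total_txns
                  + maintenance_per_year * years)
    return total_cost / total_txns

# Illustrative assumptions: €100k development, €0.20 inference per
# transaction, €20k annual maintenance, 3-year life cycle.
print(f"{lcoai(100_000, 0.20, 20_000, 500):.2f} €/txn at 500 txns/yr")
print(f"{lcoai(100_000, 0.20, 20_000, 500_000):.2f} €/txn at 500k txns/yr")
```

Under these assumptions, the agent costs over €100 per transaction at 500 transactions a year, but roughly €0.31 at 500,000 - far below any fully loaded human cost per transaction.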
In order to capture the potential MECE (mutually exclusive, collectively exhaustive), we categorize use cases along two dimensions: task complexity and transaction volume.
Goal: Direct cost reduction (labor substitution)
This is where the greatest and fastest-to-realize levers lie ("low-hanging fruit"): rule-based, repetitive tasks.
Customer Operations (Service & Support):
Use case: Fully autonomous processing of Tier 1 requests (returns, status queries, address changes).
Evidence: Klarna replaced the labor of 700 full-time equivalents (FTEs) with an AI agent that handled 2.3 million conversations and improved the profit forecast by 40 million dollars.11
Mechanism: Integration with CRM and ERP allows the agent not only to respond, but to execute the transaction (e.g. refund).
Finance & Accounting:
Use case: Invoice matching. Agents compare incoming invoices with purchase orders (POs) and goods-receipt documents. In the event of discrepancies, they contact the supplier autonomously.
Evidence: A global media company consolidated data from 80 general ledgers and identified millions in "shadow IT" spend through AI analytics.12
Goal: Increased productivity & throughput
People are not replaced here, but rather massively accelerated ("super-powering").
Software Development:
Use case: Generation of boilerplate code, unit tests, and documentation.
Evidence: Development cycles can be shortened by 20-30%. This does not necessarily reduce headcount costs, but increases output with the same cost base (avoidance of new hires).12
Risk: "Lazy reviews" by developers can lead to a loss of quality if the generated code is not critically reviewed.11
Healthcare Revenue Cycle Management (RCM):
Use case: Processing of claim denials. Agents analyze the denial reason, correct the coding, and resubmit the claim.
Evidence: Reduction in days outstanding (A/R days) by 35 days and a 7% reduction in the denial rate.13
Goal: Strategic competitive advantage
Supply Chain Management:
Use case: Predictive risk analysis. Agents monitor thousands of external signals (weather, strikes, geopolitics) and simulate their effects on the supply chain in real time.
Evidence: Proactive route changes before a disruption occurs prevent expensive expedited shipments and production stoppages.14
Technology alone does not generate value; it must be embedded in an operating model. We call this the Architecture of Agency. Bain & Company rightly emphasizes that most pilots fail not because of the AI, but because of a missing data strategy.5

Agents need access to ground truth. An agent that is trained on outdated or contradictory data does not hallucinate randomly - it hallucinates systemically.
Data Products: Treat data sets (e.g. "customer data", "product catalog") as products with clear SLAs (Service Level Agreements), owners and quality metrics.15
Vectorization Pipeline: To make unstructured data (PDF manuals, email archives) usable, it must be embedded and loaded into vector databases for Retrieval-Augmented Generation (RAG). This is the agent's "long-term memory".
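A minimal sketch of such a pipeline, assuming a stand-in embed() function (in production, an embedding-model API) and an in-memory list in place of a real vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: in production this would call an embedding model.
    # Here we hash characters into a fixed-size vector for illustration.
    vec = np.zeros(128)
    for i, ch in enumerate(text.encode("utf-8")):
        vec[i % 128] += ch
    return vec / (np.linalg.norm(vec) + 1e-9)

# "Long-term memory": chunked documents stored with their embeddings.
documents = [
    "Returns are accepted within 30 days of delivery.",
    "Invoices are matched against purchase orders and goods receipts.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1):
    # Cosine similarity against all stored chunks (brute force).
    q = embed(query)
    scored = sorted(index, key=lambda pair: -float(q @ pair[1]))
    return [doc for doc, _ in scored[:k]]

# The retrieved chunk would be injected into the agent's prompt as context.
print(retrieve("How long do customers have to return an item?"))
```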
Autonomy requires control. An agent that is allowed to book or communicate autonomously represents an operational risk.
The 3-layer security model: the goal is not 100 % automation, but optimal automation. Successful systems automatically forward transactions with a low confidence score to human experts (human-in-the-loop), and these corrections are fed back into the system (feedback loop) to continuously improve the model.6
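Confidence-based routing with a feedback loop can be sketched in a few lines. The threshold, record structure, and queue names below are illustrative assumptions:

```python
import json, time

CONFIDENCE_THRESHOLD = 0.90   # illustrative; tune per use case and risk class

audit_log, human_queue, training_data = [], [], []

def execute(decision):
    print(f"executing autonomously: {decision}")

def route(transaction, agent_decision, confidence):
    """Route by confidence: autonomous execution above the threshold,
    human review below it. Every decision lands in the audit log."""
    record = {"ts": time.time(), "transaction": transaction,
              "decision": agent_decision, "confidence": confidence}
    if confidence >= CONFIDENCE_THRESHOLD:
        record["route"] = "autonomous"
        execute(agent_decision)
    else:
        record["route"] = "human_review"
        human_queue.append(record)     # expert handles the exception
    audit_log.append(json.dumps(record))

def feedback(record, human_decision):
    # Feedback loop: corrections become labeled training data
    # for the next fine-tuning run.
    training_data.append((record["transaction"], human_decision))

route({"id": 42, "type": "refund", "amount": 30}, "approve_refund", 0.97)
route({"id": 43, "type": "refund", "amount": 900}, "approve_refund", 0.55)
```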
In the following, we present the concrete roadmap for the transformation. This guide is phase-based and covers the critical milestones.

Goal: Identification of "golden use cases" where data availability meets economic relevance.
Atomic process decomposition: Break candidate processes down into their smallest work steps and apply the "autonomy test" (a scoring sketch follows after this list):
- Is the input digital?
- Are the decision rules explicit?
- Is the result measurable?
Data audit: Check the availability of the necessary data via API. No agent scaling without API.
Output: A prioritized list of 3 use cases with calculated ROI potential.
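The autonomy test and the volume criterion can be combined into a toy prioritization script. The use cases, flags, and volumes below are invented for illustration:

```python
# Toy prioritization: autonomy test plus transaction volume.
use_cases = [
    # (name, digital_input, explicit_rules, measurable, txns_per_year)
    ("Tier-1 support tickets", True, True, True, 100_000),
    ("Contract negotiation",   True, False, False,    200),
    ("Invoice matching",       True, True, True,   50_000),
]

def passes_autonomy_test(digital, explicit, measurable):
    # A use case must pass all three questions to be a candidate at all.
    return all([digital, explicit, measurable])

candidates = [
    (name, volume)
    for name, digital, explicit, measurable, volume in use_cases
    if passes_autonomy_test(digital, explicit, measurable)
]
# Volume drives amortization (see LCOAI), so rank candidates by it.
for name, volume in sorted(candidates, key=lambda c: -c[1])[:3]:
    print(f"{name}: {volume:,} transactions/year")
```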
Goal: Technical proof and establishment of governance.
Workflow redesign: Do not automate the existing process! A bad process will only get worse faster with AI. Redesign the process under the assumption that the agent is the main actor and the human only handles the exception.
Shadow Mode Deployment: Let the agent run parallel to the human without executing any actions. Compare the agent's decisions with those of the experts.
Baseline measurement: Establish metrics for AHT (Average Handling Time), error rates and costs per ticket before implementation.
Output: A functioning agent in shadow mode with an accuracy of >80%.
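Measuring shadow-mode accuracy reduces to comparing agent decisions with expert decisions on the same cases. A minimal sketch over invented decision pairs:

```python
# Shadow mode: the agent decides but does not act; we log its decision
# next to the expert's. The pairs below are illustrative.
shadow_log = [
    ("ticket-1", "refund",   "refund"),
    ("ticket-2", "escalate", "escalate"),
    ("ticket-3", "close",    "refund"),   # disagreement
    ("ticket-4", "refund",   "refund"),
]

agreements = sum(1 for _, agent, expert in shadow_log if agent == expert)
accuracy = agreements / len(shadow_log)
print(f"shadow-mode accuracy: {accuracy:.0%}")

GO_LIVE_GATE = 0.80   # the >80 % gate from the phase output
if accuracy > GO_LIVE_GATE:
    print("gate passed: proceed to supervised live operation")
else:
    print("gate failed: keep iterating in shadow mode")
```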
Goal: Transition from monitoring to autonomy.
Confidence thresholds: Implement threshold values. If the agent is >90% confident, it acts autonomously. Below that: Forwarding to humans.
Active learning: Every human correction is logged and used to fine-tune the model.
Change management (the 70% rule): Invest 70% of the effort in people.8 Don't train employees to "operate" the AI, but to "train" it and manage complex exceptions.
Output: An agent in live operation with >50% deflection rate. First realization of cost savings.
Goal: P&L effectiveness.
Workforce alignment: Stop backfilling positions in automated areas (utilize natural turnover). Move high performers to more value-adding roles (e.g. customer service to sales).
Platform strategy: Abstract the components (security, logging, ERP connection) into a central "agent platform" to reduce the marginal costs for the next agent.
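One way to picture the platform idea: cross-cutting components (authentication, audit logging, guardrails) live in a shared base class, and each new agent adds only its domain logic. A structural sketch with invented names:

```python
class AgentPlatform:
    """Shared plumbing every agent reuses: the marginal cost of
    agent N+1 is (ideally) just its domain logic."""
    def authenticate(self, system: str):
        print(f"[platform] authenticated against {system}")
    def log(self, event: str):
        print(f"[platform] audit: {event}")
    def within_authority(self, amount: float) -> bool:
        # Central policy, e.g. financial authority limits.
        return amount <= 50

class InvoiceAgent(AgentPlatform):
    # Domain logic only; security, logging, and policy come from the platform.
    def run(self, invoice):
        self.authenticate("ERP")
        if self.within_authority(invoice["amount"]):
            self.log(f"auto-approved invoice {invoice['id']}")
        else:
            self.log(f"escalated invoice {invoice['id']} to human")

InvoiceAgent().run({"id": "INV-7", "amount": 49.0})
```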
A robust KPI dashboard is required to demonstrate success to stakeholders. Soft factors such as "employee satisfaction" are not enough.
Table 2: The Agentic AI KPI framework
| Category | KPI | Description | Target value (benchmark) |
| --- | --- | --- | --- |
| Financials | Cost per transaction | Total costs (tech + people) divided by volume. | Reduction of >50 % |
| Financials | LCOAI | Levelized Cost of AI (see chapter 2.3). | Must be < human costs |
| Operational | Deflection rate | Proportion of cases solved without human intervention. | >60 % (top performers: >80 %) |
| Quality | Resolution accuracy | Percentage of correct resolutions (no ticket reopening). | >95 % |
| Technology | Hallucination rate | Frequency of factually incorrect statements. | <1 % (critical!) |
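Several of these KPIs fall directly out of the ticket log. A minimal sketch over an assumed log format:

```python
# Illustrative ticket log: (resolved_by, reopened)
tickets = [
    ("agent", False), ("agent", False), ("human", False),
    ("agent", True),  ("agent", False), ("human", False),
]
TOTAL_COST = 42.0   # assumed tech + people cost for this sample (euros)

deflection_rate = sum(1 for by, _ in tickets if by == "agent") / len(tickets)
agent_resolved = [(by, re) for by, re in tickets if by == "agent"]
resolution_accuracy = (sum(1 for _, re in agent_resolved if not re)
                       / len(agent_resolved))
cost_per_transaction = TOTAL_COST / len(tickets)

print(f"deflection rate:      {deflection_rate:.0%}")      # target >60 %
print(f"resolution accuracy:  {resolution_accuracy:.0%}")  # target >95 %
print(f"cost per transaction: {cost_per_transaction:.2f} €")
```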
Scenario: IT helpdesk in a medium-sized company (100,000 tickets/year).

Status quo (human):
- Cost per ticket: €8.00 (fully loaded).
- Total costs: €800,000 p.a.
Agent scenario (investment):
- Development & setup: €100,000 (one-off).
- Running costs (hosting, tokens, maintenance): €60,000 p.a.
Result:
- Assumption: 60 % deflection rate (60,000 tickets handled autonomously).
- Remaining tickets for humans: 40,000 × €8.00 = €320,000.
- New total costs: €320,000 (human) + €60,000 (AI) = €380,000.
- Savings in year 1: €800,000 - €380,000 - €100,000 (investment) = €320,000 net.
- ROI in year 1: 3.2x (net savings relative to the one-off investment).
This calculation example illustrates the immense leverage effect as soon as the fixed costs of development are amortized by volume.7
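For transparency, the same arithmetic in a few lines of Python, using the figures assumed above:

```python
tickets = 100_000
human_cost = 8.00            # € per ticket, fully loaded
setup = 100_000              # € one-off investment
running = 60_000             # € p.a. (hosting, tokens, maintenance)
deflection = 0.60

status_quo = tickets * human_cost                          # €800,000
remaining_human = tickets * (1 - deflection) * human_cost  # €320,000
new_total = remaining_human + running                      # €380,000
net_savings_y1 = status_quo - new_total - setup            # €320,000
print(f"net savings year 1: €{net_savings_y1:,.0f}")
print(f"ROI year 1: {net_savings_y1 / setup:.1f}x")        # 3.2x
```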
No transformation process is without risk. The following pitfalls must be managed proactively.
Efficiency gains often lead to work "expanding". If a report is created in 5 minutes instead of 5 hours, managers suddenly demand 10 reports instead of one.
Mitigation: Establish clear governance over output. Use the time gained explicitly for new value creation, or realize the savings through hiring freezes. Productivity gains that are never converted into reduced labor input do not reduce costs.8
Quickly assembled agents tend to be unstable.
Mitigation: Treat prompts as code. Use versioning, automated tests (eval frameworks) and CI/CD pipelines for agents.
Who is liable if an agent wrongly grants a discount?
Mitigation: Define clear financial authority limits (e.g. "Up to €50 autonomous, above that, approval"). Implement audit trails that log every decision made by the agent in an audit-proof manner.17
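Authority limits and audit-proof logging can be encoded explicitly. A sketch assuming a €50 autonomous limit and a hash-chained, append-only log so that tampering is detectable:

```python
import hashlib, json, time

AUTONOMOUS_LIMIT_EUR = 50.0    # above this: human approval required
audit_trail = []               # append-only; each entry chained by hash

def log_decision(entry: dict):
    # Chain each record to its predecessor so tampering is detectable.
    prev_hash = audit_trail[-1]["hash"] if audit_trail else "genesis"
    payload = json.dumps({**entry, "prev": prev_hash}, sort_keys=True)
    audit_trail.append({**entry, "prev": prev_hash,
                        "hash": hashlib.sha256(payload.encode()).hexdigest()})

def grant_discount(customer: str, amount: float) -> str:
    decision = "granted" if amount <= AUTONOMOUS_LIMIT_EUR else "needs_approval"
    log_decision({"ts": time.time(), "customer": customer,
                  "amount": amount, "decision": decision})
    return decision

print(grant_discount("ACME", 30.0))    # granted autonomously
print(grant_discount("ACME", 120.0))   # escalated for approval
```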
In conclusion, our analysis of the McKinsey data shows that the most important predictor of success is not technology, but ambition.1 Companies that only use AI to "save 5% on costs" often fail due to implementation hurdles. Companies that use AI to reinvent their business model - e.g. through 24/7 real-time service or fully automated supply chains - realize the massive profit pools.
At Dreher Consulting, we advise our clients to stop playing around.
The technology is ready. The economics have been validated. It is now up to the management level to set the organizational course. The path "From Pilot to Profit" is not a technical upgrade. It is an operational transformation.
Use this checklist to assess the maturity of your initiative.
[ ] Target definition: Have we defined whether we want to reduce costs (efficiency) or grow (innovation)? (Avoid mixed targets).
[ ] Data readiness: Are the core systems (ERP, CRM) accessible via API? Is the data clean?
[ ] Atomic tasks: Have the processes been broken down into the smallest, most logical steps?
1. Status Quo 2025: The Anatomy of the "Pilot Purgatory"
2. First-Principles Economics: The Economics of Intelligence
3. Strategic Identification: The MECE Framework for Potential Analysis
4. The Operating Model: Architecture of Agency
5. Action Guide: "From Pilot to Profit"
6. Deep Dives: Metrics and Measuring Success
7. Risk Management and Challenges