AI Due Diligence Red Flags: 12 Warning Signs PE Investors Must Catch Before Closing
The AI hype cycle has made it easy for portfolio company management to overstate AI capability, understate technical debt, and obscure unit economics. Here's what experienced PE investors look for, and what to demand when they find these warning signs.
The core problem: AI capabilities are notoriously difficult to evaluate in standard due diligence. Management decks are full of benchmark scores, buzzwords, and AI roadmap slides — but they rarely tell you whether the AI actually drives revenue, whether it's defensible, or whether there's a regulatory time bomb buried in the data stack.
These 12 red flags represent the most common failures we see in PE AI due diligence. Most are discoverable with the right questions and two to three days of focused technical review. All of them have caused write-downs.
Red Flag #1: Unverifiable 'AI-powered' revenue
Management attributes significant ARR to 'AI-powered' features with no way to isolate that revenue from the base product. Ask for cohort data showing conversion lift, retention delta, or pricing premium for AI SKUs; if they can't produce it, the claim is marketing.
How to test it:
Request a revenue bridge: what did the product earn before the AI feature launched vs. after, holding all other variables constant.
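A minimal sketch of that revenue bridge, comparing per-account revenue across matched cohorts before and after the AI launch. All figures and cohort definitions here are illustrative, not from any real deal:

```python
# Sketch of a cohort revenue bridge: compare average revenue per account
# (ARPU) for comparable cohorts before and after the AI feature launched.
# All numbers below are illustrative placeholders.

def revenue_bridge(pre_cohort, post_cohort):
    """Return (pre ARPU, post ARPU, lift %) from two lists of account ARR."""
    pre_arpu = sum(pre_cohort) / len(pre_cohort)
    post_arpu = sum(post_cohort) / len(post_cohort)
    lift_pct = (post_arpu - pre_arpu) / pre_arpu * 100
    return pre_arpu, post_arpu, lift_pct

# Hypothetical ARR per account ($k) for matched segments
pre_launch = [40, 42, 38, 44, 41]    # accounts onboarded before AI launch
post_launch = [46, 48, 44, 50, 47]   # comparable accounts after AI launch

pre, post, lift = revenue_bridge(pre_launch, post_launch)
print(f"ARPU ${pre:.0f}k -> ${post:.0f}k, lift {lift:.1f}%")
```

If management cannot populate the two cohorts with real, auditable data, that absence is itself the answer.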
Red Flag #2: Single-vendor model dependency
The entire AI stack runs through a single third-party API (typically OpenAI) with no fallback, no fine-tuned model, and no switching plan. If that provider raises prices 3x or deprecates the model version, the product breaks or margins collapse.
How to test it:
Ask: what happens if OpenAI raises prices by 40%? What's your model switching cost and timeline?
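A quick way to pressure-test the price-shock question is to rerun gross margin with the model API line item scaled up. The P&L figures below are hypothetical placeholders:

```python
def gross_margin_after_shock(revenue, api_cost, other_cogs, price_multiplier):
    """Gross margin after scaling the model API line item by price_multiplier.
    All inputs are in the same unit (e.g. $k per month)."""
    cogs = api_cost * price_multiplier + other_cogs
    return (revenue - cogs) / revenue

# Hypothetical monthly P&L ($k): $500k revenue, $120k API spend, $80k other COGS
base = gross_margin_after_shock(500, 120, 80, 1.0)
shock = gross_margin_after_shock(500, 120, 80, 1.4)  # the 40% price increase
print(f"gross margin: {base:.0%} -> {shock:.0%}")
```

In this invented example a 40% API price increase costs nearly ten points of gross margin; the real question is whether management has run the same arithmetic on their actual numbers.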
Red Flag #3: Unclear training data provenance
The company trained models on scraped web data, user-generated content, or third-party datasets without clear licensing. Litigation such as The New York Times v. OpenAI has already put acquirers on notice: undisclosed data provenance is a deal-stopper.
How to test it:
Require a data lineage report. Who provided the training data? What were the terms? Has legal reviewed them?
Red Flag #4: Benchmark accuracy that doesn't survive production
The company reports 94% accuracy on a benchmark that doesn't reflect actual deployment conditions. Benchmark gaming is endemic in AI; lab performance frequently degrades 15–30% in production on real, messy customer data.
How to test it:
Ask for production accuracy metrics with timestamps, not benchmark scores. What's the false positive rate on real customer workloads?
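If the target can supply labeled production samples, the false positive rate falls out of a simple confusion count. The sample data below is invented for illustration:

```python
def confusion_counts(labels, preds):
    """Tally TP/FP/TN/FN from parallel boolean lists (ground truth, model)."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return tp, fp, tn, fn

def false_positive_rate(labels, preds):
    """FPR = FP / (FP + TN), computed on production samples, not benchmarks."""
    _, fp, tn, _ = confusion_counts(labels, preds)
    return fp / (fp + tn) if (fp + tn) else 0.0

# Invented sample: ground-truth labels vs. model predictions in production
truth = [True, True, False, False, False, True, False, False]
model = [True, False, True, False, False, True, True, False]
print(f"production FPR: {false_positive_rate(truth, model):.0%}")
```

A company that cannot hand over timestamped data to feed a calculation like this has never measured its own production accuracy.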
Red Flag #5: No model monitoring or drift detection
There's no systematic process for detecting model drift, bias incidents, or output degradation. For regulated industries (finance, healthcare, legal), this is an immediate red flag. For all others, it's a time bomb: models decay silently.
How to test it:
Ask: how would you know if the model started performing worse today? What's your drift detection and retraining cadence?
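A drift check can start as simply as comparing a recent window of model confidence scores against the deployment baseline. A production system would use PSI or a KS test rather than a mean shift, and the threshold and scores below are illustrative:

```python
def drift_alert(baseline_scores, recent_scores, threshold=0.05):
    """Flag drift when mean model confidence moves more than `threshold`
    away from the baseline window. A real monitor would use PSI or a
    Kolmogorov-Smirnov test; this mean-shift check is a minimal sketch."""
    base_mean = sum(baseline_scores) / len(baseline_scores)
    recent_mean = sum(recent_scores) / len(recent_scores)
    return abs(recent_mean - base_mean) > threshold

baseline = [0.91, 0.88, 0.90, 0.92, 0.89]  # confidence at deployment
recent = [0.84, 0.80, 0.82, 0.79, 0.83]    # confidence this week
print("drift detected:", drift_alert(baseline, recent))
```

The diligence point is not the statistic chosen; it is whether anything at all would fire if `recent` drifted from `baseline`.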
Red Flag #6: Key-person risk on the AI team
The AI capability is entirely held by one or two engineers. When they leave, the model degrades or breaks and no one can rebuild it. This is especially dangerous post-acquisition, when key technical talent often departs.
How to test it:
Map the bus factor: how many people can independently rebuild the core model pipeline? What's the documentation coverage?
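Mapping the bus factor can be as simple as listing, per pipeline component, who could rebuild it independently, then flagging anything below two owners. The components and names below are hypothetical:

```python
def bus_factor_gaps(component_owners, minimum=2):
    """Return components that fewer than `minimum` people can rebuild alone."""
    return [c for c, owners in component_owners.items() if len(owners) < minimum]

# Hypothetical ownership map, built from interviews and repo history
pipeline = {
    "data_ingestion":   {"alice", "bob"},
    "feature_store":    {"alice"},
    "training_loop":    {"carol"},
    "inference_server": {"bob", "carol"},
}
print("single-owner components:", bus_factor_gaps(pipeline))
```

Cross-check the interview answers against commit history; the two rarely agree on a target with real key-person risk.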
Red Flag #7: Inference costs that outpace revenue
GPU and inference costs grow faster than revenue as usage scales. Many AI-forward companies are cash-flow negative on their AI features at scale and don't know it because they haven't stress-tested the unit economics.
How to test it:
Build a unit economics model: cost-per-inference × usage volume at 2x, 5x, 10x current customers. Does gross margin hold?
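One way to sketch that stress test, under the assumption that per-user usage intensity grows with scale while per-seat pricing stays flat. All inputs are hypothetical:

```python
def gross_margin_at_scale(price_per_user, inferences_per_user,
                          cost_per_inference, usage_growth):
    """Gross margin on the AI feature if per-user usage grows by
    `usage_growth`x while per-user pricing stays flat. Assumes linear
    inference pricing; volume discounts or caching would soften the curve."""
    cost = inferences_per_user * usage_growth * cost_per_inference
    return (price_per_user - cost) / price_per_user

# Hypothetical inputs: $50/user/month, 500 inferences/user/month at $0.02 each
for growth in (1, 2, 5, 10):
    margin = gross_margin_at_scale(50, 500, 0.02, growth)
    print(f"{growth:>2}x usage per user: gross margin {margin:.0%}")
```

In this invented scenario the feature is 80% gross margin today and cash-negative at 10x usage intensity; the diligence question is where the target's real curve crosses zero.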
Red Flag #8: Customer contracts that block the data flywheel
Existing customer contracts include opt-out rights or restrictions that prevent the company from using customer data to train or improve models. This quietly breaks the 'data flywheel' thesis and can invalidate the AI moat narrative.
How to test it:
Review a sample of 10 customer contracts specifically for data usage, training rights, and opt-out clauses.
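A keyword scan can triage which contracts need the closest legal reading. This is a first pass only; the patterns below are illustrative, and nothing here substitutes for counsel reviewing the actual clauses:

```python
import re

# Illustrative patterns only; real clause language varies widely
RISK_PATTERNS = {
    "training_restriction": r"shall not (?:be used|use).{0,40}(?:train|model)",
    "opt_out":              r"opt[- ]out",
    "data_deletion":        r"delete.{0,30}(?:customer|client) data",
}

def flag_contract(text):
    """Return which risk patterns appear in a contract (triage, not review)."""
    lowered = text.lower()
    return [name for name, pat in RISK_PATTERNS.items() if re.search(pat, lowered)]

sample = ("Customer data shall not be used to train machine learning models. "
          "Customer may opt-out of analytics processing.")
print(flag_contract(sample))
```

Run it across the full contract base, not just the sample of 10, to size how much ARR sits behind training restrictions.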
Red Flag #9: No guardrails on high-stakes outputs
The product generates factual claims, recommendations, or reports that customers act on, but there's no human review step, confidence scoring, or output auditing. One high-profile hallucination incident can trigger customer churn and reputational damage faster than any other failure mode.
How to test it:
Ask for examples of the worst outputs the model has produced. Is there a human-in-the-loop for high-stakes outputs?
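A human-in-the-loop gate can start as a simple confidence-threshold routing rule. The threshold below is a placeholder that would need calibration against real output quality data:

```python
def route_output(text, confidence, review_threshold=0.85):
    """Route low-confidence model outputs to a human reviewer instead of
    auto-publishing. The 0.85 default is a placeholder; the real threshold
    must be calibrated per use case against observed error rates."""
    if confidence >= review_threshold:
        return ("auto_publish", text)
    return ("human_review", text)

print(route_output("Q3 revenue grew 12%", 0.93))
print(route_output("No litigation risk identified", 0.41))
```

If the target has nothing resembling this routing layer on outputs customers act on, price in the incident that hasn't happened yet.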
Red Flag #10: Shadow AI usage on customer data
Employees are using personal ChatGPT accounts, Copilot, or other AI tools to process customer data outside of approved systems. This creates data leakage risk and potential breach of customer data processing agreements.
How to test it:
Ask: what's your AI acceptable use policy? How do you enforce it? Have you audited what tools employees are actually using?
Red Flag #11: High-risk EU AI Act exposure with no compliance plan
The product touches categories flagged as high-risk under the EU AI Act (hiring, credit scoring, medical diagnosis, critical infrastructure) with no compliance roadmap. Enforcement deadlines are already arriving, and fines for non-compliance run to 3% of global annual turnover, rising to 7% for prohibited practices.
How to test it:
Classify the product against EU AI Act risk categories. Does the company have a conformity assessment plan?
Red Flag #12: AI-washing a rules-based product
The company calls itself 'AI-powered' but the core product is rule-based automation with a thin LLM wrapper on one feature. This matters for valuation: AI companies trade at higher multiples, and if the AI thesis doesn't hold, the entry multiple collapses.
How to test it:
Do a feature-by-feature audit: which capabilities are genuinely ML/AI vs. deterministic logic? What % of ARR would exist without the AI features?
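The ARR decomposition from that audit reduces to a small calculation. The feature names and figures below are invented for illustration:

```python
def non_ai_arr_share(feature_arr, ai_features):
    """Share of ARR attributable to features that are not genuinely ML/AI."""
    total = sum(feature_arr.values())
    non_ai = sum(arr for f, arr in feature_arr.items() if f not in ai_features)
    return non_ai / total

# Hypothetical ARR attribution ($M) from a feature-by-feature audit
arr = {"workflow_automation": 6.0, "reporting": 2.5, "llm_assistant": 1.5}
print(f"{non_ai_arr_share(arr, {'llm_assistant'}):.0%} of ARR survives without AI")
```

If the vast majority of ARR survives without the AI features, the AI multiple premium needs its own justification.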
What to Do When You Find These Red Flags
Flags #7 and #10 are typically manageable post-close with the right operating plan. Price them into the deal and build remediation milestones into the 100-day plan.
Flags #2, #4, #5, #6, #9, and #12 typically warrant purchase price adjustments, escrow arrangements, or rep & warranty coverage. Quantify the remediation cost and use it as negotiating leverage.
Flags #1, #3, #8, and #11, if unresolved, are potential deal-stoppers. Misrepresented AI revenue, unresolved IP exposure, contract terms that void the data thesis, and active regulatory non-compliance are liabilities that can exceed the acquisition price.
A Systematic Framework for AI Due Diligence
The best PE teams run AI due diligence in four parallel workstreams across a two-week sprint, with technical experts, legal, and financial analysts working simultaneously:
Technical Architecture Review
Model stack, vendor dependencies, compute economics, data pipeline
Covers: Red flags #2, #4, #5, #7
Data & Legal Audit
Training data provenance, customer contract data rights, regulatory classification
Covers: Red flags #3, #8, #11
Commercial AI Validation
Revenue attribution, benchmark vs. production accuracy, customer interviews
Covers: Red flags #1, #4, #12
People & Process Assessment
AI team depth, documentation, governance processes, shadow AI usage
Covers: Red flags #6, #9, #10
Run a systematic AI audit before your next close
PortCoAudit AI delivers a structured AI readiness scorecard for any portfolio company in 48 hours — covering all 12 red flag categories with evidence-based scoring and actionable remediation plans.
Related Reading
PE AI Due Diligence Checklist
The complete 40-point checklist for AI-forward acquisition targets
AI Risk Assessment for PE Acquisitions
Quantifying AI-related risk in the deal model
How PE Firms Use AI for Due Diligence
The tools and processes top-quartile PE firms are deploying
AI Portfolio Company Audit Checklist
Post-acquisition audit framework for AI capabilities