Fintech · Machine Learning
Cutting AML false positive alerts by 76% — without increasing regulatory risk
A mid-size US payment processor handling $2.4B in annual transaction volume was generating 1,200+ AML alerts per day, of which 96% were false positives requiring manual analyst review. Each alert consumed 35–45 minutes of compliance analyst time. We replaced the rules-only detection engine with a layered ML model that reduced false positives by 76% while improving true positive detection rate by 31% — cutting compliance operations cost by $2.1M annually without a single regulatory finding.
Business Context
The compliance team wasn't failing.
The detection engine was.
The processor ran a rules-based AML transaction monitoring system — a configuration of 140+ static rules inherited from their core banking vendor. The rules had been layered over six years as the business grew and regulators requested additions. Nobody had ever removed a rule. The result was a system generating 1,200–1,400 alerts per day, of which internal audit confirmed 96% were false positives. The compliance team had grown to 22 analysts to manage the queue. They were spending 85% of their time clearing noise — reviewing transactions that were obviously legitimate — and 15% on actual suspicious activity investigation. The ratio was inverted.
The cost of the false positive problem
- 96% of daily alerts were false positives (confirmed by internal audit across a 90-day sample of 38,000 alerts)
- 40 min average analyst time per alert (including documentation, escalation decision, and case closure)
- $2.1M annual compliance ops cost attributable to false positives (analyst time, tooling, and management overhead on noise)
The deeper risk was not the cost — it was the attention deficit. When analysts spend 85% of their day clearing obvious false positives, their capacity to identify genuinely suspicious patterns degrades. The true positive detection rate was 0.8 SAR filings per 1,000 alerts reviewed. Industry benchmark for a processor of this size and risk profile is 3–5 per 1,000. They were missing real suspicious activity because the noise was too loud.
The business had evaluated replacing the core monitoring platform — a project estimated at $4M and 18 months. We proposed a different approach: keep the existing platform as the alert generation layer, and build an ML-based triage and scoring engine on top of it that re-ranks and suppresses alerts before they reach the analyst queue. No platform replacement. No regulatory re-approval of a new system. Additive intelligence on top of the existing infrastructure.
Scope of Work
What we were asked to build
Transaction feature engineering pipeline
A real-time feature extraction layer computing 80+ behavioural and contextual features per transaction — velocity patterns, counterparty network graph metrics, merchant category deviation, time-of-day anomaly scores, and account tenure signals. Features computed at alert time and stored for model inference.
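As a sketch only, a handful of features of this kind might be computed like the following; the `Txn` shape, feature names, and window sizes are illustrative assumptions, not the production pipeline:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Txn:
    amount: float
    timestamp: datetime
    merchant_category: str

def velocity_features(history: list[Txn], current: Txn) -> dict:
    """Illustrative behavioural features computed at alert time."""
    recent = [t for t in history
              if (current.timestamp - t.timestamp).days <= 7]
    avg_amount = mean(t.amount for t in history) if history else 0.0
    return {
        # transaction count in the trailing 7-day window
        "txn_count_7d": len(recent),
        # ratio of current amount to the account's historical average
        "amount_vs_avg": current.amount / avg_amount if avg_amount else 0.0,
        # 1 if the merchant category is new for this account
        "new_merchant_category": int(
            current.merchant_category not in {t.merchant_category for t in history}
        ),
        # time-of-day signal: 1 for transactions between midnight and 5am
        "off_hours": int(current.timestamp.hour < 5),
    }
```

In the real system each feature vector is persisted alongside the alert so the same values are available at inference time and in the audit trail.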
ML alert scoring and suppression engine
A gradient boosting ensemble model trained on 24 months of historical alert data with analyst disposition labels. Model outputs a false positive probability score per alert. Alerts scoring above a configurable suppression threshold are auto-closed with a documented rationale — auditable and regulatorily defensible.
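A minimal sketch of the scoring-and-suppression step, assuming scikit-learn's gradient boosting; the synthetic training data, feature layout, and threshold value are stand-ins, not the production model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for 24 months of labelled alert history:
# label 1 = analyst closed the alert as a false positive.
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

SUPPRESSION_THRESHOLD = 0.98  # illustrative; tuned to keep 100% SAR recall

def triage(alert_features: np.ndarray) -> str:
    """Auto-close alerts the model is highly confident are false positives."""
    p_false_positive = model.predict_proba(alert_features.reshape(1, -1))[0, 1]
    return "auto_close" if p_false_positive >= SUPPRESSION_THRESHOLD else "queue"
```

In production the threshold is not hard-coded: it is selected on the validation set so that no confirmed SAR case would have been suppressed.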
Analyst queue prioritisation layer
Alerts not suppressed are re-ranked by true positive probability before entering the analyst queue. High-risk alerts surface first. Analysts see a risk score, the top contributing features, and similar historical cases — reducing investigation time per alert from 40 minutes to under 12 minutes on average.
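The re-ranking step itself is simple once each alert carries a model score; a sketch with illustrative field names (`tp_prob`, `top_features` are assumptions):

```python
def prioritise(alerts: list[dict]) -> list[dict]:
    """Sort unsuppressed alerts so the highest-risk ones surface first.

    Each alert dict carries a model-derived true positive probability
    ('tp_prob') and its top contributing features ('top_features').
    """
    return sorted(alerts, key=lambda a: a["tp_prob"], reverse=True)

queue = prioritise([
    {"id": "A-102", "tp_prob": 0.12, "top_features": ["amount_vs_avg"]},
    {"id": "A-101", "tp_prob": 0.87, "top_features": ["txn_count_7d"]},
])
```

The analyst UI then renders the score and contributing features for each queue entry, alongside links to similar historical cases.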
Model governance and audit trail
Full explainability layer using SHAP values for every suppression and scoring decision. Automated model performance monitoring with drift detection. Monthly model refresh pipeline. Complete audit trail of every auto-closed alert with suppression rationale — satisfying BSA/AML examination requirements.
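A sketch of what an auditable suppression record might look like once SHAP values have been computed for an alert. The SHAP values would come from an explainer over the trained model (e.g. `shap.TreeExplainer`); here they are passed in precomputed, and the field names and model version string are illustrative:

```python
import json
from datetime import datetime, timezone

def suppression_record(alert_id: str, score: float,
                       shap_values: dict[str, float]) -> str:
    """Build the documented rationale stored for every auto-closed alert."""
    # Top three features by absolute SHAP contribution drive the rationale.
    top = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]
    return json.dumps({
        "alert_id": alert_id,
        "decision": "auto_close",
        "false_positive_score": score,
        "rationale": [{"feature": f, "shap_contribution": v} for f, v in top],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": "2024-06",  # stamped by the monthly refresh pipeline
    })
```

Storing the rationale as structured data, rather than free text, is what makes the suppression decisions reviewable at examination time.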
Constraints we worked within
- Existing monitoring platform could not be replaced — ML layer had to integrate via API without modifying core system
- Auto-suppression required documented rationale per alert to satisfy BSA/AML examination standards
- Model training data contained PII — all feature engineering and training ran in an isolated environment with no data export
- Regulatory approval process required 90-day parallel run before suppression went live
Explicitly not in scope
- SAR filing automation or FinCEN direct reporting
- Customer due diligence or KYC workflow changes
- Sanctions screening or OFAC list matching
- Core banking platform replacement or migration
System Architecture
Rules engine generates alerts. ML engine decides what analysts actually see.
How We Worked
8 months. Parallel run required. Zero regulatory findings.
Data Audit & Feature Engineering
Extracted and audited 24 months of alert history with analyst disposition labels. Identified 14 rule categories responsible for 71% of false positive volume. Built the feature engineering pipeline — 80+ features computed per transaction. Data quality issues in counterparty fields required 3 weeks of remediation before training data was usable.
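The rule-category audit can be sketched as a straightforward aggregation over the labelled history, assuming pandas; the data here is an illustrative slice, not the client's:

```python
import pandas as pd

# Illustrative slice of the labelled alert history audit.
alerts = pd.DataFrame({
    "rule_category": ["velocity", "velocity", "structuring", "geo", "geo", "geo"],
    "disposition":   ["fp", "fp", "sar", "fp", "fp", "fp"],
})

# False positive count and share of total FP volume per rule category:
# the aggregation used to find the categories driving most of the noise.
fp = alerts[alerts["disposition"] == "fp"]
fp_volume = (fp.groupby("rule_category").size()
               .sort_values(ascending=False)
               .rename("fp_count"))
fp_share = fp_volume / fp_volume.sum()
```

Ranking categories by their share of false positive volume is what surfaced the 14 categories responsible for 71% of the noise.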
Model Development & Validation
Trained gradient boosting ensemble on labelled alert history. Validated against a held-out 6-month test set. Achieved 76% false positive reduction at a suppression threshold that maintained 100% recall on confirmed SAR cases in the test set. The compliance team reviewed 200 randomly sampled auto-suppression decisions and concurred with 97.5% of them.
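One way to pick such a threshold, sketched under the assumption that an alert is suppressed when its false positive score is at or above the threshold: take the highest score any confirmed SAR case received in the validation set and sit just above it, so no SAR case would ever have been auto-closed (the function and margin are illustrative):

```python
def max_safe_threshold(scores: list[float], is_sar: list[bool],
                       margin: float = 0.001) -> float:
    """Highest suppression threshold that keeps every confirmed SAR alert.

    An alert is suppressed when its false positive score >= threshold,
    so the threshold must sit strictly above the highest score any
    confirmed SAR case received in the validation set.
    """
    sar_scores = [s for s, sar in zip(scores, is_sar) if sar]
    # With no SAR cases in the sample, fall back to suppressing nothing.
    return max(sar_scores) + margin if sar_scores else 1.0
```

At the returned threshold, high-scoring non-SAR alerts are suppressed while every confirmed SAR case stays in the queue, which is what "100% recall on confirmed SAR cases" requires.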
Parallel Run & Regulatory Review
Ran ML scoring in shadow mode alongside existing analyst workflow for 90 days. Analysts worked the queue normally; ML scores were logged but not acted on. Parallel run data submitted to external compliance counsel and reviewed by the client's primary regulator contact. No objections raised to the suppression methodology.
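Shadow mode reduces to a simple rule: compute and log the decision the model would have made, but always route the alert to analysts. A sketch, with an illustrative threshold and field names:

```python
import json

shadow_log: list[str] = []

def handle_alert(alert: dict, ml_score: float) -> str:
    """Shadow mode: log the ML decision but never act on it."""
    would_suppress = ml_score >= 0.98  # illustrative threshold
    shadow_log.append(json.dumps({
        "alert_id": alert["id"],
        "ml_score": ml_score,
        "would_suppress": would_suppress,
    }))
    # During the 90-day parallel run, every alert still reaches analysts.
    return "analyst_queue"
```

The logged would-suppress decisions are what gets compared against actual analyst dispositions when the parallel run data is reviewed.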
Live Suppression & Queue Prioritisation
Suppression went live. Alert volume entering analyst queue dropped from 1,200/day to 290/day in week one. Analyst team redeployed from queue clearing to deeper investigation of high-risk alerts. SAR filing rate increased from 0.8 to 2.9 per 1,000 alerts reviewed within 30 days of go-live.
Working rhythm
- Cadence: Two-week sprints, bi-weekly compliance team reviews
- Decision owner: Chief Compliance Officer and Head of Financial Crime
- Primary metric: False positive rate and true positive detection rate
- Escalation SLA: 24 hours with written recommendation
Results
Measured at 60 days after suppression went live.
76% reduction in false positive alerts entering the analyst queue
Was: 1,200+ alerts/day, 96% false positive rate
Alert volume dropped from 1,200/day to 290/day. The suppression threshold was set conservatively — 100% of confirmed SAR cases in the validation set were preserved. No regulatory findings in the 60-day post-live period.
31% improvement in true positive detection rate
Was: 0.8 SAR filings per 1,000 alerts reviewed
SAR filing rate increased to 2.9 per 1,000 alerts reviewed — analysts now spending time on genuinely suspicious activity rather than clearing noise. The same 22-person team is now more effective, not larger.
$2.1M annual compliance operations cost eliminated
Was: $2.1M/year in analyst time consumed by false positive review
Analyst capacity freed from false positive clearing was redeployed to enhanced due diligence and investigation quality — no headcount reduction. The compliance function became more effective at the same cost.
Under 12 min average analyst time per alert, down from 40 minutes
Was: 40 minutes per alert including documentation and escalation decision
SHAP-based feature explanations and similar case surfacing reduced investigation time significantly. Analysts report higher confidence in escalation decisions when the model's contributing factors align with their own assessment.
What This Means for You
The false positive problem is not unique to this processor. Any financial institution running rules-only AML monitoring at scale is generating noise that degrades analyst effectiveness and masks real risk. This pattern likely applies to you if:
1. Your compliance team spends more time closing false positives than investigating suspicious activity
2. Alert volume has grown faster than your analyst headcount — and the gap is widening
3. Your SAR filing rate per alert reviewed is below industry benchmark for your transaction volume and risk profile
This engagement was scoped as an additive ML layer on top of the existing monitoring platform — no platform replacement, no regulatory re-approval of a new system, no disruption to live compliance operations. The 90-day parallel run was built into the timeline from day one.
See how we approach Machine Learning for financial services