Fintech · Machine Learning
Cutting AML false positive alerts by 76% — without increasing regulatory risk
A mid-size US payment processor handling $2.4B in annual transaction volume was generating 1,200+ AML alerts per day, 96% of which were false positives requiring manual analyst review. Each alert consumed 35–45 minutes of compliance analyst time. We layered an ML triage and scoring engine on top of the existing rules-based detection platform — reducing false positives by 76% while improving the true positive detection rate by 31%, and eliminating $2.1M in annual compliance operations cost without a single regulatory finding.
Sound familiar?
- Your compliance team spends most of their day clearing false positives instead of investigating real risk
- Alert volume has grown faster than analyst headcount — and the gap keeps widening
- Your SAR filing rate per alert is below industry benchmark for your transaction volume
- 76% reduction in false positive alerts
- 31% improvement in true positive detection
- $2.1M annual compliance cost eliminated
The Problem
The compliance team wasn't failing.
The detection engine was.
The processor ran a rules-based AML transaction monitoring system — a configuration of 140+ static rules inherited from their core banking vendor. The rules had been layered over six years as the business grew and regulators requested additions. Nobody had ever removed a rule. The result was a system generating 1,200–1,400 alerts per day, of which internal audit confirmed 96% were false positives.

The compliance team had grown to 22 analysts to manage the queue. They were spending 85% of their time clearing noise — reviewing transactions that were obviously legitimate — and 15% on actual suspicious activity investigation. The ratio was inverted.
The deeper risk was not the cost — it was the attention deficit. When analysts spend 85% of their day clearing obvious false positives, their capacity to identify genuinely suspicious patterns degrades. The true positive detection rate was 0.8 SAR filings per 1,000 alerts reviewed. Industry benchmark for a processor of this size and risk profile is 3–5 per 1,000. They were missing real suspicious activity because the noise was too loud.
The business had evaluated replacing the core monitoring platform — a project estimated at $4M and 18 months. We proposed a different approach: keep the existing platform as the alert generation layer, and build an ML-based triage and scoring engine on top of it that re-ranks and suppresses alerts before they reach the analyst queue. No platform replacement. No regulatory re-approval of a new system. Additive intelligence on top of the existing infrastructure.
The cost of the false positive problem
- 96% of daily alerts were false positives (confirmed by internal audit across a 90-day sample of 38,000 alerts)
- 40 min average analyst time per alert (including documentation, escalation decision, and case closure)
- $2.1M annual compliance ops cost attributable to false positives (analyst time, tooling, and management overhead on noise)
Scope of Work
What we were asked to build
Transaction feature engineering pipeline
A real-time feature extraction layer computing 80+ behavioural and contextual features per transaction — velocity patterns, counterparty network graph metrics, merchant category deviation, time-of-day anomaly scores, and account tenure signals. Features computed at alert time and stored for model inference.
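Two of the simpler features in that family can be sketched in a few lines of Python. This is an illustration, not the client's actual pipeline; the function names, the 24-hour window, and the hour-histogram representation are assumptions made for the example.

```python
from datetime import datetime, timedelta

def txn_velocity_24h(txn_times: list[datetime], alert_time: datetime) -> int:
    """Velocity feature: count of the account's transactions in the
    24-hour window ending at the alert timestamp."""
    window_start = alert_time - timedelta(hours=24)
    return sum(1 for t in txn_times if window_start <= t <= alert_time)

def time_of_day_anomaly(txn_hour: int, account_hour_counts: list[int]) -> float:
    """Time-of-day anomaly score: 1 minus the empirical probability of this
    hour under the account's historical hour-of-day distribution.
    account_hour_counts is a 24-slot histogram of past transaction hours."""
    total = sum(account_hour_counts)
    if total == 0:
        return 0.0  # no history: treat as non-anomalous rather than guessing
    return 1.0 - account_hour_counts[txn_hour] / total
```

In the engagement described above, features like these were computed at alert time and persisted so that model inference and the audit trail saw exactly the same values.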
ML alert scoring and suppression engine
A gradient boosting ensemble trained on 24 months of historical alert data with analyst disposition labels. The model outputs a false positive probability score per alert; alerts scoring above a configurable suppression threshold are auto-closed with a documented rationale that is auditable and defensible under regulatory examination.
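The suppression decision itself is simple once a score exists. The sketch below shows the shape of an auto-close decision that records its own rationale for the audit trail. All names are hypothetical, the threshold value is illustrative, and the model call is stubbed out as a plain probability input.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative value only; in practice the threshold is validated
# against SAR recall on a held-out set before going live.
SUPPRESSION_THRESHOLD = 0.95

@dataclass
class SuppressionRecord:
    """Audit-trail record for one auto-closed alert."""
    alert_id: str
    fp_probability: float
    threshold: float
    top_features: list  # (feature_name, contribution) pairs that drove the score
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    @property
    def rationale(self) -> str:
        feats = ", ".join(f"{name}={val:.2f}" for name, val in self.top_features)
        return (f"Auto-closed: false-positive probability "
                f"{self.fp_probability:.3f} >= threshold {self.threshold}. "
                f"Contributing features: {feats}.")

def triage(alert_id, fp_probability, top_features,
           threshold=SUPPRESSION_THRESHOLD):
    """Return ('suppress', record) or ('queue', None) for one scored alert."""
    if fp_probability >= threshold:
        return "suppress", SuppressionRecord(
            alert_id, fp_probability, threshold, top_features)
    return "queue", None
```

The point of the dataclass is that every suppression carries its score, the threshold in force at decision time, and the contributing features, so an examiner can reconstruct any individual decision.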
Analyst queue prioritisation layer
Alerts not suppressed are re-ranked by true positive probability before entering the analyst queue. High-risk alerts surface first. Analysts see a risk score, the top contributing features, and similar historical cases — reducing investigation time per alert from 40 minutes to under 12 minutes on average.
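Re-ranking is conceptually straightforward: sort the surviving alerts by true positive probability and surface the strongest feature contributions alongside each one. A minimal sketch, with hypothetical field names:

```python
def prioritise_queue(alerts: list[dict]) -> list[dict]:
    """Rank alerts highest-risk first and annotate each with its top
    contributing features, so analysts see the riskiest cases at the
    top of the queue with the model's reasoning attached.

    Each alert dict carries 'alert_id', 'tp_probability', and
    'feature_contributions' (feature name -> signed contribution)."""
    ranked = sorted(alerts, key=lambda a: a["tp_probability"], reverse=True)
    for alert in ranked:
        top = sorted(alert["feature_contributions"].items(),
                     key=lambda kv: abs(kv[1]), reverse=True)[:3]
        alert["top_features"] = [name for name, _ in top]
    return ranked
```

Contributions are ranked by absolute value because a strongly negative contribution is as informative to an analyst as a strongly positive one.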
Model governance and audit trail
Full explainability layer using SHAP values for every suppression and scoring decision. Automated model performance monitoring with drift detection. Monthly model refresh pipeline. Complete audit trail of every auto-closed alert with suppression rationale — satisfying FinCEN examination requirements.
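The case study does not name the drift statistic used. One common choice for detecting drift in a model's score distribution in this setting is the Population Stability Index (PSI), sketched here purely as an illustration:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline score distribution
    (e.g. validation-time scores) and a live score distribution.
    A common rule of thumb: PSI > 0.2 signals significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # degenerate case: all values equal

    def bin_fractions(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(xs)
        return [max(c / n, 1e-6) for c in counts]  # floor avoids log(0)

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitor like this runs on each scoring batch; a sustained PSI above the alert line would trigger investigation and, if confirmed, the monthly refresh pipeline retrains on recent labelled dispositions.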
Constraints we worked within
- Existing monitoring platform could not be replaced — ML layer had to integrate via API without modifying core system
- Auto-suppression required documented rationale per alert to satisfy BSA/AML examination standards
- Model training data contained PII — all feature engineering and training ran in an isolated environment with no data export
- Regulatory approval process required 90-day parallel run before suppression went live
Explicitly not in scope
- SAR filing automation or FinCEN direct reporting
- Customer due diligence or KYC workflow changes
- Sanctions screening or OFAC list matching
- Core banking platform replacement or migration
System Architecture
Rules engine generates alerts. ML engine decides what analysts actually see.
How We Worked
8 months. Parallel run required. Zero regulatory findings.
Data Audit & Feature Engineering
Extracted and audited 24 months of alert history with analyst disposition labels. Identified 14 rule categories responsible for 71% of false positive volume. Built the feature engineering pipeline — 80+ features computed per transaction. Data quality issues in counterparty fields required 3 weeks of remediation before training data was usable.
Model Development & Validation
Trained gradient boosting ensemble on labelled alert history. Validated against a held-out 6-month test set. Achieved 76% false positive reduction at a suppression threshold that maintained 100% recall on confirmed SAR cases in the test set. Compliance team reviewed 200 randomly sampled auto-suppression decisions — concurred with 97.5%.
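One way to pick such a threshold, shown here as an illustrative sketch rather than the validated procedure, is to take the largest cutoff that leaves every confirmed SAR case in the validation set below it, plus a safety margin:

```python
def safe_suppression_threshold(fp_scores: list[float],
                               is_confirmed_sar: list[bool],
                               margin: float = 0.01) -> tuple[float, float]:
    """Choose the highest suppression threshold that keeps 100% recall on
    confirmed SAR cases: no SAR case's false-positive score may reach it.

    fp_scores: model false-positive probabilities on the validation set.
    is_confirmed_sar: parallel flags marking confirmed SAR cases.
    Returns (threshold, fraction_of_alerts_suppressed)."""
    sar_scores = [s for s, sar in zip(fp_scores, is_confirmed_sar) if sar]
    if not sar_scores:
        raise ValueError("validation set contains no confirmed SAR cases")
    # Threshold sits just above the worst-scoring SAR case, capped at 1.0.
    threshold = min(1.0, max(sar_scores) + margin)
    suppressed = sum(1 for s in fp_scores if s >= threshold)
    return threshold, suppressed / len(fp_scores)
```

This makes the trade-off explicit: the suppression rate achieved is whatever remains after the SAR-recall constraint is satisfied, never the other way around.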
Parallel Run & Regulatory Review
Ran ML scoring in shadow mode alongside existing analyst workflow for 90 days. Analysts worked the queue normally; ML scores were logged but not acted on. Parallel run data submitted to external compliance counsel and reviewed by the client's primary regulator contact. No objections raised to the suppression methodology.
Live Suppression & Queue Prioritisation
Suppression went live. Alert volume entering analyst queue dropped from 1,200/day to 290/day in week one. Analyst team redeployed from queue clearing to deeper investigation of high-risk alerts. SAR filing rate increased from 0.8 to 2.9 per 1,000 alerts reviewed within 30 days of go-live.
Working rhythm
- Cadence: Two-week sprints, bi-weekly compliance team reviews
- Decision owner: Chief Compliance Officer and Head of Financial Crime
- Primary metric: False positive rate and true positive detection rate
- Escalation SLA: 24 hours with written recommendation
Results
Measured at 60 days post live suppression.
76% reduction in false positive alerts entering the analyst queue
Alert volume dropped from 1,200/day to 290/day. The suppression threshold was set conservatively — 100% of confirmed SAR cases in the validation set were preserved. No regulatory findings in the 60-day post-live period.
31% improvement in true positive detection rate
SAR filing rate increased to 2.9 per 1,000 alerts reviewed — analysts now spending time on genuinely suspicious activity rather than clearing noise. The same 22-person team is now more effective, not larger.
$2.1M annual compliance operations cost eliminated
Analyst capacity freed from false positive clearing was redeployed to enhanced due diligence and investigation quality — no headcount reduction. The compliance function became more effective at the same cost.
Under 12 min average analyst time per alert, down from 40 minutes
SHAP-based feature explanations and similar case surfacing reduced investigation time significantly. Analysts report higher confidence in escalation decisions when the model's contributing factors align with their own assessment.
Is This Your Situation?
The false positive problem is not unique to this processor.
Any financial institution running rules-only AML monitoring at scale is generating noise that degrades analyst effectiveness and masks real risk.
- Your compliance team spends more time closing false positives than investigating suspicious activity
- Alert volume has grown faster than your analyst headcount — and the gap is widening
- Your SAR filing rate per alert reviewed is below industry benchmark for your transaction volume and risk profile
This engagement was scoped as an additive ML layer on top of the existing monitoring platform — no platform replacement, no regulatory re-approval of a new system, no disruption to live compliance operations. The 90-day parallel run was built into the timeline from day one.
Seen enough to have a conversation?
We scope every engagement before we quote. No sales deck. Just a direct conversation about your problem.
More Work
- Decomposing a monolithic payment platform into microservices — cutting deployment time from 11 hours to 18 minutes and shrinking PCI DSS audit scope by 74%
- Increasing RevPAR by 23% across 14 properties through ML-driven dynamic pricing
- Reducing unplanned downtime by 67% across 3 production lines through ML-based predictive maintenance