Fintech · Machine Learning
Cutting AML false positive alerts by 76% — without increasing regulatory risk
A mid-size US payment processor handling $2.4B in annual transaction volume was generating 1,200+ AML alerts per day, 96% of which were false positives requiring manual analyst review. Each alert consumed 35–45 minutes of compliance analyst time. We layered an ML triage and scoring engine on top of the existing rules-based detection platform — reducing false positives by 76% while improving the true positive detection rate by 31%, and eliminating $2.1M in annual compliance operations cost without a single regulatory finding.
Sound familiar?
- Your compliance team spends most of their day clearing false positives instead of investigating real risk
- Alert volume has grown faster than analyst headcount — and the gap keeps widening
- Your SAR filing rate per alert is below industry benchmark for your transaction volume
- 76% reduction in false positive alerts
- 31% improvement in true positive detection
- $2.1M annual compliance cost eliminated
The Problem
The compliance team wasn't failing.
The detection engine was.
The processor ran a rules-based AML transaction monitoring system — a configuration of 140+ static rules inherited from their core banking vendor. The rules had been layered over six years as the business grew and regulators requested additions. Nobody had ever removed a rule. The result was a system generating 1,200–1,400 alerts per day, of which internal audit confirmed 96% were false positives.

The compliance team had grown to 22 analysts to manage the queue. They were spending 85% of their time clearing noise — reviewing transactions that were obviously legitimate — and 15% on actual suspicious activity investigation. The ratio was inverted.
The deeper risk was not the cost — it was the attention deficit. When analysts spend 85% of their day clearing obvious false positives, their capacity to identify genuinely suspicious patterns degrades. The true positive detection rate was 0.8 SAR filings per 1,000 alerts reviewed. Industry benchmark for a processor of this size and risk profile is 3–5 per 1,000. They were missing real suspicious activity because the noise was too loud.
The business had evaluated replacing the core monitoring platform — a project estimated at $4M and 18 months. We proposed a different approach: keep the existing platform as the alert generation layer, and build an ML-based triage and scoring engine on top of it that re-ranks and suppresses alerts before they reach the analyst queue. No platform replacement. No regulatory re-approval of a new system. Additive intelligence on top of the existing infrastructure.
The cost of the false positive problem
- 96% of daily alerts were false positives (confirmed by internal audit across a 90-day sample of 38,000 alerts)
- 40 min average analyst time per alert (including documentation, escalation decision, and case closure)
- $2.1M annual compliance ops cost attributable to false positives (analyst time, tooling, and management overhead on noise)
Scope of Work
What we were asked to build
Transaction feature engineering pipeline
A real-time feature extraction layer computing 80+ behavioural and contextual features per transaction — velocity patterns, counterparty network graph metrics, merchant category deviation, time-of-day anomaly scores, and account tenure signals. Features computed at alert time and stored for model inference.
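Two of the simpler features in that family can be sketched in a few lines of Python. This is an illustration, not the client's actual pipeline; the function names, the 24-hour window, and the hour-histogram representation are assumptions made for the example.

```python
from datetime import datetime, timedelta

def txn_velocity_24h(txn_times: list[datetime], alert_time: datetime) -> int:
    """Velocity feature: count of the account's transactions in the
    24-hour window ending at the alert timestamp."""
    window_start = alert_time - timedelta(hours=24)
    return sum(1 for t in txn_times if window_start <= t <= alert_time)

def time_of_day_anomaly(txn_hour: int, account_hour_counts: list[int]) -> float:
    """Time-of-day anomaly score: 1 minus the empirical probability of this
    hour under the account's historical hour-of-day distribution.
    account_hour_counts is a 24-slot histogram of past transaction hours."""
    total = sum(account_hour_counts)
    if total == 0:
        return 0.0  # no history: treat as non-anomalous rather than guessing
    return 1.0 - account_hour_counts[txn_hour] / total
```

In the engagement described above, features like these were computed at alert time and persisted so that model inference and the audit trail saw exactly the same values.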
ML alert scoring and suppression engine
A gradient boosting ensemble trained on 24 months of historical alert data with analyst disposition labels. The model outputs a false positive probability score per alert; alerts scoring above a configurable suppression threshold are auto-closed with a documented rationale that is auditable and defensible under regulatory examination.
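The suppression decision itself is simple once a score exists. The sketch below shows the shape of an auto-close decision that records its own rationale for the audit trail. All names are hypothetical, the threshold value is illustrative, and the model call is stubbed out as a plain probability input.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative value only; in practice the threshold is validated
# against SAR recall on a held-out set before going live.
SUPPRESSION_THRESHOLD = 0.95

@dataclass
class SuppressionRecord:
    """Audit-trail record for one auto-closed alert."""
    alert_id: str
    fp_probability: float
    threshold: float
    top_features: list  # (feature_name, contribution) pairs that drove the score
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    @property
    def rationale(self) -> str:
        feats = ", ".join(f"{name}={val:.2f}" for name, val in self.top_features)
        return (f"Auto-closed: false-positive probability "
                f"{self.fp_probability:.3f} >= threshold {self.threshold}. "
                f"Contributing features: {feats}.")

def triage(alert_id, fp_probability, top_features,
           threshold=SUPPRESSION_THRESHOLD):
    """Return ('suppress', record) or ('queue', None) for one scored alert."""
    if fp_probability >= threshold:
        return "suppress", SuppressionRecord(
            alert_id, fp_probability, threshold, top_features)
    return "queue", None
```

The point of the dataclass is that every suppression carries its score, the threshold in force at decision time, and the contributing features, so an examiner can reconstruct any individual decision.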
Analyst queue prioritisation layer
Alerts not suppressed are re-ranked by true positive probability before entering the analyst queue. High-risk alerts surface first. Analysts see a risk score, the top contributing features, and similar historical cases — reducing investigation time per alert from 40 minutes to under 12 minutes on average.
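Re-ranking is conceptually straightforward: sort the surviving alerts by true positive probability and surface the strongest feature contributions alongside each one. A minimal sketch, with hypothetical field names:

```python
def prioritise_queue(alerts: list[dict]) -> list[dict]:
    """Rank alerts highest-risk first and annotate each with its top
    contributing features, so analysts see the riskiest cases at the
    top of the queue with the model's reasoning attached.

    Each alert dict carries 'alert_id', 'tp_probability', and
    'feature_contributions' (feature name -> signed contribution)."""
    ranked = sorted(alerts, key=lambda a: a["tp_probability"], reverse=True)
    for alert in ranked:
        top = sorted(alert["feature_contributions"].items(),
                     key=lambda kv: abs(kv[1]), reverse=True)[:3]
        alert["top_features"] = [name for name, _ in top]
    return ranked
```

Contributions are ranked by absolute value because a strongly negative contribution is as informative to an analyst as a strongly positive one.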
Model governance and audit trail
Full explainability layer using SHAP values for every suppression and scoring decision. Automated model performance monitoring with drift detection. Monthly model refresh pipeline. Complete audit trail of every auto-closed alert with suppression rationale — satisfying FinCEN examination requirements.
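The case study does not name the drift statistic used. One common choice for detecting drift in a model's score distribution in this setting is the Population Stability Index (PSI), sketched here purely as an illustration:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline score distribution
    (e.g. validation-time scores) and a live score distribution.
    A common rule of thumb: PSI > 0.2 signals significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # degenerate case: all values equal

    def bin_fractions(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(xs)
        return [max(c / n, 1e-6) for c in counts]  # floor avoids log(0)

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitor like this runs on each scoring batch; a sustained PSI above the alert line would trigger investigation and, if confirmed, the monthly refresh pipeline retrains on recent labelled dispositions.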
Constraints we worked within
- Existing monitoring platform could not be replaced — ML layer had to integrate via API without modifying core system
- Auto-suppression required documented rationale per alert to satisfy BSA/AML examination standards
- Model training data contained PII — all feature engineering and training ran in an isolated environment with no data export
- Regulatory approval process required 90-day parallel run before suppression went live
Explicitly not in scope
- SAR filing automation or FinCEN direct reporting
- Customer due diligence or KYC workflow changes
- Sanctions screening or OFAC list matching
- Core banking platform replacement or migration
System Architecture
Rules engine generates alerts. ML engine decides what analysts actually see.
How We Worked
8 months. Parallel run required. Zero regulatory findings.
Data Audit & Feature Engineering
Extracted and audited 24 months of alert history with analyst disposition labels. Identified 14 rule categories responsible for 71% of false positive volume. Built the feature engineering pipeline — 80+ features computed per transaction. Data quality issues in counterparty fields required 3 weeks of remediation before training data was usable.
Model Development & Validation
Trained gradient boosting ensemble on labelled alert history. Validated against a held-out 6-month test set. Achieved 76% false positive reduction at a suppression threshold that maintained 100% recall on confirmed SAR cases in the test set. Compliance team reviewed 200 randomly sampled auto-suppression decisions — concurred with 97.5%.
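One way to pick such a threshold, shown here as an illustrative sketch rather than the validated procedure, is to take the largest cutoff that leaves every confirmed SAR case in the validation set below it, plus a safety margin:

```python
def safe_suppression_threshold(fp_scores: list[float],
                               is_confirmed_sar: list[bool],
                               margin: float = 0.01) -> tuple[float, float]:
    """Choose the highest suppression threshold that keeps 100% recall on
    confirmed SAR cases: no SAR case's false-positive score may reach it.

    fp_scores: model false-positive probabilities on the validation set.
    is_confirmed_sar: parallel flags marking confirmed SAR cases.
    Returns (threshold, fraction_of_alerts_suppressed)."""
    sar_scores = [s for s, sar in zip(fp_scores, is_confirmed_sar) if sar]
    if not sar_scores:
        raise ValueError("validation set contains no confirmed SAR cases")
    # Threshold sits just above the worst-scoring SAR case, capped at 1.0.
    threshold = min(1.0, max(sar_scores) + margin)
    suppressed = sum(1 for s in fp_scores if s >= threshold)
    return threshold, suppressed / len(fp_scores)
```

This makes the trade-off explicit: the suppression rate achieved is whatever remains after the SAR-recall constraint is satisfied, never the other way around.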
Parallel Run & Regulatory Review
Ran ML scoring in shadow mode alongside existing analyst workflow for 90 days. Analysts worked the queue normally; ML scores were logged but not acted on. Parallel run data submitted to external compliance counsel and reviewed by the client's primary regulator contact. No objections raised to the suppression methodology.
Live Suppression & Queue Prioritisation
Suppression went live. Alert volume entering analyst queue dropped from 1,200/day to 290/day in week one. Analyst team redeployed from queue clearing to deeper investigation of high-risk alerts. SAR filing rate increased from 0.8 to 2.9 per 1,000 alerts reviewed within 30 days of go-live.
Working rhythm
- Cadence: Two-week sprints, bi-weekly compliance team reviews
- Decision owner: Chief Compliance Officer and Head of Financial Crime
- Primary metric: False positive rate and true positive detection rate
- Escalation SLA: 24 hours with written recommendation
Results
Measured at 60 days post live suppression.
76% reduction in false positive alerts entering the analyst queue
Alert volume dropped from 1,200/day to 290/day. The suppression threshold was set conservatively — 100% of confirmed SAR cases in the validation set were preserved. No regulatory findings in the 60-day post-live period.
31% improvement in true positive detection rate
SAR filing rate increased to 2.9 per 1,000 alerts reviewed — analysts now spending time on genuinely suspicious activity rather than clearing noise. The same 22-person team is now more effective, not larger.
$2.1M annual compliance operations cost eliminated
Analyst capacity freed from false positive clearing was redeployed to enhanced due diligence and investigation quality — no headcount reduction. The compliance function became more effective at the same cost.
Under 12 min average analyst time per alert, down from 40 minutes
SHAP-based feature explanations and similar case surfacing reduced investigation time significantly. Analysts report higher confidence in escalation decisions when the model's contributing factors align with their own assessment.
Is This Your Situation?
The false positive problem is not unique to this processor.
Any financial institution running rules-only AML monitoring at scale is generating noise that degrades analyst effectiveness and masks real risk.
- Your compliance team spends more time closing false positives than investigating suspicious activity
- Alert volume has grown faster than your analyst headcount — and the gap is widening
- Your SAR filing rate per alert reviewed is below industry benchmark for your transaction volume and risk profile
This engagement was scoped as an additive ML layer on top of the existing monitoring platform — no platform replacement, no regulatory re-approval of a new system, no disruption to live compliance operations. The 90-day parallel run was built into the timeline from day one.
Seen enough to have a conversation?
We scope every engagement before we quote. No sales deck. Just a direct conversation about your problem.
More Work
- Decomposing a monolithic payment platform into microservices — cutting deployment time from 11 hours to 18 minutes and shrinking PCI DSS audit scope by 74%
- Increasing RevPAR by 23% across 14 properties through ML-driven dynamic pricing
- Reducing unplanned downtime by 67% across 3 production lines through ML-based predictive maintenance