
Fintech · Custom Software Development

Decomposing a monolithic payment platform into microservices: cutting deployment time from 11 hours to 18 minutes and shrinking PCI DSS audit scope by 74%

A US-based B2B payment processor handling $1.8B in annual transaction volume had built its platform as a single Java monolith. As transaction volume grew 8x in three years, every deployment became an 11-hour maintenance window, PCI DSS audits consumed 6 weeks of engineering time annually, and bad releases to the reporting module twice took down live payment processing. We decomposed the monolith into 9 domain-bounded microservices using the Strangler Fig pattern — zero-downtime migration, 99.97% uptime at 90 days, and a PCI audit scope reduced from the entire codebase to a single isolated service.

Business Context

The monolith wasn't the problem. The deployment model it forced was.

The processor had built something real — $1.8B in annual volume, 340 merchant clients, a transaction success rate their sales team was proud of. The platform had been written in Java, grown incrementally over five years, and had reached 680,000 lines of code across a single deployable unit. Nine engineers worked in the same codebase. Every feature branch was a coordination exercise. Every deployment required a full system restart. The platform processed payments 24 hours a day, seven days a week. There was no maintenance window that didn't cost someone a transaction.

What the monolith was actually costing them

11 hrs
median deployment window

Real data from 34 financial services monolith migrations: median P95 deploy time 14.2 hours — softwaremodernizationservices.com

6 weeks
annual PCI DSS audit burden

Entire 680K-line codebase in scope — every service touched cardholder data paths

2
payment outages from reporting bugs

Cascade failures: a reporting-module memory leak twice brought down the transaction pipeline in 12 months

The PCI DSS problem was the one that kept their CTO up at night. Because cardholder data flowed through the monolith's shared memory space, every service — reporting, reconciliation, merchant onboarding, notifications — was technically in PCI scope. Their QSA (Qualified Security Assessor) reviewed all of it. Six weeks of engineering time per year, every year, with findings that required patching modules that had nothing to do with card data. The compliance cost was structural, not operational. It would not improve with better processes. It required a different architecture.

The deployment problem had a compounding effect on the business. The team was shipping once every 3–4 weeks. Merchant feature requests were queuing for months. Two enterprise prospects had asked for specific API capabilities during sales cycles — both deals closed before the features shipped. The CTO's estimate: $400K in ARR delayed or lost in the prior 12 months due to release velocity. The monolith was not a technical problem. It was a revenue problem.

Scope of Work

What we were asked to build

01

Domain decomposition and migration architecture

Six-week DDD (Domain-Driven Design) workshop to identify bounded contexts within the monolith. Produced a decomposition map of 9 services with extraction priority order, data ownership boundaries, and inter-service communication contracts. Strangler Fig migration plan with rollback thresholds defined before a single line was written.

02

API gateway and traffic routing layer

Kong API Gateway deployed in shadow mode from day one — logging all traffic against the monolith without routing. Gradual traffic shifting (5% → 25% → 50% → 100%) per service as each microservice reached production readiness. Auto-rollback triggered if error rate exceeded 0.3% or P99 latency exceeded 900ms during cutover.
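The auto-rollback rule above can be sketched as a simple guard. The thresholds match the case study; the class and method names are illustrative, not the client's code:

```java
// Hypothetical sketch of the cutover rollback guard. Thresholds are the
// ones stated in the case study: error rate > 0.3% or P99 latency > 900 ms
// triggers an automatic revert to the monolith.
public class RollbackGuard {
    private static final double MAX_ERROR_RATE = 0.003;  // 0.3%
    private static final long MAX_P99_LATENCY_MS = 900;

    /** Returns true when traffic should auto-revert to the monolith. */
    public static boolean shouldRollback(double errorRate, long p99LatencyMs) {
        return errorRate > MAX_ERROR_RATE || p99LatencyMs > MAX_P99_LATENCY_MS;
    }
}
```

In practice a guard like this would be evaluated continuously against gateway metrics during each 5% → 25% → 50% → 100% traffic-shift window.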

03

Isolated PCI cardholder data service

Card data handling extracted into a single, network-isolated service with its own database, mTLS-only ingress, and hardware security module (HSM) integration for key management. All other services receive only payment tokens — no cardholder data in scope. PCI DSS audit surface reduced from 680,000 lines to the 14,000-line card vault service.
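The token boundary can be sketched as follows. This is a hypothetical illustration only: an in-memory map stands in for the HSM-backed vault storage, and the names are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the scope boundary: only the vault ever holds the
// PAN; every other service receives an opaque token. The real vault uses
// HSM-backed key management and network isolation, not an in-memory map.
public class CardVault {
    private final Map<String, String> tokenToPan = new HashMap<>();

    /** Accepts a PAN at the isolated ingress and returns a token. */
    public String tokenize(String pan) {
        String token = "tok_" + UUID.randomUUID();
        tokenToPan.put(token, pan);
        return token;
    }

    /** Package-private: only reachable inside the vault's boundary. */
    String detokenize(String token) {
        return tokenToPan.get(token);
    }
}
```

Because downstream services only ever see the `tok_` reference, they fall outside the cardholder data environment.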

04

Event-driven transaction pipeline

Apache Kafka event bus replacing synchronous inter-module calls for the transaction lifecycle — PaymentInitiated, PaymentAuthorized, FraudChecked, PaymentSettled, ReconciliationQueued. Async processing decoupled the transaction pipeline from reporting and reconciliation, eliminating the cascade failure mode.
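A minimal sketch of the lifecycle above; the event names come from the case study, while the in-memory list stands in for a Kafka producer so the flow is visible without a broker:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the async transaction lifecycle. publish() stands in for
// producer.send(...); reporting and reconciliation consume these events
// off the bus, decoupled from the synchronous payment path.
public class TransactionPipeline {
    enum Event {
        PaymentInitiated, PaymentAuthorized, FraudChecked,
        PaymentSettled, ReconciliationQueued
    }

    private final List<Event> bus = new ArrayList<>();

    void publish(Event e) { bus.add(e); }

    /** Drives one transaction through the lifecycle described above. */
    List<Event> process() {
        publish(Event.PaymentInitiated);
        publish(Event.PaymentAuthorized);
        publish(Event.FraudChecked);
        publish(Event.PaymentSettled);
        publish(Event.ReconciliationQueued);
        return bus;
    }
}
```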

05

Observability and incident response infrastructure

Distributed tracing (Jaeger), structured logging, and Prometheus/Grafana dashboards deployed before the first service went live. Any transaction traceable end-to-end within seconds. MTTR target: under 15 minutes for P1 incidents. Achieved 11-minute median MTTR at 90 days post go-live.
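As an illustration of the MTTR target, the median over P1 restore times might be computed like this; the sample durations in the test are invented, not the client's incident history:

```java
import java.util.Arrays;

// Illustrative median-MTTR calculation against the 15-minute P1 target.
public class MttrReport {
    /** Median of restore durations, in minutes. */
    public static double medianMinutes(long[] durationsMinutes) {
        long[] sorted = durationsMinutes.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static boolean meetsTarget(long[] durationsMinutes, long targetMinutes) {
        return medianMinutes(durationsMinutes) < targetMinutes;
    }
}
```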

Constraints we worked within

  • Zero planned downtime — the platform processed live payments 24/7 and had no maintenance window; migration had to be fully live-traffic capable
  • PCI DSS 4.0 compliance required throughout migration — no intermediate state where compliance posture regressed
  • Existing merchant API contracts could not change — all 340 merchants integrated against the monolith's API surface; the gateway had to present an identical interface
  • 9-month delivery window — tied to the client's annual QSA audit cycle; the new architecture had to be auditable before the next scheduled assessment
  • Java monolith could not be rewritten — Strangler Fig only; a parallel rewrite was explicitly ruled out after reviewing failure-rate data (a 33% success rate across the industry)

Explicitly not in scope

  • Merchant-facing portal or dashboard
  • Fraud model development (existing third-party fraud service retained)
  • Acquiring bank integrations (existing connections preserved through the gateway)
  • Data warehouse or analytics infrastructure

System Architecture

One monolith decommissioned. Nine bounded services. Zero downtime migration.

Core payment pipeline (synchronous)
Supporting services (async via Kafka)

How We Worked

9 months. 4 phases. One constraint that shaped everything: live payments never stop.

Month 1–2

DDD Workshop & Decomposition Map

Embedded with the engineering team. Six weeks of domain mapping — interviewed product, engineering, and compliance stakeholders. Identified 9 bounded contexts. Extraction order set: supporting domains first (notifications, merchant onboarding, reconciliation), core payment pipeline last. Kong gateway deployed in shadow mode. Zero routing changes.

Month 3–5

Supporting Services Extraction

Notifications, merchant onboarding, and reconciliation extracted and live. Kafka event bus stood up. Each service extracted with its own Postgres instance — database-per-service from day one. Traffic shifted to each new service at 5% → 100% over two-week windows with automated rollback guards. No incidents.

Month 6–8

Core Pipeline & Card Vault

Transaction processing, authorization, and settlement extracted. Card vault service built with HSM integration and network isolation — this was the highest-risk extraction. Ran parallel processing (monolith + new service) for 3 weeks before cutover, comparing outputs on every transaction. PCI scope reduction validated with QSA before traffic shift.
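The parallel-run comparison can be sketched as an output matcher; the record fields here are illustrative assumptions, not the actual settlement schema:

```java
import java.math.BigDecimal;
import java.util.Objects;

// Sketch of the 3-week parallel run: every transaction is processed by both
// the monolith and the new service, and outputs are compared before any
// traffic shifts. Field names are hypothetical.
public class ParallelValidator {
    record Outcome(String status, BigDecimal settledAmount) {}

    /** True when the monolith and the microservice agree on a transaction. */
    public static boolean matches(Outcome monolith, Outcome microservice) {
        return Objects.equals(monolith.status(), microservice.status())
            && monolith.settledAmount().compareTo(microservice.settledAmount()) == 0;
    }
}
```

Comparing `BigDecimal` with `compareTo` rather than `equals` treats 10.0 and 10.00 as the same settled amount, which matters when two codebases format amounts differently.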

Month 9

Monolith Decommission & Handoff

Final monolith modules decommissioned. 100% of traffic on microservices. Penetration test completed — 1 medium finding (resolved), 0 critical or high. Full runbook, architecture documentation, and on-call playbooks delivered. On-site for decommission week.

Working rhythm

  • Cadence: Two-week sprints, weekly architecture review with CTO
  • Decision owner: CTO (architecture), Head of Compliance (PCI decisions)
  • Rollback threshold: Error rate >0.3% or P99 >900ms → auto-revert to monolith
  • Parallel validation: Card vault ran 3-week parallel mode before any traffic shift
  • On-site: Month 1–2 DDD workshop and decommission week

Results

Measured at 90 days post monolith decommission.

18 min
deployment time — down from an 11-hour maintenance window

Was: 11-hour median deployment requiring full system restart and live payment suspension

Each of the 9 services deploys independently via its own CI/CD pipeline. A change to the reconciliation service no longer touches the transaction pipeline. The team shipped 34 deployments in the first 30 days post go-live — more than they had shipped in the prior 8 months combined.


74%
reduction in PCI DSS audit scope — from 680,000 lines to a single 14,000-line card vault service

Was: entire monolith codebase in PCI scope; 6 weeks of annual audit burden

The QSA confirmed the new scope boundary before the card vault went live. The first post-migration audit took 9 days — down from 6 weeks. All other services handle only tokenized payment references; no cardholder data in scope outside the vault.


99.97%
uptime at 90 days — up from 99.1% on the monolith

Was: 99.1% uptime; two cascade failures in 12 months caused by reporting module bugs

Service isolation eliminated the cascade failure mode. A bug in the reconciliation service now affects reconciliation only — the transaction pipeline continues processing. The two incident types that caused the prior outages have not recurred. MTTR for P1 incidents: 11-minute median.


increase in peak transaction throughput — same infrastructure spend

Was: monolith scaled as a single unit; payment processing and reporting competed for the same resources

Transaction processing and fraud check services now scale independently on Kubernetes. During end-of-month reconciliation peaks — previously the highest-risk period for the monolith — the reconciliation service scales horizontally while the transaction pipeline remains unaffected. Infrastructure cost per transaction reduced by 34%.


What This Means for You

The decomposition approach we used here is not unique to this processor. It applies to any payment platform where the monolith has become the primary constraint on deployment velocity, compliance posture, and the ability to scale the transaction pipeline independently.

This engagement used the Strangler Fig pattern exclusively — no parallel rewrite, no big-bang cutover, no planned downtime. The monolith processed live payments until the day it was decommissioned. Nine months from kickoff to full decommission across 9 services and $1.8B in annual transaction volume.

Tell us what you're building.

"They don't force us to go their way; instead, they follow our way of thinking."

★★★★★ Marek Strzelczyk, Head of New Products & IT, GS1 Polska

What happens next

  • We respond to every inquiry within 1 business day.
  • A 30-minute discovery call — no templates, no sales scripts.
  • An honest assessment of fit. We'll tell you early if we're not the right partner.