CASE STUDY 02

Delivering: Registered Traveler System at Orlando Airport

Most large government systems don't go live in public.

They go through long development cycles, controlled testing, staged rollouts, and multiple layers of review. By the time anyone outside the program sees them, they've already been proven.

This one didn't.

We were asked to build and launch a biometric identity system — end to end — in about twelve weeks. Enrollment kiosks, biometric capture, backend identity management, integration with TSA systems, card issuance. All of it.

And it wasn't a quiet launch. It was designed to be public. Media coverage was expected. There was a narrative attached to it.

I volunteered to take it on.

We pulled together a small team — about a dozen people, all strong in their domains — and built the system under extreme time compression. Hardware from one group, biometrics from another, software integration, infrastructure, airport deployment. Everything moving at once.

There was no real opportunity for full-scale testing. We tested what we could, where we could, but the first true test was going to be real customers in a live airport.

On the morning of the launch, the system worked.

People enrolled. The process flowed. You could see it: capture the data, submit, immediate acknowledgment, move to the next person. Lines were short. It felt like we had pulled it off.

By early afternoon, it started to break.

All of the kiosks depended on a backend database. Every enrollment had to be written and acknowledged before the system would move forward. Under load, the database began to saturate. Response times slowed. Then they stalled.
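That pattern is easy to picture in code. The sketch below is not the actual system; the table, columns, and use of sqlite3 are illustrative assumptions. The point is the shape of the flow: one synchronous write per applicant, and the kiosk does not advance until the database acknowledges the commit.

```python
# Minimal sketch of the pattern described above (illustrative only, not the
# real system): every enrollment is a synchronous write, and the kiosk does
# not move on until the database acknowledges it.
import sqlite3

def submit_enrollment(conn: sqlite3.Connection, applicant: dict) -> int:
    """Write one enrollment and block until the database acknowledges it."""
    cur = conn.execute(
        "INSERT INTO enrollments (full_name, document_id, biometric_ref) "
        "VALUES (?, ?, ?)",
        (applicant["full_name"], applicant["document_id"], applicant["biometric_ref"]),
    )
    conn.commit()          # the kiosk waits here for the acknowledgment
    return cur.lastrowid   # only then does the agent take the next person

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enrollments "
        "(id INTEGER PRIMARY KEY, full_name TEXT, document_id TEXT, biometric_ref TEXT)"
    )
    print(submit_enrollment(conn, {
        "full_name": "Jane Doe",
        "document_id": "P1234567",
        "biometric_ref": "scan-001",
    }))
```

With every kiosk blocked on that acknowledgment, any slowdown in the database shows up immediately as stalled screens at the front of the line.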

At the kiosks, nothing advanced.

The enrollment agents had only seen the system working properly in training. When it stopped responding, they didn't know whether a transaction had gone through. Some restarted the process and re-entered applicants, which increased the load and made the problem worse.

Lines grew. Four people became ten, then twenty. Some customers stayed. Others walked away, deciding to try again on a future trip.

In the background, the escalation chain lit up. Field teams calling their leads, leads calling program management, everyone asking the same question: what is happening?

We could see quickly that the database was maxed out. The question was why.

The initial effort focused on system metrics — CPU, memory, throughput. That didn't get us to a solution. Late in the day, we shifted approach. Instead of looking at symptoms, we went back to the code.

The issue was in the queries. They were written in a way that forced the database to do far more work than necessary under load. Individually, they looked acceptable. At scale, they collapsed the system.

We rewrote them.
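The original queries aren't reproduced here, but the kind of rewrite involved is a familiar one. As a generic, hypothetical illustration: a check that pulls whole tables back to the application and filters them there looks fine against a small test dataset, while a targeted, indexed lookup does the same job in one cheap round trip and keeps working as the table grows. Table and column names below are assumptions.

```python
# Hypothetical before/after illustrating the general class of problem:
# work that should happen in the database being done row-by-row in the
# application, versus a single indexed lookup.
import sqlite3

def is_duplicate_slow(conn: sqlite3.Connection, document_id: str) -> bool:
    # Pulls every row back and filters in Python: acceptable at ten rows,
    # ruinous once enrollments number in the tens of thousands.
    rows = conn.execute("SELECT document_id FROM enrollments").fetchall()
    return any(r[0] == document_id for r in rows)

def is_duplicate_fast(conn: sqlite3.Connection, document_id: str) -> bool:
    # Pushes the filter into the database and lets an index do the work,
    # so the cost per check stays flat as the table grows.
    row = conn.execute(
        "SELECT 1 FROM enrollments WHERE document_id = ? LIMIT 1",
        (document_id,),
    ).fetchone()
    return row is not None

# An index along these lines is what makes the fast version cheap:
# CREATE INDEX idx_enrollments_document_id ON enrollments (document_id);
```

Individually, the slow form is invisible. Multiplied across every kiosk, every transaction, all day, it is the difference between a database that keeps up and one that saturates.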

That night, we deployed an updated build and tested it across the kiosks under load. The difference was immediate. Transactions completed quickly. The system behaved the way it had during limited testing, but now at real volume.
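The overnight test was the same idea you would sketch in a few lines today: simulate many kiosks submitting at once and watch whether completion times stay flat. The harness below is a minimal stand-in, with the worker function, concurrency level, and timings as assumptions rather than what we actually ran.

```python
# Minimal load-test sketch: many simulated kiosks submitting concurrently
# while we record per-transaction latency. In a real test the worker would
# call the enrollment backend; here a sleep stands in for that call.
import concurrent.futures
import time

def simulated_enrollment(kiosk_id: int) -> float:
    start = time.perf_counter()
    time.sleep(0.05)  # stand-in for the round trip to the backend
    return time.perf_counter() - start

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
        durations = list(pool.map(simulated_enrollment, range(200)))
    print(f"max latency: {max(durations):.3f}s, "
          f"mean: {sum(durations) / len(durations):.3f}s")
```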

The next morning, it ran.

From the outside, the story shifted from "this might not work" to "this works." The public launch held. The program continued.

The work didn't end there. For weeks afterward, we focused on stability, data integrity, and resilience. When you've had a failure like that in public, you don't assume anything. You verify everything.

What mattered was that we didn't lose the moment.

We built something in a fraction of the normal time, put it in front of real users under real scrutiny, hit a failure, and recovered fast enough that the system — and the program — survived intact.