Case Study · Personal Project

Building a Public Face Recognition API — And Why I Over-Engineered It

Computer Vision · Backend · Python · FastAPI · Firebase

A face-lookup feature shipped as a working API, with 1:N search returning in under a second, without the scale infrastructure it didn't need yet.

3 endpoints, ~737 ms to ~1.8 s latency across 20K+ facial records on CPU-only inference.

20K+

Facial records tested

737 ms

Fastest endpoint

FaceNet512

Model

The Starting Point

I was learning computer vision and wanted to go beyond tutorials. As a backend engineer, the natural extension was to build something production-shaped. After checking the idea with a few AI tools and concluding it was feasible, I started building a public face recognition API: three endpoints, real inference, real dataset.

What I Built and Decided

GPU vs CPU inference

First decision: run FaceNet512 on GPU or CPU. On GPU, cold start took 10+ seconds due to CUDA driver loading. On CPU, cold start was faster, and inference latency was close enough to GPU that the difference didn't justify the overhead. I went with CPU inference on a private server.
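The comparison above depends on timing cold start separately from steady-state inference. A minimal sketch of such a harness follows. The names `measure_cold_and_warm`, `fake_load`, and `fake_infer` are hypothetical stand-ins, not the project's actual code, and the sleeps only simulate a slow load and a fast inference call.

```python
import time
from typing import Any, Callable, Dict, List


def measure_cold_and_warm(
    load_model: Callable[[], Any],
    infer: Callable[[Any], Any],
    warm_runs: int = 5,
) -> Dict[str, float]:
    """Time model loading plus first call (cold) separately from later calls (warm)."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    # Cold path: load the model and run the first inference, which on a real
    # backend would include driver and weight loading.
    start = time.perf_counter()
    model = load_model()
    infer(model)
    cold_ms = (time.perf_counter() - start) * 1000

    # Warm path: the model is already in memory, so only inference is timed.
    warm_samples: List[float] = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        infer(model)
        warm_samples.append((time.perf_counter() - start) * 1000)

    return {
        "cold_ms": cold_ms,
        "warm_avg_ms": sum(warm_samples) / len(warm_samples),
    }


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_load() -> str:
    time.sleep(0.05)  # simulates a slow one-off load
    return "model"


def fake_infer(model: str) -> int:
    time.sleep(0.005)  # simulates a fast per-request inference
    return len(model)


result = measure_cold_and_warm(fake_load, fake_infer)
print(result)
```

Running the same harness once per backend, with real load and inference callables in place of the fakes, would give comparable cold and warm figures for each.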

Microservice architecture

The original plan was a microservice design built for scale — separate services, multi-region pricing model, and an evaluation of Firebase vs Supabase vs full self-host. This was the wrong frame for an MVP. All of it was running on a single server anyway.

Storage and embedding strategy

Facial photos are stored resized to 512 px. DeepFace automatically generates embeddings on the first scan and caches them in a .pkl file, so search runs against cached embeddings, not raw images. Metadata is stored in Firestore. At 20K+ records, Firestore latency became noticeable, and I considered migrating to self-hosted PostgreSQL on the same server.
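The caching idea can be illustrated with a simplified, standard-library sketch. This is not DeepFace's actual cache format or search code. `load_or_build_cache`, `find_closest`, and the `embed` callable are hypothetical names, and cosine distance is an assumed metric chosen for illustration.

```python
import math
import os
import pickle
from typing import Callable, Dict, List, Tuple

Embedding = List[float]


def load_or_build_cache(
    cache_path: str,
    image_paths: List[str],
    embed: Callable[[str], Embedding],
) -> Dict[str, Embedding]:
    """Return {image_path: embedding}, computing embeddings only on the first scan."""
    if os.path.exists(cache_path):
        # Later calls skip inference entirely and read the pickle.
        # Only unpickle files you created yourself; pickle can execute code.
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)

    cache = {path: embed(path) for path in image_paths}
    with open(cache_path, "wb") as fh:
        pickle.dump(cache, fh)
    return cache


def cosine_distance(a: Embedding, b: Embedding) -> float:
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def find_closest(query: Embedding, cache: Dict[str, Embedding]) -> Tuple[str, float]:
    """1:N search: return the cached entry nearest to the query embedding."""
    return min(
        ((path, cosine_distance(query, emb)) for path, emb in cache.items()),
        key=lambda item: item[1],
    )
```

With this shape, a search request embeds only the query image. Every enrolled face is compared as a stored vector, so the stored photos are not re-processed on each request.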

The over-engineering realization

The microservice design, multi-region pricing research, and database provider debates were all premature. None of them mattered at MVP stage. The API worked. The latency was acceptable. The lesson: solve the problem in front of you, not the one three steps ahead.

Endpoint Latency

Endpoint	Description	Latency
/enroll	Register a new face	~1,844 ms
/verify	1:1 identity check	~1,037 ms
/find	1:N search across 20K+ records	~737 ms

Tested against 20K+ facial embeddings on a private server using CPU inference.

Takeaway

The API works. The latency is acceptable for the use case. But the bigger lesson was about scope: microservice design, multi-region pricing, and database provider debates are not MVP problems. Ship first. Optimize the parts that actually break.