HRM at Large Database Scale — Why Most Platforms Slow Down
An HRM that runs comfortably at 100 employees can become unusable at 1000, and the customer often does not see it coming. The slowdown is gradual, the patterns are predictable, and the fix is architectural — not a tuning knob you turn at the last minute.
What slows down first
In rough order of when they start hurting:
1. The attendance dashboard
The dashboard typically aggregates "who is in today" across the whole company. At 100 employees this is a sub-second query. At 1000 it is 5+ seconds. At 5000 the dashboard starts timing out. The root cause is almost always missing or poorly designed indexes on the attendance event collection.
2. Payroll runs
A payroll run reads attendance, leave, overtime, deductions, and tax slabs for every active employee, computes net pay, writes a payslip record, and sends an email. Naive implementations hold a database transaction open for the whole run and pull everything into memory. At 100 employees that is fine. At 1000 the run starts taking 30+ minutes; at 5000 it cannot finish inside a single API request window. The fix is to run payroll as a queued background job in a separate worker tier.
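One way to picture the queued approach: instead of computing payroll inside the request, the API splits the run into small batches and hands them to a queue. The sketch below only shows the planning step. The payload shape, the function name planPayrollRun, and the batch size are assumptions for illustration, not Zaffre HRM's actual schema.

```typescript
// Hypothetical job payload; field names are illustrative only.
interface PayrollBatchJob {
  runId: string;
  company: string;
  employeeIds: string[];
}

// Split one payroll run into small batches that can be processed
// (and retried) independently by worker processes.
function planPayrollRun(
  runId: string,
  company: string,
  employeeIds: string[],
  batchSize: number,
): PayrollBatchJob[] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const jobs: PayrollBatchJob[] = [];
  for (let i = 0; i < employeeIds.length; i += batchSize) {
    jobs.push({ runId, company, employeeIds: employeeIds.slice(i, i + batchSize) });
  }
  return jobs;
}

// The API handler would enqueue these jobs and return right away
// (for example with the runId), leaving the computation to the workers.
const employeeIds = Array.from({ length: 1000 }, (_, i) => "emp-" + i);
const jobs = planPayrollRun("run-2024-06", "acme", employeeIds, 250);
console.log(jobs.length); // 4
```

Because each batch is its own job, no single transaction or in-memory working set has to cover the whole company.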
3. Year-end exports
An annual attendance report or tax certificate export reads a year of events for every employee. The data volume scales linearly with headcount, but an export that buffers everything in memory does not. By 2000 employees the export is reading on the order of millions of event rows in one pass (a few events per employee per working day), and anything but a streaming export will run out of memory.
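A streaming export produces one output row at a time instead of building the whole file in memory. The sketch below uses a generator to show the idea. The AttendanceEvent shape is an assumption, fields are assumed to contain no commas, and a real database cursor would typically be an async iterable rather than the synchronous one used here to keep the example short.

```typescript
// Hypothetical event shape; field names are illustrative only.
interface AttendanceEvent {
  employee: string;
  day: string;
  kind: "in" | "out";
  at: string;
}

// Yield CSV lines one by one. Only the current event is held in memory,
// so the size of the source does not affect the memory footprint.
function* toCsvLines(events: Iterable<AttendanceEvent>): Generator<string> {
  yield "employee,day,kind,at";
  for (const e of events) {
    yield [e.employee, e.day, e.kind, e.at].join(",");
  }
}

// Usage: each line can be written to the response or a file as it is produced.
const sampleEvents: AttendanceEvent[] = [
  { employee: "emp-1", day: "2024-01-02", kind: "in", at: "09:01" },
  { employee: "emp-1", day: "2024-01-02", kind: "out", at: "17:30" },
];
for (const line of toCsvLines(sampleEvents)) {
  console.log(line);
}
```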
4. The audit log query
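"Show me everything user X did last quarter" against a billion-row audit table without the right index will scan the entire table. With a compound index on (company, actor, time) the same query becomes a bounded range read.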
5. Reports the customer builds themselves
Custom report builders that translate UI selections to ad-hoc database queries hit query planner limits quickly at scale. The platform has to either constrain what reports can be built or pre-aggregate the heavy ones.
The architectural fixes
The patterns that keep an HRM fast at large database scale:
Indexes on every query path
Every collection has a compound index on (company, ...) where the rest of the key matches the query patterns the application actually uses. Tenant scoping applies to every query, so the company field must be the leading key of each index.
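The leading-key rule is easy to break by accident, so it can help to keep index definitions in one place and check them. The sketch below writes index keys in the { field: 1 } form that MongoDB uses for index specifications and flags any that do not start with company. The collection names, field names, and the checking function are assumptions for illustration.

```typescript
// Index key in MongoDB's key-specification form: field name -> sort direction.
type IndexKey = Record<string, 1 | -1>;

// Hypothetical collections and fields, chosen to mirror the query paths above.
const tenantIndexes: Record<string, IndexKey[]> = {
  attendance_events: [{ company: 1, employee: 1, day: -1 }],
  audit_log: [{ company: 1, actor: 1, at: -1 }],
  payslips: [{ company: 1, period: -1, employee: 1 }],
};

// Return "collection:field_field_..." for every index whose first key is not company.
function indexesNotLeadingOnCompany(indexes: Record<string, IndexKey[]>): string[] {
  const offenders: string[] = [];
  for (const collection of Object.keys(indexes)) {
    for (const key of indexes[collection]) {
      const fields = Object.keys(key); // insertion order is preserved for string keys
      if (fields[0] !== "company") {
        offenders.push(collection + ":" + fields.join("_"));
      }
    }
  }
  return offenders;
}

console.log(indexesNotLeadingOnCompany(tenantIndexes)); // []
```

A check like this could run in a test suite or at startup, before the definitions are handed to the database driver.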
Read paths use projection
Reading whole documents when you only need three fields is wasteful at 100x scale. The HRM should be opinionated about projections — list endpoints return list-shaped documents, detail endpoints return detail-shaped documents.
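As a rough illustration, the list and detail shapes can be written down as two inclusion projections in MongoDB's { field: 1 } form and passed to the query by the respective endpoint. The document shape and field names below are assumptions, and the project function is only an in-memory stand-in for what the database does with an inclusion projection.

```typescript
// Hypothetical employee document; field names are illustrative only.
interface EmployeeDoc {
  _id: string;
  company: string;
  name: string;
  department: string;
  salaryHistory: number[];
  documents: string[];
}

// One projection per endpoint shape.
const LIST_PROJECTION: Record<string, 1> = { name: 1, department: 1 };
const DETAIL_PROJECTION: Record<string, 1> = {
  name: 1,
  department: 1,
  salaryHistory: 1,
  documents: 1,
};

// In-memory stand-in for an inclusion projection:
// keep _id plus the listed fields and drop everything else.
function project(doc: EmployeeDoc, projection: Record<string, 1>): Record<string, unknown> {
  const source = doc as unknown as Record<string, unknown>;
  const out: Record<string, unknown> = { _id: doc._id };
  for (const field of Object.keys(projection)) {
    if (field in source) out[field] = source[field];
  }
  return out;
}

const exampleDoc: EmployeeDoc = {
  _id: "emp-1",
  company: "acme",
  name: "Ada",
  department: "Finance",
  salaryHistory: [4000, 4200],
  documents: ["contract.pdf"],
};
console.log(Object.keys(project(exampleDoc, LIST_PROJECTION))); // [ '_id', 'name', 'department' ]
```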
Long jobs in a worker tier
Payroll, year-end exports, attendance recomputation, bulk email — all queued, all run in a worker tier separate from the API. Worker failures retry; API requests do not block.
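The retry behavior can be sketched without any queue library. The in-memory loop below is an assumption-laden toy, not how BullMQ is implemented: it runs each job, puts a failed job back on the queue, and gives up after a fixed number of attempts. The Job shape, the drainQueue name, and the attempt limit are all hypothetical.

```typescript
// Hypothetical job envelope for the in-memory sketch.
interface Job<T> {
  id: string;
  data: T;
  attempts: number;
}

interface Outcome {
  completed: string[];
  failed: string[];
}

// Process jobs in order. A job whose handler throws is re-queued at the back
// until it has been tried maxAttempts times, then it is reported as failed.
function drainQueue<T>(
  jobs: Job<T>[],
  handler: (data: T) => void,
  maxAttempts: number,
): Outcome {
  const pending = jobs.map((j) => ({ ...j })); // copy so the caller's jobs are not mutated
  const outcome: Outcome = { completed: [], failed: [] };
  while (pending.length > 0) {
    const job = pending.shift()!;
    try {
      handler(job.data);
      outcome.completed.push(job.id);
    } catch (_err) {
      job.attempts += 1;
      if (job.attempts < maxAttempts) {
        pending.push(job);
      } else {
        outcome.failed.push(job.id);
      }
    }
  }
  return outcome;
}
```

The point of the separation is visible even in the toy: a failing job costs the worker some retries, while the API request that enqueued it has already returned.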
Pagination everywhere, including silently
Every list endpoint paginates. Endpoints that ostensibly return "all" of something should actually return the first page and a continuation token. This prevents one large customer from accidentally requesting a 50,000-row response.
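A minimal sketch of the idea, assuming rows sorted by a numeric id and a continuation token that is simply the last id returned. The names listPage and MAX_PAGE_SIZE and the cap of 100 are assumptions. In a database the filter would be a range condition on the indexed key rather than an in-memory scan.

```typescript
interface Row {
  id: number;
  name: string;
}

interface Page {
  items: Row[];
  nextToken: string | null; // null means there is nothing left to fetch
}

// Hypothetical server-side cap; a caller cannot raise it.
const MAX_PAGE_SIZE = 100;

// rows must be sorted by id ascending. token is the last id of the previous page.
function listPage(rows: Row[], token: string | null, requested?: number): Page {
  const limit = Math.max(1, Math.min(requested ?? MAX_PAGE_SIZE, MAX_PAGE_SIZE));
  const afterId = token === null ? -Infinity : Number(token);
  const items = rows.filter((r) => r.id > afterId).slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = last !== undefined && rows.some((r) => r.id > last.id);
  return { items, nextToken: hasMore && last !== undefined ? String(last.id) : null };
}

// A request for "all" 50,000 rows still gets one capped page and a token.
const demoRows: Row[] = Array.from({ length: 250 }, (_, i) => ({ id: i + 1, name: "row-" + (i + 1) }));
const firstPage = listPage(demoRows, null, 50000);
console.log(firstPage.items.length, firstPage.nextToken); // 100 100
```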
Append-only event logs for attendance
Attendance is the hottest write path in the HRM. An append-only event log with deterministic derivation of the day's flag (see our deep dive on idempotent attendance) keeps writes cheap and prevents the day-of-the-month spike that brings down naive designs.
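The sketch below shows the two properties in miniature: appends are idempotent on a (company, employee, day, source) key, and the day's flag is derived from the log rather than stored and updated. The flag values, the 09:30 lateness cutoff, and the class and method names are assumptions. In a real database the duplicate check would more likely be a unique index than an in-memory set.

```typescript
type Source = "kiosk" | "mobile" | "web";
type DayFlag = "absent" | "late" | "present";

// Hypothetical check-in event; "at" is a zero-padded "HH:MM" string.
interface CheckIn {
  company: string;
  employee: string;
  day: string;
  source: Source;
  at: string;
}

class AttendanceLog {
  private events: CheckIn[] = [];
  private seen = new Set<string>();

  // Append-only write. A repeated delivery of the same check-in adds nothing.
  append(e: CheckIn): boolean {
    const key = [e.company, e.employee, e.day, e.source].join("|");
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.events.push(e);
    return true;
  }

  // Deterministic derivation: the same events always give the same flag.
  dayFlag(company: string, employee: string, day: string, lateAfter = "09:30"): DayFlag {
    const times = this.events
      .filter((e) => e.company === company && e.employee === employee && e.day === day)
      .map((e) => e.at)
      .sort();
    if (times.length === 0) return "absent";
    return times[0] > lateAfter ? "late" : "present";
  }

  size(): number {
    return this.events.length;
  }
}

const log = new AttendanceLog();
const checkIn: CheckIn = { company: "acme", employee: "emp-1", day: "2024-06-03", source: "kiosk", at: "09:05" };
log.append(checkIn);
log.append(checkIn); // a retried request is ignored
console.log(log.size(), log.dayFlag("acme", "emp-1", "2024-06-03")); // 1 present
```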
Pre-aggregated dashboards
The "company-wide today" dashboard should not run a live aggregation across the whole attendance collection on every page load. The platform should maintain a small denormalized "today" record that the API reads directly. Rebuilding the denormalization is cheap; querying the raw collection at scale is not.
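A minimal sketch of such a record, assuming one small summary per company and day that is incremented when an employee's first check-in of the day arrives. The field names, the function names, and the assumption that duplicates were already filtered upstream are all illustrative.

```typescript
// Hypothetical denormalized record the dashboard reads directly.
interface TodaySummary {
  company: string;
  day: string;
  checkedIn: number;
}

// Assumed to be an employee's first check-in of the day (duplicates removed upstream).
interface FirstCheckIn {
  company: string;
  employee: string;
  day: string;
}

type Summaries = Map<string, TodaySummary>;

function summaryKey(company: string, day: string): string {
  return company + "|" + day;
}

// Write path: one small update per event instead of a scan per page load.
function applyCheckIn(summaries: Summaries, e: FirstCheckIn): void {
  const key = summaryKey(e.company, e.day);
  const current = summaries.get(key) ?? { company: e.company, day: e.day, checkedIn: 0 };
  summaries.set(key, { ...current, checkedIn: current.checkedIn + 1 });
}

// Read path: a single keyed lookup.
function readToday(summaries: Summaries, company: string, day: string): TodaySummary {
  return summaries.get(summaryKey(company, day)) ?? { company, day, checkedIn: 0 };
}

// The summary is disposable: it can always be rebuilt by replaying the log.
function rebuildFromLog(log: FirstCheckIn[]): Summaries {
  const rebuilt: Summaries = new Map();
  for (const e of log) applyCheckIn(rebuilt, e);
  return rebuilt;
}
```

Keeping a rebuild path means a bug in the incremental update is recoverable, since the event log stays the source of truth.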
Clustered API tier
A single Node process per company stops working past a certain headcount. The API runs as a cluster of identical processes behind a load balancer — Zaffre HRM uses four cluster workers in production, with horizontal scale-out.
What this looks like in Zaffre HRM
The Zaffre HRM production architecture today, optimized for 1000+ employee customers:
- API tier: 4 clustered Fastify processes behind PM2, horizontally scalable.
- Worker tier: 4 BullMQ worker processes for payroll, exports, attendance recomputation, email delivery.
- Cron tier: dedicated cron process for daily close-out, payroll automation, leave accrual.
- Data tier: MongoDB with compound indexes leading on company ObjectId for every tenant collection.
- Attendance: append-only event log, idempotent check-in keyed on (company, employee, day, source).
- Payroll: deterministic rule engine, end-of-cycle reconciliation, runs as a queued job.
- Audit: immutable log on every state change, indexed for (company, actor, time-range).
The result is an HRM that serves 100 employees and 5000 employees on the same code path, with predictable performance and no surprise cliffs.
See the 1000+ employees page for the architectural detail, or book a demo to walk through it.