Dataset: 1M rows · Fastest: 20s · Slowest (completed): 3m 19s
What’s the most efficient way to insert 1 million records into MongoDB on constrained hardware? I ran this benchmark on a GCP e2-micro VM (1 shared vCPU, 1 GB RAM, Intel Broadwell) using 1 million rows from the Yelp dataset. The constraint is intentional: decisions that don’t matter on a 64 GB machine start to matter here.
main.js — row-by-row insert
Loads the entire CSV into memory, then inserts each record individually. It crashed after ~2 minutes with a heap out-of-memory error: on a 1 GB VM, holding 1 million records in RAM simultaneously is not viable.
mainV2.js — streaming with 1K batch
Switched to a streaming CSV parser with insertMany in batches of 1,000. Memory stayed stable at 609 MB and the import completed in 3 minutes 19 seconds. The bottleneck here is round trips: 1 million rows in batches of 1,000 means 1,000 separate insertMany calls to MongoDB.
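The stream-and-batch pattern described above can be sketched roughly as follows. This is not the code from mainV2.js (which is Node.js and not shown here); it is written in Go to match main.go. The names `importInBatches`, `importCSVString`, and the `insertBatch` callback are hypothetical; `insertBatch` stands in for a driver call such as insertMany.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// importInBatches reads CSV records one at a time and hands them to
// insertBatch in groups of at most batchSize, so only one batch is held in
// memory at once. insertBatch is a hypothetical stand-in for a driver call
// such as insertMany; it must not keep the slice after it returns, because
// the backing array is reused for the next batch.
func importInBatches(r io.Reader, batchSize int, insertBatch func([][]string) error) (calls, rows int, err error) {
	if batchSize < 1 {
		return 0, 0, fmt.Errorf("batchSize must be at least 1, got %d", batchSize)
	}
	reader := csv.NewReader(r)
	batch := make([][]string, 0, batchSize)

	// flush sends the current batch (if any) and resets it.
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := insertBatch(batch); err != nil {
			return err
		}
		calls++
		rows += len(batch)
		batch = batch[:0]
		return nil
	}

	for {
		rec, readErr := reader.Read()
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			return calls, rows, readErr
		}
		batch = append(batch, rec)
		if len(batch) == batchSize {
			if err := flush(); err != nil {
				return calls, rows, err
			}
		}
	}
	err = flush() // send the final, possibly partial batch
	return calls, rows, err
}

// importCSVString is a demo helper: it runs importInBatches over an
// in-memory CSV string with a no-op insert. A real program would call the
// database driver inside the callback instead.
func importCSVString(data string, batchSize int) (calls, rows int, err error) {
	return importInBatches(strings.NewReader(data), batchSize, func(b [][]string) error {
		return nil
	})
}

func main() {
	calls, rows, err := importCSVString("a,1\nb,2\nc,3\nd,4\ne,5\n", 2)
	fmt.Println(calls, rows, err)
}
```

Each call to `insertBatch` corresponds to one round trip, so the batch size directly sets how many round trips the import makes.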
mainV3.js — streaming with 10K batch
Increased batch size to 10,000 records (~3.45 MB per batch, 100 total batches). Memory usage barely changed (625 MB), but time dropped to 54 seconds, 3.7× faster than V2. The main driver is fewer round trips to MongoDB: 100 insertMany calls instead of 1,000.
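The figures in this comparison can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers reported above; the helper names `numBatches` and `speedup` are illustrative.

```go
package main

import "fmt"

// numBatches returns how many insert calls are needed to send `rows`
// records in batches of `batchSize` (ceiling division).
func numBatches(rows, batchSize int) int {
	return (rows + batchSize - 1) / batchSize
}

// speedup returns how many times faster a run of afterSec seconds is
// compared with a run of beforeSec seconds.
func speedup(beforeSec, afterSec float64) float64 {
	return beforeSec / afterSec
}

func main() {
	const rows = 1_000_000

	fmt.Println(numBatches(rows, 1_000))  // 1000 calls for mainV2.js
	fmt.Println(numBatches(rows, 10_000)) // 100 calls for mainV3.js

	v2 := float64(3*60 + 19) // 3m 19s = 199 s
	v3 := 54.0
	fmt.Printf("%.1fx\n", speedup(v2, v3)) // 3.7x

	// Average record size implied by ~3.45 MB per 10,000-record batch.
	fmt.Printf("%.0f bytes/record\n", 3.45e6/10_000) // 345 bytes/record
}
```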
main.go — goroutines with 10K batch
Rewrote mainV3.js in Go using a buffered channel (capacity 10) and goroutines for concurrent batch insertion. It finished in 20 seconds with ~400 MB of memory, lower than the Node.js versions despite doing more work concurrently. For this I/O-bound workload, goroutines proved more efficient than the Node.js event loop, though this run changes the language and the concurrency model at once, so the 2.7× gain over mainV3.js reflects both. Channel capacity was capped at 10 because running more goroutines on a 1 GB VM caused OOM.
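A minimal sketch of the buffered-channel pattern described above, under some assumptions: main.go itself is not shown, so the fixed worker pool, the function name `insertConcurrently`, and the `insertBatch` callback (a stand-in for the driver's bulk insert) are all hypothetical. The producer here slices an in-memory list; in the real program it would be the CSV reader.

```go
package main

import (
	"fmt"
	"sync"
)

// insertConcurrently splits rows into batches, queues them on a buffered
// channel of capacity queueCap, and inserts them with `workers` goroutines.
// insertBatch is a hypothetical stand-in for the driver's bulk insert.
func insertConcurrently(rows []string, batchSize, workers, queueCap int, insertBatch func([]string) error) (inserted, batches int, err error) {
	if batchSize < 1 || workers < 1 || queueCap < 0 {
		return 0, 0, fmt.Errorf("batchSize and workers must be at least 1, queueCap at least 0")
	}
	queue := make(chan []string, queueCap)
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards inserted, batches, and err
	)

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range queue {
				e := insertBatch(batch)
				mu.Lock()
				if e != nil {
					if err == nil {
						err = e // keep the first error; workers keep draining the queue
					}
				} else {
					inserted += len(batch)
					batches++
				}
				mu.Unlock()
			}
		}()
	}

	// Producer: the send blocks once queueCap batches are waiting, which
	// limits how many batches sit in memory at the same time.
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		queue <- rows[start:end]
	}
	close(queue)
	wg.Wait()
	return inserted, batches, err
}

func main() {
	rows := make([]string, 25)
	inserted, batches, err := insertConcurrently(rows, 10, 3, 10, func(b []string) error {
		return nil // a real program would call the database driver here
	})
	fmt.Println(inserted, batches, err)
}
```

In this sketch the number of batches alive at once is bounded by roughly the channel capacity plus the number of workers, which is the knob that matters on a 1 GB machine.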
| Approach | Lang | Strategy | Time | Memory |
|---|---|---|---|---|
| main.js | Node.js | Row-by-row | DNF | OOM |
| mainV2.js | Node.js | Batch 1K | 3m 19s | 609 MB |
| mainV3.js | Node.js | Batch 10K | 54s | 625 MB |
| main.go | Go | Goroutine + Batch 10K | 20s | ~400 MB |