Top 50 MongoDB Interview Questions in 2026 (With Answers and Code)

MongoDB is still the most-asked NoSQL database in backend interviews. Even if your interviewer prefers Postgres, they'll ask Mongo questions to gauge whether you understand denormalization, document modeling, and the tradeoffs of schemaless storage.

These 50 questions came up in 2025-2026 backend, full-stack, and data engineering interviews. Each answer is short and concrete.

Document Model and Schema Design (1-10)

1. What is MongoDB?

A document-oriented database. It stores BSON (binary JSON) documents in collections. Schema is flexible by default - you choose how strict to be with $jsonSchema validators.

2. BSON vs JSON?

BSON adds types JSON doesn't have: ObjectId, Date, Decimal128, Binary, Long (64-bit integer). It's also typed and length-prefixed, so the server can skip over fields without parsing them.

3. What is `_id` and how is it generated?

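The primary key. If you don't supply one, the driver generates an ObjectId (12 bytes: 4-byte timestamp + 5-byte random value unique to the process + 3-byte incrementing counter). Must be unique within a collection and can't be changed after insert. You can use any unique value of your own instead, except an array.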
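
Because the first 4 bytes of an ObjectId are a Unix timestamp in seconds, you can recover the creation time from the hex string alone. A minimal sketch in plain JavaScript (the helper name objectIdTimestamp is made up for this example; drivers expose the same thing as getTimestamp()):

```javascript
// Decode the creation time stored in the first 4 bytes of an ObjectId.
// Works on the 24-character hex string, so no driver is needed.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  // First 8 hex chars = 4 bytes = big-endian Unix time in seconds.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```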

4. Embed vs reference - how do you choose?

Embed when:

The child only matters in the context of the parent (order line items).
You always read them together.
The child doesn't grow unbounded.

Reference when:

The child is referenced by many parents.
It changes independently and you don't want to update many copies.
It can grow large (the 16MB document limit is real).

5. What is the 16MB document limit?

Maximum size of a single BSON document. Hit it when embedding unbounded arrays (chat messages, log lines). Use a separate collection or GridFS for large/unbounded data.

6. What is GridFS?

A spec for storing files larger than 16MB by splitting them into chunks (255 KB by default) stored in fs.chunks, with file metadata in fs.files. Mostly superseded by S3-style object storage in 2026 - use GridFS only when you specifically want files alongside your data.

7. Schemaless - really?

No. Schema is just enforced in your application instead of the database, by default. You can also enforce it server-side with $jsonSchema validators. Most production setups use Mongoose (Node) or Beanie/Pydantic (Python) for schema validation in the app.
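
Server-side validation is configured per collection. A hedged sketch of what a validator might look like; the collection name and fields (users, email, createdAt, age) are invented for illustration:

```javascript
// Options object that could be passed to
// db.createCollection("users", userCollectionOptions) in mongosh.
// Field names here are hypothetical.
const userCollectionOptions = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "createdAt"],
      properties: {
        email: { bsonType: "string", pattern: "^.+@.+$" },
        createdAt: { bsonType: "date" },
        age: { bsonType: "int", minimum: 0 }, // optional, but typed when present
      },
    },
  },
  validationLevel: "strict",  // apply to all inserts and updates
  validationAction: "error",  // reject invalid documents instead of only logging
};
```

Setting validationAction to "warn" instead is a common way to roll a validator out gradually on an existing collection.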

8. What's the polymorphic document pattern?

Storing different shapes in one collection, distinguished by a type field. Useful for activity feeds, events, audit logs - anything where reads filter by type and you want one collection.

9. The "extended reference" pattern?

When you reference another document but copy a few of its fields to avoid a join on read. Classic example: order documents store the customer name + email, even though the canonical record lives in customers. Faster reads, harder updates - good when reads vastly outnumber writes.
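
As a rough sketch of the pattern, here is how an order document might be assembled so it carries a small copy of the customer. The helper buildOrder and the field names are assumptions for this example, and prices are in integer cents:

```javascript
// Build an order that references the customer by id and also copies
// the few fields the order screens need, so reads skip the join.
function buildOrder(customer, items) {
  return {
    customerId: customer._id, // canonical reference
    customer: { name: customer.name, email: customer.email }, // denormalized copy
    items,
    totalCents: items.reduce((sum, i) => sum + i.priceCents * i.qty, 0),
  };
}

const order = buildOrder(
  { _id: "c1", name: "Ada", email: "ada@example.com", address: "not copied" },
  [{ sku: "A", priceCents: 500, qty: 2 }, { sku: "B", priceCents: 1250, qty: 1 }]
);
console.log(order.customer.name, order.totalCents); // Ada 2250
```

The cost shows up on the write side: when a customer changes their email, you decide whether old orders should be updated or left as a historical snapshot.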

10. Bucket pattern - what and when?

Group time-series or event data into buckets (e.g. one document per device per hour). Reduces document count and index size. Built into Time Series collections in modern Mongo.
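
To make the bucket shape concrete, the sketch below groups raw readings into one document per device per hour in plain JavaScript. The field names (deviceId, ts, value) are assumptions; Time Series collections do equivalent bucketing for you internally.

```javascript
// Group readings into one bucket document per device per UTC hour.
function bucketByHour(readings) {
  const buckets = new Map();
  for (const r of readings) {
    const hourStart = new Date(r.ts);
    hourStart.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
    const key = `${r.deviceId}|${hourStart.toISOString()}`;
    if (!buckets.has(key)) {
      buckets.set(key, { deviceId: r.deviceId, hourStart, count: 0, measurements: [] });
    }
    const bucket = buckets.get(key);
    bucket.count += 1;
    bucket.measurements.push({ ts: r.ts, value: r.value });
  }
  return [...buckets.values()];
}

const sample = bucketByHour([
  { deviceId: "d1", ts: new Date("2026-01-01T10:05:00Z"), value: 20.1 },
  { deviceId: "d1", ts: new Date("2026-01-01T10:45:00Z"), value: 20.4 },
  { deviceId: "d1", ts: new Date("2026-01-01T11:02:00Z"), value: 20.9 },
]);
console.log(sample.length, sample[0].count); // 2 2
```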

Indexing (11-20)

11. What indexes does MongoDB support?

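Index types: single field, compound, multikey (indexes on array elements), text, geospatial (2d, 2dsphere), hashed, and wildcard. Index properties you can layer on top: unique, partial, sparse, and TTL. Atlas Search (Lucene-backed) indexes are a separate system available on Atlas.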

12. The ESR rule for compound indexes?

Equality, Sort, Range. Order index keys with equality predicates first, then sort fields, then range predicates. Improves index usage and avoids in-memory sorts.
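
One way to internalize ESR is to write the key order down mechanically. The helper below is a made-up illustration (esrIndexKeys is not a MongoDB API); it just builds the key document you would pass to createIndex:

```javascript
// Build a compound index key document in Equality, Sort, Range order.
function esrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, dir] of Object.entries(sort)) keys[field] = dir;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// Query: find({ status: "shipped", total: { $gt: 100 } }).sort({ createdAt: -1 })
const indexKeys = esrIndexKeys({
  equality: ["status"],
  sort: { createdAt: -1 },
  range: ["total"],
});
console.log(JSON.stringify(indexKeys)); // {"status":1,"createdAt":-1,"total":1}
```

In mongosh the result would be used as db.orders.createIndex(indexKeys), assuming a collection named orders.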

13. What is a covered query?

A query that's answered entirely from the index without touching the documents. The index must contain every projected and filtered field, and the projection has to exclude _id unless _id is part of the index. Major perf win for hot queries.

14. What is `explain()`?

Tool to inspect query plans: db.coll.find(...).explain('executionStats'). Look for IXSCAN (index used) vs COLLSCAN (full scan), nReturned, totalKeysExamined, totalDocsExamined. The closer the last two are to nReturned, the better.
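
The numbers to read off can be pulled out with a small helper. This sketch assumes the classic explain shape (queryPlanner.winningPlan nesting through inputStage); newer query engines can nest the plan differently, and the sample object below is hand-written, not real server output.

```javascript
// Summarize the fields interviewers expect you to look at.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = [];
  const walk = (node) => {
    if (!node) return;
    stages.push(node.stage);
    walk(node.inputStage);                   // single child
    (node.inputStages || []).forEach(walk);  // multiple children (e.g. OR)
  };
  walk(explain.queryPlanner.winningPlan);
  const per = (n) => (stats.nReturned > 0 ? n / stats.nReturned : null);
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    keysPerResult: per(stats.totalKeysExamined),
    docsPerResult: per(stats.totalDocsExamined),
  };
}

// Hand-written example of an unindexed query's explain output.
const slow = summarizeExplain({
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 5, totalKeysExamined: 0, totalDocsExamined: 100000 },
});
console.log(slow.collectionScan, slow.docsPerResult); // true 20000
```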

15. What is a multikey index?

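An index on an array field. Mongo creates one index entry per array element, so it's heavy on storage. Two caveats: a compound index can include at most one array field per document, and a unique multikey index enforces uniqueness across documents, not between elements of the same document's array.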

16. TTL indexes - what for?

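Automatic document expiration. An index on { createdAt: 1 } with expireAfterSeconds: 86400 deletes docs roughly 24h after createdAt. The indexed field must hold a Date, and the background cleanup runs about once a minute, so expiry isn't exact. Used for sessions, audit logs, ephemeral cache.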

17. Partial vs sparse index?

Sparse: indexes only documents where the field exists.
Partial: indexes only documents matching a filter expression. Strictly more powerful - prefer partial indexes in new code.

18. What is a wildcard index?

Indexes all fields or a path's subfields. Useful for unpredictable schema (user-defined fields, attribute bags). Costs more space than targeted indexes - use when query patterns are truly dynamic.

19. Hashed index - when?

Required for hashed sharding. Useful when you want even distribution but the natural key is monotonic (timestamps, ObjectIds). Doesn't support range queries.

20. How do you find unused indexes?

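db.coll.aggregate([{ $indexStats: {} }]). Look at accesses.ops and accesses.since. Counters are per node and reset on restart, so check every replica set member before deciding. Drop indexes that haven't been hit in your retention window - they cost write throughput and disk.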

Querying and Aggregation (21-30)

21. `find` vs `aggregate`?

find is simple filter + projection + sort. aggregate is the pipeline - filter, group, transform, join, reshape. New code should bias toward aggregation; it's more expressive and the optimizer handles complex pipelines well.

22. Common aggregation stages?

$match, $project, $group, $sort, $limit, $lookup (joins), $unwind, $facet (multiple pipelines on same input), $bucket, $out / $merge (write results).

23. `$lookup` - is it a real join?

A left outer join. Pulls matching docs from another collection into an array field. Performance depends on indexes on the foreign collection. For high-throughput paths, prefer denormalization over $lookup.
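
For reference, a typical $lookup pipeline looks like the sketch below. Collection and field names (orders, customers, customerId) are assumptions for the example.

```javascript
// Join each shipped order with its customer document.
const ordersWithCustomer = [
  { $match: { status: "shipped" } }, // filter first so fewer docs reach the join
  {
    $lookup: {
      from: "customers",        // foreign collection
      localField: "customerId", // field on orders
      foreignField: "_id",      // field on customers; should be indexed
      as: "customer",           // output array field
    },
  },
  // Flatten the one-element array; keep orders with no match (left outer).
  { $unwind: { path: "$customer", preserveNullAndEmptyArrays: true } },
];
// Usage in mongosh: db.orders.aggregate(ordersWithCustomer)
```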

24. Why put `$match` early in the pipeline?

It reduces the document set before expensive stages. The optimizer also pushes $match past projections automatically when it can - putting it first guarantees it.

25. `$project` vs `$set`?

$project reshapes the document - includes/excludes fields. $set (alias $addFields) only adds or replaces fields, keeping the rest. Use $set when you just want to add a computed field.

26. What is `$facet`?

Runs multiple sub-pipelines on the same input documents in one stage. Handy for dashboards: count, top-10, histogram - all from one query.
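
A sketch of the dashboard case, assuming an orders collection with status, customerId, and total fields (all hypothetical):

```javascript
// One pass over completed orders, three result sets.
const dashboardPipeline = [
  { $match: { status: "completed" } },
  {
    $facet: {
      totalCount: [{ $count: "n" }],
      topCustomers: [
        { $group: { _id: "$customerId", spent: { $sum: "$total" } } },
        { $sort: { spent: -1 } },
        { $limit: 10 },
      ],
      totalHistogram: [
        {
          $bucket: {
            groupBy: "$total",
            boundaries: [0, 50, 100, 500],
            default: "500+", // catches values outside the boundaries
            output: { count: { $sum: 1 } },
          },
        },
      ],
    },
  },
];
// Usage in mongosh: db.orders.aggregate(dashboardPipeline)
```

The result is a single document with one array per facet, so it is still subject to the 16MB document limit.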

27. How do you do pagination correctly?

Don't use skip + limit for deep pages - the server still walks every skipped entry, so cost is O(skip). Use range queries on a sorted indexed field: find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(20). Cursor-based pagination scales.
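
The cursor logic can be sketched without a server. Below, nextPageQuery builds the filter/sort/limit you would hand to find(), and runQuery is an in-memory stand-in used only to demonstrate the paging; both names are invented for this example.

```javascript
// Build the query for the page that follows lastId (null for the first page).
function nextPageQuery(lastId, pageSize = 20) {
  return {
    filter: lastId == null ? {} : { _id: { $gt: lastId } },
    sort: { _id: 1 },
    limit: pageSize,
  };
}

// In-memory stand-in for find(filter).sort(sort).limit(limit), demo only.
function runQuery(docs, { filter, limit }) {
  const after = filter._id ? filter._id.$gt : -Infinity;
  return docs
    .filter((d) => d._id > after)
    .sort((a, b) => a._id - b._id)
    .slice(0, limit);
}

const docs = [5, 3, 1, 4, 2].map((n) => ({ _id: n }));
const page1 = runQuery(docs, nextPageQuery(null, 2));
const page2 = runQuery(docs, nextPageQuery(page1[page1.length - 1]._id, 2));
console.log(page1.map((d) => d._id), page2.map((d) => d._id)); // [ 1, 2 ] [ 3, 4 ]
```

If you sort on a non-unique field, add _id as a tiebreaker to both the sort and the range condition so pages don't skip or repeat documents.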

28. What is `$expr`?

Lets you use aggregation expressions in find queries - including comparing two fields in the same document. db.orders.find({ $expr: { $gt: ['$total', '$paid'] } }).

29. Text search - native or Atlas Search?

Native text index is fine for basic word search. Atlas Search (Lucene-backed) is dramatically better: relevance ranking, faceting, autocomplete, fuzzy matching, vector search. New apps in 2026 should default to Atlas Search if they're on Atlas.

30. What is vector search in Mongo?

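Atlas Vector Search provides the $vectorSearch aggregation stage over arrays of floats - approximate nearest-neighbor (ANN) search backed by HNSW indexes, with an exact (ENN) option. Used to store embeddings for RAG. Lets you keep vectors next to your operational data instead of in a separate vector DB.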
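
A minimal sketch of a $vectorSearch pipeline. The index name, field names, and the 20x candidate multiplier are assumptions for illustration; the query vector would come from your embedding model.

```javascript
// Build a pipeline that returns the `limit` nearest documents to queryVector.
function vectorSearchPipeline(queryVector, limit = 5) {
  return [
    {
      $vectorSearch: {
        index: "embedding_index",  // hypothetical Atlas Vector Search index name
        path: "embedding",         // hypothetical field holding the stored vectors
        queryVector,
        numCandidates: limit * 20, // ANN candidates considered; must be >= limit
        limit,
      },
    },
    { $project: { title: 1, score: { $meta: "vectorSearchScore" } } },
  ];
}

const ragPipeline = vectorSearchPipeline([0.12, -0.4, 0.88], 3);
// Usage with a driver: await collection.aggregate(ragPipeline).toArray()
```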

Replication, Sharding, Transactions (31-40)

31. What is a replica set?

A group of Mongo nodes (typically 3) keeping copies of the same data. One primary handles writes; secondaries replicate via oplog. Automatic failover - if the primary dies, secondaries elect a new one.

32. What is the oplog?

A capped collection (local.oplog.rs) recording every write. The primary appends to it; secondaries copy and apply those entries to stay in sync, so each member holds its own copy. Also feeds Change Streams.

33. What are write concerns?

How many nodes must acknowledge a write before it's confirmed. w: 1 (primary only), w: 'majority' (the default since MongoDB 5.0), w: 2, etc. Production should use 'majority' for durability across failover.

34. Read preference - primary, secondary, nearest?

primary (default): read from primary, always fresh.
secondary / secondaryPreferred: scale reads, but eventually consistent.
nearest: lowest latency.

Use secondaries for analytical queries and dashboards, never for read-after-write.

35. What is sharding?

Horizontal partitioning across multiple replica sets. Mongo routes queries by shard key. Use sharding when one replica set runs out of write throughput or storage.

36. How do you choose a shard key?

Goals: high cardinality, even distribution, and queries that target one shard when possible. Bad: monotonic keys (ObjectId _id, timestamp) with ranged sharding - every new insert lands on the last chunk and hot-spots one shard. Good: hashed keys, or compound keys mixing entity ID with low-cardinality dimension.
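
The hot-spot effect is easy to see in a toy simulation. The sketch below routes 1,000 increasing keys across 4 shards, once by range and once by hash. The hash is a simple multiplicative stand-in chosen for the demo, not MongoDB's actual hashed-index function, and the chunk boundaries are invented.

```javascript
// Ranged sharding: a key belongs to the chunk whose range contains it.
function rangedShard(key, boundaries) {
  return boundaries.filter((b) => key >= b).length;
}

// Stand-in hash (Knuth multiplicative), NOT MongoDB's real hash function.
function hashedShard(key, shardCount) {
  const h = Math.imul(key, 2654435761) >>> 0; // unsigned 32-bit result
  return Math.floor((h / 2 ** 32) * shardCount);
}

const boundaries = [1600000000, 1650000000, 1690000000]; // 4 chunks, one per shard
const ranged = [0, 0, 0, 0];
const hashed = [0, 0, 0, 0];
for (let i = 0; i < 1000; i++) {
  const key = 1700000000 + i; // monotonic, like a timestamp
  ranged[rangedShard(key, boundaries)]++;
  hashed[hashedShard(key, 4)]++;
}
console.log("ranged:", ranged); // every insert lands on the last shard
console.log("hashed:", hashed); // roughly even spread
```

The tradeoff is the one noted for hashed indexes: the even spread comes at the cost of range queries on the key, which now fan out to every shard.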

37. Can you change a shard key?

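Modern Mongo (5.0+) supports reshardCollection - an online reshard that blocks writes only briefly (on the order of seconds) at cutover. It rewrites the whole collection, so it needs spare disk and I/O headroom and is still risky on large data; plan it carefully.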

38. Does MongoDB support transactions?

Yes - multi-document ACID transactions on replica sets (since 4.0) and sharded clusters (since 4.2). They have overhead; design schema so most ops are single-document (atomic by default) and reach for transactions only when you really need cross-document atomicity.

39. What are Change Streams?

Real-time feed of database changes. Built on the oplog. Used for invalidating caches, syncing search indexes, event-driven architectures. Filter with aggregation pipelines.

40. What's MongoDB Atlas?

Mongo's managed cloud service. In 2026 most new Mongo workloads run on Atlas - it includes Atlas Search, vector search, charts, triggers (serverless functions), and online archive. Self-hosting is mostly legacy or compliance-driven now.

Performance, Operations, Modern Features (41-50)

41. How do you fix slow queries?

explain('executionStats') to confirm a COLLSCAN.
Add or fix the index per ESR.
Re-run; check totalDocsExamined / nReturned ratio.
If still slow, look for in-memory sorts or large $lookup joins.

42. WiredTiger - what is it?

The default storage engine. B-tree based, uses snappy (the default), zlib, or zstd compression, and MVCC for concurrent readers. The internal cache defaults to the larger of 50% of (RAM - 1GB) or 256MB - tune storage.wiredTiger.engineConfig.cacheSizeGB (or --wiredTigerCacheSizeGB) for memory-constrained boxes.
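
The default cache size is the larger of 50% of (RAM - 1GB) or 256MB, which is easy to misread as "half of RAM, minus 1GB". A quick worked sketch (the function name is invented for this example):

```javascript
// Default WiredTiger internal cache size in GB for a host with ramGB of memory.
function defaultWiredTigerCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25); // 0.25 GB = 256 MB floor
}

console.log(defaultWiredTigerCacheGB(16)); // 7.5
console.log(defaultWiredTigerCacheGB(4));  // 1.5
console.log(defaultWiredTigerCacheGB(1));  // 0.25
```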

43. How does Mongo handle concurrency?

Document-level concurrency control. Multiple writers can update different documents in the same collection concurrently. WiredTiger's MVCC means readers don't block writers and vice versa.

44. What are Time Series collections?

A specialized collection type optimized for timestamped data. Stores buckets internally (collapses many measurements into one document) for storage efficiency. Use for IoT, metrics, sensor data.

45. How do you do upserts?

updateOne(filter, update, { upsert: true }). Use $setOnInsert to set fields only on insert. Be careful - concurrent or retried upserts with the same filter can each insert a document; put a unique index on the filter fields to be safe.

46. Bulk writes - why?

db.coll.bulkWrite([...]) batches inserts/updates/deletes in one round trip. Massive throughput improvement for ETL and migrations. Set ordered: false to keep going on errors.
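
A sketch of how an ETL batch might be turned into bulkWrite operations, here as upserts keyed on a hypothetical sku field (the helper name and record shape are assumptions):

```javascript
// Convert plain records into bulkWrite operations that upsert by sku.
function toBulkUpserts(records) {
  return records.map((r) => ({
    updateOne: {
      filter: { sku: r.sku },
      update: {
        $set: { price: r.price },                // applied on insert and update
        $setOnInsert: { createdAt: new Date() }, // applied only when inserting
      },
      upsert: true,
    },
  }));
}

const ops = toBulkUpserts([
  { sku: "A-1", price: 10 },
  { sku: "B-2", price: 25 },
]);
// Usage with the Node driver:
//   await db.collection("products").bulkWrite(ops, { ordered: false });
```

With ordered: false, one failing operation doesn't stop the rest; the failures are reported together in the error the driver raises at the end.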

47. How do you prevent injection?

Parameterize queries through the driver - never string-concat user input into query objects. Validate field shape (Zod, Pydantic). Treat object-shaped user input as untrusted: a JSON { "$ne": null } becomes a query operator if you blindly merge it.
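
The operator-injection case can be shown in a few lines. This sketch checks the type of each user-supplied value before it goes into a filter; requireString and findUserFilter are hypothetical helpers, and a schema library such as Zod would do the same job more thoroughly.

```javascript
// Reject anything that is not a plain string before it reaches a query.
function requireString(value, field) {
  if (typeof value !== "string") {
    throw new TypeError(`${field} must be a string`);
  }
  return value;
}

// Build the filter from a parsed request body.
function findUserFilter(body) {
  return { username: requireString(body.username, "username") };
}

console.log(findUserFilter({ username: "ada" })); // { username: 'ada' }

// A malicious body: without the check this would match every user.
const attack = JSON.parse('{"username": {"$ne": null}}');
try {
  findUserFilter(attack);
} catch (err) {
  console.log(err.message); // username must be a string
}
```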

48. What is Queryable Encryption?

Client-side field-level encryption that supports equality queries (GA in 7.0) and range queries (added in 8.0) on encrypted fields. The server never sees plaintext. Use for PII, healthcare, financial data with compliance requirements.

49. Online Archive - what does it do?

Atlas feature that automatically tiers old data from Atlas to cheaper object storage (S3) while keeping it queryable. Cuts storage cost for log-style or audit data without losing access.

50. When should you NOT use MongoDB?

Heavy multi-row transactions across many entities (Postgres handles this better).
Strong relational schema that won't change (Postgres + JSONB is often a better fit).
Strict reporting/BI workloads with complex joins (use a warehouse).
"We need a graph database" (use Neo4j or similar).

Mongo is the right tool when documents map to your domain naturally and you want flexibility.

How to Use These

The interviewer is checking whether you understand tradeoffs, not just syntax. For schema questions, always explain the access pattern that drove the decision. For performance questions, walk through the explain plan instead of guessing. For sharding, pick the shard key by reading the workload.

Get hands-on: spin up a local Mongo, load a few million docs, run explain on every query. Pattern recognition kicks in fast.