MongoDB - Limiting Collection Size
Recently I had a small task: storing some client events in MongoDB. This was supposed to be a separate collection, with just one simple requirement: we only needed to keep the last 100 events per client.
Now, when you think about limiting the collection size in MongoDB, the first thing that comes to mind is capped collections.
- Capped collections are fixed-size collections. A capped collection never grows beyond the limit you set when creating it: a maximum size in bytes and, optionally, a maximum number of documents (for example, 1000 records). When the allocated collection space is full, new records are written over the oldest ones. There are some limitations, though. For example, a capped collection can't be sharded (but usually you don't need this, as it's used to keep the kind of info you don't need too much of). Also, you can't delete from it, and you can't make updates that increase the document size. The last restriction comes from the guarantee that the collection always preserves its natural order, that is, the documents are stored on disk in the order you created them. More on capped collections here: MongoDB documentation.
- TTL, or time to live, is a collection behavior that is enabled by a special kind of index. The index must be created on a field that holds a BSON date and must have the expireAfterSeconds property set. This property is the time (in seconds) each record is kept, counted from the date in the indexed field. The following command turns an ordinary collection into a TTL collection:
db.clientEvents.createIndex( { "createDate": 1 }, { expireAfterSeconds: 3600 } )
This collection will always keep just one hour's worth of data. Older records are deleted by a special background job. One of the restrictions follows from this: a TTL collection can't be capped (because, as we remember, one can't delete from those). You can read about the other limitations here: MongoDB expire data tutorial.
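For comparison with the TTL index command above, here is a minimal sketch of how a capped collection could be created from the mongo shell. The collection name, both limits, and the wrapper function name are illustrative assumptions; `db` stands for the shell's current database handle.

```javascript
// Sketch only: creates a capped collection via the shell's db.createCollection.
// The name "clientEvents" and the limits below are assumed for illustration.
function createCappedEvents(db) {
  return db.createCollection("clientEvents", {
    capped: true,
    size: 1024 * 1024, // required for capped collections: maximum size in bytes
    max: 1000          // optional: maximum number of documents
  });
}
```

Note that both limits apply to the collection as a whole, not per client, which is worth keeping in mind given the "last 100 events per client" requirement.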
- When a record is saved, check how many records the client already has.
- If there are already 100 or more, remove the oldest one.
- Then, insert our new one.
- Get the record count for the client;
- Find the oldest record for the client;
- Remove it.
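The count-then-trim steps listed above can be sketched roughly as follows, in mongo shell style. This is an illustration, not a definitive implementation: the helper name `saveEventNaive`, the `clientId` and `createDate` field names, and passing the collection in as `coll` (for example `db.clientEvents`) are all assumptions.

```javascript
// Sketch of the count, find-oldest, remove, insert sequence.
// `coll` is a shell collection handle; field names are assumed.
const MAX_EVENTS = 100;

function saveEventNaive(coll, event) {
  // Query 1: get the record count for the client.
  const count = coll.countDocuments({ clientId: event.clientId });

  // `>=` keeps the total at MAX_EVENTS after the insert below.
  if (count >= MAX_EVENTS) {
    // Query 2: find the oldest record for the client.
    const oldest = coll.find({ clientId: event.clientId })
      .sort({ createDate: 1 })
      .limit(1)
      .toArray()[0];

    // Query 3: remove it.
    coll.deleteOne({ _id: oldest._id });
  }

  // Finally, insert the new record.
  coll.insertOne(event);
}
```

In the shell this would be called as `saveEventNaive(db.clientEvents, { clientId: "a", createDate: new Date() })`. Once a client has reached the limit, every save costs three extra round trips on top of the insert.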
- On save, we generate a random number between 1 and 50;
- If the random number is, say, 12 (or 25, or whatever), we get the last 100 records for the client;
- And we remove all of that client's events that have IDs NOT in this result set.
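The randomized cleanup described in the steps above might look roughly like this in mongo shell style. It is a sketch under several assumptions: the helper name `saveEventWithCleanup`, the `clientId` and `createDate` field names, the injectable `random` parameter (handy for testing), and inserting the new event before the cleanup check are all choices made here for illustration.

```javascript
// Sketch of the probabilistic cleanup; `coll` is a shell collection handle
// such as db.clientEvents. Names and field names are assumed.
const KEEP = 100;        // events to keep per client
const ONE_IN = 50;       // the random number is drawn from 1..ONE_IN
const LUCKY_NUMBER = 12; // any fixed value between 1 and ONE_IN works

function saveEventWithCleanup(coll, event, random) {
  random = random || Math.random;

  // Assumed order: insert first, so most saves cost a single write.
  coll.insertOne(event);

  // Random integer between 1 and ONE_IN, inclusive.
  const roll = Math.floor(random() * ONE_IN) + 1;
  if (roll !== LUCKY_NUMBER) return;

  // Get the IDs of the newest KEEP events for this client...
  const keepIds = coll.find({ clientId: event.clientId }, { _id: 1 })
    .sort({ createDate: -1 })
    .limit(KEEP)
    .toArray()
    .map(function (doc) { return doc._id; });

  // ...and remove this client's events whose IDs are NOT in that set.
  coll.deleteMany({ clientId: event.clientId, _id: { $nin: keepIds } });
}
```

With this approach a client can temporarily hold more than 100 events between cleanups; the trade-off is that roughly 49 saves out of 50 involve no extra queries at all.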
Published at DZone with permission of Maryna Cherniavska. See the original article here.