Update a Fixed Number of MongoDB Documents

Recently I worked on a project that used MongoDB as the source data system, R for analysis, and MongoDB again for output storage.

In this project we faced an unusual problem. We were using R to process source data stored in MongoDB, and when we passed a large number of documents to R for analysis, it slowed down and became a bottleneck. To avoid this, we had to process a fixed number of documents per batch in R.

To achieve this we needed some kind of record number in MongoDB, but because MongoDB is a distributed database, it does not provide a built-in sequential number. In addition, our MongoDB source was populated by a distributed real-time stream, so implementing this logic on the application side was not practical either.

To give a fixed number of documents in MongoDB a batchId field, we implemented the following algorithm:

1. Find documents that do not have a batchId field.

2. Sort by some timestamp field.

3. Limit the number of documents (say 10000).

4. Add a batchId field to these documents and save them (the batchId value comes from an audit table).
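
The four steps above can be sketched without a database at all. The following is only an in-memory illustration in plain Java, not the MongoDB implementation: documents are modelled as maps, and the class name BatchIdSketch and the helpers assignBatchId and doc are hypothetical names introduced here. It assumes every document carries a numeric systemTime field.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory stand-in for the four steps; no MongoDB driver is involved.
class BatchIdSketch {

    // Tags up to batchSize untagged documents, oldest first,
    // and returns how many documents were tagged.
    static int assignBatchId(List<Map<String, Object>> docs, int batchId, int batchSize) {
        List<Map<String, Object>> batch = docs.stream()
                // Step 1: keep documents that have no batchId yet
                .filter(d -> d.get("batchId") == null)
                // Step 2: sort by the timestamp field, ascending
                .sorted(Comparator.comparingLong(
                        (Map<String, Object> d) -> (Long) d.get("systemTime")))
                // Step 3: limit the number of documents
                .limit(batchSize)
                .collect(Collectors.toList());
        // Step 4: add the batchId field to each selected document
        for (Map<String, Object> d : batch) {
            d.put("batchId", batchId);
        }
        return batch.size();
    }

    // Hypothetical helper: builds a document that only has a systemTime field.
    static Map<String, Object> doc(long systemTime) {
        Map<String, Object> d = new HashMap<>();
        d.put("systemTime", systemTime);
        return d;
    }
}
```

For example, with three documents whose systemTime values are 30, 10 and 20, calling assignBatchId(docs, 1, 2) tags the documents with systemTime 10 and 20 and leaves the third one for the next batch.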

The MongoDB shell command for this is:

db['collection1'].find({batchId: null}).sort({systemTime: 1}).limit(10000).forEach(
  function (e) {
    // batchId is hard-coded here; in practice its value comes from the audit table
    e.batchId = 1;
    db['collection1'].save(e);
  }
);

Using the above code we added a batchId to the MongoDB documents and picked only the documents with the current batchId for analysis in R.
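
Picking only the current batch then amounts to an equality filter on batchId. Here is a minimal in-memory sketch of that selection in plain Java; it is not the R code used in the project, and the class name CurrentBatchSketch and the method selectBatch are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory illustration of selecting one batch; no MongoDB driver is involved.
class CurrentBatchSketch {

    // Returns only the documents whose batchId equals the given value.
    // Documents without a batchId field are skipped.
    static List<Map<String, Object>> selectBatch(List<Map<String, Object>> docs, int batchId) {
        return docs.stream()
                .filter(d -> Integer.valueOf(batchId).equals(d.get("batchId")))
                .collect(Collectors.toList());
    }
}
```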

The Java code equivalent to the above MongoDB shell command is:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class UpdateMongoBatchId {
	public static void main(String[] args) {
		// batchId for this run, passed as the first command-line argument
		Integer batchId = Integer.valueOf(args[0]);

		// Declared outside the try block so that the finally block can close it
		Mongo mongo = null;
		try {
			mongo = new Mongo("10.x.x.x", 27017);
			DB db = mongo.getDB("dbname");
			DBCollection coll1 = db.getCollection("collname");

			// Match documents where batchId is null or missing
			BasicDBObject searchQuery = new BasicDBObject();
			searchQuery.put("batchId", null);
			// Oldest documents first
			BasicDBObject sortOrder = new BasicDBObject();
			sortOrder.put("systemTime", 1);

			// MongoVariables.BATCH_SIZE is a project constant that is not shown here
			DBCursor cursor = coll1.find(searchQuery).sort(sortOrder).limit(MongoVariables.BATCH_SIZE);
			try {
				while (cursor.hasNext()) {
					DBObject currDocument = cursor.next();
					currDocument.put("batchId", batchId);
					coll1.save(currDocument);
				}
			} finally {
				cursor.close();
			}
			System.out.println("Updated batchId to MongoDB");
		} catch (Exception e) {
			// Report the failure instead of silently swallowing it
			e.printStackTrace();
		} finally {
			if (mongo != null) {
				mongo.close();
			}
		}
	}
}

Published at DZone with permission of Rishav Rohit, DZone MVB. See the original article here.
