MongoDB Aggregation Queries for "Counts Per Day" (Part 1)
To feed a heatmap display with MongoDB data, explore grouping documents to return a daily aggregation query.
Join the DZone community and get the full member experience.
Join For FreeI need a MongoDB query to retrieve document counts per day to feed a heatmap display (using https://kamisama.github.io/cal-heatmap/), for my Amateur Radio received signals historical visualization service, SpotViz.
The data to feed Cal-heatmap looks like this:
{
"946721039":4,
"946706853":2,
"946706340":7,
...
}
What’s interesting about this data structure is the property name is variable, and I’m not sure how to project a result into a property name in a MongoDB query. I asked this question on StackOverflow: “Return a computed value as field name in MongoDB query?” – so far I haven’t had any answers or suggestions, so I’m not sure this is possible.
There doesn’t seem to be a way to do exactly what I need, so my next challenge was how to group documents per day (ignoring the time part of a date), and return a count per day.
I started with a working Aggregation query from the shell, and then took that and implemented using the MongoDB Java api. The challenge with this query is that there doesn’t seem to be any out of the box feature that allows you to select matching documents based on a date and exclude the time portion of new Date(). What I need is the equivalent of ‘find counts of documents that are grouped by the same day’. The catch is to not group docs by exactly the same yyyy/MM/dd hh:mm:ss values, but to group by only the same yyyy/MM/dd values.
Since there is a way to extract the year, month and day values from a date with the aggregation $year, $month, $dayOfMonth operators, these could be used to get the result I need (the counts per day), but this format doesn’t help me get the property name for the counts in a seconds past 1/1/1970, e.g. “946721039”.
A query using this approach would look like this:
db.Spot.aggregate(
[
{$match: {spotter: "kk6dct"}},
{$group: { _id : {
year:{$year:"$spotReceivedTimestamp"},
month:{$month:"$spotReceivedTimestamp"},
day:{$dayOfMonth:"$spotReceivedTimestamp"}
},
count:{$sum: 1 }
}
}
])
… this approach follows a suggestion from this SO post.
This approach to group the document counts by day is good, but it doesn’t return the docs in the format I need with each day represented by seconds since 1/1/1970.
A better approach would be to group by millis for the date, and return that value. Converting a date in mongo to another format however seems to be somewhat challenging – I spent probably far too much time to work out a query to do this, getting close, but still not what I wanted, and ended up with this rather complex query:
db.Spot.aggregate(
[
{$match: {spotter: "kk6dct"}},
{$group: { _id : {
yearval:{$year:"$spotReceivedTimestamp"},
monthval:{$month:"$spotReceivedTimestamp"},
dayval:{$dayOfMonth:"$spotReceivedTimestamp"},
"h" : {
"$hour" : "$spotReceivedTimestamp"
},
"m" : {
"$minute" : "$spotReceivedTimestamp"
},
"s" : {
"$second" : "$spotReceivedTimestamp"
},
"ml" : {
"$millisecond" : "$spotReceivedTimestamp"
}
},
count:{$sum: 1 }
}
},
{$project :
{
"date" : {
"$subtract" : [
"$spotReceivedTimestamp",
{
"$add" : [
"$ml",
{
"$multiply" : [
"$s",
1000
]
},
{
"$multiply" : [
"$m",
60,
1000
]
},
{
"$multiply" : [
"$h",
60,
60,
1000
]
}
]
}
]
}
}
}
])
What I was attempting to do with this approach was to use the $project stage to subtract the $hour, $minute and $second values converted to millis from each of the timestamp values to get just the millis value of the yyyy/MM/dd but ignoring the time part. This is about as close as I got, but I couldn’t get the math to work, or at least convert between types so the calculations would work the way I wanted.
My next attempt was based on the suggestion in this SO post. This is a much simpler approach to the problem – my new query looks like this:
db.Spot.aggregate(
[
{$match: {spotter: "kk6dct"}},
{"$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$spotReceivedTimestamp", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$spotReceivedTimestamp", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
},
count:{$sum: 1 }
}
}
])
If I try and break this down into words, then what I’m doing is:
– for date x, calculate millis since 1/1/1970 (the epoch date)
– subtract from this the number of millis since the start of the day (this is the millis since 1/1/1970 mod number of millis in a day, the remainder of one divided by the other)
… the result is the millis of each date at midnight, i.e. excluding the time part.
Ok, almost there! How I then took this query and converted into the MongoDB Java Drvier API is coming in part 2.
If you enjoyed this article and want to learn more about MongoDB, check out this collection of tutorials and articles on all things MongoDB.
Published at DZone with permission of Kevin Hooke, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments