Excerpts From the RavenDB Performance Team Report: Dates Take a Lot of Time
Join the DZone community and get the full member experience.
Join For FreeRavenDB uses a lot of dates, from the last modified metadata on a document to the timestamp of an index or when a query was started or… you get the point, lots and lots of dates.
Dates in RavenDB are usually formatted in the following manner:
2015-01-15T00:41:16.6616631
This is done using the following date time format:
yyyy'-'MM'-'dd'T'HH':'mm':'ss.fffffff
This is pretty awesome. It generate readable dates that are lexicographically sorted. There is just one problem with that, this is really expensive to do. How expensive? Well, outputting 10 million dates using the following manner:
dateTime.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture)
This takes 13.3 seconds, or just about 750 dates per millisecond. The costs here are partly the allocations, but mostly it is about the fact that the format provider needs to first parse the format specifier, then do quite a bit of work to get it working. And DateTime itself isn’t very cheap. The solution presented is ugly, but it works, and it is fast.
public unsafe static string GetDefaultRavenFormat(this DateTime dt, bool isUtc = false) { string result = new string('Z', 27 + (isUtc ? 1 : 0)); var ticks = dt.Ticks; // n = number of days since 1/1/0001 int n = (int)(ticks / TicksPerDay); // y400 = number of whole 400-year periods since 1/1/0001 int y400 = n / DaysPer400Years; // n = day number within 400-year period n -= y400 * DaysPer400Years; // y100 = number of whole 100-year periods within 400-year period int y100 = n / DaysPer100Years; // Last 100-year period has an extra day, so decrement result if 4 if (y100 == 4) y100 = 3; // n = day number within 100-year period n -= y100 * DaysPer100Years; // y4 = number of whole 4-year periods within 100-year period int y4 = n / DaysPer4Years; // n = day number within 4-year period n -= y4 * DaysPer4Years; // y1 = number of whole years within 4-year period int y1 = n / DaysPerYear; // Last year has an extra day, so decrement result if 4 if (y1 == 4) y1 = 3; // If year was requested, compute and return it var year = y400 * 400 + y100 * 100 + y4 * 4 + y1 + 1; // n = day number within year n -= y1 * DaysPerYear; // Leap year calculation looks different from IsLeapYear since y1, y4, // and y100 are relative to year 1, not year 0 bool leapYear = y1 == 3 && (y4 != 24 || y100 == 3); int[] days = leapYear ? DaysToMonth366 : DaysToMonth365; // All months have less than 32 days, so n >> 5 is a good conservative // estimate for the month int month = n >> 5 + 1; // m = 1-based month number while (n >= days[month]) month++; // If month was requested, return it // Return 1-based day-of-month var day = n - days[month - 1] + 1; fixed (char* chars = result) { var v = _fourDigits[year]; chars[0] = v[0]; chars[1] = v[1]; chars[2] = v[2]; chars[3] = v[3]; chars[4] = '-'; v = _fourDigits[month]; chars[5] = v[2]; chars[5 + 1] = v[3]; chars[7] = '-'; v = _fourDigits[day]; chars[8] = v[2]; chars[8 + 1] = v[3]; chars[10] = 'T'; v = _fourDigits[(ticks / TicksPerHour) % 24]; chars[11] = v[2]; chars[11 + 1] = v[3]; chars[13] = ':'; v = _fourDigits[(ticks / TicksPerMinute) % 60]; chars[14] = v[2]; chars[14 + 1] = v[3]; chars[16] = ':'; v = _fourDigits[(ticks / TicksPerSecond) % 60]; chars[17] = v[2]; chars[17 + 1] = v[3]; chars[19] = '.'; long fraction = (ticks % 10000000); v = _fourDigits[fraction / 10000]; chars[20] = v[1]; chars[21] = v[2]; chars[22] = v[3]; fraction = fraction % 10000; v = _fourDigits[fraction]; chars[23] = v[0]; chars[24] = v[1]; chars[25] = v[2]; chars[26] = v[3]; } return result; }
We use the same general pattern that we used with etags as well, although here we are also doing a lot of work to figure out the right parts of the date. Note that we don’t have any allocations, and we again use the notion of a lookup table to all the pre-computed 4 digits number. That allows us to process 10,000,000 dates in just over 2 seconds (2,061 ms, to be exact). Or roughly 4,850 dates per millisecond. In other words, we are about 15% of the speed of the original implementation.
This code is ugly, in fact, the last few posts has contained pretty much ugly code, that is hard to understand. But it is significantly faster than the alternative, and what is even more important, those pieces of code are actually being used in RavenDB’s hot path. In other words, that means that we have actually seen significant performance improvement when introducing them to the codebase.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments