
Fail, Fail, Fail: More Job Candidate Fails

Sometimes, reading candidates' answers is just something that I know is going to piss me off.

We have a question that goes something like this (the actual question is much more detailed):

We have a 15TB CSV file that contains web logs. The entries are sorted by date (since this is how they were entered). Find all the log entries within a given date range. You may not read more than 32MB.

A candidate replied with an answer that had the following code:

    string line = string.Empty;
    StreamReader file;

    try
    {
        file = new StreamReader(filename);
    }
    catch (FileNotFoundException ex)
    {
        Console.WriteLine("The file is not found.");
        Console.ReadLine();
        return;
    }

    while ((line = file.ReadLine()) != null)
    {
        var values = line.Split(',');
        DateTime date = Convert.ToDateTime(values[0]);
        if (date.Date >= startDate && date.Date <= endDate)
            output.Add(line);

        // Results size in MB
        double size = (GetObjectSize(output) / 1024f) / 1024f;
        if (size >= 32)
        {
            Console.WriteLine("Results size exceeded 32MB, the search will stop.");
            break;
        }
    }

My reply was:

The data file is 15TB in size. If the data is beyond the first 32MB, it won't be found.

The candidate then fixed his code. It now includes:

  var lines = File.ReadLines(filename);

Yep, this is on a 15TB file.

Now I’m going to have to lie down for a bit. I am not feeling so good.
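
The post does not spell out the expected answer, but because the entries are sorted by date, one plausible approach is a binary search over byte offsets: seek into the file, align to the next line start, read a single line, and halve the search window. What follows is a minimal sketch under several assumptions that are not in the original question: the first CSV field is a date that `DateTime.Parse` accepts under the invariant culture, lines end with `\n`, there is no header row, and there are no blank lines. The names `LogRangeFinder`, `FirstLineWhere`, and `FindRange` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class LogRangeFinder
{
    // Offset of the first line that starts at or after pos.
    static long NextLineStart(Stream s, long pos)
    {
        if (pos <= 0) return 0;
        // Start one byte early so that a pos already on a line start is returned as-is.
        s.Seek(pos - 1, SeekOrigin.Begin);
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n') { }
        return s.Position;
    }

    // Reads the single line starting at 'start' and parses its first field as a date.
    // 'next' receives the offset of the following line.
    static DateTime ReadDateAt(Stream s, long start, out long next)
    {
        s.Seek(start, SeekOrigin.Begin);
        var bytes = new List<byte>();
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n') bytes.Add((byte)b);
        next = s.Position;
        string line = Encoding.UTF8.GetString(bytes.ToArray());
        string firstField = line.Split(',')[0].Trim();
        return DateTime.Parse(firstField, CultureInfo.InvariantCulture);
    }

    // Offset of the first line whose date satisfies 'matches', or s.Length if none does.
    // Because the file is sorted, 'matches' must be false for a prefix and true afterwards.
    public static long FirstLineWhere(Stream s, Func<DateTime, bool> matches)
    {
        // Invariant: lines starting before lo do not match; lines starting at or after hi do.
        long lo = 0, hi = s.Length;
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            long start = NextLineStart(s, mid);
            if (start >= hi) { hi = mid; continue; } // no line starts in [mid, hi)
            DateTime date = ReadDateAt(s, start, out long next);
            if (matches(date)) hi = mid;
            else lo = next;
        }
        return NextLineStart(s, lo);
    }

    // Byte range [Begin, End) covering every entry with from <= date <= to.
    public static (long Begin, long End) FindRange(Stream s, DateTime from, DateTime to)
    {
        long begin = FirstLineWhere(s, d => d >= from);
        long end = FirstLineWhere(s, d => d > to);
        return (begin, end);
    }
}
```

With a stream from `File.OpenRead(filename)`, each of the two searches halves the window on every probe, so a 15TB file needs around 44 probes per boundary, and each probe reads roughly one line. The matching entries are then the bytes between `Begin` and `End`, which can be copied to the output sequentially. The sketch only locates the range; whether copying the range itself counts against the 32MB limit depends on the details of the full question, which the post does not give.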

Published at DZone with permission of Ayende Rahien , DZone MVB .

Opinions expressed by DZone contributors are their own.
