Fail, Fail, Fail: More Job Candidate Fails

Sometimes, reading candidates' answers is just something that I know is going to piss me off.

We have a question that goes something like this (the actual question is much more detailed):

We have a 15TB CSV file that contains web logs. The entries are sorted by date (since this is how they were entered). Find all the log entries within a given date range. You may not read more than 32 MB.

A candidate replied with an answer that had the following code:

    string line = string.Empty;
    StreamReader file;

    try
    {
        file = new StreamReader(filename);
    }
    catch (FileNotFoundException ex)
    {
        Console.WriteLine("The file is not found.");
        Console.ReadLine();
        return;
    }

    while ((line = file.ReadLine()) != null)
    {
        var values = line.Split(',');
        DateTime date = Convert.ToDateTime(values[0]);
        if (date.Date >= startDate && date.Date <= endDate)
            output.Add(line);

        // Results size in MB
        double size = (GetObjectSize(output) / 1024f) / 1024f;
        if (size >= 32)
        {
            Console.WriteLine("Results size exceeded 32MB, the search will stop.");
            break;
        }
    }

My reply was:

The data file is 15TB in size. If the data is beyond the first 32MB, it won't be found.

The candidate then fixed his code. It now includes:

    var lines = File.ReadLines(filename);

Yep, this is on a 15TB file.

Now I’m going to have to lie down for a bit. I am not feeling so good.
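The post never shows the expected answer, so what follows is only a sketch of one approach the question seems to invite, not the interviewer's reference solution. Because the entries are sorted by date, the start of the range can be located with a binary search over byte offsets instead of a scan from the top. A 15TB file is about 2^44 bytes, so that takes roughly 44 probes, each reading a few kilobytes if log lines are short. The names `LogSearch`, `NextLineStart`, `FindFirstOffset` and `Find` are made up for illustration. The sketch also assumes the first CSV column is a date that `DateTime.Parse` accepts under the invariant culture, that there is no header row and no blank lines, and that lines end in `\n` (optionally preceded by `\r`).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

static class LogSearch
{
    // Offset of the first line that starts at or after byte position p.
    static long NextLineStart(FileStream fs, long p)
    {
        if (p == 0) return 0;
        fs.Position = p - 1;
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n') { }
        return fs.Position;
    }

    // Reads one line from the current position; returns null at end of file.
    static string ReadLine(FileStream fs)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n') bytes.Add((byte)b);
        if (b == -1 && bytes.Count == 0) return null;
        return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
    }

    // Assumption: the date is the first comma-separated field.
    static DateTime DateOf(string line) =>
        DateTime.Parse(line.Split(',')[0], CultureInfo.InvariantCulture).Date;

    // Binary search over byte offsets for the first line dated >= startDate.
    static long FindFirstOffset(FileStream fs, DateTime startDate)
    {
        long lo = 0, hi = fs.Length;
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            fs.Position = NextLineStart(fs, mid);
            string line = ReadLine(fs);
            if (line == null || DateOf(line) >= startDate) hi = mid;
            else lo = mid + 1;
        }
        return NextLineStart(fs, lo);
    }

    // Streams the matching lines; only the matching range is read sequentially.
    public static IEnumerable<string> Find(string filename, DateTime startDate, DateTime endDate)
    {
        using (var fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            fs.Position = FindFirstOffset(fs, startDate);
            string line;
            while ((line = ReadLine(fs)) != null && DateOf(line) <= endDate)
                yield return line;
        }
    }
}
```

A caller would iterate `LogSearch.Find(filename, startDate, endDate)` and write each line out as it arrives rather than collecting everything in a list. Whether the matching range itself fits within the 32 MB limit depends on details of the question that the post does not give.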

Published at DZone with permission of the author, DZone MVB.
