Fail, Fail, Fail: More Job Candidate Fails

Sometimes, reading candidates' answers is something that I just know is going to piss me off.

We have a question that goes something like this (the actual question is much more detailed):

We have a 15TB CSV file that contains web logs. The entries are sorted by date (since this is how they were entered). Find all the log entries within a given date range. You may not read more than 32 MB.

A candidate replied with an answer that had the following code:

    string line = string.Empty;
    StreamReader file;

    try
    {
        file = new StreamReader(filename);
    }
    catch (FileNotFoundException ex)
    {
        Console.WriteLine("The file is not found.");
        Console.ReadLine();
        return;
    }

    while ((line = file.ReadLine()) != null)
    {
        var values = line.Split(',');
        DateTime date = Convert.ToDateTime(values[0]);
        if (date.Date >= startDate && date.Date <= endDate)
            output.Add(line);

        // Results size in MB
        double size = (GetObjectSize(output) / 1024f) / 1024f;
        if (size >= 32)
        {
            Console.WriteLine("Results size exceeded 32MB, the search will stop.");
            break;
        }
    }

My reply was:

The data file is 15TB in size. If the data is beyond the first 32MB, it won't be found.

The candidate then fixed his code. It now includes:

  var lines = File.ReadLines(filename);

Yep, this is on a 15TB file.

Now I’m going to have to lie down for a bit. I am not feeling so good.
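
For contrast, here is a minimal sketch of one approach that takes advantage of the sort order. Since the entries are sorted by date, a binary search over byte offsets can locate the start of the range with a few dozen small reads, after which the matching lines are read sequentially. This is not the actual reference answer to the interview question. `LogRangeSearch`, `Find`, and the helper methods are hypothetical names, and the sketch assumes UTF-8 text, `\n` or `\r\n` line endings, no header row or blank lines, and a date in the first field that `DateTime.Parse` accepts under the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper, not taken from the original question or answer.
public static class LogRangeSearch
{
    // Returns every line whose first CSV field (a date) falls in [startDate, endDate].
    public static List<string> Find(string path, DateTime startDate, DateTime endDate)
    {
        var results = new List<string>();
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Binary search over byte offsets for the first line dated >= startDate.
            long lo = 0, hi = file.Length;
            while (lo < hi)
            {
                long mid = lo + (hi - lo) / 2;
                file.Position = NextLineStart(file, mid);
                string probe = ReadLine(file);
                if (probe == null || DateOf(probe) >= startDate)
                    hi = mid;       // the answer is at or before mid
                else
                    lo = mid + 1;   // the answer is strictly after mid
            }

            // Scan forward sequentially until the dates leave the range.
            file.Position = NextLineStart(file, lo);
            string line;
            while ((line = ReadLine(file)) != null && DateOf(line) <= endDate)
                results.Add(line);
        }
        return results;
    }

    // Offset of the first line that starts at or after the given offset.
    static long NextLineStart(FileStream file, long offset)
    {
        if (offset == 0) return 0;
        // Start one byte early so an offset that is already a line start maps to itself.
        file.Position = offset - 1;
        int b;
        while ((b = file.ReadByte()) != -1 && b != '\n') { }
        return file.Position;
    }

    // Reads one line from the current position; returns null at end of file.
    static string ReadLine(FileStream file)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = file.ReadByte()) != -1 && b != '\n')
            bytes.Add((byte)b);
        if (b == -1 && bytes.Count == 0) return null;
        return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
    }

    // Parses the first comma-separated field as a date (assumed format, see above).
    static DateTime DateOf(string line)
    {
        int comma = line.IndexOf(',');
        string field = comma < 0 ? line : line.Substring(0, comma);
        return DateTime.Parse(field, CultureInfo.InvariantCulture).Date;
    }
}
```

Each probe reads roughly one line plus whatever the stream buffers, and a 15TB file needs about 44 probes (log2 of 15 × 2^40 is roughly 43.9), so for lines of ordinary length the search itself reads well under a megabyte. The sequential scan still reads every matching entry, so how a very large result set counts against the 32 MB budget depends on details of the full question that are not shown here.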

Published at DZone with permission of Ayende Rahien, DZone MVB.
