Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Writing Large Datasets to Excel Files Using EPPlus in C#

DZone's Guide to

Writing Large Datasets to Excel Files Using EPPlus in C#

Recently, I had to resolve an issue of a corrupt report file of an enterprise application in production. People suggested I use EPPlus, and I was NOT disappointed.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Prior to the availability of Open XML SDK and other Excel libraries in .NET, manipulating Microsoft Excel files from ASP.NET web applications was a daunting task. Open XML SDK makes it easy to manipulate documents that conform to the Office Open XML file format specifications — with a trade-off of writing plenty of lines of code. Similar to OpenXML, ClosedXML is another elegant library that further simplifies the process of reading and writing to Excel files and hides the complexities involved, without needing to deal with XML documents.

Recently, I had to resolve an issue of a corrupt report file of an enterprise application in production. Reviewing the logs indicated that the reporting module threw a System.OutOfMemory exception that originated from ClosedXML. The Excel file being created had five sheets with around 400,000 rows each and ClosedXML could not handle this large amount of data.

EPPlus

Apparently, a lot of folks have had the same issue using the library, and many suggested different workarounds or alternative libraries. Catching my attention from the various suggestions is the EPPlus library, which has been around for a while but never really had much publicity like OpenXML and ClosedXML. The library is available for use via Nugget.

To test the performance and robustness of this library, I decided to throw a large dataset at it. I tried to make it create five sheets in a single workbook and save 1M rows in each sheet. It took 61 seconds on a Dell Latitude E5470 (Intel Core i5, 8GB RAM, 250GB SSD) for the library to process and save the records. It handled the large data graciously and also with fewer lines of codes compared to ClosedXML. See the source code below.

using OfficeOpenXml;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Epplus_Poc
{
    class Program
    {
        static void Main(string[] args)
        {
            List codeDetails = PopulateCodeDetails();
            FileInfo fileInfo = new FileInfo(@"C:\Temp\file.xlsx");
            using (ExcelPackage excelPackage = new ExcelPackage(fileInfo))
            {
                var workSheet = GetWorkSheet(excelPackage, 0);
                var workSheet1 = GetWorkSheet(excelPackage,1);
                var workSheet2= GetWorkSheet(excelPackage,2);
                var workSheet3= GetWorkSheet(excelPackage,3);
                var workSheet4 = GetWorkSheet(excelPackage,4);                
                workSheet.Cells["B2"].LoadFromCollection(codeDetails, false, OfficeOpenXml.Table.TableStyles.Medium1);
                workSheet1.Cells["B2"].LoadFromCollection(codeDetails, false, OfficeOpenXml.Table.TableStyles.Medium1);
                workSheet2.Cells["B2"].LoadFromCollection(codeDetails, false, OfficeOpenXml.Table.TableStyles.Medium1);
                workSheet3.Cells["B2"].LoadFromCollection(codeDetails, false, OfficeOpenXml.Table.TableStyles.Medium1);
                workSheet4.Cells["B2"].LoadFromCollection(codeDetails, false, OfficeOpenXml.Table.TableStyles.Medium1);
                excelPackage.Save();
            }
        }
        static ExcelWorksheet GetWorkSheet(ExcelPackage excelPackage, int count)
        {
            var workSheet = excelPackage.Workbook.Worksheets.Add("Content - "+count);
            workSheet.View.ShowGridLines = false;
            workSheet.Cells["B1"].Value = "Code";
            workSheet.Cells["C1"].Value = "Time";
            workSheet.Cells["D1"].Value = "Date";
            workSheet.Cells["B1:D1"].Style.Font.Bold = true;
            return workSheet;
        }
        public static List PopulateCodeDetails()
        {
            List codeDetails = new List();
            Random random = new Random();
            for(int i=1; i<=1000000; i++)
            {
                CodeDetail codeDetail = new CodeDetail();
                codeDetail.Code = random.Next(12324343).ToString();
                codeDetail.Time = DateTime.Now.ToShortTimeString();
                codeDetail.Date = DateTime.Now.ToShortDateString();
                codeDetails.Add(codeDetail);
            }
            return codeDetails;
        }
    }
    public class CodeDetail
    {
        public string Code { get; set; }
        public string Time { get; set; }
        public string Date { get; set; }
    }
}       

The next time you need to process a large dataset and save it to an Excel file, you ought to try EPPlus library.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,excel files ,c# ,epplus ,large datasets ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}