Parsing Excel to a List of Objects

Home / Parsing Excel to a List of Objects

Business people love their Excel spreadsheets. Fact. As such, I work with Excel a lot to parse and manipulate data. However, from a .NET perspective, Excel in its raw form isn’t overly useful. Being able to take an Excel worksheet and build a List<T> can be extremely useful.


One third party library that I like a lot for working with Excel is called EPPlus. It can parse XLSX or XLS files seamlessly without dependencies on Microsoft Office COM objects. This makes it ideal for use as part of domain/utility projects or on machines without MS Office and such. A typical scenario for me is wanting to parse an uploaded Excel worksheet and directly insert the data into a database. In this scenario, it really comes down to using my ORM of preference (usually EF) and wanting to be able to pass that Excel data as a list of my domain objects.

EPPlus has the convention of referring to an Excel file (stream or whatever) as an “ExcelPackage.” The ExcelPackage contains a Workbook and each Workbook has a collection of Worksheets. Generally, I parse the first worksheet in the collection of worksheets. To facilitate easily being able to parse the worksheet, I created an extension method for the worksheet class of “ToList<T>.”

The actual parsing is relatively straight forward. We assume the first row has header column names, and then every other row will be parsed to our object (of type T) with each column corresponding to a specific member of our object. Using this information, we can make the connection between column names and their corresponding member names. In the case where we know the column names don’t match, we can pass in a “mapping” dictionary as well. There are a few nuances with the Excel data since EPPlus, and maybe Excel itself, provides all numbers as doubles. As you can imagine, parsing really just means using .NET’s TryParse methods based on knowing what the underlying members’ scalar type is. Only parsing DateTime requires any true conversion as you’ll see below.

In addition to passing in an optional mapping dictionary, I also check for any DisplayAttributes on each member. If there is a DisplayName provided, that is used to match against the column names.

I won’t bore you with all the details since they’re in the code snippet below. The code is a pretty straight forward iterative conversion algorithm. Next time, I’ll show how a similar technique can be used for generating an Excel template for any type T and even including Excel data validation. These two techniques go hand-in-hand to allow a user to get data into a system using a familiar tool like Excel.

public static class Extentions
    {
        public static List<T> ToList<T>(this ExcelWorksheet worksheet, Dictionary<string, string> map = null) where T : new()
        {
            //DateTime Conversion
            var convertDateTime = new Func<double, DateTime>(excelDate =>
            {
                if (excelDate < 1)
                    throw new ArgumentException("Excel dates cannot be smaller than 0.");

                var dateOfReference = new DateTime(1900, 1, 1);

                if (excelDate > 60d)
                    excelDate = excelDate - 2;
                else
                    excelDate = excelDate - 1;
                return dateOfReference.AddDays(excelDate);
            });

            var props = typeof(T).GetProperties()
                .Select(prop =>
                {
                    var displayAttribute = (DisplayAttribute)prop.GetCustomAttributes(typeof(DisplayAttribute), false).FirstOrDefault();
                    return new
                    {
                        Name = prop.Name,
                        DisplayName = displayAttribute == null ? prop.Name : displayAttribute.Name,
                        Order = displayAttribute == null || !displayAttribute.GetOrder().HasValue ? 999 : displayAttribute.Order,
                        PropertyInfo = prop,
                        PropertyType = prop.PropertyType,
                        HasDisplayName = displayAttribute != null
                    };
                })
            .Where(prop => !string.IsNullOrWhiteSpace(prop.DisplayName))
            .ToList();

            var retList = new List<T>();
            var columns = new List<ExcelMap>();

            var start = worksheet.Dimension.Start;
            var end = worksheet.Dimension.End;
            var startCol = start.Column;
            var startRow = start.Row;
            var endCol = end.Column;
            var endRow = end.Row;

            // Assume first row has column names
            for (int col = startCol; col <= endCol; col++)
            {
                var cellValue = (worksheet.Cells[startRow, col].Value ?? string.Empty).ToString().Trim();
                if (!string.IsNullOrWhiteSpace(cellValue))
                {
                    columns.Add(new ExcelMap()
                    {
                        Name = cellValue,
                        MappedTo = map == null || map.Count == 0 ?
                            cellValue :
                            map.ContainsKey(cellValue) ? map[cellValue] : string.Empty,
                        Index = col
                    });
                }
            }

            // Now iterate over all the rows
            for (int rowIndex = startRow + 1; rowIndex <= endRow; rowIndex++)
            {
                var item = new T();
                columns.ForEach(column =>
                {
                    var value = worksheet.Cells[rowIndex, column.Index].Value;
                    var valueStr = value == null ? string.Empty : value.ToString().Trim();
                    var prop = string.IsNullOrWhiteSpace(column.MappedTo) ?
                        null :
                        props.First(p => p.Name.Contains(column.MappedTo));

                    // Excel stores all numbers as doubles, but we're relying on the object's property types
                    if (prop != null)
                    {
                        var propertyType = prop.PropertyType;
                        object parsedValue = null;

                        if (propertyType == typeof(int?) || propertyType == typeof(int))
                        {
                            int val;
                            if (!int.TryParse(valueStr, out val))
                            {
                                val = default(int);
                            }

                            parsedValue = val;
                        }
                        else if (propertyType == typeof(short?) || propertyType == typeof(short))
                        {
                            short val;
                            if (!short.TryParse(valueStr, out val))
                                val = default(short);
                            parsedValue = val;
                        }
                        else if (propertyType == typeof(long?) || propertyType == typeof(long))
                        {
                            long val;
                            if (!long.TryParse(valueStr, out val))
                                val = default(long);
                            parsedValue = val;
                        }
                        else if (propertyType == typeof(decimal?) || propertyType == typeof(decimal))
                        {
                            decimal val;
                            if (!decimal.TryParse(valueStr, out val))
                                val = default(decimal);
                            parsedValue = val;
                        }
                        else if (propertyType == typeof(double?) || propertyType == typeof(double))
                        {
                            double val;
                            if (!double.TryParse(valueStr, out val))
                                val = default(double);
                            parsedValue = val;
                        }
                        else if (propertyType == typeof(DateTime?) || propertyType == typeof(DateTime))
                        {
                            parsedValue = convertDateTime((double)value);
                        }
                        else if (propertyType.IsEnum)
                        {
                            try
                            {
                                parsedValue = Enum.ToObject(propertyType, int.Parse(valueStr));
                            }
                            catch
                            {
                                parsedValue = Enum.ToObject(propertyType, 0);
                            }
                        }
                        else if (propertyType == typeof(string))
                        {
                            parsedValue = valueStr;
                        }
                        else
                        {
                            try
                            {
                                parsedValue = Convert.ChangeType(value, propertyType);
                            }
                            catch
                            {
                                parsedValue = valueStr;
                            }
                        }

                        try
                        {
                            prop.PropertyInfo.SetValue(item, parsedValue);
                        }
                        catch (Exception ex)
                        {
                            // Indicate parsing error on row?
                        }
                    }
                });

                retList.Add(item);
            }

            return retList;
        }
    }

5 thoughts on “Parsing Excel to a List of Objects”

    1. Sorry – It’s just a POCO class I use to track the index/name/value of the cells.

      public class ExcelMap
      {
      	public string Name { get; set; }
      	public string MappedTo { get; set; }
      	public int Index { get; set; }
      }
      

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.