Over a million developers have joined DZone.

Reading Excel Spreadsheets with Python and xlrd

We will be looking at how we can read an *.xls/*.xlsx file using a package called xlrd.

· Big Data Zone

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Previously, we looked at how to create Microsoft Excel (i.e. *.xls) files using the xlwt package. Today we will be looking at how we can read an *.xls/*.xlsx file using a package called xlrd. The xlrd package can be run on Linux and Mac as well as Windows. This is great when you need to process an Excel file on a Linux server.

We will start out by reading the first Excel file we created in our previous article.

Let’s get started!

Reading an Excel Spreadsheet

In this section, we will look at a function that demonstrates different ways of reading an Excel file. Here’s the code example:

import xlrd

def open_file(path):
    Open and read an Excel file
    book = xlrd.open_workbook(path)

    # print number of sheets
    print book.nsheets

    # print sheet names
    print book.sheet_names()

    # get the first worksheet
    first_sheet = book.sheet_by_index(0)

    # read a row
    print first_sheet.row_values(0)

    # read a cell
    cell = first_sheet.cell(0,0)
    print cell
    print cell.value

    # read a row slice
    print first_sheet.row_slice(rowx=0,

if __name__ == "__main__":
    path = "test.xls"

Let’s break this down a bit. First we import xlrd and then in our function, we open the Excel workbook that was passed in. The next couple of lines show how to introspect the book. We find out how many worksheets there are in the workbook and we print out their names. Next we extract the first worksheet via the sheet_by_index method. We can read an entire row from the worksheet using the row_values method. If we want to get a particular cell’s value, we can call the cell method and pass it the row and column indexes. Finally we use xlrd’s row_slice method to read a portion of the row. As you can see, this last method accepts a row index and the starting and ending column indexes to determine what to return. The row_slice method returns a list of cell instances.

This makes it very easy to iterate over a group of cells. Here’s a small snippet to demonstrate:

cells = first_sheet.row_slice(rowx=0,
for cell in cells:
    print cell.value

The xlrd package supports the following types of cells: text, number (i.e. float), dates (any number format that “looks” like a date), Boolean, error and empty/blank. The package also supports extracting data from named cells, although the project doesn’t support all types of named cells. The reference text is a bit vague on what exactly it does not support though.

If you need to copy cell formatting you will need to download the xlutils package.

Wrapping Up

At this point you should know enough to read most Excel files that were built using Microsoft’s XLS format. There is another package that also supports reading xls/xlsx files called the openpyxl project. You might want to check it out as an alternative.

Hortonworks Sandbox is a personal, portable Apache Hadoop® environment that comes with dozens of interactive Hadoop and it's ecosystem tutorials and the most exciting developments from the latest HDP distribution, brought to you in partnership with Hortonworks.

python,bigdata,microsoft excel,big data,xlrd

Published at DZone with permission of Mike Driscoll, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}