Importing Excel Datetimes Into Pandas, Part II

Importing Excel Datetimes Into Pandas, Part II

Import dates & times from Excel .xlsx files into Pandas!

Matthew Alhonte
Matthew Alhonte

What if, like during my data import task a few months back, the dates & times are in separate columns?  This gives us a few new issues.  Let's import that Excel file!

import pandas as pd
import xlrd
import datetime

df = pd.read_excel("hasDatesAndTimes.xlsx", sheet_name="Sheet1")

book = xlrd.open_workbook("hasDatesAndTimes.xlsx")
datemode = book.datemode

And let's see that time variable!

df["Time"]
Index Time
0 0.909907
1 0.909919
2 0.909931
3 0.909942
4 0.909954
df["Time"].map(lambda x: xlrd.xldate_as_tuple(x, datemode))

So far, so good....

df["Time"].map(lambda x: datetime.datetime(*xlrd.xldate_as_tuple(x, 
                                              datemode)))
ValueError: year 0 is out of range

Agh!  Plain datetime won't let us have 0 as our year.

We'll want two packages to fix this.  One is an awesome package for handling dates called arrow.  In order for arrow to recognize what we want it to, though, we'll need some more manipulations - I'll be using the pipe function from toolz in order to make that more readable.

Index Time
0 (0, 0, 0, 21, 50, 16)
1 (0, 0, 0, 21, 50, 17)
2 (0, 0, 0, 21, 50, 18)
3 (0, 0, 0, 21, 50, 19)
4 (0, 0, 0, 21, 50, 20)

Pipe lets us have a nice workflow where we just list the transformations we want, and our value will be "piped" sequentially through each one.

fns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
     lambda x: x[3:6],
      lambda x: map(str, x),
      lambda x: "-".join(x),
       lambda x: arrow.get(x, "H-m-s"),
       lambda x: x.format('HH:mm:ss')
      ]

Let's see a blow-by-blow of how one of our values gets transformed by that.

fnRanges = [fns[:i+1] for i in range(len(fns))]
[pipe(0.909907, *x) for x in fnRanges]

[(0, 0, 0, 21, 50, 16),
 (21, 50, 16),
 <map at 0x7f105151af98>,
 '21-50-16',
 <Arrow [0001-01-01T21:50:16+00:00]>,
 '21:50:16']

The first function takes us from an Excel datetime to a datetime tuple.

The next extracts just the time variables.

We then map that all to a string (which shows up as nothing because map is evaluated lazily).

Then we put a dash between all those elements so it'll be easier to parse as a time.

Then arrow consumes the value, with the format we specified.

And finally gives us a neatly-formatted time, ready to be consumed by a database!

Helper Functions

def mapPipe(ser, fns):
    return ser.map(lambda a: pipe(a, *fns),
        na_action="ignore" )

mapPipe(df['Time'],
   fns)
Index Time
0 21:50:16
1 21:50:17
2 21:50:18
3 21:50:19
4 21:50:20

Dates are a bit easier - though the pipe syntax is still helpful!

dateFns = [lambda x: xlrd.xldate_as_tuple(x, datemode),
      lambda x: arrow.get(*x),
      lambda x: x.format('YYYY-MM-DD')
      ]
mapPipe(df['Date'],
       dateFns)
Index Date
0 2018-08-12
1 2018-08-12
2 2018-08-12
3 2018-08-12
4 2018-08-12

Put it all together....

(df.assign(Date = mapPipe(df['Date'],
       dateFns))
   .assign(Time = mapPipe(df['Time'],
       fns)))
Index Date Time
0 2018-08-12 21:50:16
1 2018-08-12 21:50:17
2 2018-08-12 21:50:18
3 2018-08-12 21:50:19
4 2018-08-12 21:50:20
Code Snippet CornerPandasExcelPython

Matthew Alhonte

Supervillain in somebody's action hero movie. Experienced a radioactive freak accident at a young age which rendered him part-snake and strangely adept at Python.