DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Python Packages for Validating Database Migration Projects
  • Modern Data Processing Libraries: Beyond Pandas
  • dovpanda: Unlock Pandas Efficiency With Automated Insights
  • Comprehensive Guide to Data Analysis and Visualization With Pandas and Matplotlib

Trending

  • Unmasking Entity-Based Data Masking: Best Practices 2025
  • Docker Base Images Demystified: A Practical Guide
  • Unlocking the Potential of Apache Iceberg: A Comprehensive Analysis
  • Beyond Code Coverage: A Risk-Driven Revolution in Software Testing With Machine Learning

Dates in Pandas Cheatsheet

Check out a large collection of not-so-sloppy snippets for doing scientific computing and data visualization in Python with pandas.

By 
Giuseppe Vettigli user avatar
Giuseppe Vettigli
·
Jul. 14, 17 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
12.1K Views

Join the DZone community and get the full member experience.

Join For Free

Lately, I've been working a lot with dates in Pandas, so I decided to make this little cheatsheet with the commands I use the most.

Importing a CSV using a custom function to parse dates:

import pandas as pd

def parse_month(month):
    """
    Converts a string from the format M in datetime format.
    Example: parse_month("2007M02") returns datetime(2007, 2, 1)
    """
    return pd.datetime(int(month[:4]), int(month[-2:]), 1)

temperature = pd.read_csv('TempUSA.csv', parse_dates=['Date'], 
                          date_parser=parse_month, 
                          index_col=['Date'], # will become an index
                          # use a subset of the columns
                          usecols=['Date', 
                                   'LosAngelesMax', 'LosAngelesMin'])
print temperature
            LosAngelesMax  LosAngelesMin
Date                                    
2000-01-01           19.6           10.0
2000-02-01           18.9           10.1
2000-03-01           18.6           10.1
2000-04-01           20.2           12.5
2000-05-01           21.9           14.2

Format the dates in a chart:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.plot(temperature['LosAngelesMax'])
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.show()

Here's the reference of the date format directives. ISO compliant format: %Y-%m-%dT%H:%M:%S.

Group the DataFrame by month:

print temperature.groupby([temperature.index.month]).mean() 
      LosAngelesMax  LosAngelesMin
Date                              
1         20.092308       8.992308
2         19.223077       9.276923
3         19.253846      10.492308
4         19.992308      11.461538
5         21.076923      13.761538
6         22.123077      15.800000
7         23.892308      17.315385
8         24.246154      17.530769
9         24.384615      16.846154
10        23.330769      14.630769
11        21.950000      11.241667
12        19.241667       8.683333

The resulting DataFrame is indexed by month.

Merging two DataFrames indexed with timestamps that don't match exactly:

date_range_a = pd.date_range('2007-01-01 01:00', 
                            '2007-01-01 3:00', freq='1h')
date_range_b = date_range_a + pd.Timedelta(10, 'm')
df_a = pd.DataFrame(np.arange(len(date_range_a)), 
                    columns=['a'], index=date_range_a)
df_b = pd.DataFrame(['x', 'y', 'z'], 
                    columns=['b'], index=date_range_b)

print 'left DataFrame'
print df_a
print '\nright DataFrame'
print df_b
print '\nmerge_AsOf result'
print pd.merge_asof(df_a, df_b, direction='nearest', 
                    left_index=True, right_index=True)
left DataFrame
                     a
2007-01-01 01:00:00  0
2007-01-01 02:00:00  1
2007-01-01 03:00:00  2

right DataFrame
                     b
2007-01-01 01:10:00  x
2007-01-01 02:10:00  y
2007-01-01 03:10:00  z

merge_AsOf result
                     a  b
2007-01-01 01:00:00  0  x
2007-01-01 02:00:00  1  y
2007-01-01 03:00:00  2  z

The DataFrames have been aligned according to the index on the left.

Aligning two DataFrames:

aligned = df_a.align(df_b)

print 'left aligned'
print aligned[0]
print '\nright aligned'
print aligned[1]
print '\ncombination'
aligned[0]['b'] = aligned[1]['b']
print aligned[0]
left aligned
                       a   b
2007-01-01 01:00:00  0.0 NaN
2007-01-01 01:10:00  NaN NaN
2007-01-01 02:00:00  1.0 NaN
2007-01-01 02:10:00  NaN NaN
2007-01-01 03:00:00  2.0 NaN
2007-01-01 03:10:00  NaN NaN

right aligned
                      a    b
2007-01-01 01:00:00 NaN  NaN
2007-01-01 01:10:00 NaN    x
2007-01-01 02:00:00 NaN  NaN
2007-01-01 02:10:00 NaN    y
2007-01-01 03:00:00 NaN  NaN
2007-01-01 03:10:00 NaN    z

combination
                       a    b
2007-01-01 01:00:00  0.0  NaN
2007-01-01 01:10:00  NaN    x
2007-01-01 02:00:00  1.0  NaN
2007-01-01 02:10:00  NaN    y
2007-01-01 03:00:00  2.0  NaN
2007-01-01 03:10:00  NaN    z

The timestamps are now aligned according to both the DataFrames and unknown values have been filled with NaNs. The missing value can be filled with interpolation when working with numeric values:

print aligned[0].a.interpolate() 
2007-01-01 01:00:00    0.0
2007-01-01 01:10:00    0.5
2007-01-01 02:00:00    1.0
2007-01-01 02:10:00    1.5
2007-01-01 03:00:00    2.0
2007-01-01 03:10:00    2.0
Name: a, dtype: float64

The categorical values can be filled using the fillna method:

print aligned[1].b.fillna(method='bfill') 
2007-01-01 01:00:00    x
2007-01-01 01:10:00    x
2007-01-01 02:00:00    y
2007-01-01 02:10:00    y
2007-01-01 03:00:00    z
2007-01-01 03:10:00    z
Name: b, dtype: object

The method bfill propagates the next valid observation, while ffil the last valid observation.

Convert a Timedelta in hours:

td = pd.Timestamp('2017-07-05 16:00') - pd.Timestamp('2017-07-05 12:00')
print td / pd.Timedelta(1, unit='h')
4.0 

To convert in days, months, minutes, and so on, one just needs to change the unit. Here are the values accepted: D, h, m, s, ms, us, ns.

Convert pandas timestamps in Unix timestamps:

unix_ts = pd.date_range('2017-01-01 1:00', 
                        '2017-01-01 2:00', 
                        freq='30min').astype(np.int64) // 10**9
print unix_ts
Int64Index([1483232400, 1483234200, 1483236000], dtype='int64') 

To convert in milliseconds, divide by 10**6 instead of 10**9.

Convert Unix timestamps in pandas timestamps:

print pd.to_datetime(unix_ts, unit='s') 
DatetimeIndex(['2017-01-01 01:00:00', '2017-01-01 01:30:00',
               '2017-01-01 02:00:00'],
              dtype='datetime64[ns]', freq=None)

To convert from timestamps in milliseconds, change the unit to ms.

Pandas

Published at DZone with permission of Giuseppe Vettigli, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Python Packages for Validating Database Migration Projects
  • Modern Data Processing Libraries: Beyond Pandas
  • dovpanda: Unlock Pandas Efficiency With Automated Insights
  • Comprehensive Guide to Data Analysis and Visualization With Pandas and Matplotlib

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!