Solving a Pandas ValueError

Check out an unexpected behavior that I came across when trying to add a column to a DataFrame and what I did to solve it.

I’ve been playing around with Kaggle in my spare time over the last few weeks and came across an unexpected behavior when trying to add a column to a DataFrame.

First, let’s get Pandas into our program scope.

>>> import pandas as pd

Now, we’ll create a DataFrame to play with for the duration of this post:

>>> df = pd.DataFrame({"a": [1,2,3,4,5], "b": [2,3,4,5,6]})
>>> df
   a  b
0  1  2
1  2  3
2  3  4
3  4  5
4  5  6

Let’s say we want to create a new column that is True if either of the numbers in a row is odd and False otherwise.

Since b is always one more than a, exactly one of the two is odd in every row, so we’d expect to see a column full of True values. Let’s get started.

>>> divmod(df["a"], 2)[1] > 0
0     True
1    False
2     True
3    False
4     True
Name: a, dtype: bool
 
>>> divmod(df["b"], 2)[1] > 0
0    False
1     True
2    False
3     True
4    False
Name: b, dtype: bool

So far, so good. Now, let’s combine those two calculations together and create a new column in our DataFrame:

>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) or (divmod(df["b"], 2)[1] > 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/markneedham/projects/kaggle/house-prices/a/lib/python3.6/site-packages/pandas/core/generic.py", line 953, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

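Hmmm, that was unexpected! Python’s or and and operators need a single truth value for each operand, and pandas refuses to reduce a whole Series to one boolean, hence the error. For element-wise logic on a pandas Series we need to use the bitwise operators instead: & for and, | for or.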
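To see the difference in isolation, here is a minimal sketch using two small boolean Series. The names s1, s2, raised and elementwise_or are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Two hypothetical boolean Series, just for illustration.
s1 = pd.Series([True, False, True])
s2 = pd.Series([False, False, True])

# `or` asks Python for the truth value of s1 as a whole,
# which pandas refuses to give for a Series.
try:
    s1 or s2
    raised = False
except ValueError:
    raised = True

# `|` is applied element by element and returns a new Series.
elementwise_or = s1 | s2

print(raised)
print(elementwise_or.tolist())
```

The first print should show that the ValueError was raised, and the second should show one boolean per row rather than a single value.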

Let’s update our example:

>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) | (divmod(df["b"], 2)[1] > 0)
>>> df
   a  b  anyOdd
0  1  2    True
1  2  3    True
2  3  4    True
3  4  5    True
4  5  6    True
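One thing worth noting about the parentheses around each comparison: in Python, | and & bind more tightly than comparison operators such as > and ==, so leaving the parentheses out can fail or give unintended results. Below is a small sketch of the same check written with the % operator instead of divmod. That swap is my own substitution, and any_odd is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6]})

# Each comparison is wrapped in parentheses so that it is evaluated
# before the element-wise | combines the two boolean Series.
any_odd = (df["a"] % 2 == 1) | (df["b"] % 2 == 1)

print(any_odd.tolist())
```

For the non-negative integers used here, the remainder from % is the same value that divmod returns as its second element, so the result should match the anyOdd column.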

Much better. What if we wanted to check whether both values are odd?

>>> df["bothOdd"] = (divmod(df["a"], 2)[1] > 0) & (divmod(df["b"], 2)[1] > 0)
>>> df
   a  b  anyOdd  bothOdd
0  1  2    True    False
1  2  3    True    False
2  3  4    True    False
3  4  5    True    False
4  5  6    True    False

Works exactly as expected: a and b are consecutive numbers, so they are never both odd. Hooray!
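If there were more than two columns to check, chaining | or & would get unwieldy. One possible alternative, sketched here as a different technique from the one used above, is to build a boolean DataFrame and reduce it row by row with any and all. The name is_odd is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6]})

# Boolean DataFrame: True wherever a value is odd.
is_odd = df[["a", "b"]] % 2 == 1

# axis=1 reduces across the columns, giving one result per row.
df["anyOdd"] = is_odd.any(axis=1)
df["bothOdd"] = is_odd.all(axis=1)

print(df)
```

This should produce the same anyOdd and bothOdd columns as the bitwise version, and extending it to more columns only means adding names to the list.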

Published at DZone with permission of
