Solving a Pandas ValueError
Check out an unexpected behavior that I came across when trying to add a column to a DataFrame and what I did to solve it.
I’ve been playing around with Kaggle in my spare time over the last few weeks and came across an unexpected behavior when trying to add a column to a DataFrame.
First, let’s get Pandas into our program scope.
>>> import pandas as pd
Now, we’ll create a DataFrame to play with for the duration of this post:
>>> df = pd.DataFrame({"a": [1,2,3,4,5], "b": [2,3,4,5,6]})
>>> df
a b
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
Let’s say we want to create a new column that returns True if either of the numbers is odd. If not, then it’ll return False.
Each row holds two consecutive integers, so one of them is always odd and we’d expect to see a column full of True values... so let’s get started.
>>> divmod(df["a"], 2)[1] > 0
0 True
1 False
2 True
3 False
4 True
Name: a, dtype: bool
>>> divmod(df["b"], 2)[1] > 0
0 False
1 True
2 False
3 True
4 False
Name: b, dtype: bool
So far, so good. Now, let’s combine those two calculations together and create a new column in our DataFrame:
>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) or (divmod(df["b"], 2)[1] > 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/markneedham/projects/kaggle/house-prices/a/lib/python3.6/site-packages/pandas/core/generic.py", line 953, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
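Hmmm, that was unexpected! Unfortunately, Python’s or and and operators don’t work on pandas Series: they need a single truth value for each operand, and a Series holds many values. Instead, we need to use the bitwise operators & (and) and | (or), which pandas applies element by element.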
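To see where the error comes from, here is a minimal sketch using two small boolean Series. The names left, right, either and both are made up for illustration. It assumes the failure is triggered when Python asks the Series for one overall truth value, which is what the or operator does with its first operand.

```python
import pandas as pd

# Two hypothetical boolean Series, just for illustration
left = pd.Series([True, False, True])
right = pd.Series([False, False, True])

# `left or right` starts by calling bool(left); pandas raises a
# ValueError because a Series has no single truth value.
try:
    bool(left)
    raised = False
except ValueError:
    raised = True

# `|` and `&` are applied row by row and return a new Series
either = left | right
both = left & right

print(raised)
print(either.tolist())
print(both.tolist())
```

Note that | and & bind more tightly than comparisons such as >, which is why each comparison in this post is wrapped in parentheses before being combined.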
Let’s update our example:
>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) | (divmod(df["b"], 2)[1] > 0)
>>> df
a b anyOdd
0 1 2 True
1 2 3 True
2 3 4 True
3 4 5 True
4 5 6 True
Much better. And what if we wanted to check whether both values are odd?
>>> df["bothOdd"] = (divmod(df["a"], 2)[1] > 0) & (divmod(df["b"], 2)[1] > 0)
>>> df
a b anyOdd bothOdd
0 1 2 True False
1 2 3 True False
2 3 4 True False
3 4 5 True False
4 5 6 True False
Works exactly as expected! Hooray!
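For comparison, here is a slightly more compact sketch of the same two columns using the % operator instead of divmod. The intermediate names a_odd and b_odd are my own. It also shows what the .all() and .any() methods suggested by the error message do: they collapse a whole Series into one value, which is a different question from the row-by-row check we wanted.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6]})

# One boolean per row: is the value odd?
a_odd = df["a"] % 2 == 1
b_odd = df["b"] % 2 == 1

# Element-wise combination, one result per row
df["anyOdd"] = a_odd | b_odd
df["bothOdd"] = a_odd & b_odd

# .all() / .any() reduce a Series to a single truth value
every_row_has_an_odd = bool(df["anyOdd"].all())
some_row_has_two_odds = bool(df["bothOdd"].any())

print(df)
print(every_row_has_an_odd, some_row_has_two_odds)
```

Naming the intermediate Series also removes the need for nested parentheses on the lines that combine them.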
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.