# Solving a Pandas ValueError

### Check out an unexpected behavior that I came across when trying to add a column to a DataFrame and what I did to solve it.

I’ve been playing around with Kaggle in my spare time over the last few weeks and came across an unexpected behavior when trying to add a column to a DataFrame.

First, let’s import pandas.

``import pandas as pd``

Now, we’ll create a DataFrame to play with for the duration of this post:

``````>>> df = pd.DataFrame({"a": [1,2,3,4,5], "b": [2,3,4,5,6]})
>>> df
a  b
0  1  2
1  2  3
2  3  4
3  4  5
4  5  6``````

Let’s say we want to create a new column that holds `True` if either of the numbers in a row is odd, and `False` otherwise.

Each row contains two consecutive integers, so one of them is always odd. We’d therefore expect to see a column full of `True` values... so let’s get started.

``````>>> divmod(df["a"], 2)[1] > 0
0     True
1    False
2     True
3    False
4     True
Name: a, dtype: bool

>>> divmod(df["b"], 2)[1] > 0
0    False
1     True
2    False
3     True
4    False
Name: b, dtype: bool``````

So far, so good. Now, let’s combine those two calculations together and create a new column in our DataFrame:

``````>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) or (divmod(df["b"], 2)[1] > 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/markneedham/projects/kaggle/house-prices/a/lib/python3.6/site-packages/pandas/core/generic.py", line 953, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().``````

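Hmmm, that was unexpected! Python’s `or` and `and` operators need a single truth value for each operand, and pandas refuses to reduce a whole Series to one `True` or `False`, which is what the error is telling us. To combine two Series element by element, we need to use the bitwise operators instead: `&` for and, `|` for or.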
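To make the cause a little more concrete, here is a minimal sketch of what `or` does behind the scenes. It rebuilds the same DataFrame; the names `a_odd`, `b_odd`, and `raised` are my own and only there for illustration. It also shows that the `any()` and `all()` methods suggested by the error message collapse a Series to a single value, which is not what we want when building a column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6]})
a_odd = divmod(df["a"], 2)[1] > 0
b_odd = divmod(df["b"], 2)[1] > 0

# `x or y` starts by evaluating bool(x). pandas will not collapse a
# multi-element Series into one True/False, so it raises ValueError.
try:
    bool(a_odd)
    raised = False
except ValueError:
    raised = True

# any() / all() reduce the whole Series to a single boolean.
some_a_odd = bool(a_odd.any())
every_a_odd = bool(a_odd.all())

# `|` is handled by the Series itself and works element by element.
any_odd = a_odd | b_odd
print(raised, some_a_odd, every_a_odd, any_odd.tolist())
```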

Let’s update our example:

``````>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) | (divmod(df["b"], 2)[1] > 0)
>>> df
a  b  anyOdd
0  1  2    True
1  2  3    True
2  3  4    True
3  4  5    True
4  5  6    True``````

Much better. And what if we wanted to check whether both values are odd? Since one number in each row is always even, this time we’d expect every value to be `False`.

``````>>> df["bothOdd"] = (divmod(df["a"], 2)[1] > 0) & (divmod(df["b"], 2)[1] > 0)
>>> df
a  b  anyOdd  bothOdd
0  1  2    True    False
1  2  3    True    False
2  3  4    True    False
3  4  5    True    False
4  5  6    True    False``````

Works exactly as expected! Hooray!
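One thing to keep in mind with `&` and `|`: in Python they bind more tightly than comparison operators such as `>`, so the parentheses around each comparison in the examples above matter. A way to sidestep the issue is to give each condition a name first. The sketch below does that, and uses `%` as a shorter stand-in for `divmod(...)[1]`; both the `%` substitution and the names `a_odd` and `b_odd` are my own choices rather than part of the original example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6]})

# % yields the same remainder as divmod(...)[1]
a_odd = df["a"] % 2 > 0
b_odd = df["b"] % 2 > 0

# With the comparisons already evaluated, no extra parentheses are needed.
df["anyOdd"] = a_odd | b_odd
df["bothOdd"] = a_odd & b_odd
print(df)
```

Naming the intermediate Series also makes each condition easy to inspect on its own when a combined result looks wrong.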

Topics:
big data ,pandas ,python ,tutorial ,dataframe ,series

Published at DZone with permission of
