Pandas: Find Rows Where Column/Field Is Null
I did some experimenting with a dataset I've been playing around with to find any columns/fields that have null values in them. Learn how I did it!
While continuing to explore the Kaggle house prices dataset, I wanted to find every column/field that has null values in it.
To count the null values in each column, we can use the following code, adapted from Poonam Ligade’s kernel.
Count the Null Columns
import pandas as pd

train = pd.read_csv("train.csv")
null_columns = train.columns[train.isnull().any()]
train[null_columns].isnull().sum()
LotFrontage      259
Alley           1369
MasVnrType         8
MasVnrArea         8
BsmtQual          37
BsmtCond          37
BsmtExposure      38
BsmtFinType1      37
BsmtFinType2      38
Electrical         1
FireplaceQu      690
GarageType        81
GarageYrBlt       81
GarageFinish      81
GarageQual        81
GarageCond        81
PoolQC          1453
Fence           1179
MiscFeature     1406
dtype: int64
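To see how that idiom works step by step, here is a minimal sketch on a small made-up DataFrame. The data and the names df, has_null and null_counts are illustrative assumptions, not part of the Kaggle dataset. isnull().any() gives one boolean per column, and that boolean Series is then used to pick out the column names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train.csv
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, np.nan, 68.0],
    "Alley": [np.nan, np.nan, "Pave"],
})

# One boolean per column: True if the column has at least one null
has_null = df.isnull().any()

# Keep only the names of the columns flagged above
null_columns = df.columns[has_null]

# Count the nulls in each of those columns
null_counts = df[null_columns].isnull().sum()
print(null_counts)
```

LotArea has no missing values, so it drops out of null_columns and never appears in the counts.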
So there are lots of different columns containing null values. What if we want to find the solitary row which has "Electrical" as null?
Single Column Is Null
      LotFrontage  Alley  MasVnrType  MasVnrArea  BsmtQual  BsmtCond  BsmtExposure  \
1379         73.0    NaN        None         0.0        Gd        TA            No

      BsmtFinType1  BsmtFinType2  Electrical  FireplaceQu  GarageType  GarageYrBlt  \
1379           Unf           Unf         NaN          NaN     BuiltIn       2007.0

      GarageFinish  GarageQual  GarageCond  PoolQC  Fence  MiscFeature
1379           Fin          TA          TA     NaN    NaN          NaN
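One way to get a result like this is to build a boolean mask from the Electrical column and then select only the null-containing columns. The sketch below uses a small made-up DataFrame in place of train.csv, so the values are illustrative only, and the names electrical_missing and result are my own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv
train = pd.DataFrame({
    "Id": [1, 2, 3],
    "Electrical": ["SBrkr", np.nan, "FuseA"],
    "PoolQC": [np.nan, np.nan, "Gd"],
})
null_columns = train.columns[train.isnull().any()]

# True for the rows where Electrical is missing
electrical_missing = train["Electrical"].isnull()

# Keep those rows, restricted to the columns that contain nulls
result = train.loc[electrical_missing, null_columns]
print(result)
```

The row keeps its original index label, which is why the real dataset shows 1379 rather than 0.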
And what if we want to return every row that contains at least one null value? That’s not too difficult – it’s just a combination of the code in the previous two sections.
All Null Columns
   LotFrontage  Alley  MasVnrType  MasVnrArea  BsmtQual  BsmtCond  BsmtExposure  \
0         65.0    NaN     BrkFace       196.0        Gd        TA            No
1         80.0    NaN        None         0.0        Gd        TA            Gd
2         68.0    NaN     BrkFace       162.0        Gd        TA            Mn
3         60.0    NaN        None         0.0        TA        Gd            No
4         84.0    NaN     BrkFace       350.0        Gd        TA            Av

   BsmtFinType1  BsmtFinType2  Electrical  FireplaceQu  GarageType  GarageYrBlt  \
0           GLQ           Unf       SBrkr          NaN      Attchd       2003.0
1           ALQ           Unf       SBrkr           TA      Attchd       1976.0
2           GLQ           Unf       SBrkr           TA      Attchd       2001.0
3           ALQ           Unf       SBrkr           Gd      Detchd       1998.0
4           GLQ           Unf       SBrkr           TA      Attchd       2000.0

   GarageFinish  GarageQual  GarageCond  PoolQC  Fence  MiscFeature
0           RFn          TA          TA     NaN    NaN          NaN
1           RFn          TA          TA     NaN    NaN          NaN
2           RFn          TA          TA     NaN    NaN          NaN
3           Unf          TA          TA     NaN    NaN          NaN
4           RFn          TA          TA     NaN    NaN          NaN
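A sketch of that combination, again on made-up data standing in for train.csv: isnull().any(axis=1) flags each row that has at least one null, and head() limits the result to the first five matches, as in the output above. The names any_null_in_row and result are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv
train = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Electrical": ["SBrkr", np.nan, "FuseA", "SBrkr"],
    "PoolQC": [np.nan, np.nan, "Gd", "Ex"],
})
null_columns = train.columns[train.isnull().any()]

# axis=1 checks across the columns, giving one boolean per row
any_null_in_row = train.isnull().any(axis=1)

# Rows with at least one null, restricted to the null-containing columns
result = train.loc[any_null_in_row, null_columns].head()
print(result)
```

Without axis=1, any() works column by column, which is what the column count earlier relies on.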
And that's it!
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.