[Python] How to extract rows that contain missing values (NaN)

Sample data
How to detect NaN
Identifying columns (rows) that contain NaN
Extracting columns (rows) that contain NaN

Sample data

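import numpy as np
import pandas as pd

# Sample data: one NaN in col_1 (numeric) and one in col_3 (strings)
df = pd.DataFrame(
    data={'col_1': [10, 20, 30, np.nan],
          'col_2': [50, 60, 20, 80],
          'col_3': ['a', np.nan, 'c', 'd']}
)
print(df)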

   col_1  col_2 col_3
0   10.0     50     a
1   20.0     60   NaN
2   30.0     20     c
3    NaN     80     d

How to detect NaN

isnull() tests for NaN: it returns a DataFrame of the same shape, with True wherever a value is missing.

df.isnull()

   col_1  col_2  col_3
0  False  False  False
1  False  False   True
2  False  False  False
3   True  False  False
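
As a small supplementary sketch (not part of the original walkthrough), the boolean mask can also be checked against related pandas methods and summed to count NaN values. The variable names `same_mask`, `inverse_ok`, and `nan_counts` are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'col_1': [10, 20, 30, np.nan],
          'col_2': [50, 60, 20, 80],
          'col_3': ['a', np.nan, 'c', 'd']}
)

# isna() is an alias of isnull(), so both produce the same mask
same_mask = df.isna().equals(df.isnull())

# notnull() is the element-wise inverse of isnull()
inverse_ok = df.notnull().equals(~df.isnull())

# True counts as 1, so sum() gives the number of NaN values per column
nan_counts = df.isnull().sum()
print(nan_counts)
```

Counting per column like this is often a handy first look before deciding which rows or columns to inspect.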

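Identifying columns (rows) that contain NaN

Use any(). With the default axis=0, it reports for each column whether at least one element is True.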

df.isnull().any()

col_1     True
col_2    False
col_3     True
dtype: bool

To check row by row instead, pass axis=1. This is probably the one you'll reach for more often.

df.isnull().any(axis=1)

0    False
1     True
2    False
3     True
dtype: bool
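
If only the names of the affected columns or the labels of the affected rows are needed, rather than the full boolean Series, something like the following sketch should work. The names `col_has_nan`, `row_has_nan`, `nan_columns`, `nan_row_labels`, and `n_nan_rows` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'col_1': [10, 20, 30, np.nan],
          'col_2': [50, 60, 20, 80],
          'col_3': ['a', np.nan, 'c', 'd']}
)

col_has_nan = df.isnull().any()        # one bool per column
row_has_nan = df.isnull().any(axis=1)  # one bool per row

# Keep only the True entries, then read off their labels
nan_columns = col_has_nan[col_has_nan].index.tolist()
nan_row_labels = row_has_nan[row_has_nan].index.tolist()

# Number of rows that contain at least one NaN
n_nan_rows = int(row_has_nan.sum())

print(nan_columns)
print(nan_row_labels)
print(n_nan_rows)
```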

Extracting columns (rows) that contain NaN

df.loc[:, df.isnull().any()]

   col_1 col_3
0   10.0     a
1   20.0   NaN
2   30.0     c
3    NaN     d

To do the same row-wise, pass the axis=1 mask as the row selector.

df.loc[df.isnull().any(axis=1), :]

   col_1  col_2 col_3
1   20.0     60   NaN
3    NaN     80     d
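
The same masking idea can be narrowed to specific columns, or inverted to keep only complete rows. The following is a minimal sketch under the assumption that you want to check `col_1` and `col_2` only; the column choice and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'col_1': [10, 20, 30, np.nan],
          'col_2': [50, 60, 20, 80],
          'col_3': ['a', np.nan, 'c', 'd']}
)

# Rows where one specific column is NaN
rows_nan_col1 = df.loc[df['col_1'].isnull(), :]

# Rows where any of a chosen subset of columns is NaN
cols = ['col_1', 'col_2']  # hypothetical choice for illustration
rows_nan_subset = df.loc[df[cols].isnull().any(axis=1), :]

# Inverting the mask with ~ keeps the rows without any NaN,
# which matches what dropna() returns with its default arguments
rows_complete = df.loc[~df.isnull().any(axis=1), :]

print(rows_nan_subset)
print(rows_complete)
```

Building the mask explicitly is a little more verbose than dropna(), but it lets you look at the rows that would be dropped before actually discarding them.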

If you want to learn this properly, the following book is worth a look.

現場で使える！pandasデータ前処理入門 機械学習・データサイエンスで役立つ前処理手法

株式会社ロンバート (author), 翔泳社 (Shoeisha), around April 20, 2020