04wk-14: Visualizing Missing Values with msno

Author

최규빈

Published

September 26, 2023

1. Lecture Video

2. Import

import numpy as np
import pandas as pd 
import missingno as msno

3. Data

df = pd.read_csv("https://raw.githubusercontent.com/guebin/MP2023/main/posts/msno.csv")
df
            A         B         C         D         E
0    0.383420  1.385096       NaN -0.545132 -0.732395
1    1.084175  0.080613 -0.770527 -0.272143 -0.749881
2    1.142778  1.258419       NaN -0.072007 -0.440757
3    0.307894  0.521400  0.446974  0.329530 -1.457388
4    0.237787  0.132401 -0.516630  0.177995  0.416182
...       ...       ...       ...       ...       ...
995  0.041092 -1.308165  1.085820  1.136210       NaN
996 -1.286358  1.547987       NaN -0.174334 -0.579486
997  0.710257  1.764058       NaN -0.353928       NaN
998 -1.908729 -0.804691       NaN       NaN -0.066739
999  0.650026  2.206549       NaN -0.919945       NaN

1000 rows × 5 columns

4. Counting Missing Values

A. df.info()

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       668 non-null    float64
 1   B       656 non-null    float64
 2   C       608 non-null    float64
 3   D       668 non-null    float64
 4   E       660 non-null    float64
dtypes: float64(5)
memory usage: 39.2 KB
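
df.info() reports non-null counts, so the number of missing values has to be worked out by subtracting from the row count. As a minimal sketch, the missing counts and ratios can also be computed directly with isna(). The `demo` frame below is a hypothetical stand-in built from random numbers (not the lecture's msno.csv), so that the snippet runs without a download.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lecture's df: 1000 rows, 5 columns,
# with roughly 35% of the entries knocked out at random (an assumption).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(1000, 5)), columns=list("ABCDE"))
demo = demo.mask(rng.random(demo.shape) < 0.35)

# isna() gives a boolean frame; summing counts the True values per column.
missing_count = demo.isna().sum()
# The mean of a boolean column is the fraction of missing entries.
missing_ratio = demo.isna().mean()

summary = pd.DataFrame({"missing": missing_count, "ratio": missing_ratio})
print(summary)
```

Replacing `demo` with the lecture's `df` should give, for each column, 1000 minus the non-null count shown by df.info().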

B. msno.bar()

msno.bar(df)

5. Identifying and Visualizing Patterns

A. msno.matrix()

msno.matrix(df)
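
The matrix plot shows, row by row, which cells are present and which are missing, with a sparkline on the right summarizing how complete each row is. A rough text-only sketch of the same information, assuming a small hypothetical `demo` frame rather than the lecture's data, might look like this:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with about 40% of entries missing (an assumption).
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(8, 4)), columns=list("ABCD"))
demo = demo.mask(rng.random(demo.shape) < 0.4)

# True where a value is present, False where it is missing.
present = demo.notna()

# Text rendering of the pattern: "#" = present, "." = missing.
pattern = pd.DataFrame(
    np.where(present, "#", "."), index=demo.index, columns=demo.columns
)

# Number of present values in each row (what the sparkline summarizes).
row_completeness = present.sum(axis=1)

print(pattern.assign(n_present=row_completeness))
```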

B. msno.heatmap()

msno.heatmap(df)
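
The heatmap displays the nullity correlation: how strongly the missingness of one column is associated with the missingness of another. A minimal sketch of that quantity, computed by hand as the correlation of the 0/1 missing indicators, is shown below. The `demo` frame and its columns X, Y, Z are hypothetical and constructed so that the answer is known in advance.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a deliberately planted missingness structure.
rng = np.random.default_rng(1)
n = 500
demo = pd.DataFrame(rng.normal(size=(n, 3)), columns=["X", "Y", "Z"])

x_missing = rng.random(n) < 0.3
demo.loc[x_missing, "X"] = np.nan
demo.loc[x_missing, "Y"] = np.nan    # Y is missing exactly when X is missing
demo.loc[~x_missing, "Z"] = np.nan   # Z is missing exactly when X is present

# Correlation between the 0/1 "is missing" indicators of each column pair.
nullity_corr = demo.isna().astype(int).corr()
print(nullity_corr.round(2))
```

Here X and Y have nullity correlation 1 (they go missing together), while X and Z have nullity correlation -1 (one is missing exactly when the other is present). Values near 0 would indicate that the two columns go missing independently.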

C. msno.dendrogram()

msno.dendrogram(df)
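
The dendrogram draws a tree from a hierarchical clustering of the columns, so that columns with similar missingness patterns are joined first. The sketch below only illustrates the starting point of such a clustering, a pairwise distance between the 0/1 missing indicators of the columns. It is not missingno's implementation; the Euclidean distance and the hypothetical `demo` frame (columns P, Q, R) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Q shares P's missingness pattern, R is independent.
rng = np.random.default_rng(3)
n = 200
demo = pd.DataFrame(rng.normal(size=(n, 3)), columns=["P", "Q", "R"])

shared = rng.random(n) < 0.3
demo.loc[shared, "P"] = np.nan
demo.loc[shared, "Q"] = np.nan
demo.loc[rng.random(n) < 0.3, "R"] = np.nan

# 0/1 missing indicators, one column per variable: shape (n, 3).
ind = demo.isna().to_numpy().astype(float)

# Pairwise Euclidean distance between indicator columns.
diff = ind[:, :, None] - ind[:, None, :]      # shape (n, 3, 3)
dist = np.sqrt((diff ** 2).sum(axis=0))       # shape (3, 3)

dist_df = pd.DataFrame(dist, index=demo.columns, columns=demo.columns)
print(dist_df.round(2))
```

P and Q are at distance 0 because their patterns are identical, so a clustering would merge them first; R sits farther away and would join the tree later.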