04wk-14: Visualizing missing values with msno
1. Lecture video
2. Import
import numpy as np
import pandas as pd
import missingno as msno
3. Data
df = pd.read_csv("https://raw.githubusercontent.com/guebin/MP2023/main/posts/msno.csv")
df
| | A | B | C | D | E |
---|---|---|---|---|---|
0 | 0.383420 | 1.385096 | NaN | -0.545132 | -0.732395 |
1 | 1.084175 | 0.080613 | -0.770527 | -0.272143 | -0.749881 |
2 | 1.142778 | 1.258419 | NaN | -0.072007 | -0.440757 |
3 | 0.307894 | 0.521400 | 0.446974 | 0.329530 | -1.457388 |
4 | 0.237787 | 0.132401 | -0.516630 | 0.177995 | 0.416182 |
... | ... | ... | ... | ... | ... |
995 | 0.041092 | -1.308165 | 1.085820 | 1.136210 | NaN |
996 | -1.286358 | 1.547987 | NaN | -0.174334 | -0.579486 |
997 | 0.710257 | 1.764058 | NaN | -0.353928 | NaN |
998 | -1.908729 | -0.804691 | NaN | NaN | -0.066739 |
999 | 0.650026 | 2.206549 | NaN | -0.919945 | NaN |
1000 rows × 5 columns
4. Counting missing values
A. df.info()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       668 non-null    float64
 1   B       656 non-null    float64
 2   C       608 non-null    float64
 3   D       668 non-null    float64
 4   E       660 non-null    float64
dtypes: float64(5)
memory usage: 39.2 KB
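df.info() reports non-null counts, so the number of missing values has to be read off by subtracting from the row count. As a minimal sketch, the missing counts can also be computed directly with isna(). The frame `toy` below is a hypothetical stand-in, not the lecture's msno.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (NOT the lecture's msno.csv), with a few NaNs.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [np.nan, np.nan, 1.0, 2.0],
    "C": [0.5, 0.1, 0.2, 0.3],
})

# Number of missing values per column.
n_missing = toy.isna().sum()

# Fraction of missing values per column.
frac_missing = toy.isna().mean()

print(n_missing)
print(frac_missing)
```

The same two lines should work unchanged on the lecture's df.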
B. msno.bar()
msno.bar(df)
5. Identifying and visualizing patterns
A. msno.matrix()
msno.matrix(df)
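msno.matrix() draws one horizontal strip per row, filled where a value is present and blank where it is missing. The sketch below reproduces that idea as text so the pattern can be inspected without a plot. The frame `toy` and the '#' / '.' rendering are illustrative assumptions, not part of missingno.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: A and B are missing together, C is missing elsewhere.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [1.0, np.nan, 2.0, np.nan],
    "C": [np.nan, 0.1, np.nan, 0.3],
})

# Boolean nullity mask: True where a value is missing.
mask = toy.isna()

# One text row per record: '#' = present, '.' = missing.
text_rows = ["".join("." if m else "#" for m in row) for row in mask.to_numpy()]
print("\n".join(text_rows))

# Number of filled cells in each row.
filled_per_row = toy.notna().sum(axis=1)
print(filled_per_row)
```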
B. msno.heatmap()
msno.heatmap(df)
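msno.heatmap() shows how strongly the missingness of one column is related to the missingness of another. A rough way to get comparable numbers, assuming the plot is based on the correlation of the 0/1 nullity indicators, is sketched below. The frame `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame:
#   B is missing exactly where A is missing,
#   C is missing exactly where A is present.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "B": [1.0, np.nan, 2.0, np.nan, 4.0, 5.0],
    "C": [np.nan, 0.1, np.nan, 0.3, np.nan, np.nan],
})

# 1 where missing, 0 where present.
nullity = toy.isna().astype(int)

# Pearson correlation between the nullity indicators of each column pair.
corr = nullity.corr()
print(corr)
```

A value near 1 means two columns tend to be missing together, and a value near -1 means one tends to be missing when the other is present. A column with no missing values (or only missing values) has a constant indicator, so its correlation is undefined and comes out as NaN here.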
C. msno.dendrogram()
msno.dendrogram(df)
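msno.dendrogram() groups columns whose missingness patterns are similar. The sketch below shows the underlying idea with scipy's hierarchical clustering applied to the transposed nullity indicators. The frame `toy`, the choice of method="average", and the default Euclidean distance are assumptions made for illustration and may not match what missingno does internally.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy

# Hypothetical toy frame: A and B share the same missingness pattern, C does not.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "B": [1.0, np.nan, 2.0, np.nan, 4.0, 5.0],
    "C": [np.nan, 0.1, np.nan, 0.3, np.nan, np.nan],
})

# One observation per column: rows of x are the 0/1 nullity patterns of A, B, C.
x = toy.isna().astype(float).T.to_numpy()

# Agglomerative clustering of the columns (linkage method is an assumption).
Z = hierarchy.linkage(x, method="average")
print(Z)

# Each row of Z is one merge: [cluster i, cluster j, distance, size of new cluster].
first_pair = {toy.columns[int(Z[0, 0])], toy.columns[int(Z[0, 1])]}
print(first_pair)
```

Columns merged at a small distance have nearly identical missingness patterns, which is what leaves joined low in the dendrogram indicate.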