Lesson 14: Big Data and Visualization – Crawling + Visualization
Visualizing stock data with yfinance
A. Crawling + data cleaning
- Yahoo Finance: https://finance.yahoo.com/
Apple: 'AAPL'
Samsung Electronics: '005930.KS'
- Code for crawling
import yfinance as yf

symbols = ['AMZN','AAPL','GOOG','MSFT','NFLX','NVDA','TSLA']
start = '2020-01-01'
end = '2024-01-10'
# The output below includes 'Adj Close'; recent yfinance releases may need auto_adjust=False to reproduce it.
df = yf.download(symbols, start=start, end=end)
df
| Date | Adj Close: AAPL | Adj Close: AMZN | Adj Close: GOOG | Adj Close: MSFT | Adj Close: NFLX | Adj Close: NVDA | Adj Close: TSLA | Close: AAPL | Close: AMZN | Close: GOOG | ... | Open: NFLX | Open: NVDA | Open: TSLA | Volume: AAPL | Volume: AMZN | Volume: GOOG | Volume: MSFT | Volume: NFLX | Volume: NVDA | Volume: TSLA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-02 | 73.152657 | 94.900497 | 68.368500 | 154.779526 | 329.809998 | 59.744045 | 28.684000 | 75.087502 | 94.900497 | 68.368500 | ... | 326.100006 | 59.687500 | 28.299999 | 135480400 | 80580000 | 28132000 | 22622100 | 4485800 | 23753600 | 142981500 |
| 2020-01-03 | 72.441460 | 93.748497 | 68.032997 | 152.852249 | 325.899994 | 58.787777 | 29.534000 | 74.357498 | 93.748497 | 68.032997 | ... | 326.779999 | 58.775002 | 29.366667 | 146322800 | 75288000 | 23728000 | 21116200 | 3806900 | 20538400 | 266677500 |
| 2020-01-06 | 73.018677 | 95.143997 | 69.710503 | 153.247330 | 335.829987 | 59.034313 | 30.102667 | 74.949997 | 95.143997 | 69.710503 | ... | 323.119995 | 58.080002 | 29.364668 | 118387200 | 81236000 | 34646000 | 20813700 | 5663100 | 26263600 | 151995000 |
| 2020-01-07 | 72.675285 | 95.343002 | 69.667000 | 151.850098 | 330.750000 | 59.749020 | 31.270666 | 74.597504 | 95.343002 | 69.667000 | ... | 336.470001 | 59.549999 | 30.760000 | 108872000 | 80898000 | 30054000 | 21634100 | 4703200 | 31485600 | 268231500 |
| 2020-01-08 | 73.844345 | 94.598503 | 70.216003 | 154.268829 | 339.260010 | 59.861088 | 32.809334 | 75.797501 | 94.598503 | 70.216003 | ... | 331.489990 | 59.939999 | 31.580000 | 132079200 | 70160000 | 30560000 | 27746500 | 7104500 | 27710800 | 467164500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-01-03 | 184.250000 | 148.470001 | 140.360001 | 370.600006 | 470.260010 | 475.690002 | 238.449997 | 184.250000 | 148.470001 | 140.360001 | ... | 467.320007 | 474.850006 | 244.979996 | 58414500 | 49425500 | 18974300 | 23083500 | 3443700 | 32089600 | 121082600 |
| 2024-01-04 | 181.910004 | 144.570007 | 138.039993 | 367.940002 | 474.670013 | 479.980011 | 237.929993 | 181.910004 | 144.570007 | 138.039993 | ... | 472.980011 | 477.670013 | 239.250000 | 71983600 | 56039800 | 18253300 | 20901500 | 3636500 | 30653500 | 102629300 |
| 2024-01-05 | 181.179993 | 145.240005 | 137.389999 | 367.750000 | 474.059998 | 490.970001 | 237.490005 | 181.179993 | 145.240005 | 137.389999 | ... | 476.500000 | 484.619995 | 236.860001 | 62303300 | 45124800 | 15433200 | 20987000 | 2612500 | 41456800 | 92379400 |
| 2024-01-08 | 185.559998 | 149.100006 | 140.529999 | 374.690002 | 485.029999 | 522.530029 | 240.449997 | 185.559998 | 149.100006 | 140.529999 | ... | 473.890015 | 495.119995 | 236.139999 | 59144500 | 46757100 | 17645300 | 23134000 | 3675800 | 64251000 | 85166600 |
| 2024-01-09 | 185.139999 | 151.369995 | 142.559998 | 375.790009 | 482.089996 | 531.400024 | 234.960007 | 185.139999 | 151.369995 | 142.559998 | ... | 475.529999 | 524.010010 | 238.110001 | 42841800 | 43812600 | 19579700 | 20830000 | 3526800 | 77310000 | 96705700 |
1012 rows × 42 columns
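The downloaded frame has two-level columns (Price, Ticker), which is awkward to plot directly. The following is a minimal sketch of two common ways to reshape such a frame. It uses a small synthetic frame (`df_demo`, with made-up numbers) as a stand-in for the real download, so the names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the yf.download result: columns are a two-level
# MultiIndex (Price, Ticker) and the index is the trading date.
dates = pd.date_range("2020-01-02", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAPL", "TSLA"]], names=["Price", "Ticker"]
)
values = np.arange(12, dtype=float).reshape(3, 4)
df_demo = pd.DataFrame(values, index=dates, columns=columns)

# Selecting one top-level key gives a wide frame: one column per ticker.
adj_close = df_demo["Adj Close"]

# Moving the Ticker level into the rows gives a long (tidy) frame:
# one row per (Date, Ticker) pair, one column per price field.
tidy = df_demo.stack("Ticker").reset_index()
```

The wide form suits pandas' own `.plot` methods, while the long form is the shape that grammar-of-graphics style plotting libraries usually expect.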
B. Visualization
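One possible way to draw the stock data is sketched below. It is not necessarily the plot used in the lesson. Because the tickers trade at very different price levels, the sketch first rescales each series so that its first day equals 100. The helper names `normalize_to_first_day` and `plot_normalized` are hypothetical, and the demo frame contains made-up prices shaped like `df['Adj Close']`.

```python
import pandas as pd

def normalize_to_first_day(prices: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column so that its first row equals 100."""
    return prices / prices.iloc[0] * 100

def plot_normalized(prices: pd.DataFrame):
    """Draw one line per ticker; calling this requires the plotly package."""
    return normalize_to_first_day(prices).plot.line(backend="plotly")

# Tiny made-up frame shaped like df['Adj Close'] (Date index, ticker columns).
demo = pd.DataFrame(
    {"AAPL": [50.0, 75.0, 100.0], "TSLA": [20.0, 30.0, 10.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
normalized = normalize_to_first_day(demo)
```

With the real data, `plot_normalized(df['Adj Close'])` would draw the seven tickers on a common scale. Skipping the rescaling and calling `df['Adj Close'].plot.line(backend='plotly')` gives the raw prices instead.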
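Visualizing the birth rate
A. Crawling + data cleaning
- South Korea's low birth rate problem
ref: https://ko.wikipedia.org/wiki/대한민국의_저출산
- We want to read the 5th table on the page above.
- 5th table: number of births by province and metropolitan city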
import pandas as pd

# read_html returns a list of DataFrames, one per <table> on the page.
df_lst = pd.read_html('https://ko.wikipedia.org/wiki/%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD%EC%9D%98_%EC%A0%80%EC%B6%9C%EC%82%B0')
df = df_lst[4]  # index 4 is the 5th table
df

| | 지역/연도[6] | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 서울 | 93266 | 91526 | 93914.000 | 84066.000 | 83711.000 | 83005 | 75.536 | 65389 | 58074 | 53.673 | 47400 | 45531 |
| 1 | 부산 | 27415 | 27759 | 28673.000 | 25831.000 | 26190.000 | 26645 | 24906.000 | 21480 | 19152 | 17049.000 | 15100 | 14446 |
| 2 | 대구 | 20557 | 20758 | 21472.000 | 19340.000 | 19361.000 | 19438 | 18298.000 | 15946 | 14400 | 13233.000 | 11200 | 10661 |
| 3 | 인천 | 25752 | 20758 | 21472.000 | 25560.000 | 25786.000 | 25491 | 23609.000 | 20445 | 20087 | 18522.000 | 16000 | 14947 |
| 4 | 광주 | 13979 | 13916 | 14392.000 | 12729.000 | 12729.000 | 12441 | 11580.000 | 10120 | 9105 | 8364.000 | 7300 | 7956 |
| 5 | 대전 | 14314 | 14808 | 15279.000 | 14099.000 | 13962.000 | 13774 | 12436.000 | 10851 | 9337 | 8410.000 | 7500 | 7414 |
| 6 | 울산 | 11432 | 11542 | 12160.000 | 11330.000 | 11556.000 | 11732 | 10910.000 | 9381 | 8149 | 7539.000 | 6600 | 6127 |
| 7 | 세종 | - | - | 1054.000 | 1111.000 | 1344.000 | 2708 | 3297.000 | 3504 | 3703 | 3819.000 | 3500 | 3570 |
| 8 | 경기 | 121753 | 122027 | 124746.000 | 112129.000 | 112.169 | 113495 | 105643.000 | 94088 | 83198 | 83.198 | 77800 | 76139 |
| 9 | 강원 | 12477 | 12408 | 12426.000 | 10980.000 | 10662.000 | 10929 | 10058.000 | 9958 | 8351 | 8283.000 | 7800 | 7357 |
| 10 | 충북 | 14670 | 14804 | 15139.000 | 13658.000 | 13366.000 | 13563 | 12742.000 | 11394 | 10586 | 9333.000 | 8600 | 8190 |
| 11 | 충남 | 20.242 | 20.398 | 20.448 | 18.628 | 18200.000 | 18604 | 17302.000 | 15670 | 14380 | 13228.000 | 11900 | 10984 |
| 12 | 전북 | 16100 | 16175 | 16238.000 | 14555.000 | 14231.000 | 14087 | 12698.000 | 11348 | 10001 | 8971.000 | 8200 | 7745 |
| 13 | 전남 | 16654 | 16612 | 16990.000 | 15401.000 | 14817.000 | 15061 | 13980.000 | 12354 | 11238 | 10832.000 | 9700 | 8430 |
| 14 | 경북 | 23700 | 24250 | 24635.000 | 22206.000 | 22062.000 | 22310 | 20616.000 | 17957 | 16079 | 14472.000 | 12900 | 12045 |
| 15 | 경남 | 32203 | 32536 | 33211.000 | 29504.000 | 29763.000 | 29537 | 27138.000 | 23849 | 21224 | 19250.000 | 16800 | 15562 |
| 16 | 제주 | 5657 | 5628 | 5992.000 | 5328.000 | 5526.000 | 5600 | 5494.000 | 5037 | 4781 | 4500.000 | 4000 | 3728 |
| 17 | 전국 | 470171 | 471265 | 484550.000 | 436455.000 | 435435.000 | 438420 | 406243.000 | 357771 | 326822 | 302676.000 | 272400 | 260562 |
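Selecting the table by position (`df_lst[4]`) stops working if a table is added to or removed from the page. As a hedged alternative, `pd.read_html` accepts a `match` argument that keeps only tables whose text matches a string or regex. The sketch below uses a made-up two-table HTML snippet rather than the Wikipedia page, and like any `read_html` call it assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables; only the second mentions "Seoul".
html = """
<table>
  <tr><th>year</th><th>rate</th></tr>
  <tr><td>2020</td><td>0.84</td></tr>
</table>
<table>
  <tr><th>region</th><th>2010</th></tr>
  <tr><td>Seoul</td><td>93266</td></tr>
  <tr><td>Busan</td><td>27415</td></tr>
</table>
"""

# match keeps only the tables whose text matches the given string/regex,
# so the result does not depend on the table's position in the page.
tables = pd.read_html(StringIO(html), match="Seoul")
births = tables[0]
```

On the real page several tables may contain the same region name, so the returned list should still be inspected before picking one.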
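B. Visualization 1: nationwide number of births
C. Visualization 2: number of births by region (line)
D. Visualization 3: number of births by region (area)
- Is there a visualization that suitably combines the information in visualization 1 and visualization 2?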
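A stacked area chart is one answer to this question, and the small sketch below illustrates why. The frame `births` holds made-up numbers (years as rows, hypothetical regions A, B, C as columns), and `plot_stacked_area` is a hypothetical helper that relies on the plotly backend drawing area charts stacked.

```python
import pandas as pd

# Made-up births (years as rows, regions as columns), not the real figures.
births = pd.DataFrame(
    {"A": [30, 25, 20], "B": [20, 18, 15], "C": [10, 9, 8]},
    index=[2010, 2011, 2012],
)

# A stacked area chart draws each region's band on top of the previous one,
# so the band boundaries are the running sums across columns.
boundaries = births.cumsum(axis=1)

# The top boundary equals the total across regions (visualization 1),
# while each band's thickness is one region's own series (visualization 2).
total = births.sum(axis=1)

def plot_stacked_area(frame: pd.DataFrame):
    """Stacked area chart; calling this requires the plotly package."""
    return frame.plot.area(backend="plotly")
```

The upper edge of the chart therefore traces the nationwide total, and the height of each colored band shows one region's contribution.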
E. Revising visualizations 1, 2, and 3
- Revised visualization 1
# Step 1: '-' (no data) becomes 0 and every other entry becomes a float.
# Step 2: values below 1000 (e.g. 75.536) are treated as thousands and multiplied by 1000.
# The last row (전국, the nationwide total) is dropped before summing over regions.
# DataFrame.map replaces the deprecated DataFrame.applymap (pandas >= 2.1).
df.set_index('지역/연도[6]')\
.map(lambda x: 0 if x == '-' else float(x))\
.map(lambda x: x*1000 if x<1000 else x)\
.iloc[:-1,:]\
.sum(axis=0)\
.plot.line(backend='plotly')
- Revised visualization 2
# Same cleaning as above, then transpose so that years are rows and regions are columns.
df.set_index('지역/연도[6]')\
.map(lambda x: 0 if x == '-' else float(x))\
.map(lambda x: x*1000 if x<1000 else x).T\
.loc[:,'서울':'제주']\
.plot.line(backend='plotly')
- Revised visualization 3
# Same pipeline as the line chart, drawn as an area chart.
df.set_index('지역/연도[6]')\
.map(lambda x: 0 if x == '-' else float(x))\
.map(lambda x: x*1000 if x<1000 else x).T\
.loc[:,'서울':'제주']\
.plot.area(backend='plotly')
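The three revisions repeat the same two cleaning steps, so they could be factored into a helper. The sketch below is one way to do that. `clean_births` and the demo frame are hypothetical, and the "below 1000 means thousands" rule is the same heuristic as in the code above. That rule only works because the smallest genuine count in the table (1054) is above 1000.

```python
import pandas as pd

def clean_births(raw: pd.DataFrame, label_col: str) -> pd.DataFrame:
    """Return a numeric frame indexed by region, with both fixes applied."""
    def to_count(x):
        # '-' marks a missing entry; treat it as 0.
        value = 0.0 if x == "-" else float(x)
        # Values below 1000 are assumed to be counts whose thousands
        # separator was read as a decimal point (e.g. 75.536).
        return value * 1000 if value < 1000 else value

    body = raw.set_index(label_col)
    # Apply element-wise, column by column; Series.map avoids the
    # applymap/map naming difference across pandas versions.
    return body.apply(lambda col: col.map(to_count))

# Made-up miniature of the scraped table.
demo = pd.DataFrame({
    "region": ["Seoul", "Sejong", "Total"],
    "2010": ["93266", "-", "93266"],
    "2016": [75.536, 3297.0, 78833.0],
})
cleaned = clean_births(demo, "region")
```

With the real data, `clean_births(df, '지역/연도[6]')` would return the frame that each of the three plots starts from.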