08wk-1: 에너지사용량 시각화

Author

최규빈

Published

October 23, 2023

1. 강의영상

2. Imports

import pandas as pd 
import numpy as np
from plotnine import *

3. 예비학습

A. `pd.concat()`

# 예시1 – 위아래로

df1 = pd.DataFrame({'A':[1,2,3],'B':[2,3,4]})
df2 = pd.DataFrame({'A':[-1,-2,-3],'B':[-2,-3,-4]})

display("df1",df1)
display("df2",df2)

'df1'

	A	B
0	1	2
1	2	3
2	3	4

'df2'

	A	B
0	-1	-2
1	-2	-3
2	-3	-4

#pd.concat([df1,df2],axis=0) # (3,2) concat (3,2) -> (6,4) 
pd.concat([df1,df2])

	A	B
0	1	2
1	2	3
2	3	4
0	-1	-2
1	-2	-3
2	-3	-4

인덱스를 재정리하고 싶다면?

pd.concat([df1,df2]).reset_index(drop=True)

	A	B
0	1	2
1	2	3
2	3	4
3	-1	-2
4	-2	-3
5	-3	-4

#

# 예시2 – 좌우로

df1 = pd.DataFrame({'A':[1,2,3],'B':[2,3,4]})
df2 = pd.DataFrame({'C':[-1,-2,-3],'D':[-2,-3,-4]})

display("df1",df1)
display("df2",df2)

'df1'

	A	B
0	1	2
1	2	3
2	3	4

'df2'

	C	D
0	-1	-2
1	-2	-3
2	-3	-4

pd.concat([df1,df2],axis=1) # (3,2) concat (3,2) -> (3,4)

	A	B	C	D
0	1	2	-1	-2
1	2	3	-2	-3
2	3	4	-3	-4

#

B. `pd.merge()`

big = pd.DataFrame({'department':['A','A','B','B'], 'gender':['male','female','male','female'],'count':[1,2,3,1]})
small = pd.DataFrame({'department':['A','B'], 'total':[3,4]})

display("big",big)
display("small",small)

'big'

	department	gender	count
0	A	male	1
1	A	female	2
2	B	male	3
3	B	female	1

'small'

	department	total
0	A	3
1	B	4

big.merge(small)

	department	gender	count	total
0	A	male	1	3
1	A	female	2	3
2	B	male	3	4
3	B	female	1	4

small.merge(big)

	department	total	gender	count
0	A	3	male	1
1	A	3	female	2
2	B	4	male	3
3	B	4	female	1

C. `.applymap()`

# 예시1

np.random.seed(43052)
df = pd.DataFrame({'A':np.random.rand(3), 'B':np.random.rand(3)})
df

	A	B
0	0.817682	0.619777
1	0.049532	0.122541
2	0.838686	0.117128

df.applymap(lambda x: 'Yes' if x>0.5 else 'No')

/tmp/ipykernel_288950/1665022133.py:1: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

	A	B
0	Yes	Yes
1	No	No
2	Yes	No

#

D. `df.astype()`

- 예시1

df = pd.DataFrame({'A':[0,1,2],'B':[4,5,6]})
df

	A	B
0	0	4
1	1	5
2	2	6

df.astype(float)

	A	B
0	0.0	4.0
1	1.0	5.0
2	2.0	6.0

E. 제 아픈기억

# 예제1 – object형은 일괄적으로 문자형임을 의미하는게 아님

np.random.seed(43052)
df = pd.DataFrame({'A':['1','2','0','1',2], 'B':['2','3','0','0',0]})
df

	A	B
0	1	2
1	2	3
2	0	0
3	1	0
4	2	0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       5 non-null      object
 1   B       5 non-null      object
dtypes: object(2)
memory usage: 208.0+ bytes

df.A[0],df.A[4],df.B[0],df.B[4]

('1', 2, '2', 0)

#

# 예제2 – column이름이 이상하게 들어가 있는 경우가 있음.

df = pd.DataFrame({('A',''):[0,0,0], ('B',''):[1,1,1]})
df

	A	B

0	0	1
1	0	1
2	0	1

df['A']

0    0
1    0
2    0
Name: A, dtype: int64

df[('A','')]

0    0
1    0
2    0
Name: (A, ), dtype: int64

#

4. 에너지사용량 시각화

(1) 아래는 2019년 서울의 에너지사용량을 불러오는 예시코드이다.

pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Seoul2019.csv')

	지역	건물동수	연면적	에너지사용량(TOE)/전기	에너지사용량(TOE)/도시가스	에너지사용량(TOE)/지역난방
0	종로구	17,851	9,204,140	63,492	76,653	799
1	중구	10,383	10,078,848	79,223	68,210	497
2	용산구	17,138	10,756,612	51,229	79,805	11,128
3	성동구	13,980	11,804,313	59,832	99,986	0
4	광진구	21,556	12,272,738	68,756	123,447	0
5	동대문구	21,794	12,664,554	65,913	111,420	0
6	중랑구	23,950	15,182,802	59,370	109,284	7,442
7	성북구	27,112	15,938,807	77,007	148,376	0
8	강북구	23,334	9,458,987	47,731	100,045	0
9	도봉구	13,168	10,644,704	44,985	90,379	5,268
10	노원구	9,704	17,197,086	77,010	94,340	50,859
11	은평구	25,200	14,735,131	75,914	130,159	14,370
12	서대문구	17,651	12,559,425	65,164	111,542	6,330
13	마포구	18,844	15,024,186	92,453	114,931	20,148
14	양천구	14,690	15,428,339	70,721	82,857	49,258
15	강서구	20,446	20,641,866	86,809	128,786	35,896
16	구로구	17,204	13,509,894	59,916	120,457	2,963
17	금천구	12,135	7,420,441	34,791	69,814	732
18	영등포구	18,133	14,914,027	87,480	114,238	13,531
19	동작구	20,102	13,612,946	66,811	132,285	899
20	관악구	26,460	14,997,859	85,416	158,543	0
21	서초구	12,856	21,560,285	135,491	121,437	38,866
22	강남구	16,129	29,961,585	180,121	149,045	83,459
23	송파구	19,331	26,573,343	139,117	143,601	71,954
24	강동구	16,636	15,048,315	70,341	121,931	11,921

에너지 사용량은 2018년부터 2021년까지의 기간 동안 서울, 부산 등 여러 지역에 대해 정리되어 있으며, 아래 주소 형식으로 저장되어 있다.

https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Seoul2018.csv
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Seoul2019.csv
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Seoul2020.csv
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Seoul2021.csv
...
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Busan2018.csv
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Busan2019.csv
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Busan2020.csv
https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/Busan2021.csv

아래의 url, prov를 참고하여 모든 자료를 불러온 뒤 pd.concat()을 이용하여 하나의 df로 합쳐라.

url = 'https://raw.githubusercontent.com/guebin/DV2022/main/posts/Energy/{}.csv'

prov = ['Seoul', 'Busan', 'Daegu', 'Incheon', 
        'Gwangju', 'Daejeon', 'Ulsan', 'Sejongsi', 
        'Gyeonggi-do', 'Gangwon-do', 'Chungcheongbuk-do', 
        'Chungcheongnam-do', 'Jeollabuk-do', 'Jeollanam-do', 
        'Gyeongsangbuk-do', 'Gyeongsangnam-do', 'Jeju-do']

올바르게 정리된 데이터프레임의 예시는 아래와 같다.

(풀이)

df = pd.concat([pd.read_csv(url.format(p+y)).assign(년도=y, 시도=p) for p in prov for y in ['2018', '2019', '2020', '2021']]).reset_index(drop=True)
df

	지역	건물동수	연면적	에너지사용량(TOE)/전기	에너지사용량(TOE)/도시가스	에너지사용량(TOE)/지역난방	년도	시도
0	종로구	17,929	9,141,777	64,818	82,015	111	2018	Seoul
1	중구	10,598	10,056,233	81,672	75,260	563	2018	Seoul
2	용산구	17,201	10,639,652	52,659	85,220	12,043	2018	Seoul
3	성동구	14,180	11,631,770	60,559	107,416	0	2018	Seoul
4	광진구	21,520	12,054,796	70,609	130,308	0	2018	Seoul
...	...	...	...	...	...	...	...	...
995	서귀포시	34,729	7,233,931	34,641	1,306	0	2019	Jeju-do
996	제주시	66,504	19,819,923	99,212	22,179	0	2020	Jeju-do
997	서귀포시	34,880	7,330,040	35,510	1,639	0	2020	Jeju-do
998	제주시	67,053	20,275,738	103,217	25,689	0	2021	Jeju-do
999	서귀포시	35,230	7,512,206	37,884	2,641	0	2021	Jeju-do

1000 rows × 8 columns

(2) 의미상 숫자형이지만 문자형으로 입력이된 자료를 모두 전처리하라.

(풀이)

df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index()

/tmp/ipykernel_322791/847505444.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

	년도	시도	지역	건물동수	연면적	에너지사용량(TOE)/전기	에너지사용량(TOE)/도시가스	에너지사용량(TOE)/지역난방
0	2018	Seoul	종로구	17929	9141777	64818	82015	111
1	2018	Seoul	중구	10598	10056233	81672	75260	563
2	2018	Seoul	용산구	17201	10639652	52659	85220	12043
3	2018	Seoul	성동구	14180	11631770	60559	107416	0
4	2018	Seoul	광진구	21520	12054796	70609	130308	0
...	...	...	...	...	...	...	...	...
995	2019	Jeju-do	서귀포시	34729	7233931	34641	1306	0
996	2020	Jeju-do	제주시	66504	19819923	99212	22179	0
997	2020	Jeju-do	서귀포시	34880	7330040	35510	1639	0
998	2021	Jeju-do	제주시	67053	20275738	103217	25689	0
999	2021	Jeju-do	서귀포시	35230	7512206	37884	2641	0

1000 rows × 8 columns

(3) 열의 이름을 아래와 같이 바꾸라.

name_dict = {
    '년도': 'Year',
    '시도': 'Prov',
    '지역': 'Reg',
    '건물동수': 'BldgCount',
    '연면적': 'Area',
    '에너지사용량(TOE)/전기': 'Elec',
    '에너지사용량(TOE)/도시가스': 'Gas',
    '에너지사용량(TOE)/지역난방': 'Heat'
}

(풀이)

df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index().rename(name_dict,axis=1)

/tmp/ipykernel_322791/1695199233.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

	Year	Prov	Reg	BldgCount	Area	Elec	Gas	Heat
0	2018	Seoul	종로구	17929	9141777	64818	82015	111
1	2018	Seoul	중구	10598	10056233	81672	75260	563
2	2018	Seoul	용산구	17201	10639652	52659	85220	12043
3	2018	Seoul	성동구	14180	11631770	60559	107416	0
4	2018	Seoul	광진구	21520	12054796	70609	130308	0
...	...	...	...	...	...	...	...	...
995	2019	Jeju-do	서귀포시	34729	7233931	34641	1306	0
996	2020	Jeju-do	제주시	66504	19819923	99212	22179	0
997	2020	Jeju-do	서귀포시	34880	7330040	35510	1639	0
998	2021	Jeju-do	제주시	67053	20275738	103217	25689	0
999	2021	Jeju-do	서귀포시	35230	7512206	37884	2641	0

1000 rows × 8 columns

(4) 아래와 같은 그림을 시각화 하라.

# 시각화예시

(풀이)

tidydata = df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index().rename(name_dict,axis=1)\
.drop(['BldgCount','Area'],axis=1)\
.set_index(['Year','Prov','Reg']).stack().reset_index()\
.rename({'level_3':'Type',0:'EnergyUse'},axis=1)\
.pivot_table(index=['Year','Prov','Type'],values='EnergyUse',aggfunc='sum')\
.reset_index()\
.eval('logEnergyUse = log(EnergyUse)')
tidydata 
# --- # 
fig = ggplot(tidydata) 
line = geom_line(aes(x='Year',y='logEnergyUse',color='Type',linetype='Type'))
fig = (fig + line + facet_wrap('Prov',scales='free')).draw()
fig

/tmp/ipykernel_322791/2236388880.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/pandas/core/arraylike.py:396: RuntimeWarning: divide by zero encountered in log

fig.set_size_inches(10,6)
fig.set_dpi(150)
fig

(5) Prov별로 총 에너지사용량이 많은 상위5개의 Reg을 찾고 아래와 같이 시각화 하라.

# 시각화예시

(풀이)

df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index().rename(name_dict,axis=1)\
.eval('EnergyUse = Elec + Gas + Heat')\
.drop(['BldgCount','Area','Elec','Gas','Heat'],axis=1)\
.pivot_table(index=['Prov','Reg'],columns='Year',values='EnergyUse').sum(axis=1)\
.reset_index().rename({0:'EnergyUse'},axis=1)

/tmp/ipykernel_322791/1149148944.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

	Prov	Reg	EnergyUse
0	Busan	강서구	200386.0
1	Busan	금정구	451212.0
2	Busan	기장군	287926.0
3	Busan	남구	491030.0
4	Busan	동구	156302.0
...	...	...	...
245	Ulsan	남구	607820.0
246	Ulsan	동구	281094.0
247	Ulsan	북구	334844.0
248	Ulsan	울주군	394217.0
249	Ulsan	중구	395158.0

250 rows × 3 columns

g = df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index().rename(name_dict,axis=1)\
.eval('EnergyUse = Elec + Gas + Heat')\
.drop(['BldgCount','Area','Elec','Gas','Heat'],axis=1)\
.pivot_table(index=['Prov','Reg'],columns='Year',values='EnergyUse').sum(axis=1)\
.reset_index().rename({0:'EnergyUse'},axis=1)\
.groupby('Prov')

/tmp/ipykernel_322791/1748878604.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

tidydata = pd.concat([df.sort_values('EnergyUse',ascending=False).reset_index(drop=True).reset_index().rename({'index':'Rank'},axis=1).iloc[:5] for k,df in g])\
.reset_index(drop=True)
tidydata 
#---# 
fig = (ggplot(tidydata) + geom_col(aes(x='Rank',y='EnergyUse',fill='Prov')) + facet_wrap('Prov')).draw()
fig.set_size_inches(10,5)
fig.set_dpi(150)
fig

(5) (Prov,Year)별 전기에너지 사용량 비율을 구하고 아래와 같이 시각화 하라. + 제주를 제외한 지역으로 한정하고 시각화하라.

# 시각화예시

(풀이1) – 수업시간에 한 풀이

df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index().rename(name_dict,axis=1)\
.drop(['BldgCount','Area'],axis=1).melt(id_vars=['Year','Prov','Reg'])\
.rename({'variable':'Type','value':'EnergyUse'},axis=1)\
.pivot_table(index=['Prov','Year'],columns='Type',values='EnergyUse',aggfunc='sum')\
.eval('ElecRate = Elec/(Elec+Gas+Heat)').loc[:,'ElecRate'].reset_index()

/tmp/ipykernel_322791/3362619229.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

	Prov	Year	ElecRate
0	Busan	2018	0.455996
1	Busan	2019	0.463324
2	Busan	2020	0.457401
3	Busan	2021	0.534566
4	Chungcheongbuk-do	2018	0.512447
...	...	...	...
63	Seoul	2021	0.455675
64	Ulsan	2018	0.384385
65	Ulsan	2019	0.392067
66	Ulsan	2020	0.387118
67	Ulsan	2021	0.386115

68 rows × 3 columns

tidydata = df.assign(년도 = df.년도.astype(int))\
.set_index(['년도','시도','지역']).applymap(lambda x: int(str(x).replace(',','')))\
.reset_index().rename(name_dict,axis=1)\
.drop(['BldgCount','Area'],axis=1).melt(id_vars=['Year','Prov','Reg'])\
.rename({'variable':'Type','value':'EnergyUse'},axis=1)\
.pivot_table(index=['Prov','Year'],columns='Type',values='EnergyUse',aggfunc='sum')\
.eval('ElecRate = Elec/(Elec+Gas+Heat)').loc[:,'ElecRate'].reset_index()
#--#
fig = ggplot(tidydata)
point = geom_point(aes(x='Year',y='ElecRate',color='Prov',shape='Prov'))
line = geom_line(aes(x='Year',y='ElecRate',color='Prov'),linetype='dashed')
fig = (fig + point + line).draw()
fig.set_dpi(150)
fig

/tmp/ipykernel_322791/3057913787.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/mizani/palettes.py:706: UserWarning: Palette can return a maximum of 13 values. 17 values requested.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/plotnine/layer.py:364: PlotnineWarning: geom_point : Removed 16 rows containing missing values.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/mizani/palettes.py:706: UserWarning: Palette can return a maximum of 13 values. 17 values requested.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/plotnine/guides/guides.py:259: PlotnineWarning: geom_point legend : Removed 4 rows containing missing values.

fig = ggplot(tidydata.query("Prov != 'Jeju-do'"))
point = geom_point(aes(x='Year',y='ElecRate',color='Prov',shape='Prov'))
line = geom_line(aes(x='Year',y='ElecRate',color='Prov'),linetype='dashed')
fig = (fig + point + line).draw()
fig.set_dpi(150)
fig

/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/mizani/palettes.py:706: UserWarning: Palette can return a maximum of 13 values. 16 values requested.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/plotnine/layer.py:364: PlotnineWarning: geom_point : Removed 12 rows containing missing values.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/mizani/palettes.py:706: UserWarning: Palette can return a maximum of 13 values. 16 values requested.
/home/cgb2/anaconda3/envs/ag/lib/python3.10/site-packages/plotnine/guides/guides.py:259: PlotnineWarning: geom_point legend : Removed 3 rows containing missing values.

(풀이2) – draft에 있던 풀이, 수업시간에 풀이한것과 약간다름

df.assign(년도 = df.년도.astype(int))\
    .set_index(['년도','시도','지역']).applymap(lambda x: str(x).replace(',','')).astype(int).reset_index()\
    .rename(name_dict,axis=1)\
    .drop(['BldgCount','Area'],axis=1)\
    .melt(id_vars=['Year','Prov','Reg']).rename({'variable':'Type','value':'EnergyUse'},axis=1)\
    .pivot_table(index=['Year','Prov','Type'],values='EnergyUse',aggfunc='sum').reset_index()\
    .set_index(['Year','Prov','Type']).unstack()\
    .assign(ElecRate = lambda df: df[('EnergyUse','Elec')]/(df[('EnergyUse','Elec')]+df[('EnergyUse','Gas')]+df[('EnergyUse','Heat')]))\
    .iloc[:,-1].reset_index().set_axis(['Year','Prov','ElecRate'],axis=1)

/tmp/ipykernel_322791/2924150206.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

	Year	Prov	ElecRate
0	2018	Busan	0.455996
1	2018	Chungcheongbuk-do	0.512447
2	2018	Chungcheongnam-do	0.506471
3	2018	Daegu	0.403464
4	2018	Daejeon	0.418136
...	...	...	...
63	2021	Jeollabuk-do	0.466878
64	2021	Jeollanam-do	0.537464
65	2021	Sejongsi	0.435457
66	2021	Seoul	0.455675
67	2021	Ulsan	0.386115

68 rows × 3 columns

이후의 과정은 풀이1와 같음.

5. `pd.merge()` 의 이용

before에서 after로 가는 일반적인 방법이 있을까?

before = df.assign(년도 = df.년도.astype(int))\
    .set_index(['년도','시도','지역']).applymap(lambda x: str(x).replace(',','')).astype(int).reset_index()\
    .rename(name_dict,axis=1)\
    .drop(['BldgCount','Area'],axis=1)
after = before\
    .melt(id_vars=['Year','Prov','Reg']).rename({'variable':'Type','value':'EnergyUse'},axis=1)\
    .pivot_table(index=['Year','Prov','Type'],values='EnergyUse',aggfunc='sum').reset_index()\
    .set_index(['Year','Prov','Type']).unstack()\
    .assign(ElecRate = lambda df: df[('EnergyUse','Elec')]/(df[('EnergyUse','Elec')]+df[('EnergyUse','Gas')]+df[('EnergyUse','Heat')]))\
    .iloc[:,-1].reset_index().set_axis(['Year','Prov','ElecRate'],axis=1)
display('before',before)
display('after',after)

/tmp/ipykernel_322791/2643872.py:2: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

'before'

	Year	Prov	Reg	Elec	Gas	Heat
0	2018	Seoul	종로구	64818	82015	111
1	2018	Seoul	중구	81672	75260	563
2	2018	Seoul	용산구	52659	85220	12043
3	2018	Seoul	성동구	60559	107416	0
4	2018	Seoul	광진구	70609	130308	0
...	...	...	...	...	...	...
995	2019	Jeju-do	서귀포시	34641	1306	0
996	2020	Jeju-do	제주시	99212	22179	0
997	2020	Jeju-do	서귀포시	35510	1639	0
998	2021	Jeju-do	제주시	103217	25689	0
999	2021	Jeju-do	서귀포시	37884	2641	0

1000 rows × 6 columns

'after'

	Year	Prov	ElecRate
0	2018	Busan	0.455996
1	2018	Chungcheongbuk-do	0.512447
2	2018	Chungcheongnam-do	0.506471
3	2018	Daegu	0.403464
4	2018	Daejeon	0.418136
...	...	...	...
63	2021	Jeollabuk-do	0.466878
64	2021	Jeollanam-do	0.537464
65	2021	Sejongsi	0.435457
66	2021	Seoul	0.455675
67	2021	Ulsan	0.386115

68 rows × 3 columns

(해법) – 머리가 안돌아가면 일단 tidydata로 만들고 시작..

small = before.melt(id_vars=['Year','Prov','Reg'])\
.groupby(['Year','Prov']).agg({'value':'sum'}).reset_index()\
.rename({'value':'EnergyUseTotal'},axis=1)

big = before.melt(id_vars=['Year','Prov','Reg'])\
.groupby(['Year','Prov','variable']).agg({'value':'sum'}).reset_index()\
.rename({'variable':'Type','value':'EnergyUse'},axis=1)

big.merge(small).query('Type=="Elec"')\
.eval('ElecRate = EnergyUse/EnergyUseTotal')\
.loc[:,['Year','Prov','ElecRate']].reset_index(drop=True)

	Year	Prov	ElecRate
0	2018	Busan	0.455996
1	2018	Chungcheongbuk-do	0.512447
2	2018	Chungcheongnam-do	0.506471
3	2018	Daegu	0.403464
4	2018	Daejeon	0.418136
...	...	...	...
63	2021	Jeollabuk-do	0.466878
64	2021	Jeollanam-do	0.537464
65	2021	Sejongsi	0.435457
66	2021	Seoul	0.455675
67	2021	Ulsan	0.386115

68 rows × 3 columns

1. 강의영상

2. Imports

3. 예비학습

A. pd.concat()

B. pd.merge()

C. .applymap()

D. df.astype()

E. 제 아픈기억

4. 에너지사용량 시각화

5. pd.merge() 의 이용

6. HW 없음

A. `pd.concat()`

B. `pd.merge()`

C. `.applymap()`

D. `df.astype()`

5. `pd.merge()` 의 이용