Lesson 08: pandas II

Author

최규빈

Published

July 24, 2023

imports

import pandas as pd

예비학습

lambda

- 예제1: 람다표현식(lambda expression)자체가 하나의 오브젝트임

lambda x: (x-2)**2 ### lambda x: (x-2)**2 가 실행되는 순간 메모리상에 함수 오브젝트가 저장됨 
<function __main__.<lambda>(x)>

(사용방법)

(lambda x: (x-2)**2)(2) # 입력2 -> 출력 (2-2)^2 =0 
0
(lambda x: (x-2)**2)(5) # 입력5 -> 출력 (5-2)^2 =9
9
(lambda x: (x-2)**2)(6) # 입력6 -> 출력 (6-2)^2 =16
16
(lambda x: (x-2)**2)(-2) # 입력-2 -> 출력 (-2-2)^2 =16
16

quiz

\(f(x)=\sqrt{x}\) 를 수행하는 함수를 lambda를 이용하여 구성하라.

(lambda x: np.sqrt(x))(2)
1.4142135623730951

- 예제2: 람다표현식에 이름을 줄 수 있음.

f = lambda x: (x-2)**2
f(2),f(4),f(6),f(-2)
(0, 4, 16, 16)

위의 코드는 아래와 같다.

def f(x):
    return (x-2)**2
f(2),f(4),f(6),f(-2)
(0, 4, 16, 16)

- 예제3: 조건부 출력

f = lambda x,y: x if x>y else y # x,y가 입력 -> x>y 일때만 x를 리턴하고 그렇지않으면 y를 리턴 = 큰값을 리턴하라는 소리임 
f(1,20)
20

quiz

임의의 알파벳 문자열을 입력으로 하고, 알파벳이 대문자이면 “대문자 입니다”를 리턴하고 소문자이면 “소문자 입니다”를 리턴하는 함수를 구성하라.

(힌트) 아래코드 이용

(lambda x: "대문자" if x.isupper() else "소문자")('A')
'대문자'

- 예제4: 람다표현식들의 리스트

fl = [lambda x: x, lambda x: x**2, lambda x: x**3]
for f in fl: 
    print(f(2))
2
4
8
x = np.linspace(-1,1,100)
for f in fl:
    plt.plot(x,f(x),'--') 

- 예제5: 람다표현식들의 딕셔너리

fd = {'f1':lambda x: x, 'f2':lambda x: x**2, 'f3':lambda x: x**3}
fd
{'f1': <function __main__.<lambda>(x)>,
 'f2': <function __main__.<lambda>(x)>,
 'f3': <function __main__.<lambda>(x)>}
for k in fd:
    plt.plot(x,fd[k](x),'--')

- 예제6: 람다표현식을 리턴하는 함수 (함수를 리턴하는 함수)

(예비학습) 함수 \(g(x)\)가 정의되어 있을때 \(\frac{d}{dx}g(x)\)의 값을 계산해보기

g = lambda x: x**2 

\(g(x)=x^2\)

\[g'(x)=gg(x)=2x\approx \frac{g(x+0.001)-g(x)}{0.001}\]

gg = lambda x : (g(x+0.001)-g(x))/0.001
gg(4)
8.0010000000037

(목표) 도함수를 구해주는 derivate 함수를 정의하자. 이 함수는 임의의 함수 g를 입력으로 받으면, g의 도함수(gg)가 리턴되는 기능을 가진다.

derivate = lambda g: (lambda x : (g(x+0.001)-g(x))/0.001)

(사용1)

x = np.linspace(0,6.28,1000) 
g = lambda x: np.sin(x)
gg = derivate(g) 
plt.plot(x,g(x),label=r'$g(x)=sin(x)$')
plt.plot(x,gg(x),label=r'$\frac{d}{dx}g(x)=cos(x)$')
plt.legend(fontsize=15)

(사용2)

g0 = lambda x: (1/6)*x**3
g1 = derivate(g0) # (1/2)x^2 
g2 = derivate(g1) # x 
x = np.linspace(-1,1,100)
plt.plot(x,g0(x),'--',label=r'$g_0(x)=\frac{1}{6}x^3$')
plt.plot(x,g1(x),'--',label=r'$g_1(x)=\frac{1}{2}x^2$')
plt.plot(x,g2(x),'--',label=r'$g_2(x)=x$')
plt.legend(fontsize=15)

- 예제7: 예제6의 다른표현

derivate = lambda g: lambda x : (g(x+0.001)-g(x))/0.001

(사용1)

g = lambda x: np.sin(x) 
gg = derivate(g) 
x = np.linspace(0,6.28,1000) 
plt.plot(x,g(x),label=r'$f(x)=sin(x)$')
plt.plot(x,gg(x),label=r'$\frac{d}{dx}f(x)=cos(x)$')
plt.legend(fontsize=15)

(사용2)

g0 = lambda x: (1/6)*x**3
g1 = derivate(g0) # (1/2)x^2 
g2 = derivate(g1) # x 
x = np.linspace(-1,1,100)
plt.plot(x,g0(x),'--',label=r'$g_0(x)=\frac{1}{6}x^3$')
plt.plot(x,g1(x),'--',label=r'$g_1(x)=\frac{1}{2}x^2$')
plt.plot(x,g2(x),'--',label=r'$g_2(x)=x$')
plt.legend(fontsize=15)

map

- 개념: $(f,[x_1,x_2,,x_n] )=$

- 예제1:

x = [1,2,3] 
f = lambda x: x+1
y = list(map(f,x))
x,y
([1, 2, 3], [2, 3, 4])

quiz

\(x=[1,2,3,4,5]\)에 대하여 \(x^2\)을 수행하는 함수를 구현하라.

list(map(lambda x: x**2 ,[1,2,3,4,5]))
[1, 4, 9, 16, 25]

- 예제2: 문자열을 입력으로 받고 대문자이면 True, 소문자이면 False

입력: A,B,C,a,b,c
출력: T,T,T,F,F,F
list(map(lambda x: x.isupper(), ['A','B','C','a','b','c']))
[True, True, True, False, False, False]

- 예제3: 두개의 입력을 받는 함수 (map을 이용하는 것이 리스트 컴프리헨션보다 조금 편한것 같다)

f= lambda x,y: x+y 
x=[1,2,3]
y=[-1,-2,-3]
list(map(f,x,y))
[0, 0, 0]

quiz

t=np.linspace(-3.14,3.14,100)
x=np.sin(t)
y=np.cos(t)
plt.plot(t,x)
plt.plot(t,y)

\(f(x,y) = \max(x,y)\) 를 구현하여 \({\bf x}\),\({\bf y}\) 각 원소에 적용하고 결과를 시각화 하라.

plt.plot(list(map(lambda x,y: max(x,y), x,y)))

- 종합: map과 리스트컴프리헨션과 비교

  • map은 for문을 위한 \(i\)등의 인덱스를 쓰지 않지만 리스트컴프리헨션은 필요함
  • map은 좀 더 리스트컴프리헨션보다 제약적으로 사용할 수 밖에 없음.

Pandas: transform column

lambda

- 예시1: \(x \to x+2\)

f = lambda x: x+2
f(1)
3

- 예시2: \(x,y \to x+y\)

f = lambda x,y: x+y
f(1,2)
3

- 예시3: ‘2023-09’ \(\to\) 9

f = lambda x: int(x[-2:])
f('2023-09')
9

- 예시4: ‘2023-09’ \(\to\) (2023,9)

f = lambda x: (int(x[:4]),int(x[-2:]))
f('2023-09')
(2023, 9)

- 예시5: 문자열이 ‘cat’이면 1 ’dog’ 이면 0 // ’cat이면 1 ’cat’이 아니면 0

f = lambda x: 1 if x=='cat' else 0
f('cat'), f('dog')
(1, 0)

- Note: f로 이름을 정하지 않고 직접 사용 가능

(lambda x: x+1)(2)
3

map

- 개념: map(f,[x1,x2,...xn])=[f(x1),f(x2),...,f(xn)]

- 예시1: x->x+1[1,2,3]에 적용

list(map(lambda x: x+1, [1,-5,3]))
[2, -4, 4]

- 예시2 df.Height열 변환하기

df = pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv').drop(['Loaned From','Best Overall Rating'],axis=1).dropna()
s = df.Height[:5]
s
0    189cm
1    179cm
2    172cm
3    181cm
4    172cm
Name: Height, dtype: object
x = s[0]
x
'189cm'
list(map(lambda x: int(x.replace('cm','')), s))
[189, 179, 172, 181, 172]

# 예시3 df.Height열 변환하기 + 변환된 열 할당하기

df.assign(
    Height = list(map(lambda x: int(x.replace('cm','')), df.Height)),
    Weight = list(map(lambda x: int(x.replace('kg','')), df.Weight)),
)
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo ... Real Face Position Joined Loaned From Contract Valid Until Height Weight Release Clause Kit Number Best Overall Rating
0 209658 L. Goretzka 27 https://cdn.sofifa.net/players/209/658/23_60.png Germany https://cdn.sofifa.net/flags/de.png 87 88 FC Bayern München https://cdn.sofifa.net/teams/21/30.png ... Yes <span class="pos pos28">SUB Jul 1, 2018 NaN 2026 189 82 €157M 8.0 NaN
1 212198 Bruno Fernandes 27 https://cdn.sofifa.net/players/212/198/23_60.png Portugal https://cdn.sofifa.net/flags/pt.png 86 87 Manchester United https://cdn.sofifa.net/teams/11/30.png ... Yes <span class="pos pos15">LCM Jan 30, 2020 NaN 2026 179 69 €155M 8.0 NaN
2 224334 M. Acuña 30 https://cdn.sofifa.net/players/224/334/23_60.png Argentina https://cdn.sofifa.net/flags/ar.png 85 85 Sevilla FC https://cdn.sofifa.net/teams/481/30.png ... No <span class="pos pos7">LB Sep 14, 2020 NaN 2024 172 69 €97.7M 19.0 NaN
3 192985 K. De Bruyne 31 https://cdn.sofifa.net/players/192/985/23_60.png Belgium https://cdn.sofifa.net/flags/be.png 91 91 Manchester City https://cdn.sofifa.net/teams/10/30.png ... Yes <span class="pos pos13">RCM Aug 30, 2015 NaN 2025 181 70 €198.9M 17.0 NaN
4 224232 N. Barella 25 https://cdn.sofifa.net/players/224/232/23_60.png Italy https://cdn.sofifa.net/flags/it.png 86 89 Inter https://cdn.sofifa.net/teams/44/30.png ... Yes <span class="pos pos13">RCM Sep 1, 2020 NaN 2026 172 68 €154.4M 23.0 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17655 269526 Deng Xiongtao 19 https://cdn.sofifa.net/players/269/526/23_60.png China PR https://cdn.sofifa.net/flags/cn.png 48 61 Meizhou Hakka https://cdn.sofifa.net/teams/114628/30.png ... No <span class="pos pos29">RES Apr 11, 2022 NaN 2027 190 78 €218K 35.0 NaN
17656 267946 22 Lim Jun Sub 17 https://cdn.sofifa.net/players/267/946/22_60.png Korea Republic https://cdn.sofifa.net/flags/kr.png 48 64 Jeju United FC https://cdn.sofifa.net/teams/1478/30.png ... No <span class="pos pos29">RES Jan 1, 2022 NaN 2026 195 84 €188K 21.0 NaN
17657 270567 A. Demir 25 https://cdn.sofifa.net/players/270/567/23_60.png Turkey https://cdn.sofifa.net/flags/tr.png 51 56 Ümraniyespor https://cdn.sofifa.net/teams/113796/30.png ... No <span class="pos pos29">RES Jun 6, 2021 NaN 2023 190 82 €142K 12.0 NaN
17658 256624 21 S. Czajor 18 https://cdn.sofifa.net/players/256/624/21_60.png Poland https://cdn.sofifa.net/flags/pl.png 50 65 Fleetwood Town https://cdn.sofifa.net/teams/112260/30.png ... No <span class="pos pos29">RES Jan 1, 2020 NaN 2021 187 79 €214K 40.0 NaN
17659 256376 21 F. Jakobsson 20 https://cdn.sofifa.net/players/256/376/21_60.png Sweden https://cdn.sofifa.net/flags/se.png 50 61 IFK Norrköping https://cdn.sofifa.net/teams/702/30.png ... No <span class="pos pos29">RES Jan 8, 2020 NaN 2021 186 78 €131K 30.0 NaN

17660 rows × 29 columns

#

# 예시4df.Position 열에 아래와 같은 변환을 수행하고, 변환된 열을 할당하라.

before after
<span class="pos pos28">SUB SUB
<span class="pos pos15">LCM LCM
<span class="pos pos7">LB LB
<span class="pos pos13">RCM RCM
<span class="pos pos13">RCM RCM
df = pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv').drop(['Loaned From','Best Overall Rating'],axis=1).dropna()
df
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo ... Work Rate Body Type Real Face Position Joined Contract Valid Until Height Weight Release Clause Kit Number
0 209658 L. Goretzka 27 https://cdn.sofifa.net/players/209/658/23_60.png Germany https://cdn.sofifa.net/flags/de.png 87 88 FC Bayern München https://cdn.sofifa.net/teams/21/30.png ... High/ Medium Unique Yes <span class="pos pos28">SUB Jul 1, 2018 2026 189cm 82kg €157M 8.0
1 212198 Bruno Fernandes 27 https://cdn.sofifa.net/players/212/198/23_60.png Portugal https://cdn.sofifa.net/flags/pt.png 86 87 Manchester United https://cdn.sofifa.net/teams/11/30.png ... High/ High Unique Yes <span class="pos pos15">LCM Jan 30, 2020 2026 179cm 69kg €155M 8.0
2 224334 M. Acuña 30 https://cdn.sofifa.net/players/224/334/23_60.png Argentina https://cdn.sofifa.net/flags/ar.png 85 85 Sevilla FC https://cdn.sofifa.net/teams/481/30.png ... High/ High Stocky (170-185) No <span class="pos pos7">LB Sep 14, 2020 2024 172cm 69kg €97.7M 19.0
3 192985 K. De Bruyne 31 https://cdn.sofifa.net/players/192/985/23_60.png Belgium https://cdn.sofifa.net/flags/be.png 91 91 Manchester City https://cdn.sofifa.net/teams/10/30.png ... High/ High Unique Yes <span class="pos pos13">RCM Aug 30, 2015 2025 181cm 70kg €198.9M 17.0
4 224232 N. Barella 25 https://cdn.sofifa.net/players/224/232/23_60.png Italy https://cdn.sofifa.net/flags/it.png 86 89 Inter https://cdn.sofifa.net/teams/44/30.png ... High/ High Normal (170-) Yes <span class="pos pos13">RCM Sep 1, 2020 2026 172cm 68kg €154.4M 23.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17655 269526 Deng Xiongtao 19 https://cdn.sofifa.net/players/269/526/23_60.png China PR https://cdn.sofifa.net/flags/cn.png 48 61 Meizhou Hakka https://cdn.sofifa.net/teams/114628/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Apr 11, 2022 2027 190cm 78kg €218K 35.0
17656 267946 22 Lim Jun Sub 17 https://cdn.sofifa.net/players/267/946/22_60.png Korea Republic https://cdn.sofifa.net/flags/kr.png 48 64 Jeju United FC https://cdn.sofifa.net/teams/1478/30.png ... Medium/ Medium Lean (185+) No <span class="pos pos29">RES Jan 1, 2022 2026 195cm 84kg €188K 21.0
17657 270567 A. Demir 25 https://cdn.sofifa.net/players/270/567/23_60.png Turkey https://cdn.sofifa.net/flags/tr.png 51 56 Ümraniyespor https://cdn.sofifa.net/teams/113796/30.png ... Medium/ Medium Lean (185+) No <span class="pos pos29">RES Jun 6, 2021 2023 190cm 82kg €142K 12.0
17658 256624 21 S. Czajor 18 https://cdn.sofifa.net/players/256/624/21_60.png Poland https://cdn.sofifa.net/flags/pl.png 50 65 Fleetwood Town https://cdn.sofifa.net/teams/112260/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Jan 1, 2020 2021 187cm 79kg €214K 40.0
17659 256376 21 F. Jakobsson 20 https://cdn.sofifa.net/players/256/376/21_60.png Sweden https://cdn.sofifa.net/flags/se.png 50 61 IFK Norrköping https://cdn.sofifa.net/teams/702/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Jan 8, 2020 2021 186cm 78kg €131K 30.0

16364 rows × 27 columns

df['Position'] = [l.split('>')[-1] for l in df.Position]
df
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo ... Work Rate Body Type Real Face Position Joined Contract Valid Until Height Weight Release Clause Kit Number
0 209658 L. Goretzka 27 https://cdn.sofifa.net/players/209/658/23_60.png Germany https://cdn.sofifa.net/flags/de.png 87 88 FC Bayern München https://cdn.sofifa.net/teams/21/30.png ... High/ Medium Unique Yes SUB Jul 1, 2018 2026 189cm 82kg €157M 8.0
1 212198 Bruno Fernandes 27 https://cdn.sofifa.net/players/212/198/23_60.png Portugal https://cdn.sofifa.net/flags/pt.png 86 87 Manchester United https://cdn.sofifa.net/teams/11/30.png ... High/ High Unique Yes LCM Jan 30, 2020 2026 179cm 69kg €155M 8.0
2 224334 M. Acuña 30 https://cdn.sofifa.net/players/224/334/23_60.png Argentina https://cdn.sofifa.net/flags/ar.png 85 85 Sevilla FC https://cdn.sofifa.net/teams/481/30.png ... High/ High Stocky (170-185) No LB Sep 14, 2020 2024 172cm 69kg €97.7M 19.0
3 192985 K. De Bruyne 31 https://cdn.sofifa.net/players/192/985/23_60.png Belgium https://cdn.sofifa.net/flags/be.png 91 91 Manchester City https://cdn.sofifa.net/teams/10/30.png ... High/ High Unique Yes RCM Aug 30, 2015 2025 181cm 70kg €198.9M 17.0
4 224232 N. Barella 25 https://cdn.sofifa.net/players/224/232/23_60.png Italy https://cdn.sofifa.net/flags/it.png 86 89 Inter https://cdn.sofifa.net/teams/44/30.png ... High/ High Normal (170-) Yes RCM Sep 1, 2020 2026 172cm 68kg €154.4M 23.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17655 269526 Deng Xiongtao 19 https://cdn.sofifa.net/players/269/526/23_60.png China PR https://cdn.sofifa.net/flags/cn.png 48 61 Meizhou Hakka https://cdn.sofifa.net/teams/114628/30.png ... Medium/ Medium Normal (185+) No RES Apr 11, 2022 2027 190cm 78kg €218K 35.0
17656 267946 22 Lim Jun Sub 17 https://cdn.sofifa.net/players/267/946/22_60.png Korea Republic https://cdn.sofifa.net/flags/kr.png 48 64 Jeju United FC https://cdn.sofifa.net/teams/1478/30.png ... Medium/ Medium Lean (185+) No RES Jan 1, 2022 2026 195cm 84kg €188K 21.0
17657 270567 A. Demir 25 https://cdn.sofifa.net/players/270/567/23_60.png Turkey https://cdn.sofifa.net/flags/tr.png 51 56 Ümraniyespor https://cdn.sofifa.net/teams/113796/30.png ... Medium/ Medium Lean (185+) No RES Jun 6, 2021 2023 190cm 82kg €142K 12.0
17658 256624 21 S. Czajor 18 https://cdn.sofifa.net/players/256/624/21_60.png Poland https://cdn.sofifa.net/flags/pl.png 50 65 Fleetwood Town https://cdn.sofifa.net/teams/112260/30.png ... Medium/ Medium Normal (185+) No RES Jan 1, 2020 2021 187cm 79kg €214K 40.0
17659 256376 21 F. Jakobsson 20 https://cdn.sofifa.net/players/256/376/21_60.png Sweden https://cdn.sofifa.net/flags/se.png 50 61 IFK Norrköping https://cdn.sofifa.net/teams/702/30.png ... Medium/ Medium Normal (185+) No RES Jan 8, 2020 2021 186cm 78kg €131K 30.0

16364 rows × 27 columns

s.apply(변환함수)

- 예시1: 원소별로 처음3개의 숫자만 출력

df = pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv')
df.Height.apply(lambda x: int(x.replace('cm',''))) 
0        189
1        179
2        172
3        181
4        172
        ... 
17655    190
17656    195
17657    190
17658    187
17659    186
Name: Height, Length: 17660, dtype: int64

- 예시2: 원소별로 처음3개의 문자만 출력 \(\to\) str자료형을 int자료형으로 변환

df = pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv')

(방법1) – 한번에

df.Height.apply(lambda x: int(x[:3]))
0        189
1        179
2        172
3        181
4        172
        ... 
17655    190
17656    195
17657    190
17658    187
17659    186
Name: Height, Length: 17660, dtype: int64

(방법2) – 연쇄적으로..

df.Height.apply(lambda x: x[:3]).apply(int)
0        189
1        179
2        172
3        181
4        172
        ... 
17655    190
17656    195
17657    190
17658    187
17659    186
Name: Height, Length: 17660, dtype: int64