Lesson 09: pandas 공부 4단계 ~ 5단계

Author

최규빈

Published

July 24, 2023

강의영상

imports

import pandas as pd

pandas 공부 4단계: 메소드

data: FIFA23 data

- FIFA23라는 축구게임이 있음

- 게임에 실제 선수들이 나오면서 선수들의 능력치가 세밀하게 구현되어 있음

- 선수들 능력치에 대한 데이터셋은 캐글에 공개되어 있음 - https://www.kaggle.com/datasets/bryanb/fifa-player-stats-database?select=FIFA23_official_data.csv

df.head(): 처음 몇개만..

- 일단 살펴보기

df=pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv')
df.head()
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo ... Real Face Position Joined Loaned From Contract Valid Until Height Weight Release Clause Kit Number Best Overall Rating
0 209658 L. Goretzka 27 https://cdn.sofifa.net/players/209/658/23_60.png Germany https://cdn.sofifa.net/flags/de.png 87 88 FC Bayern München https://cdn.sofifa.net/teams/21/30.png ... Yes <span class="pos pos28">SUB Jul 1, 2018 NaN 2026 189cm 82kg €157M 8.0 NaN
1 212198 Bruno Fernandes 27 https://cdn.sofifa.net/players/212/198/23_60.png Portugal https://cdn.sofifa.net/flags/pt.png 86 87 Manchester United https://cdn.sofifa.net/teams/11/30.png ... Yes <span class="pos pos15">LCM Jan 30, 2020 NaN 2026 179cm 69kg €155M 8.0 NaN
2 224334 M. Acuña 30 https://cdn.sofifa.net/players/224/334/23_60.png Argentina https://cdn.sofifa.net/flags/ar.png 85 85 Sevilla FC https://cdn.sofifa.net/teams/481/30.png ... No <span class="pos pos7">LB Sep 14, 2020 NaN 2024 172cm 69kg €97.7M 19.0 NaN
3 192985 K. De Bruyne 31 https://cdn.sofifa.net/players/192/985/23_60.png Belgium https://cdn.sofifa.net/flags/be.png 91 91 Manchester City https://cdn.sofifa.net/teams/10/30.png ... Yes <span class="pos pos13">RCM Aug 30, 2015 NaN 2025 181cm 70kg €198.9M 17.0 NaN
4 224232 N. Barella 25 https://cdn.sofifa.net/players/224/232/23_60.png Italy https://cdn.sofifa.net/flags/it.png 86 89 Inter https://cdn.sofifa.net/teams/44/30.png ... Yes <span class="pos pos13">RCM Sep 1, 2020 NaN 2026 172cm 68kg €154.4M 23.0 NaN

5 rows × 29 columns

df.T

- 트랜스포즈해서 보는게 편할지도..?

df.T
0 1 2 3 4 5 6 7 8 9 ... 17650 17651 17652 17653 17654 17655 17656 17657 17658 17659
ID 209658 212198 224334 192985 224232 212622 197445 187961 208333 210514 ... 256879 269546 267647 253186 267461 269526 267946 270567 256624 256376
Name L. Goretzka Bruno Fernandes M. Acuña K. De Bruyne N. Barella J. Kimmich D. Alaba 22 Paulinho E. Can João Cancelo ... 22 G. Leijon Wu Fei 22 E. Grosz 22 S. Booth 22 L. Grimpe Deng Xiongtao 22 Lim Jun Sub A. Demir 21 S. Czajor 21 F. Jakobsson
Age 27 27 30 31 25 27 30 32 28 28 ... 19 32 18 20 17 19 17 25 18 20
Photo https://cdn.sofifa.net/players/209/658/23_60.png https://cdn.sofifa.net/players/212/198/23_60.png https://cdn.sofifa.net/players/224/334/23_60.png https://cdn.sofifa.net/players/192/985/23_60.png https://cdn.sofifa.net/players/224/232/23_60.png https://cdn.sofifa.net/players/212/622/23_60.png https://cdn.sofifa.net/players/197/445/23_60.png https://cdn.sofifa.net/players/187/961/22_60.png https://cdn.sofifa.net/players/208/333/23_60.png https://cdn.sofifa.net/players/210/514/23_60.png ... https://cdn.sofifa.net/players/256/879/22_60.png https://cdn.sofifa.net/players/269/546/23_60.png https://cdn.sofifa.net/players/267/647/22_60.png https://cdn.sofifa.net/players/253/186/22_60.png https://cdn.sofifa.net/players/267/461/22_60.png https://cdn.sofifa.net/players/269/526/23_60.png https://cdn.sofifa.net/players/267/946/22_60.png https://cdn.sofifa.net/players/270/567/23_60.png https://cdn.sofifa.net/players/256/624/21_60.png https://cdn.sofifa.net/players/256/376/21_60.png
Nationality Germany Portugal Argentina Belgium Italy Germany Austria Brazil Germany Portugal ... Sweden China PR Romania England Germany China PR Korea Republic Turkey Poland Sweden
Flag https://cdn.sofifa.net/flags/de.png https://cdn.sofifa.net/flags/pt.png https://cdn.sofifa.net/flags/ar.png https://cdn.sofifa.net/flags/be.png https://cdn.sofifa.net/flags/it.png https://cdn.sofifa.net/flags/de.png https://cdn.sofifa.net/flags/at.png https://cdn.sofifa.net/flags/br.png https://cdn.sofifa.net/flags/de.png https://cdn.sofifa.net/flags/pt.png ... https://cdn.sofifa.net/flags/se.png https://cdn.sofifa.net/flags/cn.png https://cdn.sofifa.net/flags/ro.png https://cdn.sofifa.net/flags/gb-eng.png https://cdn.sofifa.net/flags/de.png https://cdn.sofifa.net/flags/cn.png https://cdn.sofifa.net/flags/kr.png https://cdn.sofifa.net/flags/tr.png https://cdn.sofifa.net/flags/pl.png https://cdn.sofifa.net/flags/se.png
Overall 87 86 85 91 86 89 86 83 82 88 ... 52 51 52 51 54 48 48 51 50 50
Potential 88 87 85 91 89 90 86 83 82 88 ... 62 51 70 60 68 61 64 56 65 61
Club FC Bayern München Manchester United Sevilla FC Manchester City Inter FC Bayern München Real Madrid CF Al Ahli Borussia Dortmund Manchester City ... Örebro SK Wuhan Three Towns Gaz Metan Mediaş Crewe Alexandra RB Leipzig Meizhou Hakka Jeju United FC Ümraniyespor Fleetwood Town IFK Norrköping
Club Logo https://cdn.sofifa.net/teams/21/30.png https://cdn.sofifa.net/teams/11/30.png https://cdn.sofifa.net/teams/481/30.png https://cdn.sofifa.net/teams/10/30.png https://cdn.sofifa.net/teams/44/30.png https://cdn.sofifa.net/teams/21/30.png https://cdn.sofifa.net/teams/243/30.png https://cdn.sofifa.net/teams/112387/30.png https://cdn.sofifa.net/teams/22/30.png https://cdn.sofifa.net/teams/10/30.png ... https://cdn.sofifa.net/teams/705/30.png https://cdn.sofifa.net/teams/116361/30.png https://cdn.sofifa.net/teams/112637/30.png https://cdn.sofifa.net/teams/121/30.png https://cdn.sofifa.net/teams/112172/30.png https://cdn.sofifa.net/teams/114628/30.png https://cdn.sofifa.net/teams/1478/30.png https://cdn.sofifa.net/teams/113796/30.png https://cdn.sofifa.net/teams/112260/30.png https://cdn.sofifa.net/teams/702/30.png
Value €91M €78.5M €46.5M €107.5M €89.5M €105.5M €55.5M €28.5M €30.5M €82.5M ... €150K €30K €180K €110K €210K €100K €100K €70K €90K €90K
Wage €115K €190K €46K €350K €110K €130K €220K €61K €63K €250K ... €500 €2K €500 €850 €500 €500 €500 €2K €500 €500
Special 2312 2305 2303 2303 2296 2283 2277 2273 2271 2262 ... 779 777 775 768 767 762 761 759 758 749
Preferred Foot Right Right Left Right Right Right Left Right Right Right ... Right Right Right Right Right Right Right Right Right Left
International Reputation 4.0 3.0 2.0 4.0 3.0 4.0 4.0 3.0 3.0 3.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Weak Foot 4.0 3.0 3.0 5.0 3.0 4.0 4.0 4.0 4.0 4.0 ... 3.0 2.0 2.0 2.0 3.0 3.0 2.0 2.0 2.0 2.0
Skill Moves 3.0 4.0 3.0 4.0 3.0 3.0 3.0 4.0 3.0 4.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Work Rate High/ Medium High/ High High/ High High/ High High/ High High/ Medium Medium/ Medium High/ High Medium/ High High/ Medium ... Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium Medium/ Medium
Body Type Unique Unique Stocky (170-185) Unique Normal (170-) Normal (170-185) Normal (170-185) Normal (170-185) Stocky (185+) Unique ... Normal (185+) Normal (185+) Lean (185+) Lean (185+) Lean (185+) Normal (185+) Lean (185+) Lean (185+) Normal (185+) Normal (185+)
Real Face Yes Yes No Yes Yes Yes Yes Yes Yes Yes ... No No No No No No No No No No
Position <span class="pos pos28">SUB <span class="pos pos15">LCM <span class="pos pos7">LB <span class="pos pos13">RCM <span class="pos pos13">RCM <span class="pos pos9">RDM <span class="pos pos6">LCB <span class="pos pos15">LCM <span class="pos pos28">SUB <span class="pos pos7">LB ... <span class="pos pos28">SUB <span class="pos pos28">SUB <span class="pos pos28">SUB <span class="pos pos29">RES <span class="pos pos29">RES <span class="pos pos29">RES <span class="pos pos29">RES <span class="pos pos29">RES <span class="pos pos29">RES <span class="pos pos29">RES
Joined Jul 1, 2018 Jan 30, 2020 Sep 14, 2020 Aug 30, 2015 Sep 1, 2020 Jul 1, 2015 Jul 1, 2021 Jul 22, 2021 Feb 18, 2020 Aug 7, 2019 ... Jun 14, 2020 Feb 15, 2019 Jul 1, 2020 Jul 1, 2019 Feb 7, 2022 Apr 11, 2022 Jan 1, 2022 Jun 6, 2021 Jan 1, 2020 Jan 8, 2020
Loaned From NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Contract Valid Until 2026 2026 2024 2025 2026 2025 2026 2024 2024 2027 ... 2022 2022 2022 2022 2023 2027 2026 2023 2021 2021
Height 189cm 179cm 172cm 181cm 172cm 177cm 180cm 183cm 186cm 182cm ... 188cm 186cm 190cm 195cm 186cm 190cm 195cm 190cm 187cm 186cm
Weight 82kg 69kg 69kg 70kg 68kg 75kg 78kg 80kg 86kg 74kg ... 81kg 78kg 70kg 80kg 78kg 78kg 84kg 82kg 79kg 78kg
Release Clause €157M €155M €97.7M €198.9M €154.4M €182M €113.8M €48.5M €51.9M €152.6M ... €218K €47K €356K €215K €488K €218K €188K €142K €214K €131K
Kit Number 8.0 8.0 19.0 17.0 23.0 6.0 4.0 15.0 23.0 7.0 ... 33.0 1.0 99.0 27.0 43.0 35.0 21.0 12.0 40.0 30.0
Best Overall Rating NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

29 rows × 17660 columns

df.info(): column별 자료형, 결측치 조사

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17660 entries, 0 to 17659
Data columns (total 29 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        17660 non-null  int64  
 1   Name                      17660 non-null  object 
 2   Age                       17660 non-null  int64  
 3   Photo                     17660 non-null  object 
 4   Nationality               17660 non-null  object 
 5   Flag                      17660 non-null  object 
 6   Overall                   17660 non-null  int64  
 7   Potential                 17660 non-null  int64  
 8   Club                      17449 non-null  object 
 9   Club Logo                 17660 non-null  object 
 10  Value                     17660 non-null  object 
 11  Wage                      17660 non-null  object 
 12  Special                   17660 non-null  int64  
 13  Preferred Foot            17660 non-null  object 
 14  International Reputation  17660 non-null  float64
 15  Weak Foot                 17660 non-null  float64
 16  Skill Moves               17660 non-null  float64
 17  Work Rate                 17660 non-null  object 
 18  Body Type                 17622 non-null  object 
 19  Real Face                 17622 non-null  object 
 20  Position                  17625 non-null  object 
 21  Joined                    16562 non-null  object 
 22  Loaned From               694 non-null    object 
 23  Contract Valid Until      17299 non-null  object 
 24  Height                    17660 non-null  object 
 25  Weight                    17660 non-null  object 
 26  Release Clause            16509 non-null  object 
 27  Kit Number                17625 non-null  float64
 28  Best Overall Rating       21 non-null     object 
dtypes: float64(4), int64(5), object(20)
memory usage: 3.9+ MB

s.str.replace(): 문자열치환

df['Height'].str.replace('cm','')
0        189
1        179
2        172
3        181
4        172
        ... 
17655    190
17656    195
17657    190
17658    187
17659    186
Name: Height, Length: 17660, dtype: object

s.astype: 자료형변화

df['Height'].str.replace('cm','').astype('int')
0        189
1        179
2        172
3        181
4        172
        ... 
17655    190
17656    195
17657    190
17658    187
17659    186
Name: Height, Length: 17660, dtype: int64

df.drop() 결측치가 많은 column 제거

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17660 entries, 0 to 17659
Data columns (total 29 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        17660 non-null  int64  
 1   Name                      17660 non-null  object 
 2   Age                       17660 non-null  int64  
 3   Photo                     17660 non-null  object 
 4   Nationality               17660 non-null  object 
 5   Flag                      17660 non-null  object 
 6   Overall                   17660 non-null  int64  
 7   Potential                 17660 non-null  int64  
 8   Club                      17449 non-null  object 
 9   Club Logo                 17660 non-null  object 
 10  Value                     17660 non-null  object 
 11  Wage                      17660 non-null  object 
 12  Special                   17660 non-null  int64  
 13  Preferred Foot            17660 non-null  object 
 14  International Reputation  17660 non-null  float64
 15  Weak Foot                 17660 non-null  float64
 16  Skill Moves               17660 non-null  float64
 17  Work Rate                 17660 non-null  object 
 18  Body Type                 17622 non-null  object 
 19  Real Face                 17622 non-null  object 
 20  Position                  17625 non-null  object 
 21  Joined                    16562 non-null  object 
 22  Loaned From               694 non-null    object 
 23  Contract Valid Until      17299 non-null  object 
 24  Height                    17660 non-null  object 
 25  Weight                    17660 non-null  object 
 26  Release Clause            16509 non-null  object 
 27  Kit Number                17625 non-null  float64
 28  Best Overall Rating       21 non-null     object 
dtypes: float64(4), int64(5), object(20)
memory usage: 3.9+ MB
df.drop(['Loaned From','Best Overall Rating'],axis=1)
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo ... Work Rate Body Type Real Face Position Joined Contract Valid Until Height Weight Release Clause Kit Number
0 209658 L. Goretzka 27 https://cdn.sofifa.net/players/209/658/23_60.png Germany https://cdn.sofifa.net/flags/de.png 87 88 FC Bayern München https://cdn.sofifa.net/teams/21/30.png ... High/ Medium Unique Yes <span class="pos pos28">SUB Jul 1, 2018 2026 189cm 82kg €157M 8.0
1 212198 Bruno Fernandes 27 https://cdn.sofifa.net/players/212/198/23_60.png Portugal https://cdn.sofifa.net/flags/pt.png 86 87 Manchester United https://cdn.sofifa.net/teams/11/30.png ... High/ High Unique Yes <span class="pos pos15">LCM Jan 30, 2020 2026 179cm 69kg €155M 8.0
2 224334 M. Acuña 30 https://cdn.sofifa.net/players/224/334/23_60.png Argentina https://cdn.sofifa.net/flags/ar.png 85 85 Sevilla FC https://cdn.sofifa.net/teams/481/30.png ... High/ High Stocky (170-185) No <span class="pos pos7">LB Sep 14, 2020 2024 172cm 69kg €97.7M 19.0
3 192985 K. De Bruyne 31 https://cdn.sofifa.net/players/192/985/23_60.png Belgium https://cdn.sofifa.net/flags/be.png 91 91 Manchester City https://cdn.sofifa.net/teams/10/30.png ... High/ High Unique Yes <span class="pos pos13">RCM Aug 30, 2015 2025 181cm 70kg €198.9M 17.0
4 224232 N. Barella 25 https://cdn.sofifa.net/players/224/232/23_60.png Italy https://cdn.sofifa.net/flags/it.png 86 89 Inter https://cdn.sofifa.net/teams/44/30.png ... High/ High Normal (170-) Yes <span class="pos pos13">RCM Sep 1, 2020 2026 172cm 68kg €154.4M 23.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17655 269526 Deng Xiongtao 19 https://cdn.sofifa.net/players/269/526/23_60.png China PR https://cdn.sofifa.net/flags/cn.png 48 61 Meizhou Hakka https://cdn.sofifa.net/teams/114628/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Apr 11, 2022 2027 190cm 78kg €218K 35.0
17656 267946 22 Lim Jun Sub 17 https://cdn.sofifa.net/players/267/946/22_60.png Korea Republic https://cdn.sofifa.net/flags/kr.png 48 64 Jeju United FC https://cdn.sofifa.net/teams/1478/30.png ... Medium/ Medium Lean (185+) No <span class="pos pos29">RES Jan 1, 2022 2026 195cm 84kg €188K 21.0
17657 270567 A. Demir 25 https://cdn.sofifa.net/players/270/567/23_60.png Turkey https://cdn.sofifa.net/flags/tr.png 51 56 Ümraniyespor https://cdn.sofifa.net/teams/113796/30.png ... Medium/ Medium Lean (185+) No <span class="pos pos29">RES Jun 6, 2021 2023 190cm 82kg €142K 12.0
17658 256624 21 S. Czajor 18 https://cdn.sofifa.net/players/256/624/21_60.png Poland https://cdn.sofifa.net/flags/pl.png 50 65 Fleetwood Town https://cdn.sofifa.net/teams/112260/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Jan 1, 2020 2021 187cm 79kg €214K 40.0
17659 256376 21 F. Jakobsson 20 https://cdn.sofifa.net/players/256/376/21_60.png Sweden https://cdn.sofifa.net/flags/se.png 50 61 IFK Norrköping https://cdn.sofifa.net/teams/702/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Jan 8, 2020 2021 186cm 78kg €131K 30.0

17660 rows × 27 columns

df.dropna() 결측치가 포함된 row 제거

df.drop(['Loaned From','Best Overall Rating'],axis=1).dropna()
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo ... Work Rate Body Type Real Face Position Joined Contract Valid Until Height Weight Release Clause Kit Number
0 209658 L. Goretzka 27 https://cdn.sofifa.net/players/209/658/23_60.png Germany https://cdn.sofifa.net/flags/de.png 87 88 FC Bayern München https://cdn.sofifa.net/teams/21/30.png ... High/ Medium Unique Yes <span class="pos pos28">SUB Jul 1, 2018 2026 189cm 82kg €157M 8.0
1 212198 Bruno Fernandes 27 https://cdn.sofifa.net/players/212/198/23_60.png Portugal https://cdn.sofifa.net/flags/pt.png 86 87 Manchester United https://cdn.sofifa.net/teams/11/30.png ... High/ High Unique Yes <span class="pos pos15">LCM Jan 30, 2020 2026 179cm 69kg €155M 8.0
2 224334 M. Acuña 30 https://cdn.sofifa.net/players/224/334/23_60.png Argentina https://cdn.sofifa.net/flags/ar.png 85 85 Sevilla FC https://cdn.sofifa.net/teams/481/30.png ... High/ High Stocky (170-185) No <span class="pos pos7">LB Sep 14, 2020 2024 172cm 69kg €97.7M 19.0
3 192985 K. De Bruyne 31 https://cdn.sofifa.net/players/192/985/23_60.png Belgium https://cdn.sofifa.net/flags/be.png 91 91 Manchester City https://cdn.sofifa.net/teams/10/30.png ... High/ High Unique Yes <span class="pos pos13">RCM Aug 30, 2015 2025 181cm 70kg €198.9M 17.0
4 224232 N. Barella 25 https://cdn.sofifa.net/players/224/232/23_60.png Italy https://cdn.sofifa.net/flags/it.png 86 89 Inter https://cdn.sofifa.net/teams/44/30.png ... High/ High Normal (170-) Yes <span class="pos pos13">RCM Sep 1, 2020 2026 172cm 68kg €154.4M 23.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17655 269526 Deng Xiongtao 19 https://cdn.sofifa.net/players/269/526/23_60.png China PR https://cdn.sofifa.net/flags/cn.png 48 61 Meizhou Hakka https://cdn.sofifa.net/teams/114628/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Apr 11, 2022 2027 190cm 78kg €218K 35.0
17656 267946 22 Lim Jun Sub 17 https://cdn.sofifa.net/players/267/946/22_60.png Korea Republic https://cdn.sofifa.net/flags/kr.png 48 64 Jeju United FC https://cdn.sofifa.net/teams/1478/30.png ... Medium/ Medium Lean (185+) No <span class="pos pos29">RES Jan 1, 2022 2026 195cm 84kg €188K 21.0
17657 270567 A. Demir 25 https://cdn.sofifa.net/players/270/567/23_60.png Turkey https://cdn.sofifa.net/flags/tr.png 51 56 Ümraniyespor https://cdn.sofifa.net/teams/113796/30.png ... Medium/ Medium Lean (185+) No <span class="pos pos29">RES Jun 6, 2021 2023 190cm 82kg €142K 12.0
17658 256624 21 S. Czajor 18 https://cdn.sofifa.net/players/256/624/21_60.png Poland https://cdn.sofifa.net/flags/pl.png 50 65 Fleetwood Town https://cdn.sofifa.net/teams/112260/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Jan 1, 2020 2021 187cm 79kg €214K 40.0
17659 256376 21 F. Jakobsson 20 https://cdn.sofifa.net/players/256/376/21_60.png Sweden https://cdn.sofifa.net/flags/se.png 50 61 IFK Norrköping https://cdn.sofifa.net/teams/702/30.png ... Medium/ Medium Normal (185+) No <span class="pos pos29">RES Jan 8, 2020 2021 186cm 78kg €131K 30.0

16364 rows × 27 columns

pandas 공부 5단계: 할당

- 자료

np.random.seed(43052)
att = np.random.choice(np.arange(10,21)*5,20)
rep = np.random.choice(np.arange(5,21)*5,20)
mid = np.random.choice(np.arange(0,21)*5,20)
fin = np.random.choice(np.arange(0,21)*5,20)
student_id = [ '2023-12362', '2022-12471', '2023-12333', '2022-12400', '2022-12377',
               '2022-12469', '2023-12314', '2022-12363', '2023-12445', '2023-12336',
               '2023-12426', '2022-12380', '2023-12422', '2022-12488', '2022-12370',
               '2023-12443', '2022-12463', '2023-12491', '2023-12340', '2022-12312' ]
df = pd.DataFrame({'student_id':student_id,'att':att,'rep':rep,'mid':mid,'fin':fin})
df
student_id att rep mid fin
0 2023-12362 65 55 50 40
1 2022-12471 95 100 50 80
2 2023-12333 65 90 60 30
3 2022-12400 55 80 75 80
4 2022-12377 80 30 30 100
5 2022-12469 75 40 100 15
6 2023-12314 65 45 45 90
7 2022-12363 60 60 25 0
8 2023-12445 95 65 20 10
9 2023-12336 90 80 80 20
10 2023-12426 55 75 35 25
11 2022-12380 95 95 45 0
12 2023-12422 95 55 15 35
13 2022-12488 50 80 40 30
14 2022-12370 50 55 15 85
15 2023-12443 95 30 30 95
16 2022-12463 50 50 45 10
17 2023-12491 65 55 15 45
18 2023-12340 70 70 40 35
19 2022-12312 90 90 80 90

Assign

- 방법1: assign을 이용한 추가

df.assign(total = df.att*0.1 + df.rep*0.2 + df.mid*0.35 + df.fin*0.35) 
student_id att rep mid fin total
0 2023-12362 65 55 50 40 49.00
1 2022-12471 95 100 50 80 75.00
2 2023-12333 65 90 60 30 56.00
3 2022-12400 55 80 75 80 75.75
4 2022-12377 80 30 30 100 59.50
5 2022-12469 75 40 100 15 55.75
6 2023-12314 65 45 45 90 62.75
7 2022-12363 60 60 25 0 26.75
8 2023-12445 95 65 20 10 33.00
9 2023-12336 90 80 80 20 60.00
10 2023-12426 55 75 35 25 41.50
11 2022-12380 95 95 45 0 44.25
12 2023-12422 95 55 15 35 38.00
13 2022-12488 50 80 40 30 45.50
14 2022-12370 50 55 15 85 51.00
15 2023-12443 95 30 30 95 59.25
16 2022-12463 50 50 45 10 34.25
17 2023-12491 65 55 15 45 38.50
18 2023-12340 70 70 40 35 47.25
19 2022-12312 90 90 80 90 86.50

이 방법은 df를 일시적으로 변화시킴

Eval

- 방법2: eval을 이용한 추가

df.eval('total = att*0.1 + rep*0.2 + mid*0.3 + fin*0.4') 
student_id att rep mid fin total
0 2023-12362 65 55 50 40 48.5
1 2022-12471 95 100 50 80 76.5
2 2023-12333 65 90 60 30 54.5
3 2022-12400 55 80 75 80 76.0
4 2022-12377 80 30 30 100 63.0
5 2022-12469 75 40 100 15 51.5
6 2023-12314 65 45 45 90 65.0
7 2022-12363 60 60 25 0 25.5
8 2023-12445 95 65 20 10 32.5
9 2023-12336 90 80 80 20 57.0
10 2023-12426 55 75 35 25 41.0
11 2022-12380 95 95 45 0 42.0
12 2023-12422 95 55 15 35 39.0
13 2022-12488 50 80 40 30 45.0
14 2022-12370 50 55 15 85 54.5
15 2023-12443 95 30 30 95 62.5
16 2022-12463 50 50 45 10 32.5
17 2023-12491 65 55 15 45 40.0
18 2023-12340 70 70 40 35 47.0
19 2022-12312 90 90 80 90 87.0

이 방법은 df를 일시적으로 변화시킴

df[‘colname’] = xxx

- 방법3: df['total'] 을 이용한 할당

df['total'] = df.att*0.1 + df.rep*0.2 + df.mid*0.3 + df.fin*0.4
df
student_id att rep mid fin total
0 2023-12362 65 55 50 40 48.5
1 2022-12471 95 100 50 80 76.5
2 2023-12333 65 90 60 30 54.5
3 2022-12400 55 80 75 80 76.0
4 2022-12377 80 30 30 100 63.0
5 2022-12469 75 40 100 15 51.5
6 2023-12314 65 45 45 90 65.0
7 2022-12363 60 60 25 0 25.5
8 2023-12445 95 65 20 10 32.5
9 2023-12336 90 80 80 20 57.0
10 2023-12426 55 75 35 25 41.0
11 2022-12380 95 95 45 0 42.0
12 2023-12422 95 55 15 35 39.0
13 2022-12488 50 80 40 30 45.0
14 2022-12370 50 55 15 85 54.5
15 2023-12443 95 30 30 95 62.5
16 2022-12463 50 50 45 10 32.5
17 2023-12491 65 55 15 45 40.0
18 2023-12340 70 70 40 35 47.0
19 2022-12312 90 90 80 90 87.0

이 방법은 df를 영구적으로 변화시킴

- 문제: 아래의 자료에서 입학년도를 추가하고 싶다면?

np.random.seed(43052)
att = np.random.choice(np.arange(10,21)*5,20)
rep = np.random.choice(np.arange(5,21)*5,20)
mid = np.random.choice(np.arange(0,21)*5,20)
fin = np.random.choice(np.arange(0,21)*5,20)
student_id = [ '2023-12362', '2022-12471', '2023-12333', '2022-12400', '2022-12377',
               '2022-12469', '2023-12314', '2022-12363', '2023-12445', '2023-12336',
               '2023-12426', '2022-12380', '2023-12422', '2022-12488', '2022-12370',
               '2023-12443', '2022-12463', '2023-12491', '2023-12340', '2022-12312' ]
df = pd.DataFrame({'student_id':student_id,'att':att,'rep':rep,'mid':mid,'fin':fin})
df.head()
student_id att rep mid fin
0 2023-12362 65 55 50 40
1 2022-12471 95 100 50 80
2 2023-12333 65 90 60 30
3 2022-12400 55 80 75 80
4 2022-12377 80 30 30 100
df.assign(year=[l.split('-')[0] for l in df.student_id]).query('year=="2022"')
student_id att rep mid fin year
1 2022-12471 95 100 50 80 2022
3 2022-12400 55 80 75 80 2022
4 2022-12377 80 30 30 100 2022
5 2022-12469 75 40 100 15 2022
7 2022-12363 60 60 25 0 2022
11 2022-12380 95 95 45 0 2022
13 2022-12488 50 80 40 30 2022
14 2022-12370 50 55 15 85 2022
16 2022-12463 50 50 45 10 2022
19 2022-12312 90 90 80 90 2022

Quiz

아래는 전북대 통계학과 학생들이 R과 Python을 공부한 평균 시간이다.

np.random.seed(20230426)
day = ['2023-04-24(Mon)','2023-04-25(Tue)','2023-04-26(Wed)','2023-04-27(Thu)','2023-04-28(Fri)',
       '2023-05-01(Mon)','2023-05-02(Tue)','2023-05-03(Wed)','2023-05-04(Thu)','2023-05-05(Fri)',
       '2023-05-08(Mon)','2023-05-09(Tue)','2023-05-10(Wed)','2023-05-11(Thu)','2023-05-12(Fri)']
hours1 = np.random.randn(15).cumsum()*2
hours1 = hours1 - hours1.min() +1
hours2 = np.random.randn(15).cumsum()*2
hours2 = hours2 - hours2.min() +1

df = pd.DataFrame({'hours(R)':hours1, 'hours(Python)':hours2},index=day)
df
hours(R) hours(Python)
2023-04-24(Mon) 11.064829 9.254671
2023-04-25(Tue) 9.790750 7.327548
2023-04-26(Wed) 5.993362 9.185495
2023-04-27(Thu) 7.542498 12.525569
2023-04-28(Fri) 8.598600 10.906909
2023-05-01(Mon) 6.933549 9.865538
2023-05-02(Tue) 6.456987 11.081043
2023-05-03(Wed) 4.976548 10.240239
2023-05-04(Thu) 6.021139 5.822405
2023-05-05(Fri) 1.851839 5.522484
2023-05-08(Mon) 1.000000 4.319094
2023-05-09(Tue) 1.350073 1.000000
2023-05-10(Wed) 3.138700 2.633662
2023-05-11(Thu) 3.153756 4.870860
2023-05-12(Fri) 1.353976 1.785441

(1) 데이터프레임을 변형하여 아래와 같이 만들어라.

#
hours(R) hours(Python) day weekday
0 11.064829 9.254671 2023-04-24 Mon
1 9.790750 7.327548 2023-04-25 Tue
2 5.993362 9.185495 2023-04-26 Wed
3 7.542498 12.525569 2023-04-27 Thu
4 8.598600 10.906909 2023-04-28 Fri
5 6.933549 9.865538 2023-05-01 Mon
6 6.456987 11.081043 2023-05-02 Tue
7 4.976548 10.240239 2023-05-03 Wed
8 6.021139 5.822405 2023-05-04 Thu
9 1.851839 5.522484 2023-05-05 Fri
10 1.000000 4.319094 2023-05-08 Mon
11 1.350073 1.000000 2023-05-09 Tue
12 3.138700 2.633662 2023-05-10 Wed
13 3.153756 4.870860 2023-05-11 Thu
14 1.353976 1.785441 2023-05-12 Fri

(2) 4월달에 전북대학교 학생들은 R과 Python중 어떤 과목을 더 많이 공부하였는가?

(3) ‘월-금’ 사이의 요일중 R을 가장 열심히 공부한 요일은 어느 요일인가?

(4) ’월-금’사이의 요일중 Python과 R의 합계학습량이 가장 큰 요일은 어느 요일인가?

(5) R과 Python의 학습량 차이를 계산하고 (R-Python을 계산) maplotlib 을 이용하여 시각화하라.

(풀이)

# 시각화예시
[<matplotlib.lines.Line2D at 0x7fd1d985ad00>]