04wk-2: Python Data Types (7)

Author

최규빈

Published

March 29, 2023

Lecture video

youtube: https://youtube.com/playlist?list=PLQqh36zP38-yyWy5IzZwxEX_dHX53cBJh

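Dictionaries: advanced topics (2)

Conditions on keys

- Condition 1: only certain types can be used as keys.

int O, float O, bool O, str O, list X, tuple O, dict X, set X

(O: allowed, X: not allowed. A tuple qualifies only if every element in it is also allowed as a key.)

(Example 1) Using int as a dict key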

dct = {0:[1,2,3], 1:[2,3,4]} 
dct[0] # looks like indexing, doesn't it?

[1, 2, 3]

dct[-1] # fooled you: -1 is looked up as a key, not as a position from the end

KeyError: -1

(Example 2) Using float as a dict key <-- I have never seen anyone actually write this

dct = {3.14:'π', 2.718:'e'}
dct[3.14]

'π'

(Example 3) Using bool as a dict key (the two values mean 'it is true' and 'it is false')

dct = {True: '참이다', False: '거짓이다.'} 
dct

{True: '참이다', False: '거짓이다.'}

dct[1<2]

'참이다'

(Example 4) Using str as a dict key ( $⋆$ )

dct = {'guebin':[10,20,30,30], 'hanni':[10,20,25,40]}
dct['guebin']

[10, 20, 30, 30]

(Example 5) Using a list as a dict key $\Rightarrow$ not possible

dct = {[10,20,30,40]: 'guebin', [10,20,25,40]: 'hanni'} 
dct

TypeError: unhashable type: 'list'

(Example 6) Using a tuple as a dict key ( $⋆$ )

dct = {(10,20,30,40): 'guebin', (10,20,25,40): 'hanni'} 
dct

{(10, 20, 30, 40): 'guebin', (10, 20, 25, 40): 'hanni'}

dct[(10,20,30,40)]

'guebin'

dct[10,20,30,40]

'guebin'

(Example 7) Using a dict as a dict key $\Rightarrow$ not possible

dct = {{0:1}: 'guebin', {1:2}: 'hanni'} 
dct

TypeError: unhashable type: 'dict'

(Example 8) Using a set as a dict key $\Rightarrow$ not possible

dct = {{'샌드위치','딸기우유'}:'점심', {'불고기','된장찌개','김','콩자반'}: '저녁'}
dct

TypeError: unhashable type: 'set'
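What the eight examples have in common is hashability: a key must be an object that the built-in hash() accepts, which for built-in types means the immutable ones. The sketch below checks each type from the list above. The helper name is_hashable and the meal strings are made up for illustration; they are not from the lecture.

```python
def is_hashable(obj):
    """Return True if obj can serve as a dict key (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

candidates = [0, 3.14, True, 'guebin', [1, 2], (1, 2), {0: 1}, {1, 2}]
result = {type(c).__name__: is_hashable(c) for c in candidates}
print(result)

# A tuple that contains a list is not hashable, so it cannot be a key either.
tuple_with_list = is_hashable((1, [2, 3]))
print(tuple_with_list)   # False

# frozenset is the immutable counterpart of set and can be used as a key.
meals = {frozenset({'sandwich', 'milk'}): 'lunch'}
print(meals[frozenset({'milk', 'sandwich'})])   # lunch
```

If a set-like key is really needed, as in Example 8, frozenset is one option; converting to a sorted tuple is another.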

- Condition 2: keys cannot be duplicated.

(Example 1)

dct = {0:[1,2,3], 1:[2,3,4], 0:[3,4,5]} # don't write this: the later value for key 0 silently overwrites the earlier one
dct

{0: [3, 4, 5], 1: [2, 3, 4]}
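"Duplicate" here means "compares equal", not "looks identical". As a small illustration (not from the lecture), 1, True and 1.0 are all equal and hash to the same value, so they count as one and the same key:

```python
dct = {1: 'int', True: 'bool', 1.0: 'float'}

# Only one entry survives: the first key written (1) with the last value written.
print(dct)        # {1: 'float'}
print(len(dct))   # 1
print(dct[True])  # float
```

This is one more reason to avoid mixing bool, int and float keys in the same dictionary.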

Conditions on values

- None: any object can be a value $\Rightarrow$ a dict is a container type!!
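To make "no conditions on values" concrete, here is a small sketch in which the values are a float, a list, another dict, a set, and even a function. The key names are arbitrary and chosen only for illustration.

```python
dct = {
    'number': 3.14,
    'list': [1, 2, 3],
    'dict': {'nested': True},
    'set': {1, 2},
    'function': len,      # functions are objects too
}

# Values come back exactly as the objects that were stored.
print(dct['function'](dct['list']))   # 3
print(dct['dict']['nested'])          # True
```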

Dictionary comprehension

- Example 1

lst = [['딸기','사과'],['오토바이','자동차'],['컴퓨터','아이패드','마우스']]
lst

[['딸기', '사과'], ['오토바이', '자동차'], ['컴퓨터', '아이패드', '마우스']]

{i:lst[i] for i in range(3)}

{0: ['딸기', '사과'], 1: ['오토바이', '자동차'], 2: ['컴퓨터', '아이패드', '마우스']}

- Example 2: swapping keys and values

dct = {'a':(1,0,0,0), 'b':(0,1,0,0), 'c':(0,0,1,0), 'd':(0,0,0,1)}
dct

{'a': (1, 0, 0, 0), 'b': (0, 1, 0, 0), 'c': (0, 0, 1, 0), 'd': (0, 0, 0, 1)}

{v:k for k,v in dct.items() }

{(1, 0, 0, 0): 'a', (0, 1, 0, 0): 'b', (0, 0, 1, 0): 'c', (0, 0, 0, 1): 'd'}
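The swap works cleanly here because every value is unique and hashable. When two keys share a value, the inverted dictionary keeps only one of them. A minimal sketch with made-up data shows the loss, along with one possible workaround that groups the original keys into lists:

```python
dct = {'a': 1, 'b': 2, 'c': 1}   # 'a' and 'c' share the value 1

# Plain inversion: the entry for 'a' is overwritten by 'c'.
inv = {v: k for k, v in dct.items()}
print(inv)       # {1: 'c', 2: 'b'}

# Grouping inversion: collect every key that maps to the same value.
grouped = {}
for k, v in dct.items():
    grouped.setdefault(v, []).append(k)
print(grouped)   # {1: ['a', 'c'], 2: ['b']}
```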

Substitution (3)

- Exercise 1: Suppose we have the following list.

lst = list('abcd'*2)
lst

['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']

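Suppose we want to replace each element of lst according to the rule below.

Before	After
'a'	[1,0,0,0]
'b'	[0,1,0,0]
'c'	[0,0,1,0]
'd'	[0,0,0,1]

Write code that performs this transformation, and code that performs the inverse transformation.

hint: use the dct below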

dct = {'a':[1,0,0,0], 'b':[0,1,0,0], 'c':[0,0,1,0], 'd':[0,0,0,1]}
dct

{'a': [1, 0, 0, 0], 'b': [0, 1, 0, 0], 'c': [0, 0, 1, 0], 'd': [0, 0, 0, 1]}

(Solution)

The forward transformation can be implemented as follows.

lst2= [dct[l] for l in lst] 
lst2

[[1, 0, 0, 0],
 [0, 1, 0, 0],
 [0, 0, 1, 0],
 [0, 0, 0, 1],
 [1, 0, 0, 0],
 [0, 1, 0, 0],
 [0, 0, 1, 0],
 [0, 0, 0, 1]]

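The inverse transformation can be implemented as follows.

(Step 1) Build the inverse dictionary. The list values are converted to tuples because a list cannot be a key.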

dct_inv = {tuple(v):k for k,v in dct.items()}
dct_inv

{(1, 0, 0, 0): 'a', (0, 1, 0, 0): 'b', (0, 0, 1, 0): 'c', (0, 0, 0, 1): 'd'}

(Step 2) Look up each element of lst2, again converted to a tuple.

[dct_inv[tuple(l)] for l in lst2]

['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']

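My thoughts

In some cases, code like the above lets us avoid more complicated code such as the following, which scans every item of dct for every element of lst2.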

[x for l in lst2 for x,y in dct.items() if l==y]

['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']

- Exercise 2: Suppose we have the following list. (The lecture for this part was re-recorded.)

lst = ['딸기', '사과', '바나나', '바나나', '오토바이', '자동차', '기차']
lst

['딸기', '사과', '바나나', '바나나', '오토바이', '자동차', '기차']

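Suppose we want to replace the elements according to the rule below.

Before	After
딸기 (strawberry)	과일 (fruit)
사과 (apple)	과일 (fruit)
바나나 (banana)	과일 (fruit)
오토바이 (motorcycle)	탈것 (vehicle)
자동차 (car)	탈것 (vehicle)
버스 (bus)	탈것 (vehicle)
기차 (train)	탈것 (vehicle)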

(Solution 1)

dct = {'딸기':'과일', '사과':'과일', '바나나':'과일', 
       '오토바이':'탈것', '자동차':'탈것', '버스':'탈것', '기차':'탈것'}
dct

{'딸기': '과일',
 '사과': '과일',
 '바나나': '과일',
 '오토바이': '탈것',
 '자동차': '탈것',
 '버스': '탈것',
 '기차': '탈것'}

[dct[l] for l in lst]

['과일', '과일', '과일', '과일', '탈것', '탈것', '탈것']

(Solution 2): what we did last time

dct = {'과일':['딸기','사과','바나나'], '탈것':['오토바이','자동차', '버스', '기차']} 
dct

{'과일': ['딸기', '사과', '바나나'], '탈것': ['오토바이', '자동차', '버스', '기차']}

[k for l in lst for k,v in dct.items() if l in v]

['과일', '과일', '과일', '과일', '탈것', '탈것', '탈것']

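(Solution 3) Turn the dct from Solution 2 into an item-to-category dictionary, then look up each element.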

_dct = {l:k for k,v in dct.items() for l in v}
_dct

{'딸기': '과일',
 '사과': '과일',
 '바나나': '과일',
 '오토바이': '탈것',
 '자동차': '탈것',
 '버스': '탈것',
 '기차': '탈것'}

[_dct[l] for l in lst]

['과일', '과일', '과일', '과일', '탈것', '탈것', '탈것']
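All three solutions assume that every element of lst appears in the rule table. With direct indexing, an item missing from the dictionary raises KeyError. A hedged sketch of one way to handle that uses dict.get with a default. The item '비행기' (airplane) and the label 'unknown' are assumptions added for illustration.

```python
_dct = {'딸기': '과일', '사과': '과일', '기차': '탈것'}
lst = ['딸기', '기차', '비행기']   # '비행기' has no rule in _dct

# _dct[l] would raise KeyError for '비행기'; .get returns the default instead.
converted = [_dct.get(l, 'unknown') for l in lst]
print(converted)   # ['과일', '탈것', 'unknown']
```

Whether a silent default or a loud KeyError is preferable depends on the task; for data cleaning, the error is often the safer choice.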

Sets: basics

Declaration

wishlist={'notebook','desktop'}
wishlist

{'desktop', 'notebook'}

Extracting elements

- First of all, you cannot do it by index.

wishlist={'notebook','desktop'}
wishlist[0]

TypeError: 'set' object is not subscriptable

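- There is no direct way to pull out a specific element, and it is rarely meaningful to do so, because a set has no order (what would you do with a single element anyway?)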
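For the rare cases where elements do have to be taken out of a set, the usual routes go through iteration or conversion rather than indexing. A minimal sketch, not part of the lecture:

```python
wishlist = {'notebook', 'desktop'}

# Iterate: visits every element, in no guaranteed order.
for item in wishlist:
    print(item)

# Convert to a sorted list when positions are needed.
items = sorted(wishlist)
print(items[0])          # desktop

# Take one arbitrary element without removing it.
one = next(iter(wishlist))
print(one in wishlist)   # True
```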

Adding elements

- This one is meaningful.

wishlist={'notebook','desktop'} 
wishlist

{'desktop', 'notebook'}

wishlist.add('ipad')
wishlist

{'desktop', 'ipad', 'notebook'}

wishlist.add('notebook') # an element that is already present is not added again
wishlist

{'desktop', 'ipad', 'notebook'}

Removing elements

wishlist={'desktop', 'ipad', 'notebook'}
wishlist

{'desktop', 'ipad', 'notebook'}

wishlist.remove('notebook')

wishlist

{'desktop', 'ipad'}
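One detail worth knowing: remove raises KeyError when the element is not in the set, whereas discard does nothing in that case. A short sketch of the difference:

```python
wishlist = {'desktop', 'ipad'}

try:
    wishlist.remove('notebook')    # 'notebook' is not in the set
except KeyError as e:
    print('KeyError:', e)          # KeyError: 'notebook'

wishlist.discard('notebook')       # no error, the set is unchanged
print(wishlist == {'desktop', 'ipad'})   # True
```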

Operations

- The in operator

wishlist={'desktop', 'ipad', 'notebook'}
wishlist

{'desktop', 'ipad', 'notebook'}

'notebook' in wishlist

True

Note that the in operator is not specific to sets; it also works with other containers.

- Union, intersection, difference

day1 = {'notebook','desktop'}
day2 = {'notebook','ipad'}

day1 | day2 # union

{'desktop', 'ipad', 'notebook'}

day1 & day2 # intersection

{'notebook'}

day1 - day2 # difference: in day1 but not in day2

{'desktop'}

day2 - day1 # difference: in day2 but not in day1

{'ipad'}

- Subset

day1 = {'notebook', 'desktop'}
day2 = day1 | {'ipad'}

day1 < day2  # is day1 a proper subset of day2? (<= would also accept day1 == day2)

True

day2 < day1

False
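The operator < tests for a proper subset, so a set is never < itself. The non-strict version is <=, which is equivalent to the issubset method. A short sketch:

```python
day1 = {'notebook', 'desktop'}
day2 = day1 | {'ipad'}

proper_self = day1 < day1          # False: a set is not a proper subset of itself
subset_self = day1 <= day1         # True
subset_other = day1 <= day2        # True
via_method = day1.issubset(day2)   # True, same as day1 <= day2
superset = day2 >= day1            # True, superset test

print(proper_self, subset_self, subset_other, via_method, superset)
# False True True True True
```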

Sets: special methods

- Union

day1 = {'notebook', 'desktop'}
day2 = {'notebook','ipad'}

day1.union(day2)

{'desktop', 'ipad', 'notebook'}

- Look up the remaining methods on your own.
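As a starting point for that lookup, the sketch below pairs each operator with its method counterpart. One practical difference: the methods accept any iterable, while the operators require a set on both sides. The extra item 'phone' is made up for illustration.

```python
day1 = {'notebook', 'desktop'}
day2 = {'notebook', 'ipad'}

same_inter = day1.intersection(day2) == (day1 & day2)
same_diff = day1.difference(day2) == (day1 - day2)
same_sym = day1.symmetric_difference(day2) == (day1 ^ day2)
print(same_inter, same_diff, same_sym)   # True True True

# Methods take any iterable, e.g. a list ...
merged = day1.union(['ipad', 'phone'])
print(merged == {'notebook', 'desktop', 'ipad', 'phone'})   # True

# ... whereas the operator form with a list raises TypeError.
try:
    day1 | ['ipad', 'phone']
except TypeError:
    print('| needs a set on both sides')
```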

for loops and sets

day1 = {'notebook', 'desktop'}
day2 = {'notebook', 'ipad'}

for i in day1|day2: 
    print(i)

notebook
ipad
desktop
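The order in which a set is traversed is not guaranteed and can differ between runs, so the three lines above may come out in a different order on another machine. When a reproducible order matters, one common approach is to loop over sorted(...) instead:

```python
day1 = {'notebook', 'desktop'}
day2 = {'notebook', 'ipad'}

ordered = sorted(day1 | day2)   # a list, in alphabetical order
for i in ordered:
    print(i)
# desktop
# ipad
# notebook
```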

Sets: advanced topics

set comprehension

- Example 1

lst = [1,2,1,1,3,4,5]
{l for l in lst}

{1, 2, 3, 4, 5}

Unique elements

- Exercise 1: How many distinct characters does the list below contain?

lst=list('asdfasssdfdsasdfasdfasdfasdf')

(Solution)

set(lst)

{'a', 'd', 'f', 's'}

len(set(lst))

4
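set(lst) answers "which characters" but throws away the order of first appearance. If that order matters, one option (an alternative technique, not the lecture's solution) is dict.fromkeys, which relies on dicts preserving insertion order in Python 3.7+:

```python
lst = list('asdfasssdfdsasdfasdfasdfasdf')

# Keys of a dict are unique and keep their insertion order.
unique_in_order = list(dict.fromkeys(lst))
print(unique_in_order)        # ['a', 's', 'd', 'f']
print(len(unique_in_order))   # 4
```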

- Exercise 2: Write code that finds which characters appear in the txt below and how many times each one is used.

txt = 'asdkflkjahsdlkjfhlaksglkjdhflkgjhlskdfjhglkajhsdlkfjhalsdkf'
txt

'asdkflkjahsdlkjfhlaksglkjdhflkgjhlskdfjhglkajhsdlkfjhalsdkf'

(Solution)

{k:txt.count(k) for k in set(txt)}

{'s': 6, 'a': 5, 'g': 3, 'k': 10, 'j': 7, 'h': 7, 'd': 6, 'l': 9, 'f': 6}
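The comprehension above rescans txt once per distinct character. The standard library's collections.Counter produces the same frequencies in a single pass. A sketch of that alternative (not the lecture's solution):

```python
from collections import Counter

txt = 'asdkflkjahsdlkjfhlaksglkjdhflkgjhlskdfjhglkajhsdlkfjhalsdkf'

freq = Counter(txt)              # one pass over txt
print(freq['k'])                 # 10
print(freq.most_common(1))       # [('k', 10)]

# Same result as the set-based comprehension.
by_comprehension = {k: txt.count(k) for k in set(txt)}
print(dict(freq) == by_comprehension)   # True
```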

HW

Oxford-IIIT: 1–5 // reference

The code below loads a string that contains image file names.

import requests
url = 'https://raw.githubusercontent.com/guebin/PP2023/main/posts/01_PythonBasic/Oxford-IIIT.txt'
txt = requests.get(url).content.decode()

The image file names are stored in the following format.

Abyssinian_1.jpg
British_Shorthair_129.jpg

note: In some cases the breed name itself contains _, as in British_Shorthair.

1. Transform txt appropriately to build the following list.

lst[:10],lst[810:820]

(['Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian'],
 ['BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair'])

hint1

'Abyssinian_1.jpg\nAbyssinian_10.jpg'.split('\n')

['Abyssinian_1.jpg', 'Abyssinian_10.jpg']

hint2

''.join(['British', 'Shorthair'])

'BritishShorthair'

''.join(['Abyssinian'])

'Abyssinian'

(Solution 1)

lst = [''.join(filename.split('_')[:-1]) for filename in txt.split('\n')]

(Solution 2)

def f(filename): 
    *name, _ = filename.split('_')
    return ''.join(name)
lst = [f(filename) for filename in txt.split('\n')]

(Check)

lst[:10],lst[810:820]

(['Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian'],
 ['BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair',
  'BritishShorthair'])
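Both solutions split on every underscore and then glue the breed parts back together. A possible variation, shown here on the two sample file names from the problem statement rather than on the downloaded txt, cuts only at the last underscore with rsplit and then removes any remaining underscores. The function name breed is made up for illustration.

```python
filenames = ['Abyssinian_1.jpg', 'British_Shorthair_129.jpg']

def breed(filename):
    # rsplit with maxsplit=1 cuts only at the LAST underscore,
    # separating the breed part from the '<number>.jpg' part.
    name, _ = filename.rsplit('_', 1)
    return name.replace('_', '')

breeds = [breed(f) for f in filenames]
print(breeds)   # ['Abyssinian', 'BritishShorthair']
```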

2. How many cat breeds and how many dog breeds do the image files contain?

note: Cat images start with an uppercase letter, and dog images start with a lowercase letter.

note: There are 12 cat breeds and 25 dog breeds.

(Solution)

[s[0].isupper() for s in set(lst)].count(True) # cats: 12

[s[0].isupper() for s in set(lst)].count(False) # dogs: 25

3. Below are the first 10 and the last 10 elements of the lst obtained in problem 1.

lst[:10], lst[-10:]

(['Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian',
  'Abyssinian'],
 ['yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier',
  'yorkshireterrier'])

Define a suitable transformation that turns lst into the following.

lst2[:10], lst2[-10:] # the transformed lst

(['cat', 'cat', 'cat', 'cat', 'cat', 'cat', 'cat', 'cat', 'cat', 'cat'],
 ['dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog'])

(Solution 1)

dct = {'cat':[s for s in set(lst) if s[0].isupper()], 'dog': [s for s in set(lst) if not s[0].isupper()]}
lst2 = [k for l in lst for k,v in dct.items() if l in v]

(Solution 2)

def f(fname): 
    return 'cat' if fname[0].isupper() else 'dog'
lst2= [f(fname) for fname in lst]

4. How many dog images and how many cat images does txt contain?

## Example output

{'dog': 4990, 'cat': 2403}

(Solution)

{k:lst2.count(k) for k in ['dog','cat']}

{'dog': 4990, 'cat': 2403}

5. Count how many images txt contains for each breed.

## Example output

{'beagle': 200,
 'scottishterrier': 199,
 'newfoundland': 200,
 'Birman': 200,
 'Bombay': 200,
 'pug': 200,
 'germanshorthaired': 200,
 'samoyed': 200,
 'Sphynx': 200,
 'englishsetter': 200,
 'Bengal': 200,
 'MaineCoon': 200,
 'Persian': 200,
 'boxer': 200,
 'staffordshirebullterrier': 191,
 'Siamese': 200,
 'bassethound': 200,
 'wheatenterrier': 200,
 'englishcockerspaniel': 200,
 'Ragdoll': 200,
 'yorkshireterrier': 200,
 'EgyptianMau': 200,
 'BritishShorthair': 200,
 'keeshond': 200,
 'RussianBlue': 200,
 'saintbernard': 200,
 'americanbulldog': 200,
 'Abyssinian': 203,
 'leonberger': 200,
 'greatpyrenees': 200,
 'japanesechin': 200,
 'pomeranian': 200,
 'chihuahua': 200,
 'shibainu': 200,
 'americanpitbullterrier': 200,
 'miniaturepinscher': 200,
 'havanese': 200}

(Solution)

{k:lst.count(k) for k in set(lst)}

{'scottishterrier': 199,
 'RussianBlue': 200,
 'newfoundland': 200,
 'samoyed': 200,
 'Birman': 200,
 'germanshorthaired': 200,
 'japanesechin': 200,
 'yorkshireterrier': 200,
 'miniaturepinscher': 200,
 'leonberger': 200,
 'pomeranian': 200,
 'pug': 200,
 'Siamese': 200,
 'beagle': 200,
 'shibainu': 200,
 'Sphynx': 200,
 'staffordshirebullterrier': 191,
 'MaineCoon': 200,
 'americanbulldog': 200,
 'chihuahua': 200,
 'keeshond': 200,
 'Bengal': 200,
 'saintbernard': 200,
 'greatpyrenees': 200,
 'boxer': 200,
 'englishcockerspaniel': 200,
 'Ragdoll': 200,
 'EgyptianMau': 200,
 'BritishShorthair': 200,
 'Persian': 200,
 'Abyssinian': 203,
 'englishsetter': 200,
 'bassethound': 200,
 'Bombay': 200,
 'americanpitbullterrier': 200,
 'havanese': 200,
 'wheatenterrier': 200}