Lecture videos

- (1/4) Introducing the recommender-system data

- (2/4) Building the dls and training

- (3/4) bias

- (4/4) Interpretation

import

import torch 
from fastai.collab import * 
from fastai.tabular.all import * 

data

path = untar_data(URLs.ML_100k) 

- The first dataframe

ratings=pd.read_csv(path/'u.data', delimiter='\t', header=None, names=['user','movie','rating','timestamp'])
ratings
user movie rating timestamp
0 196 242 3 881250949
1 186 302 3 891717742
2 22 377 1 878887116
3 244 51 2 880606923
4 166 346 1 886397596
... ... ... ... ...
99995 880 476 3 880175444
99996 716 204 5 879795543
99997 276 1090 1 874795795
99998 13 225 2 882399156
99999 12 203 3 879959583

100000 rows × 4 columns

  • The last column (timestamp) is not meaningful here.

- The second dataframe

movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', usecols=(0,1), names=('movie','title'), header=None)
movies
movie title
0 1 Toy Story (1995)
1 2 GoldenEye (1995)
2 3 Four Rooms (1995)
3 4 Get Shorty (1995)
4 5 Copycat (1995)
... ... ...
1677 1678 Mat' i syn (1997)
1678 1679 B. Monkey (1998)
1679 1680 Sliding Doors (1998)
1680 1681 You So Crazy (1994)
1681 1682 Scream of Stone (Schrei aus Stein) (1991)

1682 rows × 2 columns

- Merge the two dataframes.

df = ratings.merge(movies)
df
user movie rating timestamp title
0 196 242 3 881250949 Kolya (1996)
1 63 242 3 875747190 Kolya (1996)
2 226 242 5 883888671 Kolya (1996)
3 154 242 3 879138235 Kolya (1996)
4 306 242 5 876503793 Kolya (1996)
... ... ... ... ... ...
99995 840 1674 4 891211682 Mamma Roma (1962)
99996 655 1640 3 888474646 Eighth Day, The (1996)
99997 655 1637 3 888984255 Girls Town (1996)
99998 655 1630 3 887428735 Silence of the Palace, The (Saimt el Qusur) (1994)
99999 655 1641 3 887427810 Dadetown (1995)

100000 rows × 5 columns
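
As a minimal sketch of what `ratings.merge(movies)` does (toy values, not the real data): with no arguments, pandas performs an inner join on the column name the two frames share, `movie`, and attaches the matching title to each rating row.

```python
import pandas as pd

# Toy sketch of ratings.merge(movies): merge joins on the shared column
# name ('movie') and keeps only matching rows (inner join by default).
ratings = pd.DataFrame({'user': [196, 63], 'movie': [242, 242], 'rating': [3, 3]})
movies = pd.DataFrame({'movie': [242], 'title': ['Kolya (1996)']})
df = ratings.merge(movies)  # adds the 'title' column to every rating row
```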

dls

dls = CollabDataLoaders.from_df(df,bs=64,item_name='title') 
dls.show_batch()
user title rating
0 853 Hoodlum (1997) 4
1 384 Jackal, The (1997) 4
2 721 Robert A. Heinlein's The Puppet Masters (1994) 3
3 840 Rear Window (1954) 5
4 429 Pink Floyd - The Wall (1982) 3
5 536 Age of Innocence, The (1993) 3
6 763 Amadeus (1984) 4
7 913 Rear Window (1954) 4
8 276 Eraser (1996) 3
9 645 Cook the Thief His Wife & Her Lover, The (1989) 4

learn

lrnr = collab_learner(dls, n_factors=10, y_range=(0,5)) 
lrnr.fit(13) 
epoch train_loss valid_loss time
0 1.142334 1.111295 00:04
1 0.919599 0.928836 00:03
2 0.865876 0.896597 00:03
3 0.853443 0.881308 00:03
4 0.860030 0.872683 00:03
5 0.849131 0.864368 00:03
6 0.827771 0.854535 00:03
7 0.797815 0.844754 00:03
8 0.810200 0.836310 00:03
9 0.756194 0.830930 00:03
10 0.766496 0.826815 00:03
11 0.734313 0.823833 00:03
12 0.726935 0.822590 00:03
  • The loss in the textbook is also around 0.82.
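
The losses reported here are (assuming fastai's default loss for `collab_learner`) mean squared error between predicted and true ratings, so a valid_loss of 0.82 corresponds to a typical error of about sqrt(0.82) ≈ 0.9 stars. A toy computation with made-up values:

```python
# Toy MSE computation (values are illustrative, not from the run above).
preds = [3.37, 3.02, 2.71]
truth = [3, 2, 3]
mse = sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(preds)
rmse = mse ** 0.5  # typical prediction error in rating units
```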

- Let's examine the results.

lrnr.show_results()
user title rating rating_pred
0 922 320 3 3.366130
1 75 245 2 3.019973
2 82 885 3 2.705256
3 25 42 4 4.255459
4 16 390 5 4.548031
5 488 33 4 3.670551
6 796 1477 3 3.682889
7 887 686 3 4.129278
8 297 442 1 3.848483
  • Honestly, it does not feel like the model gets every prediction right.

learn2

lrnr2 = collab_learner(dls, use_nn=True, y_range=(0,5), layers=[20,10]) 
lrnr2.fit(8)
epoch train_loss valid_loss time
0 0.943745 0.911532 00:05
1 0.886727 0.887183 00:05
2 0.851722 0.876992 00:04
3 0.866142 0.875833 00:04
4 0.804943 0.872449 00:04
5 0.810429 0.877015 00:04
6 0.753599 0.881696 00:04
7 0.722334 0.891261 00:04
lrnr2.show_results()
user title rating rating_pred
0 446 861 1 2.672008
1 311 1078 4 4.112122
2 234 909 4 3.065944
3 342 314 3 3.664986
4 823 95 3 3.768519
5 804 188 4 2.877244
6 234 497 4 3.379913
7 533 225 4 3.330431
8 848 927 5 4.673620
  • The predictions are reasonable at a modest level.
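
A hand-rolled sketch of what `use_nn=True` builds, under the assumption that fastai concatenates the user and item embeddings and passes them through an MLP whose hidden widths come from `layers=[20,10]`. Dimensions are shrunk for readability, only one hidden layer is shown, and all weights are made up:

```python
import math

def relu(x):
    return max(0.0, x)

# Tiny MLP over concatenated user/item embeddings (one hidden layer here,
# versus the two hidden layers that layers=[20,10] requests).
def tiny_mlp(user_emb, item_emb, w_hidden, w_out, y_range=(0, 5)):
    x = user_emb + item_emb  # list '+' is concatenation, not vector addition
    h = [relu(sum(xi * wi for xi, wi in zip(x, row))) for row in w_hidden]
    raw = sum(hi * wi for hi, wi in zip(h, w_out))
    lo, hi = y_range
    return lo + (hi - lo) / (1 + math.exp(-raw))  # squash into y_range

score = tiny_mlp([0.1, 0.2], [0.3, -0.1],
                 [[0.5, 0.5, 0.5, 0.5], [0.2, -0.2, 0.2, -0.2]],
                 [1.0, 1.0])
```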

bias

lrnr.model
EmbeddingDotBias(
  (u_weight): Embedding(944, 10)
  (i_weight): Embedding(1665, 10)
  (u_bias): Embedding(944, 1)
  (i_bias): Embedding(1665, 1)
)
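
Per its name (and my reading of fastai's source, hedged as an assumption), `EmbeddingDotBias` predicts a rating as the dot product of the user and item factor vectors plus both scalar biases, squashed into `y_range` with a scaled sigmoid. A minimal numerical sketch:

```python
import math

# Minimal sketch of an EmbeddingDotBias-style prediction: dot(user, item)
# plus both biases, passed through a sigmoid scaled to y_range=(0, 5).
def predict(u_vec, i_vec, u_bias, i_bias, y_range=(0, 5)):
    raw = sum(a * b for a, b in zip(u_vec, i_vec)) + u_bias + i_bias
    lo, hi = y_range
    return lo + (hi - lo) / (1 + math.exp(-raw))

r = predict([0.2, -0.1], [0.5, 0.3], u_bias=0.1, i_bias=0.25)
```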
lrnr.model.i_bias.weight.detach().to('cpu').squeeze()
tensor([ 0.0009, -0.1644,  0.0729,  ...,  0.0088,  0.2506,  0.0984])
  • Meaning? This is the item bias: some movies tend to receive higher or lower ratings on average regardless of who rates them, and this number expresses that tendency.
lst1=lrnr.model.i_bias.weight.detach().to('cpu').squeeze().argsort()[:20].tolist()
lst2=lrnr.model.i_bias.weight.detach().to('cpu').squeeze().argsort(descending=True)[:20].tolist()
list(dls.classes['title'][lst1])
['Children of the Corn: The Gathering (1996)',
 'Body Parts (1991)',
 '3 Ninjas: High Noon At Mega Mountain (1998)',
 'Jury Duty (1995)',
 'Amityville II: The Possession (1982)',
 'Theodore Rex (1995)',
 'Lawnmower Man 2: Beyond Cyberspace (1996)',
 'Dunston Checks In (1996)',
 'Crow: City of Angels, The (1996)',
 'Barb Wire (1996)',
 'Robocop 3 (1993)',
 'Amityville 3-D (1983)',
 'Amityville: A New Generation (1993)',
 'Bloodsport 2 (1995)',
 'Island of Dr. Moreau, The (1996)',
 'Solo (1996)',
 'Bushwhacked (1995)',
 'Big Bully (1996)',
 'Gordy (1995)',
 'Amityville Curse, The (1990)']
  • Unpopular movies (lowest biases)
list(dls.classes['title'][lst2])
['Close Shave, A (1995)',
 'As Good As It Gets (1997)',
 'L.A. Confidential (1997)',
 "Schindler's List (1993)",
 'Silence of the Lambs, The (1991)',
 'Rear Window (1954)',
 'Titanic (1997)',
 'Apt Pupil (1998)',
 'Wrong Trousers, The (1993)',
 'Good Will Hunting (1997)',
 'Henry V (1989)',
 'North by Northwest (1959)',
 'Vertigo (1958)',
 'Sunset Blvd. (1950)',
 'Shawshank Redemption, The (1994)',
 'To Kill a Mockingbird (1962)',
 'Fugitive, The (1993)',
 'Full Monty, The (1997)',
 'Blade Runner (1982)',
 'Treasure of the Sierra Madre, The (1948)']
  • Popular movies (highest biases)
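
The two `argsort` calls above simply rank items by their learned bias. A pure-Python equivalent with made-up bias values (the titles appear in the lists above; the numbers are illustrative):

```python
# Ranking items by bias, mirroring argsort()[:20] (ascending) and
# argsort(descending=True)[:20]; bias values here are invented.
biases = {'Gordy (1995)': -0.35, 'Solo (1996)': -0.28,
          'Rear Window (1954)': 0.55, 'Titanic (1997)': 0.61}
ranked = sorted(biases, key=biases.get)          # ascending: lowest bias first
lowest, highest = ranked[:2], ranked[::-1][:2]   # unpopular vs popular
```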

- The model appears to have trained well.

Prediction

- Let's look at Titanic (index 1501) and Robocop 3 (index 1251).

x,y = dls.one_batch()
x[:5]
tensor([[ 782, 1315],
        [ 145, 1207],
        [ 823, 1508],
        [ 452, 1157],
        [ 794, 1524]])
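
Each batch row is a `(user_index, title_index)` pair. The next cells query one fixed title across users 1 through 30; the plain-Python shape of that query is:

```python
# Pair every user index 1..30 with one fixed title index
# (1501 is the encoded index used below for Titanic).
movie_idx = 1501
xx = [[u, movie_idx] for u in range(1, 31)]
```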

- What would users 1 through 30 think of Titanic (1501)? Both models predict they would enjoy it.

xx = torch.tensor([[i,1501] for i in range(1,31)])
lrnr.model(xx.to("cuda:0"))
tensor([4.4531, 4.4386, 3.3761, 4.8889, 3.8445, 3.5618, 4.3848, 4.6649, 4.6996,
        4.4931, 4.1034, 4.7730, 4.0311, 4.2925, 3.6165, 4.8764, 3.8198, 4.0804,
        4.0261, 3.7349, 4.0564, 4.6282, 3.8605, 4.6444, 4.5684, 3.8845, 4.0185,
        4.4556, 4.2731, 4.4253], device='cuda:0', grad_fn=<AddBackward0>)
lrnr2.model(xx.to("cuda:0")).reshape(-1)
tensor([4.3266, 4.6268, 3.6982, 4.7522, 4.2184, 3.9160, 4.6478, 4.6014, 4.5140,
        4.2847, 3.8976, 4.5324, 3.9723, 4.0141, 3.6544, 4.7591, 4.0607, 4.2835,
        3.7749, 3.3524, 4.0209, 4.6510, 3.8351, 4.6803, 4.2459, 3.8215, 3.9208,
        4.4372, 4.2947, 4.3690], device='cuda:0',
       grad_fn=<ReshapeAliasBackward0>)

- What would users 1 through 30 think of Robocop 3 (1251)? Both models predict they would not enjoy it.

xx = torch.tensor([[i,1251] for i in range(1,31)])
lrnr.model(xx.to("cuda:0"))
tensor([1.2492, 1.5321, 1.7723, 2.0212, 1.5710, 1.4888, 2.3363, 1.6964, 1.9715,
        2.4178, 2.2392, 2.1472, 1.6472, 2.1953, 1.7494, 1.3403, 1.7519, 2.2229,
        2.3043, 2.3619, 1.3508, 1.1981, 1.9056, 2.0056, 2.2205, 1.5279, 1.7869,
        1.7508, 2.0724, 2.4507], device='cuda:0', grad_fn=<AddBackward0>)
lrnr2.model(xx.to("cuda:0")).reshape(-1)
tensor([1.3524, 1.9509, 1.3900, 2.1394, 1.6396, 0.9705, 2.4360, 1.8458, 2.0537,
        2.2466, 1.8744, 2.5842, 1.7164, 2.3071, 1.2227, 1.8126, 1.0431, 1.8193,
        2.2985, 2.4256, 1.1279, 1.1289, 1.4594, 2.2158, 2.8599, 1.6962, 1.2229,
        1.8232, 1.7359, 1.6326], device='cuda:0',
       grad_fn=<ReshapeAliasBackward0>)
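
Averaging the two prediction tensors makes the contrast explicit; a quick check using the first five values copied from `lrnr`'s outputs above:

```python
# First five predictions for each movie, copied from the outputs above.
titanic = [4.4531, 4.4386, 3.3761, 4.8889, 3.8445]
robocop3 = [1.2492, 1.5321, 1.7723, 2.0212, 1.5710]

def mean(xs):
    return sum(xs) / len(xs)

gap = mean(titanic) - mean(robocop3)  # how much higher Titanic is rated
```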