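Lecture videos

- (1/6) AGI

- (2/6) Language model practice with fastai

- (3/6) hello: implementing a simple recurrent neural network (1)

- (4/6) hello: implementing a simple recurrent neural network (2)

- (5/6) hello: implementing a simple recurrent neural network (3)

- (6/6) hello: implementing a simple recurrent neural network (4)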

Strong AI (AGI)?

- https://zdnet.co.kr/view/?no=20160622145838

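Professor Kim explained, "Artificial intelligence can be divided into strong AI, which surpasses humans, and weak AI, which falls short of humans but is helpful to them. AI, which was revived in the late 1990s, has developed in the direction of creating new knowledge through advances in machine learning, knowledge extraction from data, fast computers, and diverse data." He added, "People turned to weak AI, which accepts that knowledge can be wrong, and weak AI is developing around probability and statistical theory." Strong AI refers to AI that exceeds human capabilities, like the robots in the films Terminator and I, Robot: AI that effortlessly does what humans cannot, with greater physical strength and intelligence than humans. Weak AI, by contrast, is like the robots in Bicentennial Man or A.I.: it cannot surpass uniquely human traits such as emotion and sometimes makes errors, but it helps people with their work through its outstanding computational ability. In the early days of AI, strong AI was the dominant trend. Beginning with the Turing test, early natural language processing capabilities appeared, but strong AI failed to live up to the inflated expectations, caused great disappointment, and faded away. Professor Kim explained, "From the mid-1970s, science and technology investment funds cut off support for AI research because there were no results."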

- Since GPT-3, language models have advanced remarkably $\to$ some people even call this the emergence of AGI.

Language model practice

from fastai.text.all import * 
import numpy as np

path = untar_data(URLs.IMDB)

files = get_text_files(path) 
files

(#100002) [Path('/home/cgb3/.fastai/data/imdb/train/neg/7473_4.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/11965_4.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/11897_3.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/9582_4.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/2709_2.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/7593_2.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/3936_2.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/12323_4.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/576_3.txt'),Path('/home/cgb3/.fastai/data/imdb/train/neg/12303_1.txt')...]

files: a list of all text files found in every subfolder of path

dls = DataBlock(
    blocks=TextBlock.from_folder(path,is_lm=True), 
    get_items=get_text_files, splitter=RandomSplitter(0.1)
).dataloaders(path,bs=128,seq_len=80)

dls.show_batch()

xxbos: marks the beginning of a new text
xxmaj: the next word starts with a capital letter (all words are lowercased by default)

lrnr = language_model_learner(dls,AWD_LSTM,metrics=accuracy).to_fp16()

lrnr.fit_one_cycle(5)

lrnr.predict('I liked this movie because',40)

"i liked this movie because though that has it my choice is that it will make me there . THAT is not like the old wooden Riding Horse as far as everyone else . They ca n't translate in"

Some parts do not make sense, but overall the text looks plausible.

Problem setup

text = 'h e l l o '*100 
text

'h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o '

tokens = text.split(' ')[:-1]
tokens[:10]

['h', 'e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o']

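- Let's predict the next character from the immediately preceding character.

Since the word is hello: h $\to$ e, e $\to$ l, l $\to$ l/o (?), o $\to$ h, ...
The character after l is ambiguous: it can be either l or o.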

- It is like noticing the mapping $X \to y$ in the table below and using it to predict $y$ from $X$.

X	y
1	2
2	4
3	6
1	2
2	4
3	6
1	2
2	4
...	...

Our goal is to notice the rule below in the same way.

X	y
h	e
e	l
l	l/o
o	h
h	e
...	...
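To see the ambiguity concretely, here is a minimal sketch (not part of the lecture code, standard library only) that rebuilds the tokens above and counts which character follows which:

```python
from collections import Counter

# Rebuild the token list used in this section: 'h e l l o ' repeated 100 times
text = 'h e l l o ' * 100
tokens = text.split(' ')[:-1]

# Count (current character -> next character) pairs
pairs = Counter(zip(tokens[:-1], tokens[1:]))
for (a, b), n in sorted(pairs.items()):
    print(f'{a} -> {b}: {n}')
# e -> l: 100
# h -> e: 100
# l -> l: 100
# l -> o: 100
# o -> h: 99
```

After l, the next character is l and o equally often, so no model that only sees the previous character can be sure which one comes next.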

Embedding

- Let's set up X and y.

len(tokens)

500

X= tokens[:(len(tokens)-1)]
y= tokens[1:]

X[0],y[0]

('h', 'e')

X[1],y[1]

('e', 'l')

print(X[:10])
print(y[:10])

['h', 'e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o']
['e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o', 'h']

- Now convert the characters to numbers so that the computer can process them, i.e., into a trainable form.

dic = {'h':0, 'e':1, 'l':2, 'o':3} 
dic

{'h': 0, 'e': 1, 'l': 2, 'o': 3}

dic['h'],dic['e'],dic['l'],dic['o']

(0, 1, 2, 3)

nums = [dic[i] for i in tokens]

tokens[:10], nums[:10]

(['h', 'e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o'],
 [0, 1, 2, 2, 3, 0, 1, 2, 2, 3])
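The dictionary dic above was written by hand. A small sketch, assuming we want indices assigned in the order characters first appear, builds it automatically together with the reverse mapping that is handy later for turning predicted indices back into characters (the name inv is an illustrative choice):

```python
text = 'h e l l o ' * 100
tokens = text.split(' ')[:-1]

# Unique characters in order of first appearance: ['h', 'e', 'l', 'o']
vocab = list(dict.fromkeys(tokens))
dic = {ch: i for i, ch in enumerate(vocab)}   # character -> index
inv = {i: ch for ch, i in dic.items()}        # index -> character, for decoding

nums = [dic[ch] for ch in tokens]
print(dic)        # {'h': 0, 'e': 1, 'l': 2, 'o': 3}
print(nums[:10])  # [0, 1, 2, 2, 3, 0, 1, 2, 2, 3]
```

With this, len(vocab) gives the number of distinct characters directly, which is the value num_embeddings needs later.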

- (Mapping 1) Characters are mapped to numbers as follows.

character (tokens)	number (nums)
'h'	0
'e'	1
'l'	2
'l'	2
'o'	3
'h'	0
'e'	1
'l'	2
'l'	2
'o'	3
...	...

- (Mapping 2) The mapping below is semantically better than the one above. With mapping 1 we get e=1 and l=2, but that does not mean l is an input twice as strong as e.

character (tokens)	one-hot vector
'h'	1,0,0,0
'e'	0,1,0,0
'l'	0,0,1,0
'l'	0,0,1,0
'o'	0,0,0,1
'h'	1,0,0,0
'e'	0,1,0,0
'l'	0,0,1,0
'l'	0,0,1,0
'o'	0,0,0,1
...	...

- We would like to use mapping 2, but the preprocessing looks tedious.

But this kind of situation comes up all the time.
Surely someone has already solved it?
torch.nn.Embedding

- Implementing mapping 1

_x = torch.tensor([[0.0],[1.0],[2.0],[2.0],[3.0],[0.0],[1.0],[2.0],[2.0],[3.0]])
_x

tensor([[0.],
        [1.],
        [2.],
        [2.],
        [3.],
        [0.],
        [1.],
        [2.],
        [2.],
        [3.]])

_l1 = torch.nn.Linear(in_features=1, out_features=20, bias=False)

_l1(_x)

tensor([[-0.0000, -0.0000,  0.0000, -0.0000,  0.0000, -0.0000, -0.0000, -0.0000,
          0.0000, -0.0000,  0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000,
          0.0000, -0.0000,  0.0000, -0.0000],
        [-0.0896, -0.6548,  0.4603, -0.5948,  0.4595, -0.5899, -0.4345, -0.8192,
          0.3320, -0.5660,  0.8664, -0.4064, -0.3070, -0.1649, -0.6602, -0.0315,
          0.8055, -0.7989,  0.4684, -0.6249],
        [-0.1792, -1.3096,  0.9206, -1.1895,  0.9189, -1.1798, -0.8690, -1.6384,
          0.6640, -1.1321,  1.7327, -0.8129, -0.6139, -0.3297, -1.3205, -0.0631,
          1.6110, -1.5978,  0.9368, -1.2498],
        [-0.1792, -1.3096,  0.9206, -1.1895,  0.9189, -1.1798, -0.8690, -1.6384,
          0.6640, -1.1321,  1.7327, -0.8129, -0.6139, -0.3297, -1.3205, -0.0631,
          1.6110, -1.5978,  0.9368, -1.2498],
        [-0.2687, -1.9644,  1.3808, -1.7843,  1.3784, -1.7697, -1.3035, -2.4576,
          0.9961, -1.6981,  2.5991, -1.2193, -0.9209, -0.4946, -1.9807, -0.0946,
          2.4165, -2.3967,  1.4052, -1.8746],
        [-0.0000, -0.0000,  0.0000, -0.0000,  0.0000, -0.0000, -0.0000, -0.0000,
          0.0000, -0.0000,  0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000,
          0.0000, -0.0000,  0.0000, -0.0000],
        [-0.0896, -0.6548,  0.4603, -0.5948,  0.4595, -0.5899, -0.4345, -0.8192,
          0.3320, -0.5660,  0.8664, -0.4064, -0.3070, -0.1649, -0.6602, -0.0315,
          0.8055, -0.7989,  0.4684, -0.6249],
        [-0.1792, -1.3096,  0.9206, -1.1895,  0.9189, -1.1798, -0.8690, -1.6384,
          0.6640, -1.1321,  1.7327, -0.8129, -0.6139, -0.3297, -1.3205, -0.0631,
          1.6110, -1.5978,  0.9368, -1.2498],
        [-0.1792, -1.3096,  0.9206, -1.1895,  0.9189, -1.1798, -0.8690, -1.6384,
          0.6640, -1.1321,  1.7327, -0.8129, -0.6139, -0.3297, -1.3205, -0.0631,
          1.6110, -1.5978,  0.9368, -1.2498],
        [-0.2687, -1.9644,  1.3808, -1.7843,  1.3784, -1.7697, -1.3035, -2.4576,
          0.9961, -1.6981,  2.5991, -1.2193, -0.9209, -0.4946, -1.9807, -0.0946,
          2.4165, -2.3967,  1.4052, -1.8746]], grad_fn=<MmBackward0>)

input: (10,1)
output: (10,20)
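The output above already shows the weakness of mapping 1: the row for 0 (h) is all zeros and the row for 2 (l) is exactly twice the row for 1 (e). A short sketch (an illustration, not the lecture's code) that checks this directly, assuming only that the layer has no bias:

```python
import torch

torch.manual_seed(0)  # only for reproducibility
l1 = torch.nn.Linear(in_features=1, out_features=20, bias=False)
x = torch.tensor([[0.0], [1.0], [2.0], [3.0]])  # h, e, l, o under mapping 1
out = l1(x)

# Each output row is (index) * w for one shared weight vector w,
# so l's row is exactly twice e's row and h's row is all zeros.
print(torch.allclose(out[2], 2 * out[1]))  # True
print(torch.count_nonzero(out[0]).item())  # 0
```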

- Implementing mapping 2

e1= torch.nn.Embedding(num_embeddings=4, embedding_dim=20)

_x = torch.tensor([0,1,2,2,3,0,1,2,2,3])
_x

tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3])

e1(_x)

tensor([[-2.3394,  1.3453, -0.3396,  0.5713, -2.2313,  0.8252,  0.1893,  0.6070,
         -0.7624, -0.2676,  1.1180,  0.6737,  1.6577, -1.3665,  1.3162, -0.0132,
          1.4430,  0.0303, -1.1834, -0.3722],
        [ 0.9234, -1.9822,  0.6516, -0.4174,  0.3364, -0.2246,  1.4697, -0.8379,
         -0.1248, -0.7375, -0.8327, -0.9179,  1.2838,  0.0307, -0.8350, -0.3936,
          0.1039,  0.3440,  2.0178, -1.0137],
        [-1.7603, -2.0304,  1.3996, -0.3321, -1.3096, -0.1978, -0.0037,  0.6667,
         -0.6620,  1.3511,  0.3226,  0.3802,  0.1700, -0.1528,  0.4741,  0.2493,
         -0.0702, -0.0315, -1.2255,  0.3966],
        [-1.7603, -2.0304,  1.3996, -0.3321, -1.3096, -0.1978, -0.0037,  0.6667,
         -0.6620,  1.3511,  0.3226,  0.3802,  0.1700, -0.1528,  0.4741,  0.2493,
         -0.0702, -0.0315, -1.2255,  0.3966],
        [ 0.9557, -0.1601, -0.9933, -1.0430, -1.3975, -1.8217,  0.5578, -0.4781,
         -0.8854,  0.8391,  1.3672,  0.3315,  1.5096, -0.7648,  0.0735, -0.6721,
         -0.2264, -0.2406,  0.4416,  0.6807],
        [-2.3394,  1.3453, -0.3396,  0.5713, -2.2313,  0.8252,  0.1893,  0.6070,
         -0.7624, -0.2676,  1.1180,  0.6737,  1.6577, -1.3665,  1.3162, -0.0132,
          1.4430,  0.0303, -1.1834, -0.3722],
        [ 0.9234, -1.9822,  0.6516, -0.4174,  0.3364, -0.2246,  1.4697, -0.8379,
         -0.1248, -0.7375, -0.8327, -0.9179,  1.2838,  0.0307, -0.8350, -0.3936,
          0.1039,  0.3440,  2.0178, -1.0137],
        [-1.7603, -2.0304,  1.3996, -0.3321, -1.3096, -0.1978, -0.0037,  0.6667,
         -0.6620,  1.3511,  0.3226,  0.3802,  0.1700, -0.1528,  0.4741,  0.2493,
         -0.0702, -0.0315, -1.2255,  0.3966],
        [-1.7603, -2.0304,  1.3996, -0.3321, -1.3096, -0.1978, -0.0037,  0.6667,
         -0.6620,  1.3511,  0.3226,  0.3802,  0.1700, -0.1528,  0.4741,  0.2493,
         -0.0702, -0.0315, -1.2255,  0.3966],
        [ 0.9557, -0.1601, -0.9933, -1.0430, -1.3975, -1.8217,  0.5578, -0.4781,
         -0.8854,  0.8391,  1.3672,  0.3315,  1.5096, -0.7648,  0.0735, -0.6721,
         -0.2264, -0.2406,  0.4416,  0.6807]], grad_fn=<EmbeddingBackward0>)

input: (10,) (a 1-D tensor of 10 integer indices)
output: (10,20)

- torch.nn.Linear() and torch.nn.Embedding() seem to behave the same? $\to$ Inspecting their parameters reveals the difference.

list(_l1.parameters())[0].shape

torch.Size([20, 1])

list(e1.parameters())[0].shape

torch.Size([4, 20])

- So mapping 1 can be understood as follows (PyTorch stores ${\bf W}^\top$, a (20,1) tensor, as the weight of Linear):

${\bf X}$: (10,1)
${\bf W}$: (1,20)
${\bf XW}$: (10,20)

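- Mapping 2 can be understood as follows, where $\tilde{\bf X}$ is the one-hot version of ${\bf X}$ and ${\bf W}$ is the (4,20) embedding weight: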

${\bf X}$: (10,1)
$\tilde{\bf X}$: (10,4)
${\bf W}$: (4,20)
$\tilde{\bf X}{\bf W}$: (10,20)
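The claim that Embedding computes $\tilde{\bf X}{\bf W}$ can be checked numerically. A minimal sketch, assuming torch.nn.functional.one_hot is used to build $\tilde{\bf X}$: multiplying the one-hot matrix by the embedding weight gives exactly the rows that Embedding looks up.

```python
import torch

torch.manual_seed(0)  # only for reproducibility
e1 = torch.nn.Embedding(num_embeddings=4, embedding_dim=20)
x = torch.tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3])

# Mapping 2 written out explicitly: one-hot X~ (10,4) times W (4,20)
X_tilde = torch.nn.functional.one_hot(x, num_classes=4).float()
manual = X_tilde @ e1.weight

# Embedding does the same thing as a table lookup (row selection of W)
lookup = e1(x)
print(X_tilde.shape, e1.weight.shape, lookup.shape)
print(torch.allclose(manual, lookup))  # True
```

The lookup never materializes the (10,4) one-hot matrix, which is why Embedding can take the integer indices directly.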

- So even though we want to implement mapping 2, we can simply feed the integer indices as below; PyTorch's torch.nn.Embedding() takes care of the rest.

_x

tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3])

Building the network

- Now redefine X and y using the numeric data nums.

X = torch.tensor(nums[:499]) 
y = torch.tensor(nums[1:])

X[0],y[0]

(tensor(0), tensor(1))

X[1],y[1]

(tensor(1), tensor(2))

- Let's design a simple network.

e1=torch.nn.Embedding(num_embeddings=4, embedding_dim=20) 
l1=torch.nn.Linear(in_features=20,out_features=20)
a1=torch.nn.ReLU()
l2=torch.nn.Linear(in_features=20,out_features=4) 
a2=torch.nn.Softmax()

X.shape, e1(X).shape

(torch.Size([499]), torch.Size([499, 20]))

e1(X).shape, a1(l1(e1(X))).shape

(torch.Size([499, 20]), torch.Size([499, 20]))

a1(l1(e1(X))).shape, l2(a1(l1(e1(X)))).shape

(torch.Size([499, 20]), torch.Size([499, 4]))

a2(l2(a1(l1(e1(X))))).shape

<ipython-input-91-6a5d66616296>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  a2(l2(a1(l1(e1(X))))).shape

torch.Size([499, 4])

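The warning means that the dimension along which softmax normalizes (the dim=X in the message) was not specified, so PyTorch chose one implicitly. For a 2-D input it uses dim=1, so each row sums to 1; writing torch.nn.Softmax(dim=1) removes the warning.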

- The warning bothered me, so I computed the softmax by hand.

l2(a1(l1(e1(X))))[0]

tensor([-0.1398, -0.1216, -0.0264,  0.2706], grad_fn=<SelectBackward0>)

np.exp(-0.1398)/(np.exp(-0.1398)+np.exp(-0.1216)+np.exp(-0.0264)+np.exp(0.2706))

0.21524507060064613

a2(l2(a1(l1(e1(X)))))[0]

<ipython-input-102-17d00ccf79de>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  a2(l2(a1(l1(e1(X)))))[0]

tensor([0.2152, 0.2192, 0.2411, 0.3245], grad_fn=<SelectBackward0>)

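The hand-computed value 0.2152 matches the first entry, so the softmax is computed along each row as expected.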
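Passing the dimension explicitly avoids the warning. A sketch with random stand-in logits (in the notes they would come from l2(a1(l1(e1(X))))) comparing Softmax(dim=1) with the hand-written formula:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 4)  # stand-in for l2(a1(l1(e1(X)))): 5 rows, 4 classes

# Explicit dim=1: normalize across the 4 classes of each row (no warning)
probs = torch.nn.Softmax(dim=1)(logits)

# The same formula by hand: exp(z_k) / sum_j exp(z_j), row by row
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

print(torch.allclose(probs, manual))  # True
print(probs.sum(dim=1))               # each row sums to (approximately) 1
```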

- Summary of shape changes in the forward pass

torch.Size([499]) # X
torch.Size([499, 20]) # after e1
torch.Size([499, 20]) # after l1
torch.Size([499, 20]) # after a1
torch.Size([499, 4]) # after l2
torch.Size([499, 4]) # after a2 = yhat

net = torch.nn.Sequential(
    torch.nn.Embedding(num_embeddings=4,embedding_dim=20),
    torch.nn.Linear(in_features=20,out_features=20), 
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=20,out_features=4))
    # torch.nn.Softmax() is omitted: CrossEntropyLoss applies log-softmax to these logits internally

net(X)

tensor([[-0.0016,  0.0228, -0.0857, -0.1480],
        [ 0.0805,  0.0955,  0.0832, -0.2003],
        [-0.0160, -0.1443,  0.0684, -0.0805],
        ...,
        [ 0.0805,  0.0955,  0.0832, -0.2003],
        [-0.0160, -0.1443,  0.0684, -0.0805],
        [-0.0160, -0.1443,  0.0684, -0.0805]], grad_fn=<AddmmBackward0>)

- Loss function and optimizer

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())
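CrossEntropyLoss expects raw logits and applies log-softmax itself, which is why net ends without a Softmax layer. A small sketch with made-up logits and targets showing that it equals log_softmax followed by NLLLoss:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(6, 4)                 # stand-in for net(X): 6 samples, 4 characters
target = torch.tensor([1, 2, 2, 3, 0, 1])  # next-character indices, like y

ce = torch.nn.CrossEntropyLoss()(logits, target)

# Same value in two explicit steps: log-probabilities, then negative log-likelihood
log_probs = torch.log_softmax(logits, dim=1)
nll = torch.nn.NLLLoss()(log_probs, target)

print(torch.allclose(ce, nll))  # True
```

Adding a Softmax layer before CrossEntropyLoss would apply softmax twice, which is a common mistake.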

- Training

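for i in range(1000): 
    ## 1 forward pass (logits)
    yhat = net(X)
    ## 2 loss
    loss = loss_fn(yhat,y) 
    ## 3 backward pass (gradients)
    loss.backward()
    ## 4 update the parameters, then reset the gradients
    optimizer.step()
    optimizer.zero_grad()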

X[:7]

tensor([0, 1, 2, 2, 3, 0, 1])

net(X)[:7]

tensor([[-1.1941,  6.0211, -2.7683, -3.4999],
        [-2.9995, -3.0592,  5.9518, -2.5928],
        [-4.0286, -4.0547,  3.8021,  3.8020],
        [-4.0286, -4.0547,  3.8021,  3.8020],
        [ 6.2541, -1.0755, -2.3719, -2.1683],
        [-1.1941,  6.0211, -2.7683, -3.4999],
        [-2.9995, -3.0592,  5.9518, -2.5928]], grad_fn=<SliceBackward0>)

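Training worked: the largest logit in each row points to the correct next character (h $\to$ e, e $\to$ l, o $\to$ h). For l the logits of l and o are almost equal (3.8021 vs 3.8020), which reflects the l $\to$ l/o ambiguity.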
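To read these outputs as characters, apply softmax, take the argmax, and map indices back. A sketch using the logits printed above (copied by hand) and an inverse of dic (inv is an illustrative name):

```python
import torch

dic = {'h': 0, 'e': 1, 'l': 2, 'o': 3}
inv = {v: k for k, v in dic.items()}  # index -> character

# Logits copied from net(X)[:7] above, first 5 rows (inputs h, e, l, l, o)
logits = torch.tensor([[-1.1941,  6.0211, -2.7683, -3.4999],
                       [-2.9995, -3.0592,  5.9518, -2.5928],
                       [-4.0286, -4.0547,  3.8021,  3.8020],
                       [-4.0286, -4.0547,  3.8021,  3.8020],
                       [ 6.2541, -1.0755, -2.3719, -2.1683]])
probs = torch.softmax(logits, dim=1)
for row in probs:
    best = row.argmax().item()
    print(inv[best], [round(p, 3) for p in row.tolist()])
```

For the l rows the probabilities of l and o are both close to 0.5, so the argmax choice between them is essentially a coin flip.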

Improving net

- Do we have to retype the following every time the vocabulary size changes from 4?

net = torch.nn.Sequential(
    torch.nn.Embedding(num_embeddings=4,embedding_dim=20),
    torch.nn.Linear(in_features=20,out_features=20), 
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=20,out_features=4))
    # torch.nn.Softmax() is omitted: CrossEntropyLoss applies log-softmax to these logits internally

- It would be nice to have something that stamps out such nets. Let me build one!

class BDA(Module): # fastai's Module: unlike torch.nn.Module, no super().__init__() call is needed
    def __init__(self, num_embeddings): 
        self.embedding = torch.nn.Embedding(num_embeddings,20)
        self.linear1 = torch.nn.Linear(in_features=20,out_features=20)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(in_features=20,out_features=num_embeddings)
    def forward(self, X): # how net(X) is computed
        u=self.linear1(self.embedding(X))
        v=self.relu(u)
        return self.linear2(v) # result of net(X)
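If fastai is not available, the same class can be written with plain torch.nn.Module; the only change is the explicit super().__init__() call. A sketch (the name BDAPlain is hypothetical):

```python
import torch

class BDAPlain(torch.nn.Module):  # hypothetical name; same layers as BDA
    def __init__(self, num_embeddings):
        super().__init__()  # required for a plain torch.nn.Module subclass
        self.embedding = torch.nn.Embedding(num_embeddings, 20)
        self.linear1 = torch.nn.Linear(in_features=20, out_features=20)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(in_features=20, out_features=num_embeddings)

    def forward(self, X):  # X: 1-D tensor of character indices
        return self.linear2(self.relu(self.linear1(self.embedding(X))))

net2_plain = BDAPlain(4)
out = net2_plain(torch.tensor([0, 1, 2, 2, 3]))
print(out.shape)  # torch.Size([5, 4])
```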

net2 = BDA(4)

net

Sequential(
  (0): Embedding(4, 20)
  (1): Linear(in_features=20, out_features=20, bias=True)
  (2): ReLU()
  (3): Linear(in_features=20, out_features=4, bias=True)
)

net2

BDA(
  (embedding): Embedding(4, 20)
  (linear1): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=20, out_features=4, bias=True)
)

- Let's also train net2 and check whether it behaves the same way as net.

loss_fn= torch.nn.CrossEntropyLoss()
optimizer2 = torch.optim.Adam(net2.parameters())

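for i in range(1000):
    ## 1 forward pass (logits)
    yhat = net2(X) 
    ## 2 loss
    loss = loss_fn(yhat,y) 
    ## 3 backward pass (gradients)
    loss.backward()
    ## 4 update the parameters, then reset the gradients
    optimizer2.step()
    optimizer2.zero_grad()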

net2(X)

tensor([[-2.8823,  5.3388, -1.5988, -2.9157],
        [-2.4622, -2.1139,  5.8411, -3.3147],
        [-3.5933, -2.8342,  4.4629,  4.4633],
        ...,
        [-2.4622, -2.1139,  5.8411, -3.3147],
        [-3.5933, -2.8342,  4.4629,  4.4633],
        [-3.5933, -2.8342,  4.4629,  4.4633]], grad_fn=<AddmmBackward0>)

net(X)

tensor([[-1.1941,  6.0211, -2.7683, -3.4999],
        [-2.9995, -3.0592,  5.9518, -2.5928],
        [-4.0286, -4.0547,  3.8021,  3.8020],
        ...,
        [-2.9995, -3.0592,  5.9518, -2.5928],
        [-4.0286, -4.0547,  3.8021,  3.8020],
        [-4.0286, -4.0547,  3.8021,  3.8020]], grad_fn=<AddmmBackward0>)

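- net2 also trained well. Its logits differ from net's because of the random initialization, but the pattern is the same, including the near-tie between l and o after l.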

Predicting the next character from the previous two characters

- Redefine X and y.

X = torch.tensor([nums[:498],nums[1:499]]).T
y = torch.tensor(nums[2:])

X[0],y[0] # h,e -> l

(tensor([0, 1]), tensor(2))

X[1],y[1] # e,l -> l

(tensor([1, 2]), tensor(2))

X[2],y[2] # l,l -> o

(tensor([2, 2]), tensor(3))

X[3],y[3] # l,o -> h

(tensor([2, 3]), tensor(0))

- Let's roughly sketch the architecture.

_e1 = torch.nn.Embedding(num_embeddings=4, embedding_dim=20)

X.shape, _e1(X).shape

(torch.Size([498, 2]), torch.Size([498, 2, 20]))

- The previous architecture looked like this:

torch.Size([499]) # X
torch.Size([499, 20]) # after e1
torch.Size([499, 20]) # after l1
torch.Size([499, 20]) # after a1
torch.Size([499, 4]) # after l2
torch.Size([499, 4]) # after a2 = yhat

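- The embedding output now has an extra dimension ([498, 2, 20] instead of [499, 20]), and it is unclear how the following layers should handle it. $\to$ Design a recurrent network.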

X[:,1]

tensor([1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2])

class BDA2(Module): 
    def __init__(self, num_embeddings): 
        self.embedding = torch.nn.Embedding(num_embeddings,20)
        self.linear1 = torch.nn.Linear(in_features=20,out_features=20)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(in_features=20,out_features=num_embeddings)
    def forward(self, X): # how net(X) is computed
        x1=X[:,0] # first column of X: 2 steps before y, i.e. (x1,x2) -> y // (h,e) -> l
        x2=X[:,1] # second column of X: 1 step before y
        h=self.relu(self.linear1(self.embedding(x1))) # part of the network that predicts x2 from x1
        h2=self.relu(self.linear1(h+ self.embedding(x2))) # part of the network that predicts y from x2 (h carries x1)
        return self.linear2(h2) # result of net(X)

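- The final output self.linear2(h2) is a function of h and x2, and h is in turn a function of x1. So h2 contains x2 directly and, more weakly, information about x1 as well. Note that the same linear1 is reused at both steps; this weight sharing across time steps is what makes the network recurrent.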
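BDA2 hard-codes two time steps. A hedged sketch of how the same idea could extend to any number of previous characters by looping over the columns of X and reusing linear1 at every step; the class name LoopRNN and the zero initial state are assumptions, not the lecture's code. With two columns it computes exactly what BDA2 computes, because adding a zero state leaves the first embedding unchanged.

```python
import torch

class LoopRNN(torch.nn.Module):
    """Hypothetical generalization of BDA2 to any number of previous characters."""
    def __init__(self, num_embeddings, hidden=20):
        super().__init__()
        self.embedding = torch.nn.Embedding(num_embeddings, hidden)
        self.linear1 = torch.nn.Linear(hidden, hidden)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(hidden, num_embeddings)

    def forward(self, X):  # X: (batch, T) integer indices, oldest column first
        h = torch.zeros(X.shape[0], self.linear1.in_features)  # h_0 = 0
        for t in range(X.shape[1]):
            # h_t = ReLU(linear1(h_{t-1} + embedding(x_t))), same weights at every step
            h = self.relu(self.linear1(h + self.embedding(X[:, t])))
        return self.linear2(h)

net = LoopRNN(4)
X = torch.tensor([[0, 1], [1, 2], [2, 2], [2, 3]])
print(net(X).shape)  # torch.Size([4, 4])
```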

net3=BDA2(4) 
net3

BDA2(
  (embedding): Embedding(4, 20)
  (linear1): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=20, out_features=4, bias=True)
)

net2

BDA(
  (embedding): Embedding(4, 20)
  (linear1): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=20, out_features=4, bias=True)
)

The layers are identical, but the forward computation is different! (So the backward computation must differ as well, right?)

- Let's train again.

loss_fn = torch.nn.CrossEntropyLoss() 
optimizer3 = torch.optim.Adam(net3.parameters())

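for i in range(1000):
    ## 1 forward pass (logits)
    yhat = net3(X) 
    ## 2 loss
    loss = loss_fn(yhat,y) 
    ## 3 backward pass (gradients)
    loss.backward()
    ## 4 update the parameters, then reset the gradients
    optimizer3.step()
    optimizer3.zero_grad()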

X[:5]

tensor([[0, 1],
        [1, 2],
        [2, 2],
        [2, 3],
        [3, 0]])

net3(X)[:5]

tensor([[-3.4906, -2.5577,  5.7092, -5.5080],
        [-3.6006, -2.8640,  5.7962, -2.3766],
        [-1.6940, -3.4435, -2.1906,  5.8796],
        [ 6.6544, -2.4499, -3.2569, -0.3492],
        [-2.6329,  6.2465, -2.2382, -3.0478]], grad_fn=<SliceBackward0>)

h,e $\to$ l
e,l $\to$ l
l,l $\to$ o
l,o $\to$ h
o,h $\to$ e

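- Training worked: the largest logit in each row points to the correct next character. The l $\to$ l/o ambiguity is gone, because (e,l) $\to$ l and (l,l) $\to$ o are now distinguishable.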
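Why does the second character help so much? A sketch (standard library only, not from the lecture) that computes the best accuracy any model could reach if, for each context, it always predicted the most frequent next character:

```python
from collections import Counter, defaultdict

tokens = ('h e l l o ' * 100).split(' ')[:-1]

def best_accuracy(tokens, k):
    """Upper bound on accuracy when predicting tokens[i] from the k previous tokens:
    for each context, always predict its most frequent next token."""
    counts = defaultdict(Counter)
    for i in range(k, len(tokens)):
        context = tuple(tokens[i - k:i])
        counts[context][tokens[i]] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    total = sum(sum(c.values()) for c in counts.values())
    return correct / total

print(best_accuracy(tokens, 1))  # 399/499, about 0.80
print(best_accuracy(tokens, 2))  # 1.0
```

With one previous character, half of the 200 predictions after l must be wrong; with two, every context has a unique successor.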

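Output of dls.show_batch() from the fastai language model section. In each row, text_ (the target) is text shifted one token to the right.

	text	text_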
0	xxbos xxmaj ca xxmaj twiste a xxmaj popenguine allows its ' viewers to take a somewhat raw glimpse into the conflict between generations , caused by the colonization of one 's country . xxmaj moussa xxmaj sene xxmaj absa ( director and writer ) does not convey a clear pro or con stance on the changing culture of xxmaj senegal , but does provide a rather complete look at both sides of the issue . xxmaj he provides this raw	xxmaj ca xxmaj twiste a xxmaj popenguine allows its ' viewers to take a somewhat raw glimpse into the conflict between generations , caused by the colonization of one 's country . xxmaj moussa xxmaj sene xxmaj absa ( director and writer ) does not convey a clear pro or con stance on the changing culture of xxmaj senegal , but does provide a rather complete look at both sides of the issue . xxmaj he provides this raw look
1	\n\n 1 / 10 . \n\n xxmaj films this bad are rare . xxmaj ca n't recommend enough that you avoid this like the plague . xxbos xxmaj it is always great to see a movie that teaches us about history in xxmaj africa as they are definitely too few . xxmaj however , the movie depicts xxmaj lumumba as a political leader who wanted the new independent country to be the same as the old colonialist one … ,	1 / 10 . \n\n xxmaj films this bad are rare . xxmaj ca n't recommend enough that you avoid this like the plague . xxbos xxmaj it is always great to see a movie that teaches us about history in xxmaj africa as they are definitely too few . xxmaj however , the movie depicts xxmaj lumumba as a political leader who wanted the new independent country to be the same as the old colonialist one … , i
2	! ! xxbos xxmaj it is such a shame when actors and actresses of high quality get involved with pure crap , probably because they were offered a great deal of money . xxmaj not one of xxmaj helen xxmaj mirren 's better career moves . xxmaj the acting of the " teens " is simply appalling , not helped by a script that is in parts simply inept . \n\n xxmaj most of xxmaj kevin xxmaj williamson 's work	! xxbos xxmaj it is such a shame when actors and actresses of high quality get involved with pure crap , probably because they were offered a great deal of money . xxmaj not one of xxmaj helen xxmaj mirren 's better career moves . xxmaj the acting of the " teens " is simply appalling , not helped by a script that is in parts simply inept . \n\n xxmaj most of xxmaj kevin xxmaj williamson 's work is
3	ask the viewer . xxmaj there are a dozen other ways to have contrived an justifiable plot without putting the viewers through the ordeal and offering the surprise at the end . xxmaj this just sucked . i was angry that i had spent my time to watch it- i highly advise that you save yours and pass on this lump of dirt . xxbos xxmaj this is a film from xxmaj chaplin 's first year in films . xxmaj	the viewer . xxmaj there are a dozen other ways to have contrived an justifiable plot without putting the viewers through the ordeal and offering the surprise at the end . xxmaj this just sucked . i was angry that i had spent my time to watch it- i highly advise that you save yours and pass on this lump of dirt . xxbos xxmaj this is a film from xxmaj chaplin 's first year in films . xxmaj during
4	even allow me that pleasure . \n\n xxmaj please if you want to torture yourself , go ahead watch this . xxbos i could write a long review about how xxmaj lady in the xxmaj water was an unfathomably contrived piece of cinematic shite , but xxmaj i 'm sure plenty of that will be going around so xxmaj i 'll just give a list of the three main reasons i never want to see this movie again . \n\n	allow me that pleasure . \n\n xxmaj please if you want to torture yourself , go ahead watch this . xxbos i could write a long review about how xxmaj lady in the xxmaj water was an unfathomably contrived piece of cinematic shite , but xxmaj i 'm sure plenty of that will be going around so xxmaj i 'll just give a list of the three main reasons i never want to see this movie again . \n\n 1
5	can get the milk for free ? " xxmaj it 's a hilarious adventure of a film about dating and the ultimate search for your " soul mate " or the " one " . xxmaj it is the perfect date movie . i found it to be more than hilarious , with actor / comedian xxmaj ryan xxmaj reynolds stealing the show . \n\n xxmaj for all the xxmaj ryan xxmaj reynolds fans out there this is hands down	get the milk for free ? " xxmaj it 's a hilarious adventure of a film about dating and the ultimate search for your " soul mate " or the " one " . xxmaj it is the perfect date movie . i found it to be more than hilarious , with actor / comedian xxmaj ryan xxmaj reynolds stealing the show . \n\n xxmaj for all the xxmaj ryan xxmaj reynolds fans out there this is hands down the
6	talent in acting and writing came out as quite impressive . i was pleasantly surprised to learn that they were really xxmaj matt and xxmaj trey , later on . xxmaj the humor is sometimes crude , sometimes foul , sometimes brilliant , sometimes subtle , sometimes loud and sometimes stupid but overall , this is one hell of a movie that is without doubt , under rated . xxmaj well actually , its insane . xxmaj totally insane .	in acting and writing came out as quite impressive . i was pleasantly surprised to learn that they were really xxmaj matt and xxmaj trey , later on . xxmaj the humor is sometimes crude , sometimes foul , sometimes brilliant , sometimes subtle , sometimes loud and sometimes stupid but overall , this is one hell of a movie that is without doubt , under rated . xxmaj well actually , its insane . xxmaj totally insane . xxbos
7	the aunts in it ( zelda was reduced to a candle ) and she 's about to get married but she runs off with xxmaj harvey the end . i would have liked to know what happened after . well that s my review and the only thing i can say is the only thing that stayed it 's appealing self through the seven years was xxmaj salem the cat . xxbos xxmaj this is not a horror film ,	aunts in it ( zelda was reduced to a candle ) and she 's about to get married but she runs off with xxmaj harvey the end . i would have liked to know what happened after . well that s my review and the only thing i can say is the only thing that stayed it 's appealing self through the seven years was xxmaj salem the cat . xxbos xxmaj this is not a horror film , but
8	xxup ps : xxmaj xxunk xxrep 3 ? ! ? ! ! xxmaj what were they thinking ? xxup ds xxbos a short review without any spoilers follows . \n\n i saw this movie yesterday at the xxmaj cannes xxmaj film xxmaj festival . xxmaj my initial reaction is one of wonder and happiness . xxmaj i 'm so happy films like this are being made in our age of blockbusters . \n\n xxmaj roy xxmaj andersson 's new movie	ps : xxmaj xxunk xxrep 3 ? ! ? ! ! xxmaj what were they thinking ? xxup ds xxbos a short review without any spoilers follows . \n\n i saw this movie yesterday at the xxmaj cannes xxmaj film xxmaj festival . xxmaj my initial reaction is one of wonder and happiness . xxmaj i 'm so happy films like this are being made in our age of blockbusters . \n\n xxmaj roy xxmaj andersson 's new movie "

Training log of lrnr.fit_one_cycle(5) from the fastai language model section:

epoch	train_loss	valid_loss	accuracy	time
0	4.463490	4.135534	0.283216	07:27
1	4.344551	4.033030	0.290044	07:26
2	4.303035	3.999581	0.292261	07:33
3	4.269669	3.987099	0.293206	07:30
4	4.260648	3.984843	0.293375	07:29