Lecture videos

- (1/10) Dropout, adding batches (1)

- (2/10) Dropout, adding batches (2)

- (3/10) resnet34

- (4/10) How to dissect a model (the hand-built model)

- (5/10) The structure of resnet34 (1)

- (6/10) The structure of resnet34 (2)

- (7/10) Error-term assumptions / activation functions / loss functions depending on the form of $y$; the four axes of deep-learning research; the need for explainable deep learning

- (8/10) CAM (1)

- (9/10) CAM (2)

- (10/10) CAM (3)

import

import torch 
from fastai.vision.all import * 
import graphviz
def gv(s): return graphviz.Source('digraph G{ rankdir="LR"'+ s + ';}')
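
gv is a small helper that renders a DOT snippet as a left-to-right graphviz diagram, presumably for sketching network structures. A usage sketch (the node names here are illustrative):

gv('"X" -> "conv" -> "maxpool" -> "relu" -> "flatten" -> "linear"')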

data

- download data

path = untar_data(URLs.MNIST_SAMPLE)
path.ls()
(#3) [Path('/home/cgb4/.fastai/data/mnist_sample/labels.csv'),Path('/home/cgb4/.fastai/data/mnist_sample/train'),Path('/home/cgb4/.fastai/data/mnist_sample/valid')]

- list

threes=(path/'train'/'3').ls()
sevens=(path/'train'/'7').ls()

- list $\to$ image

Image.open(threes[4])

- image $\to$ tensor

tensor(Image.open(threes[4]))
tensor([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1,  72, 156,
         241, 254, 255, 188,   9,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  17, 168, 250, 232,
         147,  79, 143, 254,  25,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0, 109, 231, 164,  39,   0,
           0,   0,  86, 251,  24,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,  81,  40,   0,   0,   0,
           0,   4, 200, 157,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,  92, 249,  27,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           5, 221, 128,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         147, 185,  19,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 137,
         224,  20,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 137, 239,
          68,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  83, 239,  95,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  83, 245, 104,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 179, 254, 224, 217,
         147,  36,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  15,  44, 117, 117,
         196, 237, 104,   7,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           6, 117, 246,  95,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,  85, 241,  22,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0, 225, 102,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0, 170, 131,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  28, 104,   0,   0,
           0,   0,  17, 234,  87,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1, 198, 179,  29,
           0,  42, 199, 235,  17,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  14, 154, 236,
         250, 252, 163,  12,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]],
       dtype=torch.uint8)
  • Here, tensor is the function implemented by fastai, not the one from PyTorch.
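
A quick check of this (a sketch, using the imports above): fastai's tensor accepts a PIL image directly, whereas plain torch.tensor needs a numpy array first.

img = Image.open(threes[4])
t1 = tensor(img)                  # fastai helper: works on a PIL image directly
t2 = torch.tensor(np.array(img))  # plain PyTorch route: via numpy
torch.equal(t1, t2)               # expected: True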

- Let's convert all of the image lists into tensors.

seven_tensor = torch.stack([tensor(Image.open(i)) for i in sevens]).float()/255
three_tensor = torch.stack([tensor(Image.open(i)) for i in threes]).float()/255

- Let's build $X$ and $y$.

seven_tensor.shape, three_tensor.shape
(torch.Size([6265, 28, 28]), torch.Size([6131, 28, 28]))
y=torch.tensor([0.0]*6265+ [1.0]*6131).reshape(12396,1)
X=torch.vstack([seven_tensor,three_tensor]).reshape(12396,-1)
X.shape, y.shape
(torch.Size([12396, 784]), torch.Size([12396, 1]))
X=X.reshape(12396,1,28,28)
X.shape
torch.Size([12396, 1, 28, 28])

1. The model from last time (hand-designed network, pytorch)

2d convolution with window size 5

c1=torch.nn.Conv2d(1,16,5) # in_channels=1 (grayscale), out_channels=16, window size 5
X.shape, c1(X).shape
(torch.Size([12396, 1, 28, 28]), torch.Size([12396, 16, 24, 24]))
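
The 24 follows from the usual convolution size formula: with no padding and stride 1, out = (in - kernel)/stride + 1. A one-line check (sketch):

(28 - 5)//1 + 1   # = 24, matching the 24x24 feature maps above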

MaxPool2d

m1=torch.nn.MaxPool2d(2)
X.shape,c1(X).shape,m1(c1(X)).shape
(torch.Size([12396, 1, 28, 28]),
 torch.Size([12396, 16, 24, 24]),
 torch.Size([12396, 16, 12, 12]))

ReLU

a1=torch.nn.ReLU()
X.shape,c1(X).shape, m1(c1(X)).shape, a1(m1(c1(X))).shape
(torch.Size([12396, 1, 28, 28]),
 torch.Size([12396, 16, 24, 24]),
 torch.Size([12396, 16, 12, 12]),
 torch.Size([12396, 16, 12, 12]))

flatten

class Flatten(torch.nn.Module):
    def forward(self,x): 
        return x.reshape(12396,-1)  # batch size hard-coded for now; generalized to x.shape[0] later
flatten=Flatten()
X.shape,c1(X).shape, m1(c1(X)).shape, a1(m1(c1(X))).shape, flatten(a1(m1(c1(X)))).shape
(torch.Size([12396, 1, 28, 28]),
 torch.Size([12396, 16, 24, 24]),
 torch.Size([12396, 16, 12, 12]),
 torch.Size([12396, 16, 12, 12]),
 torch.Size([12396, 2304]))

linear

l1=torch.nn.Linear(in_features=2304,out_features=1) 
X.shape,\
c1(X).shape, \
m1(c1(X)).shape, \
a1(m1(c1(X))).shape, \
flatten(a1(m1(c1(X)))).shape, \
l1(flatten(a1(m1(c1(X))))).shape
(torch.Size([12396, 1, 28, 28]),
 torch.Size([12396, 16, 24, 24]),
 torch.Size([12396, 16, 12, 12]),
 torch.Size([12396, 16, 12, 12]),
 torch.Size([12396, 2304]),
 torch.Size([12396, 1]))
plt.plot(l1(flatten(a1(m1(c1(X))))).data)
[<matplotlib.lines.Line2D at 0x7f3f9b27b310>]

Network design

net = nn.Sequential(c1,m1,a1,flatten,l1)
## omit the final sigmoid, since it is built into torch.nn.BCEWithLogitsLoss()

- Define the loss function and the optimizer

loss_fn=torch.nn.BCEWithLogitsLoss()
optimizer= torch.optim.Adam(net.parameters())
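
Since the network above leans on the sigmoid being folded into the loss, here is a minimal sketch (with made-up logits z and targets t) confirming that BCEWithLogitsLoss(z,t) equals BCELoss(sigmoid(z),t):

z = torch.randn(5,1)                          # hypothetical raw scores (logits)
t = torch.tensor([[0.],[1.],[1.],[0.],[1.]])  # hypothetical 0/1 targets
torch.nn.BCEWithLogitsLoss()(z,t), torch.nn.BCELoss()(torch.sigmoid(z),t)  # agree up to float error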

- Steps 1~4

for epoch in range(200): 
    ## 1: forward pass
    yhat=net(X)
    ## 2: compute the loss
    loss=loss_fn(yhat,y) 
    ## 3: backpropagate
    loss.backward()
    ## 4: update parameters, then reset gradients
    optimizer.step()
    net.zero_grad()
a2= torch.nn.Sigmoid()
plt.plot(y)
plt.plot(a2(yhat.data),'.')
[<matplotlib.lines.Line2D at 0x7f3f9b25f7c0>]
ypred=a2(yhat.data)>0.5 
sum(ypred==y)/12396
tensor([0.9938])

2. Adding dropout and batches (hand-designed network, pytorch+fastai)

Step 1: build the dls.

ds=torch.utils.data.TensorDataset(X,y)
ds.tensors[0].shape
torch.Size([12396, 1, 28, 28])
ds1,ds2 = torch.utils.data.random_split(ds,[10000,2396]) 
dl1 = torch.utils.data.DataLoader(ds1,batch_size=500) 
dl2 = torch.utils.data.DataLoader(ds2,batch_size=2396) 
dls=DataLoaders(dl1,dl2) 
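
A quick sanity check of the loaders (a sketch): the first training batch should have batch_size=500 with matching targets.

xb, yb = next(iter(dl1))
xb.shape, yb.shape   # expected: (torch.Size([500, 1, 28, 28]), torch.Size([500, 1]))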

Step 2: architecture, loss function, optimizer

class Flatten(torch.nn.Module):
    def forward(self,x): 
        return x.reshape(x.shape[0],-1)
net=torch.nn.Sequential(
    torch.nn.Conv2d(1,16,5), 
    torch.nn.MaxPool2d(2), 
    torch.nn.ReLU(),
    torch.nn.Dropout2d(), 
    Flatten(),
    torch.nn.Linear(2304,1))
loss_fn=torch.nn.BCEWithLogitsLoss()
#optimizer= torch.optim.Adam(net.parameters())
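
Dropout2d zeroes out entire channels during training and acts as the identity in eval mode; a minimal sketch to see the difference (the toy tensor is illustrative):

drop = torch.nn.Dropout2d(p=0.5)
toy = torch.ones(1,4,2,2)
drop.train()
drop(toy)[0,:,0,0]   # some channels zeroed; survivors scaled by 1/(1-p)=2
drop.eval()
drop(toy)[0,:,0,0]   # identity: all ones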

Step 3: create the learner (lrnr) and fit

lrnr1 = Learner(dls,net,opt_func=Adam,loss_func=loss_fn) 
lrnr1.fit(10)
epoch train_loss valid_loss time
0 0.430748 0.218082 00:00
1 0.262507 0.093608 00:00
2 0.178563 0.070512 00:00
3 0.132288 0.061353 00:00
4 0.104299 0.055355 00:00
5 0.086001 0.050486 00:00
6 0.073517 0.047254 00:00
7 0.064433 0.044630 00:00
8 0.057706 0.042601 00:00
9 0.053289 0.040278 00:00

- Visualizing the result:

plt.plot(a2(net(X.to("cuda:0")).to("cpu").data),'.')
[<matplotlib.lines.Line2D at 0x7f3f9b66d9d0>]

- Fast, and the fit is good too.

3. resnet34 (using an existing network, pure fastai)

- Build a new DataLoaders from the data and call it dls2.

path=untar_data(URLs.MNIST_SAMPLE) 
path
Path('/home/cgb4/.fastai/data/mnist_sample')
dls2=ImageDataLoaders.from_folder(
    path,
    train='train',
    valid_pct=0.2)     
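
To eyeball what from_folder produced, fastai's show_batch can be used (a usage sketch):

dls2.show_batch(max_n=9)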

- Create a learner object and train it.

lrnr2=cnn_learner(dls2,resnet34,metrics=error_rate)
lrnr2.fine_tune(1)
epoch train_loss valid_loss error_rate time
0 0.284949 0.159780 0.055787 00:08
epoch train_loss valid_loss error_rate time
0 0.042842 0.016358 0.006584 00:09

- Inspect the results

lrnr2.show_results()

How to dissect a model (lrnr1.model)

- First, go back to approach 2.

net(X.to("cuda:0"))
tensor([[-8.1382],
        [-6.9877],
        [ 0.7937],
        ...,
        [12.1038],
        [15.0634],
        [ 7.9055]], device='cuda:0', grad_fn=<AddmmBackward>)

- Network structure

net
Sequential(
  (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): ReLU()
  (3): Dropout2d(p=0.5, inplace=False)
  (4): Flatten()
  (5): Linear(in_features=2304, out_features=1, bias=True)
)

- Layer-by-layer transformations

print(X.shape, '--> input image')
print(net[0](X.to("cuda:0")).shape, '--> 2dConv')
print(net[1](net[0](X.to("cuda:0"))).shape, '--> MaxPool2d')
print(net[2](net[1](net[0](X.to("cuda:0")))).shape, '--> ReLU')
print(net[3](net[2](net[1](net[0](X.to("cuda:0"))))).shape, '--> Dropout2d')
print(net[4](net[3](net[2](net[1](net[0](X.to("cuda:0")))))).shape, '--> Flatten')
print(net[5](net[4](net[3](net[2](net[1](net[0](X.to("cuda:0"))))))).shape, '--> Linear')
torch.Size([12396, 1, 28, 28]) --> input image
torch.Size([12396, 16, 24, 24]) --> 2dConv
torch.Size([12396, 16, 12, 12]) --> MaxPool2d
torch.Size([12396, 16, 12, 12]) --> ReLU
torch.Size([12396, 16, 12, 12]) --> Dropout2d
torch.Size([12396, 2304]) --> Flatten
torch.Size([12396, 1]) --> Linear
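
The same trace can be written as a loop instead of nested calls (a sketch, assuming net is still on the GPU):

h = X.to("cuda:0")
for layer in net:
    h = layer(h)
    print(h.shape, '-->', layer.__class__.__name__)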

- Final output

net[5](net[4](net[3](net[2](net[1](net[0](X.to("cuda:0")))))))
tensor([[-8.1382],
        [-6.9877],
        [ 0.7937],
        ...,
        [12.1038],
        [15.0634],
        [ 7.9055]], device='cuda:0', grad_fn=<AddmmBackward>)
net(X.to("cuda:0"))
tensor([[-8.1382],
        [-6.9877],
        [ 0.7937],
        ...,
        [12.1038],
        [15.0634],
        [ 7.9055]], device='cuda:0', grad_fn=<AddmmBackward>)

- The layer-by-layer transformations can also be traced through lrnr1 itself (using the fact that lrnr1.model is net).

lrnr1.model
Sequential(
  (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): ReLU()
  (3): Dropout2d(p=0.5, inplace=False)
  (4): Flatten()
  (5): Linear(in_features=2304, out_features=1, bias=True)
)
lrnr1.model[0]
Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
lrnr1.model(X.to("cuda:0"))
tensor([[-8.1382],
        [-6.9877],
        [ 0.7937],
        ...,
        [12.1038],
        [15.0634],
        [ 7.9055]], device='cuda:0', grad_fn=<AddmmBackward>)
print(X.shape, '--> input image')
print(lrnr1.model[0](X.to("cuda:0")).shape, '--> 2dConv')
print(lrnr1.model[1](lrnr1.model[0](X.to("cuda:0"))).shape, '--> MaxPool2d')
print(lrnr1.model[2](lrnr1.model[1](lrnr1.model[0](X.to("cuda:0")))).shape, '--> ReLU')
print(lrnr1.model[3](lrnr1.model[2](lrnr1.model[1](lrnr1.model[0](X.to("cuda:0"))))).shape, '--> Dropout2d')
print(lrnr1.model[4](lrnr1.model[3](lrnr1.model[2](lrnr1.model[1](lrnr1.model[0](X.to("cuda:0")))))).shape, '--> Flatten')
print(lrnr1.model[5](lrnr1.model[4](lrnr1.model[3](lrnr1.model[2](lrnr1.model[1](lrnr1.model[0](X.to("cuda:0"))))))).shape, '--> Linear')
torch.Size([12396, 1, 28, 28]) --> input image
torch.Size([12396, 16, 24, 24]) --> 2dConv
torch.Size([12396, 16, 12, 12]) --> MaxPool2d
torch.Size([12396, 16, 12, 12]) --> ReLU
torch.Size([12396, 16, 12, 12]) --> Dropout2d
torch.Size([12396, 2304]) --> Flatten
torch.Size([12396, 1]) --> Linear

- Summary: the model always splits into a 2d part and a 1d part, as below.

torch.Size([12396, 1, 28, 28]) --> input image
torch.Size([12396, 16, 24, 24]) --> 2dConv
torch.Size([12396, 16, 12, 12]) --> MaxPool2d
torch.Size([12396, 16, 12, 12]) --> ReLU
torch.Size([12396, 16, 12, 12]) --> Dropout2d
===============================================================
torch.Size([12396, 2304]) --> Flatten
torch.Size([12396, 1]) --> Linear

- 2d-part:

  • 2d linear transform: torch.nn.Conv2d()
  • 2d nonlinear transforms: torch.nn.MaxPool2d(), torch.nn.ReLU()

- 1d-part:

  • 1d linear transform: torch.nn.Linear()
  • 1d nonlinear transform: torch.nn.ReLU()

Split net accordingly into the two parts:
_net1=torch.nn.Sequential(
    net[0],
    net[1],
    net[2],
    net[3])
_net2=torch.nn.Sequential(
    net[4],
    net[5])
_net1
Sequential(
  (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): ReLU()
  (3): Dropout2d(p=0.5, inplace=False)
)
_net2
Sequential(
  (0): Flatten()
  (1): Linear(in_features=2304, out_features=1, bias=True)
)
_net=torch.nn.Sequential(_net1,_net2)
_net[1](_net[0](X.to('cuda:0')))
tensor([[-8.1382],
        [-6.9877],
        [ 0.7937],
        ...,
        [12.1038],
        [15.0634],
        [ 7.9055]], device='cuda:0', grad_fn=<AddmmBackward>)

Analyzing lrnr2.model

- The model below is resnet, one of the best-performing (state-of-the-art) models today.

lrnr2.model
Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (6): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (4): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (5): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (7): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (1): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=1)
      (mp): AdaptiveMaxPool2d(output_size=1)
    )
    (1): Flatten(full=False)
    (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25, inplace=False)
    (4): Linear(in_features=1024, out_features=512, bias=False)
    (5): ReLU(inplace=True)
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.5, inplace=False)
    (8): Linear(in_features=512, out_features=2, bias=False)
  )
)

- Features

  • 2d-part: the input has 3 channels; Conv2d uses padding/stride options; there is no dropout; there is batch normalization.
  • 1d-part: there is batch normalization; the output dimension is 2.

DLS, Networks

  • The form of the dls must match the form of the network.
  • MLP model: the input is $784$; the first layer is a torch.nn.Linear() mapping $784 \to 30$.
  • CNN model: the input is $1\times 28\times 28$; the first layer is a torch.nn.Conv2d() mapping $1\times 28\times 28 \to 16\times 24\times 24$.
  • Resnet34: the input is $3\times 28\times 28$; the first layer maps $3\times 28\times 28 \to ??$
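
One way to answer the ?? (a sketch, assuming lrnr2.model is still on the GPU): push a dummy batch through the first conv layer and read off the output shape.

dummy = torch.zeros(1,3,28,28).to("cuda:0")
lrnr2.model[0][0](dummy).shape   # Conv2d(3,64,7,stride=2,padding=3): (28+2*3-7)//2+1 = 14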

Reference

| $y$ | distributional assumption | final-layer activation | loss function (PyTorch) |
|---|---|---|---|
| 3.45, 4.43, ... (continuous) | normal | Linear | MSELoss |
| 0 or 1 | binomial (Bernoulli) | Sigmoid | BCELoss |
| [0,0,1], [0,1,0], [1,0,0] | multinomial | Softmax | CrossEntropyLoss |

The four axes of deep-learning research

(1) Architecture $(\star)$

  • Does not seem to require specialized expertise in any one field.
  • Persistence, a little luck, intuition, a good computer..

(2) Loss functions

  • Requires statistical knowledge // usually a modification of an existing loss function (e.g., adding a penalty term)

(3) Gradient computation

  • Requires knowledge of parallel processing, etc.

(4) Optimizers

  • Requires a theoretical foundation in optimization

- Architecture research before deep learning:

  • Parametric models: experts
  • Nonparametric models: experts
  • Deep learning: relative non-experts

- Characteristics: non-experts can build models + black box (we can open up the internal computations, but they are hard for us to interpret)

- Hence the demand for explainable deep learning (XAI)

An explainable CNN model

- The model so far:

  • Stage 1: 2d linear transforms $\to$ 2d nonlinear transforms
  • Stage 2: Flatten $\to$ MLP

- Review lrnr1's model (the one built earlier) again

lrnr1.model
Sequential(
  (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): ReLU()
  (3): Dropout2d(p=0.5, inplace=False)
  (4): Flatten()
  (5): Linear(in_features=2304, out_features=1, bias=True)
)
net1=torch.nn.Sequential(
    lrnr1.model[0],
    lrnr1.model[1],
    lrnr1.model[2],
    lrnr1.model[3])
net1(X.to('cuda:0')).shape
torch.Size([12396, 16, 12, 12])

- Visualize the output after stage 1

fig, axs = plt.subplots(4,4) 
k=0
for i in range(4):
    for j in range(4):
        axs[i,j].imshow(net1(X.to("cuda:0"))[0][k].to("cpu").data)
        k=k+1
fig.set_figheight(8)
fig.set_figwidth(8)
fig.tight_layout()

Keep net1 + change the structure of net2!!

lrnr1.model
Sequential(
  (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): ReLU()
  (3): Dropout2d(p=0.5, inplace=False)
  (4): Flatten()
  (5): Linear(in_features=2304, out_features=1, bias=True)
)

- Plan

  • net2 before: $(n,16,12,12) \overset{flatten}{\Longrightarrow} (n,?) \overset{Linear(?,1)}{\Longrightarrow} (n,1)$
  • net2 after: $(n,16,12,12) \overset{gap+flatten}{\Longrightarrow} (n,16) \overset{Linear(16,1)}{\Longrightarrow} (n,1)$

- gap: average the $12\times 12$ pixels into a single representative value (why?)

ap=torch.nn.AdaptiveAvgPool2d(output_size=1)
ap(net1(X.to("cuda:0"))).shape
torch.Size([12396, 16, 1, 1])

--

Supplement: ap is just the mean

torch.tensor([[0.1,0.2],[0.3,0.4]])
tensor([[0.1000, 0.2000],
        [0.3000, 0.4000]])
ap(torch.tensor([[0.1,0.2],[0.3,0.4]]))
tensor([[0.2500]])

--

- flatten

flatten(ap(net1(X.to("cuda:0")))).shape
torch.Size([12396, 16])

- linear

_l1=torch.nn.Linear(16,1,bias=False) 
_l1.to("cuda:0")
Linear(in_features=16, out_features=1, bias=False)
_l1(flatten(ap(net1(X.to("cuda:0"))))).shape
torch.Size([12396, 1])

- Assemble this into net2 $\to$ bundle (net1, net2) into one new network.

net2=torch.nn.Sequential(
    torch.nn.AdaptiveAvgPool2d(1),
    Flatten(),
    torch.nn.Linear(16,1,bias=False))
net=torch.nn.Sequential(net1,net2) 
net
Sequential(
  (0): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): Dropout2d(p=0.5, inplace=False)
  )
  (1): Sequential(
    (0): AdaptiveAvgPool2d(output_size=1)
    (1): Flatten()
    (2): Linear(in_features=16, out_features=1, bias=False)
  )
)

- Build lrnr3 with the modified network and retrain

ds=torch.utils.data.TensorDataset(X,y)
ds1,ds2=torch.utils.data.random_split(ds,[10000,2396])
dl1=torch.utils.data.DataLoader(ds1,batch_size=1000)
dl2=torch.utils.data.DataLoader(ds2,batch_size=2396)
dls=DataLoaders(dl1,dl2)
lrnr3=Learner(dls,net,opt_func=Adam,loss_func=loss_fn,lr=0.1)
lrnr3.fit(10)
epoch train_loss valid_loss time
0 0.224906 0.109848 00:00
1 0.224250 0.108319 00:00
2 0.224349 0.101010 00:00
3 0.225390 0.109090 00:00
4 0.223415 0.098581 00:00
5 0.219940 0.092961 00:00
6 0.217830 0.105528 00:00
7 0.215317 0.097136 00:00
8 0.212853 0.094199 00:00
9 0.212849 0.100530 00:00

CAM: fix a single observation and visualize by swapping the order of the layers in net2

- Plan

  • net2 before: $(n,16,12,12) \overset{flatten}{\Longrightarrow} (n,?) \overset{Linear(?,1)}{\Longrightarrow} (n,1)$
  • net2 after: $(n,16,12,12) \overset{gap+flatten}{\Longrightarrow} (n,16) \overset{Linear(16,1)}{\Longrightarrow} (n,1)$
  • CAM: $(1,16,12,12) \overset{Linear(16,1)+flatten}{\Longrightarrow} (12,12) \overset{gap}{\Longrightarrow} 1$

- Preparation 1: pick one sample to visualize.

x=X[100]
X.shape,x.shape
(torch.Size([12396, 1, 28, 28]), torch.Size([1, 28, 28]))
  • The dimensions differ, so this would cause problems when fed into the network later $\to$ match the dimensions
x=x.reshape(1,1,28,28) 
plt.imshow(x.squeeze())
<matplotlib.image.AxesImage at 0x7f4058887250>

- Preparation 2: move each network to the cpu for computation and visualization (they are on the GPU right after training with fastai).

net1.to('cpu')
net2.to('cpu')
Sequential(
  (0): AdaptiveAvgPool2d(output_size=1)
  (1): Flatten()
  (2): Linear(in_features=16, out_features=1, bias=False)
)

- Check the forward pass: remember this value.

net2(net1(x)) ## negative, so the CNN decides class=7
tensor([[-5.3201]], grad_fn=<MmBackward>)

- Modify net2 and check the forward value

net2
Sequential(
  (0): AdaptiveAvgPool2d(output_size=1)
  (1): Flatten()
  (2): Linear(in_features=16, out_features=1, bias=False)
)
  • Swap the order in which Linear and AdaptiveAvgPool2d are applied in net2

Check the dimensions

net1(x).squeeze().shape
torch.Size([16, 12, 12])
net2[2].weight.squeeze().shape
torch.Size([16])

Apply Linear(in_features=16, out_features=1, bias=False): $16 \times (16,12,12) \to (12,12)$

net2[2].weight.squeeze() @ net1(x).squeeze()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_56901/2879013373.py in <module>
----> 1 net2[2].weight.squeeze() @ net1(x).squeeze()

RuntimeError: mat1 and mat2 shapes cannot be multiplied (192x12 and 16x1)
  • Fails..
camimg=torch.einsum('i,ijk -> jk',net2[2].weight.squeeze(), net1(x).squeeze()) 
camimg.shape
torch.Size([12, 12])
  • Success
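
The einsum above contracts the 16 weights against the 16 channel maps; an equivalent sketch with plain broadcasting:

w = net2[2].weight.squeeze()        # (16,)
h = net1(x).squeeze()               # (16, 12, 12)
torch.allclose((w[:,None,None]*h).sum(dim=0), camimg)   # expected: True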

Apply AdaptiveAvgPool2d(output_size=1)

ap(camimg)
tensor([[-5.3201]], grad_fn=<MeanBackward1>)

!!!! They're identical?

- The two values below are the same.

net2(net1(x)),ap(camimg)
(tensor([[-5.3201]], grad_fn=<MmBackward>),
 tensor([[-5.3201]], grad_fn=<MeanBackward1>))

- Because both ap and the linear transform are linear, their order can be swapped without changing the result.

- Ultimately the same principle as below:

_x= np.array([1,2,3,4])
_x
array([1, 2, 3, 4])
np.mean(_x*2+1)
6.0
2*np.mean(_x)+1
6.0

- Now let's turn our attention to camimg.

camimg
tensor([[   0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000,
            0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000],
        [   0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000,
            0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000],
        [   0.0000,    0.0000,    0.0000,    0.0000,    5.9098,    0.0000,
            0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000],
        [   0.0000,  -19.2289,    0.0000,    0.0000,    0.0000,   29.9900,
           37.3474,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000],
        [   0.0000,  -29.6655,    0.0000,    0.0000,    5.6040,   13.2785,
            1.8883,   30.8203,    4.3253,    0.0000,    0.0000,    0.0000],
        [   0.0000,  -21.9075,    0.0000,    0.0000,  -13.0436,   17.4574,
           30.4819,    6.4937,    0.0000,    0.0000,    0.0000,  -12.9101],
        [   0.0000,  -36.6223,    0.0000,    0.0000,   -3.6448,    0.0000,
            2.4763,   20.4865,    8.1199,    0.0000,    0.0000,   -7.6647],
        [  -2.8867, -175.0864, -110.9451,   -6.6493,    0.0000,    0.0000,
            0.0000,  -18.9344,  -43.5822,    0.0000,   -6.6248,    0.0000],
        [   2.1586,   -1.7907,   -9.5646,  -11.2632,    0.0000,    0.0000,
            0.0000,  -33.7909,    0.0000,    0.0000,  -14.1396,    0.0000],
        [   0.0000,    0.0000,   -0.4893,    0.0000,    0.0000,    0.0000,
          -15.7184,  -42.0344,    0.0000,   -3.9603,   -2.1219,    0.0000],
        [   0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000,
          -34.1539,  -15.9513,    0.0000,  -12.8468,    0.0000,    0.0000],
        [   0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000,
          -10.5112, -252.0293,   -1.3688,  -11.8025,    0.0000,    0.0000]],
       grad_fn=<ViewBackward>)
ap(camimg), torch.mean(camimg)
(tensor([[-5.3201]], grad_fn=<MeanBackward1>),
 tensor(-5.3201, grad_fn=<MeanBackward0>))
  • Most of the values in the image are 0, but the overall mean must ultimately come out negative.

- In the end, the mean is negative because certain pixels take large negative values.

  • The mean is negative $\leftrightarrow$ the image represents a 7.
  • A certain pixel has a large negative value $\leftrightarrow$ that pixel clearly signals that the image is a 7.

- So where are those pixels?

plt.imshow(camimg.data)
<matplotlib.image.AxesImage at 0x7f4058863d00>
  • The regions shown in green are the grounds on which the CNN model judged this digit to be a 7.

- Compare with the original image

plt.imshow(x.squeeze()) 
<matplotlib.image.AxesImage at 0x7f40587ce4c0>

- Overlaying the two images should give a nice picture.

- step1: draw the original image in grayscale.

plt.imshow(x.squeeze(),cmap='gray',alpha=0.5)
<matplotlib.image.AxesImage at 0x7f40587a7e20>

- step2: the original image is (28,28) while camimg is (12,12) pixels $\to$ stretch camimg's pixels.

plt.imshow(camimg.data,alpha=0.5, extent=(0,27,27,0),interpolation='bilinear',cmap='magma')
<matplotlib.image.AxesImage at 0x7f405870e970>

- step3: combine them.

plt.imshow(x.squeeze(),cmap='gray',alpha=0.5)
plt.imshow(camimg.data,alpha=0.5, extent=(0,27,27,0),interpolation='bilinear',cmap='magma')
<matplotlib.image.AxesImage at 0x7f40587338e0>

Homework

- Choose an image of the digit 3 as the observation and visualize it with CAM as above.