Lecture video

imports

import tensorflow as tf 
import tensorflow.experimental.numpy as tnp
tnp.experimental_enable_numpy_behavior()
import matplotlib.pyplot as plt

Summary of the previous lecture

- Preparing the image data

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
X = tf.constant(x_train.reshape(-1,28,28,1),dtype=tf.float64)
y = tf.keras.utils.to_categorical(y_train)
XX = tf.constant(x_test.reshape(-1,28,28,1),dtype=tf.float64)
yy = tf.keras.utils.to_categorical(y_test)
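
The preparation above does two things: `reshape(-1,28,28,1)` adds a trailing channel axis so that each image has shape (28, 28, 1), and `to_categorical` turns the integer labels into one-hot rows. A minimal NumPy sketch of both steps on synthetic data follows. The arrays `fake_images` and `fake_labels` are made up for illustration and are not Fashion-MNIST, and `np.eye(num_classes)[labels]` is only a stand-in for the kind of array `to_categorical` returns when there are 10 classes.

```python
import numpy as np

# Hypothetical stand-in data: 5 grayscale "images" of size 28x28 and 5 integer labels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([9, 0, 0, 3, 7])

# Add a trailing channel axis: (5, 28, 28) -> (5, 28, 28, 1).
X_demo = fake_images.reshape(-1, 28, 28, 1).astype(np.float64)

# One-hot encode the labels: row i has a single 1 at column fake_labels[i].
num_classes = 10
y_demo = np.eye(num_classes)[fake_labels]

print(X_demo.shape)  # (5, 28, 28, 1)
print(y_demo.shape)  # (5, 10)
print(y_demo[0])     # the only 1 is in the last position (class 9)
```

The channel axis is what `Conv2D` expects later on; the dense networks simply flatten it away.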

- net1 and net2

tf.random.set_seed(43052)
net1 = tf.keras.Sequential()
net1.add(tf.keras.layers.Flatten())
net1.add(tf.keras.layers.Dense(30,activation='relu'))
net1.add(tf.keras.layers.Dense(10,activation='softmax'))
net1.compile(loss=tf.losses.categorical_crossentropy,optimizer='adam',metrics=['accuracy'])
net1.fit(X,y,epochs=10)
Epoch 1/10
1875/1875 [==============================] - 2s 667us/step - loss: 2.5275 - accuracy: 0.4107
Epoch 2/10
1875/1875 [==============================] - 1s 652us/step - loss: 1.1509 - accuracy: 0.5523
Epoch 3/10
1875/1875 [==============================] - 1s 646us/step - loss: 0.8259 - accuracy: 0.6897
Epoch 4/10
1875/1875 [==============================] - 1s 648us/step - loss: 0.7387 - accuracy: 0.7099
Epoch 5/10
1875/1875 [==============================] - 1s 653us/step - loss: 0.7022 - accuracy: 0.7208
Epoch 6/10
1875/1875 [==============================] - 1s 645us/step - loss: 0.6801 - accuracy: 0.7274
Epoch 7/10
1875/1875 [==============================] - 1s 659us/step - loss: 0.6619 - accuracy: 0.7336
Epoch 8/10
1875/1875 [==============================] - 1s 642us/step - loss: 0.6487 - accuracy: 0.7377
Epoch 9/10
1875/1875 [==============================] - 1s 661us/step - loss: 0.6476 - accuracy: 0.7382
Epoch 10/10
1875/1875 [==============================] - 1s 637us/step - loss: 0.6418 - accuracy: 0.7391
<keras.callbacks.History at 0x7fbee0550a00>
net1.evaluate(X,y) # evaluated on the training data (X, y), not the test data (XX, yy)
1875/1875 [==============================] - 1s 576us/step - loss: 0.6093 - accuracy: 0.7475
[0.6092554330825806, 0.7475000023841858]
tf.random.set_seed(43052)
net2 = tf.keras.Sequential()
net2.add(tf.keras.layers.Flatten())
net2.add(tf.keras.layers.Dense(500,activation='relu'))
net2.add(tf.keras.layers.Dense(500,activation='relu'))
net2.add(tf.keras.layers.Dense(500,activation='relu'))
net2.add(tf.keras.layers.Dense(500,activation='relu'))
net2.add(tf.keras.layers.Dense(10,activation='softmax'))
net2.compile(loss=tf.losses.categorical_crossentropy,optimizer='adam',metrics=['accuracy'])
net2.fit(X,y,epochs=10)
Epoch 1/10
1875/1875 [==============================] - 2s 1ms/step - loss: 1.1254 - accuracy: 0.7872
Epoch 2/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.4578 - accuracy: 0.8363
Epoch 3/10
1875/1875 [==============================] - 2s 971us/step - loss: 0.4162 - accuracy: 0.8519
Epoch 4/10
1875/1875 [==============================] - 2s 942us/step - loss: 0.3892 - accuracy: 0.8611
Epoch 5/10
1875/1875 [==============================] - 2s 911us/step - loss: 0.3757 - accuracy: 0.8666
Epoch 6/10
1875/1875 [==============================] - 2s 992us/step - loss: 0.3584 - accuracy: 0.8729
Epoch 7/10
1875/1875 [==============================] - 2s 984us/step - loss: 0.3442 - accuracy: 0.8774
Epoch 8/10
1875/1875 [==============================] - 2s 978us/step - loss: 0.3349 - accuracy: 0.8804
Epoch 9/10
1875/1875 [==============================] - 2s 935us/step - loss: 0.3324 - accuracy: 0.8810
Epoch 10/10
1875/1875 [==============================] - 2s 863us/step - loss: 0.3185 - accuracy: 0.8844
<keras.callbacks.History at 0x7fbee0407b80>
net2.evaluate(XX,yy)
313/313 [==============================] - 0s 855us/step - loss: 0.4091 - accuracy: 0.8585
[0.40912720561027527, 0.8585000038146973]
  • A plain DNN (fully connected) architecture has its limits.

- net3

tf.random.set_seed(43052)
net3 = tf.keras.Sequential()
net3.add(tf.keras.layers.Conv2D(30,(2,2),activation='relu'))
net3.add(tf.keras.layers.MaxPool2D()) # same as net3.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
net3.add(tf.keras.layers.Conv2D(30,(2,2),activation='relu'))
net3.add(tf.keras.layers.MaxPool2D())
net3.add(tf.keras.layers.Flatten())
net3.add(tf.keras.layers.Dense(10,activation='softmax'))
net3.compile(loss=tf.losses.categorical_crossentropy,optimizer='adam',metrics=['accuracy'])
net3.fit(X,y,epochs=10)
Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.7715 - accuracy: 0.8092
Epoch 2/10
1875/1875 [==============================] - 2s 909us/step - loss: 0.3575 - accuracy: 0.8734
Epoch 3/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3155 - accuracy: 0.8868
Epoch 4/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2929 - accuracy: 0.8927
Epoch 5/10
1875/1875 [==============================] - 2s 927us/step - loss: 0.2760 - accuracy: 0.9002
Epoch 6/10
1875/1875 [==============================] - 2s 984us/step - loss: 0.2658 - accuracy: 0.9032
Epoch 7/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2575 - accuracy: 0.9058
Epoch 8/10
1875/1875 [==============================] - 2s 950us/step - loss: 0.2486 - accuracy: 0.9089
Epoch 9/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2439 - accuracy: 0.9096
Epoch 10/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2383 - accuracy: 0.9126
<keras.callbacks.History at 0x7fbee01be320>
net3.evaluate(XX,yy)
313/313 [==============================] - 0s 858us/step - loss: 0.3237 - accuracy: 0.8892
[0.32372841238975525, 0.88919997215271]
  • Easily breaks through the DNN's limit: test accuracy 0.8892, versus 0.8585 for net2.
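
net3 relies on `MaxPool2D()` with its default `pool_size=(2,2)`. As a rough illustration of what that layer does to a single channel, here is a NumPy sketch of 2x2 max pooling with stride 2 and no padding. The helper `max_pool_2x2` and the 5x5 `toy` array are hypothetical names for this example, not part of Keras, and the sketch handles one 2D channel only.

```python
import numpy as np

def max_pool_2x2(channel):
    """2x2 max pooling with stride 2 and no padding on one 2D channel."""
    h, w = channel.shape
    h2, w2 = h // 2, w // 2                  # an odd trailing row/column is dropped
    trimmed = channel[:h2 * 2, :w2 * 2]
    # View the channel as (h2, 2, w2, 2) blocks and take the max inside each block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

toy = np.arange(25).reshape(5, 5)            # hypothetical 5x5 feature map
pooled = max_pool_2x2(toy)
print(pooled)
# [[ 6  8]
#  [16 18]]

# A 27x27 channel shrinks to 13x13 because the 27th row and column are dropped.
print(max_pool_2x2(np.zeros((27, 27))).shape)  # (13, 13)
```

Pooling has no trainable weights; it only reduces the spatial size passed on to the next layer.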

- A look at net3's structure

net3.layers
[<keras.layers.convolutional.Conv2D at 0x7fbf52da3bb0>,
 <keras.layers.pooling.MaxPooling2D at 0x7fbee022bac0>,
 <keras.layers.convolutional.Conv2D at 0x7fbf52da2950>,
 <keras.layers.pooling.MaxPooling2D at 0x7fbee0234850>,
 <keras.layers.core.flatten.Flatten at 0x7fbee03ba320>,
 <keras.layers.core.dense.Dense at 0x7fbee019de40>]
cv1, mp1, cv2, mp2, flttn, d1 = net3.layers
print(XX.shape) ## image (2D)
print(cv1(XX).shape) ## image (2D)
print(mp1(cv1(XX)).shape) ## image (2D)
print(cv2(mp1(cv1(XX))).shape) ## image (2D)
print(mp2(cv2(mp1(cv1(XX)))).shape) ## image (2D)
print(flttn(mp2(cv2(mp1(cv1(XX))))).shape) ## flattened image (1D)
print(d1(flttn(mp2(cv2(mp1(cv1(XX)))))).shape) ## 1D vector of 10 class probabilities
(10000, 28, 28, 1)
(10000, 27, 27, 30)
(10000, 13, 13, 30)
(10000, 12, 12, 30)
(10000, 6, 6, 30)
(10000, 1080)
(10000, 10)
  • Isn't this rather complicated?
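
The shapes printed above follow from two small formulas, assuming the settings net3 uses: no padding, convolution stride 1, and pooling stride equal to the pool size. The sketch below reproduces them in plain Python. The helpers `conv_out` and `pool_out` are names invented for this example.

```python
def conv_out(n, kernel, stride=1):
    """Output length along one axis of a convolution without padding."""
    return (n - kernel) // stride + 1

def pool_out(n, pool=2):
    """Output length along one axis of pooling without padding, stride == pool."""
    return (n - pool) // pool + 1

filters = 30          # both Conv2D layers in net3 use 30 filters
side = 28             # input images are 28x28 with 1 channel
shapes = [(side, side, 1)]
for kind in ["conv", "pool", "conv", "pool"]:   # cv1, mp1, cv2, mp2
    side = conv_out(side, kernel=2) if kind == "conv" else pool_out(side)
    shapes.append((side, side, filters))
flat = side * side * filters   # what Flatten produces per image

for shape in shapes:
    print(shape)
print(flat)  # 1080
```

Each line matches the per-image part of the corresponding shape above: 28 → 27 → 13 → 12 → 6, then 6·6·30 = 1080.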

- Parameter comparison

net1.summary() ## 23,860
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (32, 784)                 0         
                                                                 
 dense (Dense)               (32, 30)                  23550     
                                                                 
 dense_1 (Dense)             (32, 10)                  310       
                                                                 
=================================================================
Total params: 23,860
Trainable params: 23,860
Non-trainable params: 0
_________________________________________________________________
net2.summary() # 1,149,010
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_1 (Flatten)         (None, 784)               0         
                                                                 
 dense_2 (Dense)             (None, 500)               392500    
                                                                 
 dense_3 (Dense)             (None, 500)               250500    
                                                                 
 dense_4 (Dense)             (None, 500)               250500    
                                                                 
 dense_5 (Dense)             (None, 500)               250500    
                                                                 
 dense_6 (Dense)             (None, 10)                5010      
                                                                 
=================================================================
Total params: 1,149,010
Trainable params: 1,149,010
Non-trainable params: 0
_________________________________________________________________
net3.summary() # 14,590
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 27, 27, 30)        150       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 30)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 12, 12, 30)        3630      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 6, 6, 30)         0         
 2D)                                                             
                                                                 
 flatten_2 (Flatten)         (None, 1080)              0         
                                                                 
 dense_7 (Dense)             (None, 10)                10810     
                                                                 
=================================================================
Total params: 14,590
Trainable params: 14,590
Non-trainable params: 0
_________________________________________________________________
14590/23860
0.6114836546521375
14590/1149010
0.012697887746842934
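
The totals reported by `summary()` can be checked by hand. A dense layer has `n_in * n_out` weights plus `n_out` biases, and a Conv2D layer has `kh * kw * c_in * c_out` weights plus `c_out` biases. The sketch below recomputes the three totals in plain Python. The helper names `dense_params` and `conv2d_params` are invented for this example.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(kh, kw, c_in, c_out):
    """Weights plus biases of a 2D convolution layer."""
    return kh * kw * c_in * c_out + c_out

# net1: Flatten(784) -> Dense(30) -> Dense(10)
net1_params = dense_params(784, 30) + dense_params(30, 10)

# net2: Flatten(784) -> Dense(500) x4 -> Dense(10)
net2_params = (dense_params(784, 500)
               + 3 * dense_params(500, 500)
               + dense_params(500, 10))

# net3: Conv2D(30,(2,2)) -> pool -> Conv2D(30,(2,2)) -> pool -> Flatten(1080) -> Dense(10)
net3_params = (conv2d_params(2, 2, 1, 30)
               + conv2d_params(2, 2, 30, 30)
               + dense_params(1080, 10))

print(net1_params, net2_params, net3_params)  # 23860 1149010 14590
print(round(net3_params / net2_params, 4))    # 0.0127
```

Most of net3's parameters (10,810 of 14,590) sit in the final dense layer; the two convolution layers together contribute only 3,780.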

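Good textbooks and references for studying TensorFlow

TensorFlow textbooks

- Textbook 1: http://www.kyobobook.co.kr/product/detailViewEng.laf?mallGb=ENG&ejkGb=ENG&barcode=9781838823412

  • Pros: covers TensorFlow 2.0; briefly introduces the basics; covers a wide range of topics.
  • Cons: the content is somewhat disorganized (hard to read); the code is not clean; the writing is really poor.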

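- Textbook 3: 텐서플로 딥러닝 프로그래밍 (김동근 / 가메출판사)

  • Pros: written in Korean, and the book at least has a coherent flow. It also includes material from the official guide (most books only go as far as the tutorials).
  • Cons: copies the official documentation almost verbatim.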

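Official website

- Tutorials: https://www.tensorflow.org/tutorials?hl=ko

  • Worth clicking through the tabs and reading everything from beginner to advanced.
  • Provides simple hands-on model exercises.
  • Most textbooks are published by copying the tutorial content (foreign textbooks included!).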

- Guide: https://www.tensorflow.org/guide?hl=ko

  • Answers to the question "why use TensorFlow rather than PyTorch?"
  • Improves your understanding of the framework itself, rather than explaining models.

- API: https://www.tensorflow.org/versions?hl=ko

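  • Shows the structure of tf's many modules.
  • Everything you get from the built-in help in Python, plus $\alpha$ (a bit more).
  • Lets you check very fine details that textbooks do not cover.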
tf.GradientTape?
Init signature: tf.GradientTape(persistent=False, watch_accessed_variables=True)
Docstring:     
Record operations for automatic differentiation.

Operations are recorded if they are executed within this context manager and
at least one of their inputs is being "watched".

Trainable variables (created by `tf.Variable` or `tf.compat.v1.get_variable`,
where `trainable=True` is default in both cases) are automatically watched.
Tensors can be manually watched by invoking the `watch` method on this context
manager.

For example, consider the function `y = x * x`. The gradient at `x = 3.0` can
be computed as:

>>> x = tf.constant(3.0)
>>> with tf.GradientTape() as g:
...   g.watch(x)
...   y = x * x
>>> dy_dx = g.gradient(y, x)
>>> print(dy_dx)
tf.Tensor(6.0, shape=(), dtype=float32)

GradientTapes can be nested to compute higher-order derivatives. For example,

>>> x = tf.constant(5.0)
>>> with tf.GradientTape() as g:
...   g.watch(x)
...   with tf.GradientTape() as gg:
...     gg.watch(x)
...     y = x * x
...   dy_dx = gg.gradient(y, x)  # dy_dx = 2 * x
>>> d2y_dx2 = g.gradient(dy_dx, x)  # d2y_dx2 = 2
>>> print(dy_dx)
tf.Tensor(10.0, shape=(), dtype=float32)
>>> print(d2y_dx2)
tf.Tensor(2.0, shape=(), dtype=float32)

By default, the resources held by a GradientTape are released as soon as
GradientTape.gradient() method is called. To compute multiple gradients over
the same computation, create a persistent gradient tape. This allows multiple
calls to the gradient() method as resources are released when the tape object
is garbage collected. For example:

>>> x = tf.constant(3.0)
>>> with tf.GradientTape(persistent=True) as g:
...   g.watch(x)
...   y = x * x
...   z = y * y
>>> dz_dx = g.gradient(z, x)  # (4*x^3 at x = 3)
>>> print(dz_dx)
tf.Tensor(108.0, shape=(), dtype=float32)
>>> dy_dx = g.gradient(y, x)
>>> print(dy_dx)
tf.Tensor(6.0, shape=(), dtype=float32)

By default GradientTape will automatically watch any trainable variables that
are accessed inside the context. If you want fine grained control over which
variables are watched you can disable automatic tracking by passing
`watch_accessed_variables=False` to the tape constructor:

>>> x = tf.Variable(2.0)
>>> w = tf.Variable(5.0)
>>> with tf.GradientTape(
...     watch_accessed_variables=False, persistent=True) as tape:
...   tape.watch(x)
...   y = x ** 2  # Gradients will be available for `x`.
...   z = w ** 3  # No gradients will be available as `w` isn't being watched.
>>> dy_dx = tape.gradient(y, x)
>>> print(dy_dx)
tf.Tensor(4.0, shape=(), dtype=float32)
>>> # No gradients will be available as `w` isn't being watched.
>>> dz_dy = tape.gradient(z, w)
>>> print(dz_dy)
None

Note that when using models you should ensure that your variables exist when
using `watch_accessed_variables=False`. Otherwise it's quite easy to make your
first iteration not have any gradients:

```python
a = tf.keras.layers.Dense(32)
b = tf.keras.layers.Dense(32)

with tf.GradientTape(watch_accessed_variables=False) as tape:
  tape.watch(a.variables)  # Since `a.build` has not been called at this point
                           # `a.variables` will return an empty list and the
                           # tape will not be watching anything.
  result = b(a(inputs))
  tape.gradient(result, a.variables)  # The result of this computation will be
                                      # a list of `None`s since a's variables
                                      # are not being watched.
```

Note that only tensors with real or complex dtypes are differentiable.
Init docstring:
Creates a new GradientTape.

Args:
  persistent: Boolean controlling whether a persistent gradient tape
    is created. False by default, which means at most one call can
    be made to the gradient() method on this object.
  watch_accessed_variables: Boolean controlling whether the tape will
    automatically `watch` any (trainable) variables accessed while the tape
    is active. Defaults to True meaning gradients can be requested from any
    result computed in the tape derived from reading a trainable `Variable`.
    If False users must explicitly `watch` any `Variable`s they want to
    request gradients from.
File:           ~/anaconda3/envs/tfgpu/lib/python3.10/site-packages/tensorflow/python/eager/backprop.py
Type:           type
Subclasses:     LossScaleGradientTape

General deep learning theory textbooks (not TensorFlow-specific)

- Deep Learning (Ian Goodfellow): https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=171345378

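  • Pros: a comparatively well-written, established textbook (there used to be a copy on every professor's bookshelf). Covers a vast amount of material, in depth, without errors. Free.
  • Cons: the Korean translation is terrible. The content is very difficult (the explanations are not written to be understood).
  • Free online version: https://www.deeplearningbook.org/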

- 기계학습 (Machine Learning; 오일석)

  • Pros: easy to understand; covers a fairly wide range of topics; explains concepts with relatively simple examples.

The beginning of CNNs: AlexNet

2d convolution