12wk-1: Cactus Image Classification

Author

최규빈

Published

December 3, 2024

1. Lecture Video

2. imports

import numpy as np
import pandas as pd 
import zipfile
import os
import PIL.Image
import matplotlib.pyplot as plt
#---#
import datasets
import transformers
import torchvision.transforms
import evaluate
import torch

3. Kaggle

A. ref

ref: https://www.kaggle.com/c/aerial-cactus-identification

B. Unzipping the data

with zipfile.ZipFile('aerial-cactus-identification.zip', 'r') as z:
    z.extractall('./data')
with zipfile.ZipFile('./data/test.zip', 'r') as z_test:
    z_test.extractall('./data')    
with zipfile.ZipFile('./data/train.zip', 'r') as z_train:
    z_train.extractall('./data')

C. A first look at the data

train_csv = pd.read_csv("./data/train.csv")
train_csv
id has_cactus
0 0004be2cfeaba1c0361d39e2b000257b.jpg 1
1 000c8a36845c0208e833c79c1bffedd1.jpg 1
2 000d1e9a533f62e55c289303b072733d.jpg 1
3 0011485b40695e9138e92d0b3fb55128.jpg 1
4 0014d7a11e90b62848904c1418fc8cf2.jpg 1
... ... ...
17495 ffede47a74e47a5930f81c0b6896479e.jpg 0
17496 ffef6382a50d23251d4bc05519c91037.jpg 1
17497 fff059ecc91b30be5745e8b81111dc7b.jpg 1
17498 fff43acb3b7a23edcc4ae937be2b7522.jpg 0
17499 fffd9e9b990eba07c836745d8aef1a3a.jpg 1

17500 rows × 2 columns

  • train contains a label for each image.
test_csv = pd.read_csv("./data/sample_submission.csv")
test_csv
id has_cactus
0 000940378805c44108d287872b2f04ce.jpg 0.5
1 0017242f54ececa4512b4d7937d1e21e.jpg 0.5
2 001ee6d8564003107853118ab87df407.jpg 0.5
3 002e175c3c1e060769475f52182583d0.jpg 0.5
4 0036e44a7e8f7218e9bc7bf8137e4943.jpg 0.5
... ... ...
3995 ffaafd0c9f2f0e73172848463bc2e523.jpg 0.5
3996 ffae37344310a1549162493237d25d3f.jpg 0.5
3997 ffbd469c56873d064326204aac546e0d.jpg 0.5
3998 ffcb76b7d47f29ece11c751e5f763f52.jpg 0.5
3999 fffed17d1a8e0433a934db518d7f532c.jpg 0.5

4000 rows × 2 columns

  • test has no labels for its images.
  • Our goal: estimate the probabilities well, write them into the has_cactus column of sample_submission, and submit the result to Kaggle.

4. Understanding Logits

A. What logits mean

- Understanding logits: suppose we are solving a classification problem with two classes, and the logits for 8 observations/examples are given as below.

logits = np.array(
    [[ 2.7346244, -3.1177292],
     [ 2.7103324, -3.1362345],
     [ 2.7464483, -3.0521457],
     [ 2.7195318, -3.122628 ],
     [ 2.7138977, -3.1041346],
     [ 2.7398622, -3.1098123],
     [ 0.0657177, -0.0930362],
     [-2.7668718,  3.0918367]]
)
logits
array([[ 2.7346244, -3.1177292],
       [ 2.7103324, -3.1362345],
       [ 2.7464483, -3.0521457],
       [ 2.7195318, -3.122628 ],
       [ 2.7138977, -3.1041346],
       [ 2.7398622, -3.1098123],
       [ 0.0657177, -0.0930362],
       [-2.7668718,  3.0918367]])

Logits generally have shape \((n,k)\), where \(n\) is the number of observations and \(k\) is the number of classes. This example has \(n=8\) and \(k=2\).

The meaning of each observation's logits is as follows.

  1. First observation ([2.7346244, -3.1177292]):
  • confidence for the first class: 2.7346244
  • confidence for the second class: -3.1177292
  2. Second observation ([2.7103324, -3.1362345]):
  • confidence for the first class: 2.7103324
  • confidence for the second class: -3.1362345

  3. Second-to-last observation ([0.0657177, -0.0930362]):
  • confidence for the first class: 0.0657177
  • confidence for the second class: -0.0930362
  4. Last observation ([-2.7668718, 3.0918367]):
  • confidence for the first class: -2.7668718
  • confidence for the second class: 3.0918367

B. Logits \(\to\) predicted class

- Let's walk through the logits \(\to\) predicted-class step.

  1. First observation: \(2.7346244 > -3.1177292\) \(\Rightarrow\) predict the first class

  2. Second observation: \(2.7103324 > -3.1362345\) \(\Rightarrow\) predict the first class

  3. Second-to-last observation: \(0.0657177 > -0.0930362\) \(\Rightarrow\) predict the first class

  4. Last observation: \(-2.7668718 < 3.0918367\) \(\Rightarrow\) predict the second class

logits
array([[ 2.7346244, -3.1177292],
       [ 2.7103324, -3.1362345],
       [ 2.7464483, -3.0521457],
       [ 2.7195318, -3.122628 ],
       [ 2.7138977, -3.1041346],
       [ 2.7398622, -3.1098123],
       [ 0.0657177, -0.0930362],
       [-2.7668718,  3.0918367]])
for u in logits:
    u1, u2 = u 
    if u1 > u2: 
        prediction = 0 
    else: 
        prediction = 1 
    print(prediction)
0
0
0
0
0
0
0
1

- The same result can also be obtained with the following.

logits.argmax(axis=1)
array([0, 0, 0, 0, 0, 0, 0, 1])

C. Logits \(\to\) predicted probabilities

- Let's walk through the logits \(\to\) predicted-probability step.

Let \({\boldsymbol u}=\begin{bmatrix} u_1 & \dots & u_k\end{bmatrix}\) denote the logits for a fixed observation. The probability of each class is then computed as follows.

\[\text{prob} =\left[\frac{\exp(u_1)}{\exp(u_1)+\dots+\exp(u_k)}, \cdots, \frac{\exp(u_k)}{\exp(u_1)+\dots+\exp(u_k)}\right]\]

logits
array([[ 2.7346244, -3.1177292],
       [ 2.7103324, -3.1362345],
       [ 2.7464483, -3.0521457],
       [ 2.7195318, -3.122628 ],
       [ 2.7138977, -3.1041346],
       [ 2.7398622, -3.1098123],
       [ 0.0657177, -0.0930362],
       [-2.7668718,  3.0918367]])
for u in logits:
    u1,u2 = u
    p1 = np.exp(u1) / (np.exp(u1)+np.exp(u2))
    p2 = np.exp(u2) / (np.exp(u1)+np.exp(u2))
    prediction_scores = [p1,p2]
    print(prediction_scores)
[0.9971351022231982, 0.0028648977768018545]
[0.9971185237682113, 0.0028814762317887605]
[0.9969773496339113, 0.0030226503660887583]
[0.9971058336243687, 0.0028941663756314024]
[0.997035364944602, 0.0029646350553980375]
[0.9971274386623321, 0.002872561337667959]
[0.5396053294830294, 0.4603946705169706]
[0.0028468010295870897, 0.997153198970413]

- These probabilities can also be obtained with the following.

torch.tensor(logits).softmax(dim=1)
tensor([[0.9971, 0.0029],
        [0.9971, 0.0029],
        [0.9970, 0.0030],
        [0.9971, 0.0029],
        [0.9970, 0.0030],
        [0.9971, 0.0029],
        [0.5396, 0.4604],
        [0.0028, 0.9972]], dtype=torch.float64)
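
The same values can be reproduced in plain numpy. A minimal sketch (not from the lecture): subtracting each row's maximum before exponentiating leaves the probabilities unchanged but avoids overflow in exp for large logits.

def np_softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilization; cancels in the ratio
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
np_softmax(logits)  # should match torch.tensor(logits).softmax(dim=1) above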

5. Evaluation Metrics

A. Computing accuracy

- Computing accuracy: suppose logits and labels are given as below.

logits = np.array(
    [[ 2.7346244, -3.1177292],
     [ 2.7103324, -3.1362345],
     [ 2.7464483, -3.0521457],
     [ 2.7195318, -3.122628 ],
     [ 2.7138977, -3.1041346],
     [ 2.7398622, -3.1098123],
     [ 0.0657177, -0.0930362],
     [-2.7668718,  3.0918367]]
)
references = labels = np.array([0,0,0,0,0,0,1,1])
predictions = logits.argmax(axis=1)
predictions
array([0, 0, 0, 0, 0, 0, 0, 1])
references
array([0, 0, 0, 0, 0, 0, 1, 1])

Accuracy can be computed as follows.

7/8
0.875
(predictions == references).sum() / 8
0.875
(predictions == references).mean()
0.875

It can also be computed like this.

acc = evaluate.load("accuracy")
acc.compute(predictions = predictions, references= references)
{'accuracy': 0.875}

B. Computing recall

- Sometimes we want to know specifically how well the model catches the 1s.

\[\text{recall}= \frac{\text{number of observations with true label 1 that are correctly predicted}}{\text{number of observations with true label 1}}\]

logits = np.array(
    [[ 2.7346244, -3.1177292],
     [ 2.7103324, -3.1362345],
     [ 2.7464483, -3.0521457],
     [ 2.7195318, -3.122628 ],
     [ 2.7138977, -3.1041346],
     [ 2.7398622, -3.1098123],
     [ 0.0657177, -0.0930362],
     [-2.7668718,  3.0918367]]
)
references = labels = np.array([0,0,0,0,0,0,1,1])
predictions = logits.argmax(axis=1)
predictions
array([0, 0, 0, 0, 0, 0, 0, 1])
(predictions[references == 1]==1).mean() # recall 
0.5

The same value can be obtained as follows.

rec = evaluate.load("recall")
rec.compute(predictions = predictions, references = references)
{'recall': 0.5}

C. Computing AUC

- Evaluation metrics other than accuracy

- AUC: a meaningful metric when the classes are imbalanced

logits = np.array(
    [[ 2.7346244, -3.1177292],
     [ 2.7103324, -3.1362345],
     [ 2.7464483, -3.0521457],
     [ 2.7195318, -3.122628 ],
     [ 2.7138977, -3.1041346],
     [ 2.7398622, -3.1098123],
     [ 0.0657177, -0.0930362],
     [-2.7668718,  3.0918367]]
)
references = labels = np.array([0,0,0,0,0,0,1,1])
probabilities = torch.tensor(logits).softmax(dim=1).numpy()
prediction_scores = probabilities[:,1]
prediction_scores
array([0.0028649 , 0.00288148, 0.00302265, 0.00289417, 0.00296464,
       0.00287256, 0.46039467, 0.9971532 ])
  • What if we predicted 1 whenever the probability is at least 0.4? \(\to\) Wouldn't that classify everything correctly?
roc_auc = evaluate.load("roc_auc")
roc_auc.compute(prediction_scores=prediction_scores, references=references)
{'roc_auc': 1.0}
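
As a sanity check (a sketch, not part of the original notes), this AUC can be reproduced from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score. There are no tied scores here, so a strict comparison suffices.

pos = prediction_scores[references == 1]   # scores of the true-1 observations
neg = prediction_scores[references == 0]   # scores of the true-0 observations
(pos[:, None] > neg[None, :]).mean()       # 1.0 -- every positive outranks every negative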

Example 1 – Visualization

logits = np.array(
    [[ 2.7346244, -3.1177292],
     [ 2.7103324, -3.1362345],
     [ 2.7464483, -3.0521457],
     [ 2.7195318, -3.122628 ],
     [ 2.7138977, -3.1041346],
     [ 2.7398622, -3.1098123],
     [ 0.0657177, -0.0930362],
     [-2.7668718,  3.0918367]]
)
references = labels = np.array([0,0,0,0,0,0,1,1])
predictions = logits.argmax(axis=1)  # recompute here; the metrics below depend on it
probabilities = torch.tensor(logits).softmax(dim=1).numpy()
prediction_scores = probabilities[:,1]
plt.plot(prediction_scores,'--o')
plt.plot(labels,'x')
plt.axhline(y=0.5,color='red',linestyle='--')
acc = evaluate.load("accuracy")
rec = evaluate.load("recall")
roc_auc = evaluate.load("roc_auc")
print(acc.compute(predictions=predictions,references=references))
print(rec.compute(predictions=predictions,references=references))
print(roc_auc.compute(prediction_scores=prediction_scores,references=references))
{'accuracy': 0.875}
{'recall': 0.5}
{'roc_auc': 1.0}

Accuracy and recall are imperfect at the 0.5 threshold, yet the AUC is 1: every positive example scores above every negative one, so some threshold (e.g. 0.4) would classify everything correctly.

Example 2 – Visualization

logits = np.array(
    [[ 2.7346244, -3.1177292],
     [ 2.7103324, -3.1362345],
     [ 2.7464483, -3.0521457],
     [ 2.7195318, -3.122628 ],
     [ 2.7138977, -3.1041346],
     [ 0.0657177, -0.0930362],
     [ 2.7398622, -3.1098123],
     [-2.7668718,  3.0918367]]
)
references = labels = np.array([0,0,0,0,0,0,1,1])
predictions = logits.argmax(axis=1)  # recompute here; the metrics below depend on it
probabilities = torch.tensor(logits).softmax(dim=1).numpy()
prediction_scores = probabilities[:,1]
plt.plot(prediction_scores,'--o')
plt.plot(labels,'x')
plt.axhline(y=0.5,color='red',linestyle='--')
acc = evaluate.load("accuracy")
rec = evaluate.load("recall")
roc_auc = evaluate.load("roc_auc")
print(acc.compute(predictions=predictions,references=references))
print(rec.compute(predictions=predictions,references=references))
print(roc_auc.compute(prediction_scores=prediction_scores,references=references))
{'accuracy': 0.875}
{'recall': 0.5}
{'roc_auc': 0.5833333333333333}

Compared with Example 1, the hard predictions (and hence accuracy and recall) are unchanged, but the AUC drops: one positive example now scores below most of the negatives, so no single threshold separates the two classes.

6. Analysis

- Load train.csv with pandas

train_csv = pd.read_csv("./data/train.csv")
train_csv
id has_cactus
0 0004be2cfeaba1c0361d39e2b000257b.jpg 1
1 000c8a36845c0208e833c79c1bffedd1.jpg 1
2 000d1e9a533f62e55c289303b072733d.jpg 1
3 0011485b40695e9138e92d0b3fb55128.jpg 1
4 0014d7a11e90b62848904c1418fc8cf2.jpg 1
... ... ...
17495 ffede47a74e47a5930f81c0b6896479e.jpg 0
17496 ffef6382a50d23251d4bc05519c91037.jpg 1
17497 fff059ecc91b30be5745e8b81111dc7b.jpg 1
17498 fff43acb3b7a23edcc4ae937be2b7522.jpg 0
17499 fffd9e9b990eba07c836745d8aef1a3a.jpg 1

17500 rows × 2 columns

test_csv = pd.read_csv("./data/sample_submission.csv")
test_csv
id has_cactus
0 000940378805c44108d287872b2f04ce.jpg 0.5
1 0017242f54ececa4512b4d7937d1e21e.jpg 0.5
2 001ee6d8564003107853118ab87df407.jpg 0.5
3 002e175c3c1e060769475f52182583d0.jpg 0.5
4 0036e44a7e8f7218e9bc7bf8137e4943.jpg 0.5
... ... ...
3995 ffaafd0c9f2f0e73172848463bc2e523.jpg 0.5
3996 ffae37344310a1549162493237d25d3f.jpg 0.5
3997 ffbd469c56873d064326204aac546e0d.jpg 0.5
3998 ffcb76b7d47f29ece11c751e5f763f52.jpg 0.5
3999 fffed17d1a8e0433a934db518d7f532c.jpg 0.5

4000 rows × 2 columns

A. The tidy(?) textbook code

Step1: Data

ctx_train = datasets.Dataset.from_pandas(train_csv)
ctx_test = datasets.Dataset.from_pandas(test_csv).remove_columns(['has_cactus'])
ctx_train = ctx_train.map(lambda example: {'path': "./data/train/" + example['id']})
ctx_test = ctx_test.map(lambda example: {'path': "./data/test/" + example['id']})
Map: 100%|██████████| 17500/17500 [00:00<00:00, 59071.71 examples/s]
Map: 100%|██████████| 4000/4000 [00:00<00:00, 86647.98 examples/s]
ctx = datasets.DatasetDict({
    'train':ctx_train,
    'test':ctx_test
})
ctx
DatasetDict({
    train: Dataset({
        features: ['id', 'has_cactus', 'path'],
        num_rows: 17500
    })
    test: Dataset({
        features: ['id', 'path'],
        num_rows: 4000
    })
})
compose = torchvision.transforms.Compose([
    lambda path: PIL.Image.open(path),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Resize((224,224))
])
def w_trans(examples):
    # train: examples = {'id':[xx,xxx,....], 'has_cactus':[yy,yyy,...], 'path':[zz,zzz,...]}
    # test:  examples = {'id':[xx,xxx,....], 'path':[zz,zzz,...]}
    dct = dict()
    dct['pixel_values'] = torch.stack(list(map(compose, examples['path'])))
    try: 
        dct['labels'] = torch.tensor(examples['has_cactus'])
    except KeyError:
        pass  # the test split has no 'has_cactus' column, so no labels
    return dct 
ctx = ctx.with_transform(w_trans)
ctx 
DatasetDict({
    train: Dataset({
        features: ['id', 'has_cactus', 'path'],
        num_rows: 17500
    })
    test: Dataset({
        features: ['id', 'path'],
        num_rows: 4000
    })
})
ctx['train'][:2]
#ctx['test'][:2]
{'pixel_values': tensor([[[[0.5333, 0.5333, 0.5333,  ..., 0.6157, 0.6157, 0.6157],
           [0.5333, 0.5333, 0.5333,  ..., 0.6157, 0.6157, 0.6157],
           [0.5333, 0.5333, 0.5333,  ..., 0.6157, 0.6157, 0.6157],
           ...,
           [0.7176, 0.7176, 0.7176,  ..., 0.5451, 0.5451, 0.5451],
           [0.7176, 0.7176, 0.7176,  ..., 0.5451, 0.5451, 0.5451],
           [0.7176, 0.7176, 0.7176,  ..., 0.5451, 0.5451, 0.5451]],
 
          [[0.5412, 0.5412, 0.5412,  ..., 0.5255, 0.5255, 0.5255],
           [0.5412, 0.5412, 0.5412,  ..., 0.5255, 0.5255, 0.5255],
           [0.5412, 0.5412, 0.5412,  ..., 0.5255, 0.5255, 0.5255],
           ...,
           [0.6157, 0.6157, 0.6157,  ..., 0.4314, 0.4314, 0.4314],
           [0.6157, 0.6157, 0.6157,  ..., 0.4314, 0.4314, 0.4314],
           [0.6157, 0.6157, 0.6157,  ..., 0.4314, 0.4314, 0.4314]],
 
          [[0.4902, 0.4902, 0.4902,  ..., 0.5490, 0.5490, 0.5490],
           [0.4902, 0.4902, 0.4902,  ..., 0.5490, 0.5490, 0.5490],
           [0.4902, 0.4902, 0.4902,  ..., 0.5490, 0.5490, 0.5490],
           ...,
           [0.6588, 0.6588, 0.6588,  ..., 0.5098, 0.5098, 0.5098],
           [0.6588, 0.6588, 0.6588,  ..., 0.5098, 0.5098, 0.5098],
           [0.6588, 0.6588, 0.6588,  ..., 0.5098, 0.5098, 0.5098]]],
 
 
         [[[0.4627, 0.4627, 0.4627,  ..., 0.4824, 0.4824, 0.4824],
           [0.4627, 0.4627, 0.4627,  ..., 0.4824, 0.4824, 0.4824],
           [0.4627, 0.4627, 0.4627,  ..., 0.4824, 0.4824, 0.4824],
           ...,
           [0.3647, 0.3647, 0.3647,  ..., 0.4941, 0.4941, 0.4941],
           [0.3647, 0.3647, 0.3647,  ..., 0.4941, 0.4941, 0.4941],
           [0.3647, 0.3647, 0.3647,  ..., 0.4941, 0.4941, 0.4941]],
 
          [[0.4275, 0.4275, 0.4275,  ..., 0.3804, 0.3804, 0.3804],
           [0.4275, 0.4275, 0.4275,  ..., 0.3804, 0.3804, 0.3804],
           [0.4275, 0.4275, 0.4275,  ..., 0.3804, 0.3804, 0.3804],
           ...,
           [0.3059, 0.3059, 0.3059,  ..., 0.4314, 0.4314, 0.4314],
           [0.3059, 0.3059, 0.3059,  ..., 0.4314, 0.4314, 0.4314],
           [0.3059, 0.3059, 0.3059,  ..., 0.4314, 0.4314, 0.4314]],
 
          [[0.4471, 0.4471, 0.4471,  ..., 0.4235, 0.4235, 0.4235],
           [0.4471, 0.4471, 0.4471,  ..., 0.4235, 0.4235, 0.4235],
           [0.4471, 0.4471, 0.4471,  ..., 0.4235, 0.4235, 0.4235],
           ...,
           [0.3255, 0.3255, 0.3255,  ..., 0.4745, 0.4745, 0.4745],
           [0.3255, 0.3255, 0.3255,  ..., 0.4745, 0.4745, 0.4745],
           [0.3255, 0.3255, 0.3255,  ..., 0.4745, 0.4745, 0.4745]]]]),
 'labels': tensor([1, 1])}
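
The transform is applied lazily: building ctx above was instant because images are only opened and resized when rows are accessed. A quick shape check (a sketch, not in the original notes):

batch = ctx['train'][:2]
batch['pixel_values'].shape, batch['labels'].shape
# expected: (torch.Size([2, 3, 224, 224]), torch.Size([2]))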

Step2: Model

model = transformers.AutoModelForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([2, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
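
The warning is expected: the 1000-class ImageNet head was discarded. A quick check (a sketch) confirms the newly initialized 2-unit classification head:

model.classifier
# expected: Sequential(Flatten, Linear(in_features=2048, out_features=2, bias=True))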

Step3: Train

# single_batch = [ctx['train'][0],ctx['train'][1]]
# single_batch # [{'pixel_values':xx, 'labels':yy},{'pixel_values':xxx, 'labels':yyy}]
# data_collator = transformers.DefaultDataCollator()
# data_collator(single_batch) # [{'pixel_values':xx, 'labels':yy},{'pixel_values':xxx, 'labels':yyy}] --> {'pixel_values':[xx,xxx], 'labels':[yy,yyy]}
# model.to("cpu")
# model(**data_collator(single_batch))
ImageClassifierOutputWithNoAttention(loss=tensor(0.6874, grad_fn=<NllLossBackward0>), logits=tensor([[0.0648, 0.0802],
        [0.0668, 0.0745]], grad_fn=<AddmmBackward0>), hidden_states=None)
data_collator = transformers.DefaultDataCollator()
data_collator
DefaultDataCollator(return_tensors='pt')
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits,axis=1)
    predictions_scores = torch.tensor(logits).softmax(dim=1).numpy()[:,1]
    acc = evaluate.load("accuracy")
    rec = evaluate.load("recall")
    roc_auc = evaluate.load("roc_auc")
    dct1 = acc.compute(predictions = predictions, references = labels) # {'accuracy':???}
    dct2 = rec.compute(predictions = predictions, references = labels) # {'recall':???}
    dct3 = roc_auc.compute(prediction_scores = predictions_scores, references = labels) # {'roc_auc':???}
    return dct1 | dct2 | dct3  # {'accuracy':???, 'recall':???, 'roc_auc':???}
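compute_metrics can be smoke-tested before handing it to the Trainer, since eval_pred is just a (logits, labels) pair. A minimal sketch with hypothetical toy values:

toy_logits = np.array([[ 2.7, -3.1],
                       [-2.7,  3.1]])
toy_labels = np.array([0, 1])
compute_metrics((toy_logits, toy_labels))
# expected: {'accuracy': 1.0, 'recall': 1.0, 'roc_auc': 1.0}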
training_args = transformers.TrainingArguments(
    output_dir="asdf",
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=False,
    report_to="none"
)
trainer = transformers.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=ctx["train"].select(range(1000)),
    eval_dataset=ctx["train"].select(range(1000,1500)),
    compute_metrics=compute_metrics,
)
trainer.train()
[60/60 00:28, Epoch 3/4]
Epoch Training Loss Validation Loss Accuracy Recall Roc Auc
0 0.671200 0.669483 0.738000 0.989247 0.346690
1 0.611900 0.612094 0.744000 1.000000 0.634871
2 0.604300 0.583503 0.748000 1.000000 0.862273
3 0.577600 0.572888 0.750000 1.000000 0.885774

TrainOutput(global_step=60, training_loss=0.6141141653060913, metrics={'train_runtime': 29.0371, 'train_samples_per_second': 137.755, 'train_steps_per_second': 2.066, 'total_flos': 8.103429948063744e+16, 'train_loss': 0.6141141653060913, 'epoch': 3.8095238095238093})

The run ends at epoch ≈ 3.81 rather than exactly 4; this appears to come from gradient accumulation: the 63 mini-batches per epoch collapse into 15 optimizer steps, so 4 epochs are capped at 60 global steps.

Step4: Prediction

out = trainer.predict(ctx['test'])
out
PredictionOutput(predictions=array([[-0.14711462,  0.28146246],
       [-0.0999002 ,  0.31734663],
       [-0.13772886,  0.21717918],
       ...,
       [-0.10952485,  0.2599906 ],
       [-0.08912332,  0.41566697],
       [-0.13212559,  0.30518785]], dtype=float32), label_ids=None, metrics={'test_runtime': 4.4363, 'test_samples_per_second': 901.655, 'test_steps_per_second': 56.353})

Note that label_ids is None and no accuracy or recall is reported: the test split has no has_cactus column, so w_trans produces no labels.
logits = out.predictions
has_cactus = torch.tensor(logits).softmax(dim=1).numpy()[:,1]
has_cactus
array([0.60553384, 0.6028243 , 0.58780724, ..., 0.59134185, 0.6235844 ,
       0.6076187 ], dtype=float32)
test_csv['has_cactus']= has_cactus
test_csv
id has_cactus
0 000940378805c44108d287872b2f04ce.jpg 0.605534
1 0017242f54ececa4512b4d7937d1e21e.jpg 0.602824
2 001ee6d8564003107853118ab87df407.jpg 0.587807
3 002e175c3c1e060769475f52182583d0.jpg 0.561108
4 0036e44a7e8f7218e9bc7bf8137e4943.jpg 0.578930
... ... ...
3995 ffaafd0c9f2f0e73172848463bc2e523.jpg 0.612775
3996 ffae37344310a1549162493237d25d3f.jpg 0.650841
3997 ffbd469c56873d064326204aac546e0d.jpg 0.591342
3998 ffcb76b7d47f29ece11c751e5f763f52.jpg 0.623584
3999 fffed17d1a8e0433a934db518d7f532c.jpg 0.607619

4000 rows × 2 columns
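
To actually submit to Kaggle, the filled-in frame just needs to be written back out (a sketch; the file name is arbitrary):

test_csv.to_csv("submission.csv", index=False)  # id and has_cactus columns only, no index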

Step1 ~ Step4

train_csv = pd.read_csv("./data/train.csv")
test_csv = pd.read_csv("./data/sample_submission.csv")
#---#
# Step1: Data
ctx_train = datasets.Dataset.from_pandas(train_csv)
ctx_test = datasets.Dataset.from_pandas(test_csv).remove_columns(['has_cactus'])
ctx_train = ctx_train.map(lambda example: {'path': "./data/train/" + example['id']})
ctx_test = ctx_test.map(lambda example: {'path': "./data/test/" + example['id']})
ctx = datasets.DatasetDict({
    'train':ctx_train,
    'test':ctx_test
})
compose = torchvision.transforms.Compose([
    lambda path: PIL.Image.open(path),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Resize((224,224))
])
def w_trans(examples):
    # train: examples = {'id':[xx,xxx,....], 'has_cactus':[yy,yyy,...], 'path':[zz,zzz,...]}
    # test:  examples = {'id':[xx,xxx,....], 'path':[zz,zzz,...]}
    dct = dict()
    dct['pixel_values'] = torch.stack(list(map(compose, examples['path'])))
    try: 
        dct['labels'] = torch.tensor(examples['has_cactus'])
    except KeyError:
        pass  # the test split has no 'has_cactus' column, so no labels
    return dct 
ctx = ctx.with_transform(w_trans)
# Step2: Model
model = transformers.AutoModelForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
# Step3: Train
data_collator = transformers.DefaultDataCollator()
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits,axis=1)
    predictions_scores = torch.tensor(logits).softmax(dim=1).numpy()[:,1]
    acc = evaluate.load("accuracy")
    rec = evaluate.load("recall")
    roc_auc = evaluate.load("roc_auc")
    dct1 = acc.compute(predictions = predictions, references = labels) # {'accuracy':???}
    dct2 = rec.compute(predictions = predictions, references = labels) # {'recall':???}
    dct3 = roc_auc.compute(prediction_scores = predictions_scores, references = labels) # {'roc_auc':???}
    return dct1 | dct2 | dct3  # {'accuracy':???, 'recall':???, 'roc_auc':???}
training_args = transformers.TrainingArguments(
    output_dir="asdf",
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="roc_auc",
    push_to_hub=False,
    report_to="none"
)
trainer = transformers.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=ctx["train"].select(range(1000)),
    eval_dataset=ctx["train"].select(range(1000,1500)),
    compute_metrics=compute_metrics,
)
trainer.train()
# Step4: Prediction
out = trainer.predict(ctx['test'])
logits = out.predictions
has_cactus = torch.tensor(logits).softmax(dim=1).numpy()[:,1]
test_csv['has_cactus']= has_cactus
Map: 100%|██████████| 17500/17500 [00:00<00:00, 63327.18 examples/s]
Map: 100%|██████████| 4000/4000 [00:00<00:00, 87028.23 examples/s]
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([2, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[60/60 00:27, Epoch 3/4]
Epoch Training Loss Validation Loss Accuracy Recall Roc Auc
0 0.688900 0.662976 0.762000 0.916667 0.753990
1 0.627400 0.621370 0.752000 0.997312 0.866389
2 0.615200 0.598107 0.758000 1.000000 0.913243
3 0.592600 0.589067 0.760000 1.000000 0.929183

B. Free-style code

Step1: Datasets

train_csv = pd.read_csv("./data/train.csv")
test_csv = pd.read_csv("./data/sample_submission.csv")
train_csv2 = pd.read_csv("./data/train.csv")
test_csv2 = pd.read_csv("./data/sample_submission.csv")
train_csv2['path'] = ['./data/train/'+l for l in train_csv.id]
test_csv2['path'] = ['./data/test/'+l for l in test_csv.id]
train_csv2 = train_csv2.loc[:,['has_cactus','path']]
test_csv2 = test_csv2.loc[:,['path']]
ctx = datasets.DatasetDict(
    {
        'train': datasets.Dataset.from_pandas(train_csv2),
        'test':datasets.Dataset.from_pandas(test_csv2)
    }
)
ctx
DatasetDict({
    train: Dataset({
        features: ['has_cactus', 'path'],
        num_rows: 17500
    })
    test: Dataset({
        features: ['path'],
        num_rows: 4000
    })
})

A quick sanity check of Steps 2~3

model = transformers.AutoModelForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([2, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
single_batch =  [ctx['train'][0], ctx['train'][1]]
single_batch
[{'has_cactus': 1,
  'path': './data/train/0004be2cfeaba1c0361d39e2b000257b.jpg'},
 {'has_cactus': 1,
  'path': './data/train/000c8a36845c0208e833c79c1bffedd1.jpg'}]
compose = torchvision.transforms.Compose([
    lambda path: PIL.Image.open(path),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Resize((224,224))
])
def collate_fn(single_batch):
    # single_batch: a list of example dicts -> one dict of batched tensors
    dct = dict()
    dct['pixel_values'] = torch.stack([compose(o['path']) for o in single_batch])
    try: 
        dct['labels'] = torch.tensor([o['has_cactus'] for o in single_batch])
    except KeyError:
        pass  # test examples have no 'has_cactus' key, so no labels
    return dct 
model(**collate_fn(single_batch))
ImageClassifierOutputWithNoAttention(loss=tensor(0.6771, grad_fn=<NllLossBackward0>), logits=tensor([[-0.0403,  0.0007],
        [-0.0529, -0.0292]], grad_fn=<AddmmBackward0>), hidden_states=None)

Since the model consumes the collated batch directly, this collate_fn can be passed to the Trainer as data_collator, replacing both with_transform and the DefaultDataCollator.

Step1~4

train_csv = pd.read_csv("./data/train.csv")
test_csv = pd.read_csv("./data/sample_submission.csv")
train_csv2 = pd.read_csv("./data/train.csv")
test_csv2 = pd.read_csv("./data/sample_submission.csv")
#---#
# Step1: Data
train_csv2['path'] = ['./data/train/'+l for l in train_csv.id]
test_csv2['path'] = ['./data/test/'+l for l in test_csv.id]
train_csv2 = train_csv2.loc[:,['has_cactus','path']]
test_csv2 = test_csv2.loc[:,['path']]
ctx = datasets.DatasetDict(
    {
        'train': datasets.Dataset.from_pandas(train_csv2),
        'test':datasets.Dataset.from_pandas(test_csv2)
    }
)
# Step2: Model
model = transformers.AutoModelForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
# Step3: Train
compose = torchvision.transforms.Compose([  # repeated here so the script is self-contained
    lambda path: PIL.Image.open(path),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Resize((224,224))
])
def collate_fn(single_batch):
    # single_batch: a list of example dicts -> one dict of batched tensors
    dct = dict()
    dct['pixel_values'] = torch.stack([compose(o['path']) for o in single_batch])
    try: 
        dct['labels'] = torch.tensor([o['has_cactus'] for o in single_batch])
    except KeyError:
        pass  # test examples have no 'has_cactus' key, so no labels
    return dct 
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits,axis=1)
    predictions_scores = torch.tensor(logits).softmax(dim=1).numpy()[:,1]
    acc = evaluate.load("accuracy")
    rec = evaluate.load("recall")
    roc_auc = evaluate.load("roc_auc")
    dct1 = acc.compute(predictions = predictions, references = labels) # {'accuracy':???}
    dct2 = rec.compute(predictions = predictions, references = labels) # {'recall':???}
    dct3 = roc_auc.compute(prediction_scores = predictions_scores, references = labels) # {'roc_auc':???}
    return dct1 | dct2 | dct3  # {'accuracy':???, 'recall':???, 'roc_auc':???}
training_args = transformers.TrainingArguments(
    output_dir="asdf",
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="roc_auc",
    push_to_hub=False,
    report_to="none"
)
trainer = transformers.Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=ctx["train"].select(range(1000)),
    eval_dataset=ctx["train"].select(range(1000,1500)),
    compute_metrics=compute_metrics,
)
trainer.train()
# Step4: Prediction
out = trainer.predict(ctx['test'])
logits = out.predictions
has_cactus = torch.tensor(logits).softmax(dim=1).numpy()[:,1]
test_csv['has_cactus']= has_cactus
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([2, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[60/60 00:27, Epoch 3/4]
Epoch Training Loss Validation Loss Accuracy Recall Roc Auc
0 0.698900 0.686095 0.656000 0.873656 0.345493
1 0.639900 0.631235 0.746000 1.000000 0.702390
2 0.627000 0.609499 0.744000 1.000000 0.841104
3 0.607400 0.598909 0.744000 1.000000 0.885753

Up to random initialization the two approaches are equivalent; only the data plumbing differs (with_transform plus DefaultDataCollator versus a single custom collate_fn), which is why the metric trajectories look similar.

A1. Last year's lecture notes

https://guebin.github.io/MP2023/ – a course focused on analyzing tabular data