import datasets
import transformers
import evaluate
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
df_train['label'] = [int("F" in l) for l in df_train.type]
df_test['label'] = df_test.type.map(lambda x: int("F" in x))
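The two assignments above apply the same rule in two ways (a list comprehension and `map`): a row gets label 1 when its `type` string contains the letter F, and 0 otherwise. A minimal sketch of that rule in plain Python follows. The `sample_types` values and the `has_f_label` helper are hypothetical and only stand in for `df_train.type`, which is assumed here to hold MBTI-style type strings.

```python
# Hypothetical type strings; the real values live in df_train.type.
sample_types = ["INFP", "ESTJ", "ENFJ", "INTP"]

def has_f_label(type_string: str) -> int:
    """Return 1 if the type string contains the letter 'F', else 0."""
    return int("F" in type_string)

# Same rule as the list comprehension / map used on the DataFrames.
labels = [has_f_label(t) for t in sample_types]
print(labels)  # [1, 0, 1, 0]
```

Because both DataFrame assignments reduce to this single membership test, the train and test labels are guaranteed to follow the same definition.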
Map: 100%|██████████| 6940/6940 [00:02<00:00, 2420.93 examples/s]
Map: 100%|██████████| 1735/1735 [00:00<00:00, 2450.23 examples/s]
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
["I prefer making decisions based on logic and facts. Analyzing situations objectively is more important to me than considering emotions.", "When making decisions, I think it’s most important to consider how others feel. Understanding and empathizing with the situation is always my top priority. I feel happiest when relationships are harmonious, and I try to maintain an emotional balance in everything I do."]
['I prefer making decisions based on logic and facts. Analyzing situations objectively is more important to me than considering emotions.',
'When making decisions, I think it’s most important to consider how others feel. Understanding and empathizing with the situation is always my top priority. I feel happiest when relationships are harmonious, and I try to maintain an emotional balance in everything I do.']
(Solution)
## Step4
강인공지능 = 강인공지능생성하기("sentiment-analysis", model="my_awesome_model/checkpoint-868")
texts = [
    "I prefer making decisions based on logic and facts. Analyzing situations objectively is more important to me than considering emotions.",
    "When making decisions, I think it’s most important to consider how others feel. Understanding and empathizing with the situation is always my top priority. I feel happiest when relationships are harmonious, and I try to maintain an emotional balance in everything I do.",
]
강인공지능(texts)
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
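A text-classification pipeline typically returns one dictionary per input text, each with a `label` and a `score`. When the fine-tuned checkpoint has no class names configured, the labels show up as generic strings such as `LABEL_0` and `LABEL_1`. The sketch below shows one way to turn such output back into the 0/1 meaning used when the labels were built. The `fake_outputs` values are invented for illustration and are not results from this model, and the `readable` mapping and `describe` helper are hypothetical.

```python
# Invented stand-in for what 강인공지능(texts) might return; NOT real model output.
fake_outputs = [
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.87},
]

# Assumption: class 1 means the type string contains "F", class 0 means it does not,
# matching how the label column was constructed.
readable = {"LABEL_0": "does not contain F", "LABEL_1": "contains F"}

def describe(outputs):
    """Attach a human-readable name to each pipeline-style prediction."""
    return [(readable[o["label"]], o["score"]) for o in outputs]

for name, score in describe(fake_outputs):
    print(f"{name}: {score:.2f}")
```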
sms_spam['train'][-3] returns the third-from-last example of the training split. The sample looks like this:
sms_spam['train'][-3]
{'sms': 'FREE camera phones with linerental from 4.49/month with 750 cross ntwk mins. 1/2 price txt bundle deals also avble. Call 08001950382 or call2optout/J MF\n',
'label': 1}
The sample is a dictionary. Its sms field holds the message text, “FREE camera phones with linerental from 4.49/month…”, and its label field holds 1, which marks the message as spam.
The classes that label encodes are defined as follows:
sms_spam['train'].features['label'].names
['ham', 'spam']
There are two class labels in total, and each integer maps to a class name as follows:
{0: 'ham', 1: 'spam'}
{0: 'ham', 1: 'spam'}
Therefore, because its label is 1, the message “FREE camera phones with linerental…” is classified as spam.
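The mapping between integer labels and class names can be derived directly from the list of names rather than typed by hand. A minimal sketch, assuming `names` holds the list returned by sms_spam['train'].features['label'].names; the `sample` dictionary abbreviates the message shown above, and the `id2label` / `label2id` variable names are illustrative.

```python
# Class names in index order, as shown in the output above.
names = ["ham", "spam"]

# Integer id -> class name, and the inverse lookup.
id2label = dict(enumerate(names))
label2id = {name: idx for idx, name in id2label.items()}

# Abbreviated copy of the sample; only the label matters here.
sample = {"sms": "FREE camera phones with linerental ...", "label": 1}

print(id2label)                   # {0: 'ham', 1: 'spam'}
print(id2label[sample["label"]])  # spam
```

Building both dictionaries from the same list keeps them consistent if the set of classes ever changes.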
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
#데이터콜렉터(sms_spam_preprocessed['test'][::100])
인공지능(**데이터콜렉터(
    [
        dict(
            label=dct['label'][i],
            input_ids=dct['input_ids'][i],
            attention_mask=dct['attention_mask'][i],
        )
        for i in range(12)
    ]
).to("cuda:0"))
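The call above first turns a column-oriented dictionary (`dct`) into a list with one dictionary per example, then hands that list to the data collator. Assuming 데이터콜렉터 is a padding collator, its job is to bring sequences of different lengths to a common length so they can be stacked into one batch. The sketch below imitates that step in plain Python. It is a simplified stand-in, not the library implementation: the token ids are made up, `pad_batch` is a hypothetical helper, the pad id of 0 is an assumption, and the real collator returns tensors rather than lists.

```python
# Column-oriented batch, like a slice of a tokenized dataset; token ids are made up.
dct = {
    "label": [0, 1, 0],
    "input_ids": [[101, 7592, 102], [101, 2489, 4950, 11640, 102], [101, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1, 1], [1, 1]],
}

# Row-oriented view: one dict per example, as in the list comprehension above.
rows = [
    dict(
        label=dct["label"][i],
        input_ids=dct["input_ids"][i],
        attention_mask=dct["attention_mask"][i],
    )
    for i in range(len(dct["label"]))
]

def pad_batch(rows, pad_token_id=0):
    """Right-pad every example to the longest sequence in the batch."""
    max_len = max(len(r["input_ids"]) for r in rows)
    batch = {"labels": [], "input_ids": [], "attention_mask": []}
    for r in rows:
        n_pad = max_len - len(r["input_ids"])
        batch["labels"].append(r["label"])
        batch["input_ids"].append(r["input_ids"] + [pad_token_id] * n_pad)
        # Padded positions get mask 0 so the model ignores them.
        batch["attention_mask"].append(r["attention_mask"] + [0] * n_pad)
    return batch

batch = pad_batch(rows)
print([len(ids) for ids in batch["input_ids"]])  # [5, 5, 5]
```

Padding per batch rather than to a global maximum keeps short batches small, which is the usual motivation for collating at batch time.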