ABC Bootcamp

[Day 22] Implementing and Testing a 7-Class Emotion Classification Model with BERT

수야! 2025. 7. 26. 17:42

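Hands-on: Building a 7-Class Emotion Classifier with BERT

Natural language processing (NLP) is now part of everyday life. Technology that understands and classifies human emotions plays an especially important role in customer service, public-opinion analysis, and recommender systems. In this post, I build a multi-class emotion classifier that uses BERT to distinguish 7 emotions, train it on real data, and use it to make predictions.

I was unwell that day and couldn't attend the lecture, so I studied on my own from the professor's materials. Please keep in mind that this post may not exactly match what you are looking for.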

📌 Project Overview

Model: BERT (bert-base-multilingual-cased)
Emotion classes: happiness, neutral, sadness, angry, surprise, disgust, fear (7 in total)
Data source: AIHub emotional-dialogue speech dataset (감성대화 음성 데이터셋)
Frameworks: TensorFlow, Hugging Face Transformers
Application: includes deployment as a web app

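1. A Brief Summary of How NLP Models Evolved

Model type	Characteristics, pros and cons

📊 Statistical (BoW, TF-IDF)	Simple frequency counts; little context information
🔤 Word embeddings (Word2Vec, GloVe)	Capture semantic similarity; one fixed vector per word, regardless of context
🔁 RNN / LSTM / GRU	Can model context; struggle with long-range dependencies
🎯 Attention & Transformer	Parallelizable; strong contextual understanding
🧠 Pretrained models (BERT, GPT)	Strong performance across a wide range of NLP tasks
💡 Large language models (GPT-4, etc.)	Fluent, natural text generation; require high-end hardware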
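To make the "Attention & Transformer" row more concrete, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QKᵀ/√d)·V, the building block inside BERT. The toy matrices are made up for illustration; real models use learned projections, multiple heads, and tensor libraries. Each query row is computed independently from the same Q, K and V, which is why this step parallelizes well, unlike an RNN's step-by-step loop.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q, K: lists of d-dim vectors; V: one value vector per key."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:  # every query is independent -> can run in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # Output = attention-weighted sum of the value vectors
        out = [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy example: 3 tokens with 2-dim vectors (values chosen arbitrarily)
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
for row in attn:
    print([round(x, 3) for x in row])
```

Each printed row is one token's attention distribution over all tokens and sums to 1.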

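2. Preprocessing the Emotion Data

✅ Data Structure

Utterance text (column 발화문) and emotion label (column 상황)
Provided in CSV format
More than 40,000 rows in total → only 30% is sampled for the exercise (frac=0.3; my notes say about 5,000 rows, but 30% of 40,000 would be roughly 12,000, so one of these figures is off)

✅ Preprocessing Steps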

import pandas as pd
from sklearn import preprocessing

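# Load the data (replace 'filename.csv' with the actual file)
data = pd.read_csv('filename.csv')

# Convert the emotion labels (column '상황') to integers; classes_ is sorted alphabetically
label_encoder = preprocessing.LabelEncoder()
data['감정'] = label_encoder.fit_transform(data['상황'])

# Sample 30% of the rows (fixed seed for reproducibility)
sample_data = data.sample(frac=0.3, random_state=777)

# Split into inputs X (utterance text) and targets y (encoded labels)
texts = sample_data['발화문']
labels = sample_data['감정']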
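LabelEncoder assigns integer codes in sorted (alphabetical) order of the class names, so the codes do not follow the order listed in the project overview. The pure-Python sketch below mimics that behavior for the 7 emotion names (it illustrates the mapping and is not sklearn's own code), including the reverse lookup that predict_text later does through label_encoder.classes_.

```python
# Emotion names in the order listed in the project overview
emotions = ["happiness", "neutral", "sadness", "angry", "surprise", "disgust", "fear"]

# Like LabelEncoder.classes_: the unique labels in sorted order
classes_ = sorted(set(emotions))
to_code = {name: i for i, name in enumerate(classes_)}

encoded = [to_code[e] for e in emotions]   # like fit_transform
decoded = [classes_[i] for i in encoded]   # like inverse_transform / classes_[idx]

print(classes_)  # ['angry', 'disgust', 'fear', 'happiness', 'neutral', 'sadness', 'surprise']
print(encoded)   # [3, 4, 5, 0, 6, 1, 2]
```

So in this setup, model output index 3 means "happiness", not "angry".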

3. Train/Test Split and Tokenization

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=777
)

import tensorflow as tf
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')

# Custom tokenization function: converts texts into fixed-length BERT inputs
def convert_data_tokenizer(X, y, max_len, tokenizer):
    input_ids, attention_masks = [], []
    for text in X:
        enc = tokenizer.encode_plus(
            text, add_special_tokens=True, max_length=max_len, truncation=True,
            padding='max_length', return_attention_mask=True
        )
        # Plain lists of length max_len (no extra per-sample batch dimension)
        input_ids.append(enc['input_ids'])
        attention_masks.append(enc['attention_mask'])
    # Return the attention mask too, so the model ignores padding tokens
    inputs = {
        'input_ids': tf.constant(input_ids, dtype=tf.int32),
        'attention_mask': tf.constant(attention_masks, dtype=tf.int32),
    }
    return inputs, tf.constant(y.to_numpy())

train_x, train_y = convert_data_tokenizer(X_train, y_train, 128, tokenizer)
test_x, test_y = convert_data_tokenizer(X_test, y_test, 128, tokenizer)
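What encode_plus(..., max_length=128, truncation=True, padding='max_length') does to each sentence can be sketched in plain Python: wrap the word-piece IDs in [CLS] ... [SEP], cut them so the total fits max_len, pad with [PAD], and set the attention mask to 1 for real tokens and 0 for padding. The token IDs below are made up, and the special-token IDs 101/102/0 are an assumption (they match the standard BERT vocabularies; check tokenizer.cls_token_id, sep_token_id and pad_token_id). pad_and_mask is a hypothetical helper, not a Transformers function.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed BERT special-token IDs

def pad_and_mask(token_ids, max_len):
    """Hypothetical helper mirroring single-sentence truncation + max_length padding."""
    body = token_ids[: max_len - 2]               # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                         # 1 = real token
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad  # 0 = padding, ignored by attention

short_ids, short_mask = pad_and_mask([2000, 2001, 2002], max_len=8)
long_ids, long_mask = pad_and_mask(list(range(3000, 3020)), max_len=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Short sentences end in padding with mask 0, while long ones lose their tail but keep [SEP] as the last token.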

4. Training the BERT Model

from transformers import TFBertForSequenceClassification
import tensorflow as tf

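# 7 output labels; from_pt=True loads the PyTorch weights and converts them (PyTorch must be installed)
model = TFBertForSequenceClassification.from_pretrained(
    'bert-base-multilingual-cased', num_labels=7, from_pt=True
)

optimizer = tf.keras.optimizers.Adam(1e-5)  # small learning rate, as is usual for fine-tuning
# The model outputs raw logits, hence from_logits=True
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

# batch_size=1 follows the class exercise; larger batches train much faster if GPU memory allows.
# Note: the test set is also used as the validation set here.
history = model.fit(train_x, train_y, epochs=5, batch_size=1, validation_data=(test_x, test_y))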
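SparseCategoricalCrossentropy(from_logits=True) takes the model's raw logits and an integer label, applies softmax internally, and returns the negative log-probability of the true class. A pure-Python sketch of that computation for a single sample (the logit values are made up):

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one sample: -log softmax(logits)[label], via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# 7 logits, one per emotion class (arbitrary values for illustration)
uniform = [0.0] * 7
confident = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(sparse_ce_from_logits(uniform, 3))    # equals log(7): an uninformed guess
print(sparse_ce_from_logits(confident, 0))  # small loss: confident and correct
print(sparse_ce_from_logits(confident, 3))  # large loss: confident and wrong
```

A 7-class model that guesses uniformly scores log 7 ≈ 1.95, which is a handy reference point when reading the first epoch's loss.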

📈 Visualizing the Training Results

import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Accuracy')
plt.legend(['train', 'val'])
plt.show()
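history.history is a plain dict that maps each metric name to a list with one value per epoch, so the curves can also be checked numerically. A small sketch that finds the epoch with the best validation accuracy and the train/validation gap at that point; summarize_history is a hypothetical helper, and the numbers in toy_history are invented, not results from this experiment.

```python
def summarize_history(hist):
    """hist: dict like history.history with 'accuracy' and 'val_accuracy' lists."""
    val = hist["val_accuracy"]
    best = max(range(len(val)), key=lambda i: val[i])  # index of best validation epoch
    gap = hist["accuracy"][best] - val[best]          # a large gap hints at overfitting
    return best + 1, val[best], gap                   # epoch reported 1-based

# Invented values for illustration only
toy_history = {
    "accuracy":     [0.40, 0.55, 0.65, 0.74, 0.82],
    "val_accuracy": [0.45, 0.52, 0.56, 0.55, 0.54],
}
epoch, best_val, gap = summarize_history(toy_history)
print(f"best epoch {epoch}: val_accuracy={best_val:.2f}, train-val gap={gap:.2f}")
```

With the real run, the same call would be summarize_history(history.history).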

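5. Evaluating the Model

import numpy as np
from sklearn.metrics import classification_report

# model.predict returns an output object whose .logits has shape (n_samples, 7)
y_pred = model.predict(test_x)
preds = np.argmax(y_pred.logits, axis=1)  # highest-scoring class index per sample

# Per-class precision / recall / F1, labeled with the original emotion names
print(classification_report(
    test_y.numpy(), preds,
    labels=list(range(len(label_encoder.classes_))),
    target_names=label_encoder.classes_,
))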
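To see what np.argmax(..., axis=1) and classification_report compute, here is a pure-Python sketch on invented logits and labels: take the highest-scoring class per row, then for each class count true positives, false positives and false negatives to get precision, recall and F1, with support being the number of true samples of that class. (The real report also adds overall accuracy and macro/weighted averages.)

```python
def argmax_rows(logits):
    # Index of the largest logit in each row (like np.argmax(logits, axis=1))
    return [max(range(len(row)), key=lambda j: row[j]) for row in logits]

def per_class_scores(y_true, y_pred, n_classes):
    """Precision, recall, F1 and support per class, as in classification_report."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1, tp + fn)  # tp + fn = support
    return report

# Invented 3-class example (the real model has 7 classes)
logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
y_true = [0, 1, 1, 2]
y_pred = argmax_rows(logits)
report = per_class_scores(y_true, y_pred, 3)
for c, (p, r, f, s) in report.items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```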

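6. Implementing a Real-Time Emotion Prediction Function

from transformers import TextClassificationPipeline

# return_all_scores=True returns a score for every label (deprecated in newer Transformers releases in favor of top_k)
text_classifier = TextClassificationPipeline(
    tokenizer=tokenizer, model=model, framework='tf', return_all_scores=True
)

def predict_text(text):
    # For one input string the pipeline returns [[{'label': 'LABEL_k', 'score': ...}, ...]]
    preds_list = text_classifier(text)[0]
    sorted_preds = sorted(preds_list, key=lambda x: x['score'], reverse=True)
    # 'LABEL_3' -> 3 -> original emotion name via the LabelEncoder
    idx = int(sorted_preds[0]['label'].split('_')[1])
    label = label_encoder.classes_[idx]
    score = sorted_preds[0]['score']
    print(f"[{text}] → classified as '{label}' with {score*100:.2f}% confidence")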
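The label-parsing step inside predict_text can be tried without a trained model. The sketch below feeds it a hand-written list in the shape the pipeline returns for one text with return_all_scores=True (label strings LABEL_0 to LABEL_6, the default names when no id2label mapping is set). The scores are invented, and top_emotion is a hypothetical helper that returns the result instead of printing it.

```python
# classes_ as produced by LabelEncoder (alphabetical order)
CLASSES = ["angry", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

def top_emotion(scores, classes):
    """scores: list of {'label': 'LABEL_k', 'score': float} dicts for one text."""
    best = max(scores, key=lambda d: d["score"])
    idx = int(best["label"].split("_")[1])  # 'LABEL_3' -> 3
    return classes[idx], best["score"]

# Invented scores in the pipeline's output format
fake_scores = [{"label": f"LABEL_{i}", "score": s}
               for i, s in enumerate([0.05, 0.02, 0.03, 0.70, 0.10, 0.06, 0.04])]
label, score = top_emotion(fake_scores, CLASSES)
print(f"classified as '{label}' with {score:.2%} confidence")
```

Taking max instead of sorting the whole list gives the same top result, and returning the pair makes the function easier to reuse in a web app than printing.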

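7. Saving and Deploying the Model

# Save the model (config + weights) and the tokenizer files into the same folder
model.save_pretrained('/content/bert-emotion')
tokenizer.save_pretrained('/content/bert-emotion')

☁️ Deploying to Hugging Face Spaces

After creating a Space, upload the following files to /src:
- config.json, tokenizer_config.json, tf_model.h5, vocab.txt, app.py
requirements.txt and a Dockerfile are also included.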
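Before uploading, it can help to confirm that the folder written by save_pretrained really contains the model and tokenizer files listed above. A minimal standard-library sketch; missing_files is a hypothetical helper, and the file list comes from this post (app.py is written by hand, so it is not checked here).

```python
from pathlib import Path

# Files from the upload list above that save_pretrained should produce
REQUIRED = ["config.json", "tokenizer_config.json", "tf_model.h5", "vocab.txt"]

def missing_files(model_dir, required=REQUIRED):
    """Return the required file names that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]

missing = missing_files("/content/bert-emotion")  # path used with save_pretrained above
print("missing:", missing if missing else "none")
```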

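✨ Wrap-up

In this project, I trained a BERT model to classify 7 emotions and checked the results with a prediction function. A real-time emotion prediction system like this is said to be useful in many areas, such as customer-feedback analysis, AI counseling systems, and emotion-based recommendation systems.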


수야's Blog

This is susuyaa's blog.
