Chapter 05. MLM (Problem

조조링 2024. 11. 28. 09:38

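Chapter 5 shows how to predict the word behind a [MASK] token with Hugging Face's pipeline, using models such as BERT, DistilBERT, and ALBERT, and introduces the characteristics of each model. Working through it gives you the basic usage of MLM (Masked Language Model) tasks and a feel for how the models' results differ.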

Problem 24. Loading BERT in an MLM pipeline

Use the bert-base-uncased model to predict the word for the [MASK] token in the following sentence.
"MLM and NSP is the [MASK] task of BERT."

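What is BERT?

BERT (Bidirectional Encoder Representations from Transformers) is a pretrained language model developed by Google. It learns from context on both sides of a word, which gives it a deep grasp of meaning in context. It is pretrained on two tasks, MLM and NSP (Next Sentence Prediction), and delivers strong natural language processing performance.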
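To make the MLM idea concrete, here is a minimal sketch of how raw scores for a masked position can be turned into probabilities with a softmax over the vocabulary. The four-word vocabulary and the logit values are invented for illustration and are not real BERT outputs.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and logits for the [MASK] position (not real BERT values)
vocab = ["main", "primary", "banana", "first"]
mask_logits = [2.0, 1.8, -3.0, 0.5]

probs = softmax(mask_logits)

# Rank candidates from most to least probable, the order a fill-mask result uses
ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
for token, prob in ranked:
    print(f"{token}: {prob:.3f}")
```

A real model does the same thing over its full vocabulary of tens of thousands of tokens, which is why even the best candidate often receives a fairly small probability.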

Load and run the model with a pipeline as follows.

# Import pipeline from the transformers library
from transformers import pipeline

# Specify the task (fill-mask) and the model for the pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')

# Pass the input sentence containing the [MASK] token to unmasker, the instantiated pipeline
unmasker("MLM and NSP is the [MASK] task of BERT.")

# Result
# [{'score': 0.25727880001068115,
#   'token': 2364,
#   'token_str': 'main',
#   'sequence': 'mlm and nsp is the main task of bert.'},
#  {'score': 0.20740646123886108,
#   'token': 3078,
#   'token_str': 'primary',
#   'sequence': 'mlm and nsp is the primary task of bert.'},
#  {'score': 0.06773324310779572,
#   'token': 2034,
#   'token_str': 'first',
#   'sequence': 'mlm and nsp is the first task of bert.'},
#  {'score': 0.06548510491847992,
#   'token': 2430,
#   'token_str': 'central',
#   'sequence': 'mlm and nsp is the central task of bert.'},
#  {'score': 0.06167399138212204,
#   'token': 3937,
#   'token_str': 'basic',
#   'sequence': 'mlm and nsp is the basic task of bert.'}]

Filling in the [MASK] token yields main, primary, first, central, and basic.
Each candidate is a natural word that fits the context.
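The pipeline returns a list of dictionaries, so the candidates can be pulled out with ordinary Python. The sketch below is one possible way to do it; the helper name summarize is hypothetical, and the data is copied from the bert-base-uncased result above (with the token and sequence fields left out) so that it runs without downloading a model.

```python
def summarize(results):
    """Return (token_str, score rounded to 4 places) pairs from fill-mask output."""
    return [(r["token_str"], round(r["score"], 4)) for r in results]

# Copied from the bert-base-uncased result above (token and sequence fields omitted)
bert_results = [
    {"score": 0.25727880001068115, "token_str": "main"},
    {"score": 0.20740646123886108, "token_str": "primary"},
    {"score": 0.06773324310779572, "token_str": "first"},
    {"score": 0.06548510491847992, "token_str": "central"},
    {"score": 0.06167399138212204, "token_str": "basic"},
]

summary = summarize(bert_results)
for token, score in summary:
    print(f"{token:<8} {score}")
```

In a live session, the same helper could be given the return value of unmasker(...) directly.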

Problem 25. Loading DistilBERT in an MLM pipeline

Use the distilbert-base-uncased model to predict the word for the [MASK] token in the following sentence.
"MLM and NSP is the [MASK] task of BERT."

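What is DistilBERT?

DistilBERT is a lightweight version of BERT that cuts the model size by 40% while retaining about 97% of BERT's performance. It runs faster and uses less memory, which makes it well suited to real-time applications.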

from transformers import pipeline

# Note that the model name has changed
unmasker = pipeline('fill-mask', model='distilbert-base-uncased')
unmasker("MLM and NSP is the [MASK] task of BERT.")

# [{'score': 0.25902509689331055,
#   'token': 3078,
#   'token_str': 'primary',
#   'sequence': 'mlm and nsp is the primary task of bert.'},
#  {'score': 0.16309888660907745,
#   'token': 2364,
#   'token_str': 'main',
#   'sequence': 'mlm and nsp is the main task of bert.'},
#  {'score': 0.08182783424854279,
#   'token': 4563,
#   'token_str': 'core',
#   'sequence': 'mlm and nsp is the core task of bert.'},
#  {'score': 0.0402376614511013,
#   'token': 7037,
#   'token_str': 'dual',
#   'sequence': 'mlm and nsp is the dual task of bert.'},
#  {'score': 0.02484487183392048,
#   'token': 4054,
#   'token_str': 'principal',
#   'sequence': 'mlm and nsp is the principal task of bert.'}]

DistilBERT gives results similar to BERT's, including primary and main.
It also predicts core, dual, and principal, which adds some variety.
Although the model is smaller, its grasp of context comes close to BERT's.
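The overlap between the two models can be checked directly. The following sketch compares the top-5 token_str values from the two result listings above using set operations; the variable names are illustrative only.

```python
# Top-5 token_str values copied from the two result listings above
bert_top5 = ["main", "primary", "first", "central", "basic"]
distilbert_top5 = ["primary", "main", "core", "dual", "principal"]

# Words both models proposed
shared = sorted(set(bert_top5) & set(distilbert_top5))

# Words only DistilBERT proposed, kept in its ranking order
only_distilbert = [t for t in distilbert_top5 if t not in bert_top5]

print("shared:", shared)
print("only DistilBERT:", only_distilbert)
```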

Problem 26. Loading ALBERT in an MLM pipeline

Use the albert-base-v2 model to predict the word for the [MASK] token in the following sentence.
"MLM and NSP is the [MASK] task of BERT."

What is ALBERT?

ALBERT (A Lite BERT) is a lightweight version of BERT that greatly reduces the model size through parameter sharing and factorized embeddings. This improves memory efficiency and speeds up training.
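The saving from factorized embeddings can be checked with simple arithmetic. The sketch below assumes a vocabulary of 30,000 tokens, a hidden size of 768, and an embedding size of 128. These are the sizes commonly cited for a base-sized ALBERT, but they are assumptions here and do not come from this chapter. Only weight matrices are counted.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Count token-embedding weights, optionally with a factorized embedding."""
    if embed_size is None:
        # Direct embedding: one vocab_size x hidden_size matrix
        return vocab_size * hidden_size
    # Factorized: vocab_size x embed_size, then an embed_size x hidden_size projection
    return vocab_size * embed_size + embed_size * hidden_size

# Assumed sizes (not stated in this chapter)
V, H, E = 30_000, 768, 128

direct = embedding_params(V, H)
factorized = embedding_params(V, H, E)
print(direct, factorized, round(direct / factorized, 2))
```

Under these assumptions the token embedding shrinks from about 23 million weights to about 3.9 million. Parameter sharing across layers reduces the count further, which this sketch does not model.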

from transformers import pipeline

# Note that the model name has changed
unmasker = pipeline('fill-mask', model='albert-base-v2')
unmasker("mlm and nsp is the [MASK] task of bert.")

# Result
# [{'score': 0.04760139808058739,
#   'token': 6612,
#   'token_str': 'ultimate',
#   'sequence': 'mlm and nsp is the ultimate task of bert.'},
#  {'score': 0.02447248436510563,
#   'token': 20766,
#   'token_str': 'hardest',
#   'sequence': 'mlm and nsp is the hardest task of bert.'},
#  {'score': 0.023495197296142578,
#   'token': 1256,
#   'token_str': 'primary',
#   'sequence': 'mlm and nsp is the primary task of bert.'},
#  {'score': 0.02157515101134777,
#   'token': 407,
#   'token_str': 'main',
#   'sequence': 'mlm and nsp is the main task of bert.'},
#  {'score': 0.01808810606598854,
#   'token': 18369,
#   'token_str': 'foremost',
#   'sequence': 'mlm and nsp is the foremost task of bert.'}]

ALBERT predicts distinctive words such as ultimate, hardest, and foremost, which sets it apart from the other models.
It also predicts the familiar primary and main, so it still captures the context.
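One further difference is visible in the scores themselves. As a rough check, the sketch below sums the top-5 scores reported for each model in this chapter; the dictionary name top5_scores is only illustrative. A smaller sum suggests the model spreads its probability over more candidates for this sentence. That is a reading of these three runs, not a general claim about the models.

```python
# Top-5 scores copied from the three result listings in this chapter
top5_scores = {
    "bert-base-uncased": [
        0.25727880001068115, 0.20740646123886108, 0.06773324310779572,
        0.06548510491847992, 0.06167399138212204,
    ],
    "distilbert-base-uncased": [
        0.25902509689331055, 0.16309888660907745, 0.08182783424854279,
        0.0402376614511013, 0.02484487183392048,
    ],
    "albert-base-v2": [
        0.04760139808058739, 0.02447248436510563, 0.023495197296142578,
        0.02157515101134777, 0.01808810606598854,
    ],
}

# Total probability covered by each model's five best candidates
top5_mass = {name: sum(scores) for name, scores in top5_scores.items()}
for name, mass in top5_mass.items():
    print(f"{name}: {mass:.3f}")
```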
