
NLTK(Natural Language Toolkit)

A띠 2024. 9. 1. 20:46

NLTK (Natural Language Toolkit) is a Python package for natural language processing and document analysis.

https://www.nltk.org/install.html

 


 

* Tokenizing: splitting text into the smallest meaningful linguistic units (tokens). It is a preprocessing step.
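
To make the idea of a token concrete, here is a minimal sketch that uses only Python's standard `re` module. The function name `simple_tokenize` and the regular expression are illustrative assumptions, not part of NLTK's API; the pattern simply treats runs of word characters and runs of punctuation as separate tokens.

```python
import re


def simple_tokenize(text):
    """Split text into runs of word characters and runs of punctuation.

    Hypothetical helper for illustration only; whitespace is discarded.
    """
    return re.findall(r"\w+|[^\w\s]+", text)


tokens = simple_tokenize("NLTK is a Python package.")
print(tokens)  # ['NLTK', 'is', 'a', 'Python', 'package', '.']
```

The resulting list of tokens is what later preprocessing steps would typically work on.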

 

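# Install NLTK (the leading "!" runs a shell command inside a Jupyter/Colab notebook)
!pip install nltk
import nltk
from nltk.tokenize import WordPunctTokenizer     # splits words and punctuation runs apart
from nltk.tokenize import TreebankWordTokenizer  # follows Penn Treebank conventions
from nltk.tokenize import word_tokenize          # NLTK's recommended word tokenizer
# Download the Punkt sentence tokenizer data that word_tokenize relies on
nltk.download('punkt')
# Newer NLTK releases may look for the 'punkt_tab' resource instead
nltk.download('punkt_tab')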
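
The imports above are not used yet, so here is a small sketch of how two of the tokenizers might be compared on the same sentence. The sample sentence and the variable names are assumptions chosen for illustration. Both tokenizers used here are rule-based, so this block should not need any downloaded data.

```python
from nltk.tokenize import TreebankWordTokenizer, WordPunctTokenizer

# Sample sentence (an assumption for illustration); the contraction shows the difference
text = "Don't hesitate to ask questions."

# WordPunctTokenizer separates every run of punctuation from the surrounding letters
word_punct_tokens = WordPunctTokenizer().tokenize(text)
print(word_punct_tokens)
# ['Don', "'", 't', 'hesitate', 'to', 'ask', 'questions', '.']

# TreebankWordTokenizer keeps the contraction "n't" together as one token
treebank_tokens = TreebankWordTokenizer().tokenize(text)
print(treebank_tokens)
# ['Do', "n't", 'hesitate', 'to', 'ask', 'questions', '.']
```

`word_tokenize(text)` can be called the same way once the Punkt data has been downloaded. Which tokenizer fits best depends on how contractions and punctuation should be handled in later preprocessing.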