Ⅸ. Text classification

The query we issue rarely changes, but we keep wanting new results.

	doc	query
IR	standing	varying
classification	varying	standing

Needs to be updated periodically.

standing query: look up what exactly this is

Spam filtering is also a text classification problem.

Naïve Bayes text classification

▪ Multinomial ▪ Bernoulli
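
A rough sketch of how the two variants are usually told apart by their document representation: the multinomial model works with term counts, the Bernoulli model with binary presence/absence over the vocabulary. The vocabulary, the sample document, and the function names below are made up for illustration.

```python
from collections import Counter

# Hypothetical toy vocabulary and document, for illustration only.
VOCAB = ["learning", "algorithm", "goal", "match"]
doc = "learning algorithm learning".split()

def multinomial_features(tokens, vocab):
    """Term counts: how many times each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def bernoulli_features(tokens, vocab):
    """Binary indicators: whether each vocabulary word occurs at all."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]

print(multinomial_features(doc, VOCAB))  # [2, 1, 0, 0]
print(bernoulli_features(doc, VOCAB))    # [1, 1, 0, 0]
```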

(no feature selection is done)

How should a document be classified?

: Look for the answer in the bag of words (word positions are not taken into account).
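
A minimal sketch of the bag-of-words idea, assuming plain whitespace tokenization (a simplification; the notes do not specify a tokenizer). Two sentences with the same words in a different order end up with the same representation. The helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a == b)      # True: position information is lost
print(a["the"])    # 2
```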

The classes must already be defined before classifying (spam filter: 2 classes).


We have to train on documents already classified as machine learning, and what we train on → the words.

If words such as learning, intelligence, algorithm, reinforcement, … appear, the classifier should learn that the document is ML.

Assign a label to each document.

ex: finance, sports, news, …
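
The training idea above (labeled documents in, word statistics per class out) can be sketched as a tiny multinomial Naive Bayes classifier. Assumptions not in the notes: whitespace tokenization, add-one (Laplace) smoothing, ignoring words never seen in training, and a made-up four-document training set. The function names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs: list of (text, label) pairs.
    Returns per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify(text, class_docs, word_counts, vocab):
    """Return the class with the highest log prior + sum of log word likelihoods."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # assumption: skip words unseen in training
            # Add-one smoothing (an assumption) avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data, for illustration only.
training = [
    ("learning algorithm reinforcement learning", "ML"),
    ("intelligence learning algorithm", "ML"),
    ("match goal team score", "sports"),
    ("team wins match", "sports"),
]
model = train(training)

print(classify("reinforcement learning algorithm", *model))  # ML
print(classify("team score goal", *model))                   # sports
```

Working in log space keeps the product of many small probabilities from underflowing; the class set is fixed by the labels present in the training data.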