Project: Building a Korean Word Cloud

1. Install the Korean natural language processing library

# !pip install KoNLPy

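# Note: KoNLPy 0.5.0 and later rename this tagger to Okt; Twitter still imports but emits a deprecation warning.
from konlpy.tag import Twitter
from collections import Counter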

2. Load the data

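# Replace the placeholder below with the path to your text file.
# encoding='utf-8' is an assumption; change it if the file uses another encoding (for example cp949).
with open('text file path', 'r', encoding='utf-8') as file:
    lists = file.readlines()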

lists
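
`readlines()` keeps the trailing newline on every line and also returns blank lines. A minimal sketch of tidying the input before tagging, assuming blank lines carry no information for the word cloud. `clean_lines` is a hypothetical helper and is not part of the original notebook:

```python
import io

def clean_lines(raw_lines):
    """Strip surrounding whitespace and drop lines that end up empty."""
    return [line.strip() for line in raw_lines if line.strip()]

# io.StringIO stands in for an opened text file in this sketch.
sample_file = io.StringIO("첫 번째 문장입니다.\n\n두 번째 문장입니다.\n")
sample_lines = clean_lines(sample_file.readlines())
print(sample_lines)
```

In the notebook this could be applied as `lists = clean_lines(lists)` right after loading, if that behavior is wanted.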

3. Morphological analysis

twitter = Twitter()
morphs = []

# pos() returns a list of (word, tag) pairs for each line of text.
for sentence in lists:
    morphs.append(twitter.pos(sentence))
print(morphs)

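# Keep nouns only, and drop any noun that contains one of these filler syllables.
# This is a substring test, so a word such as "나라" is dropped because it contains "나".
excluded = ["것", "내", "나", "수", "게", "말"]
noun_adj_adv_list = []
for sentence in morphs:
    for word, tag in sentence:
        if tag == 'Noun' and not any(s in word for s in excluded):
            noun_adj_adv_list.append(word)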

print(noun_adj_adv_list)
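
The filter in this step checks whether a stop syllable occurs anywhere inside a word, which also removes longer nouns that merely contain it. The sketch below contrasts that substring test with an exact-match test. `filter_nouns` and the tagged sample are hypothetical and hand-written for illustration; real tagger output may differ.

```python
def filter_nouns(tagged_sentences, stop_items, exact=False):
    """Return nouns not blocked by stop_items.

    exact=False blocks a word if any stop item occurs inside it (as in this notebook).
    exact=True blocks a word only if it equals a stop item.
    """
    kept = []
    for sentence in tagged_sentences:
        for word, tag in sentence:
            if tag != "Noun":
                continue
            if exact:
                blocked = word in stop_items
            else:
                blocked = any(item in word for item in stop_items)
            if not blocked:
                kept.append(word)
    return kept

# Hand-written (word, tag) pairs, not actual tagger output.
sample = [
    [("데이터", "Noun"), ("를", "Josa"), ("분석", "Noun"), ("하다", "Verb")],
    [("나라", "Noun"), ("것", "Noun"), ("시장", "Noun")],
]
stop_items = ["것", "내", "나", "수", "게", "말"]

substring_result = filter_nouns(sample, stop_items)
exact_result = filter_nouns(sample, stop_items, exact=True)
print(substring_result)
print(exact_result)
```

With the substring test, "나라" is discarded along with "것"; with the exact test, only "것" is discarded. Which one is appropriate depends on how aggressive the filtering should be.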

count = Counter(noun_adj_adv_list)

words = dict(count.most_common())
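
Called without an argument, `most_common()` returns every word in descending order of frequency, and `dict()` keeps that order. If only the most frequent words are of interest, passing a number limits the result. A small sketch on a made-up noun list (the sample data is an assumption, not taken from the notebook's text file):

```python
from collections import Counter

# Made-up nouns standing in for the filtered noun list.
sample_nouns = ["데이터", "분석", "데이터", "시장", "분석", "데이터"]

sample_count = Counter(sample_nouns)
all_words = dict(sample_count.most_common())    # every word, most frequent first
top_words = dict(sample_count.most_common(2))   # only the two most frequent words
print(all_words)
print(top_words)
```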

4. Build the word cloud

Install the word cloud library

# !pip install WordCloud

from wordcloud import WordCloud
import matplotlib.pyplot as plt

%matplotlib inline 

from matplotlib import rc
# Use a Korean-capable font for matplotlib text; this assumes NanumBarunGothic is installed.
rc('font', family='NanumBarunGothic')

wordcloud = WordCloud(
    font_path = '/Library/Fonts/NanumBarunGothic.ttf',    # On macOS, the Korean font path must be set correctly.
    background_color='white',                             # Background color
    colormap = 'Accent_r',                                # Colormap used for the word colors
    width = 800,
    height = 800
)
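
Because the word cloud depends on a font file that can render Korean, it may help to confirm the file exists before building the cloud. The sketch below is one way to do that using only the standard library. `first_existing_path` is a hypothetical helper, and the second candidate path is an assumption about a per-user font location rather than something stated in this notebook.

```python
import os

def first_existing_path(candidates):
    """Return the first candidate path that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

font_candidates = [
    '/Library/Fonts/NanumBarunGothic.ttf',                        # path used in this notebook (macOS)
    os.path.expanduser('~/Library/Fonts/NanumBarunGothic.ttf'),   # assumed per-user location
]

found_font = first_existing_path(font_candidates)
if found_font is None:
    print('No candidate font found; install a Korean font or adjust the list.')
else:
    print('Using font:', found_font)
```

The resulting `found_font` could then be passed as `font_path` when constructing `WordCloud`.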

wordcloud_words = wordcloud.generate_from_frequencies(words)

array = wordcloud.to_array()
print(type(array)) # numpy.ndarray
print(array.shape) # (800, 800, 3)

fig = plt.figure(figsize=(10, 10))
plt.imshow(array, interpolation="bilinear")
plt.axis('off')
# Save before showing so the image file is written regardless of how the backend handles show().
fig.savefig('business_analytics_wordcloud.png')
plt.show()

<class 'numpy.ndarray'>
(800, 800, 3)
