

Lyrics Generation Model - RNN

Given a snippet of lyrics, the model is trained to output the words that follow (a small taste of generative modeling).

Preprocessing

In [1]:
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences, to_categorical
In [2]:
text = '''저 별을 따다가 니 귀에 걸어주고파
저 달 따다가 니 목에 걸어주고파
세상 모든 좋은 것만 해주고 싶은
이런 내 맘을 그댄 아나요'''
In [3]:
tok = Tokenizer()
tok.fit_on_texts([text])
vocab_size = len(tok.word_index) + 1 # +1 to reserve index 0 for zero padding
vocab_size
Out [3]:
20
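
To see what the tokenizer learned, we can print its word-to-index mapping (a quick inspection cell, assuming the tok fitted above):

# each of the 19 unique words gets an integer index, ordered by frequency;
# index 0 is reserved for padding, which is why vocab_size is 20
print(tok.word_index)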
In [4]:
seq_list = []
# build n-gram samples: every prefix (length >= 2) of each line becomes one sample
for sentence in text.split('\n'):
    res = tok.texts_to_sequences([sentence])[0]
    for i in range(1, len(res)):
        seq = res[:i+1]
        seq_list.append(seq)

# find the longest sequence and zero-pad everything to that length
max_len = max(len(sent) for sent in seq_list)
seq_padded = pad_sequences(seq_list, maxlen=max_len)

# split into X (all tokens but the last) and y (the last token)
X = seq_padded[:, :-1]
y = seq_padded[:, -1]

# one-hot encode the labels
y_hot = to_categorical(y, num_classes=vocab_size)
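
As a quick sanity check (a sketch using the variables above): each line contributes one sample per prefix, 19 samples in total.

print(X.shape)      # (19, 5): 19 n-gram samples, each max_len - 1 = 5 tokens
print(y_hot.shape)  # (19, 20): one-hot labels over the 20-word vocabulary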

Deep Learning

In [5]:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Dense, SimpleRNN
In [6]:
model = Sequential([
    Embedding(vocab_size, 10),                # 10-dimensional word embeddings
    SimpleRNN(32),                            # 32 hidden units
    Dense(vocab_size, activation='softmax')   # probability over the vocabulary
])

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(X, y_hot, epochs=1000, verbose=1)
Out [6]:
Epoch 1/1000
1/1 [==============================] - 1s 679ms/step - loss: 3.0037 - accuracy: 0.1053
Epoch 2/1000
1/1 [==============================] - 0s 5ms/step - loss: 2.9955 - accuracy: 0.1053
Epoch 3/1000
1/1 [==============================] - 0s 4ms/step - loss: 2.9874 - accuracy: 0.1579
Epoch 4/1000
1/1 [==============================] - 0s 4ms/step - loss: 2.9791 - accuracy: 0.1579
Epoch 5/1000
1/1 [==============================] - 0s 18ms/step - loss: 2.9708 - accuracy: 0.1579

...

Epoch 996/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 997/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 998/1000
1/1 [==============================] - 0s 2ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 999/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 1000/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
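
The model is tiny, which is why it memorizes the 19 samples almost perfectly (18/19 = 0.9474). A rough parameter count (my own arithmetic, not from the original run):

model.summary()
# Embedding: 20 * 10             =  200 params
# SimpleRNN: (10 + 32 + 1) * 32  = 1376 params (input, recurrent, bias weights)
# Dense:     (32 + 1) * 20       =  660 params
# total                          = 2236 trainable params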

In [7]:
def generate_sentence(model, starting_word, tok, n):
    sentence = starting_word

    # predict the next word n times, appending each prediction to the prompt
    for _ in range(n):
        encoded = tok.texts_to_sequences([sentence])[0]
        # training inputs were max_len - 1 tokens, but the RNN accepts the extra leading pad
        padded = pad_sequences([encoded], maxlen=max_len)
        res = model.predict(padded, verbose=0)
        pred_index = np.argmax(res, axis=1)  # greedy decoding: pick the most probable word
        word = tok.sequences_to_texts([pred_index])[0]
        sentence = sentence + ' ' + word

    return sentence
In [8]:
generate_sentence(model, '저', tok, 2)
Out [8]:
'저 별을 따다가'
In [9]:
generate_sentence(model, '저', tok, 8)
Out [9]:
'저 별을 따다가 니 귀에 걸어주고파 그댄 그댄 아나요'
In [10]:
generate_sentence(model, '저', tok, 20)
Out [10]:
'저 별을 따다가 니 귀에 걸어주고파 그댄 그댄 아나요 내 맘을 그댄 아나요 목에 모든 좋은 목에 걸어주고파 그댄 아나요 걸어주고파'
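
Greedy argmax decoding loops on the same few phrases, as the 20-word output above shows. One common remedy (not part of the original notebook; the helper name and temperature value are my own) is to sample the next word from the softmax distribution instead:

def sample_next_word(model, prompt, tok, temperature=0.8):
    # hypothetical helper: temperature sampling instead of argmax
    encoded = tok.texts_to_sequences([prompt])[0]
    padded = pad_sequences([encoded], maxlen=max_len)
    probs = model.predict(padded, verbose=0)[0]

    # sharpen (T < 1) or flatten (T > 1) the distribution, then renormalize
    logits = np.log(probs + 1e-9) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))

    idx = np.random.choice(len(probs), p=probs)
    return tok.sequences_to_texts([[idx]])[0]

Lower temperatures stay close to the greedy output; higher ones trade coherence for variety.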
