Natural Language Processing - RNN Lyrics Generation
Lyrics Generation Model - RNN
Given a snippet of lyrics, the model is trained to output the next several words (a small taste of generative modeling). For example, from the line '저 별을 따다가' it learns the pairs '저' → '별을' and '저 별을' → '따다가', and so on for every prefix.
Preprocessing
In [1]:
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences, to_categorical
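Depending on your Keras version these import paths may differ; in Keras 3 the legacy `keras.preprocessing.text.Tokenizer` module was removed. A minimal fallback, assuming TensorFlow 2.x with its bundled Keras (the behavior is identical):

# Assumption: TensorFlow 2.x; same classes and functions as the imports above
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical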
In [2]:
text = '''저 별을 따다가 니 귀에 걸어주고파
저 달 따다가 니 목에 걸어주고파
세상 모든 좋은 것만 해주고 싶은
이런 내 맘을 그댄 아나요'''
In [3]:
tok = Tokenizer()
tok.fit_on_texts([text])
vocab_size = len(tok.word_index) + 1  # +1 because index 0 is reserved for zero padding
vocab_size
Out [3]:
20
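As a quick sanity check (not in the original notebook), you can inspect how the fitted tokenizer maps words to integer indices. Keras assigns 1-based indices ordered by word frequency, which is why index 0 stays free for padding:

# Assumed quick check: inspect the word-to-index mapping and an encoded sentence
print(tok.word_index)                             # e.g. {'저': 1, '따다가': 2, ...} (order depends on counts)
print(tok.texts_to_sequences(['저 별을 따다가']))  # the same words as a list of integer IDs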
In [4]:
seq_list = []
for sentence in text.split('\n'):
    res = tok.texts_to_sequences([sentence])[0]
    for i in range(1, len(res)):
        seq = res[:i+1]       # every prefix of length >= 2 becomes a training sample
        seq_list.append(seq)

# Find the maximum sequence length, then zero-pad (pre-padding by default)
max_len = max(len(sent) for sent in seq_list)
seq_padded = pad_sequences(seq_list, maxlen=max_len)

# Split into X and y: the last token of each sequence is the label
X = seq_padded[:, :-1]
y = seq_padded[:, -1]

# One-hot encode the targets
y_hot = to_categorical(y, num_classes=vocab_size)
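Before training, it is worth verifying the array shapes (a small check added here, not in the original). With this corpus there are 19 prefix sequences and the longest has 6 tokens, so the inputs are 5 tokens wide:

# Shape check: X should be (num_sequences, max_len - 1), y_hot (num_sequences, vocab_size)
print(X.shape, y_hot.shape)   # expected: (19, 5) (19, 20) for this lyric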
Deep Learning
In [5]:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Dense, SimpleRNN
In [6]:
model = Sequential([
    Embedding(vocab_size, 10),               # 10-dimensional word embeddings
    SimpleRNN(32),                           # 32 recurrent units
    Dense(vocab_size, activation='softmax')  # probability distribution over the vocabulary
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(X, y_hot, epochs=1000, verbose=1)
Out [6]:
Epoch 1/1000
1/1 [==============================] - 1s 679ms/step - loss: 3.0037 - accuracy: 0.1053
Epoch 2/1000
1/1 [==============================] - 0s 5ms/step - loss: 2.9955 - accuracy: 0.1053
Epoch 3/1000
1/1 [==============================] - 0s 4ms/step - loss: 2.9874 - accuracy: 0.1579
Epoch 4/1000
1/1 [==============================] - 0s 4ms/step - loss: 2.9791 - accuracy: 0.1579
Epoch 5/1000
1/1 [==============================] - 0s 18ms/step - loss: 2.9708 - accuracy: 0.1579
...
Epoch 996/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 997/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 998/1000
1/1 [==============================] - 0s 2ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 999/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
Epoch 1000/1000
1/1 [==============================] - 0s 3ms/step - loss: 0.0767 - accuracy: 0.9474
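The accuracy plateaus at 18/19 ≈ 0.9474, meaning the model has memorized all but one of the 19 training sequences. To visualize convergence you could plot the recorded history (a minimal sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Plot the loss and accuracy curves stored by model.fit
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['accuracy'], label='accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()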
In [7]:
def generate_sentence(model, starting_word, tok, n):
    sentence = starting_word
    # Predict the next word n times, growing the context each step
    for _ in range(n):
        encoded = tok.texts_to_sequences([starting_word])[0]
        # Padding to max_len (one longer than the training inputs) still works,
        # since neither Embedding nor SimpleRNN fixes the sequence length
        padded = pad_sequences([encoded], maxlen=max_len)
        res = model.predict(padded, verbose=0)
        res_softmax = np.argmax(res, axis=1)  # greedy decoding: take the most likely word
        # Alternative lookup via the word index:
        # for word, index in tok.word_index.items():
        #     if res_softmax == index:
        #         break
        word = tok.sequences_to_texts([res_softmax])[0]
        starting_word = starting_word + ' ' + word
        sentence = sentence + ' ' + word
    return sentence
In [8]:
generate_sentence(model, '저', tok, 2)
Out [8]:
'저 별을 따다가'
In [9]:
generate_sentence(model, '저', tok, 8)
Out [9]:
'저 별을 따다가 니 귀에 걸어주고파 그댄 그댄 아나요'
In [10]:
generate_sentence(model, '저', tok, 20)
Out [10]:
'저 별을 따다가 니 귀에 걸어주고파 그댄 그댄 아나요 내 맘을 그댄 아나요 목에 모든 좋은 목에 걸어주고파 그댄 아나요 걸어주고파'
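The repeated words ('그댄 그댄') are a side effect of greedy decoding: np.argmax always picks the single most likely next word, so once the tiny model drifts off the memorized lines it starts to loop. A common remedy is to sample from the softmax distribution, optionally sharpened by a temperature. The sketch below is a hypothetical variation, not part of the original notebook:

# Hypothetical variant: sample the next word instead of taking argmax
def generate_sentence_sampled(model, starting_word, tok, n, temperature=0.8):
    sentence = starting_word
    for _ in range(n):
        encoded = tok.texts_to_sequences([sentence])[0]
        padded = pad_sequences([encoded], maxlen=max_len)
        probs = model.predict(padded, verbose=0)[0]
        # Rescale the distribution: lower temperature -> closer to greedy argmax
        logits = np.log(probs + 1e-9) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        idx = np.random.choice(len(probs), p=probs)
        word = tok.sequences_to_texts([[idx]])[0]
        sentence = sentence + ' ' + word
    return sentence

With a small temperature this behaves almost like the greedy version; raising it trades fidelity to the training lyric for more varied output.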