3 분 소요


신경망 적용해보기

보스턴 주택 가격 예측(회귀)

In [1]:
from keras.datasets.boston_housing import load_data

데이터 살펴보기

In [2]:
(x_train, y_train), (x_test, y_test) = load_data()
Out [2]:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz
57026/57026 [==============================] - 0s 0us/step

In [3]:
x_train.shape, y_train.shape
Out [3]:
((404, 13), (404,))
In [4]:
x_test.shape, y_test.shape
Out [4]:
((102, 13), (102,))
  • 데이터 전처리
In [5]:
import numpy as np
In [6]:
# 데이터 표준화(train 데이터 기준) - 평균이 0 , 표준편차 1
mean = np.mean(x_train)
std = np.std(x_train)

x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
In [7]:
from scipy.sparse import random
# 검증 데이터셋 생성
from sklearn.model_selection import train_test_split

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train,
                                                  test_size=0.33, random_state=777)

모델 구성하기

In [8]:
from keras.models import Sequential
from keras.layers import Dense
In [9]:
model = Sequential()

model.add(Dense(64, activation='relu', input_shape=(13,))) # 13개의 컬럼이 들어와 64개의 출력
model.add(Dense(32, activation='relu'))
model.add(Dense(1)) # 하나의 값을 출력

model.compile(optimizer='adam', loss='mse', metrics=['mae']) # 손실함수: 평균제곱오차, 평가지표: 평균절대오차

모델 학습, 평가하기

In [10]:
history = model.fit(x_train, y_train, epochs=300, validation_data=(x_val, y_val))
Out [10]:
Epoch 1/300
9/9 [==============================] - 3s 80ms/step - loss: 596.1290 - mae: 22.4754 - val_loss: 590.5110 - val_mae: 22.7881
Epoch 2/300
9/9 [==============================] - 0s 11ms/step - loss: 565.7350 - mae: 21.7832 - val_loss: 561.3708 - val_mae: 22.1293
Epoch 3/300
9/9 [==============================] - 0s 11ms/step - loss: 540.0158 - mae: 21.1660 - val_loss: 536.5116 - val_mae: 21.5437
Epoch 4/300
9/9 [==============================] - 0s 27ms/step - loss: 513.6401 - mae: 20.4988 - val_loss: 506.3064 - val_mae: 20.8039
Epoch 5/300
9/9 [==============================] - 0s 22ms/step - loss: 480.5768 - mae: 19.6510 - val_loss: 468.0194 - val_mae: 19.8249

...

Epoch 296/300
9/9 [==============================] - 0s 7ms/step - loss: 52.0947 - mae: 4.8748 - val_loss: 42.3154 - val_mae: 4.4607
Epoch 297/300
9/9 [==============================] - 0s 6ms/step - loss: 52.1622 - mae: 4.9142 - val_loss: 42.2300 - val_mae: 4.4162
Epoch 298/300
9/9 [==============================] - 0s 6ms/step - loss: 52.4207 - mae: 4.7707 - val_loss: 42.2607 - val_mae: 4.3350
Epoch 299/300
9/9 [==============================] - 0s 8ms/step - loss: 52.2057 - mae: 4.9110 - val_loss: 42.4618 - val_mae: 4.5610
Epoch 300/300
9/9 [==============================] - 0s 6ms/step - loss: 52.2071 - mae: 4.9038 - val_loss: 42.1081 - val_mae: 4.3488

In [11]:
model.evaluate(x_test, y_test)
Out [11]:
4/4 [==============================] - 0s 3ms/step - loss: 50.3834 - mae: 5.1484

[50.38343811035156, 5.148377895355225]
  • K-폴드 사용하기
In [12]:
from keras.datasets.boston_housing import load_data
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
from sklearn.model_selection import KFold
In [13]:
(x_train, y_train), (x_test, y_test) = load_data(seed=777)
In [14]:
# 데이터 표준화
mean = np.mean(x_test)
std = np.std(x_train)
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
In [15]:
# k-fold 정의
k = 3

kfold = KFold(n_splits=k, shuffle=True, random_state=777)

# 모델 반환하는 함수
def get_model():
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(13,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mse', metrics=['mae'])

    return model

# 테스트셋을 평가 후 mae를 담을 list
mae_list = []

# k-fold 진행
for train_idx, val_idx in kfold.split(x_train):
    x_train_fold, x_val_fold = x_train[train_idx], x_train[val_idx]
    y_train_fold, y_val_fold = y_train[train_idx], y_train[val_idx]

    # 모델 불러와 학습
    model = get_model()
    model.fit(x_train_fold, y_train_fold, epochs=300,
              validation_data=(x_val_fold, y_val_fold))
    
    # 모델 평가하기
    _, test_mae = model.evaluate(x_test, y_test) # mse는 값을 받지 않으므로 '_'으로 지정
    mae_list.append(test_mae)
Out [15]:
Epoch 1/300
9/9 [==============================] - 1s 20ms/step - loss: 541.0774 - mae: 21.5961 - val_loss: 586.9243 - val_mae: 21.9948
Epoch 2/300
9/9 [==============================] - 0s 6ms/step - loss: 508.1025 - mae: 20.7814 - val_loss: 556.3811 - val_mae: 21.2500

...

Epoch 299/300
9/9 [==============================] - 0s 6ms/step - loss: 37.3497 - mae: 3.9863 - val_loss: 61.1344 - val_mae: 5.2268
Epoch 300/300
9/9 [==============================] - 0s 6ms/step - loss: 36.7971 - mae: 4.0410 - val_loss: 60.5647 - val_mae: 5.2416
4/4 [==============================] - 0s 3ms/step - loss: 48.0752 - mae: 5.0197
Epoch 1/300
9/9 [==============================] - 1s 20ms/step - loss: 633.1082 - mae: 23.4207 - val_loss: 604.9627 - val_mae: 22.7810
Epoch 2/300
9/9 [==============================] - 0s 5ms/step - loss: 598.6494 - mae: 22.6384 - val_loss: 572.4277 - val_mae: 22.0217

...

Epoch 299/300
9/9 [==============================] - 0s 6ms/step - loss: 45.7972 - mae: 4.5456 - val_loss: 42.3953 - val_mae: 4.5343
Epoch 300/300
9/9 [==============================] - 0s 8ms/step - loss: 44.9339 - mae: 4.4730 - val_loss: 42.2676 - val_mae: 4.4774
4/4 [==============================] - 0s 3ms/step - loss: 46.2064 - mae: 4.9409
Epoch 1/300
9/9 [==============================] - 1s 19ms/step - loss: 592.7634 - mae: 22.3774 - val_loss: 504.3690 - val_mae: 20.9569
Epoch 2/300
9/9 [==============================] - 0s 8ms/step - loss: 545.8619 - mae: 21.2595 - val_loss: 462.6132 - val_mae: 19.8989

...


Epoch 299/300
9/9 [==============================] - 0s 9ms/step - loss: 49.7210 - mae: 4.9123 - val_loss: 35.8649 - val_mae: 4.0969
Epoch 300/300
9/9 [==============================] - 0s 8ms/step - loss: 49.8888 - mae: 4.6956 - val_loss: 36.7722 - val_mae: 4.2856
4/4 [==============================] - 0s 4ms/step - loss: 45.8212 - mae: 5.0074

In [16]:
print(mae_list)
print(np.mean(mae_list))
Out [16]:
[5.019698619842529, 4.940866947174072, 5.007435321807861]
4.989333629608154

Reference

  • 이 포스트는 SeSAC 인공지능 자연어처리, 컴퓨터비전 기술을 활용한 응용 SW 개발자 양성 과정 - 심선조 강사님의 강의를 정리한 내용입니다.

댓글남기기