[ML] Moving Average & Exponential Smoothing
EunGyeongKim 2023. 3. 30. 16:42
Moving average
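Moving average: a method that identifies a regular pattern in a series' own past values and uses it to forecast the series' future values.
- Every past observation inside the window receives the same weight.
- The next period is forecast as the average of the most recent N periods.
- A small N makes the moving average follow recent data closely.
- A large N gives older data more influence, so the average reacts more slowly.
\( L_t = (D_t + D_{t-1} + \cdots + D_{t-N+1})/N \)
\( = \frac{1}{N}\sum_{i=t-N+1}^{t}D_i \)
- Moving averages are frequently used in stock price forecasting.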
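As a minimal sketch of the formula above, the function below forecasts each period from the mean of the N observations before it. The function name `moving_average_forecast` and the `demand` series are hypothetical, chosen only for illustration.

```python
def moving_average_forecast(data, n):
    """Return one-step-ahead forecasts based on the mean of the last n observations.

    forecasts[i] is the forecast for data[i]; it is None where fewer than
    n past observations are available.
    """
    forecasts = [None] * len(data)
    for i in range(n, len(data)):
        window = data[i - n:i]  # the n values strictly before position i
        forecasts[i] = sum(window) / n
    return forecasts


# Hypothetical series, for illustration only.
demand = [10, 12, 11, 13, 15, 14]
ma3 = moving_average_forecast(demand, 3)
print(ma3)  # [None, None, None, 11.0, 12.0, 13.0]
```

With N = 1 the forecast is simply the previous observation, while a larger N averages over more history and reacts more slowly.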
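Exponential smoothing
Exponential smoothing: a method that gives observations lower weights the further they lie from the present.
- Forecasting with simple exponential smoothing, given an initial value ( \( L_0 \) ):
- Initial value \( L_0 \): \( L_0 = \frac{1}{N} \sum^{N}_{i=1}D_i \)
- Level \( L_t \), with smoothing constant \( 0 \le \alpha \le 1 \): \( L_t = \alpha D_t + (1-\alpha)L_{t-1} \)
- Forecast: \( F_{t+1} = L_t, F_{t+n} = L_t \)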
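The recursion above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code used later in the post: the function name `simple_exp_smoothing`, the parameter `n_init` (the number of observations averaged into \( L_0 \)) and the sample data are assumptions.

```python
def simple_exp_smoothing(data, alpha, n_init):
    """Return the levels L_0, L_1, ... of simple exponential smoothing.

    L_0 is the mean of the first n_init observations; each later
    observation d updates the level as alpha * d + (1 - alpha) * level.
    """
    level = sum(data[:n_init]) / n_init
    levels = [level]
    for d in data[n_init:]:
        level = alpha * d + (1 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical series, for illustration only.
data = [10, 12, 14, 20, 10]
levels = simple_exp_smoothing(data, alpha=0.5, n_init=3)
forecast = levels[-1]  # F_{t+1} = F_{t+n} = L_t: the forecast is flat
print(levels)    # [12.0, 16.0, 13.0]
print(forecast)  # 13.0
```

A larger `alpha` makes the level track the latest observation more closely; a smaller one keeps more of the previous level.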
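Double exponential smoothing
- Suited to forecasting time series that have a trend.
- Applies simple exponential smoothing twice.
- The initial slope ( \( B_0 \) ) and intercept ( \( L_0 \) ) are obtained by regressing the available data.
- In that regression the dependent variable is the observed value and the independent variable is the time index.
- Alternatively, the first observation ( \( D_0 \) ) can be used as the initial intercept, and the difference between the first two observations ( \( D_1 - D_0 \) ) as the initial slope.
- Unlike simple exponential smoothing, the forecast here depends on the forecast horizon.
- Intercept: \( L_t = \alpha D_t + (1-\alpha)(L_{t-1} + B_{t-1}) \)
- Slope: \( B_t = \beta ( L_t - L_{t-1} ) + (1-\beta)B_{t-1} \)
- Forecast: \( F_{t+1} = L_t + B_t \)
- \( F_{t+2} = L_t + 2 B_t \)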
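The intercept and slope updates can be sketched as below. The function names and the sample series are hypothetical. The general n-step forecast \( F_{t+n} = L_t + n B_t \) is assumed here as the natural extension of the two forecast formulas above. The sketch initialises with the simpler option (\( L_0 = D_0 \), \( B_0 = D_1 - D_0 \)) rather than a regression.

```python
def double_exp_smoothing(data, alpha, beta, level0, trend0):
    """Run the intercept/slope recursions over data; return the final (level, trend)."""
    level, trend = level0, trend0
    for d in data:
        prev_level = level
        # Intercept: blend the new observation with the previous one-step forecast.
        level = alpha * d + (1 - alpha) * (prev_level + trend)
        # Slope: blend the latest change in level with the previous slope.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def forecast_ahead(level, trend, n):
    """n-step-ahead forecast, assuming F_{t+n} = L_t + n * B_t."""
    return level + n * trend


# Hypothetical, perfectly linear series: the recursion should reproduce the line.
series = [10, 12, 14, 16]
level, trend = double_exp_smoothing(
    series[1:], alpha=0.5, beta=0.5,
    level0=series[0], trend0=series[1] - series[0],
)
print(level, trend)                     # 16.0 2.0
print(forecast_ahead(level, trend, 2))  # 20.0
```

On real data the level and slope lag behind the observations, and the choice of `alpha` and `beta` controls how quickly each one adapts.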
code
In [1]:
!pip install finance-datareader
In [2]:
import numpy as np
import pandas as pd
import FinanceDataReader as fdr
pd.set_option('display.max_rows', 560)
In [3]:
etf = fdr.StockListing("ETF/KR") # load the list of Korean ETFs
df = fdr.DataReader('360750', '2020-08-01') # TIGER US S&P 500 ETF
df
Out[3]:
Date | Open | High | Low | Close | Volume | Change |
---|---|---|---|---|---|---|
2020-08-07 | 9774 | 9789 | 9716 | 9761 | 126092 | NaN |
2020-08-10 | 9754 | 9794 | 9749 | 9785 | 212309 | 0.002459 |
2020-08-11 | 9789 | 9799 | 9774 | 9795 | 83181 | 0.001022 |
2020-08-12 | 9730 | 9754 | 9692 | 9757 | 68040 | -0.003880 |
2020-08-13 | 9834 | 9834 | 9789 | 9805 | 163224 | 0.004920 |
... | ... | ... | ... | ... | ... | ... |
2023-03-24 | 12725 | 12830 | 12725 | 12830 | 690195 | 0.009044 |
2023-03-27 | 12965 | 13050 | 12950 | 13050 | 991106 | 0.017147 |
2023-03-28 | 12985 | 12995 | 12930 | 12990 | 463006 | -0.004598 |
2023-03-29 | 12990 | 13075 | 12935 | 13070 | 1903031 | 0.006159 |
2023-03-30 | 13210 | 13210 | 13140 | 13155 | 804951 | 0.006503 |
653 rows × 6 columns
In [4]:
m_price = df['Close'] # closing prices
df1 = m_price.to_frame() # convert the Series to a DataFrame
df1 = df1.reset_index() # move the Date index into a column
df1['t'] = np.arange(1, len(df1) + 1) # add a time variable t = 1, 2, ..., len(df1)
df1
Out[4]:
| | Date | Close | t |
|---|---|---|---|
0 | 2020-08-07 | 9761 | 1 |
1 | 2020-08-10 | 9785 | 2 |
2 | 2020-08-11 | 9795 | 3 |
3 | 2020-08-12 | 9757 | 4 |
4 | 2020-08-13 | 9805 | 5 |
... | ... | ... | ... |
648 | 2023-03-24 | 12830 | 649 |
649 | 2023-03-27 | 13050 | 650 |
650 | 2023-03-28 | 12990 | 651 |
651 | 2023-03-29 | 13070 | 652 |
652 | 2023-03-30 | 13155 | 653 |
653 rows × 3 columns
In [5]:
from statsmodels.formula.api import ols
stock = ols('Close ~ t', data=df1).fit() # ols : Create a Model from a formula and dataframe.
print(stock.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Close R-squared: 0.606
Model: OLS Adj. R-squared: 0.605
Method: Least Squares F-statistic: 1000.
Date: Thu, 30 Mar 2023 Prob (F-statistic): 1.09e-133
Time: 08:03:30 Log-Likelihood: -5336.2
No. Observations: 653 AIC: 1.068e+04
Df Residuals: 651 BIC: 1.069e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.038e+04 67.216 154.389 0.000 1.02e+04 1.05e+04
t 5.6318 0.178 31.624 0.000 5.282 5.981
==============================================================================
Omnibus: 94.126 Durbin-Watson: 0.022
Prob(Omnibus): 0.000 Jarque-Bera (JB): 27.118
Skew: 0.196 Prob(JB): 1.29e-06
Kurtosis: 2.082 Cond. No. 756.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [6]:
x = df1['t']
ypred = stock.predict(x)
yactual = df1['Close']
df1['yym'] = df1['Date'].dt.to_period('D')
df1['ypred'] = ypred
In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=1, figsize=(8,4), sharex=True)
# sharex controls whether subplots share the same x-axis ticks
# (it has no visible effect here because there is only one subplot).
df1.plot(x='yym', y='Close', ax=ax)
df1.plot(x='yym', y='ypred', ax=ax)
ax.set(xlabel='time', ylabel='index')
# grid() expects a boolean for visibility, so pass True/False rather than 'on'/'off'.
ax.grid(True, which='minor', axis='x')
ax.grid(False, which='major', axis='x')
plt.show()
Stock price forecasting with a moving average
In [8]:
from statsmodels.compat import lzip
import statsmodels.api as sm
from statsmodels.formula.api import ols
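df = fdr.DataReader('000660', '2010-10-01') # SK hynix stock price
df = df[['Close']]
df1 = df.copy()
def ma(dframe, n):
    # n-day moving average, shifted down one row so that each row holds
    # the forecast computed from the previous n closing prices
    dframe['MA{}'.format(n)] = dframe['Close'].rolling(window=n).mean().shift(1)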
In [9]:
ma(df1, 5) # 5-day moving average
ma(df1, 60) # 60-day moving average
ma(df1, 120) # 120-day moving average
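The `.shift(1)` in `ma` is what turns the rolling mean into a forecast: without it, each row's average would include that row's own closing price. The small check below illustrates the difference on a hypothetical six-value series (the values and column names are made up for illustration).

```python
import pandas as pd

# Hypothetical closing prices, for illustration only.
close = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0], name="Close")
n = 3

ma_same_day = close.rolling(window=n).mean()  # includes the current row
ma_forecast = ma_same_day.shift(1)            # uses only the previous n rows

check = pd.DataFrame({
    "Close": close,
    "MA3_same_day": ma_same_day,
    "MA3_forecast": ma_forecast,
})
print(check)
```

Row 3 of `MA3_forecast` is the mean of rows 0 to 2, so it could have been computed before the row 3 close was known.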
In [10]:
df1 = df1.reset_index()
df1 = df1.dropna()
In [11]:
fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df1.plot(x='Date', y='Close', ax = ax)
df1.plot(x='Date', y='MA5', ax = ax)
df1.plot(x='Date', y='MA60', ax = ax)
df1.plot(x='Date', y='MA120', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()
Stock price forecasting with simple exponential smoothing
In [12]:
df1a = df1.reset_index()
df1a['es'] = None # column that will hold the simple exponential smoothing values
In [13]:
# Initial value: mean of Close over rows 0 to 9, stored at row 9
df1a.loc[df1a.index[9], 'es'] = df1a.loc[df1a.index[0:10], 'Close'].mean()
In [14]:
alpha = .4
for i in range(10, df1a.shape[0]):
    df1a.loc[df1a.index[i], 'es'] = alpha * df1a.loc[df1a.index[i], 'Close'] + (1-alpha) * df1a.loc[df1a.index[i-1], 'es']
In [15]:
df1a['forecast'] = df1a['es'].shift(1)
df1a
Out[15]:
| | index | Date | Close | MA5 | MA60 | MA120 | es | forecast |
|---|---|---|---|---|---|---|---|---|
0 | 120 | 2011-03-25 | 29900 | 28610.0 | 27793.333333 | 25700.000000 | None | NaN |
1 | 121 | 2011-03-28 | 30900 | 28880.0 | 27905.833333 | 25762.083333 | None | NaN |
2 | 122 | 2011-03-29 | 31000 | 29360.0 | 28035.833333 | 25832.083333 | None | NaN |
3 | 123 | 2011-03-30 | 31550 | 29990.0 | 28153.333333 | 25900.833333 | None | NaN |
4 | 124 | 2011-03-31 | 31300 | 30490.0 | 28275.000000 | 25968.750000 | None | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2958 | 3078 | 2023-03-24 | 87300 | 85340.0 | 86910.000000 | 86551.666667 | 86610.58724 | 86150.978733 |
2959 | 3079 | 2023-03-27 | 85500 | 86000.0 | 87081.666667 | 86605.833333 | 86166.352344 | 86610.58724 |
2960 | 3080 | 2023-03-28 | 88400 | 86360.0 | 87223.333333 | 86625.833333 | 87059.811406 | 86166.352344 |
2961 | 3081 | 2023-03-29 | 86900 | 87320.0 | 87430.000000 | 86644.166667 | 86995.886844 | 87059.811406 |
2962 | 3082 | 2023-03-30 | 88800 | 87320.0 | 87628.333333 | 86620.000000 | 87717.532106 | 86995.886844 |
2963 rows × 8 columns
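The row-by-row loop above can also be written with pandas' built-in exponentially weighted mean. With `adjust=False`, `Series.ewm(alpha=...).mean()` applies the same recursion and uses the first value of the series as its starting level. So seeding the series with the initial average reproduces the loop. The sketch below checks this on a hypothetical series. The helper `ses_loop` and the sample values are illustrative and not part of the original notebook.

```python
import pandas as pd


def ses_loop(close, alpha, n_init):
    """Loop version: the initial level is the mean of the first n_init values."""
    es = [float("nan")] * len(close)
    es[n_init - 1] = sum(close[:n_init]) / n_init
    for i in range(n_init, len(close)):
        es[i] = alpha * close[i] + (1 - alpha) * es[i - 1]
    return es


# Hypothetical closing prices, for illustration only.
close = [10.0, 12.0, 14.0, 20.0, 10.0, 16.0]
alpha, n_init = 0.4, 3
loop_es = ses_loop(close, alpha, n_init)

# Vectorised version: the initial level followed by the remaining observations.
seed = pd.Series([loop_es[n_init - 1]] + close[n_init:])
ewm_es = seed.ewm(alpha=alpha, adjust=False).mean()

print(loop_es[n_init - 1:])
print(ewm_es.tolist())
```

On a few thousand rows the vectorised form avoids the repeated `.loc` assignments, which tend to be slow in pandas.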
In [16]:
fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df1a.plot(x='Date', y='Close', ax = ax)
df1a.plot(x='Date', y='forecast', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()
In [17]:
fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df1a[-100:].plot(x='Date', y='Close', ax = ax)
df1a[-100:].plot(x='Date', y='forecast', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()
Double exponential smoothing
In [39]:
# Use the first 31 observations (rows 0 to 30) to build the initial trend and intercept
df = df[['Close']]
df1 = df.iloc[0:31, :]
df1 = df1.reset_index()
In [40]:
df1['level'] = None
df1['trend'] = None
In [43]:
df1['t'] = np.arange(1, len(df1) + 1) # time index as an explicit column
result = ols("Close ~ t", data=df1).fit() # regress Close on t over the first 31 observations
beta = result.params
beta
Out[43]:
Intercept 23001.935484
t 10.060484
dtype: float64
In [44]:
df2 = df.copy()
In [45]:
df2.loc[df2.index[30], 'level'] = beta['Intercept'] # regression intercept as the initial level
df2.loc[df2.index[30], 'trend'] = beta['t'] # regression slope as the initial trend
In [56]:
n = 31
alpha = .4
beta = .4
for i in range(n, len(df2)):
    df2.loc[df2.index[i], 'level'] = alpha * df2.loc[df2.index[i], 'Close'] + (1-alpha) * (df2.loc[df2.index[i-1], 'level'] + df2.loc[df2.index[i-1], 'trend'])
    df2.loc[df2.index[i], 'trend'] = beta * (df2.loc[df2.index[i], 'level'] - df2.loc[df2.index[i-1], 'level']) + (1-beta) * df2.loc[df2.index[i-1], 'trend']
In [57]:
df2['forecast'] = (df2['level'] + df2['trend']).shift(1)
df2a = df2.dropna()
df2a = df2a.reset_index()
In [58]:
fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df2a.plot(x='Date', y='Close', ax = ax)
df2a.plot(x='Date', y='forecast', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()
Reference
정호성, 『파이썬을 이용한 경제 및 금융 데이터 분석』 (Economic and Financial Data Analysis Using Python), 자유아카데미 (2023.1.31)