[ML] Moving Average & Exponential Smoothing


EunGyeongKim 2023. 3. 30. 16:42

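Moving average method

Moving average method: predicts a series' future values by identifying a pattern in its own past values

Gives equal weight to every past observation in the window
Uses the average over the past N periods to predict the next time point
- A small N makes the moving average follow recent data more closely.
- A large N makes it reflect older data more strongly.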

\( L_t = (D_t + D_{t-1} + \cdots + D_{t-N+1})/N \)

\( = \frac{1}{N}\sum_{i=t-N+1}^{t}D_i \)
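
The formula above can be sketched in plain Python. This is a minimal illustration, not code from the reference book; the function name `moving_average_forecast` and the toy `prices` list are made up for the example.

```python
def moving_average_forecast(data, n):
    """Return L_t for every t that has a full window of n observations.

    Each L_t is the mean of D_{t-n+1}, ..., D_t and serves as the
    forecast for the next time point.
    """
    if n <= 0 or n > len(data):
        raise ValueError("n must be between 1 and len(data)")
    averages = []
    for t in range(n - 1, len(data)):
        window = data[t - n + 1 : t + 1]  # the last n values up to and including t
        averages.append(sum(window) / n)
    return averages


prices = [10, 12, 14, 16, 18, 20]  # hypothetical toy series
ma3 = moving_average_forecast(prices, 3)
ma5 = moving_average_forecast(prices, 5)
print(ma3)  # [12.0, 14.0, 16.0, 18.0]
print(ma5)  # [14.0, 16.0]
```

On this rising series the last 3-period average (18.0) sits closer to the latest price than the last 5-period average (16.0), which is the effect of a smaller N described above.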

Moving averages are frequently used in stock price forecasting.

Exponential smoothing

Exponential smoothing: gives lower weight to observations that lie farther from the present
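
To see where the decreasing weights come from, the update rule of simple exponential smoothing, \( L_t = \alpha D_t + (1-\alpha)L_{t-1} \), can be unrolled. The expansion below is a standard derivation added for illustration, not something taken from the reference book.

```latex
\begin{aligned}
L_t &= \alpha D_t + (1-\alpha)L_{t-1} \\
    &= \alpha D_t + \alpha(1-\alpha)D_{t-1} + (1-\alpha)^2 L_{t-2} \\
    &= \alpha \sum_{k=0}^{t-1} (1-\alpha)^k D_{t-k} + (1-\alpha)^t L_0
\end{aligned}
```

The observation k steps back gets weight \( \alpha(1-\alpha)^k \). With \( \alpha = 0.4 \), for example, the weights are 0.4, 0.24, 0.144, 0.0864, and so on, shrinking geometrically.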

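Forecasting with simple exponential smoothing, given an initial value ( \( L_0 \) )
- Initial value \( L_0 \): \( L_0 = \frac{1}{N} \sum^{N}_{i=1}D_i \)
- Level update \( L_t \): \( L_t = \alpha D_t + (1-\alpha)L_{t-1} \)
- Forecast from \( L_t \): \( F_{t+1} = L_t, F_{t+n} = L_t \) (the forecast is the same for every horizon n)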
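
The three steps above can be sketched in plain Python. This is only an illustration under assumptions: the function name `simple_exponential_smoothing`, the parameter `n_init` (the N used for the initial average), and the toy `demand` list are made up here. As in the notebook code later in this post, the initial level is stored at the position of the last observation used for the average.

```python
def simple_exponential_smoothing(data, alpha, n_init):
    """Return a list of levels aligned with data.

    Positions before n_init - 1 stay None. The level at n_init - 1 is the
    mean of the first n_init observations, and every later level follows
    L_t = alpha * D_t + (1 - alpha) * L_{t-1}.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not 1 <= n_init <= len(data):
        raise ValueError("n_init must be between 1 and len(data)")
    levels = [None] * len(data)
    levels[n_init - 1] = sum(data[:n_init]) / n_init  # initial value
    for t in range(n_init, len(data)):
        levels[t] = alpha * data[t] + (1 - alpha) * levels[t - 1]
    return levels


demand = [10.0, 20.0, 30.0, 40.0]  # hypothetical toy series
levels = simple_exponential_smoothing(demand, alpha=0.5, n_init=2)
forecast = levels[-1]  # F_{t+1} = F_{t+n} = L_t
print(levels)    # [None, 15.0, 22.5, 31.25]
print(forecast)  # 31.25
```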

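Double exponential smoothing

Suited to forecasting time series that have a trend
Applies single exponential smoothing twice.
The initial slope ( \( B_0 \) ) and intercept ( \( L_0 \) ) are obtained by regressing on the available data
- The dependent variable is the observed value and the independent variable is the time index
  - Alternatively, the first observation ( \( D_0 \) ) is sometimes used as the initial intercept, and the difference between the first two observations ( \( D_1 - D_0 \) ) as the initial slope
- With double exponential smoothing, the forecast value depends on how far ahead the forecast is made
  - Intercept (level): \( L_t = \alpha D_t + (1-\alpha)(L_{t-1} + B_{t-1}) \)
  - Slope (trend): \( B_t = \beta ( L_t - L_{t-1} ) + (1-\beta)B_{t-1} \)
  - One-step forecast: \( F_{t+1} = L_t + B_t \)
  - Two-step forecast: \( F_{t+2} = L_t + 2 B_t \)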
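
The update rules above can be sketched in plain Python. This is a minimal illustration, not the reference book's code: the names `double_exponential_smoothing` and `forecast_ahead` and the toy `data` list are made up, the simple initialization \( L_0 = D_0 \), \( B_0 = D_1 - D_0 \) is used instead of a regression, and the n-step forecast \( L_t + n B_t \) is assumed as the natural extension of the one-step and two-step formulas.

```python
def double_exponential_smoothing(data, alpha, beta, level0, trend0):
    """Run the level and trend updates over data and return the final (L_t, B_t)."""
    level, trend = level0, trend0
    for d in data:
        prev_level = level
        # level: L_t = alpha * D_t + (1 - alpha) * (L_{t-1} + B_{t-1})
        level = alpha * d + (1 - alpha) * (prev_level + trend)
        # trend: B_t = beta * (L_t - L_{t-1}) + (1 - beta) * B_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def forecast_ahead(level, trend, n):
    """n-step-ahead forecast, assumed to be L_t + n * B_t."""
    return level + n * trend


data = [10.0, 12.0, 14.0, 16.0]  # hypothetical series with a constant slope of 2
level, trend = double_exponential_smoothing(
    data[1:], alpha=0.5, beta=0.5,
    level0=data[0],            # L_0 = D_0
    trend0=data[1] - data[0],  # B_0 = D_1 - D_0
)
print(level, trend)                     # 16.0 2.0
print(forecast_ahead(level, trend, 1))  # 18.0
print(forecast_ahead(level, trend, 2))  # 20.0
```

On a perfectly linear series the level tracks the data and the trend stays at the true slope, so the forecasts simply continue the line. This is what makes the method suitable for trending data.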

code

In [1]:

!pip install finance-datareader


In [2]:

import numpy as np
import pandas as pd
import FinanceDataReader as fdr
pd.set_option('display.max_rows', 560)

In [3]:

etf = fdr.StockListing("ETF/KR") # load the list of Korean ETFs

df = fdr.DataReader('360750', '2020-08-01') # TIGER US S&P 500 ETF
df

Out[3]:

	Open	High	Low	Close	Volume	Change
Date
2020-08-07	9774	9789	9716	9761	126092	NaN
2020-08-10	9754	9794	9749	9785	212309	0.002459
2020-08-11	9789	9799	9774	9795	83181	0.001022
2020-08-12	9730	9754	9692	9757	68040	-0.003880
2020-08-13	9834	9834	9789	9805	163224	0.004920
...	...	...	...	...	...	...
2023-03-24	12725	12830	12725	12830	690195	0.009044
2023-03-27	12965	13050	12950	13050	991106	0.017147
2023-03-28	12985	12995	12930	12990	463006	-0.004598
2023-03-29	12990	13075	12935	13070	1903031	0.006159
2023-03-30	13210	13210	13140	13155	804951	0.006503

653 rows × 6 columns

In [4]:

m_price = df['Close'] # closing prices
df1 = m_price.to_frame() # convert the Series to a DataFrame
df1 = df1.reset_index() # move the index into a column
t = list(range(1, len(df1) + 1)) # time index 1, 2, ..., len(df1)
df1['t'] = t # add the time index as a column
df1

Out[4]:

	Date	Close	t
0	2020-08-07	9761	1
1	2020-08-10	9785	2
2	2020-08-11	9795	3
3	2020-08-12	9757	4
4	2020-08-13	9805	5
...	...	...	...
648	2023-03-24	12830	649
649	2023-03-27	13050	650
650	2023-03-28	12990	651
651	2023-03-29	13070	652
652	2023-03-30	13155	653

653 rows × 3 columns

In [5]:

from statsmodels.formula.api import ols

stock = ols('Close ~ t', data=df1).fit() # ols : Create a Model from a formula and dataframe.
print(stock.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Close   R-squared:                       0.606
Model:                            OLS   Adj. R-squared:                  0.605
Method:                 Least Squares   F-statistic:                     1000.
Date:                Thu, 30 Mar 2023   Prob (F-statistic):          1.09e-133
Time:                        08:03:30   Log-Likelihood:                -5336.2
No. Observations:                 653   AIC:                         1.068e+04
Df Residuals:                     651   BIC:                         1.069e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.038e+04     67.216    154.389      0.000    1.02e+04    1.05e+04
t              5.6318      0.178     31.624      0.000       5.282       5.981
==============================================================================
Omnibus:                       94.126   Durbin-Watson:                   0.022
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               27.118
Skew:                           0.196   Prob(JB):                     1.29e-06
Kurtosis:                       2.082   Cond. No.                         756.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

In [6]:

x = df1['t']
ypred = stock.predict(x)
yactual = df1['Close']
df1['yym'] = df1['Date'].dt.to_period('D')
df1['ypred'] = ypred

In [7]:

%matplotlib inline
import matplotlib.pyplot as plt


fig, ax = plt.subplots(nrows=1, figsize=(8,4), sharex=True)
# sharex controls whether subplots share x-axis properties (sharey does the same for y),
# so that all subplots use the same x-axis ticks.

df1.plot(x = 'yym', y='Close', ax =ax)
df1.plot(x = 'yym', y='ypred', ax =ax)
ax.set(xlabel='time', ylabel='index')
# use booleans for the grid visibility flag
ax.grid(True, which='minor', axis = 'x')
ax.grid(False, which='major', axis = 'x')
plt.show()

Stock price forecasting with moving averages

In [8]:

from statsmodels.compat import lzip
import statsmodels.api as sm
from statsmodels.formula.api import ols

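df = fdr.DataReader('000660', '2010-10-01') # SK hynix stock price
df = df[['Close']]
df1 = df.copy()

def ma(dframe, n):
    # n-day moving average of Close, shifted by one row so that each value
    # uses only prices up to the previous day (a one-step-ahead forecast)
    dframe['MA{}'.format(n)] = dframe['Close'].rolling(window=n).mean().shift(1)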

In [9]:

ma(df1, 5) # 5-day moving average
ma(df1, 60) # 60-day moving average
ma(df1, 120) # 120-day moving average

In [10]:

df1 = df1.reset_index()
df1 = df1.dropna()

In [11]:

fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df1.plot(x='Date', y='Close', ax = ax)
df1.plot(x='Date', y='MA5', ax = ax)
df1.plot(x='Date', y='MA60', ax = ax)
df1.plot(x='Date', y='MA120', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()

Stock price forecasting with single exponential smoothing

In [12]:

df1a = df1.reset_index()
df1a['es'] = None # column that will hold the simple exponential smoothing values

In [13]:

# initial value: mean of Close over positions 0 through 9
df1a.loc[df1a.index[9], 'es'] = df1a.loc[df1a.index[0:10], 'Close'].mean() 

In [14]:

alpha = .4

for i in range(10, df1a.shape[0]):
    df1a.loc[df1a.index[i], 'es'] = alpha * df1a.loc[df1a.index[i], 'Close'] + (1-alpha) * df1a.loc[df1a.index[i-1], 'es']

In [15]:

df1a['forecast'] = df1a['es'].shift(1)
df1a

Out[15]:

	index	Date	Close	MA5	MA60	MA120	es	forecast
0	120	2011-03-25	29900	28610.0	27793.333333	25700.000000	None	NaN
1	121	2011-03-28	30900	28880.0	27905.833333	25762.083333	None	NaN
2	122	2011-03-29	31000	29360.0	28035.833333	25832.083333	None	NaN
3	123	2011-03-30	31550	29990.0	28153.333333	25900.833333	None	NaN
4	124	2011-03-31	31300	30490.0	28275.000000	25968.750000	None	NaN
...	...	...	...	...	...	...	...	...
2958	3078	2023-03-24	87300	85340.0	86910.000000	86551.666667	86610.58724	86150.978733
2959	3079	2023-03-27	85500	86000.0	87081.666667	86605.833333	86166.352344	86610.58724
2960	3080	2023-03-28	88400	86360.0	87223.333333	86625.833333	87059.811406	86166.352344
2961	3081	2023-03-29	86900	87320.0	87430.000000	86644.166667	86995.886844	87059.811406
2962	3082	2023-03-30	88800	87320.0	87628.333333	86620.000000	87717.532106	86995.886844

2963 rows × 8 columns

In [16]:

fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df1a.plot(x='Date', y='Close', ax = ax)
df1a.plot(x='Date', y='forecast', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()

In [17]:

fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df1a[-100:].plot(x='Date', y='Close', ax = ax)
df1a[-100:].plot(x='Date', y='forecast', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()

Double exponential smoothing

In [39]:

# build the initial trend and intercept from the first 31 observations (iloc[0:31])
df = df[['Close']]
df1 = df.iloc[0:31, :]
df1 = df1.reset_index()

In [40]:

df1['level'] = None
df1['trend'] = None

In [43]:

t = list(range(1, len(df1)+1))
result = ols("Close ~ t", data=df1).fit() # regression on the first 31 observations
beta = result.params
beta

Out[43]:

Intercept    23001.935484
t               10.060484
dtype: float64

In [44]:

df2 = df.copy()

In [45]:

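# access the coefficients by label; positional beta[0] on a labeled Series is deprecated in recent pandas
df2.loc[df2.index[30], 'level']  = beta['Intercept']
df2.loc[df2.index[30], 'trend']  = beta['t']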

In [56]:

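n = 31
alpha = .4
beta = .4  # smoothing constant for the trend (reuses the name of the regression result above)
for i in range(n, len(df2)):
    prev_level = df2.loc[df2.index[i-1], 'level']
    prev_trend = df2.loc[df2.index[i-1], 'trend']
    # level: L_t = alpha * D_t + (1 - alpha) * (L_{t-1} + B_{t-1})
    level = alpha * df2.loc[df2.index[i], 'Close'] + (1-alpha) * (prev_level + prev_trend)
    df2.loc[df2.index[i], 'level'] = level
    # trend: B_t = beta * (L_t - L_{t-1}) + (1 - beta) * B_{t-1}
    df2.loc[df2.index[i], 'trend'] = beta * (level - prev_level) + (1-beta) * prev_trend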

In [57]:

df2['forecast'] = (df2['level'] + df2['trend']).shift(1)
df2a = df2.dropna()
df2a = df2a.reset_index()

In [58]:

fig, ax= plt.subplots(nrows=1, figsize=(12, 6), sharex=True)
df2a.plot(x='Date', y='Close', ax = ax)
df2a.plot(x='Date', y='forecast', ax = ax)
ax.set(xlabel='time', ylabel='price')
ax.grid(True)
plt.show()

reference

정호성, 『파이썬을 이용한 경제 및 금융 데이터 분석』, 자유아카데미(2023.1.31)
