Notice

Recent Posts

Recent Comments

Link

Tags more

Archives

Today

Total

관리 메뉴

EunGyeongKim

불황과 호황 예측(로짓 알고리즘) 본문

기타 공부/금융

불황과 호황 예측(로짓 알고리즘)

EunGyeongKim 2023. 3. 27. 19:10

In [1]:

import datetime
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

In [2]:

key = 'key'
url = 'https://ecos.bok.or.kr/api/StatisticTableList/'+key+'/xml/kr/1/10000'
raw = requests.get(url)
xml = BeautifulSoup(raw.text, 'xml')
raw_data = xml.find_all('row')

data = []
for i in range(len(raw_data)):
    p_stat_code = raw_data[i].P_STAT_CODE.string.strip()
    stat_code = raw_data[i].STAT_CODE.string.strip()
    stat_name = raw_data[i].STAT_NAME.string.strip()
    cycle = raw_data[i].find('CYCLE').text
    used = raw_data[i].SRCH_YN.string.strip()
    org_name = raw_data[i].ORG_NAME.string

    total=[p_stat_code, stat_code, stat_name, cycle, used, org_name]

    data.append(total)

In [3]:

df = pd.DataFrame(data, columns=['p_stat_code','stat_code','stat_name','cycle','used','org_name'])
df.to_csv("bok_total_list.csv", encoding='CP949')

In [4]:

df1 = df[df['used'].isin(['Y'])]
stat_code = df1['stat_code'].tolist()
len(stat_code)

Out[4]:

In [5]:

# 세부 통계목록
data=[]

for i in range(len(stat_code)):
    code = stat_code[i]
    url = 'https://ecos.bok.or.kr/api/StatisticItemList/'+key+'/xml/kr/1/100/'+str(code)+'/'
    raw = requests.get(url)
    xml = BeautifulSoup(raw.text, 'xml')
    raw_data = xml.find_all('row')

    for j in range(len(raw_data)):
        stat_code1 = raw_data[j].STAT_CODE.string.strip()
        stat_name = raw_data[j].STAT_NAME.string.strip()
        grp_code = raw_data[j].GRP_CODE.string.strip()
        grp_name = raw_data[j].GRP_NAME.string.strip()
        item_code = raw_data[j].ITEM_CODE.string.strip()
        item_name = raw_data[j].ITEM_NAME.string.strip()
        cycle = raw_data[j].find("CYCLE").text
        start_time = raw_data[j].START_TIME.string.strip()
        end_time = raw_data[j].END_TIME.string.strip()
        data_cnt = raw_data[j].DATA_CNT.string.strip()

        total = [stat_code1,stat_name,grp_code,grp_name,item_code,item_name,cycle,start_time,end_time,data_cnt]
        data.append(total)

In [6]:

temp = pd.DataFrame(data, columns=['stat_code','stat_name','grp_code','grp_name','item_code','item_name','cycle','start_time','end_time','data_cnt'])
temp.to_csv('kob.detailTotal.csv', encoding='CP949')

In [7]:

df = temp.copy()
df1 = df[df['stat_code'].isin(['101Y001'])] # M2상품변 구성내역 말잔(계정조정)
df1

Out[7]:

	stat_code	stat_name	grp_code	grp_name	item_code	item_name	cycle	start_time	end_time	data_cnt
140	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS00	M2(말잔, 계절조정계열)	A	1970	2022	53
141	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS00	M2(말잔, 계절조정계열)	M	197001	202301	637
142	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS01	현금통화	A	2002	2022	21
143	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS01	현금통화	M	200112	202301	254
144	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS02	요구불예금	A	2002	2022	21
145	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS02	요구불예금	M	200112	202301	254
146	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS03	수시입출식저축성예금	A	2002	2022	21
147	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS03	수시입출식저축성예금	M	200112	202301	254
148	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS04	MMF	A	2002	2022	21
149	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS04	MMF	M	200112	202301	254
150	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS05	만기2년미만정기예적금	A	2002	2022	21
151	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS05	만기2년미만정기예적금	M	200112	202301	254
152	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS06	수익증권	A	2002	2022	21
153	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS06	수익증권	M	200112	202301	254
154	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS07	시장형상품 1)	A	2002	2022	21
155	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS07	시장형상품 1)	M	200112	202301	254
156	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS08	만기2년미만금융채	A	2002	2022	21
157	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS08	만기2년미만금융채	M	200112	202301	254
158	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS09	만기2년미만금전신탁	A	2002	2022	21
159	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS09	만기2년미만금전신탁	M	200112	202301	254
160	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS10	기타 2)	A	2002	2022	21
161	101Y001	1.1.3.1.3. M2 상품별 구성내역(말잔, 계절조정계열)	Group1	계정항목	BBGS10	기타 2)	M	200112	202301	254

In [8]:

main_df = pd.read_csv('bok_total_list.csv', encoding='CP949')
detail_df = pd.read_csv('kob.detailTotal.csv', encoding='CP949')

In [22]:

def EcosDownload(statname, statcode, freq, begdate, enddate, item_code, subcode1, subcode2, subcode3, col_name):
    url = "https://ecos.bok.or.kr/api/StatisticSearch/"+key+"/xml/kr/1/1000/%s/%s/%s/%s/%s/%s/%s/%s"%( statcode, freq, begdate, enddate, item_code, subcode1, subcode2, subcode3)

    print(url)

    raw=requests.get(url)
    xml = BeautifulSoup(raw.text, 'xml')
    raw_data = xml.find_all('row')
    data_list =[]
    value_list = []

    for item in raw_data:
        value = item.find('DATA_VALUE').text.encode('utf-8')
        data_str = item.find('TIME').text
        
        if 'Q1' in data_str:
            data_str = data_str.replace('Q1', '03')
        if 'Q2' in data_str:
            data_str = data_str.replace('Q2', '06')
        if 'Q3' in data_str:
            data_str = data_str.replace('Q3', '09')
        if 'Q4' in data_str:
            data_str = data_str.replace('Q4', '12')

        value = float(value)
        data_list.append(data_str)    
        value_list.append(value)

    df = pd.DataFrame(index=data_list)
    df['%s'%(col_name)] = value_list
    
    return df

불황과 호황 예측¶

로짓분석 머신러닝 사용
거지경제 데이터 이용
데이터 = 한국은행경졔통계 시스템 사용

불황과 호황을 예측하는데 이용되는 데이터¶

호황과 불황을 나타내는 이진목표변수 <- realGDP, 12분기 이동평균을 초과하면 호황(1)을 할당. 그렇지 않다면 불황(0) 할당
변수
- realGDP : 실질 국내 총생산(단위 : 전분기 대비 증가율)
- RealCons : 실질 민간소비(단위 : 전분기 대비 증가율)
- INV : 총투자(단위 : 전분기 대비 증가율)
- M2 : M2 통화량(단위 : 전분기 대비 증가율)
- UNEMP : 실업률(단위 : 현분기 실업율)
- EMPLOY : 취업자수(단위 : 전분기 대비 증가율)
- CD_3M : CD 3개월 유통 수익률(단위 : 현분기 수준)
- INFL : 소비자 물가(단위 : 전분기 대비 증가율)

In [ ]:

# 데이터 찾기...
detail_df[(detail_df['stat_code'].str.contains('200Y056') 
& detail_df['cycle'].str.contains('Q') )]

In [24]:

# begdate, enddate 고정 :2015Q1, 2022Q4
# 총투자 begdate가 2015Q1에서 시작
d_tmp = [['realGDP','2.1.1.2. 주요지표(분기지표)', '200Y002', '10111'], 
['realCons','2.1.1.2. 주요지표(분기지표)', '200Y002', '10122'], 
['inv','2.1.9.2. 총저축과 총투자(원계열, 명목, 분기 및 연간)', '200Y056', '13201'], 
['M2','1.1.3.1.2. M2 상품별 구성내역(평잔, 원계열)', '101Y004', 'BBHA01'], 
['unemp','9.1.5.2. 국제 주요국 실업률(계절변동조정)', '902Y021', 'KOR'], 
['employ','9.1.5.3. 국제 주요국 취업자수(계절변동조정)', '902Y022', 'KOR'], 
['CD_3M','1.3.2.2. 시장금리(월,분기,년)', '721Y001', '2010000'], 
['infl','9.1.2.2. 국제 주요국 소비자물가지수', '902Y008', 'KR']]

In [97]:

tmp_data = pd.DataFrame([])

for i in range(len(d_tmp)):
    tmp = (EcosDownload(str(d_tmp[i][1]),  str(d_tmp[i][2]), 'Q', '2015Q1', '2022Q4',  str(d_tmp[i][3]), '', '', '', d_tmp[i][0]))
    tmp_data = pd.concat([tmp_data,tmp],axis=1)

https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/200Y002/Q/2015Q1/2022Q4/10111///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/200Y002/Q/2015Q1/2022Q4/10122///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/200Y056/Q/2015Q1/2022Q4/13201///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/101Y004/Q/2015Q1/2022Q4/BBHA01///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/902Y021/Q/2015Q1/2022Q4/KOR///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/902Y022/Q/2015Q1/2022Q4/KOR///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/721Y001/Q/2015Q1/2022Q4/2010000///
https://ecos.bok.or.kr/api/StatisticSearch/213DB5VCLRGGHS2759WS/xml/kr/1/1000/902Y008/Q/2015Q1/2022Q4/KR///

In [98]:

tmp_data

Out[98]:

	realGDP	realCons	inv	M2	unemp	employ	CD_3M	infl
201503	0.8	0.8	109470.6	66550.9	3.5	26101.9	2.06	109.54
201506	0.5	0.1	121348.3	68433.1	3.7	26091.2	1.77	109.77
201509	1.4	0.6	129471.2	71089.6	3.6	26216.0	1.63	110.09
201512	0.7	1.9	129311.5	74551.4	3.5	26304.2	1.61	109.92
201603	0.3	-0.1	110589.1	78599.8	3.7	26300.2	1.64	110.48
201606	1.2	0.7	132126.9	80127.1	3.6	26310.9	1.54	110.69
201609	0.4	0.6	140229.5	82388.8	3.9	26468.0	1.35	110.90
201612	0.6	0.3	141772.2	84867.8	3.6	26559.6	1.44	111.52
201703	1.0	0.6	130801.0	88896.9	3.7	26648.5	1.49	112.91
201706	0.7	1.1	148688.3	89896.1	3.7	26684.9	1.40	112.82
201709	1.4	1.0	156139.0	91721.0	3.7	26744.5	1.39	113.36
201712	-0.3	0.6	157083.2	95771.4	3.7	26821.9	1.50	113.12
201803	1.2	1.4	136324.4	98296.5	3.7	26824.9	1.65	114.12
201806	0.6	0.0	151671.1	98765.8	3.7	26791.7	1.65	114.50
201809	0.7	0.6	152209.2	100070.6	4.2	26767.0	1.65	115.11
201812	0.7	0.8	157482.7	102775.1	3.9	26908.1	1.76	115.14
201903	-0.2	0.3	133180.5	106158.2	3.9	26997.0	1.88	114.74
201906	1.1	0.4	156388.2	107112.9	4.0	27033.4	1.84	115.25
201909	0.5	0.6	157988.7	109150.3	3.7	27131.6	1.57	115.16
201912	1.3	1.0	158562.1	112246.2	3.6	27329.5	1.50	115.48
202003	-1.3	-6.6	137895.3	117818.3	3.6	27287.2	1.37	115.85
202006	-3.0	1.1	158472.2	122963.0	4.1	26634.0	0.97	115.26
202009	2.3	0.3	159464.7	127725.2	4.0	26823.7	0.70	115.99
202012	1.2	-1.1	162960.3	133370.1	4.2	26886.8	0.65	116.01
202103	1.7	1.2	143013.1	139405.4	4.3	26889.9	0.72	117.50
202106	0.8	3.3	166314.1	143130.9	3.8	27248.7	0.69	118.12
202109	0.2	0.0	173077.2	147435.6	3.2	27395.5	0.81	118.93
202112	1.3	1.5	182390.3	153439.9	3.3	27549.2	1.18	120.12
202203	0.6	-0.5	150713.4	159455.3	3.0	27907.1	1.47	121.97
202206	0.7	2.9	175880.6	162860.1	2.9	28123.0	1.80	124.51
202209	0.3	1.7	193076.4	165505.1	2.8	28167.7	2.73	125.92
202212	-0.4	-0.6	193354.0	164262.8	2.9	28153.5	3.91	126.43

In [100]:

data = tmp_data.copy()
data['index'] = list(map(int, data.index))
data

Out[100]:

	realGDP	realCons	inv	M2	unemp	employ	CD_3M	infl	index
201503	0.8	0.8	109470.6	66550.9	3.5	26101.9	2.06	109.54	201503
201506	0.5	0.1	121348.3	68433.1	3.7	26091.2	1.77	109.77	201506
201509	1.4	0.6	129471.2	71089.6	3.6	26216.0	1.63	110.09	201509
201512	0.7	1.9	129311.5	74551.4	3.5	26304.2	1.61	109.92	201512
201603	0.3	-0.1	110589.1	78599.8	3.7	26300.2	1.64	110.48	201603
201606	1.2	0.7	132126.9	80127.1	3.6	26310.9	1.54	110.69	201606
201609	0.4	0.6	140229.5	82388.8	3.9	26468.0	1.35	110.90	201609
201612	0.6	0.3	141772.2	84867.8	3.6	26559.6	1.44	111.52	201612
201703	1.0	0.6	130801.0	88896.9	3.7	26648.5	1.49	112.91	201703
201706	0.7	1.1	148688.3	89896.1	3.7	26684.9	1.40	112.82	201706
201709	1.4	1.0	156139.0	91721.0	3.7	26744.5	1.39	113.36	201709
201712	-0.3	0.6	157083.2	95771.4	3.7	26821.9	1.50	113.12	201712
201803	1.2	1.4	136324.4	98296.5	3.7	26824.9	1.65	114.12	201803
201806	0.6	0.0	151671.1	98765.8	3.7	26791.7	1.65	114.50	201806
201809	0.7	0.6	152209.2	100070.6	4.2	26767.0	1.65	115.11	201809
201812	0.7	0.8	157482.7	102775.1	3.9	26908.1	1.76	115.14	201812
201903	-0.2	0.3	133180.5	106158.2	3.9	26997.0	1.88	114.74	201903
201906	1.1	0.4	156388.2	107112.9	4.0	27033.4	1.84	115.25	201906
201909	0.5	0.6	157988.7	109150.3	3.7	27131.6	1.57	115.16	201909
201912	1.3	1.0	158562.1	112246.2	3.6	27329.5	1.50	115.48	201912
202003	-1.3	-6.6	137895.3	117818.3	3.6	27287.2	1.37	115.85	202003
202006	-3.0	1.1	158472.2	122963.0	4.1	26634.0	0.97	115.26	202006
202009	2.3	0.3	159464.7	127725.2	4.0	26823.7	0.70	115.99	202009
202012	1.2	-1.1	162960.3	133370.1	4.2	26886.8	0.65	116.01	202012
202103	1.7	1.2	143013.1	139405.4	4.3	26889.9	0.72	117.50	202103
202106	0.8	3.3	166314.1	143130.9	3.8	27248.7	0.69	118.12	202106
202109	0.2	0.0	173077.2	147435.6	3.2	27395.5	0.81	118.93	202109
202112	1.3	1.5	182390.3	153439.9	3.3	27549.2	1.18	120.12	202112
202203	0.6	-0.5	150713.4	159455.3	3.0	27907.1	1.47	121.97	202203
202206	0.7	2.9	175880.6	162860.1	2.9	28123.0	1.80	124.51	202206
202209	0.3	1.7	193076.4	165505.1	2.8	28167.7	2.73	125.92	202209
202212	-0.4	-0.6	193354.0	164262.8	2.9	28153.5	3.91	126.43	202212

In [101]:

data['QUARTER'] = ((data['index'] % 100)/3).astype(int) # % 나머지 
data['RollingMean']= data.realGDP.rolling(12).mean()
data['TARGET1'] = (data.realGDP > data.RollingMean).astype(int).shift(-1)
pct_cols = ['M2', 'infl']
data.loc[:, pct_cols] = data.loc[:, pct_cols].pct_change(1)
df = pd.get_dummies(data, columns=['QUARTER'], drop_first=True).dropna()

In [102]:

df.TARGET1.value_counts()

Out[102]:

1.0    10
0.0    10
Name: TARGET1, dtype: int64

In [105]:

df1 = df.copy()

In [108]:

x_data =df1[['realGDP', 'realCons', 'inv', 'M2', 'infl', 'unemp', 'employ', 'CD_3M']].to_numpy()

In [109]:

y_data = df1.TARGET1

In [110]:

def normalization(data):
    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    return numerator / denominator

In [111]:

x_data = normalization(x_data)

In [112]:

#convert into numpy and float format
X = np.asarray(x_data, dtype=np.float32)
y = np.asarray(y_data, dtype=np.float32)

In [113]:

k = x_data.shape[1]

In [115]:

import tensorflow as tf

In [116]:

learning_rate = tf.Variable(0.003)

W = tf.Variable(tf.random.normal(([k, 1])), name='weight')
b = tf.Variable(tf.random.normal(([1])), name='bias')

for i in range(10000+1):
    with tf.GradientTape() as tape:
        
        hypothesis  = tf.sigmoid(tf.matmul(X, W) + b)
        
        cost = -tf.reduce_mean(y * tf.math.log(hypothesis) + (1 - y) * tf.math.log(1 - hypothesis))

        W_grad, b_grad = tape.gradient(cost, [W, b])
        
        W.assign_sub(learning_rate * W_grad)
        b.assign_sub(learning_rate * b_grad)
        predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)    
        
    if i % 2000 == 0:
        print("{:5} | {:10.6f}".format(i, cost.numpy()))

    0 |   1.327101
 2000 |   0.726086
 4000 |   0.712805
 6000 |   0.705188
 8000 |   0.700846
10000 |   0.698361

In [117]:

y_Predicted = predicted.numpy().flatten()

In [119]:

y_Actual = y.flatten()

In [120]:

data = {'y_Actual': y_Actual,
        'y_Predicted': y_Predicted}

In [121]:

df = pd.DataFrame(data, columns = ['y_Actual', 'y_Predicted'])

In [122]:

cross = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames = ['Actual'], colnames=['Predicted'])
cross

Out[122]:

Predicted	0.0	1.0
Actual
0.0	6	4
1.0	3	7

In [123]:

confusion_matrix = np.zeros([2,2])

In [124]:

try : 
    confusion_matrix[1,1] = cross.loc[1,1]
    confusion_matrix[0,1] = cross.loc[0,1]
    confusion_matrix[1,0] = cross.loc[1,0]
    confusion_matrix[0,0] = cross.loc[0,0]

except Exception as e:
    print(e)

TP  = confusion_matrix[1,1]
FP  = confusion_matrix[0,1]
FN  = confusion_matrix[1,0]
TN  = confusion_matrix[0,0]

In [125]:

confusion_matrix

Out[125]:

array([[6., 4.],
       [3., 7.]])

In [126]:

TOT  = TP + FP + TN + FN

In [127]:

accuracy = (TP + TN)/TOT
accuracy

Out[127]:

0.65

In [127]:

저작자표시 (새창열림)

'기타 공부 > 금융' 카테고리의 다른 글

[Coursera] Understanding the Australian Economy : An introduction to macroeconomic and financial policies. (0)	2024.04.10
[논문 정리]Reassessment of the Weather Effect: Stock Prices and Wall Street Weather (1)	2024.02.10
카이제곱을 이용한 차이 검정 (성별,직업군별 소득차이 검정) (1)	2023.03.25
차이 검정 (30대 1인 가구주 성별에 따른소득차이 검정) (0)	2023.03.21
차이 검정 (가구주 직업별 소득 차이 검정) (0)	2023.03.21

'기타 공부/금융' Related Articles

Comments

« 2025/06 »
일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30