Difference Test (Testing the Income Difference by Sex Among Single-Person Household Heads in Their 30s)

EunGyeongKim 2023. 3. 21. 15:26

In [4]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import font_manager, rc
%matplotlib inline

In [5]:

path = '/usr/local/lib/python3.9/dist-packages/matplotlib/mpl-data/fonts/ttf/DejaVuSansMono.ttf'
font_name = font_manager.FontProperties(fname=path).get_name()
rc('font', family = font_name)

In [6]:

path = '/content/2020_가구마스터_20230320_21294.csv'
df = pd.read_csv(path, encoding='CP949')

In [7]:

columns = ['조사연도', '수도권여부', 'MD제공용_가구고유번호', '가구주_성별코드', '가구주_만연령', '가구원수', '가구주_교육정도_학력코드', '가구주_혼인상태코드', '순자산', '부채', '처분가능소득(보완)[경상소득(보완)-비소비지출(보완)]', '가구주_산업대분류코드', '가구주_직업대분류코드', '입주형태코드']
df = df[columns].copy()
df.rename(columns={'조사연도':'year','수도권여부':'metro','MD제공용_가구고유번호':'id','가구주_성별코드':'sex','가구주_만연령':'age','가구원수':'number','가구주_교육정도_학력코드':'education','가구주_혼인상태코드':'marriage','순자산':'asset','부채':'debt','처분가능소득(보완)[경상소득(보완)-비소비지출(보완)]':'income','가구주_산업대분류코드':'industry','가구주_직업대분류코드':'job','입주형태코드':'house'}, inplace=True)

In [8]:

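# Single-person households headed by someone in their 30s
# (note: age > 30 excludes age 30, so this filter keeps ages 31-39)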
df1 = df.loc[df['number'].isin([1]) & (df['age']>30) & (df['age']<40) ]
df2 = df1[['sex', 'number', 'age', 'income']]
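
As a small illustration of how the age filter above behaves, the sketch below applies the same condition to a made-up DataFrame (the values are hypothetical, not survey data) and compares it with `Series.between(30, 39)`, which would also include age 30. This is only a sketch of the two options, not a claim about which range was intended.

```python
import pandas as pd

# Hypothetical toy data, not taken from the survey file
toy = pd.DataFrame({
    'number': [1, 1, 1, 1, 2],
    'age':    [30, 31, 39, 40, 35],
})

single = toy['number'] == 1

# Same condition as in the cell above: strictly between 30 and 40 (ages 31-39)
strict = toy.loc[single & (toy['age'] > 30) & (toy['age'] < 40)]

# Inclusive alternative: ages 30-39
inclusive = toy.loc[single & toy['age'].between(30, 39)]

print(strict['age'].tolist())
print(inclusive['age'].tolist())
```

Switching the filter would change the group counts, so every statistic computed afterwards would need to be rerun.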

In [11]:

df21 = df2.loc[df2['sex'].isin([1])] # male
df22 = df2.loc[df2['sex'].isin([2])] # female

# fill=True replaces the deprecated shade=True
sns.kdeplot(df21['income'], fill=True, label='male', clip=(-1000,20000))
sns.kdeplot(df22['income'], fill=True, label='female', clip=(-1000,20000))
plt.xlabel('10,000 won')
plt.legend()
plt.grid(True)
plt.show()

Significance Test

Computing the z-statistic
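
For reference, the quantity computed in the cells below is the usual two-sample z-statistic for a difference of means. The symbols are introduced here for illustration and do not appear in the notebook: $\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and count of group $i$, with 1 = male and 2 = female.

```latex
z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

Under the null hypothesis of no difference, $z$ is treated as approximately standard normal, which is reasonable for groups of this size.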

In [12]:

df3 = df2[['income']].groupby(df2['sex']).agg(['mean', 'std', 'count'])
df3

Out[12]:

	income
	mean	std	count
sex
1	3098.597156	1947.776724	211
2	2743.759398	1865.068730	133

In [13]:

mean = df3[('income', 'mean')]
mean_df = mean[1]-mean[2]
np.round(mean_df, 2)

Out[13]:

354.84

In [15]:

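std = df3[('income', 'std')]
count = df3[('income', 'count')]
se1 = std[1]/np.sqrt(count[1]) # standard error of the male group mean
se2 = std[2]/np.sqrt(count[2]) # standard error of the female group mean
tot_se = np.sqrt(se1**2 + se2**2) # standard error of the difference in means
np.round(tot_se, 2)

Out[15]:

210.08

In [16]:

z = mean_df / tot_se
round(z, 2)

Out[16]:

1.69

In [17]:

import scipy as sp
import scipy.stats

rv = sp.stats.norm(loc=0, scale=1) # standard normal distribution: mean 0, standard deviation 1
np.round(1-rv.cdf(z), 3) # one-sided (upper-tail) p-value

Out[17]:

0.046

p-value ≈ 0.046 (one-sided). Therefore, at the 5% significance level, the null hypothesis (that there is no difference between the incomes of single-person households headed by men in their 30s and those headed by women in their 30s) is rejected, though only narrowly. Note that this is an upper-tail test; the corresponding two-sided p-value is about 0.09.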
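
A minimal sketch for cross-checking the test, assuming only the group means, standard deviations, and counts shown in Out[12]. The helper `two_sample_z` is a hypothetical name that is not part of the original notebook; it uses the standard library's `math.erfc` instead of `scipy.stats.norm` and returns both the upper-tail and the two-sided p-value.

```python
import math

def two_sample_z(mean1, std1, n1, mean2, std2, n2):
    """Return (z, upper-tail p, two-sided p) for a difference of two means."""
    se1 = std1 / math.sqrt(n1)            # standard error of group 1 mean
    se2 = std2 / math.sqrt(n2)            # standard error of group 2 mean
    se_diff = math.sqrt(se1**2 + se2**2)  # standard error of the difference
    z = (mean1 - mean2) / se_diff
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))     # P(Z > z)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|)
    return z, p_upper, p_two_sided

# Summary statistics copied from Out[12] (1 = male, 2 = female)
z, p_upper, p_two_sided = two_sample_z(
    3098.597156, 1947.776724, 211,
    2743.759398, 1865.068730, 133,
)
print(round(z, 2), round(p_upper, 3), round(p_two_sided, 3))
```

Reporting both p-values makes it explicit whether the alternative hypothesis is directional ("male income is higher") or not ("the incomes differ"), since the two can lead to different decisions at the 5% level.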
