# [Quantitative Finance] Implementing a Three-Factor Model Strategy
## Strategy overview

A weekly, 100-stock A-share strategy built on the Fama three-factor model.
## Environment and data preparation

```python
import numpy as np
from tqdm import tqdm
import pandas as pd
import os
import gc
import warnings

warnings.filterwarnings('ignore')

from quantools import backtest
```
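```python
# Weekly stock data: one row per stock per week
stk_data = pd.read_csv("../data/stk_data.csv")
stk_data['close_date'] = pd.to_datetime(stk_data['close_date'])
stk_data['open_date'] = pd.to_datetime(stk_data['open_date'])

# Price and volume data on the first trading day of each week
open_days_data = pd.read_csv("../data/open_days_data.csv")
open_days_data['date'] = pd.to_datetime(open_days_data['date'])

# Equity attributable to shareholders of the parent company, by report date
equity = pd.read_csv("../data/eqy_belongto_parcomsh.csv")
equity['rpt_date'] = pd.to_datetime(equity['rpt_date'])

# exist_ok avoids an error when the notebook is re-run
os.makedirs("../cal_data", exist_ok=True)
```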
```python
print(stk_data.shape)
stk_data.head()
```

| | TOTAL_SHARES | CLOSE | OPEN | stock_code | open_date | close_date | uadj_close |
|---|---|---|---|---|---|---|---|
| 0 | 1.945822e+09 | 160.348451 | 153.344151 | 000001.SZ | 2006-01-04 | 2006-01-06 | 6.41 |
| 1 | 1.945822e+09 | 155.345379 | 160.098298 | 000001.SZ | 2006-01-09 | 2006-01-13 | 6.21 |
| 2 | 1.945822e+09 | 155.845687 | 154.594919 | 000001.SZ | 2006-01-16 | 2006-01-20 | 6.23 |
| 3 | 1.945822e+09 | 158.847530 | 155.845687 | 000001.SZ | 2006-01-23 | 2006-01-25 | 6.35 |
| 4 | 1.945822e+09 | 155.345379 | 158.847530 | 000001.SZ | 2006-02-06 | 2006-02-10 | 6.21 |
```python
print(equity.shape)
equity.head()
```

| | stock_code | EQY_BELONGTO_PARCOMSH | rpt_date |
|---|---|---|---|
| 0 | 000001.SZ | 5.014966e+09 | 2005-09-30 |
| 1 | 000002.SZ | 6.738774e+09 | 2005-09-30 |
| 2 | 000004.SZ | 8.952654e+07 | 2005-09-30 |
| 3 | 000005.SZ | 8.290555e+08 | 2005-09-30 |
| 4 | 000006.SZ | 1.007023e+09 | 2005-09-30 |
```python
print(open_days_data.shape)
open_days_data.head()
```

| | stock_code | HIGH | OPEN | LOW | CLOSE | VOLUME | date |
|---|---|---|---|---|---|---|---|
| 0 | 000001.SZ | 158.347222 | 153.344151 | 153.093997 | 157.096455 | 15445068.0 | 2006-01-04 |
| 1 | 000002.SZ | 206.631220 | 194.684662 | 194.684662 | 206.188755 | 38931043.0 | 2006-01-04 |
| 2 | 000004.SZ | 13.191923 | 13.035620 | 12.941839 | 13.098141 | 401500.0 | 2006-01-04 |
| 3 | 000005.SZ | 9.436105 | 9.155268 | 9.042934 | 9.379937 | 3713641.0 | 2006-01-04 |
| 4 | 000006.SZ | 18.698245 | 18.698245 | 18.698245 | 18.698245 | 0.0 | 2006-01-04 |
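## Data computation

### Computing the three factors

Companies' financial statements cover different reporting periods, so this step uses the filing deadline of each batch of reports as the data update date. When computing factors such as book-to-market, factor dates map to reporting periods as follows: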
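| Factor date | Reporting period |
|---|---|
| May, June, July, August | Q1 report (published by 04-30 at the latest) |
| September, October | Semi-annual report (published by 08-30 at the latest) |
| November, December | Q3 report (published by 10-30 at the latest) |
| January, February, March, April | Previous year's Q3 report (published by 10-30 of the previous year at the latest) |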
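The annual report and the Q1 report share the same filing deadline, and the Q1 report is more recent than the previous year's annual report, so annual-report data are not used.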
```python
stk_data['mkt_cap'] = stk_data['TOTAL_SHARES'] * stk_data['uadj_close']

def match_rpt_date(date):
    """
    Map a date to the reporting period whose data is available on that date.
    Based on: Q1 report published by 4/30 at the latest, semi-annual report by 8/30,
    Q3 report by 10/30, annual report by 4/30 of the following year (hence not used).
    """
    y = date.year
    m = date.month
    if m in (5, 6, 7, 8):
        return f"{y}0331"
    elif m in (9, 10):
        return f"{y}0630"
    elif m in (11, 12):
        return f"{y}0930"
    elif m in (1, 2, 3, 4):
        return f"{y-1}0930"

stk_data['rpt_date'] = pd.to_datetime(stk_data['close_date'].apply(match_rpt_date))
```
```python
all_data = pd.merge(stk_data, equity, on=['stock_code', 'rpt_date'], how='left')
```

```python
# One date x stock_code table per field
odd = {}
for key in tqdm(['HIGH', 'OPEN', 'LOW', 'CLOSE', 'VOLUME']):
    odd[key] = pd.pivot(open_days_data, index='date', columns='stock_code', values=key)
```
```python
# Label: return from the next open to the open after that
odd['pred_rtn'] = (odd['OPEN'].shift(-2) - odd['OPEN'].shift(-1)) / odd['OPEN'].shift(-1)
pred_rtn_na = odd['pred_rtn'].isna()

# The stock cannot be bought when the next bar has zero or missing volume: set the return to 0.
# Parentheses added around the OR, because & binds tighter than |.
vol0 = odd['VOLUME'].shift(-1) == 0
volna = odd['VOLUME'].shift(-1).isna()
odd['pred_rtn'][(vol0 | volna) & (~pred_rtn_na)] = 0

# Nor can it be bought on a one-price bar (high == low) that did not close at or below the prior close
yz = odd['HIGH'].shift(-1) == odd['LOW'].shift(-1)
zt = ~(odd['CLOSE'].shift(-1) <= odd['CLOSE'])
odd['pred_rtn'][yz & zt & (~pred_rtn_na)] = 0

pred_rtn = odd['pred_rtn'].stack().reset_index().rename(columns={0: 'pred_rtn', 'date': 'open_date'})
all_data = pd.merge(all_data, pred_rtn, on=['open_date', 'stock_code'], how='left')
all_data = all_data[~all_data['pred_rtn'].isna()]

del odd
gc.collect()
```
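To make the shifting logic concrete, here is a minimal sketch on a made-up open-price series for a single stock. The prices and the `open_px` name are illustrative and are not taken from the data set. Each row's `pred_rtn` is the return from the next row's open to the open after that, so the last two rows have no label.

```python
import pandas as pd

# Hypothetical open prices on four consecutive week-opening days (illustrative values only)
open_px = pd.Series(
    [10.0, 11.0, 12.1, 12.1],
    index=pd.to_datetime(["2006-01-04", "2006-01-09", "2006-01-16", "2006-01-23"]),
)

# Same formula as in the post: enter at the next open, exit at the open after that
pred_rtn = (open_px.shift(-2) - open_px.shift(-1)) / open_px.shift(-1)
print(pred_rtn)
```

The first row gets roughly 0.1 (11.0 to 12.1), the second gets 0 (12.1 to 12.1), and the last two are NaN because the future opens do not exist yet.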
```python
# Return factor: past-week return based on adjusted closes
close = pd.pivot(all_data, index='close_date', columns='stock_code', values='CLOSE')
fac_ret = (close - close.shift(1)) / close.shift(1)
fac_ret = fac_ret.stack().reset_index().rename(columns={0: 'fac_ret', 'date': 'close_date'})
all_data = pd.merge(all_data, fac_ret, on=['close_date', 'stock_code'], how='left')

# Size factor: log market cap (in millions)
all_data['fac_size'] = np.log(all_data['mkt_cap'] / 1000000)

# Book-to-market factor: parent-company shareholders' equity over market cap
all_data['fac_bm'] = all_data['EQY_BELONGTO_PARCOMSH'] / all_data['mkt_cap']
```
```python
factors = all_data[['stock_code', 'close_date', 'pred_rtn', 'fac_ret', 'fac_size', 'fac_bm']].reset_index(drop=True)
factors = factors[~factors['pred_rtn'].isna()]
factors.head()
```

| | stock_code | close_date | pred_rtn | fac_ret | fac_size | fac_bm |
|---|---|---|---|---|---|---|
| 0 | 000001.SZ | 2006-01-06 | -0.034375 | NaN | 9.431299 | 0.402075 |
| 1 | 000001.SZ | 2006-01-13 | 0.008091 | -0.031201 | 9.399601 | 0.415024 |
| 2 | 000001.SZ | 2006-01-20 | 0.019262 | 0.003221 | 9.402816 | 0.413692 |
| 3 | 000001.SZ | 2006-01-25 | -0.022047 | 0.019262 | 9.421895 | 0.405874 |
| 4 | 000001.SZ | 2006-02-10 | 0.004831 | -0.022047 | 9.399601 | 0.415024 |
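As a sanity check, the first row of this table can be reproduced by hand from the first rows of `stk_data` and `equity` printed earlier. The sketch below only re-applies the two formulas from the post. The inputs are the rounded values as displayed, so the result agrees only to about four decimal places.

```python
import numpy as np

# Values copied from the first displayed rows of stk_data and equity
total_shares = 1.945822e9   # TOTAL_SHARES
uadj_close = 6.41           # unadjusted close for the week ending 2006-01-06
book_equity = 5.014966e9    # EQY_BELONGTO_PARCOMSH for report date 2005-09-30

mkt_cap = total_shares * uadj_close
fac_size = np.log(mkt_cap / 1_000_000)  # log market cap, in millions
fac_bm = book_equity / mkt_cap          # book-to-market ratio

print(round(fac_size, 4), round(fac_bm, 4))
```

January falls in the "previous year's Q3 report" bucket, which is why the 2006-01-06 row pairs with the 2005-09-30 equity figure.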
```python
factors.to_csv("../cal_data/factors.csv", index=False)
```
### Winsorizing the factors

```python
fac_name = 'fac_size'
factors[factors['close_date'] == '2019-10-18'][fac_name].plot.kde(
    title="Size factor distribution on 2019-10-18 (before winsorizing)")
```

```python
factors = backtest.winsorize_factor(factors, 'fac_size')
factors = backtest.winsorize_factor(factors, 'fac_ret')
factors = backtest.winsorize_factor(factors, 'fac_bm')
```

```python
factors[factors['close_date'] == '2019-10-18'][fac_name].plot.kde(
    title="Size factor distribution on 2019-10-18 (after winsorizing)")
```
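The implementation of `backtest.winsorize_factor` is not shown in this post. As a rough illustration of what cross-sectional winsorizing can look like, the sketch below clips a factor to per-date quantiles. The helper name `winsorize_by_date` and the quantile bounds are assumptions made for the example, and the actual library function may use a different rule (for instance one based on the median and MAD).

```python
import pandas as pd

def winsorize_by_date(df, col, lower=0.01, upper=0.99):
    """Hypothetical helper: clip `col` to its per-date [lower, upper] quantiles."""
    out = df.copy()
    grouped = out.groupby('close_date')[col]
    lo = grouped.transform(lambda s: s.quantile(lower))
    hi = grouped.transform(lambda s: s.quantile(upper))
    out[col] = out[col].clip(lower=lo, upper=hi)
    return out

# Toy cross-section with one extreme value
toy = pd.DataFrame({
    'close_date': pd.to_datetime(['2019-10-18'] * 5),
    'fac_size': [1.0, 2.0, 3.0, 4.0, 100.0],
})
# Wide bounds chosen only so the effect is visible on five points
clipped = winsorize_by_date(toy, 'fac_size', lower=0.0, upper=0.75)
print(clipped['fac_size'].tolist())
```

Clipping per date rather than over the whole sample keeps each week's cross-section comparable, which matters because the later tests all work one cross-section at a time.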
## Single-factor tests

### Validating the model with Fama-MacBeth regressions

```python
res_list = []
for fac_name in ['fac_size', 'fac_ret', 'fac_bm']:
    res_list.append(backtest.fama_macbeth(factors, fac_name))
fama_macbeth_res = pd.DataFrame(res_list)
fama_macbeth_res
```

| | fac_name | t | p | pos_count | neg_count |
|---|---|---|---|---|---|
| 0 | fac_size | -4.576268 | 5.395101e-06 | 362 | 541 |
| 1 | fac_ret | -10.642792 | 5.330462e-25 | 290 | 612 |
| 2 | fac_bm | 4.019551 | 6.317205e-05 | 464 | 439 |
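All three factors have t-statistics significantly different from zero, so each of them is a reasonably effective factor. The book-to-market slope is significantly positive and the other two are significantly negative, which matches the usual findings in academic research. That said, the book-to-market slope is positive in 464 periods and negative in 439, close to an even split, so on this measure its ability to discriminate is weak.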
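The post does not show how `backtest.fama_macbeth` works internally. A minimal sketch of the general Fama-MacBeth idea, under the assumption that it runs one univariate cross-sectional regression of `pred_rtn` on the factor per date and then t-tests the slope series against zero, could look like this. The function name `fama_macbeth_sketch` and the toy data are made up for illustration, and the p-value is left out to keep the example dependency-free.

```python
import numpy as np
import pandas as pd

def fama_macbeth_sketch(df, fac_name, ret_col='pred_rtn', date_col='close_date'):
    """Per-date OLS slope of returns on one factor, then a t-statistic of the slopes."""
    slopes = []
    for _, g in df.groupby(date_col):
        g = g[[fac_name, ret_col]].dropna()
        if len(g) < 3 or g[fac_name].var() == 0:
            continue  # not enough cross-sectional information on this date
        slopes.append(np.polyfit(g[fac_name], g[ret_col], 1)[0])
    slopes = np.asarray(slopes)
    t = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(len(slopes)))
    return {
        'fac_name': fac_name,
        't': t,
        'pos_count': int((slopes > 0).sum()),
        'neg_count': int((slopes < 0).sum()),
    }

# Toy panel: 50 weeks x 30 stocks with a built-in negative factor premium
rng = np.random.default_rng(0)
dates = pd.date_range('2020-01-03', periods=50, freq='W-FRI')
toy = pd.DataFrame({
    'close_date': dates.repeat(30),
    'fac_size': rng.normal(size=50 * 30),
})
toy['pred_rtn'] = -0.02 * toy['fac_size'] + rng.normal(scale=0.005, size=len(toy))

res = fama_macbeth_sketch(toy, 'fac_size')
print(res)
```

Reading `pos_count` and `neg_count` next to the t-statistic, as the post does, shows whether a significant mean slope comes from a consistent sign or from a few large periods.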
### Single-factor grouped returns

```python
group_rtns, group_cum_rtns = backtest.group_return_analysis(factors, 'fac_size')
```

```python
group_rtns, group_cum_rtns = backtest.group_return_analysis(factors, 'fac_ret')
```

```python
group_rtns, group_cum_rtns = backtest.group_return_analysis(factors, 'fac_bm')
```
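The backtests show that all three factors separate the groups to some degree. Book-to-market and size separate best, while the return factor is somewhat weaker.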
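For readers without access to `quantools`, the grouping step can be approximated as follows. This is a sketch under the assumption that stocks are sorted by the factor within each date, split into equal-sized groups, and equal-weighted. `group_returns_sketch` is a hypothetical name, and the real `group_return_analysis` may differ, for example in how it handles ties or plots the result.

```python
import numpy as np
import pandas as pd

def group_returns_sketch(df, fac_name, group_num=10):
    """Equal-weighted next-period return of factor-sorted groups, per date."""
    out = df.dropna(subset=[fac_name, 'pred_rtn']).copy()
    by_date = out.groupby('close_date')[fac_name]
    rank = by_date.rank(method='first')   # 1..n within each date
    count = by_date.transform('count')    # n within each date
    # Map ranks onto groups 1..group_num (group 1 = smallest factor values)
    out['group'] = ((rank - 1) * group_num // count).astype(int) + 1
    group_rtns = out.groupby(['close_date', 'group'])['pred_rtn'].mean().unstack('group')
    group_cum_rtns = (1 + group_rtns).cumprod()
    return group_rtns, group_cum_rtns

# Toy panel: 2 dates x 20 stocks, with return increasing in the factor
toy = pd.DataFrame({
    'close_date': pd.to_datetime(['2020-01-03', '2020-01-10']).repeat(20),
    'fac_bm': np.tile(np.arange(20.0), 2),
})
toy['pred_rtn'] = 0.001 * toy['fac_bm']

group_rtns, group_cum_rtns = group_returns_sketch(toy, 'fac_bm', group_num=10)
print(group_rtns.round(4))
```

A factor with good separation shows group returns that rise or fall steadily from group 1 to group 10, as the toy data does by construction.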
### Single-factor weekly 100-stock backtest

```python
rtn, evaluate_result = backtest.backtest_1week_nstock(factors, 'fac_ret', True)
evaluate_result
```
| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.405346 | 0.789046 | 2015-06-05 | 2018-10-12 | 2.030548 | 1.178091 | 0.763147 | Sum |
| 1 | 6.653799 | 0.108753 | 2006-06-30 | 2006-07-28 | 9.882821 | 67.674182 | 0.674358 | 2006 |
| 2 | 7.957098 | 0.217392 | 2007-05-18 | 2007-06-22 | 11.966749 | 1336.033077 | 0.975551 | 2007 |
| 3 | -2.535012 | 0.699955 | 2008-01-11 | 2008-10-24 | -4.945847 | -0.975181 | 1.181429 | 2008 |
| 4 | 6.570332 | 0.143081 | 2009-02-06 | 2009-02-20 | 8.999942 | 121.088407 | 0.783940 | 2009 |
| 5 | 3.688373 | 0.198535 | 2010-04-02 | 2010-06-25 | 6.296923 | 9.382755 | 0.702845 | 2010 |
| 6 | -3.329041 | 0.426803 | 2011-07-08 | 2011-12-30 | -4.826786 | -0.906297 | 0.645434 | 2011 |
| 7 | 1.483952 | 0.243869 | 2012-03-02 | 2012-11-23 | 2.545457 | 1.049363 | 0.604799 | 2012 |
| 8 | 4.117017 | 0.134629 | 2013-05-24 | 2013-06-21 | 7.282607 | 7.479792 | 0.558735 | 2013 |
| 9 | 4.479553 | 0.096290 | 2014-11-21 | 2014-12-26 | 9.303073 | 8.344907 | 0.531904 | 2014 |
| 10 | 0.683548 | 0.575212 | 2015-06-05 | 2015-09-11 | 0.882963 | -0.014136 | 1.347235 | 2015 |
| 11 | 0.957265 | 0.155600 | 2016-04-08 | 2016-05-06 | 1.431324 | 0.558164 | 0.753643 | 2016 |
| 12 | -3.226729 | 0.330036 | 2017-03-17 | 2017-12-22 | -4.293738 | -0.829643 | 0.506809 | 2017 |
| 13 | -2.996387 | 0.425417 | 2018-03-23 | 2018-10-12 | -4.204086 | -0.896640 | 0.677827 | 2018 |
| 14 | 1.305734 | 0.287685 | 2019-03-29 | 2019-08-02 | 2.413080 | 0.887732 | 0.639300 | 2019 |
| 15 | -0.294687 | 0.168351 | 2020-02-14 | 2020-05-15 | -0.414377 | -0.325593 | 0.640734 | 2020 |
| 16 | 2.825869 | 0.139483 | 2021-02-10 | 2021-04-23 | 5.050823 | 2.591822 | 0.496437 | 2021 |
| 17 | -1.196824 | 0.294684 | 2022-02-11 | 2022-04-22 | -2.205050 | -0.610259 | 0.625781 | 2022 |
| 18 | -1.182782 | 0.165736 | 2023-03-31 | 2023-07-21 | -2.770508 | -0.440561 | 0.418921 | 2023 |
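The selection rule and the metrics behind `backtest_1week_nstock` are not documented in this post. The sketch below is one plausible reading: each week, hold an equal-weighted basket of the `n` stocks with the most extreme factor values and earn their mean `pred_rtn`. Both function names, the `ascending` flag, the 52-periods-per-year annualization and the zero risk-free rate are assumptions for illustration. The third positional argument in the calls here is `True` for `fac_ret` and `fac_size` and omitted for `fac_bm`, which may correspond to choosing the lowest values, but that is a guess.

```python
import numpy as np
import pandas as pd

def top_n_weekly_returns(df, fac_name, n=100, ascending=False):
    """Equal-weighted return of the n highest (or lowest) factor stocks per week."""
    def pick(g):
        chosen = g.nsmallest(n, fac_name) if ascending else g.nlargest(n, fac_name)
        return chosen['pred_rtn'].mean()
    clean = df.dropna(subset=[fac_name, 'pred_rtn'])
    return clean.groupby('close_date')[[fac_name, 'pred_rtn']].apply(pick)

def evaluate(rtn, periods_per_year=52):
    """A few summary metrics for a series of periodic returns (risk-free rate taken as 0)."""
    nav = (1 + rtn).cumprod()
    drawdown = 1 - nav / nav.cummax()
    return {
        'annual_return': nav.iloc[-1] ** (periods_per_year / len(rtn)) - 1,
        'annual_volatility': rtn.std() * np.sqrt(periods_per_year),
        'sharpe_ratio': rtn.mean() / rtn.std() * np.sqrt(periods_per_year),
        'max_drawdown': drawdown.max(),
    }

# Toy cross-section: five stocks on one date
toy = pd.DataFrame({
    'close_date': pd.to_datetime(['2020-01-03'] * 5),
    'fac_size': [1.0, 2.0, 3.0, 4.0, 5.0],
    'pred_rtn': [0.01, 0.02, 0.03, 0.04, 0.05],
})
small = top_n_weekly_returns(toy, 'fac_size', n=2, ascending=True)   # two smallest stocks
large = top_n_weekly_returns(toy, 'fac_size', n=2, ascending=False)  # two largest stocks

# Net value goes 1.1 -> 0.55 -> 0.66, so the worst peak-to-trough loss is 50%
metrics = evaluate(pd.Series([0.1, -0.5, 0.2]))
print(small.iloc[0], large.iloc[0], metrics['max_drawdown'])
```

A sketch like this ignores transaction costs and turnover, which a weekly rebalanced 100-stock portfolio would incur in practice.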
```python
rtn, evaluate_result = backtest.backtest_1week_nstock(factors, 'fac_size', True)
evaluate_result
```

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.122859 | 0.610775 | 2008-01-11 | 2008-10-24 | 4.453695 | 5.248983 | 0.658517 | Sum |
| 1 | 3.780915 | 0.082394 | 2006-09-29 | 2006-11-10 | 6.824324 | 7.448274 | 0.615477 | 2006 |
| 2 | 7.619150 | 0.277240 | 2007-05-18 | 2007-06-22 | 7.588304 | 550.342103 | 0.891182 | 2007 |
| 3 | -1.678621 | 0.610775 | 2008-01-11 | 2008-10-24 | -2.910806 | -0.884322 | 0.992800 | 2008 |
| 4 | 7.547957 | 0.115055 | 2009-02-06 | 2009-02-20 | 10.933357 | 167.053901 | 0.719293 | 2009 |
| 5 | 3.022128 | 0.245051 | 2010-04-02 | 2010-06-25 | 4.223931 | 4.525651 | 0.633368 | 2010 |
| 6 | -1.694132 | 0.302932 | 2011-04-15 | 2011-12-30 | -2.558479 | -0.681412 | 0.576306 | 2011 |
| 7 | 2.662808 | 0.146320 | 2012-03-02 | 2012-11-23 | 4.112191 | 3.122259 | 0.600360 | 2012 |
| 8 | 5.374411 | 0.131401 | 2013-05-24 | 2013-06-21 | 8.249390 | 9.844832 | 0.465409 | 2013 |
| 9 | 7.121590 | 0.112333 | 2014-11-21 | 2014-12-26 | 13.799625 | 13.415354 | 0.386941 | 2014 |
| 10 | 5.110443 | 0.288560 | 2015-06-05 | 2015-08-28 | 6.782858 | 81.649740 | 0.961526 | 2015 |
| 11 | 4.188296 | 0.088634 | 2016-04-08 | 2016-05-06 | 5.508037 | 11.105522 | 0.647160 | 2016 |
| 12 | -1.330087 | 0.232662 | 2017-03-10 | 2017-07-14 | -2.252187 | -0.552614 | 0.507959 | 2017 |
| 13 | -0.081387 | 0.247255 | 2018-05-18 | 2018-09-28 | -0.119885 | -0.276922 | 0.731677 | 2018 |
| 14 | 3.747108 | 0.148059 | 2019-04-12 | 2019-05-31 | 5.904712 | 5.290640 | 0.529220 | 2019 |
| 15 | 1.042077 | 0.141963 | 2020-08-28 | 2020-12-31 | 1.448336 | 0.533958 | 0.560545 | 2020 |
| 16 | 6.727338 | 0.085806 | 2021-01-15 | 2021-01-29 | 16.085113 | 18.249285 | 0.457282 | 2021 |
| 17 | 3.816535 | 0.131406 | 2022-02-25 | 2022-04-22 | 8.065974 | 5.825747 | 0.542672 | 2022 |
| 18 | 4.185553 | 0.107099 | 2023-03-03 | 2023-04-14 | 8.362568 | 4.334733 | 0.421798 | 2023 |
```python
rtn, evaluate_result = backtest.backtest_1week_nstock(factors, 'fac_bm')
evaluate_result
```

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.006960 | 0.638422 | 2008-01-11 | 2008-10-24 | 2.918252 | 1.847405 | 0.617145 | Sum |
| 1 | 5.618410 | 0.105872 | 2006-06-30 | 2006-08-11 | 9.420005 | 21.852138 | 0.590378 | 2006 |
| 2 | 7.354710 | 0.281662 | 2007-05-18 | 2007-06-22 | 8.206541 | 980.089048 | 1.019448 | 2007 |
| 3 | -2.479428 | 0.638422 | 2008-01-11 | 2008-10-24 | -4.484640 | -0.960315 | 1.067600 | 2008 |
| 4 | 6.410670 | 0.142736 | 2009-02-06 | 2009-02-20 | 10.016987 | 131.546624 | 0.820762 | 2009 |
| 5 | 0.156546 | 0.274959 | 2010-04-02 | 2010-06-25 | 0.242633 | -0.062901 | 0.555388 | 2010 |
| 6 | -2.983162 | 0.329127 | 2011-04-15 | 2011-12-30 | -5.263885 | -0.730526 | 0.410657 | 2011 |
| 7 | 2.215259 | 0.162853 | 2012-02-24 | 2012-09-14 | 4.250416 | 1.304317 | 0.415575 | 2012 |
![0_三因子模型策略实现_33_1](/0_三因子模型策略实现_33_1.png)

## Multi-factor combination

### Simple group-scoring method

```python
rtn, evaluate_result = backtest.mutifactor_score(factors, ['-fac_ret', '-fac_size', 'fac_bm'], group_num=10)
evaluate_result
```
| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.762777 | 0.606990 | 2008-02-29 | 2008-10-24 | 3.828052 | 4.585105 | 0.718640 | Sum |
| 1 | 5.532422 | 0.102080 | 2006-06-30 | 2006-08-11 | 10.268628 | 26.228477 | 0.636676 | 2006 |
| 2 | 8.101543 | 0.287660 | 2007-05-18 | 2007-06-22 | 8.857011 | 2223.268814 | 1.031276 | 2007 |
| 3 | -0.937202 | 0.606990 | 2008-02-29 | 2008-10-24 | -1.643112 | -0.829617 | 1.169914 | 2008 |
| 4 | 7.927451 | 0.138606 | 2009-02-06 | 2009-02-20 | 11.627565 | 466.483920 | 0.826747 | 2009 |
| 5 | 4.065468 | 0.241853 | 2010-04-02 | 2010-06-25 | 5.996514 | 11.165462 | 0.672587 | 2010 |
| 6 | -2.182011 | 0.333801 | 2011-03-18 | 2011-12-30 | -3.358018 | -0.776748 | 0.602593 | 2011 |
| 7 | 2.119900 | 0.196653 | 2012-03-02 | 2012-11-23 | 3.458198 | 1.997337 | 0.603504 | 2012 |
| 8 | 4.936706 | 0.168503 | 2013-05-24 | 2013-06-21 | 7.691246 | 9.718052 | 0.508523 | 2013 |
| 9 | 5.649539 | 0.083951 | 2014-11-21 | 2014-12-26 | 11.179287 | 9.436769 | 0.433277 | 2014 |
| 10 | 2.837548 | 0.410130 | 2015-06-05 | 2015-07-03 | 3.294351 | 12.990229 | 1.190048 | 2015 |
| 11 | 3.568117 | 0.091504 | 2016-04-08 | 2016-05-06 | 4.329537 | 7.026963 | 0.643478 | 2016 |
| 12 | -1.778931 | 0.230229 | 2017-02-17 | 2017-12-15 | -2.780930 | -0.592425 | 0.447876 | 2017 |
| 13 | -0.659488 | 0.275744 | 2018-01-19 | 2018-10-12 | -0.949217 | -0.497481 | 0.686670 | 2018 |
| 14 | 2.215944 | 0.231712 | 2019-04-12 | 2019-08-02 | 4.274694 | 2.091116 | 0.585855 | 2019 |
| 15 | 1.242012 | 0.138234 | 2020-01-03 | 2020-01-17 | 1.719817 | 0.735576 | 0.577622 | 2020 |
| 16 | 5.216823 | 0.104639 | 2021-09-03 | 2021-10-22 | 10.806690 | 7.976897 | 0.440638 | 2021 |
| 17 | 1.588294 | 0.164306 | 2022-02-25 | 2022-04-22 | 3.120165 | 1.013708 | 0.526335 | 2022 |
| 18 | 3.335436 | 0.104342 | 2023-02-24 | 2023-05-19 | 6.708995 | 2.018183 | 0.349659 | 2023 |
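Compared with the single Size factor, the combination performs worse: over the full sample the Sharpe ratio falls from 3.12 to 2.76 and `annual_return` from 5.25 to 4.59.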
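The scoring rule inside `backtest.mutifactor_score` is not shown here. One common way to implement a "group and score" combination, assumed in the sketch below, is to sort stocks into `group_num` buckets per factor on each date, flip the order for factors prefixed with `-`, and add up the bucket numbers. `multifactor_score_sketch` is a hypothetical name and the toy numbers are invented.

```python
import pandas as pd

def multifactor_score_sketch(df, signed_factors, group_num=10):
    """Sum of per-date bucket numbers across factors; a leading '-' reverses a factor."""
    out = df.copy()
    out['score'] = 0
    for name in signed_factors:
        sign = -1 if name.startswith('-') else 1
        col = name.lstrip('-')
        by_date = (sign * out[col]).groupby(out['close_date'])
        rank = by_date.rank(method='first')   # 1..n within each date
        count = by_date.transform('count')
        # Bucket 1 holds the least attractive values, bucket group_num the most attractive
        out['score'] += ((rank - 1) * group_num // count) + 1
    return out

# Toy cross-section: stocks 0 and 1 are small, recent losers with high book-to-market
toy = pd.DataFrame({
    'close_date': pd.to_datetime(['2020-01-03'] * 4),
    'fac_ret': [0.1, 0.2, 0.3, 0.4],
    'fac_size': [1.0, 2.0, 3.0, 4.0],
    'fac_bm': [4.0, 3.0, 2.0, 1.0],
})
scored = multifactor_score_sketch(toy, ['-fac_ret', '-fac_size', 'fac_bm'], group_num=2)
print(scored['score'].tolist())
```

Because every factor contributes the same range of scores, this method weights the three factors equally regardless of how strong each one is, which may be one reason a combination can underperform its best single factor.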
### Multiple-regression stock selection

```python
rtn, evaluate_result = backtest.mutifactor_regression(factors, ['fac_ret', 'fac_size', 'fac_bm'], stock_num=100, plot=True)
```

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.381925 | 0.653284 | 2008-01-11 | 2008-10-24 | 3.633037 | 2.883003 | 0.663260 | Sum |
| 1 | 7.397497 | 0.083543 | 2006-07-21 | 2006-08-11 | 14.551758 | 80.247146 | 0.625025 | 2006 |
| 2 | 8.605610 | 0.176219 | 2007-05-18 | 2007-06-22 | 12.445344 | 1427.010829 | 0.901967 | 2007 |
| 3 | -2.899683 | 0.653284 | 2008-01-11 | 2008-10-24 | -5.515209 | -0.960043 | 0.949763 | 2008 |
| 4 | 5.760640 | 0.194681 | 2009-07-24 | 2009-09-25 | 10.102339 | 88.968733 | 0.848765 | 2009 |
| 5 | 3.671218 | 0.230784 | 2010-04-02 | 2010-06-25 | 5.466090 | 8.329292 | 0.671275 | 2010 |
| 6 | -2.037976 | 0.284170 | 2011-04-08 | 2011-12-30 | -3.026109 | -0.713988 | 0.541419 | 2011 |
| 7 | 1.235993 | 0.245804 | 2012-03-02 | 2012-11-23 | 2.141001 | 0.664276 | 0.520239 | 2012 |
| 8 | 3.641103 | 0.138178 | 2013-05-24 | 2013-06-21 | 6.342497 | 4.966395 | 0.530263 | 2013 |
| 9 | 7.529679 | 0.045554 | 2014-03-14 | 2014-03-21 | 17.314748 | 37.479723 | 0.504501 | 2014 |
| 10 | 2.234365 | 0.500596 | 2015-06-05 | 2015-09-11 | 3.162736 | 4.403217 | 0.969233 | 2015 |
| 11 | 1.734332 | 0.101322 | 2016-04-08 | 2016-05-06 | 2.097476 | 1.306984 | 0.579563 | 2016 |
| 12 | -0.528623 | 0.242558 | 2017-03-10 | 2017-08-04 | -0.667913 | -0.286437 | 0.448402 | 2017 |
| 13 | -0.812079 | 0.301313 | 2018-03-23 | 2018-10-19 | -1.385928 | -0.489442 | 0.606122 | 2018 |
| 14 | 2.260950 | 0.221067 | 2019-04-12 | 2019-08-02 | 4.214248 | 1.881747 | 0.529546 | 2019 |
| 15 | 2.290055 | 0.152632 | 2020-07-03 | 2020-12-04 | 3.748461 | 2.656131 | 0.660382 | 2020 |
| 16 | 0.660965 | 0.179802 | 2021-01-15 | 2021-07-30 | 0.985279 | 0.216939 | 0.447280 | 2021 |
| 17 | 1.549873 | 0.132950 | 2022-06-24 | 2022-10-21 | 2.930357 | 0.977742 | 0.528017 | 2022 |
| 18 | -0.475897 | 0.101506 | 2023-01-13 | 2023-05-19 | -0.929460 | -0.216187 | 0.371352 | 2023 |
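Compared with the strategies above, this one's return is not especially high, particularly next to the Size factor. One reason is that the regression coefficients are lagged by two weeks before they are used for prediction. Overall, though, it outperforms both the Ret and B/M factors, and relative to the Size factor it does a good job of removing the influence of market style.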
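In the large-cap-led market of 2017, the strategy's maximum drawdown was about 24%, close to the Size factor's 23%, but its loss for the year was much smaller (`annual_return` of about -0.29 versus -0.55), so it held up considerably better than the pure Size factor.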
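The internals of `backtest.mutifactor_regression` are not shown in this post. The sketch below illustrates one way a regression-based selection with lagged coefficients can be set up, under the assumption that a cross-sectional OLS of `pred_rtn` on the factors is fitted on each date and that the coefficients from `lag` periods earlier are used to score the current cross-section. A two-period lag is consistent with how `pred_rtn` is built, since the return labelled at one date is only fully observed two rows later. The function name and toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def regression_select_sketch(df, fac_names, stock_num=100, lag=2):
    """Score each date with OLS coefficients fitted `lag` periods earlier, hold the top names."""
    df = df.dropna(subset=fac_names + ['pred_rtn'])
    groups = {d: g for d, g in df.groupby('close_date')}
    dates = sorted(groups)

    def design(g):
        # Intercept column plus one column per factor
        return np.column_stack([np.ones(len(g)), g[fac_names].to_numpy()])

    # Step 1: one cross-sectional regression per date
    coefs = {d: np.linalg.lstsq(design(g), g['pred_rtn'].to_numpy(), rcond=None)[0]
             for d, g in groups.items()}

    # Step 2: predict with lagged coefficients and equal-weight the top stock_num stocks
    rtn = {}
    for i in range(lag, len(dates)):
        g = groups[dates[i]]
        predicted = pd.Series(design(g) @ coefs[dates[i - lag]], index=g.index)
        chosen = predicted.nlargest(stock_num).index
        rtn[dates[i]] = g.loc[chosen, 'pred_rtn'].mean()
    return pd.Series(rtn)

# Toy panel: 4 dates x 10 stocks, return exactly linear in the factor
toy = pd.DataFrame({
    'close_date': pd.date_range('2020-01-03', periods=4, freq='W-FRI').repeat(10),
    'fac_size': np.tile(np.arange(10.0), 4),
})
toy['pred_rtn'] = 0.01 * toy['fac_size']

rtn = regression_select_sketch(toy, ['fac_size'], stock_num=2, lag=2)
print(rtn)
```

Because the factor weights are re-estimated every period, a scheme like this can change the sign of its size exposure when small caps stop outperforming, which fits the post's remark about reduced sensitivity to market style.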