Strategy Overview

A weekly 100-stock strategy for the A-share market, built on the Fama three-factor model.


Environment and Data Preparation

import numpy as np
from tqdm import tqdm
import pandas as pd
import os
import gc
import warnings
warnings.filterwarnings('ignore')

from quantools import backtest
stk_data = pd.read_csv("../data/stk_data.csv")
stk_data['close_date'] = pd.to_datetime(stk_data['close_date'])
stk_data['open_date'] = pd.to_datetime(stk_data['open_date'])

open_days_data = pd.read_csv("../data/open_days_data.csv")
open_days_data['date'] = pd.to_datetime(open_days_data['date'])

equity = pd.read_csv("../data/eqy_belongto_parcomsh.csv")
equity['rpt_date'] = pd.to_datetime(equity['rpt_date'])

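os.makedirs("../cal_data", exist_ok=True)  # directory for computed results; safe to re-run

# Weekly data for Shanghai and Shenzhen stocks, 2006-01-01 to 2023-09-28
# stock_code: stock code
# open_date: date the trading week opens
# close_date: date the trading week closes
# OPEN: backward-adjusted opening price
# CLOSE: backward-adjusted closing price
# uadj_close: unadjusted closing price
# TOTAL_SHARES: total shares outstanding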

print(stk_data.shape)
stk_data.head()

| | TOTAL_SHARES | CLOSE | OPEN | stock_code | open_date | close_date | uadj_close |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.945822e+09 | 160.348451 | 153.344151 | 000001.SZ | 2006-01-04 | 2006-01-06 | 6.41 |
| 1 | 1.945822e+09 | 155.345379 | 160.098298 | 000001.SZ | 2006-01-09 | 2006-01-13 | 6.21 |
| 2 | 1.945822e+09 | 155.845687 | 154.594919 | 000001.SZ | 2006-01-16 | 2006-01-20 | 6.23 |
| 3 | 1.945822e+09 | 158.847530 | 155.845687 | 000001.SZ | 2006-01-23 | 2006-01-25 | 6.35 |
| 4 | 1.945822e+09 | 155.345379 | 158.847530 | 000001.SZ | 2006-02-06 | 2006-02-10 | 6.21 |
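
# Equity attributable to parent-company shareholders, Shanghai and Shenzhen listed companies, report periods 2005-09-30 to 2023-06-30
# stock_code: stock code
# rpt_date: report period date
# EQY_BELONGTO_PARCOMSH: equity attributable to shareholders of the parent company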
print(equity.shape)
equity.head()

| | stock_code | EQY_BELONGTO_PARCOMSH | rpt_date |
| --- | --- | --- | --- |
| 0 | 000001.SZ | 5.014966e+09 | 2005-09-30 |
| 1 | 000002.SZ | 6.738774e+09 | 2005-09-30 |
| 2 | 000004.SZ | 8.952654e+07 | 2005-09-30 |
| 3 | 000005.SZ | 8.290555e+08 | 2005-09-30 |
| 4 | 000006.SZ | 1.007023e+09 | 2005-09-30 |
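
# Shanghai and Shenzhen stocks, 2006-01-01 to 2023-09-28: OHLCV on the first trading day of each week
# stock_code: stock code
# date: trading date
# HIGH: high price
# OPEN: opening price
# LOW: low price
# CLOSE: closing price
# VOLUME: trading volume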

print(open_days_data.shape)
open_days_data.head()

| | stock_code | HIGH | OPEN | LOW | CLOSE | VOLUME | date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 000001.SZ | 158.347222 | 153.344151 | 153.093997 | 157.096455 | 15445068.0 | 2006-01-04 |
| 1 | 000002.SZ | 206.631220 | 194.684662 | 194.684662 | 206.188755 | 38931043.0 | 2006-01-04 |
| 2 | 000004.SZ | 13.191923 | 13.035620 | 12.941839 | 13.098141 | 401500.0 | 2006-01-04 |
| 3 | 000005.SZ | 9.436105 | 9.155268 | 9.042934 | 9.379937 | 3713641.0 | 2006-01-04 |
| 4 | 000006.SZ | 18.698245 | 18.698245 | 18.698245 | 18.698245 | 0.0 | 2006-01-04 |

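Data Computation

Computing the Three Factors

Companies file their financial reports on different dates, so each batch of reports is treated as available only from its filing deadline onward. When book-to-market and the other factors are computed, the factor date maps to a report period as follows:

| Factor date (month) | Report period used |
| --- | --- |
| May, June, July, August | Q1 report (published by 04.30 at the latest) |
| September, October | Half-year report (published by 08.30 at the latest) |
| November, December | Q3 report (published by 10.30 at the latest) |
| January, February, March, April | Previous year's Q3 report (published by 10.30 of the previous year at the latest) |

The annual report shares its filing deadline with the Q1 report, and the Q1 report is more recent than the previous year's annual report, so annual report data is not used.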

# Market capitalization
stk_data['mkt_cap'] = stk_data['TOTAL_SHARES'] * stk_data['uadj_close']

# Map each trading week to a report period (used to match shareholders' equity)
def match_rpt_date(date):
    """
    Map a date to the report period whose data is available on that date.
    Based on filing deadlines: Q1 report by 4/30, half-year report by 8/30,
    Q3 report by 10/30; the annual report is due 4/30 of the following year,
    so it is not used.
    """
    y = date.year
    m = date.month
    if m in (5, 6, 7, 8):
        return f"{y}0331"
    elif m in (9, 10):
        return f"{y}0630"
    elif m in (11, 12):
        return f"{y}0930"
    else:  # months 1-4: previous year's Q3 report
        return f"{y-1}0930"

stk_data['rpt_date'] = pd.to_datetime(stk_data['close_date'].apply(match_rpt_date))
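To make the mapping concrete, the sketch below re-implements the month-to-report-period rule as a standalone function and applies it to a few sample dates. The name `report_period` and the sample dates are illustrative only; the rule itself is the one in the table above.

```python
import pandas as pd

def report_period(date: pd.Timestamp) -> str:
    """Return the report period (YYYYMMDD) assumed to be public on `date`."""
    y, m = date.year, date.month
    if 5 <= m <= 8:
        return f"{y}0331"      # Q1 report
    if 9 <= m <= 10:
        return f"{y}0630"      # half-year report
    if m >= 11:
        return f"{y}0930"      # Q3 report
    return f"{y - 1}0930"      # Jan-Apr: previous year's Q3 report

# Hypothetical week-ending dates, one per branch of the rule
sample = pd.to_datetime(["2019-02-15", "2019-05-10", "2019-09-06", "2019-12-27"])
mapped = [report_period(d) for d in sample]
print(dict(zip(sample.strftime("%Y-%m-%d"), mapped)))
```

Note that a week in February 2019 is matched to the 2018 Q3 report, so the equity figure can be several months old by the time it is used.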

all_data = pd.merge(stk_data, equity, on=['stock_code', 'rpt_date'], how='left')

# Pivot each field into a date x stock_code matrix
odd = {}
for key in tqdm(['HIGH', 'OPEN', 'LOW', 'CLOSE', 'VOLUME']):
    odd[key] = pd.pivot(open_days_data, index='date', columns='stock_code', values=key)

# Forward return: buy at next week's open, sell at the open of the week after
odd['pred_rtn'] = (odd['OPEN'].shift(-2) - odd['OPEN'].shift(-1)) / odd['OPEN'].shift(-1)

pred_rtn_na = odd['pred_rtn'].isna()  # keep missing values missing; do not turn them into 0

# Stocks suspended next week can only earn a return of 0
vol0 = odd['VOLUME'].shift(-1) == 0
volna = odd['VOLUME'].shift(-1).isna()
# Parentheses are required: & binds tighter than |
odd['pred_rtn'][(vol0 | volna) & (~pred_rtn_na)] = 0

# Stocks locked at limit-up next week cannot be bought, so they earn a return of 0
yz = odd['HIGH'].shift(-1) == odd['LOW'].shift(-1)  # flat bar: high equals low, the price never moved
zt = ~(odd['CLOSE'].shift(-1) <= odd['CLOSE'])      # up move: next week's close is above this week's
odd['pred_rtn'][yz & zt & (~pred_rtn_na)] = 0

pred_rtn = odd['pred_rtn'].stack().reset_index().rename(columns={0: 'pred_rtn', 'date': 'open_date'})

all_data = pd.merge(all_data, pred_rtn, on=['open_date', 'stock_code'], how='left')
all_data = all_data[~all_data['pred_rtn'].isna()]

del odd
gc.collect()  # free memory
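The double shift can be hard to read, so here is a small sketch on made-up opening prices for a single stock (the prices are invented for illustration). At week t the label is the return from buying at the open of week t+1 and selling at the open of week t+2, which is why the last two weeks have no label.

```python
import pandas as pd

# Invented weekly opening prices for one stock
opens = pd.DataFrame(
    {"000001.SZ": [10.0, 11.0, 12.1, 12.1, 13.31]},
    index=pd.to_datetime(
        ["2006-01-04", "2006-01-09", "2006-01-16", "2006-01-23", "2006-02-06"]
    ),
)
opens.index.name = "date"

next_open = opens.shift(-1)    # open of week t+1, aligned to week t
open_after = opens.shift(-2)   # open of week t+2, aligned to week t
pred_rtn = (open_after - next_open) / next_open
print(pred_rtn)
```

Because the label at week t depends on prices up to week t+2, any model fitted on these labels can only use rows that are at least two weeks old at prediction time.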

# Weekly return factor
close = pd.pivot(all_data, index='close_date', columns='stock_code', values='CLOSE')
fac_ret = (close - close.shift(1)) / close.shift(1)
fac_ret = fac_ret.stack().reset_index().rename(columns={0: 'fac_ret'})
all_data = pd.merge(all_data, fac_ret, on=['close_date', 'stock_code'], how='left')

# Size factor: log of market cap in millions
all_data['fac_size'] = np.log(all_data['mkt_cap'] / 1000000)

# Book-to-market factor
all_data['fac_bm'] = all_data['EQY_BELONGTO_PARCOMSH'] / all_data['mkt_cap']
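As a sanity check, the sketch below recomputes the size and book-to-market values for 000001.SZ in the week ending 2006-01-06 from the figures printed in the sample outputs, assuming the 2005-09-30 equity is the one matched to that week (which is what the report-period rule gives for January). Because the printed inputs are rounded, the results should agree with the `fac_size` and `fac_bm` values in the `factors` output only to about four decimal places.

```python
import numpy as np

total_shares = 1.945822e9   # TOTAL_SHARES from stk_data.head()
uadj_close = 6.41           # unadjusted close for the week ending 2006-01-06
equity = 5.014966e9         # EQY_BELONGTO_PARCOMSH for rpt_date 2005-09-30

mkt_cap = total_shares * uadj_close          # market capitalization
fac_size = np.log(mkt_cap / 1_000_000)       # log market cap in millions
fac_bm = equity / mkt_cap                    # book-to-market ratio
print(round(fac_size, 4), round(fac_bm, 4))
```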

factors = all_data[['stock_code', 'close_date', 'pred_rtn', 'fac_ret', 'fac_size', 'fac_bm']].reset_index(drop=True)
factors = factors[~factors['pred_rtn'].isna()]
factors.head()

| | stock_code | close_date | pred_rtn | fac_ret | fac_size | fac_bm |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 000001.SZ | 2006-01-06 | -0.034375 | NaN | 9.431299 | 0.402075 |
| 1 | 000001.SZ | 2006-01-13 | 0.008091 | -0.031201 | 9.399601 | 0.415024 |
| 2 | 000001.SZ | 2006-01-20 | 0.019262 | 0.003221 | 9.402816 | 0.413692 |
| 3 | 000001.SZ | 2006-01-25 | -0.022047 | 0.019262 | 9.421895 | 0.405874 |
| 4 | 000001.SZ | 2006-02-10 | 0.004831 | -0.022047 | 9.399601 | 0.415024 |

factors.to_csv("../cal_data/factors.csv", index=False)

Winsorizing the Factors

# Before winsorizing (the date was picked arbitrarily)
fac_name = 'fac_size'
factors[factors['close_date']=='2019-10-18'][fac_name].plot.kde(title="Size factor distribution on 2019-10-18 (before winsorizing)")

0_三因子模型策略实现_18_1

factors = backtest.winsorize_factor(factors, 'fac_size')
factors = backtest.winsorize_factor(factors, 'fac_ret')
factors = backtest.winsorize_factor(factors, 'fac_bm')
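The internals of `backtest.winsorize_factor` are not shown in this post. The sketch below is one common way to do it, clipping each week's cross-section at its 1st and 99th percentiles; the function name `winsorize_by_date` and the percentile cut-offs are assumptions, not the actual `quantools` implementation.

```python
import pandas as pd

def winsorize_by_date(df, col, date_col="close_date", lower=0.01, upper=0.99):
    """Clip `col` to its [lower, upper] quantile range within each date (assumed rule)."""
    out = df.copy()
    grouped = out.groupby(date_col)[col]
    lo = grouped.transform(lambda s: s.quantile(lower))   # per-date lower bound
    hi = grouped.transform(lambda s: s.quantile(upper))   # per-date upper bound
    out[col] = out[col].clip(lower=lo, upper=hi)
    return out

# Toy cross-section: 100 ordinary values plus one extreme outlier
toy = pd.DataFrame({
    "close_date": ["2019-10-18"] * 101,
    "fac_size": list(range(100)) + [10_000],
})
wins = winsorize_by_date(toy, "fac_size")
print(toy["fac_size"].max(), wins["fac_size"].max())
```

Clipping per date rather than over the whole sample keeps the treatment free of look-ahead, since each week only uses its own cross-section.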
# After winsorizing
factors[factors['close_date']=='2019-10-18'][fac_name].plot.kde(title="Size factor distribution on 2019-10-18 (after winsorizing)")

0_三因子模型策略实现_20_1

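Single-Factor Tests

Validating the factors with Fama-MacBeth regressions

res_list = []
for fac_name in ['fac_size', 'fac_ret', 'fac_bm']:
    res_list.append(backtest.fama_macbeth(factors, fac_name))
fama_macbeth_res = pd.DataFrame(res_list)
fama_macbeth_res

| | fac_name | t | p | pos_count | neg_count |
| --- | --- | --- | --- | --- | --- |
| 0 | fac_size | -4.576268 | 5.395101e-06 | 362 | 541 |
| 1 | fac_ret | -10.642792 | 5.330462e-25 | 290 | 612 |
| 2 | fac_bm | 4.019551 | 6.317205e-05 | 464 | 439 |

All three t-statistics differ significantly from zero (p < 0.001), so each factor carries information about next-period returns. Book-to-market is significantly positive and the other two are significantly negative, which matches the usual findings in the academic literature.

That said, the book-to-market slope is positive in 464 cross-sections and negative in 439, an almost even split, so on this measure its ability to discriminate is weak.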
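For readers unfamiliar with the procedure, here is a minimal sketch of a univariate Fama-MacBeth test: run one cross-sectional regression of next-period return on the factor per week, then test whether the time series of slopes has zero mean. It is written from the general definition, not from the `backtest.fama_macbeth` source; `fama_macbeth_sketch` and the synthetic data are hypothetical, and the p-value is omitted to keep the example free of extra dependencies.

```python
import numpy as np
import pandas as pd

def fama_macbeth_sketch(df, fac_name, ret_col="pred_rtn", date_col="close_date"):
    slopes = []
    for _, g in df.groupby(date_col):
        g = g[[fac_name, ret_col]].dropna()
        if len(g) < 3 or g[fac_name].nunique() < 2:
            continue                                   # too little data for a regression
        slope, _intercept = np.polyfit(g[fac_name], g[ret_col], 1)
        slopes.append(slope)
    slopes = np.asarray(slopes)
    # t-statistic of the mean slope across weeks
    t_stat = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(len(slopes)))
    return {
        "fac_name": fac_name,
        "t": t_stat,
        "pos_count": int((slopes > 0).sum()),
        "neg_count": int((slopes < 0).sum()),
    }

# Synthetic panel: 30 weeks x 50 stocks with a true positive slope of 0.02
rng = np.random.default_rng(0)
rows = []
for week in pd.date_range("2020-01-03", periods=30, freq="W-FRI"):
    fac = rng.normal(size=50)
    rows.append(pd.DataFrame({
        "close_date": week,
        "fac_demo": fac,
        "pred_rtn": 0.02 * fac + 0.01 * rng.normal(size=50),
    }))
demo = pd.concat(rows, ignore_index=True)
result = fama_macbeth_sketch(demo, "fac_demo")
print(result)
```

The `pos_count` and `neg_count` fields mirror the columns in the table above: a factor can have a significant mean slope and still flip sign in nearly half of the weeks.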

Single-Factor Group Returns

group_rtns, group_cum_rtns = backtest.group_return_analysis(factors, 'fac_size')

0_三因子模型策略实现_26_1

group_rtns, group_cum_rtns = backtest.group_return_analysis(factors, 'fac_ret')

0_三因子模型策略实现_27_1

group_rtns, group_cum_rtns = backtest.group_return_analysis(factors, 'fac_bm')

0_三因子模型策略实现_28_1

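The group plots show that all three factors separate returns to some degree. Book-to-market and size separate them best, while the return factor is somewhat weaker.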
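The grouping analysis can be sketched as follows: each week, sort stocks into equal-sized buckets by factor value, average the next-period return within each bucket, and compound. This is a generic version under assumed names (`group_returns_sketch`, five groups, equal weighting); how `backtest.group_return_analysis` actually groups and weights is not shown in the post.

```python
import numpy as np
import pandas as pd

def group_returns_sketch(df, fac_name, n_groups=5,
                         date_col="close_date", ret_col="pred_rtn"):
    data = df[[date_col, fac_name, ret_col]].dropna().copy()
    # Bucket 0 holds the smallest factor values of the week, n_groups-1 the largest
    data["group"] = data.groupby(date_col)[fac_name].transform(
        lambda s: pd.qcut(s.rank(method="first"), n_groups, labels=False)
    )
    group_rtns = data.groupby([date_col, "group"])[ret_col].mean().unstack("group")
    group_cum_rtns = (1 + group_rtns).cumprod()        # compounded value of each bucket
    return group_rtns, group_cum_rtns

# Toy panel in which the return rises with the factor
rows = []
for week in pd.date_range("2020-01-03", periods=4, freq="W-FRI"):
    fac = np.arange(10, dtype=float)
    rows.append(pd.DataFrame({"close_date": week, "fac_demo": fac, "pred_rtn": 0.001 * fac}))
demo = pd.concat(rows, ignore_index=True)

group_rtns, group_cum_rtns = group_returns_sketch(demo, "fac_demo")
print(group_cum_rtns.iloc[-1])
```

A factor with good grouping power shows cumulative curves that fan out in rank order, as the top bucket does here by construction.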

Single-Factor Weekly 100-Stock Backtests

rtn, evaluate_result = backtest.backtest_1week_nstock(factors, 'fac_ret', True)
evaluate_result

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.405346 | 0.789046 | 2015-06-05 | 2018-10-12 | 2.030548 | 1.178091 | 0.763147 | Sum |
| 1 | 6.653799 | 0.108753 | 2006-06-30 | 2006-07-28 | 9.882821 | 67.674182 | 0.674358 | 2006 |
| 2 | 7.957098 | 0.217392 | 2007-05-18 | 2007-06-22 | 11.966749 | 1336.033077 | 0.975551 | 2007 |
| 3 | -2.535012 | 0.699955 | 2008-01-11 | 2008-10-24 | -4.945847 | -0.975181 | 1.181429 | 2008 |
| 4 | 6.570332 | 0.143081 | 2009-02-06 | 2009-02-20 | 8.999942 | 121.088407 | 0.783940 | 2009 |
| 5 | 3.688373 | 0.198535 | 2010-04-02 | 2010-06-25 | 6.296923 | 9.382755 | 0.702845 | 2010 |
| 6 | -3.329041 | 0.426803 | 2011-07-08 | 2011-12-30 | -4.826786 | -0.906297 | 0.645434 | 2011 |
| 7 | 1.483952 | 0.243869 | 2012-03-02 | 2012-11-23 | 2.545457 | 1.049363 | 0.604799 | 2012 |
| 8 | 4.117017 | 0.134629 | 2013-05-24 | 2013-06-21 | 7.282607 | 7.479792 | 0.558735 | 2013 |
| 9 | 4.479553 | 0.096290 | 2014-11-21 | 2014-12-26 | 9.303073 | 8.344907 | 0.531904 | 2014 |
| 10 | 0.683548 | 0.575212 | 2015-06-05 | 2015-09-11 | 0.882963 | -0.014136 | 1.347235 | 2015 |
| 11 | 0.957265 | 0.155600 | 2016-04-08 | 2016-05-06 | 1.431324 | 0.558164 | 0.753643 | 2016 |
| 12 | -3.226729 | 0.330036 | 2017-03-17 | 2017-12-22 | -4.293738 | -0.829643 | 0.506809 | 2017 |
| 13 | -2.996387 | 0.425417 | 2018-03-23 | 2018-10-12 | -4.204086 | -0.896640 | 0.677827 | 2018 |
| 14 | 1.305734 | 0.287685 | 2019-03-29 | 2019-08-02 | 2.413080 | 0.887732 | 0.639300 | 2019 |
| 15 | -0.294687 | 0.168351 | 2020-02-14 | 2020-05-15 | -0.414377 | -0.325593 | 0.640734 | 2020 |
| 16 | 2.825869 | 0.139483 | 2021-02-10 | 2021-04-23 | 5.050823 | 2.591822 | 0.496437 | 2021 |
| 17 | -1.196824 | 0.294684 | 2022-02-11 | 2022-04-22 | -2.205050 | -0.610259 | 0.625781 | 2022 |
| 18 | -1.182782 | 0.165736 | 2023-03-31 | 2023-07-21 | -2.770508 | -0.440561 | 0.418921 | 2023 |

0_三因子模型策略实现_31_1
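The columns of `evaluate_result` are standard performance statistics. The sketch below shows one way to compute them from a series of weekly returns, assuming 52 periods per year, a zero risk-free rate, a compounded annual return, and a Sharpe ratio based on the arithmetic mean return. These conventions are assumptions; the exact definitions used inside `quantools` may differ.

```python
import numpy as np
import pandas as pd

def evaluate_weekly_returns(rtn: pd.Series, periods_per_year: int = 52) -> dict:
    """Summary statistics for a series of weekly portfolio returns."""
    nav = (1 + rtn).cumprod()                               # net asset value, starting from 1
    years = len(rtn) / periods_per_year
    annual_return = nav.iloc[-1] ** (1 / years) - 1         # compounded
    annual_volatility = rtn.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe_ratio = rtn.mean() * periods_per_year / annual_volatility
    running_peak = nav.cummax().clip(lower=1.0)             # the starting value 1 counts as a peak
    drawdown = 1 - nav / running_peak
    return {
        "sharpe_ratio": sharpe_ratio,
        "max_drawdown": drawdown.max(),
        "max_drawdown_end": drawdown.idxmax(),
        "annual_return": annual_return,
        "annual_volatility": annual_volatility,
    }

# Three invented weekly returns: up 10%, down 50%, up 20%
weekly = pd.Series(
    [0.10, -0.50, 0.20],
    index=pd.to_datetime(["2015-06-05", "2015-06-12", "2015-06-19"]),
)
stats = evaluate_weekly_returns(weekly)
print(stats)
```

With a mean-based Sharpe ratio and a compounded annual return, a volatile year can show a positive Sharpe ratio alongside a negative annual return, which is the pattern in the 2015 row of the table above.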

rtn, evaluate_result = backtest.backtest_1week_nstock(factors, 'fac_size', True)
evaluate_result

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3.122859 | 0.610775 | 2008-01-11 | 2008-10-24 | 4.453695 | 5.248983 | 0.658517 | Sum |
| 1 | 3.780915 | 0.082394 | 2006-09-29 | 2006-11-10 | 6.824324 | 7.448274 | 0.615477 | 2006 |
| 2 | 7.619150 | 0.277240 | 2007-05-18 | 2007-06-22 | 7.588304 | 550.342103 | 0.891182 | 2007 |
| 3 | -1.678621 | 0.610775 | 2008-01-11 | 2008-10-24 | -2.910806 | -0.884322 | 0.992800 | 2008 |
| 4 | 7.547957 | 0.115055 | 2009-02-06 | 2009-02-20 | 10.933357 | 167.053901 | 0.719293 | 2009 |
| 5 | 3.022128 | 0.245051 | 2010-04-02 | 2010-06-25 | 4.223931 | 4.525651 | 0.633368 | 2010 |
| 6 | -1.694132 | 0.302932 | 2011-04-15 | 2011-12-30 | -2.558479 | -0.681412 | 0.576306 | 2011 |
| 7 | 2.662808 | 0.146320 | 2012-03-02 | 2012-11-23 | 4.112191 | 3.122259 | 0.600360 | 2012 |
| 8 | 5.374411 | 0.131401 | 2013-05-24 | 2013-06-21 | 8.249390 | 9.844832 | 0.465409 | 2013 |
| 9 | 7.121590 | 0.112333 | 2014-11-21 | 2014-12-26 | 13.799625 | 13.415354 | 0.386941 | 2014 |
| 10 | 5.110443 | 0.288560 | 2015-06-05 | 2015-08-28 | 6.782858 | 81.649740 | 0.961526 | 2015 |
| 11 | 4.188296 | 0.088634 | 2016-04-08 | 2016-05-06 | 5.508037 | 11.105522 | 0.647160 | 2016 |
| 12 | -1.330087 | 0.232662 | 2017-03-10 | 2017-07-14 | -2.252187 | -0.552614 | 0.507959 | 2017 |
| 13 | -0.081387 | 0.247255 | 2018-05-18 | 2018-09-28 | -0.119885 | -0.276922 | 0.731677 | 2018 |
| 14 | 3.747108 | 0.148059 | 2019-04-12 | 2019-05-31 | 5.904712 | 5.290640 | 0.529220 | 2019 |
| 15 | 1.042077 | 0.141963 | 2020-08-28 | 2020-12-31 | 1.448336 | 0.533958 | 0.560545 | 2020 |
| 16 | 6.727338 | 0.085806 | 2021-01-15 | 2021-01-29 | 16.085113 | 18.249285 | 0.457282 | 2021 |
| 17 | 3.816535 | 0.131406 | 2022-02-25 | 2022-04-22 | 8.065974 | 5.825747 | 0.542672 | 2022 |
| 18 | 4.185553 | 0.107099 | 2023-03-03 | 2023-04-14 | 8.362568 | 4.334733 | 0.421798 | 2023 |

0_三因子模型策略实现_32_1

rtn, evaluate_result = backtest.backtest_1week_nstock(factors, 'fac_bm')
evaluate_result

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2.006960 | 0.638422 | 2008-01-11 | 2008-10-24 | 2.918252 | 1.847405 | 0.617145 | Sum |
| 1 | 5.618410 | 0.105872 | 2006-06-30 | 2006-08-11 | 9.420005 | 21.852138 | 0.590378 | 2006 |
| 2 | 7.354710 | 0.281662 | 2007-05-18 | 2007-06-22 | 8.206541 | 980.089048 | 1.019448 | 2007 |
| 3 | -2.479428 | 0.638422 | 2008-01-11 | 2008-10-24 | -4.484640 | -0.960315 | 1.067600 | 2008 |
| 4 | 6.410670 | 0.142736 | 2009-02-06 | 2009-02-20 | 10.016987 | 131.546624 | 0.820762 | 2009 |
| 5 | 0.156546 | 0.274959 | 2010-04-02 | 2010-06-25 | 0.242633 | -0.062901 | 0.555388 | 2010 |
| 6 | -2.983162 | 0.329127 | 2011-04-15 | 2011-12-30 | -5.263885 | -0.730526 | 0.410657 | 2011 |
| 7 | 2.215259 | 0.162853 | 2012-02-24 | 2012-09-14 | 4.250416 | 1.304317 | 0.415575 | 2012 |

![0_三因子模型策略实现_33_1](/0_三因子模型策略实现_33_1.png)
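The single-factor backtests above can be sketched as: each week, pick the 100 stocks with the most favorable factor value, hold them in equal weight, and record the average next-period return. The function below is a hypothetical stand-in for `backtest.backtest_1week_nstock`; in particular, the meaning of that function's third positional argument (`True` for `fac_ret` and `fac_size`, omitted for `fac_bm`) is not documented in the post, so the `ascending` flag here is only a guess at a sort-direction switch.

```python
import pandas as pd

def top_n_weekly_returns(df, fac_name, n=100, ascending=True,
                         date_col="close_date", ret_col="pred_rtn"):
    """Equal-weight return of the n best-ranked stocks in each week."""
    data = df[[date_col, fac_name, ret_col]].dropna()
    # ascending=True ranks the smallest factor value first, so it is picked first
    ranks = data.groupby(date_col)[fac_name].rank(method="first", ascending=ascending)
    picked = data[ranks <= n]
    return picked.groupby(date_col)[ret_col].mean()

# Toy panel: two weeks, five stocks, returns fall as the factor rises
toy = pd.DataFrame({
    "close_date": ["2020-01-03"] * 5 + ["2020-01-10"] * 5,
    "fac_demo": [1, 2, 3, 4, 5] * 2,
    "pred_rtn": [0.05, 0.04, 0.03, 0.02, 0.01] * 2,
})
low = top_n_weekly_returns(toy, "fac_demo", n=2, ascending=True)
high = top_n_weekly_returns(toy, "fac_demo", n=2, ascending=False)
print(low.tolist(), high.tolist())
```

For factors with a negative slope in the Fama-MacBeth table (size and past return), buying the lowest values is the natural direction; for book-to-market it is the highest.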

Combining Multiple Factors

Simple Grouped Scoring

rtn, evaluate_result = backtest.mutifactor_score(factors, ['-fac_ret', '-fac_size', 'fac_bm'], group_num=10)
evaluate_result

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2.762777 | 0.606990 | 2008-02-29 | 2008-10-24 | 3.828052 | 4.585105 | 0.718640 | Sum |
| 1 | 5.532422 | 0.102080 | 2006-06-30 | 2006-08-11 | 10.268628 | 26.228477 | 0.636676 | 2006 |
| 2 | 8.101543 | 0.287660 | 2007-05-18 | 2007-06-22 | 8.857011 | 2223.268814 | 1.031276 | 2007 |
| 3 | -0.937202 | 0.606990 | 2008-02-29 | 2008-10-24 | -1.643112 | -0.829617 | 1.169914 | 2008 |
| 4 | 7.927451 | 0.138606 | 2009-02-06 | 2009-02-20 | 11.627565 | 466.483920 | 0.826747 | 2009 |
| 5 | 4.065468 | 0.241853 | 2010-04-02 | 2010-06-25 | 5.996514 | 11.165462 | 0.672587 | 2010 |
| 6 | -2.182011 | 0.333801 | 2011-03-18 | 2011-12-30 | -3.358018 | -0.776748 | 0.602593 | 2011 |
| 7 | 2.119900 | 0.196653 | 2012-03-02 | 2012-11-23 | 3.458198 | 1.997337 | 0.603504 | 2012 |
| 8 | 4.936706 | 0.168503 | 2013-05-24 | 2013-06-21 | 7.691246 | 9.718052 | 0.508523 | 2013 |
| 9 | 5.649539 | 0.083951 | 2014-11-21 | 2014-12-26 | 11.179287 | 9.436769 | 0.433277 | 2014 |
| 10 | 2.837548 | 0.410130 | 2015-06-05 | 2015-07-03 | 3.294351 | 12.990229 | 1.190048 | 2015 |
| 11 | 3.568117 | 0.091504 | 2016-04-08 | 2016-05-06 | 4.329537 | 7.026963 | 0.643478 | 2016 |
| 12 | -1.778931 | 0.230229 | 2017-02-17 | 2017-12-15 | -2.780930 | -0.592425 | 0.447876 | 2017 |
| 13 | -0.659488 | 0.275744 | 2018-01-19 | 2018-10-12 | -0.949217 | -0.497481 | 0.686670 | 2018 |
| 14 | 2.215944 | 0.231712 | 2019-04-12 | 2019-08-02 | 4.274694 | 2.091116 | 0.585855 | 2019 |
| 15 | 1.242012 | 0.138234 | 2020-01-03 | 2020-01-17 | 1.719817 | 0.735576 | 0.577622 | 2020 |
| 16 | 5.216823 | 0.104639 | 2021-09-03 | 2021-10-22 | 10.806690 | 7.976897 | 0.440638 | 2021 |
| 17 | 1.588294 | 0.164306 | 2022-02-25 | 2022-04-22 | 3.120165 | 1.013708 | 0.526335 | 2022 |
| 18 | 3.335436 | 0.104342 | 2023-02-24 | 2023-05-19 | 6.708995 | 2.018183 | 0.349659 | 2023 |

0_三因子模型策略实现_36_1

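Compared with the size factor on its own, the combination performs worse: over the full sample the Sharpe ratio falls from 3.12 to 2.76 and annual_return from 5.25 to 4.59.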
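A grouped scoring scheme of this kind can be sketched as follows: within each week, every factor is cut into `group_num` equal-sized buckets, a leading `-` flips the factor's direction, and the bucket numbers are summed into one score per stock. This is an illustration under assumptions (higher score is better, equal weight per factor, the hypothetical name `grouped_score`), not the code behind `backtest.mutifactor_score`.

```python
import pandas as pd

def grouped_score(df, signed_factors, group_num=10, date_col="close_date"):
    """Sum of per-factor bucket numbers (0 = worst, group_num-1 = best) for each row."""
    score = pd.Series(0.0, index=df.index)
    for name in signed_factors:
        sign = -1.0 if name.startswith("-") else 1.0
        signed = sign * df[name.lstrip("-")]
        grouped = signed.groupby(df[date_col])
        rank = grouped.rank(method="first")        # 1..n within the week
        count = grouped.transform("count")         # stocks with a value that week
        score = score + (rank - 1) * group_num // count
    return score

# Toy cross-section: stock 0 has the lowest return, smallest size and highest B/M
toy = pd.DataFrame({
    "close_date": ["2020-01-03"] * 10,
    "fac_ret": range(10),
    "fac_size": range(10),
    "fac_bm": range(9, -1, -1),
})
toy["score"] = grouped_score(toy, ["-fac_ret", "-fac_size", "fac_bm"], group_num=5)
print(toy[["fac_ret", "fac_size", "fac_bm", "score"]])
```

Because every factor contributes equally to the score, a strong factor such as size is diluted by weaker ones, which is one plausible reason the combination trails size alone.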

Stock Selection by Multiple Regression

rtn, evaluate_result = backtest.mutifactor_regression(factors, ['fac_ret', 'fac_size', 'fac_bm'], stock_num=100, plot=True)

0_三因子模型策略实现_39_1

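evaluate_result

| | sharpe_ratio | max_drawdown | max_drawdown_start | max_drawdown_end | sortino_ratio | annual_return | annual_volatility | section |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2.381925 | 0.653284 | 2008-01-11 | 2008-10-24 | 3.633037 | 2.883003 | 0.663260 | Sum |
| 1 | 7.397497 | 0.083543 | 2006-07-21 | 2006-08-11 | 14.551758 | 80.247146 | 0.625025 | 2006 |
| 2 | 8.605610 | 0.176219 | 2007-05-18 | 2007-06-22 | 12.445344 | 1427.010829 | 0.901967 | 2007 |
| 3 | -2.899683 | 0.653284 | 2008-01-11 | 2008-10-24 | -5.515209 | -0.960043 | 0.949763 | 2008 |
| 4 | 5.760640 | 0.194681 | 2009-07-24 | 2009-09-25 | 10.102339 | 88.968733 | 0.848765 | 2009 |
| 5 | 3.671218 | 0.230784 | 2010-04-02 | 2010-06-25 | 5.466090 | 8.329292 | 0.671275 | 2010 |
| 6 | -2.037976 | 0.284170 | 2011-04-08 | 2011-12-30 | -3.026109 | -0.713988 | 0.541419 | 2011 |
| 7 | 1.235993 | 0.245804 | 2012-03-02 | 2012-11-23 | 2.141001 | 0.664276 | 0.520239 | 2012 |
| 8 | 3.641103 | 0.138178 | 2013-05-24 | 2013-06-21 | 6.342497 | 4.966395 | 0.530263 | 2013 |
| 9 | 7.529679 | 0.045554 | 2014-03-14 | 2014-03-21 | 17.314748 | 37.479723 | 0.504501 | 2014 |
| 10 | 2.234365 | 0.500596 | 2015-06-05 | 2015-09-11 | 3.162736 | 4.403217 | 0.969233 | 2015 |
| 11 | 1.734332 | 0.101322 | 2016-04-08 | 2016-05-06 | 2.097476 | 1.306984 | 0.579563 | 2016 |
| 12 | -0.528623 | 0.242558 | 2017-03-10 | 2017-08-04 | -0.667913 | -0.286437 | 0.448402 | 2017 |
| 13 | -0.812079 | 0.301313 | 2018-03-23 | 2018-10-19 | -1.385928 | -0.489442 | 0.606122 | 2018 |
| 14 | 2.260950 | 0.221067 | 2019-04-12 | 2019-08-02 | 4.214248 | 1.881747 | 0.529546 | 2019 |
| 15 | 2.290055 | 0.152632 | 2020-07-03 | 2020-12-04 | 3.748461 | 2.656131 | 0.660382 | 2020 |
| 16 | 0.660965 | 0.179802 | 2021-01-15 | 2021-07-30 | 0.985279 | 0.216939 | 0.447280 | 2021 |
| 17 | 1.549873 | 0.132950 | 2022-06-24 | 2022-10-21 | 2.930357 | 0.977742 | 0.528017 | 2022 |
| 18 | -0.475897 | 0.101506 | 2023-01-13 | 2023-05-19 | -0.929460 | -0.216187 | 0.371352 | 2023 |

Compared with the earlier strategies, this one's return is not especially high, particularly next to the size factor (full-sample annual_return of 2.88 versus 5.25). One reason is that the regression coefficients are lagged by two weeks before they are used for prediction.

Overall, however, it does better than the Ret and B/M factors on their own (full-sample Sharpe ratio of 2.38 versus 1.41 and 2.01), and compared with the Size factor it is less exposed to swings in market style.

In the large-cap rally of 2017, its maximum drawdown was about 24%, close to the Size factor's 23%, but its annual return was -29% against -55% for the Size factor.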
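The two-week lag mentioned above follows from how `pred_rtn` is defined: the label for week t is only fully realized at the open of week t+2. A regression-based selection consistent with that can be sketched as below, fitting a cross-sectional OLS on the week two periods back and applying its coefficients to the current week's factors. The function name `regression_select` and every detail of it are assumptions for illustration; the actual `backtest.mutifactor_regression` code is not shown in the post.

```python
import numpy as np
import pandas as pd

def regression_select(df, fac_names, stock_num=100, lag=2,
                      date_col="close_date", ret_col="pred_rtn", code_col="stock_code"):
    """For each week, return the stocks with the highest predicted next-period return."""
    data = df.dropna(subset=fac_names)
    by_date = {d: g for d, g in data.groupby(date_col)}
    dates = sorted(by_date)
    picks = {}
    for i in range(lag, len(dates)):
        train = by_date[dates[i - lag]].dropna(subset=[ret_col])   # labels known by now
        today = by_date[dates[i]]
        X_train = np.column_stack([np.ones(len(train)), train[fac_names].to_numpy()])
        beta, *_ = np.linalg.lstsq(X_train, train[ret_col].to_numpy(), rcond=None)
        X_today = np.column_stack([np.ones(len(today)), today[fac_names].to_numpy()])
        predicted = pd.Series(X_today @ beta, index=today[code_col].to_numpy())
        picks[dates[i]] = predicted.nlargest(stock_num).index.tolist()
    return picks

# Toy panel: four weeks, six stocks, return exactly proportional to one factor
codes = ["A", "B", "C", "D", "E", "F"]
toy = pd.concat(
    [pd.DataFrame({"close_date": week, "stock_code": codes,
                   "fac_demo": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                   "pred_rtn": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]})
     for week in [1, 2, 3, 4]],
    ignore_index=True,
)
picks = regression_select(toy, ["fac_demo"], stock_num=2)
print(picks)
```

The first `lag` weeks produce no picks because no fully realized labels exist yet, and stale coefficients are the price paid for avoiding look-ahead.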