Moderate Risk-Taking Factor

This factor requires minute-level price/volume data, and the full dataset is too large to store and download. I therefore implement it on a sample of 299 stocks using their 2021 data.
Data Preparation

```python
import pandas as pd
import json
import os
import zipfile
import numpy as np
from tqdm import tqdm
from mytools import backtest
import warnings

warnings.filterwarnings("ignore")
```
```python
with open("../data/stock_pool.json", 'r') as f:
    stock_pool = json.load(f)
```
```python
def dataloader(stock_code):
    # read one stock's minute bars out of the zipped archive
    with zipfile.ZipFile("../data/mins.zip", 'r') as zfile:
        f = zfile.open(f'mins/{stock_code}.csv')
        df = pd.read_csv(f)
    # minute-to-minute close returns, computed within each day
    df['rtn'] = df.groupby('date').apply(
        lambda x: (x['close'] - x['close'].shift(1)) / x['close'].shift(1)
    ).reset_index(drop=True)
    df['date'] = pd.to_datetime(df['date'])
    return df

df = dataloader("000001.SZ")
df.head()
```
```
       close   volume stock_code       date  hour  minute       rtn
0  2853.1013  3887508  000001.SZ 2021-01-04     9      31       NaN
1  2847.0500  1843936  000001.SZ 2021-01-04     9      32 -0.002121
2  2850.0757  1673800  000001.SZ 2021-01-04     9      33  0.001063
3  2824.3584  2422714  000001.SZ 2021-01-04     9      34 -0.009023
4  2813.7690  2531900  000001.SZ 2021-01-04     9      35 -0.003749
```
```python
daily_stock_data = pd.read_csv("../data/daily_stock_data.csv")
daily_stock_data['date'] = pd.to_datetime(daily_stock_data['date'])
daily_stock_data['rtn'] = (daily_stock_data['close'] - daily_stock_data['pre_close']) / daily_stock_data['pre_close']
daily_stock_data.head(5)
```
```
  stock_code       date   open   high    low  close  pre_close      volume       rtn
0  000001.SZ 2021-12-31  16.86  16.90  16.40  16.48      16.82  1750760.89 -0.020214
1  000001.SZ 2021-12-30  16.76  16.95  16.72  16.82      16.75   796663.60  0.004179
2  000001.SZ 2021-12-29  17.16  17.16  16.70  16.75      17.17  1469373.98 -0.024461
3  000001.SZ 2021-12-28  17.22  17.33  17.09  17.17      17.22  1126638.91 -0.002904
4  000001.SZ 2021-12-27  17.33  17.35  17.16  17.22      17.31   731118.99 -0.005199
```
Computing Predicted Returns

```python
dsd = {}
for key in tqdm(['high', 'open', 'low', 'close', 'volume']):
    dsd[key] = pd.pivot(daily_stock_data, index='date', columns='stock_code', values=key)

# next-day close-to-close return as the prediction target
dsd['pred_rtn'] = (dsd['close'].shift(-1) - dsd['close']) / dsd['close']
pred_rtn_na = dsd['pred_rtn'].isna()
vol0 = dsd['volume'].shift(-1) == 0                   # suspended next day
dsd['pred_rtn'][vol0 & (~pred_rtn_na)] = 0
yz = dsd['high'].shift(-1) == dsd['low'].shift(-1)    # next-day one-price (limit) bar
zt = ~(dsd['close'].shift(-1) > dsd['close'])         # and the price did not rise
dsd['pred_rtn'][yz & zt & (~pred_rtn_na)] = 0         # limit-down: assume no fill
pred_rtn = dsd['pred_rtn'].stack().reset_index().rename(columns={0: 'pred_rtn'})
```
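The masking rule above treats a next-day zero-volume bar, or a one-price bar that is not an up move, as untradable and zeroes its return. A minimal pure-Python sketch of the same per-stock rule (the function name and scalar interface are illustrative, not from the notebook):

```python
def mask_pred_rtn(close, next_close, next_high, next_low, next_volume):
    """Next-day return, zeroed when the stock is untradable that day."""
    if next_close is None:        # no next-day bar: prediction undefined
        return None
    pred = (next_close - close) / close
    if next_volume == 0:          # suspended: no trades possible
        return 0.0
    one_price = next_high == next_low      # bar stuck at a single price
    not_up = not (next_close > close)      # limit-down rather than limit-up
    if one_price and not_up:
        return 0.0
    return pred

print(mask_pred_rtn(10.0, 10.5, 10.6, 10.4, 1000))  # normal bar: 0.05
print(mask_pred_rtn(10.0, 9.0, 9.0, 9.0, 500))      # one-price limit-down: 0.0
```

A one-price bar that *rose* keeps its return: that is a limit-up you could in principle have sold into, whereas a limit-down bar offers no exit.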
Factor Calculation

The factor is constructed as follows:

1. Compute the minute-level change in trading volume: $\Delta volume = volume_t - volume_{t-1}$.
2. Flag each day's "surge moments": minutes where $\Delta volume > \overline{\Delta volume} + \sigma(\Delta volume)$.
3. For each surge moment, take the mean and standard deviation of the returns over the following five minutes as the "dazzling return" $r_s$ and "dazzling volatility" $\sigma_s$ of the market reaction it triggers.
4. Average $r_s$ and $\sigma_s$ over all of day $t$'s surge moments to obtain the daily dazzling return $r_s^t$ and daily dazzling volatility $\sigma_s^t$.
5. Take the cross-sectional mean of each daily indicator as that day's "moderate level"; compute each stock's gap from this market level, then over a month take the mean of the gaps as the "monthly-mean dazzling indicator" and their standard deviation as the "monthly-stability dazzling indicator".

Finally:

$$\text{monthly dazzling return} = \text{monthly-mean dazzling return} + \text{monthly-stability dazzling return}$$
$$\text{monthly dazzling volatility} = \text{monthly-mean dazzling volatility} + \text{monthly-stability dazzling volatility}$$
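Steps 1–3 above can be sketched end-to-end on a toy minute series. This is a pure-Python illustration using the `statistics` module; the sample (n−1) standard deviation matches pandas' default, but the toy numbers are mine:

```python
import statistics

def surge_moments(volume):
    """Indices whose volume increment exceeds mean + std of the day's increments."""
    deltas = [volume[i] - volume[i - 1] for i in range(1, len(volume))]
    bound = statistics.mean(deltas) + statistics.stdev(deltas)
    return [i for i in range(1, len(volume)) if volume[i] - volume[i - 1] > bound]

def dazzling(rtn, surges, window=5):
    """Day-level dazzling return / volatility: averages of the post-surge
    5-minute return mean and std over all surge moments."""
    rs, ss = [], []
    for t in surges:
        w = rtn[t + 1 : t + 1 + window]
        if len(w) == window:          # drop surges too close to the close
            rs.append(statistics.mean(w))
            ss.append(statistics.stdev(w))
    if not rs:
        return None, None
    return statistics.mean(rs), statistics.mean(ss)

volume = [100, 110, 105, 500, 120, 115, 118, 116, 117, 119, 118]
rtn = [0.0, 0.001, -0.002, 0.01, 0.004, 0.003, -0.001, 0.002, 0.0, 0.001, -0.002]
surges = surge_moments(volume)
print(surges)                         # only the 500-volume minute qualifies
print(dazzling(rtn, surges))
```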
Surge Moments

```python
def find_surge_time(stk_data):
    """
    Flag intraday "surge moments": minutes whose volume increment exceeds
    that day's mean + std of volume increments.

    Args:
        stk_data: minute-level data for one or more stocks

    Returns:
        the input with 'volume_delta', 'up_bound' and a 0/1 'surge' column added
    """
    stk_data['volume_delta'] = stk_data.groupby(['stock_code', 'date']) \
        .apply(lambda x: x['volume'] - x['volume'].shift(1)).reset_index(drop=True)
    up_bound = stk_data.groupby(['stock_code', 'date'])['volume_delta'] \
        .apply(lambda x: x.mean() + x.std()).reset_index() \
        .rename(columns={"volume_delta": 'up_bound'})
    stk_data = pd.merge(stk_data, up_bound, on=['stock_code', 'date'], how="left")
    stk_data['surge'] = 0
    stk_data.loc[stk_data['volume_delta'] > stk_data['up_bound'], 'surge'] = 1
    return stk_data
```
```python
ls = []
for stock_code in tqdm(stock_pool):
    stk_data = dataloader(stock_code)
    ls.append(stk_data)
stk_data = pd.concat(ls).reset_index(drop=True)
stk_data = find_surge_time(stk_data)
```
Factor Calculation

```python
def calculate_moderate_risk_factor(stk_data0):
    """
    Compute the moderate risk-taking factor.

    Args:
        stk_data0: minute-level data with 'rtn' and 'surge' columns
    """
    def monthly_excellent_factor(stk_data, aspect):
        """
        Build the monthly "dazzling" indicator for one aspect
        (post-surge return or post-surge volatility).
        """
        # keep the aspect only at surge moments
        stk_data.loc[stk_data['surge'] == 0, aspect] = np.nan
        # daily mean over surge moments -> daily dazzling value
        fac_ex = stk_data.groupby(['stock_code', 'date'], group_keys=False)[aspect] \
            .apply(lambda x: x.mean()).to_frame() \
            .rename(columns={aspect: 'excellent'}).reset_index()
        # cross-sectional average = the day's "moderate level"
        market_level = fac_ex.groupby('date')['excellent' ] \
            .mean().to_frame().rename(columns={'excellent': 'market_level'})
        fac_ex = pd.merge(fac_ex, market_level, on="date", how='left')
        fac_ex['moderate'] = abs(fac_ex['excellent'] - fac_ex['market_level'])
        fac_ex = fac_ex.set_index('date')
        # 20-day rolling mean/std -> monthly-mean and monthly-stability indicators
        factor = pd.DataFrame()
        factor['moderate_mean'] = fac_ex.groupby('stock_code')['moderate'].rolling(20).mean()
        factor['moderate_std'] = fac_ex.groupby('stock_code')['moderate'].rolling(20).std()
        factor['factor'] = factor['moderate_mean'] + factor['moderate_std']
        return factor[['factor']]

    stk_data = stk_data0.copy()
    # mean / std of returns over the five minutes following each minute
    stk_data['rtn_m5'] = stk_data.groupby(['stock_code', 'date'], group_keys=False)['rtn'] \
        .apply(lambda x: x.rolling(5).mean().shift(-5))
    stk_data['rtn_s5'] = stk_data.groupby(['stock_code', 'date'], group_keys=False)['rtn'] \
        .apply(lambda x: x.rolling(5).std().shift(-5))
    fac_ex_ret = monthly_excellent_factor(stk_data, "rtn_m5")
    fac_ex_vol = monthly_excellent_factor(stk_data, "rtn_s5")
    factor = (fac_ex_ret['factor'] + fac_ex_vol['factor']).reset_index()
    return factor
```
Factor Data Processing

```python
factor = calculate_moderate_risk_factor(stk_data)
factor = factor.dropna()
factor = pd.merge(factor, pred_rtn, on=['date', 'stock_code'], how='left')
factor = factor[~factor['pred_rtn'].isna()] \
    .rename(columns={'factor': "moderate_risk_factor", 'date': "close_date"})
factor = backtest.winsorize_factor(factor, 'moderate_risk_factor')
factor.head(5)
```
```
     stock_code close_date  moderate_risk_factor  pred_rtn
5651  000001.SZ 2021-01-29              0.000551  0.063231
5652  000002.SZ 2021-01-29              0.001112  0.010076
5653  000004.SZ 2021-01-29              0.000724 -0.055281
5654  000005.SZ 2021-01-29              0.001220 -0.093458
5655  000006.SZ 2021-01-29              0.000634  0.014403
```
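`winsorize_factor` lives in my private `mytools` package, so its exact rule is not shown here. A minimal sketch of one common choice, clipping at the 1st/99th empirical percentiles (the cutoffs and nearest-rank quantile are my assumptions):

```python
def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clip values to the [lower_q, upper_q] empirical quantiles."""
    s = sorted(values)
    n = len(s)
    # nearest-rank quantiles; a library implementation would interpolate
    lo = s[int(lower_q * (n - 1))]
    hi = s[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

vals = list(range(100)) + [10_000]   # one extreme outlier
print(max(winsorize(vals)))          # outlier pulled back to the 99th percentile
```

Winsorizing before the Fama-MacBeth regression keeps a handful of extreme factor values from dominating the cross-sectional slopes.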
Factor Testing

```python
res_dict = backtest.fama_macbeth(factor, 'moderate_risk_factor')
fama_macbeth_res = pd.DataFrame([res_dict])
fama_macbeth_res
```
```
               fac_name         t         p  pos_count  neg_count
0  moderate_risk_factor  0.189631  0.849773        113        109
```
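`backtest.fama_macbeth` is also from my private `mytools`, but the standard Fama-MacBeth procedure it presumably follows is: regress each period's cross-section of returns on the factor, then t-test the time series of slope coefficients. A minimal pure-Python sketch (toy panels, single-factor regression without intercept correction):

```python
import math

def cross_sectional_slope(factor, rtn):
    """OLS slope of returns on the factor for one period."""
    n = len(factor)
    fm = sum(factor) / n
    rm = sum(rtn) / n
    cov = sum((f - fm) * (r - rm) for f, r in zip(factor, rtn))
    var = sum((f - fm) ** 2 for f in factor)
    return cov / var

def fama_macbeth_t(factor_panel, rtn_panel):
    """t-statistic of the mean of the period-by-period slopes."""
    slopes = [cross_sectional_slope(f, r) for f, r in zip(factor_panel, rtn_panel)]
    t = len(slopes)
    mean = sum(slopes) / t
    var = sum((s - mean) ** 2 for s in slopes) / (t - 1)
    return mean / math.sqrt(var / t)

# two toy periods with slopes 2 and 3 -> mean 2.5, t = 5
factor_panel = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.4]]
rtn_panel = [[0.2, 0.4, 0.6], [0.0, 0.3, 1.2]]
print(round(fama_macbeth_t(factor_panel, rtn_panel), 6))  # 5.0
```

The reported t of 0.19 above therefore says the average cross-sectional premium on this factor is statistically indistinguishable from zero.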
```python
group_rtns, group_cum_rtns = backtest.group_return_analysis(factor, 'moderate_risk_factor')
```
Overall the factor loads positively, but over my backtest window it is not an effective one.

In the Fama-MacBeth test its return contribution is close to zero and statistically insignificant.

The grouped backtest shows returns that are high in the two extreme groups and low in the middle, which leaves room for further refinement.

However, with such a short backtest window and a universe of only ~300 stocks, the factor's true effectiveness cannot be judged; the result may simply reflect market style. It should be tested over a longer period and a larger universe.
```python
rtn, evaluate_result = backtest.backtest_1week_nstock(factor, 'moderate_risk_factor')
```
```
   sharpe_ratio  max_drawdown max_drawdown_start max_drawdown_end  sortino_ratio  annual_return  annual_volatility section
0      1.930454      0.131483         2021-09-10       2021-11-04       2.777997       0.388784           0.178471     Sum
1      1.930454      0.131483         2021-09-10       2021-11-04       2.777997       0.388784           0.178471    2021
```
Judged by the strategy metrics, the performance is actually decent: the Sharpe ratio is close to 2, the drawdown is small, and the overall return is somewhat above the equal-weighted average of the ~300 stocks.
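For reference, the two headline metrics can be reproduced from a daily return series along these lines (annualization with 252 trading days is my assumption; `mytools.backtest` may differ in detail):

```python
import math

def sharpe_ratio(daily_rtn, periods=252):
    """Annualized mean/std of daily returns (risk-free rate taken as 0)."""
    mean = sum(daily_rtn) / len(daily_rtn)
    var = sum((r - mean) ** 2 for r in daily_rtn) / (len(daily_rtn) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods)

def max_drawdown(daily_rtn):
    """Largest peak-to-trough fall of the cumulative-return curve."""
    nav, peak, mdd = 1.0, 1.0, 0.0
    for r in daily_rtn:
        nav *= 1 + r
        peak = max(peak, nav)
        mdd = max(mdd, 1 - nav / peak)
    return mdd

rtns = [0.01, -0.02, 0.015, -0.005, 0.02]
print(round(max_drawdown(rtns), 4))   # the single -2% day is the worst drawdown
```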
```python
market_rtn = daily_stock_data.groupby('date')['rtn'].mean().to_frame().rename(columns={'rtn': 'market_rtn'})
rtn = pd.merge(rtn, market_rtn, right_index=True, left_index=True, how="left")
rtn['market_cum_rtn'] = (1 + rtn['market_rtn']).cumprod()
```
```python
rtn[['cum_rtn', 'market_cum_rtn']].plot()
```