regpyhdfe package


regpyhdfe.regpyhdfe module

class regpyhdfe.regpyhdfe.Regpyhdfe(df, target, predictors, absorb_ids=[], cluster_ids=[], drop_singletons=True, intercept=False)[source]

Bases: object

__init__(df, target, predictors, absorb_ids=[], cluster_ids=[], drop_singletons=True, intercept=False)[source]

Regression wrapper for PyHDFE.

  • df (pandas DataFrame) – dataframe containing all referenced data: the target, the predictors, and the absorb and cluster variables.

  • target (string) – name of target variable - the y in y = X*b + e.

  • predictors (string or list of strings) – names of predictors, the X in y = X*b + e.

  • absorb_ids (string or list of strings) – names of variables to be absorbed for fixed effects.

  • cluster_ids (string or list of strings) – names of variables to be clustered on.

  • drop_singletons (bool) – indicates whether to drop singleton groups. Default is True, same as Stata. Setting to False is equivalent to passing keepsingletons to reghdfe.
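To make the absorb_ids and drop_singletons parameters concrete, the following is a minimal sketch of what absorbing a single fixed effect amounts to. It does not call regpyhdfe or PyHDFE and is not their implementation: the column names (firm, x, y), the toy data and the plain group-demeaning approach are all assumptions made for illustration, and the case of several absorbed variables (which needs an iterative procedure) is not covered.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'firm' plays the role of an absorbed identifier.
df = pd.DataFrame({
    "firm": ["a", "a", "a", "b", "b", "b", "c"],
    "x": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 5.0],
})
# Build y = 2*x + (firm-specific constant), with no noise.
firm_effect = {"a": 10.0, "b": -5.0, "c": 3.0}
df["y"] = 2.0 * df["x"] + df["firm"].map(firm_effect)

# Drop singleton groups: firm 'c' has a single observation.
group_size = df.groupby("firm")["firm"].transform("count")
kept = df[group_size > 1]

# Absorb the fixed effect by subtracting each group's mean from y and x.
group_means = kept.groupby("firm")[["y", "x"]].transform("mean")
demeaned = kept[["y", "x"]] - group_means

# Ordinary least squares on the demeaned data recovers the slope on x.
beta, *_ = np.linalg.lstsq(
    demeaned[["x"]].to_numpy(), demeaned["y"].to_numpy(), rcond=None
)
print(beta[0])
```

Because the firm-specific constant cancels when group means are subtracted, the slope estimated from the demeaned data is the coefficient on x alone.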


Generate linear regression coefficients for given data.

The regression will cluster on variables provided during initialization.



regpyhdfe.regpyhdfe.summary(self, regpyhdfe, yname=None, xname=None, title=None, alpha=0.05)[source]

Summarize the Regression Results.

  • yname (str, optional) – Name of endogenous (response) variable. The default is y.

  • xname (list[str], optional) – Names for the exogenous variables. Default is var_## for ## in the number of regressors. Must match the number of parameters in the model.

  • title (str, optional) – Title for the top table. If not None, then this replaces the default title.

  • alpha (float) – The significance level for the confidence intervals.


Instance holding the summary tables and text, which can be printed or converted to various output formats.

Return type:


See also


A class that holds summary results.

regpyhdfe.utils module


Prepends a column of 1s (an intercept column) to a 2D numpy array.


X (numpy array) – 2D numpy array.


X with an appended column of 1s.
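As an illustration of the operation described here, a minimal numpy sketch follows. The function name prepend_ones is hypothetical and this is not the library's own code. The sketch follows the summary line in placing the column of 1s first.

```python
import numpy as np

def prepend_ones(X):
    """Hypothetical stand-in: return X with a leading column of 1s."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))   # one intercept entry per row
    return np.hstack([ones, X])       # intercept column goes first

X = np.array([[2.0, 3.0],
              [4.0, 5.0]])
X1 = prepend_ones(X)
print(X1.shape)
```

The input array is left unchanged; the result has one more column than X.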

regpyhdfe.utils.get_np_columns(df, columns, intercept=False)[source]

Helper used to retrieve columns as a numpy array.

  • df (pandas dataframe) – dataframe containing desired columns

  • columns (list of strings) – list of names of desired columns. Must be a list even if only 1 column is desired.

  • intercept (bool) – set to True if you'd like the resulting numpy array to have a column of 1s appended to it.


2D numpy array with columns of array consisting of feature vectors, i.e. the first column of the result is a numpy vector of the first column named in columns argument.
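The requirement that columns be a list even for a single column can be seen with plain pandas indexing. This is a sketch of the underlying behaviour, assuming a toy dataframe with hypothetical columns a and b. It does not call get_np_columns itself and leaves out the intercept option.

```python
import pandas as pd

# Hypothetical toy dataframe.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Selecting with a list of names yields a 2D array, one column per name,
# in the order the names were given.
X = df[["a", "b"]].to_numpy()

# A one-element list still yields a 2D array with a single column ...
single = df[["a"]].to_numpy()

# ... whereas a bare string yields a 1D array.
flat = df["a"].to_numpy()

print(X.shape, single.shape, flat.shape)
```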


Converts (as well as it can) an sklearn dataset to a Pandas dataframe.


sklearn_dataset (sklearn.utils.Bunch) – this parameter is usually the result of using sklearn to quickly get a dataset, e.g. the object returned by calling sklearn.datasets.load_boston().


Pandas dataframe df where df['target'] is the target variable in the original dataset.
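A conversion of this kind can be sketched as follows. The function name bunch_to_frame is hypothetical and this is not the library's implementation. To keep the example self-contained, a SimpleNamespace stands in for a real sklearn.utils.Bunch and is assumed to expose the data, target and feature_names attributes.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Stand-in for a Bunch-like dataset object (an assumption for illustration).
dataset = SimpleNamespace(
    data=np.array([[1.0, 2.0], [3.0, 4.0]]),
    target=np.array([0.5, 1.5]),
    feature_names=["f0", "f1"],
)

def bunch_to_frame(bunch):
    """Hypothetical helper: features become columns, target goes in 'target'."""
    frame = pd.DataFrame(bunch.data, columns=list(bunch.feature_names))
    frame["target"] = bunch.target
    return frame

frame = bunch_to_frame(dataset)
print(frame)
```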