regpyhdfe package

Submodules

regpyhdfe.regpyhdfe module

class regpyhdfe.regpyhdfe.Regpyhdfe(df, target, predictors, absorb_ids=[], cluster_ids=[], drop_singletons=True, intercept=False)

Bases: object

__init__(df, target, predictors, absorb_ids=[], cluster_ids=[], drop_singletons=True, intercept=False)

Regression wrapper for PyHDFE.

Parameters:

df (pandas DataFrame) – dataframe containing the referenced data: the target, the predictors, and any absorb and cluster variables.
target (string) – name of the target variable, the y in y = X*b + e.
predictors (string or list of strings) – names of the predictors, the X in y = X*b + e.
absorb_ids (string or list of strings) – names of the variables to be absorbed as fixed effects.
cluster_ids (string or list of strings) – names of the variables to cluster on.
drop_singletons (bool) – whether to drop singleton groups. Default is True, as in Stata. Setting it to False is equivalent to passing keepsingletons to reghdfe.
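
To make the drop_singletons option concrete, here is a minimal sketch of what dropping singleton groups means. It uses only the standard library and is an illustration of the idea, not regpyhdfe's or PyHDFE's actual implementation; the function name drop_singleton_rows and the sample rows are hypothetical. The loop repeats because removing one row can turn another group into a singleton.

```python
from collections import Counter


def drop_singleton_rows(rows, absorb_ids):
    """Repeatedly drop rows whose group has one member in any absorbed variable.

    Hypothetical helper for illustration only; not part of regpyhdfe.
    """
    kept = list(rows)
    while True:
        # Count group sizes for each absorbed variable on the rows still kept.
        counts = {key: Counter(row[key] for row in kept) for key in absorb_ids}
        survivors = [
            row for row in kept
            if all(counts[key][row[key]] > 1 for key in absorb_ids)
        ]
        # Stop once a full pass removes nothing.
        if len(survivors) == len(kept):
            return survivors
        kept = survivors


rows = [
    {"firm": "A", "year": 2001},
    {"firm": "A", "year": 2002},
    {"firm": "B", "year": 2001},
    {"firm": "B", "year": 2002},
    {"firm": "C", "year": 2002},
    {"firm": "C", "year": 2003},  # the only observation for year 2003
]
result = drop_singleton_rows(rows, ["firm", "year"])
print([(row["firm"], row["year"]) for row in result])
```

In this toy data the first pass removes the lone 2003 row, which leaves firm C with a single observation, so a second pass removes that row as well.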

fit()

Generate linear regression coefficients for given data.

The regression will cluster on variables provided during initialization.

Returns: statsmodels.regression.linear_model.RegressionResults.
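
As a rough illustration of what absorbing a fixed effect does to the coefficients, the sketch below handles the simplest case: one absorbed variable and one predictor. In that case, subtracting each group's mean from the target and the predictor and then running OLS gives the same slope as including one dummy per group. The helper names and data are hypothetical, the code is stdlib-only, and it is not how regpyhdfe or PyHDFE is implemented; with several absorbed variables the demeaning generally has to be iterated.

```python
from collections import defaultdict


def demean_within(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def slope_after_absorbing(y, x, groups):
    """OLS slope of y on x after absorbing a single group fixed effect."""
    y_tilde = demean_within(y, groups)
    x_tilde = demean_within(x, groups)
    numerator = sum(a * b for a, b in zip(x_tilde, y_tilde))
    denominator = sum(a * a for a in x_tilde)
    return numerator / denominator


# Toy data built as y = 2 * x + group effect (10 for "a", -5 for "b").
groups = ["a", "a", "a", "b", "b"]
x = [1.0, 2.0, 3.0, 2.0, 4.0]
y = [12.0, 14.0, 16.0, -1.0, 3.0]
slope = slope_after_absorbing(y, x, groups)
print(slope)
```

The group effects drop out after demeaning, so the recovered slope reflects only the within-group relationship between x and y.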

regpyhdfe.regpyhdfe.summary(self, regpyhdfe, yname=None, xname=None, title=None, alpha=0.05)

Summarize the Regression Results.

Parameters:

yname (str, optional) – Name of the endogenous (response) variable. Default is y.
xname (list[str], optional) – Names for the exogenous variables. Default is var_## for ## in the number of regressors. Must match the number of parameters in the model.
title (str, optional) – Title for the top table. If not None, then this replaces the default title.
alpha (float) – The significance level for the confidence intervals.

Returns:

Instance holding the summary tables and text, which can be printed or converted to various output formats.

Return type:

Summary
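
To show how alpha relates to the reported intervals, the following stdlib-only sketch computes a two-sided (1 - alpha) confidence interval under a normal approximation, so alpha=0.05 corresponds to a 95% interval. The function name and the numbers are hypothetical, and the summary tables themselves may be based on a t distribution rather than the normal used here.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval under a normal approximation.

    Hypothetical helper for illustration; not part of regpyhdfe or statsmodels.
    """
    # Critical value that leaves alpha / 2 probability in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


low, high = normal_confidence_interval(estimate=2.0, std_error=0.5, alpha=0.05)
print(round(low, 3), round(high, 3))
```

A smaller alpha gives a wider interval, since less probability is left in the tails.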

regpyhdfe.utils module

regpyhdfe.utils.add_intercept(X)

Prepends a column of 1s (an intercept column) to a 2D numpy array.

Parameters: X (numpy array) – 2D numpy array.
Returns: X with the column of 1s added.

regpyhdfe.utils.get_np_columns(df, columns, intercept=False)

Helper used to retrieve columns as a numpy array.

Parameters:

df (pandas DataFrame) – dataframe containing the desired columns.
columns (list of strings) – names of the desired columns. Must be a list even if only one column is desired.
intercept (bool) – set to True if you'd like the resulting numpy array to include a column of 1s.

Returns:

2D numpy array whose columns are the feature vectors, i.e. the first column of the result is the first column named in the columns argument.

regpyhdfe.utils.sklearn_to_df(sklearn_dataset)

Converts (as well as it can) an sklearn dataset to a Pandas dataframe.

Parameters: sklearn_dataset (sklearn.utils.Bunch) – usually the result of using sklearn to quickly get a dataset, e.g. the object returned by sklearn.datasets.load_boston().
Returns: pandas DataFrame df where df['target'] is the target variable in the original dataset.