Code Documentation
This page documents the functions in the repository.
QSA (Quasi-Seldonian Algorithm)
The following functions are used by the Seldonian framework. They cover the candidate selection process and the safety test.
-
qsa.QSA(X, Y, T, seldonian_type, init_sol, init_sol1)
This function runs the QSA (Quasi-Seldonian Algorithm).
- Parameters
X – The features of the dataset
Y – The corresponding labels of the dataset
T – The corresponding sensitive attributes of the dataset
seldonian_type – The mode used in the experiment
init_sol – The initial theta values for the model
init_sol1 – The additional initial theta values for the model
- Returns
A tuple (theta, theta1, passed_safety) containing the optimal theta values and a bool indicating whether the candidate solution passed the safety test.
-
qsa.cand_obj(theta, cand_data_X, cand_data_Y, cand_data_T, candidate_ratio, seldonian_type)
This function calculates the value of the objective function that the optimizer minimizes.
- Parameters
theta – The theta values for the model
cand_data_X – The features of the candidate dataset
cand_data_Y – The corresponding labels of the candidate dataset
cand_data_T – The corresponding sensitive attributes of the candidate dataset
candidate_ratio – The candidate:safety ratio used in the experiment
seldonian_type – The mode used in the experiment
- Returns
The objective value.
-
qsa.get_cand_solution(cand_data_X, cand_data_Y, cand_data_T, candidate_ratio, seldonian_type, init_sol, init_sol1)
This function provides the candidate solution.
- Parameters
cand_data_X – The features of the candidate dataset
cand_data_Y – The corresponding labels of the candidate dataset
cand_data_T – The corresponding sensitive attributes of the candidate dataset
candidate_ratio – The candidate:safety ratio used in the experiment
seldonian_type – The mode used in the experiment
init_sol – The initial theta values for the model
init_sol1 – The additional initial theta values for the model
- Returns
The candidate solution (theta, theta1).
-
qsa.safety_test(theta, theta1, safe_data_X, safe_data_Y, safe_data_T, seldonian_type)
This function performs the safety test.
- Parameters
theta – The optimal theta values for the model
theta1 – The additional optimal theta values for the model
safe_data_X – The features of the safety dataset
safe_data_Y – The corresponding labels of the safety dataset
safe_data_T – The corresponding sensitive attributes of the safety dataset
seldonian_type – The mode used in the experiment
- Returns
A bool indicating whether the candidate solution passed the safety test.
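As a rough illustration of how the pieces in this section fit together, the sketch below splits the data, computes a candidate solution on one part, and runs a safety test on the rest. It is not the repository's code: `run_two_stage`, `fit`, and `constraint_upper_bound` are hypothetical names, `candidate_ratio` is assumed here to be the fraction of rows used for candidate selection, the "upper bound <= 0" convention is an assumption, and the toy bound is a placeholder rather than a real confidence bound.

```python
def run_two_stage(rows, candidate_ratio, fit, constraint_upper_bound):
    """Split the data, fit on the candidate part, test on the safety part."""
    # Assumption: candidate_ratio is the fraction of rows used for candidate selection.
    cut = int(len(rows) * candidate_ratio)
    cand_rows, safe_rows = rows[:cut], rows[cut:]
    solution = fit(cand_rows)
    # Assumed convention: the constraint holds when its upper bound is <= 0.
    passed_safety = constraint_upper_bound(solution, safe_rows) <= 0.0
    return solution, passed_safety


# Toy usage: the "model" is the mean of the candidate rows, and the placeholder
# bound is the mean absolute error on the safety rows minus a tolerance of 1.0.
rows = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0, 1.0, 3.0]
fit = lambda data: sum(data) / len(data)
bound = lambda sol, data: sum(abs(v - sol) for v in data) / len(data) - 1.0
solution, passed_safety = run_two_stage(rows, 0.4, fit, bound)
```

The point of the split is that the safety decision is made on rows the candidate solution never saw.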
ML Model
The following functions define the model and the constraints. Users are expected to modify this file.
-
logistic_regression_functions.fHat(theta, theta1, X, Y)
This is the main objective function. The user must change it to suit their needs. Currently, it implements the negative log loss of the model.
- Parameters
theta – The optimal theta values for the model
theta1 – The additional optimal theta values for the model
X – The features of the dataset
Y – The true labels of the dataset
- Returns
The negative log loss
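For readers who want a concrete picture of a log-loss objective, here is a minimal sketch. It reads "log loss" as the mean negative log-likelihood of binary labels, which is an assumption about the sign and averaging used in the repository; `log_loss` and `eps` are hypothetical names, and the function takes predicted probabilities directly instead of theta values.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip to keep log() away from 0 and 1.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# A model that always predicts 0.5 has a loss of ln(2) per datapoint.
loss = log_loss([1, 0], [0.5, 0.5])
```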
-
logistic_regression_functions.predict(theta, theta1, X)
This is the predict function for logistic regression. It can be replaced with the predict function of a user-defined model. Currently, it implements the following:
\frac{1}{1 + e^{-(X \cdot \theta + \theta_1)}}
- Parameters
theta – The optimal theta values for the model
theta1 – The additional optimal theta values for the model
X – The features of the dataset
- Returns
The probability value of label 1 of the complete dataset
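The logistic formula above can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's implementation: `sigmoid_predict` is a hypothetical name, `theta` is taken to be a list of weights, `theta1` a scalar intercept, and `X` a list of feature rows.

```python
import math


def sigmoid_predict(theta, theta1, X):
    """Return P(label = 1) for each row of X under a logistic model."""
    probs = []
    for row in X:
        z = sum(w * x for w, x in zip(theta, row)) + theta1
        # Use the algebraically equivalent form for z < 0 to avoid overflow in exp().
        if z >= 0:
            probs.append(1.0 / (1.0 + math.exp(-z)))
        else:
            ez = math.exp(z)
            probs.append(ez / (1.0 + ez))
    return probs


probs = sigmoid_predict([0.0, 0.0], 0.0, [[1.0, 2.0]])  # z = 0, so the probability is 0.5
```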
-
logistic_regression_functions.simple_logistic(X, Y)
This function runs a simple logistic regression. The user must replace it with their own model.
- Parameters
X – The features of the dataset
Y – The true labels of the dataset
- Returns
The theta values (parameters) of the model
Equation parser
-
equation_parser.construct_expr_tree_base(rev_polish_notation)
Returns the root of the tree constructed from the given postfix expression.
- Parameters
rev_polish_notation – The postfix expression as a string, with tokens separated by a single space (' ')
- Returns
expr_tree node
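Building a tree from a postfix (reverse Polish) string is usually done with a stack. The sketch below shows that idea with hypothetical names (`Node`, `build_tree`, `to_infix`); it assumes only the binary operators `+ - * /`, and the tokens `FP(A)` and `FP(B)` are made-up base variables. The repository's parser may support other operators and token forms.

```python
class Node:
    """A tree node holding an operator or an operand token."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


OPERATORS = {"+", "-", "*", "/"}  # assumed operator set


def build_tree(rev_polish_notation):
    """Build an expression tree from a space-delimited postfix string."""
    stack = []
    for token in rev_polish_notation.split():
        node = Node(token)
        if token in OPERATORS:
            # The most recently pushed operand is the right child.
            node.right = stack.pop()
            node.left = stack.pop()
        stack.append(node)
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    if node.left is None and node.right is None:
        return node.value
    return "(" + to_infix(node.left) + " " + node.value + " " + to_infix(node.right) + ")"


root = build_tree("FP(A) FP(B) - 0.1 -")
print(to_infix(root))  # ((FP(A) - FP(B)) - 0.1)
```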
-
equation_parser.eval_expr_tree_base(t_node, Y, predicted_Y, T)
A utility function to evaluate the estimate of the expression tree
- Parameters
t_node – expr_tree node
Y – pandas::Series
predicted_Y – tensor
T – pandas::Series
- Returns
estimate value: float
-
equation_parser.eval_expr_tree_conf_interval_base(t_node, Y, predicted_Y, T, delta, inequality, candidate_safety_ratio, predict_bound, modified_h)
Evaluates the confidence interval of the expression tree
- Parameters
t_node – expr_tree node
Y – pandas::Series. The true labels of the dataset
predicted_Y – tensor. The predicted labels of the dataset
T – pandas::Series. The sensitive attributes of the dataset
delta – float in [0, 1]. The significance level
inequality – Enum. The inequality to be used (Hoeffding or T-test)
candidate_safety_ratio – The candidate-to-safety ratio used in the experiment
predict_bound – Bool indicating whether the bound is being computed for the candidate data or the safety data
modified_h – Bool indicating whether the modified confidence bound is to be used
- Returns
upper and lower bound of the estimate of the constraint
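To make the idea of a confidence interval on a base-variable estimate concrete, here is a sketch of a two-sided Hoeffding interval for the mean of samples in [0, 1]. `hoeffding_interval` is a hypothetical name, and splitting `delta` across both sides is an assumption; the repository may use one-sided bounds, the T-test variant, or the modified bound selected by `modified_h`.

```python
import math


def hoeffding_interval(samples, delta):
    """Two-sided (1 - delta) Hoeffding interval for the mean of samples in [0, 1]."""
    n = len(samples)
    mean = sum(samples) / n
    # From P(|mean - mu| >= t) <= 2 * exp(-2 * n * t^2), solved for t.
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return mean - half_width, mean + half_width


samples = [0, 1] * 50  # 100 indicator values with mean 0.5
lower, upper = hoeffding_interval(samples, delta=0.05)
```

The interval narrows at a rate of 1/sqrt(n), so quadrupling the number of samples halves its width.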
-
class equation_parser.expr_tree(value)
An expression tree node of the constraint tree
-
equation_parser.inorder(t_node)
A utility function to print the inorder traversal
- Parameters
t_node – expr_tree node
- Returns
None
Inequality
-
class inequalities.Inequality
The Enum defining the inequality type. Currently, it supports T-test and Hoeffding.
-
inequalities.eval_estimate(element, Y, predicted_Y, T)
Estimates the value of the base variable. Assumes that Y and predicted_Y contain binary (0/1) classification labels.
Suppose we are estimating FP(A). Let x_i be an indicator defined only for datapoints of type A, with x_i = 1 if a false positive occurred for the i-th datapoint and x_i = 0 otherwise. The samples are assumed to be independent and identically distributed, so sum(x_i) is a binomial random variable with parameters n and p.
The estimate of p is \hat{p} = 1/n * sum(x_i), and E[\hat{p}] = 1/n * np = p. As p is unknown, it is approximated by \hat{p}.
- Parameters
element – expr_tree node
Y – pandas::Series
predicted_Y – tensor
T – pandas::Series
- Returns
estimate value: float
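The estimate described above can be sketched in a few lines. The names are hypothetical and plain lists stand in for the pandas Series and tensor inputs. Following the description, the denominator n is the number of datapoints of the given type, not the number of true negatives.

```python
def fp_estimate(Y, predicted_Y, T, group):
    """Estimate p_hat = (1/n) * sum(x_i) over the datapoints whose type is `group`."""
    indicators = [
        1 if (y == 0 and y_hat == 1) else 0  # x_i = 1 when a false positive occurred
        for y, y_hat, t in zip(Y, predicted_Y, T)
        if t == group
    ]
    if not indicators:
        raise ValueError("no datapoints of the requested type")
    return sum(indicators) / len(indicators)


Y = [0, 0, 1, 1, 0]
predicted_Y = [1, 0, 1, 0, 1]
T = ["A", "A", "A", "B", "B"]
p_hat_a = fp_estimate(Y, predicted_Y, T, "A")  # one false positive among three type-A rows
p_hat_b = fp_estimate(Y, predicted_Y, T, "B")  # one false positive among two type-B rows
```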
Viewing results
-
create_plots.loadAndPlotResults(fileName, ylabel, output_file, is_yAxis_prob, legend_loc)
This function plots the results from the CSV files and stores the final graph.
- Parameters
fileName – The path of the CSV file from which the data is imported
ylabel – The label on the Y-axis of the graph
output_file – The path where the graph image must be stored
is_yAxis_prob – Bool indicating whether the Y-axis is a probability value
legend_loc – The location of the legend