Code Documentation

This page is a reference guide for the various functions in the repository.

QSA (Quasi-Seldonian Algorithm)

The following functions implement the Seldonian framework, including the candidate selection process and the safety test.

qsa.QSA(X, Y, T, seldonian_type, init_sol, init_sol1)

This function runs the QSA (Quasi-Seldonian Algorithm).

Parameters
  • X – The features of the dataset

  • Y – The corresponding labels of the dataset

  • T – The corresponding sensitive attributes of the dataset

  • seldonian_type – The mode used in the experiment

  • init_sol – The initial theta values for the model

  • init_sol1 – The additional initial theta values for the model

Returns

A tuple (theta, theta1, passed_safety) containing the optimal theta values and a bool indicating whether the candidate solution passed the safety test.
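The overall control flow can be sketched as follows. This is an illustration of the documented structure, not the repository's implementation: the split fraction, the helper names, and the use of a single parameter vector (no theta1) are all assumptions.

```python
import numpy as np

def get_cand_solution_stub(X, Y, T):
    # Placeholder for qsa.get_cand_solution: return zero weights of the right shape.
    return np.zeros(X.shape[1])

def safety_test_stub(theta, X, Y, T):
    # Placeholder for qsa.safety_test: always "passes".
    return True

def qsa_sketch(X, Y, T, candidate_ratio=0.4):
    """Minimal sketch of the QSA control flow: partition the data into a
    candidate set and a safety set, select a candidate solution on the first
    partition, then run the safety test on the second."""
    n_cand = int(len(X) * candidate_ratio)  # candidate:safety split (assumed meaning)
    cand = (X[:n_cand], Y[:n_cand], T[:n_cand])
    safe = (X[n_cand:], Y[n_cand:], T[n_cand:])

    theta = get_cand_solution_stub(*cand)
    passed = safety_test_stub(theta, *safe)
    return theta, passed
```

The key design point is that the data used to select the solution is disjoint from the data used to test it, so the safety test remains an unbiased check.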

qsa.cand_obj(theta, cand_data_X, cand_data_Y, cand_data_T, candidate_ratio, seldonian_type)

This function calculates the value of the objective function that is minimized by the optimizer.

Parameters
  • theta – The theta values for the model

  • cand_data_X – The features of the candidate dataset

  • cand_data_Y – The corresponding labels of the candidate dataset

  • cand_data_T – The corresponding sensitive attributes of the candidate dataset

  • candidate_ratio – The candidate:safety ratio used in the experiment

  • seldonian_type – The mode used in the experiment

Returns

The objective value.
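A common way to build such a candidate objective in Seldonian algorithms is to add a large barrier penalty to the primary loss whenever the predicted constraint bound is violated. The sketch below illustrates that shape; the penalty constant and the helper `predicted_constraint_bound_stub` are hypothetical, not the repository's code.

```python
import numpy as np

def predicted_constraint_bound_stub(theta, X, Y):
    # Placeholder for the inflated confidence bound computed on candidate data.
    return -1.0  # pretend the constraint is predicted to hold

def cand_obj_sketch(theta, X, Y):
    """Sketch of a Seldonian candidate objective: the primary loss (here a
    logistic log loss) plus a barrier penalty when the predicted upper bound
    on the constraint is positive, steering the optimizer toward solutions
    likely to pass the safety test."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    log_loss = -np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))

    upper_bound = predicted_constraint_bound_stub(theta, X, Y)
    if upper_bound > 0:
        # Barrier term: constant offset plus the magnitude of the violation.
        return log_loss + 1000.0 + upper_bound
    return log_loss
```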

qsa.get_cand_solution(cand_data_X, cand_data_Y, cand_data_T, candidate_ratio, seldonian_type, init_sol, init_sol1)

This function provides the candidate solution.

Parameters
  • cand_data_X – The features of the candidate dataset

  • cand_data_Y – The corresponding labels of the candidate dataset

  • cand_data_T – The corresponding sensitive attributes of the candidate dataset

  • candidate_ratio – The candidate:safety ratio used in the experiment

  • seldonian_type – The mode used in the experiment

  • init_sol – The initial theta values for the model

  • init_sol1 – The additional initial theta values for the model

Returns

The candidate solution (theta, theta1).

qsa.safety_test(theta, theta1, safe_data_X, safe_data_Y, safe_data_T, seldonian_type)

This function runs the safety test.

Parameters
  • theta – The optimal theta values for the model

  • theta1 – The additional optimal theta values for the model

  • safe_data_X – The features of the safety dataset

  • safe_data_Y – The corresponding labels of the safety dataset

  • safe_data_T – The corresponding sensitive attributes of the safety dataset

  • seldonian_type – The mode used in the experiment

Returns

A bool indicating whether the candidate solution passed the safety test.
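For intuition, a Seldonian safety test with a Hoeffding bound (one of the inequalities this repository supports) can be sketched as below. The per-sample constraint estimates, the sample range [a, b], and the one-sided form are assumptions for illustration.

```python
import numpy as np

def safety_test_sketch(g_samples, delta=0.05, a=-1.0, b=1.0):
    """Sketch of a safety test using a one-sided Hoeffding bound.

    g_samples are assumed to be i.i.d. per-datapoint unbiased estimates of the
    constraint g(theta), bounded in [a, b]; behavior is safe when g(theta) <= 0.
    """
    n = len(g_samples)
    g_hat = np.mean(g_samples)
    # One-sided Hoeffding inequality for samples bounded in [a, b].
    bound = (b - a) * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    # Pass only if the high-confidence upper bound on g(theta) is <= 0.
    return (g_hat + bound) <= 0.0
```

Note how the bound shrinks with the safety-set size n: with too little safety data the test fails even for safe solutions, which is the intended conservative behavior.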

ML Model

The following functions correspond to the model and constraint definitions. The user is expected to make changes in this file.

logistic_regression_functions.fHat(theta, theta1, X, Y)

This is the main objective function and should be changed by the user according to their needs. Currently, it implements the negative log loss of the model.

Parameters
  • theta – The optimal theta values for the model

  • theta1 – The additional optimal theta values for the model

  • X – The features of the dataset

  • Y – The true labels of the dataset

Returns

The negative log loss
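A possible implementation of the description above is shown below. The sign convention (returning the negative of the log loss, i.e. the mean log-likelihood) and the epsilon clamp are assumptions; check the repository before relying on them.

```python
import numpy as np

def fHat_sketch(theta, theta1, X, Y):
    """Sketch of fHat as the negative of the log loss, i.e. the mean
    log-likelihood of the observed labels."""
    # Predicted probability of label 1, matching the predict() formula.
    p = 1.0 / (1.0 + np.exp(-(X @ theta + theta1)))
    eps = 1e-12  # guard against log(0)
    return np.mean(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps))
```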

logistic_regression_functions.predict(theta, theta1, X)

This is the predict function for Logistic Regression. This can be changed into predict function of the user defined model. Currently, it implements the following:

\frac{1}{1 + e^{-(X \cdot \theta + \theta_1)}}

Parameters
  • theta – The optimal theta values for the model

  • theta1 – The additional optimal theta values for the model

  • X – The features of the dataset

Returns

The probability of label 1 for each datapoint in the dataset
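The documented formula translates directly into a vectorized sigmoid; this sketch follows the formula above rather than the repository's exact code.

```python
import numpy as np

def predict_sketch(theta, theta1, X):
    """Logistic prediction sigma(X.theta + theta1): the probability of
    label 1 for every row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ theta + theta1)))
```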

logistic_regression_functions.simple_logistic(X, Y)

This function runs a simple logistic regression. The user should replace it with their own model.

Parameters
  • X – The features of the dataset

  • Y – The true labels of the dataset

Returns

The theta values (parameters) of the model
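As an illustration of what simple_logistic might do, here is a minimal gradient-descent logistic regression returning parameters in the (theta, theta1) form used by predict(). The repository may instead call an off-the-shelf solver; the learning rate and iteration count here are arbitrary.

```python
import numpy as np

def simple_logistic_sketch(X, Y, lr=0.1, iters=2000):
    """Minimal batch gradient descent on the logistic log loss."""
    theta = np.zeros(X.shape[1])
    theta1 = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta + theta1)))
        grad_theta = X.T @ (p - Y) / len(Y)  # gradient of the mean log loss
        grad_theta1 = np.mean(p - Y)
        theta -= lr * grad_theta
        theta1 -= lr * grad_theta1
    return theta, theta1
```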

Equation parser

equation_parser.construct_expr_tree_base(rev_polish_notation)

Returns the root of the constructed tree for the given postfix expression.

Parameters

rev_polish_notation – string with a single space ' ' as the delimiter

Returns

expr_tree node
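Postfix construction is typically done with a stack: operands are pushed, and each operator pops its two operands. The sketch below assumes binary operators drawn from +, -, *, / and uses a stand-in node class; the repository's expr_tree and token set may differ.

```python
class ExprTree:
    # Stand-in for equation_parser.expr_tree: a binary tree node.
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def construct_expr_tree_sketch(rev_polish_notation):
    """Stack-based construction of an expression tree from a space-delimited
    postfix string."""
    operators = {"+", "-", "*", "/"}
    stack = []
    for token in rev_polish_notation.split(" "):
        node = ExprTree(token)
        if token in operators:
            # Operator: pop its two operands (right first, then left).
            node.right = stack.pop()
            node.left = stack.pop()
        stack.append(node)
    return stack.pop()  # the remaining node is the root
```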

equation_parser.eval_expr_tree_base(t_node, Y, predicted_Y, T)

A utility function to evaluate the estimate of the expression tree.

Parameters
  • t_node – expr_tree node

  • Y – pandas::Series

  • predicted_Y – tensor

  • T – pandas::Series

Returns

estimate value: float
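Evaluation is a recursive post-order walk: leaves are base variables whose estimates come from Y, predicted_Y and T, and internal nodes combine their children. In this sketch the leaf lookup is replaced by a hypothetical `base_values` dict so the example is self-contained.

```python
class Node:
    # Minimal expression-tree node (stands in for equation_parser.expr_tree).
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def eval_expr_tree_sketch(t_node, base_values):
    """Recursive evaluation: leaves are looked up in base_values, a dict
    mapping base-variable names (e.g. "FP(A)") to their estimates."""
    if t_node.left is None and t_node.right is None:
        return base_values[t_node.value]  # leaf: a base variable
    left = eval_expr_tree_sketch(t_node.left, base_values)
    right = eval_expr_tree_sketch(t_node.right, base_values)
    if t_node.value == "+":
        return left + right
    if t_node.value == "-":
        return left - right
    if t_node.value == "*":
        return left * right
    return left / right  # "/" is the only remaining operator here
```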

equation_parser.eval_expr_tree_conf_interval_base(t_node, Y, predicted_Y, T, delta, inequality, candidate_safety_ratio, predict_bound, modified_h)

Evaluates the confidence interval of the expression tree.

Parameters
  • t_node – expr_tree node

  • Y – pandas::Series The true labels of the dataset

  • predicted_Y – tensor The predicted labels of the dataset

  • T – pandas::Series The sensitive attributes of the dataset

  • delta – float in [0, 1] The significance level

  • inequality – Enum The inequality to be used - Hoeffding/T-test

  • candidate_safety_ratio – The candidate:safety ratio used in the experiment

  • predict_bound – Bool indicating whether the bound is being computed for candidate data or safety data

  • modified_h – Bool indicating whether the modified confidence bound is to be used

Returns

upper and lower bound of the estimate of the constraint
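One plausible building block for these bounds is the two-sided Hoeffding interval on a leaf's estimate; intervals are then propagated up the tree with interval arithmetic. The sketch below shows only the leaf interval, with the sample range [a, b] as an assumption (the repository also offers a t-test inequality and a modified bound).

```python
import numpy as np

def hoeffding_interval(samples, delta, a=0.0, b=1.0):
    """Two-sided Hoeffding confidence interval for the mean of i.i.d.
    samples bounded in [a, b]: with probability at least 1 - delta the
    true mean lies inside the returned (lower, upper) pair."""
    n = len(samples)
    mean = np.mean(samples)
    half_width = (b - a) * np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return mean - half_width, mean + half_width
```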

class equation_parser.expr_tree(value)

An expression tree node of the constraint tree

equation_parser.inorder(t_node)

A utility function to print inorder traversal

Parameters

t_node – expr_tree node

Returns

None

Inequality

class inequalities.Inequality

The Enum defining the inequality type. Currently, it supports T-test and Hoeffding.

inequalities.eval_estimate(element, Y, predicted_Y, T)

Estimates the value of the base variable. Assumes that Y and predicted_Y contain 0/1 binary classification labels.

Suppose we are calculating FP(A). Let X be an indicator function defined only for datapoints of type A, such that x_i = 1 if a false positive occurred for the i-th datapoint and x_i = 0 otherwise. The data samples are assumed to be independent and identically distributed, so our estimate of p is \hat{p} = (1/n) \sum_i x_i. The count n\hat{p} can safely be treated as a binomial random variable, with E[\hat{p}] = (1/n) \cdot np = p. Since p is unknown, we approximate it by \hat{p}.

Parameters
  • element – expr_tree node

  • Y – pandas::Series

  • predicted_Y – tensor

  • T – pandas::Series

Returns

estimate value: float
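The FP(A) reasoning above can be made concrete as follows. The normalization over type-A actual negatives (the datapoints where a false positive can occur) is an assumption here; the repository's definition of the denominator may differ.

```python
import numpy as np

def estimate_fp_rate(Y, predicted_Y, T, group):
    """Worked example of the estimate hat{p} for FP(group): the fraction of
    type-`group` true negatives that were predicted positive."""
    mask = (T == group) & (Y == 0)               # where x_i is defined
    x = (predicted_Y[mask] == 1).astype(float)   # x_i = 1 iff a false positive occurred
    return np.mean(x)                            # hat{p} = (1/n) * sum(x_i)
```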

Viewing results

create_plots.loadAndPlotResults(fileName, ylabel, output_file, is_yAxis_prob, legend_loc)

This function plots the results from the CSV files and stores the final graph.

Parameters
  • fileName – The CSV file path from which the data is imported

  • ylabel – The label on the Y-axis of the graph

  • output_file – The path where the graph image must be stored

  • is_yAxis_prob – Bool of whether the Y-axis is a probability value or not

  • legend_loc – The location of the legend
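A simplified version of this plotting flow is sketched below with matplotlib and pandas. The CSV schema (first column as the x-axis, remaining columns as series) and the styling are assumptions, not the repository's actual behavior.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt
import pandas as pd

def load_and_plot_sketch(fileName, ylabel, output_file,
                         is_yAxis_prob=True, legend_loc="best"):
    """Read a CSV, plot every column against the first, and save the figure."""
    df = pd.read_csv(fileName)
    x_col = df.columns[0]
    fig, ax = plt.subplots()
    for col in df.columns[1:]:
        ax.plot(df[x_col], df[col], label=col)
    ax.set_xlabel(x_col)
    ax.set_ylabel(ylabel)
    if is_yAxis_prob:
        ax.set_ylim(0.0, 1.0)  # probabilities live in [0, 1]
    ax.legend(loc=legend_loc)
    fig.savefig(output_file)
    plt.close(fig)
```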