Code Documentation

This page is a reference guide for the various functions in the repository.

QSA (Quasi-Seldonian Algorithm)

The following functions implement the Seldonian framework, including the candidate selection process and the safety test.

qsa.QSA(X, Y, T, seldonian_type, init_sol, init_sol1)

This function runs the QSA (Quasi-Seldonian Algorithm).

Parameters
  • X – The features of the dataset

  • Y – The corresponding labels of the dataset

  • T – The corresponding sensitive attributes of the dataset

  • seldonian_type – The mode used in the experiment

  • init_sol – The initial theta values for the model

  • init_sol1 – The additional initial theta values for the model

Returns

A tuple (theta, theta1, passed_safety) containing the optimal theta values and a bool indicating whether the candidate solution passed the safety test.
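The overall control flow can be sketched as follows. This is an illustration of the documented structure, not the repository's implementation: the split fraction, the helper names, and the use of a single parameter vector (no theta1) are all assumptions.

```python
import numpy as np

def get_cand_solution_stub(X, Y, T):
    # Placeholder for qsa.get_cand_solution: return zero weights of the right shape.
    return np.zeros(X.shape[1])

def safety_test_stub(theta, X, Y, T):
    # Placeholder for qsa.safety_test: always "passes".
    return True

def qsa_sketch(X, Y, T, candidate_ratio=0.4):
    """Minimal sketch of the QSA control flow: partition the data into a
    candidate set and a safety set, select a candidate solution on the first
    partition, then run the safety test on the second."""
    n_cand = int(len(X) * candidate_ratio)  # candidate:safety split (assumed meaning)
    cand = (X[:n_cand], Y[:n_cand], T[:n_cand])
    safe = (X[n_cand:], Y[n_cand:], T[n_cand:])

    theta = get_cand_solution_stub(*cand)
    passed = safety_test_stub(theta, *safe)
    return theta, passed
```

The key design point is that the data used to select the solution is disjoint from the data used to test it, so the safety test remains an unbiased check.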

qsa.cand_obj(theta, cand_data_X, cand_data_Y, cand_data_T, candidate_ratio, seldonian_type)

This function calculates the value of the objective function that is minimized by the optimizer.

Parameters
  • theta – The theta values for the model

  • cand_data_X – The features of the candidate dataset

  • cand_data_Y – The corresponding labels of the candidate dataset

  • cand_data_T – The corresponding sensitive attributes of the candidate dataset

  • candidate_ratio – The candidate:safety ratio used in the experiment

  • seldonian_type – The mode used in the experiment

Returns

The objective value.
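A common way to build such a candidate objective in Seldonian algorithms is to add a large barrier penalty to the primary loss whenever the predicted constraint bound is violated. The sketch below illustrates that shape; the penalty constant and the helper `predicted_constraint_bound_stub` are hypothetical, not the repository's code.

```python
import numpy as np

def predicted_constraint_bound_stub(theta, X, Y):
    # Placeholder for the inflated confidence bound computed on candidate data.
    return -1.0  # pretend the constraint is predicted to hold

def cand_obj_sketch(theta, X, Y):
    """Sketch of a Seldonian candidate objective: the primary loss (here a
    logistic log loss) plus a barrier penalty when the predicted upper bound
    on the constraint is positive, steering the optimizer toward solutions
    likely to pass the safety test."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    log_loss = -np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))

    upper_bound = predicted_constraint_bound_stub(theta, X, Y)
    if upper_bound > 0:
        # Barrier term: constant offset plus the magnitude of the violation.
        return log_loss + 1000.0 + upper_bound
    return log_loss
```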

qsa.get_cand_solution(cand_data_X, cand_data_Y, cand_data_T, candidate_ratio, seldonian_type, init_sol, init_sol1)

This function provides the candidate solution.

Parameters
  • cand_data_X – The features of the candidate dataset

  • cand_data_Y – The corresponding labels of the candidate dataset

  • cand_data_T – The corresponding sensitive attributes of the candidate dataset

  • candidate_ratio – The candidate:safety ratio used in the experiment

  • seldonian_type – The mode used in the experiment

  • init_sol – The initial theta values for the model

  • init_sol1 – The additional initial theta values for the model

Returns

The candidate solution (theta, theta1).

qsa.safety_test(theta, theta1, safe_data_X, safe_data_Y, safe_data_T, seldonian_type)

This function runs the safety test.

Parameters
  • theta – The optimal theta values for the model

  • theta1 – The additional optimal theta values for the model

  • safe_data_X – The features of the safety dataset

  • safe_data_Y – The corresponding labels of the safety dataset

  • safe_data_T – The corresponding sensitive attributes of the safety dataset

  • seldonian_type – The mode used in the experiment

Returns

A bool indicating whether the candidate solution passed the safety test.
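For intuition, a Seldonian safety test with a Hoeffding bound (one of the inequalities this repository supports) can be sketched as below. The per-sample constraint estimates, the sample range [a, b], and the one-sided form are assumptions for illustration.

```python
import numpy as np

def safety_test_sketch(g_samples, delta=0.05, a=-1.0, b=1.0):
    """Sketch of a safety test using a one-sided Hoeffding bound.

    g_samples are assumed to be i.i.d. per-datapoint unbiased estimates of the
    constraint g(theta), bounded in [a, b]; behavior is safe when g(theta) <= 0.
    """
    n = len(g_samples)
    g_hat = np.mean(g_samples)
    # One-sided Hoeffding inequality for samples bounded in [a, b].
    bound = (b - a) * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    # Pass only if the high-confidence upper bound on g(theta) is <= 0.
    return (g_hat + bound) <= 0.0
```

Note how the bound shrinks with the safety-set size n: with too little safety data the test fails even for safe solutions, which is the intended conservative behavior.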

ML Model

The following functions correspond to the model and constraint definitions. The user is expected to make changes in this file.

logistic_regression_functions.fHat(theta, theta1, X, Y)

This is the main objective function and should be changed by the user according to their needs. Currently, it implements the negative log loss of the model.

Parameters
  • theta – The optimal theta values for the model

  • theta1 – The additional optimal theta values for the model

  • X – The features of the dataset

  • Y – The true labels of the dataset

Returns

The negative log loss
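A possible implementation of the description above is shown below. The sign convention (returning the negative of the log loss, i.e. the mean log-likelihood) and the epsilon clamp are assumptions; check the repository before relying on them.

```python
import numpy as np

def fHat_sketch(theta, theta1, X, Y):
    """Sketch of fHat as the negative of the log loss, i.e. the mean
    log-likelihood of the observed labels."""
    # Predicted probability of label 1, matching the predict() formula.
    p = 1.0 / (1.0 + np.exp(-(X @ theta + theta1)))
    eps = 1e-12  # guard against log(0)
    return np.mean(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps))
```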

logistic_regression_functions.predict(theta, theta1, X)

This is the predict function for Logistic Regression. This can be changed into predict function of the user defined model. Currently, it implements the following:

\frac{1}{1 + e^{-(X \cdot \theta + \theta_1)}}

Parameters
  • theta – The optimal theta values for the model

  • theta1 – The additional optimal theta values for the model

  • X – The features of the dataset

Returns

The probability of label 1 for each datapoint in the dataset
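The documented formula translates directly into a vectorized sigmoid; this sketch follows the formula above rather than the repository's exact code.

```python
import numpy as np

def predict_sketch(theta, theta1, X):
    """Logistic prediction sigma(X.theta + theta1): the probability of
    label 1 for every row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ theta + theta1)))
```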

logistic_regression_functions.simple_logistic(X, Y)

This function runs a simple logistic regression. The user should replace it with their own model.

Parameters
  • X – The features of the dataset

  • Y – The true labels of the dataset

Returns

The theta values (parameters) of the model
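As an illustration of what simple_logistic might do, here is a minimal gradient-descent logistic regression returning parameters in the (theta, theta1) form used by predict(). The repository may instead call an off-the-shelf solver; the learning rate and iteration count here are arbitrary.

```python
import numpy as np

def simple_logistic_sketch(X, Y, lr=0.1, iters=2000):
    """Minimal batch gradient descent on the logistic log loss."""
    theta = np.zeros(X.shape[1])
    theta1 = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta + theta1)))
        grad_theta = X.T @ (p - Y) / len(Y)  # gradient of the mean log loss
        grad_theta1 = np.mean(p - Y)
        theta -= lr * grad_theta
        theta1 -= lr * grad_theta1
    return theta, theta1
```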

Equation parser

equation_parser.construct_expr_tree_base(rev_polish_notation)

Returns the root of the constructed tree for the given postfix expression.

Parameters

rev_polish_notation – string with a single space ' ' as the delimiter

Returns

expr_tree node
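Postfix construction is typically done with a stack: operands are pushed, and each operator pops its two operands. The sketch below assumes binary operators drawn from +, -, *, / and uses a stand-in node class; the repository's expr_tree and token set may differ.

```python
class ExprTree:
    # Stand-in for equation_parser.expr_tree: a binary tree node.
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def construct_expr_tree_sketch(rev_polish_notation):
    """Stack-based construction of an expression tree from a space-delimited
    postfix string."""
    operators = {"+", "-", "*", "/"}
    stack = []
    for token in rev_polish_notation.split(" "):
        node = ExprTree(token)
        if token in operators:
            # Operator: pop its two operands (right first, then left).
            node.right = stack.pop()
            node.left = stack.pop()
        stack.append(node)
    return stack.pop()  # the remaining node is the root
```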

equation_parser.eval_expr_tree_base(t_node, Y, predicted_Y, T)

A utility function to evaluate the estimate of the expression tree.

Parameters
  • t_node – expr_tree node

  • Y – pandas::Series

  • predicted_Y – tensor

  • T – pandas::Series

Returns

estimate value: float
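Evaluation is a recursive post-order walk: leaves are base variables whose estimates come from Y, predicted_Y and T, and internal nodes combine their children. In this sketch the leaf lookup is replaced by a hypothetical `base_values` dict so the example is self-contained.

```python
class Node:
    # Minimal expression-tree node (stands in for equation_parser.expr_tree).
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def eval_expr_tree_sketch(t_node, base_values):
    """Recursive evaluation: leaves are looked up in base_values, a dict
    mapping base-variable names (e.g. "FP(A)") to their estimates."""
    if t_node.left is None and t_node.right is None:
        return base_values[t_node.value]  # leaf: a base variable
    left = eval_expr_tree_sketch(t_node.left, base_values)
    right = eval_expr_tree_sketch(t_node.right, base_values)
    if t_node.value == "+":
        return left + right
    if t_node.value == "-":
        return left - right
    if t_node.value == "*":
        return left * right
    return left / right  # "/" is the only remaining operator here
```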

equation_parser.eval_expr_tree_conf_interval_base(t_node, Y, predicted_Y, T, delta, inequality, candidate_safety_ratio, predict_bound, modified_h)

Evaluates the confidence interval of the expression tree.

Parameters
  • t_node – expr_tree node

  • Y – pandas::Series The true labels of the dataset

  • predicted_Y – tensor The predicted labels of the dataset

  • T – pandas::Series The sensitive attributes of the dataset

  • delta – float in [0, 1] The significance level

  • inequality – Enum The inequality to be used - Hoeffding/T-test

  • candidate_safety_ratio – The candidate:safety ratio used in the experiment

  • predict_bound – Bool indicating whether the bound is being computed for candidate data or safety data

  • modified_h – Bool indicating whether the modified confidence bound is to be used

Returns

upper and lower bound of the estimate of the constraint
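One plausible building block for these bounds is the two-sided Hoeffding interval on a leaf's estimate; intervals are then propagated up the tree with interval arithmetic. The sketch below shows only the leaf interval, with the sample range [a, b] as an assumption (the repository also offers a t-test inequality and a modified bound).

```python
import numpy as np

def hoeffding_interval(samples, delta, a=0.0, b=1.0):
    """Two-sided Hoeffding confidence interval for the mean of i.i.d.
    samples bounded in [a, b]: with probability at least 1 - delta the
    true mean lies inside the returned (lower, upper) pair."""
    n = len(samples)
    mean = np.mean(samples)
    half_width = (b - a) * np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return mean - half_width, mean + half_width
```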

class equation_parser.expr_tree(value)

An expression tree node of the constraint tree

equation_parser.inorder(t_node)

A utility function to print inorder traversal

Parameters

t_node – expr_tree node

Returns

None

Inequality

class inequalities.Inequality

The Enum defining the inequality type. Currently, it supports T-test and Hoeffding.

inequalities.eval_estimate(element, Y, predicted_Y, T)

Estimates the value of the base variable. Assumes that Y and predicted_Y contain 0/1 binary classification labels.

Suppose we are calculating FP(A). Let X be an indicator function defined only for datapoints of type A, such that x_i = 1 if a false positive occurred for the i-th datapoint and x_i = 0 otherwise. The data samples are assumed to be independent and identically distributed, so our estimate of p is \hat{p} = (1/n) \sum_i x_i. The count n\hat{p} can safely be treated as a binomial random variable, with E[\hat{p}] = (1/n) \cdot np = p. Since p is unknown, we approximate it by \hat{p}.

Parameters
  • element – expr_tree node

  • Y – pandas::Series

  • predicted_Y – tensor

  • T – pandas::Series

Returns

estimate value: float
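The FP(A) reasoning above can be made concrete as follows. The normalization over type-A actual negatives (the datapoints where a false positive can occur) is an assumption here; the repository's definition of the denominator may differ.

```python
import numpy as np

def estimate_fp_rate(Y, predicted_Y, T, group):
    """Worked example of the estimate hat{p} for FP(group): the fraction of
    type-`group` true negatives that were predicted positive."""
    mask = (T == group) & (Y == 0)               # where x_i is defined
    x = (predicted_Y[mask] == 1).astype(float)   # x_i = 1 iff a false positive occurred
    return np.mean(x)                            # hat{p} = (1/n) * sum(x_i)
```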

Viewing results

create_plots.loadAndPlotResults(fileName, ylabel, output_file, is_yAxis_prob, legend_loc)

This function plots the results from the CSV files and stores the final graph.

Parameters
  • fileName – The CSV file path from which the data is imported

  • ylabel – The label on the Y-axis of the graph

  • output_file – The path where the graph image must be stored

  • is_yAxis_prob – Bool of whether the Y-axis is a probability value or not

  • legend_loc – The location of the legend
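A simplified version of this plotting flow is sketched below with matplotlib and pandas. The CSV schema (first column as the x-axis, remaining columns as series) and the styling are assumptions, not the repository's actual behavior.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt
import pandas as pd

def load_and_plot_sketch(fileName, ylabel, output_file,
                         is_yAxis_prob=True, legend_loc="best"):
    """Read a CSV, plot every column against the first, and save the figure."""
    df = pd.read_csv(fileName)
    x_col = df.columns[0]
    fig, ax = plt.subplots()
    for col in df.columns[1:]:
        ax.plot(df[x_col], df[col], label=col)
    ax.set_xlabel(x_col)
    ax.set_ylabel(ylabel)
    if is_yAxis_prob:
        ax.set_ylim(0.0, 1.0)  # probabilities live in [0, 1]
    ax.legend(loc=legend_loc)
    fig.savefig(output_file)
    plt.close(fig)
```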