Signal_Background Class

class Stats_Analysis.Compound_Dist.Signal_Background_Class.Signal_Background(mu, sigma, beta, m, lamb, mu_b, sigma_b, f, lower_bound_X, upper_bound_X, lower_bound_Y=None, upper_bound_Y=None)[source]

Bases: object

Combined Signal and Background probability distribution.

This class models a mixture distribution defined by: S(X, Y) = f * Signal(X, Y) + (1-f) * Background(X, Y)

The Signal distribution is defined as the product of a CrystalBall distribution (X dimension) and an ExponentialDecay distribution (Y dimension).

The Background distribution is defined as the product of a Uniform distribution (X dimension) and a Normal distribution (Y dimension).

Parameters:
  • mu (float) – The mean of the CrystalBall distribution in the X dimension for the Signal component.

  • sigma (float) – The standard deviation of the CrystalBall distribution in the X dimension for the Signal component.

  • beta (float) – The threshold value of the CrystalBall distribution in the X dimension for the Signal component. Must be beta > 0.

  • m (float) – The power-law tail exponent of the CrystalBall distribution in the X dimension for the Signal component. Must be m > 1.

  • lamb (float) – The decay constant (rate) of the ExponentialDecay distribution in the Y dimension for the Signal component. Must be lamb > 0.

  • mu_b (float) – The mean of the Normal distribution in the Y dimension for the Background component.

  • sigma_b (float) – The standard deviation of the Normal distribution in the Y dimension for the Background component. Must be sigma_b > 0.

  • f (float) – The fraction of the Signal distribution in the mixture. Must be in the range [0, 1].

  • lower_bound_X (float) – The lower bound of the Uniform distribution in the X dimension for the Background component and the truncation of the CrystalBall distribution in the Signal component.

  • upper_bound_X (float) – The upper bound of the Uniform distribution in the X dimension for the Background component and the truncation of the CrystalBall distribution in the Signal component. Must satisfy lower_bound_X < upper_bound_X.

  • lower_bound_Y (float, optional) – The lower bound of both the ExponentialDecay distribution (Signal component) and the Normal distribution (Background component) in the Y dimension. Default is None.

  • upper_bound_Y (float, optional) – The upper bound of both the ExponentialDecay distribution (Signal component) and the Normal distribution (Background component) in the Y dimension. Default is None.

Raises:

ValueError – If f is not in the range [0, 1]. If beta <= 0. If m <= 1. If lamb <= 0. If sigma_b <= 0. If lower_bound_X >= upper_bound_X. If lower_bound_Y >= upper_bound_Y.
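The mixture defined above can be sketched without the package as follows. This is a minimal, dependency-free illustration and not the class's actual implementation: the helper names (crystal_ball_shape, integrate, make_joint_pdf) and the parameter values are hypothetical, and the CrystalBall factor is normalised numerically over [lower_bound_X, upper_bound_X] rather than analytically.

```python
import math

def crystal_ball_shape(x, mu, sigma, beta, m):
    """Unnormalised Crystal Ball shape: Gaussian core, power-law tail below mu - beta*sigma."""
    z = (x - mu) / sigma
    if z > -beta:
        return math.exp(-0.5 * z * z)
    a = (m / beta) ** m * math.exp(-0.5 * beta * beta)
    b = m / beta - beta
    return a * (b - z) ** (-m)

def integrate(func, lo, hi, steps=2000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def normal_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

def make_joint_pdf(mu, sigma, beta, m, lamb, mu_b, sigma_b, f, lo_x, hi_x, lo_y, hi_y):
    """Return pdf(x, y) = f * Signal(x, y) + (1 - f) * Background(x, y) on the bounded region."""
    # Normalisation constants of the three truncated one-dimensional factors.
    cb_norm = integrate(lambda x: crystal_ball_shape(x, mu, sigma, beta, m), lo_x, hi_x)
    exp_norm = math.exp(-lamb * lo_y) - math.exp(-lamb * hi_y)
    gauss_norm = normal_cdf(hi_y, mu_b, sigma_b) - normal_cdf(lo_y, mu_b, sigma_b)

    def pdf(x, y):
        if not (lo_x <= x <= hi_x and lo_y <= y <= hi_y):
            return 0.0
        sig_x = crystal_ball_shape(x, mu, sigma, beta, m) / cb_norm
        sig_y = lamb * math.exp(-lamb * y) / exp_norm
        bkg_x = 1.0 / (hi_x - lo_x)
        gauss = math.exp(-0.5 * ((y - mu_b) / sigma_b) ** 2) / (sigma_b * math.sqrt(2.0 * math.pi))
        bkg_y = gauss / gauss_norm
        return f * sig_x * sig_y + (1.0 - f) * bkg_x * bkg_y

    return pdf

# Illustrative parameter values only.
pdf = make_joint_pdf(mu=3.0, sigma=0.3, beta=1.0, m=1.4, lamb=0.3,
                     mu_b=0.0, sigma_b=2.5, f=0.6,
                     lo_x=0.0, hi_x=5.0, lo_y=0.0, hi_y=10.0)
print(pdf(3.0, 1.0))  # density near the signal peak
print(pdf(6.0, 1.0))  # outside the X bounds, so 0.0
```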

__init__(mu, sigma, beta, m, lamb, mu_b, sigma_b, f, lower_bound_X, upper_bound_X, lower_bound_Y=None, upper_bound_Y=None)[source]
_find_max_pdf()[source]

Finds the maximum value of the joint PDF by first using a rough grid search and then a local optimisation. Only performed if both the lower and upper bounds are defined.

Returns:

The maximum value of the joint PDF within the defined bounds.

Return type:

float
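A two-stage search of this kind might look like the sketch below. The documentation does not say which local optimiser is used, so a simple pattern search (probe the four neighbours, halve the step when none improves) stands in for it here; find_max and toy_pdf are hypothetical names.

```python
import math

def find_max(func, lo_x, hi_x, lo_y, hi_y, grid=50, rounds=40):
    """Coarse grid search followed by a pattern-search refinement."""
    # Stage 1: evaluate on a coarse grid and keep the best point.
    best_val, best_x, best_y = -math.inf, lo_x, lo_y
    for i in range(grid + 1):
        for j in range(grid + 1):
            x = lo_x + (hi_x - lo_x) * i / grid
            y = lo_y + (hi_y - lo_y) * j / grid
            val = func(x, y)
            if val > best_val:
                best_val, best_x, best_y = val, x, y
    # Stage 2: probe the four neighbours; halve the step when none improves.
    step_x = (hi_x - lo_x) / grid
    step_y = (hi_y - lo_y) / grid
    for _ in range(rounds):
        improved = False
        for dx, dy in ((step_x, 0.0), (-step_x, 0.0), (0.0, step_y), (0.0, -step_y)):
            x = min(max(best_x + dx, lo_x), hi_x)  # stay inside the bounds
            y = min(max(best_y + dy, lo_y), hi_y)
            val = func(x, y)
            if val > best_val:
                best_val, best_x, best_y = val, x, y
                improved = True
        if not improved:
            step_x *= 0.5
            step_y *= 0.5
    return best_val, best_x, best_y

def toy_pdf(x, y):
    """Toy surface with its peak at (3.02, 0), between two grid points in X."""
    return math.exp(-(x - 3.02) ** 2 / 0.1) * math.exp(-0.3 * y)

best_val, best_x, best_y = find_max(toy_pdf, 0.0, 5.0, 0.0, 10.0)
print(best_val, best_x, best_y)
```

The grid stage guards against starting the refinement on the wrong mode; the refinement then recovers a peak that falls between grid points.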

accept_reject_sample(desired_samples=100000, init_batch_size=1000, max_batch_size=2000000, poisson=False, save_to_class=False)[source]

Generate random samples from the joint Signal-Background distribution using the accept-reject method with dynamic batch sizing.

This method uses an initial batch to estimate the acceptance rate and dynamically adjusts the batch size to efficiently generate the required number of samples.

Parameters:
  • desired_samples (int, optional) – Total number of samples to generate (default: 100,000).

  • init_batch_size (int, optional) – Batch size for the initial sampling to estimate the acceptance rate (default: 1,000).

  • max_batch_size (int, optional) – Maximum batch size for iterations to prevent overloading memory (default: 2,000,000). May need adjusting for devices with limited memory.

  • poisson (bool, optional) – If True, the total number of samples (desired_samples) will be drawn from a Poisson distribution with a mean of desired_samples (default: False).

  • save_to_class (bool, optional) – If True, the generated samples will be saved as an attribute of the class (self.samples) for later use (default: False).

Returns:

Array of shape (desired_samples, 2) containing the sampled (X, Y) points.

Return type:

np.ndarray

Raises:

ValueError – If any of the X or Y bounds is undefined; all four bounds are required to generate samples.

Notes

  • The initial batch estimates the acceptance rate as: acceptance_rate = (Number of Accepted Samples in Initial Batch) / (Initial Batch Size)

  • Subsequent batch sizes are calculated dynamically based on the acceptance rate and the number of remaining desired samples, with a 10% buffer.

  • A maximum batch size (max_batch_size) is enforced to ensure memory efficiency.
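The batching logic in the notes above can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the class's code: the function name, the pdf_max argument and the toy density are hypothetical, the default max_batch_size is reduced for the example, and the poisson and save_to_class options are omitted.

```python
import math
import random

def accept_reject(pdf, pdf_max, bounds, desired_samples,
                  init_batch_size=1000, max_batch_size=200_000, rng=None):
    """Accept-reject sampling on a rectangle with dynamic batch sizing."""
    rng = rng or random.Random()
    (lo_x, hi_x), (lo_y, hi_y) = bounds

    def draw_batch(size):
        accepted = []
        for _ in range(size):
            x = rng.uniform(lo_x, hi_x)
            y = rng.uniform(lo_y, hi_y)
            # Keep the point if a uniform height under pdf_max falls below the PDF.
            if rng.uniform(0.0, pdf_max) < pdf(x, y):
                accepted.append((x, y))
        return accepted

    samples = draw_batch(init_batch_size)
    # Acceptance rate from the initial batch (guarded against zero acceptances).
    rate = max(len(samples), 1) / init_batch_size
    while len(samples) < desired_samples:
        remaining = desired_samples - len(samples)
        # Request enough candidates for the remainder plus a 10% buffer, capped.
        batch = min(math.ceil(1.1 * remaining / rate), max_batch_size)
        samples.extend(draw_batch(batch))
    return samples[:desired_samples]

# Toy density pdf(x, y) = x on [0, 1] x [0, 2]; it integrates to 1 and peaks at 1.
samples = accept_reject(lambda x, y: x, 1.0, ((0.0, 1.0), (0.0, 2.0)),
                        desired_samples=5000, rng=random.Random(0))
mean_x = sum(x for x, _ in samples) / len(samples)
print(len(samples), mean_x)
```

Sizing each batch from the measured acceptance rate means the loop usually finishes in one or two extra batches instead of many fixed-size ones.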

cdf(X, Y)[source]

Compute the joint Cumulative Distribution Function (CDF).

The joint CDF is defined as: C(X, Y) = Integral of S_B(X', Y') wrt X' over [0, X] and wrt Y' over [0, Y]

Within each component the X and Y distributions are independent, so each component CDF is the product of its X and Y CDFs, and the joint CDF is their weighted sum: C(X, Y) = f * Signal_CDF(X, Y) + (1-f) * Background_CDF(X, Y)

Parameters:
  • X (float or np.ndarray) – The value(s) of X at which to evaluate the CDF.

  • Y (float or np.ndarray) – The value(s) of Y at which to evaluate the CDF.

Returns:

The joint CDF value(s).

Return type:

float or np.ndarray
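The weighted sum of factorised component CDFs can be illustrated with closed-form one-dimensional CDFs. In this sketch the helper names are hypothetical and a truncated Normal stands in for the CrystalBall CDF in X, purely to keep the example short.

```python
import math

def normal_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

def truncated(cdf, lo, hi):
    """CDF of the same distribution truncated to [lo, hi]."""
    c_lo, c_hi = cdf(lo), cdf(hi)

    def wrapped(v):
        v = min(max(v, lo), hi)  # clamp so the result stays in [0, 1]
        return (cdf(v) - c_lo) / (c_hi - c_lo)

    return wrapped

def mixture_cdf(f, sig_x, sig_y, bkg_x, bkg_y):
    """C(x, y) = f * Sx(x) * Sy(y) + (1 - f) * Bx(x) * By(y)."""
    def cdf(x, y):
        return f * sig_x(x) * sig_y(y) + (1.0 - f) * bkg_x(x) * bkg_y(y)
    return cdf

cdf = mixture_cdf(
    f=0.6,
    sig_x=truncated(lambda x: normal_cdf(x, 3.0, 0.3), 0.0, 5.0),  # stand-in for CrystalBall
    sig_y=truncated(lambda y: 1.0 - math.exp(-0.3 * y), 0.0, 10.0),  # exponential decay
    bkg_x=truncated(lambda x: x, 0.0, 5.0),  # uniform on [0, 5]
    bkg_y=truncated(lambda y: normal_cdf(y, 0.0, 2.5), 0.0, 10.0),
)
print(cdf(0.0, 0.0), cdf(3.0, 3.0), cdf(5.0, 10.0))
```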

cdf_fitting(X, Y, mu, sigma, beta, m, lamb, mu_b, sigma_b, f)[source]

Calculate the joint Cumulative Distribution Function (CDF) for given parameters.

Parameters:
  • X (float or np.ndarray) – X value(s) at which the CDF is evaluated.

  • Y (float or np.ndarray) – Y value(s) at which the CDF is evaluated.

  • mu (float) – Mean of the CrystalBall distribution (Signal, X dimension).

  • sigma (float) – Standard deviation of the CrystalBall distribution (Signal, X dimension).

  • beta (float) – Threshold value of the CrystalBall distribution (Signal, X dimension).

  • m (float) – Power-law tail exponent of the CrystalBall distribution (Signal, X dimension).

  • lamb (float) – Decay constant of the ExponentialDecay distribution (Signal, Y dimension).

  • mu_b (float) – Mean of the Normal distribution (Background, Y dimension).

  • sigma_b (float) – Standard deviation of the Normal distribution (Background, Y dimension).

  • f (float) – Fraction of the Signal distribution in the mixture.

Returns:

CDF values for the input X, Y.

Return type:

np.ndarray

fit_params(initial_params, samples=None, print_results=False, save_to_class=False)[source]

Perform an extended maximum likelihood fit using iminuit.

This method fits the model parameters to the given data by minimizing the negative log-likelihood with the iminuit package. The fit uses Extended Unbinned Maximum Likelihood Estimation (EUMLE).

Parameters:
  • initial_params (list of float) – Initial guesses for the model parameters in the order: [mu, sigma, beta, m, lamb, mu_b, sigma_b, f, N_expected].

  • samples (np.ndarray, optional) – Observed data points of shape (N, 2), where each row represents a pair of (X, Y) values. If not provided, the method attempts to use samples stored in the instance (self.samples).

  • print_results (bool, optional) – If True, prints the iminuit results to the console. Default is False.

  • save_to_class (bool, optional) – If True, saves the resulting Minuit object (self.mi) to the instance for later use. Default is False.

Returns:

A tuple containing:
  • mi.values – A dictionary of fitted parameter values.

  • mi.errors – A dictionary of parameter uncertainties.

Return type:

tuple

Raises:
  • ValueError – If no data samples are provided or stored in the instance (self.samples).

  • RuntimeError – If the minimization process fails to converge (migrad is invalid).

Notes

  • Parameter limits are set based on physical significance and distribution constraints.

  • The fit includes an error analysis using the Hesse algorithm to estimate parameter uncertainties.

  • The iminuit object provides detailed information about the fit, including parameter correlations.
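For reference, the quantity minimised in an extended unbinned fit is, up to a constant, -ln L = N_expected - sum_i ln(N_expected * pdf(x_i, y_i)). The sketch below evaluates it for a fixed shape and scans only N_expected; in the class itself all nine parameters would be handed to iminuit. The function name and the toy data are hypothetical.

```python
import math

def extended_nll(n_expected, pdf_values):
    """Extended unbinned negative log-likelihood, constants dropped."""
    return n_expected - sum(math.log(n_expected * p) for p in pdf_values)

# Toy case: 200 events from a flat density on [0, 5] x [0, 10], so pdf = 1/50 everywhere.
pdf_values = [1.0 / 50.0] * 200

# Scan N_expected; the minimum sits at the observed number of events.
scan = {n: extended_nll(n, pdf_values) for n in range(150, 251, 10)}
best_n = min(scan, key=scan.get)
print(best_n)
```

The extra Poisson term is what lets the fit return the expected yield as a parameter alongside the shape parameters.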

fit_params_results()[source]

Visualise the results of the parameter fitting process.

Returns:

  1. Parameter Summary Table

  2. Correlation Heatmap

Raises:

ValueError – If the Minuit object (self.mi) is not available. Ensure that the fit_params method is run before calling this method.

Notes

  • True values must be stored in self.true_params for comparison in the table.

  • Fitted values and uncertainties are extracted from the Minuit object (self.mi).

  • The correlation matrix is calculated from the Minuit covariance matrix.

fit_params_sWeights(initial_params, samples=None, print_results=False, norm_check=True)[source]

Fit parameters using the sWeights method.

This method performs the following steps:
  1. Fits the X dimension using an extended unbinned likelihood method.

  2. Calculates signal and background weights using the sWeights method.

  3. Projects the signal distribution to the Y dimension.

  4. Fits the Y dimension using a binned maximum likelihood estimation.

Parameters:
  • initial_params (list or array-like) – Initial guesses for the parameters [mu, sigma, beta, m, f, lamb, N].

  • samples (np.ndarray, optional) – The samples to be used for fitting. If not provided, the samples generated by the accept_reject_sample method will be used.

  • print_results (bool, optional) – If True, prints the fitting results and plots the intermediate steps. Default is False.

  • norm_check (bool, optional) – If True, performs normalization checks during the sWeights calculation. Default is True.

Returns:

  • mi_total_values (dict) – A dictionary containing the fitted parameter values.

  • mi_total_errors (dict) – A dictionary containing the errors of the fitted parameters.

Raises:
  • ValueError – If no samples are provided or generated by the accept_reject_sample method.

  • RuntimeError – If the minimization does not converge.
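The weight calculation in step 2 can be illustrated with the classic sPlot formula. This is a sketch under stated assumptions rather than the class's implementation: the X shapes are taken as known (a narrow Normal for signal, a uniform for background), and a short expectation-maximisation loop stands in for the extended likelihood fit of the yields. All names and values are hypothetical.

```python
import math
import random

def sweights(g_s, g_b, n_s, n_b):
    """sPlot weights from per-event densities g_s, g_b and fitted yields n_s, n_b."""
    denom = [n_s * s + n_b * b for s, b in zip(g_s, g_b)]
    # Inverse covariance matrix W_ij = sum_k g_i(x_k) g_j(x_k) / denom_k^2.
    w_ss = sum(s * s / d ** 2 for s, d in zip(g_s, denom))
    w_sb = sum(s * b / d ** 2 for s, b, d in zip(g_s, g_b, denom))
    w_bb = sum(b * b / d ** 2 for b, d in zip(g_b, denom))
    det = w_ss * w_bb - w_sb ** 2
    v_ss, v_sb, v_bb = w_bb / det, -w_sb / det, w_ss / det  # 2x2 matrix inverse
    w_sig = [(v_ss * s + v_sb * b) / d for s, b, d in zip(g_s, g_b, denom)]
    w_bkg = [(v_sb * s + v_bb * b) / d for s, b, d in zip(g_s, g_b, denom)]
    return w_sig, w_bkg

rng = random.Random(1)
xs = [rng.gauss(0.5, 0.05) for _ in range(300)] + [rng.random() for _ in range(700)]
g_s = [math.exp(-0.5 * ((x - 0.5) / 0.05) ** 2) / (0.05 * math.sqrt(2.0 * math.pi)) for x in xs]
g_b = [1.0 for _ in xs]  # uniform density on [0, 1]

# Stand-in for the X fit: EM iterations converge to the maximum-likelihood signal fraction.
z = 0.5
for _ in range(200):
    z = sum(z * s / (z * s + (1.0 - z) * b) for s, b in zip(g_s, g_b)) / len(xs)
n_s, n_b = z * len(xs), (1.0 - z) * len(xs)

w_sig, w_bkg = sweights(g_s, g_b, n_s, n_b)
print(n_s, sum(w_sig))
```

At the fitted yields the signal weights sum to the signal yield and each event's two weights sum to one, which is the kind of property a normalisation check on the weights can verify.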

marginal_cdf_x(X)[source]

Calculate the marginal Cumulative Distribution Function (CDF) in the X dimension, i.e. the CDF of X with the Y dependence integrated out.

The marginal CDF is defined as: C(X) = Integral of S_B(X') wrt X' up to X, where S_B(X') is the marginal PDF in X (the joint PDF integrated wrt Y over [-infinity, infinity])

Parameters:

X (float or np.ndarray) – The value(s) of X at which to evaluate the CDF.

Returns:

A tuple containing:
  • The signal component of the marginal CDF.

  • The background component of the marginal CDF.

  • The total marginal CDF (signal + background).

Return type:

tuple of (float or np.ndarray, float or np.ndarray, float or np.ndarray)

marginal_cdf_y(Y)[source]

Calculate the marginal Cumulative Distribution Function (CDF) in the Y dimension, i.e. the CDF of Y with the X dependence integrated out.

The marginal CDF is defined as: C(Y) = Integral of S_B(Y') wrt Y' up to Y, where S_B(Y') is the marginal PDF in Y (the joint PDF integrated wrt X over [-infinity, infinity])

Parameters:

Y (float or np.ndarray) – The value(s) of Y at which to evaluate the CDF.

Returns:

A tuple containing:
  • The signal component of the marginal CDF.

  • The background component of the marginal CDF.

  • The total marginal CDF (signal + background).

Return type:

tuple of (float or np.ndarray, float or np.ndarray, float or np.ndarray)

marginal_pdf_x(X)[source]

Calculate the marginal Probability Density Function (PDF) in the X dimension, obtained by integrating the joint PDF over the Y dimension to remove the Y dependence.

The marginal PDF is defined as: S_B(X) = Integral of S_B(X, Y) wrt Y over [-infinity, infinity]

Parameters:

X (float or np.ndarray) – The value(s) of X at which to evaluate the PDF.

Returns:

A tuple containing:
  • The signal component of the marginal PDF.

  • The background component of the marginal PDF.

  • The total marginal PDF (signal + background).

Return type:

tuple of (float or np.ndarray, float or np.ndarray, float or np.ndarray)
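Because each component factorises into an X part and a Y part, integrating out Y leaves the X factor scaled by its mixture weight. The sketch below checks this numerically. It assumes the signal and background components are returned already weighted by f and (1 - f), which the "signal + background" total suggests but the documentation does not state outright; a truncated Normal again stands in for the CrystalBall, and all names and values are hypothetical.

```python
import math

LO_X, HI_X, LO_Y, HI_Y = 0.0, 5.0, 0.0, 10.0
F, LAMB = 0.6, 0.3

def norm_pdf(v, mu, s):
    return math.exp(-0.5 * ((v - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def norm_cdf(v, mu, s):
    return 0.5 * (1.0 + math.erf((v - mu) / (s * math.sqrt(2.0))))

def sig_x(x):  # stand-in for the truncated CrystalBall
    return norm_pdf(x, 3.0, 0.3) / (norm_cdf(HI_X, 3.0, 0.3) - norm_cdf(LO_X, 3.0, 0.3))

def sig_y(y):  # truncated exponential decay
    return LAMB * math.exp(-LAMB * y) / (math.exp(-LAMB * LO_Y) - math.exp(-LAMB * HI_Y))

def bkg_x(x):  # uniform
    return 1.0 / (HI_X - LO_X)

def bkg_y(y):  # truncated Normal
    return norm_pdf(y, 0.0, 2.5) / (norm_cdf(HI_Y, 0.0, 2.5) - norm_cdf(LO_Y, 0.0, 2.5))

def joint_pdf(x, y):
    return F * sig_x(x) * sig_y(y) + (1.0 - F) * bkg_x(x) * bkg_y(y)

def marginal_pdf_x(x, steps=2000):
    """Return (signal, background, total) marginal densities at x by integrating out Y."""
    h = (HI_Y - LO_Y) / steps
    ys = [LO_Y + (i + 0.5) * h for i in range(steps)]  # midpoint rule
    signal = h * sum(F * sig_x(x) * sig_y(y) for y in ys)
    background = h * sum((1.0 - F) * bkg_x(x) * bkg_y(y) for y in ys)
    total = h * sum(joint_pdf(x, y) for y in ys)
    return signal, background, total

signal, background, total = marginal_pdf_x(3.0)
print(signal, F * sig_x(3.0))  # numerical vs analytic signal marginal
```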

marginal_pdf_y(Y)[source]

Calculate the marginal Probability Density Function (PDF) in the Y dimension, obtained by integrating the joint PDF over the X dimension to remove the X dependence.

The marginal PDF is defined as: S_B(Y) = Integral of S_B(X, Y) wrt X over [-infinity, infinity]

Parameters:

Y (float or np.ndarray) – The value(s) of Y at which to evaluate the PDF.

Returns:

A tuple containing:
  • The signal component of the marginal PDF.

  • The background component of the marginal PDF.

  • The total marginal PDF (signal + background).

Return type:

tuple of (float or np.ndarray, float or np.ndarray, float or np.ndarray)

normalisation_check(over_whole_plane=False)[source]

Check the normalization of the joint Probability Density Function (PDF) using numerical integration.

This method performs numerical integration with scipy.integrate.dblquad to ensure that the joint PDF integrates to 1. It supports both truncated and untruncated cases.

Parameters:

over_whole_plane (bool, optional) – If True, perform integration over the entire real plane (-infinity to infinity) for both X and Y. Default is False, in which case integration is only performed over the defined/truncated region.

Returns:

Normalisation results for:

  • The defined/truncated region: [lower_bound_X, upper_bound_X] for X and [lower_bound_Y, upper_bound_Y] for Y.

  • The entire real plane: X in [-infinity, infinity] and Y in [-infinity, infinity] (only if over_whole_plane is True).

Notes

  • If the PDF is truncated, the method integrates over the truncated region defined by the bounds (lower_bound_X, upper_bound_X, lower_bound_Y, upper_bound_Y).

  • If no bounds are defined, the truncated region defaults to the entire real plane.

  • The integration over the entire real plane is computationally intensive and can be skipped by setting over_whole_plane to False.
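The idea of the check over the truncated region can be sketched with a plain midpoint-rule double integral. The method itself uses scipy.integrate.dblquad; the hand-written integrator here is only a dependency-free stand-in, and toy_pdf (with a truncated Normal in place of the CrystalBall) is a hypothetical example density.

```python
import math

LO_X, HI_X, LO_Y, HI_Y = 0.0, 5.0, 0.0, 10.0

def norm_pdf(v, mu, s):
    return math.exp(-0.5 * ((v - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def norm_cdf(v, mu, s):
    return 0.5 * (1.0 + math.erf((v - mu) / (s * math.sqrt(2.0))))

def toy_pdf(x, y, f=0.6, lamb=0.3):
    """Mixture of two factorised components, zero outside the bounds."""
    if not (LO_X <= x <= HI_X and LO_Y <= y <= HI_Y):
        return 0.0
    sig_x = norm_pdf(x, 3.0, 0.3) / (norm_cdf(HI_X, 3.0, 0.3) - norm_cdf(LO_X, 3.0, 0.3))
    sig_y = lamb * math.exp(-lamb * y) / (math.exp(-lamb * LO_Y) - math.exp(-lamb * HI_Y))
    bkg_x = 1.0 / (HI_X - LO_X)
    bkg_y = norm_pdf(y, 0.0, 2.5) / (norm_cdf(HI_Y, 0.0, 2.5) - norm_cdf(LO_Y, 0.0, 2.5))
    return f * sig_x * sig_y + (1.0 - f) * bkg_x * bkg_y

def double_integral(func, lo_x, hi_x, lo_y, hi_y, steps=400):
    """Midpoint-rule integral of func(x, y) over a rectangle."""
    hx = (hi_x - lo_x) / steps
    hy = (hi_y - lo_y) / steps
    total = 0.0
    for i in range(steps):
        x = lo_x + (i + 0.5) * hx
        for j in range(steps):
            total += func(x, lo_y + (j + 0.5) * hy)
    return total * hx * hy

norm = double_integral(toy_pdf, LO_X, HI_X, LO_Y, HI_Y)
print(norm)  # close to 1 for a correctly normalised PDF
```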

param_bootstrap_analysis(input_directory='Bootstrap/Param_Results', output_directory='Bootstrap/Plots')[source]

Analyse and visualise parameter fitting results from bootstrap samples, storing plots in the output directory.

Parameters:
  • input_directory (str, optional) – Path to the directory containing bootstrap fitting result files (default: “Bootstrap/Param_Results”). Each file is expected to follow the naming convention: “ParamResults_No_<num_samples>_BaseSize_<sample_size>.npy”.

  • output_directory (str, optional) – Path to the directory where plots will be saved (default: “Bootstrap/Plots”). Subdirectories for histograms, trends, and pull plots are created or cleared before use.

Returns:

results (dict) – A dictionary storing computed metrics for each sample size. Keys are sample sizes, and values are dictionaries with the keys “Values_Mean”, “Values_Std”, “Values_Bias”, “Errors_Mean”, “Errors_Std”, “Pull_Mean”, “Pull_Std”, “Pull_Mean_Error”, “Pull_Std_Error”.

The method also generates the following plots:

  1. Histograms for each parameter (value, error, and pull) across bootstrap samples.

  2. Bias and error trends vs. sample size for each parameter.

  3. Pull distributions for each parameter across all samples.

Raises:

FileNotFoundError – If the input directory does not exist or contains no valid files.
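The metrics stored in the results dictionary could be computed for one parameter roughly as below. The formulas are assumptions, since the documentation lists only the key names: the pull is taken as (fitted value - true value) / fitted error, standard deviations use the sample (n - 1) convention, and the errors on the pull mean and pull standard deviation use the usual Gaussian approximations. The function name and toy numbers are hypothetical.

```python
import math
import statistics

def bootstrap_metrics(values, errors, true_value):
    """Summary metrics for one parameter across bootstrap fits."""
    n = len(values)
    pulls = [(v - true_value) / e for v, e in zip(values, errors)]
    pull_std = statistics.stdev(pulls)
    return {
        "Values_Mean": statistics.mean(values),
        "Values_Std": statistics.stdev(values),
        "Values_Bias": statistics.mean(values) - true_value,
        "Errors_Mean": statistics.mean(errors),
        "Errors_Std": statistics.stdev(errors),
        "Pull_Mean": statistics.mean(pulls),
        "Pull_Std": pull_std,
        "Pull_Mean_Error": pull_std / math.sqrt(n),            # assumed formula
        "Pull_Std_Error": pull_std / math.sqrt(2 * (n - 1)),   # assumed formula
    }

# Five toy fit results for a parameter whose true value is 3.0.
metrics = bootstrap_metrics([2.9, 3.1, 3.0, 3.2, 2.8], [0.1] * 5, true_value=3.0)
print(metrics["Values_Bias"], metrics["Pull_Mean"], metrics["Pull_Std"])
```

For an unbiased fit with well-estimated errors, the pull mean should be compatible with 0 and the pull standard deviation with 1.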

param_bootstrap_fit(initial_params, input_directory='Bootstrap/Samples', output_directory='Bootstrap/Param_Results')[source]

Perform parameter fitting using fit_params method on bootstrap samples and save the results.

Parameters:
  • initial_params (list) – Initial guesses for the parameters [mu, sigma, beta, m, lamb, mu_b, sigma_b, f].

  • input_directory (str, optional) – Path to the directory containing bootstrap sample files (default: “Bootstrap/Samples”). Each file should follow the naming convention “Samples_No_<num_samples>_BaseSize_<sample_size>.npy”.

  • output_directory (str, optional) – Path to the directory where the fitting results will be saved (default: “Bootstrap/Param_Results”). Each result file will be named “ParamResults_No_<num_samples>_BaseSize_<sample_size>.npy”.

Returns:

For each sample file, the method:

  • Saves a .npy file containing an array of shape (2, num_samples, num_params). Index 0 along the first axis holds the fitted parameter values for each sample; index 1 holds the corresponding errors.

  • Prints the number of non-converging samples and the output file path.

Raises:

ValueError – If the fit_params method encounters invalid inputs or fitting fails for unexpected reasons.
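The file naming conventions quoted above lend themselves to a small parser that maps a sample file to its result file. This is only an illustrative helper with hypothetical names; the class may derive file names differently.

```python
import re

# Matches both the sample files and the result files described above.
PATTERN = re.compile(
    r"^(?P<kind>Samples|ParamResults)_No_(?P<num_samples>\d+)_BaseSize_(?P<sample_size>\d+)\.npy$"
)

def parse_bootstrap_name(filename):
    """Return (kind, num_samples, sample_size), or None if the name does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return match["kind"], int(match["num_samples"]), int(match["sample_size"])

def result_name_for(sample_filename):
    """Build the result file name that corresponds to a sample file name."""
    parsed = parse_bootstrap_name(sample_filename)
    if parsed is None or parsed[0] != "Samples":
        raise ValueError(f"Not a bootstrap sample file: {sample_filename}")
    _, num_samples, sample_size = parsed
    return f"ParamResults_No_{num_samples}_BaseSize_{sample_size}.npy"

print(parse_bootstrap_name("Samples_No_250_BaseSize_1000.npy"))
print(result_name_for("Samples_No_250_BaseSize_1000.npy"))
```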

param_bootstrap_sWeights_analysis(input_directory='Bootstrap/sWeights/Results', output_directory='Bootstrap/sWeights/Plots')[source]

Perform a parametric bootstrap analysis using sWeights and generate plots.

This method performs the following steps:
  1. Loads the bootstrap results from the specified input directory.

  2. Calculates the mean, standard deviation, bias, and pull for each parameter.

  3. Generates histograms for the values, errors, and pulls of each parameter.

  4. Creates summary plots for bias and error against sample size.

  5. Generates pull distribution plots for each parameter and sample size.

Parameters:
  • input_directory (str, optional) – The directory containing the bootstrap results (.npy files). Default is “Bootstrap/sWeights/Results”.

  • output_directory (str, optional) – The directory where the plots will be saved. Default is “Bootstrap/sWeights/Plots”.

Returns:

results (dict) – A dictionary storing computed metrics for each sample size. Keys are sample sizes, and values are dictionaries with the keys “Values_Mean”, “Values_Std”, “Values_Bias”, “Errors_Mean”, “Errors_Std”, “Pull_Mean”, “Pull_Std”, “Pull_Mean_Error”, “Pull_Std_Error”.

The method also generates the following plots:

  1. Histograms for each parameter (value, error, and pull) across bootstrap samples.

  2. Bias and error trends vs. sample size for each parameter.

  3. Pull distributions for each parameter across all samples.

Raises:

FileNotFoundError – If the input directory does not exist or contains no valid files.

param_bootstrap_sWeights_fit(initial_params, norm_check=True, input_directory='Bootstrap/Samples', output_directory='Bootstrap/sWeights/Results')[source]

Perform parameter fitting using sWeights for signal and background PDFs for all bootstrap samples.

This function fits the parameters of signal and background probability density functions (PDFs) using extended unbinned maximum likelihood for the X dimension. It then calculates the signal and background weights using the sWeight method and fits the signal PDF in the Y dimension using binned maximum likelihood.

Parameters:
  • initial_params (list) – Initial guesses for the parameters. The list must contain: [mu, sigma, beta, m, f, lamb, N].

  • norm_check (bool, optional) – If True, enables normalization checks in the sWeight method. Defaults to True.

  • input_directory (str, optional) – Path to the directory containing the bootstrap sample files (.npy). Defaults to “Bootstrap/Samples”.

  • output_directory (str, optional) – Path to the directory where the fitting results are saved. Defaults to “Bootstrap/sWeights/Results”.

Returns:

Two dictionaries containing the fitted parameter values and their associated errors:
  • mi_total_values (dict): Fitted parameter values.

  • mi_total_errors (dict): Fitted parameter errors.

Return type:

tuple

Raises:
  • ValueError – If no samples are provided or available for fitting.

  • RuntimeError – If minimization for X or Y dimension does not converge.

param_bootstrap_samples(num_samples, sample_sizes, output_directory='Bootstrap/Samples', poisson=False)[source]

Generate bootstrap samples with different sizes using the accept-reject method and save them as single files. Starts fresh by deleting the existing directory.

Parameters:
  • num_samples (int) – The number of bootstrap samples to generate for each sample size.

  • sample_sizes (list of int) – The base sizes of each bootstrap sample. Each size will have a separate output file.

  • output_directory (str, optional) – Directory to save the generated samples as .npy files (default: “Bootstrap/Samples”).

  • poisson (bool, optional) – Whether to apply Poisson variation to the sample sizes (default: False).

Notes

  • Each file will be named “Samples_No_<num_samples>_BaseSize_<sample_size>.npy”.

  • Each file contains a list of arrays, where each array corresponds to a sample with its actual size.

pdf(X, Y)[source]

Calculate the joint Probability Density Function (PDF).

The Joint PDF is defined as: S_B(X, Y) = f * Signal_PDF(X, Y) + (1-f) * Background_PDF(X, Y)

Parameters:
  • X (float or np.ndarray) – The value(s) of X at which to evaluate the PDF.

  • Y (float or np.ndarray) – The value(s) of Y at which to evaluate the PDF.

Returns:

The normalized joint PDF value(s). The value is 0 if X is outside [lower_bound_X, upper_bound_X] or Y is outside [lower_bound_Y, upper_bound_Y].

Return type:

float or np.ndarray

pdf_fitting(X, Y, mu, sigma, beta, m, lamb, mu_b, sigma_b, f)[source]

Calculate the joint Probability Density Function (PDF) for given parameters.

Parameters:
  • X (float or np.ndarray) – Values where the PDF is evaluated.

  • Y (float or np.ndarray) – Values where the PDF is evaluated.

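  • mu (float) – Mean of the CrystalBall distribution (Signal, X dimension).

  • sigma (float) – Standard deviation of the CrystalBall distribution (Signal, X dimension).

  • beta (float) – Threshold value of the CrystalBall distribution (Signal, X dimension).

  • m (float) – Power-law tail exponent of the CrystalBall distribution (Signal, X dimension).

  • lamb (float) – Decay constant of the ExponentialDecay distribution (Signal, Y dimension).

  • mu_b (float) – Mean of the Normal distribution (Background, Y dimension).

  • sigma_b (float) – Standard deviation of the Normal distribution (Background, Y dimension).

  • f (float) – Fraction of the Signal distribution in the mixture.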

Returns:

PDF values for the input X, Y.

Return type:

np.ndarray

plot_dist()[source]

Visualize the joint Probability Density Function (PDF) in 2D and 3D.

The figure has two panels:
  • Left: a 2D contour plot with a color bar representing the PDF values.

  • Right: a 3D surface plot showing the PDF as a function of X and Y.

The X and Y ranges for the plots are determined based on the bounds provided for the PDF, with slight overextensions to display regions of zero probability.

plot_marginal()[source]

Create a 2x2 grid of plots, each showing the signal and background contributions and the overall distribution:
  • Top left: marginal_pdf_x.

  • Top right: marginal_cdf_x.

  • Bottom left: marginal_pdf_y.

  • Bottom right: marginal_cdf_y.

plot_profiled_likelihoods()[source]

Plot the profiled log-likelihoods for all parameters in a 3x3 grid.

Returns:

  • A 3x3 grid of plots, where each subplot corresponds to a parameter’s profiled log-likelihood (-ln L) as a function of its value.

  • The plots highlight the 1σ and 2σ confidence intervals for each parameter.

Raises:

ValueError – If the Minuit object (self.mi) is not available. Ensure that the fit_params method has been run successfully before calling this method.
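For a single parameter, the 1σ and 2σ intervals correspond to the points where -ln L rises by 0.5 and 2.0 above its minimum. The sketch below finds those crossings for a toy one-parameter likelihood (a Normal mean with known width), where no profiling over other parameters is needed; the class's plots instead profile each parameter of the full fit. The names and data are hypothetical.

```python
import math

def nll(mu, data, sigma):
    """-ln L for a Normal mean with known sigma, constants dropped."""
    return sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)

def crossing(func, target, lo, hi, iters=100):
    """Bisection for func(v) = target, assuming exactly one crossing in [lo, hi]."""
    f_lo = func(lo) - target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid) - target
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.0]
sigma = 0.2
mu_hat = sum(data) / len(data)  # the maximum-likelihood estimate
nll_min = nll(mu_hat, data, sigma)

def delta_nll(mu):
    return nll(mu, data, sigma) - nll_min

upper_1s = crossing(delta_nll, 0.5, mu_hat, mu_hat + 1.0)
upper_2s = crossing(delta_nll, 2.0, mu_hat, mu_hat + 1.0)
print(upper_1s - mu_hat, sigma / math.sqrt(len(data)))  # the two should agree
```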

plot_samples(samples=None)[source]

Plot the results of the sampled data in a 2x2 grid:
  • Top-left: 3D histogram of the joint distribution.

  • Top-right: Surface plot of the joint PDF.

  • Bottom-left: Histogram of sampled X values vs marginal PDF.

  • Bottom-right: Histogram of sampled Y values vs marginal PDF.

Parameters:

samples (np.ndarray, optional) – Array of shape (N, 2) containing the sampled data points (X, Y). If not provided, the method attempts to use self.samples. If neither is available, a ValueError is raised.

Raises:

ValueError – If no samples are provided and no samples are stored in self.samples.