This Python package enables estimation of cosmological quantities using photometric redshift probability distributions.

Tutorials

See the following IPython Notebook for an example of using chippr:

Simulation

chippr enables simulation of surveys of photo-z interim posteriors.

The discrete Class

class discrete.discrete(bin_ends, weights)[source]
evaluate(xs)[source]

Function to evaluate the discrete probability distribution at many points

Parameters:xs (ndarray, float) – values at which to evaluate discrete probability distribution
Returns:ps – values of discrete probability distribution at xs
Return type:ndarray, float
evaluate_one(x)[source]

Function to evaluate the discrete probability distribution at one point

Parameters:x (float) – value at which to evaluate discrete probability distribution
Returns:p – value of discrete probability distribution at x
Return type:float
pdf(xs)[source]
sample(n_samps)[source]

Function to take samples from discrete probability distribution

Parameters:n_samps (int) – number of samples to take
Returns:xs – array of points sampled from the discrete probability distribution
Return type:ndarray, float
sample_one()[source]

Function to sample a single value from discrete probability distribution

Returns:x – a single point sampled from the discrete probability distribution
Return type:float
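
As a rough illustration of what evaluating and sampling a binned distribution involves, the sketch below treats weights as relative probabilities per bin and spreads each bin's probability uniformly over its width. That interpretation is an assumption rather than chippr's actual code. The free functions are illustrative stand-ins for the methods above, and only the standard library is used.

```python
import bisect
import random


def evaluate_one(bin_ends, weights, x):
    """Density of a piecewise-constant distribution at x (0 outside the bins)."""
    if x < bin_ends[0] or x >= bin_ends[-1]:
        return 0.0
    k = bisect.bisect_right(bin_ends, x) - 1  # index of the bin containing x
    width = bin_ends[k + 1] - bin_ends[k]
    return weights[k] / sum(weights) / width


def sample_one(bin_ends, weights, rng):
    """Pick a bin in proportion to its weight, then draw uniformly inside it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.uniform(bin_ends[k], bin_ends[k + 1])


bin_ends = [0.0, 1.0, 2.0, 4.0]
weights = [1.0, 2.0, 1.0]
rng = random.Random(0)  # seeded so the sketch is reproducible
p = evaluate_one(bin_ends, weights, 1.5)
xs = [sample_one(bin_ends, weights, rng) for _ in range(1000)]
```

Under these assumptions the density integrates to one over the full range of bin_ends, whatever the scale of the weights.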

The gauss Class

class gauss.gauss(mean, var, bounds=None)[source]
evaluate(xs)[source]

Function to evaluate univariate Gaussian probability distribution at multiple points

Parameters:xs (numpy.ndarray, float) – input values at which to evaluate probability
Returns:ps – output probabilities
Return type:ndarray, float
evaluate_one(x)[source]

Function to evaluate Gaussian probability distribution once

Parameters:x (float) – value at which to evaluate Gaussian probability distribution
Returns:p – probability associated with x
Return type:float
invert_var()[source]

Function to invert variance

norm_var()[source]

Function to create standard deviation from variance

pdf(xs)[source]
sample(n_samps)[source]

Function to sample univariate Gaussian probability distribution

Parameters:n_samps (positive int) – number of samples to take
Returns:xs – array of n_samps samples from Gaussian probability distribution
Return type:ndarray, float
sample_one()[source]

Function to take one sample from univariate Gaussian probability distribution

Returns:x – single sample from Gaussian probability distribution
Return type:float
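
The quantities named by invert_var and norm_var (inverse variance and standard deviation) are the two ingredients of the usual Gaussian density formula. The sketch below shows that formula with free functions that are hypothetical stand-ins for the methods above. It ignores the bounds argument and is not taken from chippr's source.

```python
import math
import random


def evaluate_one(x, mean, var):
    """Gaussian probability density at x for a given mean and variance."""
    sigma = math.sqrt(var)  # standard deviation derived from the variance
    inv_var = 1.0 / var     # inverse variance used in the exponent
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * inv_var * (x - mean) ** 2)


def sample(n_samps, mean, var, rng):
    """Draw n_samps values from the Gaussian."""
    sigma = math.sqrt(var)
    return [rng.gauss(mean, sigma) for _ in range(n_samps)]


rng = random.Random(42)  # seeded so the sketch is reproducible
peak = evaluate_one(0.0, mean=0.0, var=1.0)
draws = sample(5, mean=0.3, var=0.01, rng=rng)
```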

The gmix Class

class gmix.gmix(amps, funcs, limits=(0.001, 3.501))[source]
evaluate(xs)[source]

Function to evaluate the Gaussian mixture probability distribution at many points

Parameters:xs (ndarray, float) – values at which to evaluate Gaussian mixture probability distribution
Returns:ps – values of Gaussian mixture probability distribution at xs
Return type:ndarray, float
evaluate_one(x)[source]

Function to evaluate Gaussian mixture once

Parameters:x (float) – value at which to evaluate Gaussian mixture
Returns:p – probability associated with x
Return type:float
pdf(xs)[source]
sample(n_samps)[source]

Function to take samples from Gaussian mixture probability distribution

Parameters:n_samps (int) – number of samples to take
Returns:xs – array of points sampled from the Gaussian mixture probability distribution
Return type:ndarray, float
sample_one()[source]

Function to sample a single value from Gaussian mixture probability distribution

Returns:x – a single point sampled from the Gaussian mixture probability distribution
Return type:float
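
A Gaussian mixture is conventionally evaluated as an amplitude-weighted sum of component densities, and sampled by first choosing a component and then sampling from it. The sketch below illustrates that idea with (mean, variance) tuples in place of the funcs objects. It assumes the amplitudes are normalized to sum to one and leaves out the limits argument, so it should be read as an illustration rather than as chippr's implementation.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, amps, components):
    """Weighted sum of component densities; amps are normalized to sum to 1."""
    total = sum(amps)
    return sum(a / total * gauss_pdf(x, m, v) for a, (m, v) in zip(amps, components))


def mixture_sample_one(amps, components, rng):
    """Choose a component in proportion to its amplitude, then sample it."""
    mean, var = rng.choices(components, weights=amps)[0]
    return rng.gauss(mean, math.sqrt(var))


amps = [1.0, 3.0]
components = [(0.0, 1.0), (2.0, 1.0)]  # hypothetical (mean, variance) pairs
rng = random.Random(1)
p_mid = mixture_pdf(1.0, amps, components)
z = mixture_sample_one(amps, components, rng)
```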

The mvn Class

class mvn.mvn(mean, var)[source]
evaluate(zs)[source]

Function to evaluate multivariate Gaussian probability distribution at multiple points

Parameters:zs (ndarray, float) – input vectors at which to evaluate probability
Returns:ps – output probabilities
Return type:ndarray, float
evaluate_one(z)[source]

Function to evaluate multivariate Gaussian probability distribution once

Parameters:z (numpy.ndarray, float) – value at which to evaluate multivariate Gaussian probability distribution
Returns:p – probability associated with z
Return type:float
invert_var()[source]

Function to invert covariance matrix

Returns:inv – inverse variance
Return type:numpy.ndarray, float
norm_var()[source]

Function to normalize covariance matrix

Returns:det – determinant of variance
Return type:float
pdf(points)[source]
sample(n_samps)[source]

Function to sample from multivariate Gaussian probability distribution

Parameters:n_samps (positive int) – number of samples to take
Returns:zs – array of n_samps samples from multivariate Gaussian probability distribution
Return type:ndarray, float
sample_one()[source]

Function to take one sample from multivariate Gaussian probability distribution

Returns:z – single sample from multivariate Gaussian probability distribution
Return type:numpy.ndarray, float
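
The inverse covariance and its determinant, which invert_var and norm_var are documented to return, are what the standard multivariate Gaussian density needs. The sketch below works this through for the two-dimensional case only, using plain nested lists so that it has no dependencies. The helper names are hypothetical and the code is not drawn from chippr.

```python
import math


def invert_2x2(cov):
    """Return (inverse, determinant) of a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return inv, det


def mvn_pdf(z, mean, cov):
    """Bivariate Gaussian density at the point z."""
    inv, det = invert_2x2(cov)
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    # Quadratic form (z - mean)^T inv (z - mean)
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


mean = [0.5, 0.5]
cov = [[1.0, 0.0], [0.0, 1.0]]
p_peak = mvn_pdf(mean, mean, cov)
p_off = mvn_pdf([1.5, 0.5], mean, cov)
```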

The catalog Class

class catalog.catalog(params={}, vb=True, loc='.', prepend='')[source]
coarsify(fine)[source]

Function to bin function evaluated on fine grid

Parameters:fine (numpy.ndarray, float) – matrix of probability values of function on fine grid for N galaxies
Returns:coarse – vector of binned values of function
Return type:numpy.ndarray, float
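
One plausible reading of binning a function evaluated on a fine grid is to integrate the fine-grid values over each coarse bin. The sketch below does this for one row per galaxy, assuming that every coarse bin spans the same whole number of fine cells (n_per_bin) of equal width (dz_fine). Both parameters are hypothetical names introduced for illustration, and the actual method may differ.

```python
def coarsify(fine, n_per_bin, dz_fine):
    """Integrate each row of fine-grid values over consecutive coarse bins."""
    coarse = []
    for row in fine:
        binned = [
            sum(row[k:k + n_per_bin]) * dz_fine
            for k in range(0, len(row), n_per_bin)
        ]
        coarse.append(binned)
    return coarse


# Two galaxies evaluated on four fine cells, grouped into two coarse bins
fine = [[1.0, 1.0, 2.0, 2.0],
        [0.0, 4.0, 0.0, 4.0]]
coarse = coarsify(fine, n_per_bin=2, dz_fine=0.5)
```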
create(truth, int_pr, N=4, vb=True)[source]

Function to create a catalog of interim posterior probability distributions. To do: split this up into helper functions.

Parameters:
  • truth (chippr.gmix object or chippr.gauss object or chippr.discrete object) – true redshift distribution object
  • int_pr (chippr.gmix object or chippr.gauss object or chippr.discrete object) – interim prior distribution object
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:

self.cat – dictionary comprising catalog information

Return type:

dict

evaluate_lfs(pspace, vb=True)[source]

Evaluates likelihoods based on observed sample values

Parameters:
  • pspace (chippr.gauss or chippr.gmix or chippr.gamma or chippr.multi object) – the probability function to evaluate
  • vb (boolean) – print progress to stdout?
Returns:

lfs – array of likelihood values for each item as a function of fine binning

Return type:

numpy.ndarray, float

make_probs(vb=True)[source]

Makes the continuous 2D probability distribution over z_spec, z_phot

Parameters:vb (boolean) – print progress to stdout?

Notes

TO DO: only one outlier population at a time is supported for now; more will be enabled.

TO DO: perpendicular features from passing between filter curves are not yet included and should be added.

proc_bins(vb=True)[source]

Function to process binning

Parameters:vb (boolean, optional) – True to print progress messages to stdout, False to suppress
read(loc='data', style='.txt')[source]

Function to read in catalog file

Parameters:loc (string, optional) – location of catalog file
sample(N, vb=False)[source]

Samples (z_spec, z_phot) pairs

Parameters:
  • N (int) – number of samples to take
  • vb (boolean) – print progress to stdout?
Returns:

samps – (z_spec, z_phot) pairs

Return type:

numpy.ndarray, float

write(loc='data', style='.txt')[source]

Function to write newly-created catalog to file

Parameters:
  • loc (string, optional) – file name into which to save catalog
  • style (string, optional) – file format in which to save the catalog

Inference

chippr currently enables estimation of the redshift density function.

The log_z_dens Class

class log_z_dens.log_z_dens(catalog, hyperprior, truth=None, loc='.', prepend='', vb=True)[source]
calculate_mexp(vb=True)[source]

Calculates the marginalized expected value estimator of the redshift density function

Parameters:vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:log_exp_nz – array of logged redshift density function bin values
Return type:ndarray, float
calculate_mmap(vb=True)[source]

Calculates the marginalized maximum a posteriori estimator of the redshift density function

Parameters:vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:log_map_nz – array of logged redshift density function bin values
Return type:ndarray, float
calculate_mmle(start, vb=True, no_data=0, no_prior=0)[source]

Calculates the marginalized maximum likelihood estimator of the redshift density function

Parameters:
  • start (numpy.ndarray, float) – array of log redshift density function bin values at which to begin optimization
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress
  • no_data (boolean, optional) – True to exclude data contribution to hyperposterior
  • no_prior (boolean, optional) – True to exclude prior contribution to hyperposterior
Returns:

log_mle_nz – array of logged redshift density function bin values maximizing hyperposterior

Return type:

numpy.ndarray, float

calculate_samples(ivals, n_accepted=3, n_burned=2, vb=True, n_procs=1, no_data=0, no_prior=0, gr_threshold=1.2)[source]

Calculates samples estimating the redshift density function

Parameters:
  • ivals (numpy.ndarray, float) – initial values of log n(z) for each walker
  • n_accepted (int, optional) – log10 number of samples to accept per walker
  • n_burned (int, optional) – log10 number of samples between tests of burn-in condition
  • n_procs (int, optional) – number of processors to use, defaults to single-thread
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress
  • no_data (boolean, optional) – True to exclude data contribution to hyperposterior
  • no_prior (boolean, optional) – True to exclude prior contribution to hyperposterior
Returns:

log_samples_nz – array of sampled log redshift density function bin values

Return type:

ndarray, float

calculate_stacked(vb=True)[source]

Calculates the stacked estimator of the redshift density function

Parameters:vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:log_stk_nz – array of logged redshift density function bin values
Return type:ndarray, float
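
Stacking is commonly understood as summing the per-galaxy posteriors bin by bin. The sketch below illustrates that idea for posteriors already evaluated on shared bins. The normalization to unit integral, the numerical floor applied before taking the log, and the function name are all assumptions made for this example, not details confirmed by the description above.

```python
import math


def stacked_log_nz(pdfs, bin_widths, floor=1e-300):
    """Sum per-galaxy binned posteriors, normalize, and take the log."""
    n_bins = len(bin_widths)
    stacked = [sum(pdf[k] for pdf in pdfs) for k in range(n_bins)]
    # Normalize so the stacked density integrates to one over the bins
    norm = sum(s * w for s, w in zip(stacked, bin_widths))
    # Floor empty bins so the log stays finite
    return [math.log(max(s / norm, floor)) for s in stacked]


pdfs = [[0.5, 0.5], [1.0, 0.0]]  # two galaxies, two redshift bins
bin_widths = [1.0, 1.0]
log_stk = stacked_log_nz(pdfs, bin_widths)
```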
compare(vb=True)[source]

Calculates all available goodness of fit measures

Parameters:vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:out_info – dictionary of all available statistics
Return type:dict
evaluate_log_hyper_likelihood(log_nz)[source]

Function to evaluate log hyperlikelihood

Parameters:log_nz (numpy.ndarray, float) – vector of logged redshift density bin values at which to evaluate the hyperlikelihood
Returns:log_hyper_likelihood – log likelihood probability associated with parameters in log_nz
Return type:float
evaluate_log_hyper_posterior(log_nz)[source]

Function to evaluate log hyperposterior

Parameters:log_nz (numpy.ndarray, float) – vector of logged redshift density bin values at which to evaluate the full posterior
Returns:log_hyper_posterior – log hyperposterior probability associated with parameters in log_nz
Return type:float
evaluate_log_hyper_prior(log_nz)[source]

Function to evaluate log hyperprior

Parameters:log_nz (numpy.ndarray, float) – vector of logged redshift density bin values at which to evaluate the hyperprior
Returns:log_hyper_prior – log prior probability associated with parameters in log_nz
Return type:float
optimize(start, no_data, no_prior, vb=True)[source]

Maximizes the hyperposterior of the redshift density

Parameters:
  • start (numpy.ndarray, float) – array of log redshift density function bin values at which to begin optimization
  • no_data (boolean) – True to exclude data contribution to hyperposterior
  • no_prior (boolean) – True to exclude prior contribution to hyperposterior
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:

res.x – array of logged redshift density function bin values maximizing hyperposterior

Return type:

numpy.ndarray, float

plot_estimators(log=True, mini=True)[source]

Plots all available estimators of the redshift density function.

read(read_loc, style='pickle', vb=True)[source]

Function to load inferred quantities from files.

Parameters:
  • read_loc (string) – filepath where inferred redshift density function is stored
  • style (string, optional) – keyword for file format, currently only ‘pickle’ supported
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:

self.info – returns the log_z_dens information dictionary object

Return type:

dict

sample(ivals, n_samps, vb=True)[source]

Samples the redshift density hyperposterior

Parameters:
  • ivals (numpy.ndarray, float) – initial values of the walkers
  • n_samps (int) – number of samples to accept before stopping
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress
Returns:

mcmc_outputs – dictionary containing array of sampled redshift density function bin values as well as posterior probabilities, acceptance fractions, and autocorrelation times

Return type:

dict

write(write_loc, style='pickle', vb=True)[source]

Function to write results of inference to files.

Parameters:
  • write_loc (string) – filepath where results of inference should be saved.
  • style (string, optional) – keyword for file format, currently only ‘pickle’ supported
  • vb (boolean, optional) – True to print progress messages to stdout, False to suppress

Utilities

chippr includes a number of modules containing helper functions.

Default Settings

defaults.check_basic_setup(params)[source]

Sets parameter values pertaining to basic constants of simulation

Parameters:params (dict) – dictionary containing key/value pairs for simulation
Returns:params – dictionary containing key/value pairs for simulation
Return type:dict
defaults.check_bias_params(params)[source]

Sets parameter values pertaining to presence of a systematic bias

Parameters:params (dict) – dictionary containing key/value pairs for simulation
Returns:params – dictionary containing key/value pairs for simulation
Return type:dict
defaults.check_catastrophic_outliers(params)[source]

Sets parameter values pertaining to presence of a catastrophic outlier population

Parameters:params (dict) – dictionary containing key/value pairs for simulation
Returns:params – dictionary containing key/value pairs for simulation
Return type:dict

defaults.check_inf_params(params={})[source]

Checks inference parameter dictionary for various keywords and sets to default values if not present

Parameters:params (dict, optional) – dictionary containing initial key/value pairs for inference
Returns:params – dictionary containing final key/value pairs for inference
Return type:dict
defaults.check_sampler_params(params)[source]

Sets parameter values pertaining to basic constants of inference

Parameters:params (dict) – dictionary containing key/value pairs for inference
Returns:params – dictionary containing key/value pairs for inference
Return type:dict
defaults.check_sim_params(params={})[source]

Checks simulation parameter dictionary for various keywords and sets to default values if not present

Parameters:params (dict, optional) – dictionary containing initial key/value pairs for simulation of catalog
Returns:params – dictionary containing final key/value pairs for simulation of catalog
Return type:dict
defaults.check_variable_sigmas(params)[source]

Sets parameter values pertaining to widths of Gaussian PDF components

Parameters:params (dict) – dictionary containing key/value pairs for simulation
Returns:params – dictionary containing key/value pairs for simulation
Return type:dict

Notes

rms_scatter –> variable_sigmas

General Utilities

utils.ingest(in_info)[source]

Function reading in parameter file to define functions necessary for generation of posterior probability distributions

Parameters:in_info (string or dict) – string containing path to plaintext input file or dict containing likelihood input parameters
Returns:in_dict – dict containing keys and values necessary for posterior probability distributions
Return type:dict
utils.safe_log(arr, threshold=4.450147717014403e-308)[source]

Takes the natural logarithm of an array that might contain zeros.

Parameters:
  • arr (ndarray, float) – array of values to be logged
  • threshold (float, optional) – small, positive value to replace zeros and negative numbers
Returns:

logged – logged values, with small value replacing un-loggable values

Return type:

ndarray
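
The behaviour described above can be sketched in a few lines. This version works on a plain list rather than an ndarray to stay dependency-free, and it replaces every value below the threshold (zeros, negatives, and also positive values smaller than the threshold) before taking the log. Whether the real function treats tiny positive values the same way is an assumption.

```python
import math


def safe_log(values, threshold=4.450147717014403e-308):
    """Natural log of each value, with anything below threshold replaced by it."""
    return [math.log(max(v, threshold)) for v in values]


logged = safe_log([1.0, 0.0, -2.5])
```

The zero and the negative entry both map to the log of the threshold, so the output stays finite and can be used in later sums.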

Simulation Utilities

sim_utils.choice(weights)[source]

Function sampling discrete distribution

Parameters:weights (numpy.ndarray) – relative probabilities for each category
Returns:index – chosen category
Return type:int
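
Sampling a category from relative weights is usually done by drawing a uniform number and locating it in the cumulative sum of the weights. The following sketch shows that technique; the explicit rng argument is added here for reproducibility and is not part of the documented signature.

```python
import bisect
import itertools
import random


def choice(weights, rng):
    """Return an index drawn in proportion to the (unnormalized) weights."""
    cumulative = list(itertools.accumulate(weights))
    u = rng.random() * cumulative[-1]
    # First index whose cumulative weight exceeds u, clamped to the last category
    return min(bisect.bisect_right(cumulative, u), len(weights) - 1)


rng = random.Random(7)
draws = [choice([1.0, 3.0], rng) for _ in range(10000)]
frac_second = sum(draws) / len(draws)  # fraction of draws landing on index 1
```

With weights of 1 and 3, roughly three quarters of the draws should select the second category.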

Statistics

stat_utils.acors(xtimeswalkersbins, mode='bins')[source]

Calculates autocorrelation time for MCMC chains

Parameters:
  • xtimeswalkersbins (numpy.ndarray, float) – emcee chain values of dimensions (n_iterations, n_walkers, n_parameters)
  • mode (string, optional) – ‘bins’ for one autocorrelation time per parameter, ‘walkers’ for one autocorrelation time per walker
Returns:

taus – autocorrelation times by bin or by walker depending on mode

Return type:

numpy.ndarray, float

stat_utils.calculate_kld(pe, qe, vb=True)[source]

Calculates the Kullback-Leibler Divergence between two PDFs.

Parameters:
  • pe (numpy.ndarray, float) – probability distribution evaluated on a grid whose distance from q will be calculated.
  • qe (numpy.ndarray, float) – probability distribution evaluated on a grid whose distance to p will be calculated.
  • vb (boolean) – report on progress to stdout?
Returns:

Dpq – the value of the Kullback-Leibler Divergence from q to p

Return type:

float
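
For two distributions evaluated on the same grid, the Kullback-Leibler divergence is commonly approximated by the sum of p log(p/q) over grid points. The sketch below assumes the inputs are first normalized to sum to one and floors very small values so the logs stay finite. Both choices, and the floor parameter itself, are assumptions for illustration rather than documented behaviour.

```python
import math


def calculate_kld(pe, qe, floor=1e-300):
    """Discrete KL divergence from q to p for distributions on a shared grid."""
    p_total, q_total = sum(pe), sum(qe)
    dpq = 0.0
    for p, q in zip(pe, qe):
        pn = max(p / p_total, floor)  # normalized and floored
        qn = max(q / q_total, floor)
        dpq += pn * (math.log(pn) - math.log(qn))
    return dpq


p = [0.5, 0.5]
q = [0.25, 0.75]
d_pq = calculate_kld(p, q)
d_qp = calculate_kld(q, p)
```

The two directions generally give different values, which is why the order of pe and qe matters.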

stat_utils.calculate_rms(pe, qe, vb=True)[source]

Calculates the Root Mean Square Error between two PDFs.

Parameters:
  • pe (numpy.ndarray, float) – probability distribution evaluated on a grid whose distance from q will be calculated.
  • qe (numpy.ndarray, float) – probability distribution evaluated on a grid whose distance to p will be calculated.
  • vb (boolean) – report on progress to stdout?
Returns:

rms – the value of the RMS error between q and p

Return type:

float
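
A minimal sketch of a root mean square error between two distributions on the same grid follows. It takes the mean of the squared differences over grid points, which is the textbook definition; whether the library weights by grid spacing is not stated above, so this is an assumption.

```python
import math


def calculate_rms(pe, qe):
    """Root mean square difference between two equally sampled distributions."""
    n = len(pe)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(pe, qe)) / n)


rms = calculate_rms([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Unlike the Kullback-Leibler divergence, this measure is symmetric in its two arguments.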

stat_utils.cf(xtimes)[source]

Helper function to calculate autocorrelation time for chain of MCMC samples

Parameters:xtimes (numpy.ndarray, float) – single parameter values for a single walker over all iterations
Returns:cf – autocorrelation time over all time lags for one parameter of one walker
Return type:numpy.ndarray, float
stat_utils.cfs(x, mode)[source]

Helper function for calculating autocorrelation time for MCMC chains

Parameters:
  • x (numpy.ndarray, float) – input parameter values with dimensions number of iterations by number of walkers if mode=’walkers’, or number of iterations by number of parameters if mode=’bins’
  • mode (string) – ‘bins’ for one autocorrelation time per parameter, ‘walkers’ for one autocorrelation time per walker
Returns:

cfs – autocorrelation times for all walkers if mode=’walkers’ or all parameters if mode=’bins’

Return type:

numpy.ndarray, float

stat_utils.cft(xtimes, lag)[source]

Helper function to calculate autocorrelation time for chain of MCMC samples

Parameters:
  • xtimes (numpy.ndarray, float) – single parameter values for a single walker over all iterations
  • lag (int) – maximum lag time in number of iterations
Returns:

ans – autocorrelation time for one time lag for one parameter of one walker

Return type:

numpy.ndarray, float
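
The quantity computed for a single lag is what is conventionally called the normalized autocorrelation function of the chain. The sketch below shows one standard estimator of it for a single parameter of a single walker. The function name and the choice of normalization are assumptions and may not match the helper documented above.

```python
def autocorr_at_lag(xtimes, lag):
    """Normalized autocorrelation of a 1D chain at a single lag."""
    n = len(xtimes)
    mean = sum(xtimes) / n
    # Lag-0 autocovariance (unnormalized), used as the denominator
    var = sum((x - mean) ** 2 for x in xtimes)
    cov = sum((xtimes[i] - mean) * (xtimes[i + lag] - mean) for i in range(n - lag))
    return cov / var


chain = [1.0, -1.0] * 50  # strongly anticorrelated toy chain
rho_0 = autocorr_at_lag(chain, 0)
rho_1 = autocorr_at_lag(chain, 1)
```

A value near one at small lags indicates slowly mixing samples, while values near zero indicate nearly independent ones.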

stat_utils.gr_test(sample, threshold=1.2)[source]

Performs the Gelman-Rubin test of convergence of an MCMC chain

Parameters:
  • sample (numpy.ndarray, float) – chain output
  • threshold (float, optional) – Gelman-Rubin test statistic criterion (usually around 1)
Returns:

test_result – True if the chain is still burning in, False if it is past burn-in

Return type:

boolean

stat_utils.mean(population)[source]

Calculates the mean of a population

Parameters:population (np.array, float) – population over which to calculate the mean
Returns:mean – mean value over population
Return type:np.array, float
stat_utils.multi_parameter_gr_stat(sample)[source]

Calculates the Gelman-Rubin test statistic of convergence of an MCMC chain over multiple parameters

Parameters:sample (numpy.ndarray, float) – multi-parameter chain output
Returns:Rs – vector of the potential scale reduction factors
Return type:numpy.ndarray, float
stat_utils.norm_fit(population)[source]

Calculates the mean and standard deviation of a population

Parameters:population (np.array, float) – population over which to calculate the mean and standard deviation
Returns:norm_stats – mean and standard deviation over population
Return type:tuple, list, float
stat_utils.single_parameter_gr_stat(chain)[source]

Calculates the Gelman-Rubin test statistic of convergence of an MCMC chain over one parameter

Parameters:chain (numpy.ndarray, float) – single-parameter chain
Returns:R_hat – potential scale reduction factor
Return type:float
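
The Gelman-Rubin statistic compares the variance between chains with the variance within them. The sketch below follows the standard textbook definition of the potential scale reduction factor for one parameter and then applies the threshold the way gr_test describes its result. It assumes equal-length chains stored as a list of lists, which may differ from the array layout the functions above expect.

```python
import math
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def still_burning_in(chains, threshold=1.2):
    """True while the statistic exceeds the threshold."""
    return gelman_rubin(chains) > threshold


mixed = [[0.0, 1.0] * 25, [0.0, 1.0] * 25]          # chains that agree
separated = [[0.0, 1.0] * 25, [10.0, 11.0] * 25]    # chains stuck apart
r_mixed = gelman_rubin(mixed)
r_separated = gelman_rubin(separated)
```

Chains exploring the same region give a statistic close to one, while chains stuck in different regions give a much larger value and would be flagged as still burning in.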

Plotting Utilities

plot_utils.plot_h(sub_plot, bin_ends, to_plot, s='--', c='k', a=1, w=1, d=[(0, (1, 0.0001))], l=None, r=False)[source]

Helper function to plot horizontal lines of a step function

Parameters:
  • sub_plot (matplotlib.pyplot subplot object) – subplot into which step function is drawn
  • bin_ends (list or ndarray) – list or array of endpoints of bins
  • to_plot (list or ndarray) – list or array of values within each bin
  • s (string, optional) – matplotlib.pyplot linestyle
  • c (string, optional) – matplotlib.pyplot color
  • a (int or float, [0., 1.], optional) – matplotlib.pyplot alpha (transparency)
  • w (int or float, optional) – matplotlib.pyplot linewidth
  • d (list of tuple, optional) – matplotlib.pyplot dash style, of form [(start_point, (points_on, points_off, …))]
  • l (string, optional) – label for function
  • r (boolean, optional) – True for rasterized, False for vectorized
plot_utils.plot_step(sub_plot, bin_ends, to_plot, s='--', c='k', a=1, w=1, d=[(0, (1, 0.0001))], l=None, r=False)[source]

Plots a step function

Parameters:
  • sub_plot (matplotlib.pyplot subplot object) – subplot into which step function is drawn
  • bin_ends (list or ndarray) – list or array of endpoints of bins
  • to_plot (list or ndarray) – list or array of values within each bin
  • s (string, optional) – matplotlib.pyplot linestyle
  • c (string, optional) – matplotlib.pyplot color
  • a (int or float, [0., 1.], optional) – matplotlib.pyplot alpha (transparency)
  • w (int or float, optional) – matplotlib.pyplot linewidth
  • d (list of tuple, optional) – matplotlib.pyplot dash style, of form [(start_point, (points_on, points_off, …))]
  • l (string, optional) – label for function
  • r (boolean, optional) – True for rasterized, False for vectorized

Notes

Make this not need a subplot

plot_utils.plot_v(sub_plot, bin_ends, to_plot, s='--', c='k', a=1, w=1, d=[(0, (1, 0.0001))], r=False)[source]

Helper function to plot vertical lines of a step function

Parameters:
  • sub_plot (matplotlib.pyplot subplot object) – subplot into which step function is drawn
  • bin_ends (list or ndarray) – list or array of endpoints of bins
  • to_plot (list or ndarray) – list or array of values within each bin
  • s (string, optional) – matplotlib.pyplot linestyle
  • c (string, optional) – matplotlib.pyplot color
  • a (int or float, [0., 1.], optional) – matplotlib.pyplot alpha (transparency)
  • w (int or float, optional) – matplotlib.pyplot linewidth
  • d (list of tuple, optional) – matplotlib.pyplot dash style, of form [(start_point, (points_on, points_off, …))]
  • r (boolean, optional) – True for rasterized, False for vectorized
plot_utils.set_up_plot()[source]

Sets up plots to look decent