ApproxPosterior Class

approx.py - ApproxPosterior

Approximate Bayesian Posterior estimation and Bayesian optimzation. approxposterior uses Dan Forman-Mackey’s Gaussian Process implementation, george, and the Metropolis-Hastings MCMC ensemble sampler, emcee, to infer the approximate posterior distributions given the GP model.

class approxposterior.approx.ApproxPosterior(theta, y, lnprior, lnlike, priorSample, bounds, gp=None, algorithm='bape')[source]

Class used to estimate approximate Bayesian posterior distributions or perform Bayesian optimization using a Gaussian process surrogate model

Initial parameters:

Parameters
  • theta (array-like) – Input features (n_samples x n_features). Defaults to None.

  • y (array-like) – Input result of forward model (n_samples,). Defaults to None.

  • lnprior (function) – Defines the log prior over the input features.

  • lnlike (function) – Defines the log likelihood function. In this function, it is assumed that the forward model is evaluated on the input theta and the output is used to evaluate the log likelihood.

  • priorSample (function) – Method to randomly sample points over region allowed by prior

  • bounds (tuple/iterable) – Hard bounds for parameters

  • gp (george.GP, optional) – Gaussian Process that learns the likelihood conditioned on forward model input-output pairs (theta, y). It’s recommended that users specify their own kernel, GP using george. If None is provided, then approxposterior initialized a GP with a single ExpSquaredKernel as these work well in practice.

  • algorithm (str, optional) –

    Point selection algorithm that specifies which utility (also referred to as acquisition) function to use. Defaults to bape. Options are bape (Bayesian Active Learning for Posterior Estimation, Kandasamy et al. (2015)), agp (Adapted Gaussian Process Approximation, Wang & Li (2017)), alternate (between AGP and BAPE), and jones (Jones et al. (1998) expected improvement). Case doesn’t matter. If alternate, runs agp on even numbers and bape on odd.

    For approximate Bayesian posterior estimation, bape or alternate are typically the best optimizations. For Bayesian optimization, jones (expected improvement) usually performs best.

bayesOpt(nmax, theta0=None, tol=0.001, kmax=3, seed=None, verbose=True, runName='apRun', cache=True, gpMethod='powell', gpOptions=None, gpP0=None, optGPEveryN=1, nGPRestarts=1, nMinObjRestarts=5, initGPOpt=True, minObjMethod='nelder-mead', gpHyperPrior=<function defaultHyperPrior>, minObjOptions=None, findMAP=True, args=None, **kwargs)[source]

Perform Bayesian optimization given a GP surrogate model to estimate

thetaBest = argmax(fn(theta))

given a GP trained on (theta, y). In this case, fn is the function specified by self._lnlike + self._lnprior. Note that this function maximizes the objective, so if performing a minimization, define the objective as the negative of your function. See Brochu et al. (2009) or Frazier (2018) for good reviews of Bayesian optimization.

This function terminates once nmax points have been selected or when the function value changes by less than tol over consecutive iterations, whichever one happens first.

Note 1: lnlike does not have to be a log likelihood, but rather can be any continous function one wishes to optimize. The function lnprior is used to place priors on parameters of the function, theta. The typical use of lnprior is to ensure the solution remains within a hypercube or simplex, i.e., bounding the possible values of theta.

Note 2: For this function, it is recommended to keep optGPEveryN = 1 to ensure the GP properly learns the underlying function.

Note 3: Bayesian optimization and MAP estimation typically work better when fitAmp = True, that is the GP kernel has an amplitude term

Parameters
  • nmax (int) – Maximum number of new design points to find. These are the points that are selected by maximizing the utility function, e.g. the expected improvement, and sequentially added to the GP training set.

  • theta0 (iterable) – Initial guess. Defaults to a sample from the prior function.

  • tol (float, optional) – Convergence tolerance. This function will terminate if the function value at the estimated extremum changes by less than tol over kmax consecutive iterations. Defaults to 1.0e-3.

  • kmax (int, optional) – Number of iterations required for the difference in estimated extremum functions values < tol required for convergence. Defaults to 3.

  • seed (int, optional) – RNG seed. Defaults to None.

  • verbose (bool, optional) – Output all the diagnostics? Defaults to True.

  • runName (str, optional) – Filename to prepend to cache files where model input-output pairs and the current GP hyperparameter values are saved. Defaults to apRun.

  • cache (bool, optional) – Whether or not to cache forward model input-output pairs, and GP kernel parameters. Defaults to True since they’re expensive to evaluate. In practice, users should cache forward model inputs, outputs, ancillary parameters, etc in each likelihood function evaluation, but saving theta and y here doesn’t hurt.

  • gpMethod (str, optional) – scipy.optimize.minimize method used when optimized GP hyperparameters. Defaults to powell (it usually works)

  • gpOptions (dict, optional) – kwargs for the scipy.optimize.minimize function used to optimize GP hyperparameters. Defaults to None.

  • gpP0 (array, optional) – Initial guess for kernel hyperparameters. If None, defaults to np.random.randn for each parameter.

  • optGPEveryN (int, optional) – How often to optimize the GP hyperparameters. Defaults to re-optimizing everytime a new design point is found, e.g. every time a new (theta, y) pair is added to the training set.

  • nGPRestarts (int, optional) – Number of times to restart GP hyperparameter optimization. Defaults to 1. Increase this number if the GP is not well-optimized.

  • nMinObjRestarts (int, optional) – Number of times to restart minimizing -utility function to select next point to improve GP performance. Defaults to 5. Increase this number of the point selection is not working well.

  • initGPOpt (bool, optional) – Whether or not to optimize GP hyperparameters before 0th iteration. Defaults to True (aka assume user didn’t optimize GP hyperparameters)

  • gpHyperPrior (str/callable, optional) – Prior function for GP hyperparameters. Defaults to the defaultHyperPrior fn. This function asserts that the mean must be negative and that each log hyperparameter is within the range [-20,20].

  • minObjMethod (str, optional) – scipy.optimize.minimize method used when optimizing utility functions for point selection. Defaults to nelder-mead.

  • minObjOptions (dict, optional) – kwargs for the scipy.optimize.minimize function used when optimizing utility functions for point selection. Defaults to None, but if method == “nelder-mead”, options = {“adaptive” : True}

  • args (iterable, optional) – Arguments for user-specified loglikelihood function that calls the forward model. Defaults to None.

  • kwargs (dict, optional) – Keyword arguments for user-specified loglikelihood function that calls the forward model.

Returns

soln – Dictionary that contains the following keys: “thetaBest” : Best fit solution, “valBest” : function value at best fit solution, thetas : solution vector, vals : function values along solution, “nev” : number of forward model evaluations, aka number of iterations

Return type

dict

findMAP(theta0=None, method='nelder-mead', options=None, nRestarts=15)[source]

Find the maximum a posteriori (MAP) estimate of the function learned by the GP. To find the MAP, this function minimizes -mean predicted by the GP, aka finds what the GP believes is the maximum of whatever function is definded by self._lnlike + self._lnprior.

Note: MAP estimation typically work better when fitAmp = True, that is the GP kernel fits for an amplitude term.

Parameters
  • theta0 (iterable) – Initial guess. Defaults to a sample from the prior function.

  • method (str, optional) – scipy.optimize.minimize method. Defaults to powell.

  • options (dict, optional) – kwargs for the scipy.optimize.minimize function. Defaults to None.

  • nRestarts (int, optional) – Number of times to restart the optimization. Defaults to 15.

Returns

  • MAP (iterable) – maximum a posteriori estimate

  • MAPVal (float) – Mean of GP predictive function at MAP solution

findNextPoint(theta0=None, computeLnLike=True, seed=None, cache=True, gpOptions=None, gpP0=None, verbose=True, nGPRestarts=1, nMinObjRestarts=5, gpMethod='powell', minObjMethod='nelder-mead', minObjOptions=None, runName='apRun', numNewPoints=1, optGPEveryN=1, gpHyperPrior=<function defaultHyperPrior>, args=None, **kwargs)[source]

Find numNewPoints new point(s), thetaT, by maximizing utility function. Note that we call a minimizer because minimizing negative of utility function is the same as maximizing it.

This function can be used in 2 ways:
  1. Finding the new point(s), thetaT, that would maximally improve the GP’s predictive ability. This point could be used to select where to run a new forward model, for example.

  2. Find a new thetaT and evaluate the forward model at this location to iteratively improve the GP’s predictive performance, a core function of the BAPE and AGP algorithms.

If computeLnLike is True, all results of this function are appended to the corresponding object elements, e.g. thetaT appended to self.theta. thetaT is returned, as well as yT if computeLnLike is True. Note that returning yT requires running the forward model and updating the GP.

If numNewPoints > 1, iteratively find numNewPoints. After each new point is found, re-compute the GP covariance matrix. The GP hyperparameters are then optionally re-optimized at the specified cadence.

Parameters
  • theta0 (float/iterable, optional) – Initial guess for optimization. Defaults to None, which draws a sample from the prior function using sampleFn.

  • computeLnLike (bool, optional) – Whether or not to run the forward model and compute yT, the sum of the lnlikelihood and lnprior. Defaults to True. If True, also appends all new values to self.theta, self.y, in addition to returning the new values

  • seed (int, optional) – RNG seed. Defaults to None.

  • cache (bool, optional) – Whether or not to cache forward model input-output pairs. Defaults to True since the forward model is expensive to evaluate. In practice, users should cache forward model inputs, outputs, ancillary parameters, etc in each likelihood function evaluation, but saving theta and y here doesn’t hurt. Saves the results to apFModelCache.npz in the current working directory (name can change if user specifies runName).

  • optGPEveryN (int, optional) – How often to optimize the GP hyperparameters. Defaults to re-optimizing everytime a new design point is found, e.g. every time a new (theta, y) pair is added to the training set. Increase this parameter if approxposterior is running slowly. NB: GP hyperparameters are optimized only if computeLnLike == True

  • gpMethod (str, optional) – scipy.optimize.minimize method used when optimized GP hyperparameters. Defaults to None, which is powell, and it usually works.

  • gpOptions (dict, optional) – kwargs for the scipy.optimize.minimize function used to optimize GP hyperparameters. Defaults to None.

  • gpP0 (array, optional) – Initial guess for kernel hyperparameters. If None, defaults to np.random.randn for each parameter.

  • nGPRestarts (int, optional) – Number of times to restart GP hyperparameter optimization. Defaults to 1. Increase this number if the GP is not well-optimized.

  • nMinObjRestarts (int, optional) – Number of times to restart minimizing -utility function to select next point to improve GP performance. Defaults to 5. Increase this number of the point selection is not working well.

  • runName (str, optional) – Filename for hdf5 file where mcmc chains are saved. Defaults to apRun.

  • gpHyperPrior (str/callable, optional) – Prior function for GP hyperparameters. Defaults to the defaultHyperPrior fn. This function asserts that the mean must be negative and that each log hyperparameter is within the range [-20,20].

  • numNewPoints (int, optional) – Number of new points to find. Defaults to 1.

  • minObjMethod (str, optional) – scipy.optimize.minimize method used when optimizing utility functions for point selection. Defaults to nelder-mead.

  • minObjOptions (dict, optional) – kwargs for the scipy.optimize.minimize function used when optimizing utility functions for point selection. Defaults to None, but if method == “nelder-mead”, options = {“adaptive” : True}

  • verbose (bool, optional) – Output all the diagnostics? Defaults to True.

  • args (iterable, optional) – Arguments for user-specified loglikelihood function that calls the forward model. Defaults to None.

  • kwargs (dict, optional) – Keyword arguments for user-specified loglikelihood function that calls the forward model. Defaults to None.

Returns

  • thetaT (float or iterable) – New design point(s) selected by maximizing GP utility function.

  • yT (float or iterable, optional) – Value(s) of loglikelihood + logprior at thetaT. Only returned if computeLnLike == True

optGP(seed=None, method='powell', options=None, p0=None, nGPRestarts=1, gpHyperPrior=<function defaultHyperPrior>)[source]

Optimize hyperparameters of approx object’s GP

Parameters
  • seed (int, optional) – numpy RNG seed. Defaults to None.

  • nGPRestarts (int, optional) – Number of times to restart GP hyperparameter optimization. Defaults to 1. Increase this number if the GP is not well-optimized.

  • method (str, optional) – scipy.optimize.minimize method. Defaults to powell.

  • options (dict, optional) – kwargs for the scipy.optimize.minimize function. Defaults to None.

  • p0 (array, optional) – Initial guess for kernel hyperparameters. If None, defaults to np.random.randn for each parameter

  • gpHyperPrior (str/callable, optional) – Prior function for GP hyperparameters. Defaults to the defaultHyperPrior fn. This function asserts that the mean must be negative and that each log hyperparameter is within the range [-20,20].

Returns

optimizedGP

Return type

george.GP

run(m=10, nmax=2, seed=None, timing=False, verbose=True, mcmcKwargs=None, samplerKwargs=None, estBurnin=False, thinChains=False, runName='apRun', cache=True, gpMethod='powell', gpOptions=None, gpP0=None, optGPEveryN=1, nGPRestarts=1, nMinObjRestarts=5, onlyLastMCMC=False, initGPOpt=True, kmax=3, gpHyperPrior=<function defaultHyperPrior>, eps=1.0, convergenceCheck=False, minObjMethod='nelder-mead', minObjOptions=None, args=None, **kwargs)[source]

Core method to estimate the approximate posterior distribution via Gaussian Process regression

Parameters
  • m (int, optional) – Number of new design points to find each iteration. These are the points that are selected by maximizing the utility function, e.g. bape or agp, and sequentially added to the GP training set. Defaults to 10.

  • nmax (int, optional) – Maximum number of iterations. Defaults to 2. Algorithm will terminate if either nmax iterations is met or the convergence criterion is met if convergenceCheck is True.

  • seed (int, optional) – RNG seed. Defaults to None.

  • timing (bool, optional) – Whether or not to time the code for profiling/speed tests. Defaults to False.

  • verbose (bool, optional) – Output all the diagnostics? Defaults to True.

  • samplerKwargs (dict, optional) –

    Parameters for emcee.EnsembleSampler object If None, defaults to the following:

    nwalkersint, optional

    Number of emcee walkers. Defaults to 10 * dim

  • mcmcKwargs (dict, optional) –

    Parameters for emcee.EnsembleSampler.sample/.run_mcmc methods. If None, defaults to the following required parameters:

    iterationsint, optional

    Number of MCMC steps. Defaults to 10,000

    initial_statearray/emcee.State, optional

    Initial guess for MCMC walkers. Defaults to None and creates guess from priors.

  • estBurnin (bool, optional) – Estimate burn-in time using integrated autocorrelation time heuristic. Defaults to True. In general, we recommend users inspect the chains (note that approxposterior always at least saves the last sampler object, or all chains if cache = True) and calculate the burnin after the fact to ensure convergence.

  • thinChains (bool, optional) – Whether or not to thin chains before GMM fitting. Useful if running long chains. Defaults to True. If true, estimates a thin cadence via int(0.5*np.min(tau)) where tau is the intergrated autocorrelation time.

  • runName (str, optional) – Filename for hdf5 file where mcmc chains are saved. Defaults to apRun and will be saved as apRunii.h5 for ii in range(nmax).

  • cache (bool, optional) – Whether or not to cache MCMC chains, forward model input-output pairs, and GP kernel parameters. Defaults to True since they’re expensive to evaluate. In practice, users should cache forward model inputs, outputs, ancillary parameters, etc in each likelihood function evaluation, but saving theta and y here doesn’t hurt. Saves the forward model, results to runNameAPFModelCache.npz, the chains as runNameii.h5 for each, iteration ii, and the GP parameters in runNameAPGP.npz in the current working directory, etc.

  • gpMethod (str, optional) – scipy.optimize.minimize method used when optimized GP hyperparameters. Defaults to powell (it usually works)

  • gpOptions (dict, optional) – kwargs for the scipy.optimize.minimize function used to optimize GP hyperparameters. Defaults to None.

  • gpP0 (array, optional) – Initial guess for kernel hyperparameters. If None, defaults to np.random.randn for each parameter.

  • optGPEveryN (int, optional) – How often to optimize the GP hyperparameters. Defaults to re-optimizing everytime a new design point is found, e.g. every time a new (theta, y) pair is added to the training set. Increase this parameter if approxposterior is running slowly.

  • nGPRestarts (int, optional) – Number of times to restart GP hyperparameter optimization. Defaults to 1. Increase this number if the GP is not well-optimized.

  • nMinObjRestarts (int, optional) – Number of times to restart minimizing -utility function to select next point to improve GP performance. Defaults to 5. Increase this number of the point selection is not working well.

  • onlyLastMCMC (bool, optional) – Whether or not to only run the MCMC last iteration. Defaults to False.

  • initGPOpt (bool, optional) – Whether or not to optimize GP hyperparameters before 0th iteration. Defaults to True (aka assume user didn’t optimize GP hyperparameters)

  • gpHyperPrior (str/callable, optional) – Prior function for GP hyperparameters. Defaults to the defaultHyperPrior fn. This function asserts that the mean must be negative and that each log hyperparameter is within the range [-20,20].

  • eps (float, optional) – Change in the mean of the approximate marginal posterior distributions, relative to the previous marginal posterior distribution’s standard deviation (aka relative z score), for kmax iterations required for convergence. Defaults to 1.0.

  • kmax (int, optional) – Number of consecutive iterations for convergence check to pass before successfully ending algorithm. Defaults to 3.

  • convergenceCheck (bool, optional) – Whether or not to terminate the execution if the change in the mean of the approximate marginal posterior distributions, relative to the previous marginal posterior distribution’s standard deviation (aka relative z score) varies by less than eps for kmax consecutive iterations. Defaults to False. Note: if using this, make sure you’re confortable with the burnin and thinning applied to the MCMC chains. See estBurnin and thinChains parameters.

  • minObjMethod (str, optional) – scipy.optimize.minimize method used when optimizing utility functions for point selection. Defaults to nelder-mead.

  • minObjOptions (dict, optional) – kwargs for the scipy.optimize.minimize function used when optimizing utility functions for point selection. Defaults to None, but if method == “nelder-mead”, options = {“adaptive” : True}

  • args (iterable, optional) – Arguments for user-specified loglikelihood function that calls the forward model. Defaults to None.

  • kwargs (dict, optional) – Keyword arguments for user-specified loglikelihood function that calls the forward model.

runMCMC(samplerKwargs=None, mcmcKwargs=None, runName='apRun', cache=True, estBurnin=True, thinChains=True, verbose=False, args=None, **kwargs)[source]

Given forward model input-output pairs, theta and y, and a trained GP, run an MCMC using the GP to evaluate the logprobability instead of the true, computationally-expensive forward model.

Parameters
  • samplerKwargs (dict, optional) –

    Parameters for emcee.EnsembleSampler object If None, defaults to the following:

    nwalkersint, optional

    Number of emcee walkers. Defaults to 10 * dim

  • mcmcKwargs (dict, optional) –

    Parameters for emcee.EnsembleSampler.sample/.run_mcmc methods. If None, defaults to the following required parameters:

    iterationsint, optional

    Number of MCMC steps. Defaults to 10,000

    initial_statearray/emcee.State, optional

    Initial guess for MCMC walkers. Defaults to None and creates guess from priors.

  • runName (str, optional) – Filename prefix for all cached files, e.g. for hdf5 file where mcmc chains are saved. Defaults to runNameii.h5. where ii is the current iteration number.

  • cache (bool, optional) – Whether or not to cache MCMC chains, forward model input-output pairs, and GP kernel parameters. Defaults to True since they’re expensive to evaluate. In practice, users should cache forward model inputs, outputs, ancillary parameters, etc in each likelihood function evaluation, but saving theta and y here doesn’t hurt. Saves the forward model, results to runNameAPFModelCache.npz, the chains as runNameii.h5 for each, iteration ii, and the GP parameters in runNameAPGP.npz in the current working directory, etc.

  • estBurnin (bool, optional) – Estimate burn-in time using integrated autocorrelation time heuristic. Defaults to True. In general, we recommend users inspect the chains and calculate the burnin after the fact to ensure convergence, but this function works pretty well.

  • thinChains (bool, optional) – Whether or not to thin chains. Useful if running long chains. Defaults to True. If true, estimates a thin cadence via int(0.5*np.min(tau)) where tau is the intergrated autocorrelation time.

  • verbose (bool, optional) – Output all the diagnostics? Defaults to False.

  • args (iterable, optional) – Arguments for user-specified loglikelihood function that calls the forward model. Defaults to None.

  • kwargs (dict, optional) – Keyword arguments for user-specified loglikelihood function that calls the forward model.

Returns

  • sampler (emcee.EnsembleSampler) – emcee sampler object

  • iburn (int) – burn-in index estimate. If estBurnin == False, returns 0.

  • ithin (int) – thin cadence estimate. If thinChains == False, returns 1.