Batch Optimization in Bayesian Optimization: An Accessible Introduction Using Constant Liar as a Practical Example


In many real research and development workflows, experiments are not run strictly one by one. Teams often have enough equipment, time slots, or parallel capacity to test several candidates at once. This practical constraint creates a different optimization problem: instead of asking which single experiment should be run next, we need to ask which set of experiments should be run together. This is the role of batch optimization in Bayesian optimization. This article explains the idea in plain language, outlines the main strategy families used in practice, and uses Constant Liar as a concrete example of how a batch can be built before earlier results are known. The discussion is grounded in published literature rather than informal intuition alone [1-7].


Zhihan Xue

MI-6 Ltd. Machine Learning Researcher

Received a Ph.D. from Sophia University, with research focused on deep reinforcement learning. After graduation, joined MI-6 as a Machine Learning Researcher, working on machine learning and Bayesian optimization for materials and chemistry applications. Currently develops and validates optimization algorithms, benchmark frameworks, and supporting ML infrastructure for miHub®.

Why batch optimization matters

Bayesian optimization is widely used for expensive black-box problems: situations where each trial is costly, the search space is large, and there is no simple formula telling us where the optimum lies [1][2]. In these settings, the method uses past observations to make a more informed decision about what to try next.

In the simplest setup, the process is sequential. One candidate is selected, evaluated, and then the model is updated before the next decision is made. This works well when experiments must be run one at a time.

However, many practical workflows are parallel rather than purely sequential. A lab may be able to prepare multiple samples in the same round, a compute cluster may run many jobs at once, or a customer may prefer to receive several recommendations in a single cycle. The literature on batch and parallel Bayesian optimization was developed precisely to address this mismatch between a sequential decision rule and a parallel experimental reality [2-7].


Figure 1. A simple comparison between single-point optimization and batch optimization. In batch optimization, the key challenge is not only choosing good candidates, but choosing candidates that make sense together as one experimental round.

A brief intuitive view of Bayesian optimization

A simple way to explain Bayesian optimization is to imagine a researcher exploring an unfamiliar landscape. The researcher wants to find a good region quickly, but each experiment is expensive. Instead of trying everything, the researcher uses the data already collected to estimate where success is more likely and where uncertainty is still high.

This is why many introductions describe Bayesian optimization as balancing two goals: focusing on promising areas and still exploring uncertain areas [1]. For this article, the important point is not the mathematics behind that balance, but the practical consequence: the method is meant to help us spend a limited experimental budget more wisely.

What changes when we choose a batch

Once several experiments can be run in parallel, the decision problem changes. We are no longer asking only, “What is the next best point?” Instead, we are asking, “What is the next best set of points?”

That difference matters because the value of a batch is not simply the sum of individual scores. If the top five recommended candidates are all minor variations of one another, the batch may consume substantial resources while teaching us very little that is new. A useful batch therefore needs two things at once: good individual promise and reasonable diversity within the batch.

This issue appears throughout the batch Bayesian optimization literature. Researchers propose different ways to account for the interactions among points that are chosen together, because once one point is selected, later points in the same batch should not ignore that choice [3-7].


Figure 2. Why a batch is not just “the top five points.” The left panel shows a wasteful batch in which all selected points crowd into one local area. The right panel shows a more informative batch: still promising, but less redundant.

Main families of batch strategies

The literature contains many variants, but for a non-specialist audience it is helpful to group them into a few broad families.

Sequential simulation approaches

One common idea is to build the batch one point at a time. After the first candidate is selected, the algorithm temporarily pretends that some information about that candidate is already known. It then uses that temporary assumption to choose the second candidate, and repeats the process until the batch is full. Constant Liar belongs to this family [3][4].

Diversity-promoting approaches

Another family tries to reduce redundancy more directly. A representative example is local penalization, which discourages later points in the same batch from clustering too closely around earlier points. The goal is not diversity for its own sake, but a better spread across regions that still look useful [5].
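
As a rough illustration of the selection mechanics, the sketch below uses a simplified distance-based penalty. The Gaussian-shaped penalty and the fixed radius are illustrative assumptions made for this example; the published method derives its penalty from a Lipschitz estimate of the objective [5].

```python
import numpy as np

def penalized_batch(candidates, acq, q, radius=0.1):
    # Greedy batch selection: after each pick, down-weight the acquisition
    # of nearby candidates so later picks avoid crowding earlier ones.
    acq = np.asarray(acq, dtype=float).copy()
    batch = []
    for _ in range(q):
        i = int(np.argmax(acq))
        batch.append(candidates[i])
        d = np.abs(candidates - candidates[i])
        # Penalty is 0 at the picked point and approaches 1 far away,
        # so the next argmax is pushed out of the picked point's neighborhood.
        acq *= 1.0 - np.exp(-0.5 * (d / radius) ** 2)
    return np.array(batch)

# Toy acquisition with one dominant peak: a naive top-3 would crowd around 0.5.
cands = np.linspace(0.0, 1.0, 101)
scores = np.exp(-((cands - 0.5) / 0.2) ** 2)
batch = penalized_batch(cands, scores, q=3)
```

With the single-peaked score above, the three picks end up spread around the peak instead of stacking on top of it, which is exactly the redundancy reduction the method aims for.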

Joint batch scoring approaches

A more direct but often more demanding idea is to score the whole batch together rather than constructing it greedily. Multi-point expected improvement, often written as q-EI, is a well-known example. It is attractive because it matches the true batch problem more closely, but several papers note that optimizing such criteria can become computationally challenging as the batch grows [6].
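
Because the joint criterion rarely has a convenient closed form for larger batches, q-EI is often estimated by Monte Carlo over joint posterior samples. The sketch below assumes a minimization problem and treats the posterior mean and covariance of a fixed candidate pair as given; the correlation values are illustrative.

```python
import numpy as np

def mc_qei(mu, cov, best, n_samples=20000, seed=0):
    # Monte Carlo estimate of multi-point expected improvement (minimization):
    # draw joint posterior samples for the whole batch, and average the
    # improvement achieved by the best point within each draw.
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mu, cov, size=n_samples)
    return np.maximum(best - draws.min(axis=1), 0.0).mean()

# Two candidate pairs with identical marginals but different correlation.
mu = np.zeros(2)
best = 0.0
qei_redundant = mc_qei(mu, np.array([[1.0, 0.999], [0.999, 1.0]]), best)
qei_diverse = mc_qei(mu, np.eye(2), best)
```

In this toy comparison, the nearly duplicated pair scores little more than a single point would, while the independent pair is worth strictly more as a batch: precisely the interaction effect that per-point scoring ignores.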

Sampling-based parallel approaches

Another line of work uses sampling-based rules such as parallel Thompson sampling. These methods can be conceptually simple and naturally compatible with parallel execution, especially when evaluation times are variable or asynchronous execution matters [7].
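
A minimal sketch of the parallel selection step, assuming the posterior over a discrete candidate set is summarized by a mean vector and covariance matrix; in a full implementation each draw would come from the fitted surrogate, and the numbers below are illustrative.

```python
import numpy as np

def thompson_batch(candidates, mu, cov, q, seed=0):
    # One joint posterior draw per parallel worker; each worker simply
    # minimizes its own draw, so all q picks can be made simultaneously.
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mu, cov, size=q)
    return candidates[draws.argmin(axis=1)]

# Toy posterior over five candidates with a believed minimum at index 2.
cands = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
mu = np.array([1.0, 0.8, 0.2, 0.9, 1.1])
confident = thompson_batch(cands, mu, 1e-12 * np.eye(5), q=3)  # near-zero uncertainty
uncertain = thompson_batch(cands, mu, 0.5 * np.eye(5), q=3)    # broad uncertainty
```

When the posterior is confident, all workers converge on the same candidate; when it is uncertain, the random draws naturally spread the picks. Because the draws are independent given the surrogate, workers need not coordinate, which is what makes the approach a natural fit for asynchronous settings [7].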


Figure 3. Overview of four representative batch Bayesian optimisation strategies.
(a) Sequential simulation builds the batch greedily using placeholder values.
(b) Diversity-promoting penalises clustering. (c) Joint scoring evaluates the whole batch together.
(d) Sampling-based methods draw independent posterior samples in parallel.

Constant Liar 

Constant Liar is one of the best-known practical heuristics for building a batch in a step-by-step way. Its core idea is straightforward. After choosing the first candidate, the algorithm does not wait for the real result. Instead, it inserts a temporary placeholder value for that point, updates the model as if that value were already observed, and then picks the next candidate [3][4].

The name comes from the fact that the placeholder is not the real result. It is a deliberately invented stand-in. In the original line of work on parallel Kriging-based optimization, this family of heuristics was introduced as a way to avoid the intractability of exact batch calculations while still accounting for earlier choices in the same batch [3].

This gives Constant Liar a very practical interpretation: it is a structured approximation that makes sequential logic usable in a batch setting.
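
A minimal numerical sketch of this loop, assuming a one-dimensional minimization problem, a small Gaussian-process surrogate, and expected improvement as the acquisition function; the function names, kernel settings, and toy data here are all illustrative, not a reference implementation.

```python
import numpy as np
from math import erf, sqrt

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel on 1D inputs (unit prior variance).
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and standard deviation at Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

_norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))

def expected_improvement(mu, sd, best):
    # Expected improvement for minimization.
    z = (best - mu) / sd
    return (best - mu) * _norm_cdf(z) + sd * np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def constant_liar_batch(X, y, candidates, q, lie=None):
    # Greedily assemble a batch: after each pick, pretend the "lie" value
    # was already observed at that point, refit, and pick again.
    Xf, yf = X.copy(), y.copy()
    if lie is None:
        lie = y.min()  # lying with the best observed value ("CL-min")
    batch = []
    for _ in range(q):
        mu, sd = gp_posterior(Xf, yf, candidates)
        i = int(np.argmax(expected_improvement(mu, sd, yf.min())))
        batch.append(candidates[i])
        Xf = np.append(Xf, candidates[i])
        yf = np.append(yf, lie)
    return np.array(batch)

# Toy run: three observations of f(x) = (x - 0.3)^2, then a batch of three.
X = np.array([0.0, 0.5, 1.0])
y = (X - 0.3) ** 2
batch = constant_liar_batch(X, y, np.linspace(0.0, 1.0, 41), q=3)
```

Because each lie collapses the model's uncertainty at the picked point, the next pick is steered away from it, so the batch spreads out without any explicit diversity term. Different lie values (for example the best, worst, or mean observed value) tune how strongly this spreading occurs [3][4].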


Figure 4. Constant Liar as a workflow. The method greedily builds a batch by inserting a temporary value for each newly selected point so that the next choice is not made in complete ignorance of the earlier ones.

Why Constant Liar is still used

In this article, we present Constant Liar as a representative practical strategy for batch optimization, rather than as a universally preferred solution.

Even so, there are clear reasons why teams still use it. First, it is relatively easy to understand and explain. Second, it extends a familiar sequential workflow into a batch workflow without requiring a complete redesign. Third, the literature repeatedly treats it as a useful baseline or benchmark because it is simple, practical, and often effective enough for real experimental pipelines [4][6].

In our internal evaluations, Constant Liar was not uniformly the best-performing method in every scenario. Nevertheless, it showed consistently stable optimization behavior across a broad range of settings, including both low- and high-dimensional problems, different numbers of objectives, discrete and continuous parameter spaces, and functions with both smooth and steep response characteristics. This breadth of applicability is one of its key practical strengths. Even when another method outperforms it in a particular setting, Constant Liar tends to remain a reliable choice across diverse optimization conditions.

In other words, Constant Liar survives not because it is guaranteed to be universally best, but because it often offers a good engineering trade-off between simplicity, cost, and performance.

Where other methods may be preferable

Constant Liar is still an approximation. Its behavior depends on the placeholder value used during batch construction, and different choices can push the algorithm toward more conservative or more aggressive behavior [4].

If a team needs a method that accounts for the whole batch more directly, q-EI-based methods may be more attractive, although the computational cost can be higher [6]. If avoiding point clustering is the primary concern, diversity-oriented methods such as local penalization may be appealing [5]. If the workflow is highly asynchronous, sampling-based parallel methods can be especially attractive [7].

The larger lesson is that batch optimization is a family of design choices, not a single fixed recipe. 

Key takeaway

Batch optimization in Bayesian optimization is about making better use of each experimental round when several experiments can be run in parallel. The problem is not only to find good candidates, but to find a good group of candidates. Constant Liar is a classic example of how practitioners approximate that group decision in a simple and workable way, and the wider batch optimization literature shows why different workflows may prefer different strategies [1-7].

Appendix: A quick comparison of representative batch strategy families

| Strategy family | Simple intuition | Main strength | Typical trade-off |
|---|---|---|---|
| Sequential simulation (e.g., Constant Liar) | Pretend earlier batch points already have temporary results, then keep selecting | Easy to explain and implement | Uses an approximation, not the true outcomes |
| Diversity-promoting (e.g., local penalization) | Push later selections away from earlier ones | Reduces redundancy within a batch | Can spread points too widely if overdone |
| Joint batch scoring (e.g., q-EI) | Score the value of the whole batch together | Closer to the true batch decision | Can be more computationally demanding |
| Sampling-based parallel methods (e.g., parallel TS) | Use randomized posterior-guided choices in parallel | Works naturally with parallel and asynchronous settings | Behavior may be less straightforward to explain to non-specialists |

References

  1. Brochu, E., Cora, V. M., & de Freitas, N. (2010). A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. arXiv:1012.2599. https://arxiv.org/abs/1012.2599
  2. Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems 25. https://proceedings.neurips.cc/paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf
  3. Ginsbourger, D., Le Riche, R., & Carraro, L. (2008/2010). Kriging is Well-Suited to Parallelize Optimization / A Multi-points Criterion for Deterministic Parallel Global Optimization based on Gaussian Processes. Introduces Constant Liar and related heuristics for parallel EI-style selection. https://www.cs.ubc.ca/labs/algorithms/EARG/stack/2010_CI_Ginsbourger-ParallelKriging.pdf
  4. Azimi, J., Fern, A., & Fern, X. (2012). Hybrid Batch Bayesian Optimization. ICML 2012. Discusses Constant Liar as a representative batch heuristic and analyzes simulated sequential batch policies. https://icml.cc/2012/papers/614.pdf
  5. González, J., Dai, Z., Lawrence, N. D., & Hennig, P. (2016). Batch Bayesian Optimization via Local Penalization. AISTATS 2016. https://proceedings.mlr.press/v51/gonzalez16a.html
  6. Wang, J., Clark, S. C., Liu, E., & Frazier, P. I. (2020). Parallel Bayesian Global Optimization of Expensive Functions. Operations Research, 68(2),  1-21. Compares q-EI-style optimization with Constant Liar baselines and discusses computational trade-offs. https://pubsonline.informs.org/doi/10.1287/opre.2019.1966
  7. Kandasamy, K., Krishnamurthy, A., Schneider, J., & Póczos, B. (2018). Parallelised Bayesian Optimisation via Thompson Sampling. AISTATS 2018. https://proceedings.mlr.press/v84/kandasamy18a.html