Google Vizier is the de facto system for black-box optimization of objective functions and hyperparameters across Google, having serviced some of Google’s largest research efforts and optimized a wide range of products (e.g., Search, Ads, YouTube). For research, it has not only reduced language model latency for users, designed computer architectures, accelerated hardware, assisted protein discovery, and improved robotics, but also provided a reliable backend interface for users to search for neural architectures and develop reinforcement learning algorithms. To operate at the scale of optimizing thousands of users’ critical systems and tuning millions of machine learning models, Google Vizier solved key design challenges in supporting diverse use cases and workflows, while remaining highly fault-tolerant.
Today we are pleased to announce Open Source (OSS) Vizier (with an accompanying systems whitepaper published at the AutoML Conference 2022), a standalone Python package based on Google Vizier. OSS Vizier is designed for two main purposes: (1) managing and optimizing experiments at scale in a reliable and distributed manner for users, and (2) developing and benchmarking algorithms for automated machine learning (AutoML) researchers.
System design
OSS Vizier works by having a server provide services, namely the optimization of black-box objectives, or functions, from multiple clients. In the main workflow, a client sends a remote procedure call (RPC) and asks for a suggestion (i.e., a proposed input for the client’s black-box function), from which the service spawns a worker to launch an algorithm (i.e., a Pythia policy) that computes the next suggestions. The suggestions are then evaluated by clients to form their corresponding objective values and measurements, which are sent back to the service. This pipeline is repeated multiple times to form a complete tuning trajectory.
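The suggest, evaluate, and report loop described above can be sketched in a few lines. This is a minimal in-process illustration, not the OSS Vizier API: the names `ToyService`, `RandomSearchPolicy`, `Trial`, `blackbox`, and `run_study` are hypothetical, plain method calls stand in for RPCs, and random search stands in for whatever algorithm the service would actually launch.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Trial:
    """One suggestion and, once evaluated, its objective value."""
    trial_id: int
    parameters: Dict[str, float]
    objective: Optional[float] = None


class RandomSearchPolicy:
    """Hypothetical stand-in for an algorithm launched by the service."""

    def __init__(self, bounds: Dict[str, Tuple[float, float]], seed: int = 0):
        self._bounds = bounds
        self._rng = random.Random(seed)

    def suggest(self, history: List[Trial]) -> Dict[str, float]:
        # Random search ignores the history; a real policy would use it.
        return {name: self._rng.uniform(low, high)
                for name, (low, high) in self._bounds.items()}


class ToyService:
    """In-process stand-in for the server; method calls replace RPCs."""

    def __init__(self, policy: RandomSearchPolicy):
        self._policy = policy
        self.trials: List[Trial] = []

    def suggest(self) -> Trial:
        trial = Trial(len(self.trials) + 1, self._policy.suggest(self.trials))
        self.trials.append(trial)
        return trial

    def complete(self, trial_id: int, objective: float) -> None:
        self.trials[trial_id - 1].objective = objective


def blackbox(params: Dict[str, float]) -> float:
    """The client's objective (minimized); the service never sees this code."""
    return (params["x"] - 1.0) ** 2 + (params["y"] + 2.0) ** 2


def run_study(num_trials: int = 50) -> ToyService:
    bounds = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}
    service = ToyService(RandomSearchPolicy(bounds))
    for _ in range(num_trials):
        trial = service.suggest()                # client asks for a suggestion
        value = blackbox(trial.parameters)       # client evaluates it locally
        service.complete(trial.trial_id, value)  # client reports the result
    return service


if __name__ == "__main__":
    study = run_study()
    best = min(study.trials, key=lambda t: t.objective)
    print(f"best objective {best.objective:.3f} at {best.parameters}")
```

The point of the sketch is the division of labor: the service only ever sees parameters and reported values, so the client's evaluation can be anything from a training job to a lab measurement.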
The use of the ubiquitous gRPC library, which is compatible with most programming languages, such as C++ and Rust, allows maximum flexibility and customization: users can write their own custom clients and even algorithms outside of the default Python interface. Since the entire process is saved to an SQL datastore, a smooth recovery is ensured after a crash, and usage patterns can be stored as valuable datasets for research into meta-learning and multitask transfer-learning methods such as the OptFormer and HyperBO.
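To illustrate why persisting every step makes crash recovery straightforward, here is a small sketch using Python's built-in `sqlite3` module. The `TrialStore` class, its table schema, and the `demo` function are assumptions made for this example and do not reflect OSS Vizier's actual datastore layout.

```python
import json
import os
import sqlite3
import tempfile
from typing import Dict, List, Tuple


class TrialStore:
    """Persists trials to SQLite so a restarted process can resume a study."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS trials ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " parameters TEXT NOT NULL,"
            " objective REAL)"
        )
        self._conn.commit()

    def add_suggestion(self, parameters: Dict[str, float]) -> int:
        cursor = self._conn.execute(
            "INSERT INTO trials (parameters) VALUES (?)",
            (json.dumps(parameters),),
        )
        self._conn.commit()
        return cursor.lastrowid

    def complete(self, trial_id: int, objective: float) -> None:
        self._conn.execute(
            "UPDATE trials SET objective = ? WHERE id = ?",
            (objective, trial_id),
        )
        self._conn.commit()

    def pending(self) -> List[Tuple[int, Dict[str, float]]]:
        # Trials that were suggested but never reported back.
        rows = self._conn.execute(
            "SELECT id, parameters FROM trials"
            " WHERE objective IS NULL ORDER BY id"
        )
        return [(row[0], json.loads(row[1])) for row in rows]

    def completed(self) -> List[Tuple[int, Dict[str, float], float]]:
        rows = self._conn.execute(
            "SELECT id, parameters, objective FROM trials"
            " WHERE objective IS NOT NULL ORDER BY id"
        )
        return [(row[0], json.loads(row[1]), row[2]) for row in rows]

    def close(self) -> None:
        self._conn.close()


def demo(path: str):
    store = TrialStore(path)
    first = store.add_suggestion({"x": 0.5})
    store.add_suggestion({"x": 1.5})
    store.complete(first, 0.25)
    store.close()  # simulate the process going away mid-study

    recovered = TrialStore(path)  # a fresh process reopens the same file
    pending, completed = recovered.pending(), recovered.completed()
    recovered.close()
    return pending, completed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pending, completed = demo(os.path.join(tmp, "study.db"))
    print("pending:", pending)
    print("completed:", completed)
```

After the simulated restart, the unfinished trial can be handed out again and the finished one remains available as history for the algorithm, or later as data for meta-learning research.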
Usage
Because OSS Vizier is built as a service, in which clients can send requests to the server at any point in time, it is designed for a broad range of scenarios: the budget of evaluations, or trials, can range from tens to millions, and the evaluation latency can range from seconds to weeks. Evaluations can be done asynchronously (e.g., tuning an ML model) or in synchronous batches (e.g., wet lab settings involving multiple simultaneous experiments). Furthermore, evaluations may fail due to transient errors and be retried, or may fail due to persistent errors (e.g., the evaluation is impossible) and should not be retried.
This broadly supports a variety of applications, including hyperparameter tuning of deep learning models and optimizing non-computational objectives, which can be, for example, physical, chemical, biological, mechanical, or even human-evaluated, such as cookie recipes.
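The distinction between transient and persistent failures can be made concrete with a small client-side sketch. The exception classes, the status strings, and the `evaluate_with_retries` helper below are hypothetical names chosen for illustration; they are not part of OSS Vizier.

```python
from typing import Callable, Dict, Optional, Tuple


class TransientError(Exception):
    """The evaluation failed for a reason that may not recur (e.g., preemption)."""


class InfeasibleError(Exception):
    """The evaluation can never succeed for these parameters."""


def evaluate_with_retries(
    evaluate: Callable[[Dict[str, float]], float],
    parameters: Dict[str, float],
    max_attempts: int = 3,
) -> Tuple[str, Optional[float]]:
    """Returns ("completed", value), ("infeasible", None), or ("abandoned", None)."""
    for _ in range(max_attempts):
        try:
            return "completed", evaluate(parameters)
        except TransientError:
            continue  # same parameters, try again
        except InfeasibleError:
            return "infeasible", None  # retrying cannot help
    return "abandoned", None  # retry budget exhausted


def make_flaky_objective(failures_before_success: int):
    """Builds an objective that fails transiently a fixed number of times."""
    remaining = {"count": failures_before_success}

    def objective(params: Dict[str, float]) -> float:
        if params["x"] < 0:
            raise InfeasibleError("x must be non-negative")
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TransientError("worker preempted")
        return params["x"] ** 0.5

    return objective


if __name__ == "__main__":
    print(evaluate_with_retries(make_flaky_objective(2), {"x": 4.0}))
    print(evaluate_with_retries(make_flaky_objective(0), {"x": -1.0}))
```

Reporting an infeasible outcome separately from a retryable one lets an algorithm learn to avoid regions where evaluation is impossible instead of repeatedly suggesting them.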
Integrations, algorithms and benchmarks
Because Google Vizier is heavily integrated with many of Google’s internal frameworks and products, OSS Vizier will naturally be heavily integrated with many of Google’s open-source and external frameworks. Most notably, OSS Vizier will serve as a distributed backend for PyGlove to enable large-scale evolutionary searches over combinatorial primitives such as neural architectures and reinforcement learning algorithms. Additionally, OSS Vizier shares the same client-based API with Vertex Vizier, allowing users to quickly switch between open-source and production-quality services.
For AutoML researchers, OSS Vizier is also equipped with a useful collection of algorithms and benchmarks (i.e., objective functions) unified under common APIs for evaluating the strengths and weaknesses of proposed methods. In particular, via TensorFlow Probability, researchers can now use the JAX-based Gaussian Process Bandit algorithm, which is based on the default algorithm in Google Vizier that tunes internal users’ objectives.
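As a rough illustration of what a common interface buys researchers, the sketch below runs two toy algorithms against the same benchmark objective through one shared function signature. Everything here is an assumption for illustration: the `Suggester` signature, the `sphere` benchmark, and both algorithms are simple stand-ins, not the algorithms or benchmark APIs that ship with OSS Vizier.

```python
import random
from typing import Callable, List, Tuple

Point = List[float]
History = List[Tuple[Point, float]]
# Shared interface: given past (point, value) pairs, propose the next point.
Suggester = Callable[[History, random.Random], Point]

DIM, LOW, HIGH = 3, -5.0, 5.0


def sphere(x: Point) -> float:
    """Classic benchmark objective; minimum 0 at the origin."""
    return sum(v * v for v in x)


def random_search(history: History, rng: random.Random) -> Point:
    return [rng.uniform(LOW, HIGH) for _ in range(DIM)]


def local_perturbation(history: History, rng: random.Random) -> Point:
    """Perturbs the best point so far; falls back to random at the start."""
    if not history:
        return random_search(history, rng)
    best, _ = min(history, key=lambda item: item[1])
    return [min(HIGH, max(LOW, v + rng.gauss(0.0, 0.5))) for v in best]


def best_so_far_curve(
    suggest: Suggester,
    objective: Callable[[Point], float],
    num_trials: int,
    seed: int,
) -> List[float]:
    """Runs one algorithm on one benchmark and records the running minimum."""
    rng = random.Random(seed)
    history: History = []
    curve: List[float] = []
    for _ in range(num_trials):
        x = suggest(history, rng)
        history.append((x, objective(x)))
        curve.append(min(value for _, value in history))
    return curve


if __name__ == "__main__":
    for name, algorithm in [("random_search", random_search),
                            ("local_perturbation", local_perturbation)]:
        curve = best_so_far_curve(algorithm, sphere, num_trials=100, seed=0)
        print(f"{name}: best objective after 100 trials = {curve[-1]:.4f}")
```

Because both algorithms satisfy the same signature, swapping in a new method or a new objective requires no change to the harness, which is the property that makes side-by-side comparisons cheap.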
Resources and future direction
We provide links to the codebase, documentation, and systems whitepaper. We plan to allow user contributions, especially in the form of algorithms and benchmarks, and to further integrate with the open-source AutoML ecosystem. Going forward, we hope to see OSS Vizier become a core tool for expanding research and development on black-box optimization and hyperparameter tuning.
Thanks
OSS Vizier was developed by members of the Google Vizier team in collaboration with the TensorFlow Probability team: Setareh Ariafar, Lior Belenki, Emily Fertig, Daniel Golovin, Tzu-Kuo Huang, Greg Kochanski, Chansoo Lee, Sagi Perel, Adrian Reyes, Xingyou (Richard) Song and Richard Zhang.
In addition, we thank Srinivas Vasudevan, Jacob Burnim, Brian Patton, Ben Lee, Christopher Suter, and Rif A. Saurous for additional TensorFlow Probability integrations, Daiyi Peng and Yifeng Lu for PyGlove integrations, Hao Li for Vertex/Cloud integrations, Yingjie Miao for AutoRL integrations, Tom Hennigan, Varun Godbole, Pavel Sountsov, Alexey Volkov, Mihir Paradkar, Richard Belleville, Bu Su Kim, Vytenis Sakenas, Yujin Tang, Yingtao Tian, and Yutian Chen for open-source code and infrastructure help, and George Dahl, Aleksandra Faust, Claire Cui, and Zoubin Ghahramani for discussions.
Finally, thanks to Tom Small for designing the animation for this post.