Google AI Introduces ‘Uncertainty Baselines’, a Library for Uncertainty and Robustness in Deep Learning


Machine learning is increasingly used in a wide range of real-world applications such as image and speech recognition, self-driving cars, and medical diagnostics. It is therefore essential to understand its behavior and performance in practice. High-quality estimates of uncertainty and robustness are crucial for many of these applications, especially in deep learning.

To address this problem and make model behavior easier to study, Google researchers introduced Uncertainty Baselines: a collection of high-quality implementations of standard and state-of-the-art deep learning methods for a variety of tasks of interest. The collection covers nineteen methods across nine tasks, each evaluated on at least five metrics.

A baseline, in general, is a reasonable, well-defined starting point for comparison studies. Each baseline in the collection is a standalone experiment pipeline with reusable and easily extensible components. The pipelines are implemented in TensorFlow, PyTorch, and JAX, with minimal dependencies outside the chosen framework. The hyperparameters of each baseline have been tuned over many iterations to deliver strong results.

In this release, Uncertainty Baselines provides 83 baselines spanning 19 methods, including recent strategies such as BatchEnsemble, Deep Ensembles, and Rank-1 Bayesian Neural Networks, and acts as a successor to several earlier benchmarks in the community by merging them. Each baseline's hyperparameters are tuned to maximize performance on a given set of metrics.
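To make one of these methods concrete, the following is a minimal sketch of the Deep Ensembles idea mentioned above: several independently trained models are combined by averaging their predictive distributions, and uncertainty scores such as predictive entropy can then be read off the averaged distribution. The logits here are random stand-ins, not outputs of the library's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from 4 independently trained ensemble members
# on a batch of 3 examples with 5 classes.
member_logits = rng.normal(size=(4, 3, 5))

# Deep Ensembles: average the per-member predictive distributions.
member_probs = softmax(member_logits)
ensemble_probs = member_probs.mean(axis=0)

# Predictive entropy of the averaged distribution is one common
# uncertainty score derived from an ensemble.
entropy = -(ensemble_probs * np.log(ensemble_probs)).sum(axis=-1)
```

Disagreement between ensemble members spreads the averaged distribution out, so higher entropy signals inputs the ensemble is less certain about.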


The baselines vary along three different axes:

  • Base models: e.g., simple fully connected networks.
  • Training datasets: the data used to train the model.
  • Evaluation metrics: predictive metrics such as precision, uncertainty metrics such as calibration error, and computational metrics such as inference latency.
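To illustrate the uncertainty axis, below is a minimal sketch of one such metric, expected calibration error (ECE), which measures the gap between a model's confidence and its actual accuracy. The equal-width binning scheme here is illustrative, not the library's implementation.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: per-bin |accuracy - confidence| gap, weighted by bin size."""
    confidences = probs.max(axis=-1)
    predictions = probs.argmax(axis=-1)
    correct = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: all predictions are correct, but the model reports
# confidences below 1.0, so the calibration error is nonzero.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
labels = np.array([0, 0, 1])
print(expected_calibration_error(probs, labels))
```

A well-calibrated model is right about 80% of the time when it reports 80% confidence; ECE summarizes deviations from that ideal in a single number.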

To make these baselines easy to use, they are deliberately kept as minimal and modular as possible. Rather than building new class abstractions, existing ones are reused. The training/evaluation pipeline for each experiment is contained in a standalone Python file, which keeps the baselines independent of one another, and can be written in TensorFlow, PyTorch, or JAX. Hyperparameters and other experiment configuration values are managed with simple Python flags defined using Abseil.
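The Abseil-flags pattern described above looks roughly like the sketch below. The flag names and defaults are illustrative, not those used by Uncertainty Baselines itself.

```python
# Minimal sketch of configuring an experiment with Abseil flags.
from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_float("learning_rate", 0.1, "Optimizer learning rate.")
flags.DEFINE_integer("train_epochs", 90, "Number of training epochs.")
flags.DEFINE_string("output_dir", "/tmp/model", "Where to write checkpoints.")

def main(argv):
    del argv  # Unused.
    # A real pipeline would build the model and run training here.
    print(f"Training for {FLAGS.train_epochs} epochs "
          f"at learning rate {FLAGS.learning_rate}.")

if __name__ == "__main__":
    app.run(main)
```

Because each baseline is a standalone file with its own flags, it can be launched directly, e.g. `python baseline.py --learning_rate=0.05`, without touching any shared configuration machinery.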

In the future, the researchers aim to publish the hyperparameter tuning results and final model checkpoints to make the baselines fully reproducible. Because the repository has already undergone extensive hyperparameter tuning, other researchers can use it without retraining or retuning. The researchers hope this will eliminate the minor differences in pipeline implementations that tend to distort baseline comparisons, and they encourage the community to contribute new methods to the repository.



