The Lindahl Letter

Bayesian optimization (ML syllabus edition 1/8)

Part 1 of 8 in the ML syllabus series

You might remember the Substack post from week 57 titled, “How would I compose an ML syllabus?” We have now reached the point in the program where you will receive 8 straight Substack posts that together form what I would hand somebody as an introduction to machine learning syllabus. We are going to begin to address the breadth and depth of the field of machine learning. Keep in mind that machine learning is widely considered just a small slice of the totality of artificial intelligence research; as an analogy, machine learning is one slice of bread in the loaf that is artificial intelligence. I seriously entertained the idea of organizing the previous 79 posts into a syllabus-based format for maximum delivery efficiency. That idea gave way quickly, as it would be visually and topically overwhelming, which is the opposite of how this content needs to be presented. Let’s take this in the direction it was originally intended to go. To that end, let’s return to the framework I laid out during the week 57 writing process. My very high-level introduction to the creation of a machine learning syllabus from week 57, published on February 25, 2022, centers on 8 core topics:

  • Week 80: Bayesian optimization (ML syllabus edition 1/8)

  • Week 81: A machine learning literature review (ML syllabus edition 2/8)

  • Week 82: ML algorithms (ML syllabus edition 3/8)

  • Week 83: Neural networks (ML syllabus edition 4/8)

  • Week 84: Reinforcement learning (ML syllabus edition 5/8)

  • Week 85: Graph neural networks (ML syllabus edition 6/8)

  • Week 86: Neuroscience (ML syllabus edition 7/8)

  • Week 87: Ethics (fairness, bias, privacy) (ML syllabus edition 8/8)

That is what we are going to cover. At the end of the process, I’ll have a first pass at an introduction to machine learning syllabus. My version is annotated and includes some narrative, in contrast to a pure outline-based syllabus. Bringing foundational content together is an important part of building this collection. Simply describing the current edge of the field of machine learning would produce something that is only current for a moment and fades away as the technology frontier advances. Instead, it will be better to build a strong foundation that supports the groundwork necessary to move from introductory to advanced machine learning. Yes, you might have caught from that last sentence that at some point I’ll need to write the next syllabus as a companion to this one. Stay tuned for a future advanced machine learning syllabus to go along with this introduction to machine learning edition. Enough overview; it’s time to get started…

Introduction to ML: Bayesian optimization (Lecture 1 of 8)

I remember digging into Armstrong’s “Principles of Forecasting” book, which was published back in 2001 [1]. You can get a paper copy or find it online for a lot less than the $429 Springer wants for the eBook. I thought that price was a typo at first, but it does not appear to be; it’s just another example of how publishers are confused about how much academic work should cost for students to read. Within that weighty tome you can find coverage of Bayesian pooling, which people have used for forecasting analogous time series. That bit of mathematics is where my thoughts always wander when considering Bayesian optimization. I have spent a lot of time researching machine learning, and I really do believe most of the statistical foundations you would need to understand the field can be found in “Principles of Forecasting: A Handbook for Researchers and Practitioners.”

I do not think you should pay $429 for it, but it is a wonderful book. Keep in mind that the book does not mention machine learning at all. It is from 2001 and does not really consider how forecasting tools would be extended within the field of machine learning. A lot of machine learning use cases are based on observation and prediction, which is pretty much the heart of the mathematics of forecasting. You need to understand the foundations of the statistical paradigm that Thomas Bayes introduced back in the 1700s. The outcome of that journey is that we are about to work toward inferring things; yes, at this point in the journey we are about to work on inference.
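For readers who want the formula in front of them, Bayes’ rule (stated here in generic notation, not drawn from the book) relates the posterior belief about a hypothesis θ after seeing data D to the prior and the likelihood:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}
```

Bayesian optimization leans on the same idea: it maintains a probabilistic belief about an unknown objective function and updates that belief after every evaluation.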

You could move directly to the point and examine Peter Frazier’s 2018 “A Tutorial on Bayesian Optimization” paper [2]. You may want to extend that analysis to figure out all the connected papers [3]. Instead of wandering off into the vast collection of papers connected to that one, I started to wonder about a very different set of questions. You may have wondered as well whether Bayesian optimization is an equation. Within the field of machine learning it is treated more like an algorithm, and people typically invoke it from previously coded efforts. Generally, people in the field do not do the math themselves; you are going to see a whole lot of extending things that were developed as part of a package or framework. Applied Bayesian optimization falls into that format of delivery and application without question.
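To make that point concrete, here is a minimal sketch of the loop most Bayesian optimization packages implement under the hood: fit a Gaussian process surrogate to the points observed so far, score candidate points with an acquisition function (expected improvement in this case), and evaluate the most promising candidate. The one-dimensional objective, candidate grid, and iteration count are illustrative placeholders rather than anything pulled from a specific library.

```python
# Minimal Bayesian optimization loop: surrogate model + acquisition function.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Stand-in for an expensive black-box function (e.g., validation loss).
    return np.sin(3 * x) + 0.1 * x ** 2

candidates = np.linspace(-3, 3, 400).reshape(-1, 1)   # search space
X = np.array([[-2.0], [0.5], [2.5]])                  # initial design points
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)

for _ in range(10):
    gp.fit(X, y)                                      # refit the surrogate
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    # Expected improvement for minimization.
    improvement = best - mu
    z = improvement / np.maximum(sigma, 1e-9)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    # Evaluate the candidate with the highest expected improvement.
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("Best x found:", X[np.argmin(y)], "objective:", y.min())
```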

The rest of this lecture on Bayesian optimization consists of three parts. First, 3 different videos you could watch. Second, 3 papers you could read to really dig into the subject and start to flesh out your own research path. Third, an introduction to where you would find this type of effort expressed in code. Between those 3 areas of consideration you can take your understanding of Bayesian optimization to the next level.

3 solid video explanations:

“Bayesian Optimization - Math and Algorithm Explained”

“Bayesian Optimization (Bayes Opt): Easy explanation of popular hyperparameter tuning method”

“Machine learning - Bayesian optimization and multi-armed bandits”

3 highly cited papers for background:

Pelikan, M., Goldberg, D. E., & Cantú-Paz, E. (1999, July). BOA: The Bayesian optimization algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99) (Vol. 1, pp. 525-532).

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.467.8687&rep=rep1&type=pdf 

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & De Freitas, N. (2015). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148-175.

https://ieeexplore.ieee.org/abstract/document/7352306 

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in neural information processing systems, 25.

https://proceedings.neurips.cc/paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf 

Where would you find the code for this?

TensorFlow:

https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html

Keras:

https://github.com/keras-team/keras-tuner 

Scikit-optimize (a scikit-learn compatible library; a short sketch follows this list):

https://scikit-optimize.github.io/stable/

A Google Colab notebook:

https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/master/bayesian_optimization.ipynb 

The base GitHub repository for the above Google Colab notebook:

https://github.com/krasserm/
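As a concrete companion to the scikit-optimize link above, here is a short sketch of what “invoking it from a package” looks like in practice. The objective function and search bounds are placeholder examples; gp_minimize is the library’s standard entry point for Gaussian-process-based optimization.

```python
# Calling Bayesian optimization from scikit-optimize instead of hand-rolling it.
from skopt import gp_minimize

def objective(params):
    # Stand-in for an expensive evaluation, e.g. training a model with a
    # given learning rate and reporting its validation loss.
    learning_rate = params[0]
    return (learning_rate - 0.01) ** 2  # pretend 0.01 is the best setting

result = gp_minimize(
    objective,
    dimensions=[(1e-4, 1e-1)],  # search bounds for the single parameter
    n_calls=20,                 # number of objective evaluations
    random_state=0,
)

print("Best learning rate:", result.x[0], "objective value:", result.fun)
```

The Keras Tuner and TensorFlow links above wrap this same pattern around hyperparameter search for neural networks.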

Closing out this lecture on Bayesian optimization requires a general bit of caution about the mathematics of machine learning. A lot of very complex mathematics, including statistical machinery, is available to you within the machine learning space. Working toward a solid general understanding of what the underlying methods (especially the statistical methods) are doing is really important as a foundation for your future work. It is easy to let the software pick up the slack and simply report outputs. Relying purely on that type of effort leaves room for problematic breakdowns in the underlying mathematics to go unnoticed. You may very well get the outcome you wanted, but it is not explainable or repeatable in any way, shape, or form. Yes, I’m willing to accept that the majority of people working within the machine learning space could not take a step back and express their work in a purely mathematical way by abstracting the code into equations. That type of pure mathematical explanation by equation is not generally required in papers or readouts. Most of the time it comes down to the simple truth of working in production.

Footnotes:

[1] https://link.springer.com/book/10.1007/978-0-306-47630-3 

[2] https://arxiv.org/pdf/1807.02811.pdf 

[3] https://www.connectedpapers.com/main/c27078d60737ea10e8ca4f05acd114fef29c8276/graph 

What’s next for The Lindahl Letter?

  • Week 81: Deep learning (ML syllabus edition 2/8)

  • Week 82: ML algorithms (ML syllabus edition 3/8)

  • Week 83: Neural networks (ML syllabus edition 4/8)

  • Week 84: Reinforcement learning (ML syllabus edition 5/8)

  • Week 85: Graph neural networks (ML syllabus edition 6/8)

  • Week 86: Neuroscience (ML syllabus edition 7/8)

  • Week 87: Ethics (fairness, bias, privacy) (ML syllabus edition 8/8)

I’ll try to keep the “What’s next” list forward-looking with at least five weeks of posts in planning or review. If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.
