When building machine learning models, we want to keep error as low as possible. Two major sources of error are bias and variance. If we manage to reduce both, we can build more accurate models.
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something?
In this post, we'll learn how to answer both of these questions using learning curves. We'll work with a real-world data set and try to predict the electrical energy output of a power plant.
We'll generate learning curves while trying to predict the electrical energy output of a power plant. Image source: Pexels.
Some familiarity with scikit-learn and machine learning theory is assumed. If you don't frown when I say cross-validation or supervised learning, then you're good to go. If you're new to machine learning and have never tried scikit-learn, a good place to start is this blog post.
We begin with a brief introduction to bias and variance.