Hello Qualification Set!
Nicholai Stålung is a lead data scientist at Trifork who's spawned and led multiple data science and machine learning teams.
Deploying Machine Learning to production is hard for several reasons. One reason is data drift. That is when the training and testing set no longer correspond to the world we have modeled. As engineers and statisticians, we tend to input and output validate through logic and statistics to account for drift and outliers. However, these implementations imply that a model is running in production. Thus, we are taking the risk of deploying models without understanding their implications.
Nicholai, therefore, proposes a fourth data set definition: the qualification set. The qualification set is different from the testing and validation sets. It is not defined by the real-world data distributions but instead of tails, corner cases, and testable observations that are derived through domain knowledge and curiosity. Before deployment, the purpose of the qualification set is to assert that the machine learning system will qualify to all imaginable scenarios.
-
War is Peace, Freedom is Slavery, Ignorance is Strength, Scrum is AgileAllen HolubWednesday Nov 10 @ 16:10
-
Is Software Engineering Still an Oxymoron?Alan KayMonday Nov 8 @ 09:30
-
The Future of FlightAnita SenguptaMonday Nov 8 @ 13:50
-
Continuous Delivery Pipelines: How to Build Better SW FasterDave FarleyWednesday Nov 10 @ 16:10
-
An Average Working Day on Visionary NASA ProjectsKenneth Harris IIMonday Nov 8 @ 16:10
-
The Worst Programming Language EverMark RendleTuesday Nov 9 @ 19:00