MLFlow Primer

After pip install mlflow, you can track the runs of your code by inserting a few extra lines. The following is not a full tutorial, of course, just a quick look at the basics to convince you that it’s easy to work with.

MLFlow is organized into ‘experiments’, which are essentially just collections of runs. One run is one execution of your code. MLFlow tracks some metadata automatically (such as the start time of the run and the user who launched it), and in addition you can store basically whatever you want in a run. MLFlow uses a number of concepts to separate this information logically, and displays each of them differently: ‘parameters’ (inputs), ‘metrics’ (outputs), ‘tags’ (labels) and ‘artifacts’ (files).

Once your runs are stored, you can view them either through the UI or the API. We won’t use the UI in this guide, because we need to access the stored runs programmatically through the API, but the UI is very useful and trivial to run (check out the MLFlow docs).

The skeleton of the MLFlow code to be inserted basically looks like this:

import mlflow

with mlflow.start_run():
    mlflow.log_param('param_1', 3.14)
    mlflow.log_metric('answer', 42)

Below is a more elaborate and realistic example (note that not all helper functions are shown).

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn

# set up some parameters for my code
svc_pars = dict(kernel='rbf', random_state=0, gamma=.10, C=1.0)
knn_pars = dict(n_neighbors=5, p=2, metric='minkowski')
algo = 'knn'

# some free text that you can save with a run
notes = "I think a knn will work better"
# you can define your own tags as well. In this case, 
# I'm reminding myself that this is not a serious run (but a test for example)
tags = {"valid": False} 
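# set location to save the run data and the name of my experiment (= grouping of runs)
# (done with mlflow.set_tracking_uri(...) and mlflow.set_experiment(...); the values are not shown here)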

run_name = f'iris_{algo}'

# let MLFlow know this is a run to track
with mlflow.start_run(run_name=run_name) as run:
    # -- here is just some code, it's not important for now -- 
    X_train, X_test, y_train, y_test = get_data()
    X_train, X_test = feature_engineering(X_train, X_test)

    if algo == 'svc':
        params = svc_pars
        model = train_svc(X_train, y_train, **params)
    elif algo == 'knn':
        params = knn_pars
        model = train_knn(X_train, y_train, **params)

    acc_train = model.score(X_train, y_train)
    acc_test = model.score(X_test, y_test)

    X_stack, y_stack = recombine_data(X_train, X_test, y_train, y_test)
    ## -- computations finished --
    # we can log parameters to this run (inputs):
    mlflow.log_param('algo', algo)
    # and we can log metrics to this run (outputs)
    mlflow.log_metric('acc_train', acc_train)
    mlflow.log_metric('acc_test', acc_test)
    # and also model artifacts. 
    # even if you don't do ML, if you use sklearn, tensorflow or other common frameworks, 
    # you may still be able to save some useful objects with various log_model methods,
    # or with the log_artifact method.
    mlflow.sklearn.log_model(model, 'model')

    # we can also log plots (and basically any other file)...
    plot_decision_regions(X=X_stack, y=y_stack, classifier=model, test_idx=range(105,150))
    plt.xlabel('petal length [standardized]')
    plt.ylabel('petal width [standardized]')
    plt.legend(loc='upper left')
    plot_filename = 'decision_region.png'
    # the figure has to exist on disk before it can be logged
    plt.savefig(plot_filename)
    # with this method
    mlflow.log_artifact(plot_filename, 'figures')

    # and also apply some tags to this run
    # 'mlflow.note.content' is a special tag: the UI shows it as the run's notes
    mlflow.set_tag('mlflow.note.content', notes)
    for key, value in tags.items():
        mlflow.set_tag(key, value)

You don’t see it here, but this run is now saved by MLFlow. You can query all the runs through the Python API (which we will do in the next section), but there is also a UI where you can view them conveniently.