Automating Time Series With WhizzML
Learn how to split a dataset based on the range parameter: the data must stay in the same order both for creating the time series and for evaluating it.
Since the beginning of our civilization, humans have worried about the future. In particular, we worry about predicting it. In Ancient Greece, the most famous oracle was the one at Delphi. Greeks went there to learn about their future and to decide what they should do to turn their fortunes around. Three thousand years later, the worry about how to act in the future remains. However, we have learned to base our decisions on the algorithms and science inherited from Pythagoras, Euclid, Thales, or Archimedes rather than on Pythia's words.
To continue our series of posts about time series, this fifth blog post focuses on WhizzML users. WhizzML is our domain-specific language for machine learning workflow automation. It provides programmatic support for all your BigML resources and runs entirely in the BigML back end.
Every resource in BigML can be managed through WhizzML. It follows naturally that you can now use WhizzML scripts to create time series models and make forecasts with them.
For the time series resource, we begin by explaining how to split a dataset based on the range parameter, because the data must stay in the same order both for creating the time series and for evaluating it. As explained in a previous blog post, when testing other supervised models we can use any randomly sampled subset of instances as test data. Evaluating a time series, however, requires the hold-out to be a contiguous range of our data, because the training and test splits must be sequential. For an 80-20 split, the test set is the final 20% of rows in the dataset. In WhizzML, we calculate the split ranges from a percentage of rows. In the script snippet below, we keep the first 80% of our rows for training.
linear-split receives a dataset and a percentage, and it creates two complementary datasets, one for testing (test-ds) and one for training (train-ds), by splitting the existing rows into two ranges. The one above is the largest script in this post, so it only gets easier from here on.
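As a rough illustration of the range arithmetic described above, here is a Python sketch (not the WhizzML script itself). The function names, the placeholder dataset ID, and the request keys `origin_dataset` and `range` are assumptions made for the example; check the dataset section of the API documentation before relying on them.

```python
from typing import Dict, List, Tuple


def linear_split_ranges(rows: int, percentage: float) -> Tuple[List[int], List[int]]:
    """Return 1-based inclusive [start, end] row ranges for training and testing."""
    if rows < 2:
        raise ValueError("need at least two rows to split")
    if not 0 < percentage < 1:
        raise ValueError("percentage must be strictly between 0 and 1")
    # Clamp the boundary so that both ranges keep at least one row.
    split_row = min(max(round(rows * percentage), 1), rows - 1)
    return [1, split_row], [split_row + 1, rows]


def split_requests(dataset_id: str, rows: int, percentage: float) -> Tuple[Dict, Dict]:
    """Build two hypothetical dataset-creation payloads, one per range."""
    train_range, test_range = linear_split_ranges(rows, percentage)
    # Key names below are assumed for illustration only.
    train_req = {"origin_dataset": dataset_id, "range": train_range}
    test_req = {"origin_dataset": dataset_id, "range": test_range}
    return train_req, test_req


if __name__ == "__main__":
    # Placeholder ID; a real one would come from your own account.
    train_req, test_req = split_requests("dataset/0123456789abcdef01234567", 1000, 0.8)
    print(train_req["range"], test_req["range"])
```

Because the two ranges are contiguous and do not overlap, the test rows always come strictly after the training rows, which is the property a time series evaluation needs.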
Now let's see how to create a time series that models our data. Because we would like to evaluate our time series later on, we will use the training_ds_id that is output by our script. In fact, the only mandatory parameter for creating a time series is the ID of the dataset used for training. You can also specify which ETS models you would like to generate. By default, BigML will explore all of them. So the simplest code to create a time series is as easy as the example that follows.
If you know more about your data, you might want to use other time series creation parameters. The full list of parameters can be found in the time series section of the API documentation.
For monthly data with seasonal activity, you can set an additive trend, set seasonality to additive (value 1), and specify a period of 12. You then pass these properties to the function that creates the time series, as in the example below.
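To make the two cases concrete (dataset only, and dataset plus trend, seasonality, and period), here is a small Python sketch that assembles the creation arguments as a plain dictionary. The key names `dataset`, `trend`, `seasonality`, and `period`, and the use of 1 for an additive trend, are assumptions for illustration; only the value 1 for additive seasonality and the period of 12 come from the text above.

```python
from typing import Dict, Optional

ADDITIVE = 1  # the text gives 1 for additive seasonality; reusing it for trend is assumed


def time_series_args(dataset_id: str,
                     trend: Optional[int] = None,
                     seasonality: Optional[int] = None,
                     period: Optional[int] = None) -> Dict:
    """Assemble a hypothetical argument map for creating a time series."""
    # The training dataset is the only mandatory entry.
    args: Dict = {"dataset": dataset_id}
    if trend is not None:
        args["trend"] = trend
    if seasonality is not None:
        if period is None or period < 2:
            raise ValueError("a seasonal model needs a period of at least 2")
        args["seasonality"] = seasonality
        args["period"] = period
    return args


if __name__ == "__main__":
    # Simplest case: let the platform explore every ETS model.
    print(time_series_args("dataset/0123456789abcdef01234567"))
    # Monthly data with additive trend and additive seasonality.
    print(time_series_args("dataset/0123456789abcdef01234567",
                           trend=ADDITIVE, seasonality=ADDITIVE, period=12))
```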
This is a good point to remind ourselves that most WhizzML requests are asynchronous. You will therefore often need to wait for a resource to finish before referring to it in other scripts or accessing its properties. For the previous example, the code would look like this:
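The idea of waiting on an asynchronous resource can be sketched in Python as a simple polling loop. This is an illustration of the pattern, not WhizzML's built-in mechanism: the `fetch` callable is a hypothetical stand-in for whatever retrieves the resource, and the status codes (5 for finished, -1 for faulty) and the `status.code` layout are assumptions to verify against the API documentation.

```python
import time
from typing import Callable, Dict

FINISHED = 5   # assumed status code for a finished resource
FAULTY = -1    # assumed status code for a failed resource


def wait_until_finished(fetch: Callable[[str], Dict],
                        resource_id: str,
                        delay: float = 1.0,
                        max_tries: int = 60,
                        sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll `fetch(resource_id)` until the resource is finished or has failed."""
    for _ in range(max_tries):
        resource = fetch(resource_id)
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError("resource %s failed" % resource_id)
        # Still in progress: pause before asking again.
        sleep(delay)
    raise TimeoutError("resource %s not finished after %d tries"
                       % (resource_id, max_tries))


if __name__ == "__main__":
    # Fake fetcher that reports "in progress" twice and then "finished".
    codes = iter([1, 3, 5])
    fake_fetch = lambda rid: {"resource": rid, "status": {"code": next(codes)}}
    done = wait_until_finished(fake_fetch, "timeseries/placeholder", delay=0.0)
    print(done["status"]["code"])
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular client library and makes it easy to exercise without network access.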
Once the time series has been created, we can evaluate how good its fit is. Remember that our original dataset was chronologically split into two parts. Now, we will use the remaining 20% of the dataset to check the time series model's performance. The test-ds parameter in the code below represents the second part of the dataset. Following WhizzML's "less is more" philosophy, creating an evaluation requires a simple code snippet with only two mandatory parameters: a time series to be evaluated and a dataset to use as test data.
The evaluation object contains a set of measures for each of the ETS models in the time series. For more on this, see section 5 of our previous post.
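To give a feel for what such measures capture, the Python sketch below compares forecasts against held-out values using two common error measures, mean absolute error and root mean squared error. These are chosen for illustration; the measures actually reported in the evaluation object are described in the post referenced above, and the sample numbers here are made up.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """Compute MAE and RMSE between held-out values and forecasts."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [f - a for a, f in zip(actual, forecast)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mae": mae, "rmse": rmse}


if __name__ == "__main__":
    held_out = [10.0, 12.0, 14.0, 16.0]    # illustrative test rows
    predicted = [11.0, 12.0, 13.0, 18.0]   # illustrative forecasts
    print(forecast_errors(held_out, predicted))
```

Lower values mean the model's forecasts track the final 20% of rows more closely, which is how candidate ETS models can be compared on the same hold-out.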
After evaluating your time series, the next step is to consult the aforementioned "modern-day Oracle." Once you build a time series with the entire original dataset, that is, including the hold-out rows, you can predict the future values of one or many fields in your data. In this code, we demonstrate the simplest case, where the forecast is made for only one of the fields in your dataset.
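A forecast request essentially pairs a time series with the fields to forecast and how far ahead to look. The Python sketch below builds such a request as a plain dictionary. The key names `timeseries`, `input_data`, and `horizon`, along with the field ID `000005` and the placeholder time series ID, are assumptions for illustration; consult the forecast section of the API documentation for the exact format.

```python
from typing import Dict


def forecast_request(time_series_id: str, horizons: Dict[str, int]) -> Dict:
    """Build a hypothetical forecast payload mapping field IDs to horizons."""
    if not horizons:
        raise ValueError("at least one field must be forecast")
    input_data = {}
    for field_id, horizon in horizons.items():
        if horizon < 1:
            raise ValueError("horizon must be at least 1")
        # One entry per objective field; key names are assumed.
        input_data[field_id] = {"horizon": horizon}
    return {"timeseries": time_series_id, "input_data": input_data}


if __name__ == "__main__":
    # Simplest case: forecast the next 10 points of a single field.
    print(forecast_request("timeseries/placeholder", {"000005": 10}))
```

Passing several field IDs in `horizons` covers the case of forecasting many fields in one request.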
For a developer, the cool part of WhizzML is that it lets you write a complete script with all the steps you need and execute them in the cloud or in an on-premises deployment with a single call. This takes advantage of the service's built-in scalability and parallelization and minimizes latency and the network brittleness that comes with exchanging information with the cloud. You can do this by creating an execution of your script, either directly through the BigML API or by using any of the existing BigML bindings.
BigML supports bindings in different programming languages that allow you to create not only the resources available in the platform, such as time series and evaluations, but also scripts and executions. Everything can be managed from your favorite programming language, such as Python or Node.js, among many others. You can see the complete list of our bindings and the related documentation on our dedicated Tools page.
Published at DZone with permission of Jose Ribes, DZone MVB. See the original article here.