Using H2O AutoML for Safe Driver Prediction [Code Snippet]
Kaggle is currently hosting a Porto Seguro Safe Driver Prediction Competition. This H2O AutoML Python script is what I used!
I tried using the given training dataset as-is with H2O AutoML. It ran for about five hours, and I was able to reach roughly 280th place on the leaderboard. If you transform the dataset properly before running H2O AutoML, you may be able to rank even higher.
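As one example of such a transformation, here is a minimal pandas sketch. It assumes the competition's data conventions: a value of -1 marks a missing feature, and column names ending in `_cat` are categorical. The function name `prepare_porto_seguro` and the toy columns are hypothetical, and whether these steps actually improve the AutoML score has not been tested here.

```python
import numpy as np
import pandas as pd


def prepare_porto_seguro(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing sketch for the Porto Seguro CSVs.

    Assumes -1 is the missing-value sentinel and '_cat' columns are categorical.
    """
    out = df.copy()  # leave the caller's frame untouched
    feature_cols = [c for c in out.columns if c not in ("id", "target")]
    # Replace the -1 sentinel with a real missing value (int columns become float).
    out[feature_cols] = out[feature_cols].replace(-1, np.nan)
    # Mark categorical columns explicitly instead of leaving them as numbers.
    for col in feature_cols:
        if col.endswith("_cat"):
            out[col] = out[col].astype("category")
    return out


# Usage on a tiny made-up frame shaped like the competition data
toy = pd.DataFrame({
    "id": [7, 9, 12],
    "target": [0, 1, 0],
    "ps_ind_02_cat": [2, -1, 1],
    "ps_reg_03": [0.7, -1.0, 0.9],
})
clean = prepare_porto_seguro(toy)
print(clean.isna().sum().to_dict())
```

The `id` and `target` columns are deliberately skipped so identifiers and labels are never rewritten.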
Following is a very simple H2O AutoML Python script that you can try as well. (Note: Set `run_automl_for_seconds` to the number of seconds you want the experiment to run, and point the file paths at your own copies of the competition CSVs.)
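```python
import h2o
import pandas as pd  # as_data_frame() below returns pandas DataFrames
from h2o.automl import H2OAutoML

h2o.init()

# Load the competition files (adjust the paths to your own copies)
train = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTrain.csv')
test = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTest.csv')
sub_data = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroSample_submission.csv')

# 'target' is the response; every other column (including 'id') is a feature.
# 'target' is imported as a numeric 0/1 column, so AutoML treats this as regression.
y = 'target'
x = train.columns
x.remove(y)

## Time to run the experiment
run_automl_for_seconds = 18000  ## Running AML for 5 hours (18,000 seconds)
aml = H2OAutoML(max_runtime_secs=run_automl_for_seconds)

# Hold out about 10% of the training rows as a validation frame
train_final, valid = train.split_frame(ratios=[0.9])
aml.train(x=x, y=y, training_frame=train_final, validation_frame=valid)

# Score the test set with the best model AutoML found
leader_model = aml.leader
pred = leader_model.predict(test_data=test)

# Copy the 'predict' column into the sample submission and save it
pred_pd = pred.as_data_frame()
sub = sub_data.as_data_frame()
sub['target'] = pred_pd['predict'].values
sub.to_csv('/data/avkash/PortoSeguro/PortoSeguroResult.csv', header=True, index=False)
```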
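Before submitting, it can help to score the leader locally on the held-out `valid` frame. The competition ranks submissions by the normalized Gini coefficient, so below is a plain-Python sketch of that metric. The helper names `gini` and `normalized_gini` are hypothetical (not part of H2O); the predictions could come from something like `leader_model.predict(valid).as_data_frame()['predict']`, and the actuals from `valid['target']`.

```python
from typing import Sequence


def gini(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Unnormalized Gini: rank rows by prediction (highest first) and
    compare the cumulative share of positives against a random ordering."""
    n = len(actual)
    total = sum(actual)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive label")
    # Sort by descending prediction; ties keep their original order.
    order = sorted(range(n), key=lambda i: (-pred[i], i))
    cumulative = 0.0
    gini_sum = 0.0
    for i in order:
        cumulative += actual[i]
        gini_sum += cumulative / total
    # Subtract the expected sum for a random ordering, then average.
    return (gini_sum - (n + 1) / 2.0) / n


def normalized_gini(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Gini of the predictions divided by the Gini of a perfect ranking."""
    return gini(actual, pred) / gini(actual, actual)


labels = [0, 0, 1, 1]
print(normalized_gini(labels, [0.1, 0.2, 0.8, 0.9]))  # perfect ranking
print(normalized_gini(labels, [0.2, 0.6, 0.4, 0.9]))  # partly correct ranking
```

Only the ordering of the predictions matters for this metric, which is why the regression scores from the script above can be submitted directly.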
That’s it; enjoy!
Published at DZone with permission of Avkash Chauhan, DZone MVB.