
How to Predict Port Locations in a Logistics Network

Using Weka and machine learning to predict which port incoming shipments are bound for.


Machine learning has been around for several decades. However, some industries, such as logistics, depended on human decisions and actions until recently. How GIS can help the logistics industry was discussed here. With this in mind, we wanted to explore machine learning in the logistics industry. Working with Wardell Samotshozo, a graduate student in Howard University's Computer Science Department, we ran a small experiment that predicts the destination port of incoming shipments as an example.

After experimenting with the data and tools, we settled on Weka, which provides a collection of machine learning algorithms for data mining tasks as well as tools for data pre-processing.

The Chosen Dataset

We took our data from Enigma. It consists of incoming shipments recorded in U.S. Customs and Border Protection's Automated Manifest System (AMS) for 2015.

We used 1,427 instances (rows) of incoming shipments. The target attribute, port_of_destination, is the final port of destination if the cargo travels by ship beyond its initial port of unlading in the United States. The three destination ports chosen were Miami, Florida; Norfolk, Virginia; and Oakland, California.

Here is the full list of attributes used:

  • identifier - Unique shipment identifier. Can be used as the key to link to more detailed Bill of Lading, Cargo Description, & Hazardous Materials tables.

  • trade_update_date - Date trade records were updated.

  • run_date - Run Date

  • vessel_name - Name of the ship carrying the cargo.

  • port_of_unlading - Location where the items first entered the United States.

  • estimated_arrival_date - Estimated date the cargo would arrive at its destination.

  • foreign_port_of_lading - Foreign port where the cargo embarked on its voyage to the United States by sea.

  • record_status_indicator - Whether the record is New, Updated, or has been Deleted. Any records marked deleted should not be counted in any summations or rankings.

  • place_of_receipt - Location where the shippers first took possession of the cargo.

  • port_of_destination - Final port of destination if the cargo travels by ship beyond its initial port of unlading in the United States.

  • actual_arrival_date - Actual date the shipment arrived at its destination.

  • consignee_name - The company or person receiving the items.

  • shipper_party_name - The company or person shipping the items.

  • container_number - Container Number

  • description_sequence_number - Description Sequence Number

  • piece_count - Number of items contained in the shipment.

  • harmonized_number - Harmonized Tariff Code

  • harmonized_value - Harmonized Value

  • harmonized_weight - Harmonized Weight

  • harmonized_weight_unit - Harmonized Weight Unit
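The record_status_indicator description says deleted records should not be counted, so a natural pre-processing step is to drop them before training. The sketch below is a minimal Python illustration, assuming the rows are read as dictionaries (for example with csv.DictReader) and that deleted rows carry the literal string "Deleted"; that encoding is a hypothetical assumption, so check the actual AMS extract before relying on it. The sample CSV is invented only to show the call.

```python
import csv
import io

def drop_deleted(rows, deleted_value="Deleted"):
    """Return only the rows whose record_status_indicator is not marked deleted.

    `deleted_value` is an assumption: check how the AMS extract actually
    encodes deleted records before relying on this.
    """
    kept = []
    for row in rows:
        # Missing or blank status is treated as "not deleted".
        status = (row.get("record_status_indicator") or "").strip()
        if status != deleted_value:
            kept.append(row)
    return kept

# Tiny made-up CSV, only to show the call; not real AMS data.
sample = io.StringIO(
    "identifier,record_status_indicator,port_of_destination\n"
    "1,New,Miami\n"
    "2,Deleted,Oakland\n"
    "3,Updated,Norfolk\n"
)
rows = drop_deleted(csv.DictReader(sample))
print([r["identifier"] for r in rows])  # ['1', '3']
```

Filtering before the train/test split keeps deleted records out of both the training and the evaluation data.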

Also, you can see information about the dataset in the picture below:

[Figure: summary of the dataset]

Chosen Algorithm to Predict Port Locations

We chose a decision tree to model the dataset. Specifically, we used Weka's J48 classifier, its Java implementation of the C4.5 decision tree algorithm. J48 builds a model, in this case a tree that can be read as a set of rules, which is then used to predict (classify) the port location of each instance. To evaluate the model, the dataset was split so that 66% of it, the training dataset, was used to build the tree and its rule set. The remaining 34%, the test dataset, was used to check the predictions on data that was not used to train the model.
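C4.5, the algorithm J48 implements, picks the attribute to split on at each node using the gain ratio: the information gain of the split divided by its split information. The following is a minimal Python sketch of that criterion for a single nominal attribute. It is an illustration, not Weka's code, and it omits J48's pruning, numeric thresholds, and missing-value handling. The toy rows are invented purely to exercise the function.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())

def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` on one nominal attribute.

    `values[i]` is the attribute value and `labels[i]` the class of row i.
    """
    total = len(labels)
    # Group the class labels by attribute value (one branch per value).
    branches = {}
    for value, label in zip(values, labels):
        branches.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    after = sum(len(b) / total * entropy(b) for b in branches.values())
    gain = entropy(labels) - after
    # Split information: entropy of the attribute values themselves. Dividing
    # by it penalizes attributes with many distinct values, such as a unique
    # identifier.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

# Toy rows invented for illustration: attribute value -> destination port.
ports = ["Miami", "Miami", "Oakland", "Oakland"]
print(gain_ratio(["A", "A", "B", "B"], ports))  # 1.0: separates the classes perfectly
print(gain_ratio(["A", "B", "A", "B"], ports))  # 0.0: tells us nothing about the class
```

A tree learner applies this repeatedly: it splits on the highest-scoring attribute, then recurses into each branch with the rows that reached it.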

The pictures below show this process:

[Figure: Description of attributes and algorithm]

[Figure: Results for using all the attributes]

The image above shows the results of predicting the port location for the test dataset, the held-out 34%. Of its 485 instances, 474 (97.732%) were classified correctly and only 11 (2.268%) were classified incorrectly.
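These figures can be checked with a little arithmetic. The minimal Python sketch below assumes the training size is 66% of the 1,427 instances rounded to the nearest whole instance, which leaves exactly the 485 test instances reported above. The shuffle uses Python's `random` module with an arbitrary seed, so it will not select the same rows as Weka's own randomization; only the sizes and percentages line up.

```python
import random

def percentage_split(rows, train_percent=66.0, seed=1):
    """Shuffle `rows` and cut them into (train, test) portions.

    Assumption: the training size is rounded to the nearest whole row.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    train_size = round(len(shuffled) * train_percent / 100)
    return shuffled[:train_size], shuffled[train_size:]

def accuracy_percent(correct, total):
    """Share of test instances classified correctly, as a percentage."""
    return 100.0 * correct / total

train, test = percentage_split(range(1427))
print(len(train), len(test))                 # 942 485
print(round(accuracy_percent(474, 485), 3))  # 97.732
print(round(accuracy_percent(11, 485), 3))   # 2.268
```

Because 485 / 1,427 is about 34.0%, the held-out portion is the remaining 34% rather than 33%.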

This simple experiment was meant to demonstrate how machine learning can be applied. Weka is one of many tools; it makes it easy to run models and view results on a small scale. Other options include R, scikit-learn, and MLlib with Apache Spark.

If you are interested in discussing how to use machine learning in your software applications, leave a comment or contact me here or at adetola@adelabs.com.
