
Feed-Forward Neural Networks With mxnetR

mxnetR is an open-source Deep Learning package that works with all major Deep Learning flavors, including feed-forward neural networks (FNNs), which consist of simple processing units organized into hidden layers.

By Sibanjan Das · Feb. 21, 17 · Tutorial

This is the third part of our Deep Learning series. The first in the series was Dive Deep Into Deep Learning, which focused on the basics of Deep Learning. The second was on using the H2O Deep Learning package as an autoencoder to create an anomaly detector. 

In this piece, we are going to introduce you to feed-forward neural networks. This part will focus on mxnetR, an open-source Deep Learning package that works with all Deep Learning flavors: feed-forward neural networks (FNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

Feed-Forward Neural Networks

To start with a formal definition, a feed-forward neural network (AKA a multilayer perceptron, or MLP) consists of a large number of simple processing units called perceptrons organized in multiple hidden layers. To reiterate what I described in my earlier post:

  • The input layer consists of neurons that accept the input values. The output from these neurons is the same as the input predictors.

  • The output layer is the final layer of a neural network that returns the result back to the user environment. Depending on the design of the neural network, it also signals the previous layers on how well they have learned the information so that they can adjust their functions accordingly.

  • Hidden layers are in between input and output layers. Typically, the number of hidden layers ranges from one to many. These are the central computation layers that have the functions that map the input to the output of a node.  

[Figure: a feed-forward network with input, hidden, and output layers]

A perceptron is the basic processing unit of an artificial neural network. It takes several inputs and produces an output, as shown in the figure below.

[Figure: a perceptron computing a weighted sum of its inputs]

Typically, the inputs to these perceptrons are associated with weights. A perceptron computes a weighted sum of its inputs and applies a certain function to it. This function is called an activation or transfer function: it transforms the result of the summation into the unit's working output.

Most transfer functions are differentiable functions, such as sigmoid or tanh, which enables continuous error correction and the computation of local gradients. This resembles real neurons that output probabilities for all output classes. However, a transfer function can also be a step function, in which the output is set to one of two levels based on a certain threshold. There is also a third kind, linear units, in which the output is proportional to the total weighted input. Wikipedia has a complete list of activation functions.
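To make this concrete, here is a minimal sketch of a single perceptron in base R; the inputs, weights, and bias are made up purely for illustration:

# made-up inputs, weights, and bias for a single perceptron
x <- c(0.5, -1.2, 0.3)        # input values
w <- c(0.8, 0.4, -0.6)        # one weight per input
b <- 0.1                      # bias term

z <- sum(w * x) + b           # weighted sum of the inputs

# the same sum passed through different transfer functions
sigmoid <- 1 / (1 + exp(-z))  # smooth and differentiable, output in (0, 1)
tanh_out <- tanh(z)           # smooth and differentiable, output in (-1, 1)
step <- ifelse(z > 0, 1, 0)   # step function: output at one of two levels
linear <- z                   # linear unit: output proportional to the sum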

The best part of a neural network is that the neurons adapt to learn from errors and improve their results. Various methods are incorporated into a neural network to make it adaptive. The most widely used are the Delta rule and back error propagation: the former, used in feed-forward networks, is based on gradient descent learning, while the latter is used in feedback networks such as recurrent neural networks.
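As a rough sketch of gradient-descent learning with the Delta rule, the toy example below nudges the weights of the perceptron from the previous snippet toward a desired output. This is illustrative base R, not MXNet code:

# toy Delta-rule update for one training example with a sigmoid unit
x <- c(0.5, -1.2, 0.3); w <- c(0.8, 0.4, -0.6); b <- 0.1  # as above
eta <- 0.1                    # learning rate (step size)
target <- 1                   # desired output for this example

z <- sum(w * x) + b           # forward pass: weighted sum
yhat <- 1 / (1 + exp(-z))     # sigmoid output

# move the weights against the error, scaled by the sigmoid's
# local gradient yhat * (1 - yhat)
delta <- (target - yhat) * yhat * (1 - yhat)
w <- w + eta * delta * x      # update weights
b <- b + eta * delta          # update bias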

I have explained a bit more about neural networks in my book Data Science Using Oracle Data Miner and Oracle R Enterprise (published by Apress).

Getting Started With MXNet Using R

As described earlier, MXNet is a Deep Learning framework that supports feed-forward neural networks (FNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). CNNs and RNNs with MXNet will be covered in future articles.

The MXNet R package brings flexible and efficient GPU computing and state-of-the-art Deep Learning to R. Though we demonstrate MXNet using R, it also has bindings for various other languages, such as Python, Julia, C++, and Scala. So, if you're not interested in R, try this example in your favorite programming language.

MXNet installation in R is straightforward. You can run the script below directly in your R environment to set it up.

# Installation
install.packages("drat", repos = "https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")

However, for Windows 7 users, there is a version issue with one of its components, DiagrammeR. You can downgrade it to v0.8.1 using the command below (install_version() comes from the devtools package). If you encounter any other issues, the MXNet website has a list of common installation problems.

library(devtools)  # provides install_version()
install_version("DiagrammeR", version = "0.8.1", repos = "http://cran.us.r-project.org")

Creating Deep Learning Models With MXNet

We are all set to explore MXNet in R. I am using Kaggle's HR analytics data set for this demonstration. The data set is a small one of about 15,000 rows. You can try other data sets after you have learned how to use MXNet to build a feed-forward network. Our intention in this article is to help you understand MXNet and get started with it.

library(mxnet)
# load the HR analytics data set and take a first look at it
hr_data <- read.csv("F:/git/deep_learning/mxnet/hrdata/HR.csv")
head(hr_data)
str(hr_data)
summary(hr_data)

Next, we will perform some necessary data pre-processing and partition the data into training and test sets. The training set will be used to train a model and the test set to verify the accuracy of the newly trained model.

# Convert some variables to factors
hr_data$sales <- as.factor(hr_data$sales)
hr_data$salary <- as.factor(hr_data$salary)
hr_data$Work_accident <- as.factor(hr_data$Work_accident)
hr_data$promotion_last_5years <- as.factor(hr_data$promotion_last_5years)

# 70/30 train/test split
smp_size <- floor(0.70 * nrow(hr_data))

## set the seed to make the partition reproducible
set.seed(1)
train_ind <- sample(seq_len(nrow(hr_data)), size = smp_size)

train <- hr_data[train_ind, ]
test <- hr_data[-train_ind, ]

# predictors as a numeric matrix; "left" is the target variable
train.preds <- data.matrix(train[, !colnames(hr_data) %in% c("left")])
train.target <- train$left
head(train.preds)
head(train.target)

test.preds <- data.matrix(test[, !colnames(hr_data) %in% c("left")])
test.target <- test$left
head(test.preds)
head(test.target)

To create a feed-forward network, you can directly use mx.mlp, which is a convenient interface for creating multilayer perceptrons. Descriptions of the parameters appear as comments next to each one.

# set the seed to reproduce results
mx.set.seed(1)

mlpmodel <- mx.mlp(data = train.preds
  ,label = train.target
  ,hidden_node = c(3, 2)      # two hidden layers: 3 nodes in the first, 2 in the second
  ,out_node = 2               # number of output nodes
  ,activation = "sigmoid"     # activation function for the hidden layers
  ,out_activation = "softmax" # activation function for the output layer
  ,num.round = 10             # number of training iterations
  ,array.batch.size = 5       # batch size for updating weights
  ,learning.rate = 0.03       # same as step size
  ,eval.metric = mx.metric.accuracy
  ,eval.data = list(data = test.preds, label = test.target))

Once the training is complete, you can use the predict method to make predictions on the test data set:

# make a prediction
preds <- predict(mlpmodel, test.preds)
dim(preds)
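predict() returns a matrix with one row of softmax probabilities per output class. Assuming row i corresponds to class i - 1 (the usual MXNet convention), a quick way to turn the probabilities into hard labels and check the accuracy is:

# pick the most probable class for each observation; classes are 0 and 1
pred.labels <- max.col(t(preds)) - 1
mean(pred.labels == test.target)  # overall accuracy on the test set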

The function mx.mlp() is essentially a shortcut for a more flexible but lengthier way of defining a neural network: MXNet's symbol system. The symbol is the building block of a neural network in MXNet. It is a kind of functional object that can take in several input variables and produce one or more output variables. Individual symbols can be stacked on one another to produce a complex symbol, which makes it easy to formulate a complex neural network with multiple layers, each layer defined as an individual symbol.

The symbolic definition equivalent to the previous network looks like this (note that the second hidden layer now uses relu instead of sigmoid):

# configure the network structure
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name = "fc1", num_hidden = 3)   # first hidden layer: 3 nodes
act1 <- mx.symbol.Activation(fc1, name = "sig", act_type = "sigmoid") # sigmoid activation
fc2 <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 2)   # second hidden layer: 2 nodes
act2 <- mx.symbol.Activation(fc2, name = "relu", act_type = "relu")   # relu activation
out <- mx.symbol.SoftmaxOutput(act2, name = "soft")                   # softmax output layer

# train the network
dp_model <- mx.model.FeedForward.create(symbol = out
  ,X = train.preds
  ,y = train.target
  ,ctx = mx.cpu()             # train on the CPU
  ,num.round = 10             # number of training iterations
  ,eval.metric = mx.metric.accuracy
  ,array.batch.size = 50
  ,learning.rate = 0.005
  ,eval.data = list(data = test.preds, label = test.target))

This type of configuration gives you the flexibility to configure the network with different parameters for each hidden layer. For example, we can use the sigmoid activation function for layer 1, relu for layer 2, and so on, as sketched below.
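For instance, here is a hypothetical variant of the network above that swaps tanh into the second hidden layer (layer sizes unchanged; just a sketch, using fresh variable names so the original symbols stay intact):

# same structure, but with tanh on the second hidden layer
data2 <- mx.symbol.Variable("data")
fc1b <- mx.symbol.FullyConnected(data2, name = "fc1", num_hidden = 3)
act1b <- mx.symbol.Activation(fc1b, name = "sig", act_type = "sigmoid")
fc2b <- mx.symbol.FullyConnected(act1b, name = "fc2", num_hidden = 2)
act2b <- mx.symbol.Activation(fc2b, name = "tanh", act_type = "tanh")
out_tanh <- mx.symbol.SoftmaxOutput(act2b, name = "soft")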

You can also visually inspect the neural network with the code snippet below:

graph.viz(dp_model$symbol$as.json())

The computation graph shows the structure of the defined neural network. We can see the first hidden layer with three nodes and a sigmoid activation function, the second hidden layer with two nodes and a relu activation function, and the final output with the softmax function.

[Figure: computation graph of the defined network]

Finally, we can use the same predict API to create predictions and build a confusion matrix to establish the accuracy of the predictions on the new data set.

# make a prediction
preds <- predict(dp_model, test.preds)
# row 2 of preds holds the predicted probability of class 1 ("left")
preds.target <- ifelse(preds[2, ] > 0.5, 1, 0)
table(test.target, preds.target)
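The overall accuracy is simply the proportion of observations on the diagonal of the confusion matrix (assuming both classes appear among the predictions, so the table is square):

tab <- table(test.target, preds.target)
sum(diag(tab)) / sum(tab)  # fraction of correct predictions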

You can fork and download the code from my GitHub page. More articles on building CNNs and RNNs with MXNet are coming. In the meantime, you can explore MXNet's excellent tutorials.


Opinions expressed by DZone contributors are their own.
