Four Machine Learning Techniques With Python
Want to learn more ML techniques? Look no further than Python.
Machine Learning Techniques Vs. Algorithms
While this tutorial is dedicated to machine learning techniques with Python, we will move over to algorithms pretty soon. But before we can begin focussing on techniques and algorithms, let’s find out if they’re the same thing.
A technique is a way of solving a problem, which makes it a fairly generic term. An algorithm is more specific: we have an input, we want a certain output, and we have clearly defined the steps to get from one to the other. An algorithm may make use of multiple techniques to reach its output.
Now that we have distinguished between the two, let’s find out more about machine learning techniques.
Machine Learning Techniques With Python
Machine Learning Regression
The dictionary will tell you that to regress is to return to a former state, often a less developed one. In statistics books, you will find regression described as a measure of how the mean of one variable relates to the corresponding values of other variables. But let’s look at it the way you will actually encounter it.
Regressing to the Mean
Francis Galton, Charles Darwin’s half-cousin, observed the sizes of sweet peas over generations. He concluded that letting nature do its job results in a range of sizes, while selectively breeding sweet peas for size produces larger ones. With nature at the steering wheel, though, even the bigger peas begin to produce smaller offspring over time. Pea size varies, but we can map these values to a specific line or curve.
Another Example: Monkeys and Stocks
In 1973, Princeton University professor Burton Malkiel claimed in his bestselling book, A Random Walk Down Wall Street, that a blindfolded monkey throwing darts at a newspaper’s financial pages could select a portfolio that does just as well as one picked by experts. In such stock-picking competitions, monkeys have beaten pros, but only once or twice. With enough events, the monkeys’ performance declines; it regresses to the mean.
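To make the idea concrete, here is a minimal simulation sketch. It assumes every picker's score is pure luck drawn from the same normal distribution; the function name `simulate_pickers` and all of its parameters are made up for illustration and do not come from Malkiel's book.

```python
import random
from statistics import mean


def simulate_pickers(n_pickers=1000, top_fraction=0.1, seed=0):
    """Return the round-one winners' mean score in round one and in round two."""
    rng = random.Random(seed)
    # Assumption: every score is pure luck, drawn from the same distribution.
    round1 = [rng.gauss(0, 1) for _ in range(n_pickers)]
    round2 = [rng.gauss(0, 1) for _ in range(n_pickers)]

    # Keep only the pickers who did best in round one.
    n_top = int(n_pickers * top_fraction)
    ranked = sorted(range(n_pickers), key=lambda i: round1[i], reverse=True)
    winners = ranked[:n_top]

    return mean(round1[i] for i in winners), mean(round2[i] for i in winners)


winners_round1, winners_round2 = simulate_pickers()
print(f"Winners' mean score in round one: {winners_round1:.2f}")
print(f"Same pickers' mean score in round two: {winners_round2:.2f}")
```

Under these assumptions, the round-one winners look far above average at first, but the same pickers land close to the overall mean of zero in round two.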
What Is Machine Learning Regression?
In a regression plot, the fitted line is the one that best fits all the data marked by the points. Using this line, we can predict what value we will find for x=70 (with a degree of uncertainty).
As a machine learning technique, regression finds its foundation in supervised learning. We use it to predict a continuous, numerical target, and it begins by working on the data set values we already know. It compares known and predicted values and labels the difference between them as the error, or residual.
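As a rough sketch of this process, the following fits a straight line by ordinary least squares, computes the residuals, and predicts a value for x=70. The data points are hypothetical, and `fit_line` and `predict` are illustrative helper names rather than part of any library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical known data points (not taken from the article).
xs = [10, 20, 30, 40, 50, 60]
ys = [25, 45, 62, 85, 105, 122]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Predict the target for a new x using the fitted line."""
    return slope * x + intercept


# Residual = known value minus predicted value, one per data point.
residuals = [y - predict(x) for x, y in zip(xs, ys)]
prediction_at_70 = predict(70)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted value at x=70: {prediction_at_70:.1f}")
print("residuals:", [round(r, 2) for r in residuals])
```

The residuals show how far each known value sits from the line, which is the uncertainty to keep in mind when reading the prediction at x=70.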
Types of Regression in Machine Learning
We generally observe two kinds of regression:
- Linear Regression: When we can denote the relationship between a target and a predictor as a straight line, we use linear regression.
- Non-Linear Regression: When we observe a non-linear relationship between a target and a predictor, a straight line cannot describe it, so we fit a curve instead.
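The difference between the two kinds can be sketched by fitting both a straight line and a curve to the same curved data. The example below uses made-up data that follows y = 2x² + 1 and fits the curve by running least squares on the squared predictor, which is only one simple way to model a non-linear relationship. The helper names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def sum_squared_error(ys, predictions):
    """Total squared residual between known and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


# Hypothetical data generated from y = 2 * x**2 + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 9, 19, 33, 51, 73]

# Straight line: y = slope * x + intercept.
slope, intercept = fit_line(xs, ys)
line_predictions = [slope * x + intercept for x in xs]

# Curve: y = a * x**2 + c, fitted by least squares on the squared predictor.
a, c = fit_line([x ** 2 for x in xs], ys)
curve_predictions = [a * x ** 2 + c for x in xs]

line_error = sum_squared_error(ys, line_predictions)
curve_error = sum_squared_error(ys, curve_predictions)
print(f"straight-line error: {line_error:.2f}")
print(f"curve error: {curve_error:.2f}")
```

On data like this, the straight line leaves large residuals while the curve follows the points closely.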
Machine Learning Classification
What Is Machine Learning Classification?
Classification is a data mining technique that lets us predict group membership for data instances. It relies on data that has been labeled in advance and falls under supervised learning. This means we train a model on known data and expect it to predict the classes of new data. By ‘prediction,’ we mean that we classify data into the classes they can belong to. We have two kinds of attributes available:
- Output Attribute, or the Dependent attribute.
- Input Attribute, or the Independent attribute.
Methods of Classification
- Decision Tree Induction: We build a decision tree from the class-labeled tuples. The tree has internal nodes, branches, and leaf nodes. The internal nodes denote tests on an attribute, the branches denote the test outcomes, and the leaf nodes denote the class labels. The two steps involved are learning and testing, and these are fast.
- Rule-based Classification: This classification is based on a set of IF-THEN rules. A rule is denoted as:
IF condition THEN conclusion
- Classification by Backpropagation: Neural network learning, often called connectionist learning, builds connections. Backpropagation is one of the most popular neural-network learning algorithms. It iteratively processes data, compares the network’s output with the target value, and adjusts the connection weights to reduce the difference.
- Lazy Learners: In a lazy learner approach, the machine stores the training tuples and waits for a test tuple. This supports incremental learning. This contrasts with the eager learner approach, which builds a model before it sees any test tuple.
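Of the methods above, the lazy learner is the easiest to sketch in a few lines. The following is a minimal k-nearest-neighbors classifier, one common example of lazy learning. The class name `LazyKNN`, the points, and the labels are all hypothetical.

```python
from collections import Counter
from math import dist


class LazyKNN:
    """Stores training tuples and defers all work until a test tuple arrives."""

    def __init__(self, k=3):
        self.k = k
        self.tuples = []  # stored (features, label) pairs

    def add(self, features, label):
        # "Training" is just storage, so new tuples can be added at any time.
        self.tuples.append((tuple(features), label))

    def predict(self, features):
        # Find the k stored tuples closest to the test tuple and let them vote.
        nearest = sorted(self.tuples, key=lambda t: dist(t[0], features))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = LazyKNN(k=3)
# Hypothetical training tuples for two classes.
for point in [(1, 1), (1, 2), (2, 1)]:
    knn.add(point, "A")
for point in [(8, 8), (8, 9), (9, 8)]:
    knn.add(point, "B")

first_guess = knn.predict((5.2, 5.2))

# Incremental learning: add a third class later without rebuilding anything.
for point in [(5, 5), (5, 6), (6, 5)]:
    knn.add(point, "C")

second_guess = knn.predict((5.2, 5.2))
print(first_guess, second_guess)
```

Because nothing is built ahead of time, adding the third class costs only three appends, while every prediction has to scan all stored tuples.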
ML Classification Example
Let’s take an example. Suppose we’re here to teach you about different kinds of codes. We show you ITF barcodes, Code 93 barcodes, QR codes, Aztec codes, and Data Matrix codes, among others. Once you have been through most of the examples, it is your turn to identify the kind of code when we show you one. This is supervised learning: we use some of the examples for training and the rest for testing.
Notice how some stars of each type end up on the other side of the curve.
Machine Learning Clustering
Clustering is unsupervised classification. It is a form of exploratory data analysis with no labeled data available. With clustering, we separate unlabeled data into a finite, discrete set of natural, hidden groupings. We observe two kinds of clustering:
- Hard Clustering: One object belongs to a single cluster.
- Soft Clustering: One object may belong to multiple clusters.
In clustering, we first select features, then design the clustering algorithm and then validate the clusters. Finally, we interpret the results.
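As a minimal sketch of hard and soft clustering, the example below runs k-means (a hard clustering algorithm) on hypothetical 2D points, then derives soft memberships from inverse distances to the centroids. The inverse-distance weighting is an illustrative choice and not a named algorithm, and both function names are made up.

```python
from math import dist


def k_means(points, centroids, max_iter=100):
    """Hard clustering: assign each point to exactly one nearest centroid."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids


def soft_memberships(point, centroids, eps=1e-9):
    """Soft clustering sketch: weights proportional to inverse distance."""
    weights = [1.0 / (dist(point, c) + eps) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical feature vectors; the first and fourth seed the two clusters.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = k_means(points, [points[0], points[3]])
memberships = soft_memberships(points[0], centroids)

print("hard labels:", labels)
print("soft memberships of first point:", [round(m, 3) for m in memberships])
```

With hard clustering each point gets a single label, while the soft memberships give each point a share in every cluster that sums to one.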
Recall the example above. You could group these codes together. QR codes, Aztec codes, and Data Matrix codes would form one group, which we could call ‘2D Codes’. ITF barcodes and Code 93 barcodes would group into a ‘1D Codes’ category.
Machine Learning Anomaly Detection
An anomaly is something that deviates from its expected course. With machine learning, we may sometimes want to spot an outlier. One example would be detecting a dentist who bills 85 fillings per hour, which amounts to about 42 seconds per patient. Another would be finding a dentist who bills only on Thursdays. Such situations raise suspicion, and anomaly detection is a great way to highlight them because we aren’t looking for any one of these patterns specifically.
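The dentist example can be sketched with a simple robust outlier test. The billing figures below are invented for illustration, and the modified z-score based on the median absolute deviation (with the commonly used 0.6745 constant and 3.5 threshold) is just one possible detector, not the method the article prescribes.

```python
from statistics import median


def flag_outliers(values_by_name, threshold=3.5):
    """Flag entries whose modified z-score exceeds the threshold."""
    values = list(values_by_name.values())
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against in this simple sketch
    return [
        name
        for name, v in values_by_name.items()
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Hypothetical fillings billed per hour by each dentist.
fillings_per_hour = {"A": 4, "B": 5, "C": 3, "D": 6, "E": 85, "F": 4, "G": 5}

flagged = flag_outliers(fillings_per_hour)
seconds_per_filling = 3600 / 85  # the arithmetic behind "42 seconds per patient"

print("flagged dentists:", flagged)
print(f"85 fillings per hour leaves {seconds_per_filling:.1f} seconds each")
```

Note that the detector is never told what a suspicious bill looks like; it only measures how far each value sits from the rest.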
So, this was all about machine learning techniques with Python. Hope you liked our explanation!
Published at DZone with permission of Rinu Gour. See the original article here.