- What is Machine Learning – algorithms point of view
- Application Areas of Machine Learning
- Popular Machine Learning Algorithms
- Types of Machine Learning Algorithms
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Classification Supervised learning
- Regression Supervised learning
- Recommendation Systems / Association
- Machine Learning Process
What is Machine Learning
Simply put: machine learning is teaching computers to learn to perform tasks from past experience, i.e., data. Machine learning can find relationships and patterns within volumes of data that the human mind is incapable of processing (e.g., in the Iris dataset, the flower measurements fall into 3 clusters).
Machine learning is everywhere, influencing nearly everything we do. You’ve likely heard that Uber is the world’s largest taxi company, yet owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate. What you may not have heard is that all of these companies are machine learning companies at their very core. Companies like Netflix use machine learning to recommend movies for us to watch. Navigation apps like Waze use machine learning to help optimize our driving experience. Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
Definition: Machine learning is the semi-automated extraction of knowledge from data. The definition breaks down into three components:
Knowledge from data: Machine learning always starts with data, and our goal is to extract knowledge/insight from that data. We have a question, and we hypothesize (propose a possible but not yet proven explanation) that the data might answer it.
Automated extraction: Machine learning needs some automation. We apply a process/algorithm to the data using a computer so that the computer can provide us with the insight.
Semi-automated: Machine learning is not a fully automated process. It requires us to make many smart decisions for the process to be successful.
What is Machine Learning – algorithms point of view
Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm, and it builds its own logic based on the data. For example, one kind of algorithm is a classification algorithm, which puts data into different groups. The same classification algorithm used to recognize handwritten numbers could also be used to classify emails into spam and not-spam without changing a line of code. It’s the same algorithm, but it’s fed different training data, so it comes up with different classification logic.
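To make the "same algorithm, different training data" idea concrete, here is a minimal Python sketch. It assumes a 1-nearest-neighbour classifier as the generic algorithm (the text does not name one); the 3x3 digit "images" and the email features are hypothetical toy data, and `train_classifier` is a name made up for this illustration.

```python
import math

def train_classifier(examples):
    """Generic 1-nearest-neighbour classifier (illustrative).

    `examples` is a list of (feature_vector, label) pairs. The returned
    function labels a new feature vector with the label of the closest
    training example; no problem-specific rules are written by hand.
    """
    def predict(features):
        _, label = min(examples, key=lambda ex: math.dist(ex[0], features))
        return label
    return predict

# Hypothetical training data 1: 3x3 pixel "images" of handwritten digits.
digit_examples = [
    ([0, 1, 0,
      0, 1, 0,
      0, 1, 0], "1"),
    ([1, 1, 1,
      1, 0, 1,
      1, 1, 1], "0"),
]

# Hypothetical training data 2: [number of links, times "free" appears] per email.
email_examples = [
    ([8, 5], "spam"),
    ([6, 4], "spam"),
    ([0, 0], "not spam"),
    ([1, 0], "not spam"),
]

# Same algorithm, different training data, therefore different classification logic.
digit_classifier = train_classifier(digit_examples)
spam_classifier = train_classifier(email_examples)

print(digit_classifier([0, 1, 0,
                        1, 1, 0,
                        0, 1, 0]))  # 1
print(spam_classifier([7, 3]))      # spam
```

Nothing inside `train_classifier` mentions digits or email; the training data is the only thing that differs between the two classifiers.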
Application Areas of Machine Learning
· YouTube Video Recommendations
· E-commerce recommendation engines
· Image/Face/Smile Recognition
  o Bike number-plate recognition and alerts at signal junctions for the traffic police.
  o Face recognition at railway stations to spot a criminal whose photo is known.
  o Face recognition at a mass gathering to spot a known criminal from a photo.
  o Open Google’s mobile app, open the camera, point it at a shop’s or business’s logo/name, and get its reviews or more details from Google’s servers (the more data Google already has, the more accurate the result).
· Voice Recognition
· Email Spam Detection
· Teaching a Computer how to play Chess
· Self-Driving Cars
· Detecting Credit Card fraud
· Detecting which insurance customer is likely to file a claim
· Sentiment Analysis / Opinion Mining
· Predict the price of a house
· Character Recognition (Recognizing Signatures)
· Computer Games
· Customer Segmentation.
Popular Machine Learning Algorithms
- Linear Regression
- Logistic Regression
- Decision Trees
- Naïve Bayes Classification
- Support Vector Machines
- KNN (K-Nearest Neighbours)
- Random Forest
There are 3 main types of Machine Learning algorithms
- Supervised Learning
- Un-Supervised Learning
- Reinforcement Learning
Supervised Learning (based on historical data) (related to prediction):
Supervised learning is also known as predictive modeling. It is the process of learning from labeled historical data in order to make predictions about new data.
Ex1: online shopping trends
Ex2: stock shares values
Ex3: email message is spam or ham (not spam)
Ex4: customer churning
Ex5: optimal price determination
Here, the baby has already been taught (the data is labeled) what an apple and a banana are. Later, when shown a banana of a different color, the baby can still label it correctly (mapping input to label).
Here predictions are made on new data for which the label is unknown.
Let’s say you are a real estate agent. Your business is growing, so you hire a bunch of new trainee agents to help you out. But there’s a problem — you can glance at a house and have a pretty good idea of what a house is worth, but your trainees don’t have your experience so they don’t know how to price their houses.
To help your trainees (and maybe free yourself up for a vacation), you decide to write a little app that can estimate the value of a house in your area based on its size, neighborhood, etc., and what similar houses have sold for.
So for 3 months, you write down every time someone sells a house in your city. For each house, you write down a bunch of details: number of bedrooms, size in square feet, neighborhood, etc. But most importantly, you write down the final sale price.
This is our training data (for which the labels, i.e., the prices, are already known).
Using that training data, we want to create a program that can estimate how much any other unsold house in your area is worth:
We want to use the training data to predict the prices of other houses.
This is called supervised learning. You knew how much each house sold for, so in other words, you knew the answer to the problem and could work backward from there to figure out the logic.
To build your app, you feed your training data about each house into your machine learning algorithm. The algorithm is trying to figure out what kind of math needs to be done to make the numbers work out.
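A minimal sketch of what "figuring out the math" can look like, assuming the simplest possible model: price depends linearly on size alone, fitted by ordinary least squares. The sales records are hypothetical, and `fit_line` and `estimate_price` are illustrative names, not part of any library.

```python
# Hypothetical training data: (size in square feet, final sale price).
training_data = [(1000, 150_000), (1500, 200_000), (2000, 250_000), (2500, 300_000)]

def fit_line(data):
    """Ordinary least squares for one feature: price ≈ slope * size + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in data)
    variance = sum((x - mean_x) ** 2 for x, _ in data)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(training_data)

def estimate_price(size_sqft):
    """Predict the price of an unsold house from the learned line."""
    return slope * size_sqft + intercept

print(estimate_price(1800))  # 230000.0
```

A real app would use many features (bedrooms, neighborhood, etc.), but the idea is the same: the algorithm learns the numbers (here `slope` and `intercept`) from labeled examples instead of someone writing pricing rules by hand.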
Unsupervised Learning (Categorized learning):
Extracting structure from data or learning how to best represent data.
The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data.
Here, the baby has not yet been taught about dogs and cats (the data is not labeled). Still, based on their looks and heights, the baby recognizes that the animals fall into two groups, a dogs group and a cats group, without labeling them as “dogs” and “cats”.
- Unsupervised learning is used against data that has no historical labels.
- The system is not told the “right answer”. The algorithm must figure out what is being shown.
- The goal is to explore the data and find some structure within.
Note: 1 indicates that the customer responded to the email and purchased a product/service.
Let’s go back to our original example with the real estate agent. What if you didn’t know the sale price for each house? Even if all you know is the size, location, etc. of each house, it turns out you can still do some really cool stuff. This is called unsupervised learning.
Even if you aren’t trying to predict an unknown number (like price), you can still do interesting things with machine learning.
So what could you do with this data? For starters, you could have an algorithm that automatically identifies different market segments in your data. Maybe you’d find that home buyers in the neighborhood near the local college really like small houses with lots of bedrooms, while home buyers in the suburbs prefer 3-bedroom houses with lots of square footage. Knowing about these different kinds of customers could help direct your marketing efforts.
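One common way to find such market segments is k-means clustering. The sketch below is a plain-Python version under simple assumptions: two features (size in thousands of square feet and number of bedrooms), k = 2 chosen by hand, the first k points used as starting centroids, and hypothetical data. Production code would normally use a library implementation with better initialisation (e.g. k-means++).

```python
import math

# Hypothetical unlabeled data: (size in thousands of sq ft, number of bedrooms).
# No sale prices and no segment labels are given to the algorithm.
houses = [
    (0.90, 4), (1.00, 5), (0.95, 4),  # e.g. near the local college
    (2.40, 3), (2.60, 3), (2.50, 3),  # e.g. in the suburbs
]

def k_means(points, k, max_iterations=100):
    # Simplistic initialisation: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

labels, centroids = k_means(houses, k=2)
print(labels)  # [1, 1, 1, 0, 0, 0]
```

The algorithm groups the houses purely by similarity; the cluster numbers themselves are arbitrary, and interpreting a cluster as "college buyers" or "suburban buyers" is still up to you.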
Another cool thing you could do is automatically identify any outlier houses that are very different from everything else. Maybe those outlier houses are giant mansions, and you can focus your best salespeople on those areas because they bring bigger commissions.
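Outlier detection can be sketched in a few lines. This example assumes a single feature (house size) and uses the modified z-score based on the median absolute deviation (MAD), a common robust rule of thumb that flags values scoring above 3.5. The sizes are hypothetical.

```python
import statistics

# Hypothetical house sizes in square feet; one of them is a giant mansion.
sizes = [1200, 1350, 1100, 1500, 1250, 9000]

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # simplification: no spread to measure against
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

print(find_outliers(sizes))  # [9000]
```

The median and MAD are used instead of the mean and standard deviation because, in a small sample, a single mansion inflates the standard deviation enough to hide itself from an ordinary z-score test.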
Classification Supervised learning (for discrete values)
Taking some kind of input (e.g., pictures) and mapping it to one of a discrete number of labels, such as:
- True or False
- Male or Female (whether an image is of a male or a female)
- Yes or No (whether a candidate would get a university seat or not, whether a customer would buy this product or not)
Classification, also known as categorization, is a machine learning technique that uses known data to determine how new data should be classified into a set of existing categories. Classification is a form of supervised learning.
- Mail service providers such as Yahoo! and Gmail use this technique to decide whether a new mail should be classified as spam. The categorization algorithm trains itself by analyzing users’ habits of marking certain emails as spam. Based on that, the classifier decides whether a future mail should be delivered to your inbox or to the spam folder.
- iTunes application uses classification to prepare playlists.
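A classic way to learn from emails that users have marked as spam is a Naive Bayes classifier. This is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing; the four training emails are hypothetical, and it is not a description of how any particular mail provider implements its filter.

```python
import math
from collections import Counter

# Hypothetical emails the user has already marked as spam or not spam ("ham").
training = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
]

def train_naive_bayes(examples):
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of emails with that label
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary

def classify(text, model):
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from log P(label), then add log P(word | label) for each word.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(training)
print(classify("free money prize", model))     # spam
print(classify("team meeting monday", model))  # ham
```

Log probabilities are summed instead of multiplying raw probabilities so that long emails do not underflow to zero.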
Classification Supervised Learning Algorithms examples:
- Decision Trees
- Naive Bayes Classifier Algorithm (Detecting Spam e-mails)
- Logistic Regression (Student Admitted or Rejected)
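The "Student Admitted or Rejected" example can be sketched as logistic regression trained by batch gradient descent. Assumptions: a single feature (an entrance-exam score), hypothetical data, and a hand-picked learning rate and iteration count.

```python
import math
import statistics

# Hypothetical data: entrance-exam score -> 1 (admitted) or 0 (rejected).
scores = [35, 45, 50, 70, 80, 90]
admitted = [0, 0, 0, 1, 1, 1]

# Standardise the feature so gradient descent converges quickly.
mean, std = statistics.mean(scores), statistics.pstdev(scores)
xs = [(s - mean) / std for s in scores]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

w, b, learning_rate = 0.0, 0.0, 0.5
for _ in range(2000):
    # Gradient of the average log-loss with respect to w and b.
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, admitted)]
    w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    b -= learning_rate * sum(errors) / len(xs)

def probability_admitted(score):
    """Model's estimated probability that a student with `score` is admitted."""
    return sigmoid(w * (score - mean) / std + b)

print(probability_admitted(85) > 0.5, probability_admitted(40) > 0.5)  # True False
```

The output is a probability; turning it into "Admitted" or "Rejected" requires a threshold, here 0.5.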
Regression Supervised learning (for continuous values)
It is used to predict continuous values.
Ex: Finding Price of a House:
Sachin has a house with W rooms, X bathrooms, Y square feet of floor area and a lot size of Z. Based on other houses in the area that have recently sold, how much (in rupees) can he sell his house for?
The answer is a numerical value that can vary continuously, and predicting such values is what regression does.
So we would use regression for this kind of problem.
Other Applications of Regression:
- Loan repayment – based on credit scores (how high should the credit score be?)
- Grades – getting a job (how high should the grades be?)
- Grades – getting a seat in a top university (how high should the grades be?)
Regression Supervised Learning Algorithms examples:
- Simple Linear Regression (predicting weight from height)
- Multiple Linear Regression Analysis (predicting mileage from horsepower (hp), weight (wt), etc.)
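For multiple linear regression, the coefficients can be found by solving the normal equations. The sketch below assumes hypothetical car data generated from mpg = 40 - 0.05·hp - 3·wt, so the fit should recover those coefficients; it uses a small hand-written Gaussian-elimination solver to stay dependency-free.

```python
# Hypothetical cars: (horsepower, weight in 1000 lb) -> miles per gallon.
cars = [((100, 2.5), 27.5), ((150, 3.0), 23.5), ((200, 3.5), 19.5),
        ((120, 3.2), 24.4), ((180, 2.8), 22.6)]

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_regression(data):
    """Least squares via the normal equations (X^T X) beta = X^T y, with an intercept column."""
    rows = [[1.0, *features] for features, _ in data]
    ys = [y for _, y in data]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

intercept, b_hp, b_wt = fit_multiple_regression(cars)
predicted_mpg = intercept + b_hp * 160 + b_wt * 3.1  # a new car: 160 hp, 3100 lb
print(round(predicted_mpg, 2))  # 22.7
```

Libraries usually solve least squares with more numerically stable methods (e.g. QR decomposition) rather than forming XᵀX explicitly; the normal equations are used here because they are short and easy to follow.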
Difference between classification and regression, with another example:
Suppose that from your past data (training data) you know that your best friend likes the above movies.
Classification: A new movie (test data) has been released. You want to know whether your best friend will like it or not. If you are confident that your friend will like it, you can take your friend to see it this weekend.
Regression: A new movie (test data) has been released, and now you want to find out how many times your friend will watch it. It could be 5 times, 6 times, 10 times, etc.
Looking closely, this second problem is about finding a count; in other words, it is about predicting a value.
- If forecasting a target class (Classification)
- If forecasting a value (Regression)
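The movie example can be sketched with k-nearest neighbours, which handles both tasks: the classifier takes a majority vote of the neighbours’ labels, while the regressor averages their numeric targets. The movie features (action and comedy scores), the labels, and the watch counts are hypothetical, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter

# Hypothetical past movies: ((action score, comedy score), friend's verdict, times watched).
past_movies = [
    ((9, 2), "likes", 6),
    ((8, 3), "likes", 5),
    ((7, 1), "likes", 10),
    ((2, 9), "dislikes", 0),
    ((1, 8), "dislikes", 1),
    ((3, 7), "dislikes", 0),
]

def nearest(features, k=3):
    """The k past movies closest to `features` (Euclidean distance)."""
    return sorted(past_movies, key=lambda m: math.dist(m[0], features))[:k]

def classify(features):
    # Classification: forecast a target class by majority vote.
    votes = Counter(label for _, label, _ in nearest(features))
    return votes.most_common(1)[0][0]

def regress(features):
    # Regression: forecast a value by averaging the neighbours' numbers.
    neighbours = nearest(features)
    return sum(times for _, _, times in neighbours) / len(neighbours)

new_movie = (8, 2)
print(classify(new_movie))  # likes
print(regress(new_movie))   # 7.0
```

The data and the neighbour search are identical in both cases; only the kind of answer differs: a class label for classification, a number for regression.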
Clustering – Unsupervised learning
Clustering is used to form groups or clusters of similar data based on common characteristics. Clustering is a form of unsupervised learning.
- Search engines such as Google and Yahoo! use clustering techniques to group data with similar characteristics.
- Newsgroups use clustering techniques to group various articles based on related topics – technology, politics, sports etc.
Recommendation Systems / Association – Unsupervised learning
Recommendation is a popular technique that provides close recommendations based on user information such as previous purchases, clicks, and ratings.
- Amazon uses this technique to display a list of recommended items (“customers who bought this item also bought”) that you might be interested in, drawing information from your past actions. There are recommender engines that work behind Amazon to capture user behavior and recommend selected items based on your earlier actions.
- Facebook uses this recommendation technique to identify and recommend its “People you may know” list.
- Google Search uses this recommendation technique to suggest “People also search for” results.
Ex1: Amazon E-commerce
Ex2: Google Search
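The "customers who bought this item also bought" idea can be approximated by simple co-occurrence counting over past orders. This is a minimal sketch with hypothetical orders, not Amazon’s, Facebook’s, or Google’s actual method; real recommender engines are far more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: the set of items in each past order.
orders = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "mouse", "usb hub"},
    {"phone", "phone case"},
    {"laptop", "laptop bag"},
]

# Count how often each pair of items appears in the same order.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def also_bought(item, top_n=2):
    """Items most frequently bought together with `item`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]

print(also_bought("laptop"))  # ['mouse', 'laptop bag']
```

No labels are involved: the associations come entirely from patterns in past behaviour, which is why this family of techniques is grouped with unsupervised learning here.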
Explaining supervised vs. unsupervised machine learning in other words:
Supervised learning:
- is like learning with a teacher
- the training dataset is like a teacher
- the training dataset is used to train the machine
Unsupervised learning:
- is like learning without a teacher
- the machine learns through observation and finds structures in data
Reinforcement Learning
Reinforcement learning is often used for robotics, gaming, and navigation.
With Reinforcement Learning, the algorithm discovers through trial and error which actions yield the greatest rewards.
It has three primary components:
- The Agent: The learner or decision maker (Eg: The Driverless Car)
- The Environment: Everything the Agent interacts with (Eg: the roads, traffic signal lights, humans, lane markings on roads, parking areas, etc.)
- The Actions: What the Agent can do (Eg: Drive the car, stop, park, horn, start the engine, stop the engine, open doors etc)
Note: The Goal in Reinforcement learning is to learn the best policy.
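A minimal sketch of these components in code, assuming tabular Q-learning (one of many reinforcement learning algorithms) on a toy environment: a five-cell corridor where the agent starts in cell 0 and receives a reward of 1 for reaching cell 4. The environment, rewards, and hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions. The learned policy is read off as the highest-valued action in each state.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # the Agent's actions: move left or move right

def step(state, action):
    """The Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move Q towards the reward plus the discounted best future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}  (1 means "move right")
```

The agent is never told that "right" is correct; it discovers through trial and error that moving right leads to the reward, and the policy is the mapping from each state to that best action.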
Machine learning process
---Knowledge acquired from various resources online and our take on this subject