Computers and the Internet have allowed businesses to store past data, access it, and use it to inform future decisions. With advances in data science and technology, the field of Machine Learning is seeing a tremendous boost.
This section covers Machine Learning Interview questions and answers on general, conceptual and technical topics. Where relevant, the questions come with examples.
Who are these Machine Learning Interview Questions useful for?
The Machine Learning Interview Questions will be useful for all the beginners and experienced candidates interviewing for the role of Machine Learning Developers, Machine Learning Engineers, Machine Learning Interns etc.
1. What are the different variants of Machine Learning algorithms?
Well, quite a simple and expected question on this subject.
Machine Learning is a set of algorithms that let machines learn from data, predict trends and make decisions that are used in many walks of life.
Machine Learning models can be divided into three types based on the way they learn.
i.) Supervised Learning - In supervised learning algorithms, labelled (also called tagged) data is fed to the algorithm: each input comes paired with its expected output, so the algorithm is taught what the output should be.
Here, humans tell the algorithm again and again what each piece of data represents.
These algorithms are useful when all the data is labelled.
Supervised Learning problems can be further categorized into:
i.) Classification problems - when the output variable is a category
ii.) Regression problems - when the output variable is a real value.
Examples of such models are: Logistic Regression, Random Forest, Nearest Neighbors, Support Vector Machines etc.
ii.) Unsupervised Learning - In unsupervised learning algorithms, the inputs are fed to the algorithm without telling it the expected output. The algorithm has to find structure in the data on its own.
Here, the algorithm observes the patterns and structures in the given data and makes its own decisions about how to group or summarize it.
These algorithms are useful in cases when the data is not tagged or labelled or divided into categories.
Unsupervised Learning Problems can be categorized into:
i.) Clustering
ii.) Association
Some examples of such models are: k-means, GMM (Gaussian Mixture Models), PCA (Principal Component Analysis) etc.
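To make the clustering idea concrete, here is a minimal sketch of k-means on one-dimensional data. The function name `kmeans_1d`, the sample points and the starting centroids are all assumptions chosen for illustration; real projects would normally rely on a library implementation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Illustrative data only: note that no labels are supplied.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm is never told which group a point belongs to; the two groups emerge from the distances alone.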
When only some of the data is labelled while the major portion of it is unlabeled, the algorithms used are semi-supervised.
iii.) Reinforcement Learning - In reinforcement learning, the algorithm (the agent) learns continuously from its environment in an iterative, trial-and-error process, improving as it explores more of the possible actions.
The agent just needs reward feedback to learn and fine-tune its behavior; this feedback is referred to as the Reinforcement Signal.
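The learn-from-reward loop can be sketched with a toy example. This is not a full reinforcement learning setup: it assumes a hypothetical two-action environment that pays a reward of 1 with a fixed probability per action, and an epsilon-greedy agent. The name `run_bandit` and all the numbers are made up for illustration.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy environment with one reward probability per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # the agent's value estimate per action
    counts = [0] * len(reward_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # The environment answers with a reward signal of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

The agent is never told which action is correct. It only sees rewards, and over many iterations its estimates should drift toward the action that pays off more often.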
2. What are the qualities of good Machine Learning code?
A machine learning program is expected to work with a large amount of data, time and again, even in complex situations.
This requires it to possess the following important qualities:
i.) Scalability - Good machine learning code can be scaled up as complexity increases over time.
ii.) No manual checks should be required - Since machines are expected to run this code, it should not require any human intervention to check which functions and parameters were run together.
iii.) Data should get auto saved - The data, both used and generated, should be saved automatically in the right place. Humans should not be required to keep a record of that.
iv.) Easy to understand for others - Other members of your team may need to work on your code to scale it up, fix it or optimize it. It should be easy for them to read and work on.
3. What is the difference between Train Data and Test Data?
When you build a supervised learning model, you use separate Train Data and Test Data sets.
The Training Dataset contains both the input and the expected output. It is used to train the algorithm.
The Testing Dataset is used to examine how well the algorithm was trained: the algorithm is given only the inputs, and its predictions are compared with the expected outputs that were held back.
During the process, be careful that your Test Data doesn't leak into the Train Data. Otherwise the algorithm may perform very well during training and testing yet fail miserably in real-life situations.
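A minimal sketch of splitting data and checking for leakage might look like this. The helper `train_test_split` here is a hand-written illustration (not the scikit-learn function of the same name), and the rows, the 25% test fraction and the seed are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    shuffled = rows[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Each row is (row_id, input_value, expected_output); the data is made up.
rows = [(i, float(i), 2.0 * i) for i in range(20)]
train, test = train_test_split(rows)

# Leakage check: no row may appear in both sets.
train_ids = {row[0] for row in train}
test_ids = {row[0] for row in test}
leaked = train_ids & test_ids
print(len(train), len(test), leaked)
```

An empty `leaked` set means the two sets share no rows. In practice leakage can be subtler than duplicated rows (for example, near-duplicates or statistics computed over the full data), so this check is a starting point rather than a guarantee.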
4. What are the most popular Regression algorithms used in Machine Learning?
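Commonly used Regression algorithms in Machine Learning include:
i.) Linear Regression
ii.) Support Vector Machines (in their regression form, Support Vector Regression)
iii.) Decision Trees (regression trees)
Logistic Regression and Naïve Bayes are often named in answer to this question as well, but they are classification algorithms (Logistic Regression predicts class probabilities, despite its name). Clustering is an unsupervised technique, not a regression algorithm.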
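As an illustration of the simplest case, linear regression with a single input variable can be fitted with ordinary least squares in a few lines. The function name `fit_line` and the data points are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict the output for a new input
print(slope, intercept, prediction)
```

The output is a continuous number, which is what makes this a regression model.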
5. What are the advantages and disadvantages of K-Nearest Neighbors Algorithm?
While it is important to know the technical details and working of each algorithm, it is also very important to know the advantages and limitations of each model, so that you can decide if a particular model is favorable to be used in a particular case or not.
Talking about the KNN algorithm, its main advantages are:
i.) It is very easy to understand and implement. Works well with basic recognition problems.
ii.) It is non-parametric, so it doesn't require the data to meet the distributional assumptions that parametric models rely on.
iii.) Since it tags the new data simply based on the historical data and the labels of the nearest neighbors, it doesn't require much training time.
iv.) It adapts as new data enters the system, since new points can simply be added to the stored data.
v.) It works well with both Regression and Classification problems.
Some of the disadvantages of this model are:
i.) Declining Speed - As the data grows, prediction speed declines, because each new entry has to be compared against the stored data.
ii.) Not effective with a large number of input variables.
iii.) Choosing the optimal number of neighbors while trying to classify a new entry is another problem.
iv.) If your data is skewed towards a particular class, there's a high possibility of a new entry being classified wrongly by the KNN algorithm.
v.) The outliers may also affect the performance of the model as the classification is based on the distance.
vi.) The model doesn't learn anything from the training data in advance; it just uses the stored training data to classify new data in actual situations.
vii.) Changing the value of K can change the predicted class variable.
So, these are some of the advantages and disadvantages of KNN. Prepare yourself to answer this type of question for other models as well.
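Several of these points show up even in a tiny sketch of KNN classification. The function `knn_classify`, the one-dimensional training points and the labels below are assumptions for illustration only.

```python
from collections import Counter


def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # There is no training step: the stored data is sorted by distance
    # to the query every time a prediction is requested.
    by_distance = sorted(train, key=lambda row: abs(row[0] - query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up (value, label) pairs; class "B" is the larger class.
train = [(1.0, "A"), (4.5, "A"), (6.0, "B"), (6.5, "B"), (9.5, "B")]

label_k1 = knn_classify(train, query=5.0, k=1)
label_k3 = knn_classify(train, query=5.0, k=3)
print(label_k1, label_k3)
```

With `k=1` the single closest point (4.5, labelled "A") decides the answer; with `k=3` the two "B" points outvote it. The same query gets a different class just because K changed, and every prediction has to scan the whole stored dataset.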
6. What are the important stages in the Machine Learning Life Cycle?
The process of Machine Learning goes through a set of stages in its life cycle. They include:
i.) Gathering Data
ii.) Preparing Data - i.e. making it usable for our machine learning algorithm. This data is divided into training data and test data. Cleaning, Normalization of data etc. are a part of this step.
iii.) Choosing the right model - Depending on your data type.
iv.) Training the model - This stage usually consumes the most time, because you keep readjusting the model until you are happy with its performance and it makes correct predictions.
v.) Testing the model - At this stage you test it with the "Test Data Set" that you kept aside at the time of splitting the data initially. This data is different from what you used for training the model.
vi.) Tuning the parameters - This is done to further improve the performance of the model. Here, it is also very important to take care that you don't end up over-tuning the model or its parameters, because that wastes a lot of time and can lead to inaccurate predictions on new data.
vii.) Making predictions - This is the outcome of all the hard work you put in to make your machines learn to predict in the real world.
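The stages above can be walked through end to end on a toy problem. Everything in this sketch is an assumption made for illustration: the made-up "hours studied" data, the every-third-row split, and the very simple threshold model. The parameter tuning stage is left out to keep it short.

```python
# 1. Gather data: (hours_studied, passed) pairs. The values are made up.
data = [(hours, 1 if hours >= 6 else 0) for hours in range(1, 11)]

# 2. Prepare data: hold out every third row for testing, then scale the
#    inputs to [0, 1] using statistics from the training rows only.
test = data[2::3]
train = [row for i, row in enumerate(data) if i % 3 != 2]
low = min(h for h, _ in train)
high = max(h for h, _ in train)


def scale(hours):
    return (hours - low) / (high - low)


def mean(values):
    return sum(values) / len(values)


# 3. Choose a model: a single threshold on the scaled input.
# 4. Train it: place the threshold midway between the two class means.
mean_fail = mean([scale(h) for h, y in train if y == 0])
mean_pass = mean([scale(h) for h, y in train if y == 1])
threshold = (mean_fail + mean_pass) / 2


def predict(hours):
    return 1 if scale(hours) >= threshold else 0


# 5. Test the model on rows it has never seen.
accuracy = mean([1 if predict(h) == y else 0 for h, y in test])

# 6. Make a prediction for a new input.
new_prediction = predict(7.5)
print(accuracy, new_prediction)
```

Note that the scaling bounds `low` and `high` come from the training rows alone, so nothing about the test rows influences the model before it is evaluated.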
7. Difference between Classification and Regression in Machine Learning.
To be able to solve a prediction problem correctly, it is very important to clearly understand if the problem is that of Classification or Regression.
The biggest difference between the two is the output variable:
in the case of Regression it is numerical (or continuous), while in the case of Classification it is categorical (or discrete).
So, Regression is the task of predicting a continuous quantity while Classification is the task of predicting a discrete class label.
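One way to see the difference is to apply the same nearest-neighbor lookup to both kinds of target. In this sketch the helper `nearest`, the housing rows and the category names are hypothetical; only the final aggregation step differs between the two tasks.

```python
from collections import Counter


def nearest(train, query, k):
    """Return the k training rows whose input is closest to `query`."""
    return sorted(train, key=lambda row: abs(row[0] - query))[:k]


# Rows are (size_m2, price, category); the numbers are illustrative only.
homes = [
    (40, 100.0, "small"),
    (55, 140.0, "small"),
    (60, 150.0, "small"),
    (90, 230.0, "large"),
    (120, 310.0, "large"),
]

neighbors = nearest(homes, query=58, k=3)

# Regression: average the neighbors' numeric targets to get a continuous value.
predicted_price = sum(price for _, price, _ in neighbors) / len(neighbors)

# Classification: take a majority vote over the neighbors' labels
# to get a discrete class.
predicted_category = Counter(cat for _, _, cat in neighbors).most_common(1)[0][0]

print(predicted_price, predicted_category)
```

The regression answer is a number that could take any value in a range, while the classification answer is always one of a fixed set of labels.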