5 Data Science Interview Questions Part IV

Edna Figueira Fernandes
3 min read · Aug 23, 2020

1. Explain the SVM algorithm.

Support Vector Machine (SVM) is a supervised learning algorithm that can be used for both regression and classification problems, although it is more commonly used for classification.

The goal of an SVM is to find the hyperplane that best separates the classes in the dataset. The support vectors are the data points nearest to the hyperplane; they determine its position and orientation. The margin is the distance between the hyperplane and these nearest points, and the optimal hyperplane is the one that maximizes it.
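To make the terms concrete, here is a minimal sketch that measures the margin and picks out the support vectors for a given hyperplane. The toy points and the hyperplane (w, b) are made-up assumptions for illustration; nothing is trained here.

```python
import math

# Hypothetical 2-D toy data: (point, label) pairs with labels +1 / -1.
points = [((1.0, 1.0), -1), ((2.0, 0.0), -1), ((0.0, 0.0), -1),
          ((3.0, 3.0), +1), ((4.0, 2.0), +1), ((4.0, 4.0), +1)]

# Assumed hyperplane w.x + b = 0, here the line x1 + x2 - 4 = 0.
w = (1.0, 1.0)
b = -4.0

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance |w.x + b| / ||w|| from point x to the hyperplane."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

distances = [distance_to_hyperplane(x, w, b) for x, _ in points]

# Margin: distance from the hyperplane to the closest point.
margin = min(distances)

# Support vectors: the points that sit exactly at the margin distance.
support_vectors = [x for (x, _), d in zip(points, distances)
                   if math.isclose(d, margin)]

print(margin)
print(support_vectors)
```

Only the points closest to the line end up in support_vectors; moving any of the other points slightly would not change the margin.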

2. Explain the C value in SVM.

The C value is a regularization parameter that controls how heavily misclassified points are penalized during training. A large C value penalizes misclassification strongly, which results in a smaller margin and fewer misclassified training points (at the risk of overfitting), while a small C value results in a larger margin and tolerates more misclassification.
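The trade-off can be sketched with the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The 1-D data and the two candidate boundaries below are hand-picked assumptions, not the output of a trained model; the point is only to show which candidate the objective prefers as C changes.

```python
# Hypothetical 1-D toy data: (feature, label). The positive point at x = 2.0
# sits close to the negative class.
data = [(0.0, -1), (1.0, -1), (2.0, +1), (5.0, +1), (6.0, +1)]

def soft_margin_objective(w, b, C, data):
    """0.5 * w^2 + C * sum of hinge losses max(0, 1 - y * (w * x + b))."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return 0.5 * w * w + C * hinge

# Two hand-picked candidate boundaries as (w, b) pairs (illustrative assumptions):
wide = (0.5, -1.5)    # boundary at x = 3, margin 1/|w| = 2, misclassifies x = 2
narrow = (2.0, -3.0)  # boundary at x = 1.5, margin 1/|w| = 0.5, no errors

def preferred(C):
    """Return the name of the candidate with the lower objective for this C."""
    cost_wide = soft_margin_objective(*wide, C, data)
    cost_narrow = soft_margin_objective(*narrow, C, data)
    return "wide" if cost_wide < cost_narrow else "narrow"

print(preferred(0.1))   # wide
print(preferred(10.0))  # narrow
```

With a small C the single misclassified point is cheap, so the wide margin wins; with a large C that same error dominates the objective and the narrow, error-free boundary wins.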

3. What are the different kernel functions in SVM?

There are several kernel functions, with the most common ones being:

  • Linear kernel: it creates a linear decision boundary to separate the classes in the dataset.
  • Polynomial kernel: it creates a curved decision boundary whose flexibility depends on the specified degree.
  • Sigmoid kernel: it is based on the hyperbolic tangent function, which has the same S-shape as the sigmoid function used in logistic regression.
  • RBF kernel (radial basis function): it measures the similarity of two points with a Gaussian function of the distance between them, which produces flexible non-linear decision boundaries.
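The four kernels listed above can be written down in a few lines each. This is a minimal sketch using the usual textbook formulas; the parameter names gamma, coef0 and degree mirror the ones scikit-learn's SVC uses, and the default values chosen here are assumptions for illustration.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * x . y + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)

def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, degree=2))
print(sigmoid_kernel(x, y, gamma=0.1))
print(rbf_kernel(x, y, gamma=0.5))
```

Each function takes two points and returns a single similarity score; the RBF kernel, for example, returns 1 for identical points and decays towards 0 as they move apart.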

4. Explain the Kernel trick.

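The kernel trick makes it possible to separate data that is not linearly separable in its original space. Conceptually, the features are mapped into a higher-dimensional space where a linear separation exists; the trick is that a kernel function computes the inner products in that space directly from the original features, so the mapping itself never has to be computed explicitly.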
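A small numeric check illustrates the idea. For 2-D inputs, the degree-2 polynomial kernel (x . y)^2 gives the same value as an ordinary dot product after mapping each point to the 3-D features (x1^2, sqrt(2) * x1 * x2, x2^2). The function names and sample points below are illustrative assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: (x . y) ** 2."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)

explicit = dot(phi(x), phi(y))  # inner product computed in the 3-D space
implicit = poly2_kernel(x, y)   # computed directly in the original 2-D space

print(explicit, implicit)
print(math.isclose(explicit, implicit))  # True
```

The kernel version never builds the 3-D vectors, which is what keeps SVMs practical when the implied feature space is very large (or, for the RBF kernel, infinite-dimensional).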

5. What is a confusion matrix?

A confusion matrix is a table used to show the performance of a classifier by comparing its predictions with the actual values. On the horizontal axis (the columns), we have the classifier’s predictions, while the actual values are represented by the vertical axis (the rows).

  • TP (true positives): the positive cases in the evaluated dataset that the model classified as positive.
  • TN (true negatives): the negative cases in the evaluated dataset that the model classified as negative.
  • FP (false positives): the negative cases in the evaluated dataset that the model classified as positive.
  • FN (false negatives): the positive cases in the evaluated dataset that the model classified as negative.
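The four counts can be computed directly from a list of actual labels and a list of predictions. This is a minimal sketch; the helper name confusion_counts and the sample labels are made up for illustration, with 1 as the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive case predicted as positive
        elif a != positive and p != positive:
            tn += 1  # negative case predicted as negative
        elif a != positive and p == positive:
            fp += 1  # negative case predicted as positive
        else:
            fn += 1  # positive case predicted as negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels: 1 = positive, 0 = negative.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)

# Rows are actual classes, columns are predicted classes (negative first).
matrix = [[counts["TN"], counts["FP"]],
          [counts["FN"], counts["TP"]]]

print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(matrix)  # [[3, 1], [1, 3]]
```

The diagonal of the matrix holds the correct predictions, and the off-diagonal cells hold the two kinds of errors.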

