5 Data Science Interview Questions Part VII

Edna Figueira Fernandes
2 min read · Sep 13, 2020

1. What is a decision tree model?

A decision tree is a machine learning model that makes predictions by recursively splitting the data: each internal node tests a feature and each leaf assigns a prediction. Essentially, it partitions the sample space into regions whose points are homogeneous, or close to one another. Compared to other algorithms, these models are simple to understand and explain. The classic implementation is known as CART (Classification and Regression Trees) because it handles both classification and regression problems.
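A minimal sketch of this idea, assuming scikit-learn and its bundled iris dataset (not part of the original text), fitting a small CART-style tree and printing it as human-readable rules:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each split partitions the feature space so the resulting regions
# are as homogeneous (pure) as possible.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))

# The fitted tree can be printed as plain if/else rules,
# which is why these models are easy to explain.
print(export_text(tree, feature_names=load_iris().feature_names))
```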

2. What is a random forest?

Random forests are an ensemble of decision trees: many decision trees combined into a single model. The algorithm uses bagging and the subspace sampling method to create diversity among the trees in the forest and, therefore, improve the performance of the model.

Bagging (bootstrap aggregation) trains each tree on a bootstrap sample drawn with replacement from the training data; on average, roughly two-thirds of the observations appear in each tree's sample, and the remaining out-of-bag observations can be used to estimate that tree's error.

The subspace sampling method randomly selects a subset of features as candidate predictors at each node during training. This ensures further variability among the trees in the forest.
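A short sketch of both ideas, assuming scikit-learn and its bundled iris dataset (assumptions, not from the original text): `bootstrap=True` gives bagging, `max_features="sqrt"` gives subspace sampling, and `oob_score=True` evaluates each tree on its out-of-bag rows.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,       # each tree sees a bootstrap sample of the rows (bagging)
    max_features="sqrt",  # each split considers a random subset of features (subspace sampling)
    oob_score=True,       # score each tree on its out-of-bag observations
    random_state=42,
)
forest.fit(X, y)

print("out-of-bag score:", forest.oob_score_)
print("feature importances:", forest.feature_importances_)
```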

3. Why is a random forest model better than a decision tree model?

Random forests are an ensemble of decision trees: they "take many weak decision trees to make a strong learner". Because the individual trees' errors partly cancel out when their predictions are aggregated, the forest tends to be more accurate and less prone to overfitting than a single tree.
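A rough comparison, assuming scikit-learn and its bundled breast-cancer dataset (an illustrative choice, not from the original text), showing how the forest typically generalizes better than a single unpruned tree:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated accuracy: the forest usually beats the lone tree.
print("tree   CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```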

4. When would you use random forests as opposed to SVM?

The choice of algorithm depends on the data and what you are trying to solve.

  • Random forests handle multiclass problems natively, while SVMs are inherently binary classifiers (multiclass SVMs rely on one-vs-one or one-vs-rest schemes).
  • Random forests let you estimate feature importance.
  • Random forests are usually quicker and simpler to build and tune.
  • Random forests can handle a mixture of numerical and categorical features. SVMs are based on the concept of distance between data points and therefore expect numerical features only, so categorical features must be one-hot encoded first (see the sketch after this list).
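A hedged sketch of that preprocessing step, assuming scikit-learn and pandas; the column names and toy data are made up for illustration. Categorical columns are one-hot encoded and numerical columns scaled before the SVM sees them.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical toy frame with one numeric and one categorical column.
df = pd.DataFrame({
    "income": [30_000, 52_000, 71_000, 44_000],
    "city": ["lisbon", "porto", "lisbon", "faro"],
    "label": [0, 1, 1, 0],
})

# Scale numeric features and one-hot encode categorical ones,
# since the SVM works on distances between points.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

svm = Pipeline([("prep", preprocess), ("clf", SVC(kernel="rbf"))])
svm.fit(df[["income", "city"]], df["label"])
print(svm.predict(df[["income", "city"]]))
```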

5. Is it beneficial to perform dimensionality reduction before fitting an SVM?

Generally, a good rule of thumb is to compare the number of features with the number of observations. If the features outnumber the observations, performing dimensionality reduction (for example, PCA) before fitting typically improves the performance of the SVM model.
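A minimal sketch, assuming scikit-learn and synthetic data (50 observations, 500 features, both invented for illustration), of applying PCA before the SVM in that high-dimensional regime:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 500))           # far more features than observations
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic labels for illustration

# Scale, reduce to 20 principal components, then fit the SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```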

