Interview for the Solutions Architect position



Suppose you had bank transaction data, and wanted to separate out likely fraudulent transactions. How would you approach it? Why might accuracy be a bad metric for evaluating success?


Answers to interview questions

3 answers


What they were getting at here is that fraudulent bank data has extremely imbalanced classes. If you train a supervised classifier on the data as you got it (with no method of counteracting the class imbalance), your classifier will probably report 98-99% accuracy. Why? Because it only saw a few actual cases of fraud, so it learned to ALWAYS predict that the transaction was real. Then, if your test set has 99 real cases and 1 fake case, it predicts that all are real and achieves 99% accuracy while catching none of the fraud. This is bad. That's what they wanted to hear.
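
To make the arithmetic concrete, here is a minimal sketch in plain Python. The toy test set (99 real, 1 fake) comes from the example above; the `always_legit` classifier and the helper names are hypothetical and only stand in for a model that has collapsed onto the majority class. Recall on the fraud class is shown next to accuracy as one possible contrast, not as the only alternative metric.

```python
# Toy test set from the example: 99 real transactions (0) and 1 fraudulent (1).
y_true = [0] * 99 + [1]


def always_legit(transactions):
    """Hypothetical degenerate classifier: labels every transaction as real."""
    return [0] * len(transactions)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive (fraud) cases that were caught."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


y_pred = always_legit(y_true)

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The classifier looks excellent by accuracy and is useless by recall: it never flags the one fraudulent transaction.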

Anonymous on


There are traditional machine learning approaches and deep learning approaches. It can be treated as a classification problem (variations of decision trees, etc.), a clustering problem, or an anomaly detection problem. Accuracy can be a problematic metric because the data set is unbalanced.
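
As a rough illustration of the anomaly detection framing, the sketch below flags transactions whose amount is far from the median, using a modified z-score based on the median absolute deviation. Everything here is an assumption for illustration: the `flag_anomalies` function, the 3.5 threshold, the sample amounts, and the choice to look at amount alone. A real system would use many more features and likely a dedicated library.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Flag amounts whose modified z-score exceeds the threshold.

    Hypothetical helper: uses the median absolute deviation (MAD),
    which is less distorted by extreme values than mean and stdev.
    """
    med = median(amounts)
    abs_dev = [abs(a - med) for a in amounts]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median: nothing can be scored.
        return [False] * len(amounts)
    # 0.6745 rescales MAD so scores are comparable to ordinary z-scores
    # when the data is roughly normal.
    return [0.6745 * d / mad > threshold for d in abs_dev]


# Made-up transaction amounts; the last one is deliberately extreme.
amounts = [12.5, 9.99, 15.0, 11.2, 13.4, 10.8, 14.1, 5000.0]
flags = flag_anomalies(amounts)

for amount, is_anomaly in zip(amounts, flags):
    print(amount, "ANOMALY" if is_anomaly else "ok")
```

No labels are needed for this approach, which is why it is sometimes paired with or used ahead of a supervised classifier when confirmed fraud cases are scarce.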

John Doe on


This can be solved using both supervised and unsupervised methods.

Anonymous on
