A Data Mining Scientist was asked... 13 October 2011

Given the sequence a a b b a a c c b b of unknown length, write an algorithm that figures out which element occurs most frequently and which has the longest continuous repetition.

doesn't matter, no answer is right!

Maintain a hash map that stores each element's overall frequency and its longest continuous run.
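
The hash-map idea above could be sketched as a single pass over the input, which suits a sequence of unknown length. This is only one reading of the question; the function name and the tie-breaking rule (the first element seen wins) are assumptions, not part of the original answer.

```python
from collections import Counter

def most_frequent_and_longest_run(stream):
    """Return ((element, count), (element, run_length)) in one pass."""
    counts = Counter()   # element -> overall frequency
    longest = {}         # element -> longest continuous run seen so far
    prev, run = object(), 0   # sentinel that equals no real element
    for item in stream:
        counts[item] += 1
        run = run + 1 if item == prev else 1
        prev = item
        if run > longest.get(item, 0):
            longest[item] = run
    if not counts:
        return None, None
    # Ties go to the element seen first (assumed rule).
    top_count = max(counts, key=counts.get)
    top_run = max(longest, key=longest.get)
    return (top_count, counts[top_count]), (top_run, longest[top_run])

print(most_frequent_and_longest_run("aabbaaccbb"))
```

Memory grows with the number of distinct elements, not with the length of the input.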


What is the angle between the clock hands at 3:15?

7.5 degrees

Zero degrees
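
A quick way to check the two answers above: the minute hand moves 6 degrees per minute, and the hour hand moves 30 degrees per hour plus 0.5 degrees per minute. At 3:15 the hour hand has already moved a quarter of the way towards 4, so the hands do not coincide. The function name below is an assumption made for illustration.

```python
def clock_angle(hour, minute):
    """Smaller angle in degrees between the hour and minute hands."""
    minute_angle = 6.0 * minute                     # 360 degrees / 60 minutes
    hour_angle = 30.0 * (hour % 12) + 0.5 * minute  # 360 / 12 hours, plus drift
    diff = abs(hour_angle - minute_angle) % 360
    return min(diff, 360 - diff)

print(clock_angle(3, 15))  # 7.5
```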


From a set of strings, list the strings that are anagrams of each other.

Sorting the strings is not optimal because each sort is O(N log N) where N is the number of characters in each word. A more optimal solution is to create a function to encode each word as a hash table of character frequencies, which is O(N) for each word.

Sort the characters of each string and compare the results.
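
A minimal sketch of the character-frequency approach, assuming that comparison is case-sensitive and that only groups of two or more words count as anagrams. The function name is made up for this example. Replacing the key with `"".join(sorted(word))` would give the sorting variant instead.

```python
from collections import Counter, defaultdict

def anagram_groups(words):
    """Group words that share the same character frequencies."""
    groups = defaultdict(list)
    for word in words:
        # A frozenset of (char, count) pairs is hashable and built in O(N).
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    # Keep only the groups that contain at least two words.
    return [group for group in groups.values() if len(group) > 1]

print(anagram_groups(["listen", "silent", "enlist", "google", "gogole", "cat"]))
```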


there really were none.

They seemed to want to hear what I had to say about my past assignments and their relevance to the opening. I think they were not impressed.



How would you design a recommendation system (like Amazon's)?

Use collaborative filtering to compare a user's preferences with those of others. If A and B are similar, we can recommend B's preferred items to A.

Why the downvote on the other answer? He/she is right. Collaborative filtering is the most common strategy for recommendation systems. You see that user A bought these things and user B also bought those things, but user B bought this other thing too, so let's show that thing to user A.
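
The user-based idea described in these answers could look roughly like the toy sketch below. It assumes purchase histories are available as sets and uses Jaccard similarity between users. The similarity measure and the function names are illustrative choices, not how any real store does it.

```python
def jaccard(a, b):
    """Overlap between two item sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, purchases, top_n=3):
    """Suggest items bought by similar users that the target lacks."""
    mine = purchases[target]
    scores = {}
    for user, items in purchases.items():
        if user == target:
            continue
        sim = jaccard(mine, items)
        if sim == 0:
            continue  # nothing in common, so this user is ignored
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

purchases = {
    "A": {"book", "lamp"},
    "B": {"book", "lamp", "desk"},
    "C": {"sofa"},
}
print(recommend("A", purchases))  # ['desk']
```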

Compass Group

Why do you think you should be chosen for this position?

I'm hard-working, a great team player, reliable, a quick learner, etc.



We pre-screen the data to remove fraud threats. How, then, do we find a data sample that we can use to get a real representation of fraud events?

Remove the pre-screen and look at the unbiased data.

Yes, remove the pre-screen and look at the unbiased sample. If the unbiased sample becomes too big, randomly choose half of it, or a smaller fraction, to represent the fraud events.
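
The random down-sampling step could be sketched as below, assuming the unscreened records fit in a list. The function name, the `fraction` parameter, and the seed handling are assumptions made for illustration.

```python
import random

def subsample(records, fraction=0.5, seed=None):
    """Draw a uniform random subset, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if not records:
        return []
    k = max(1, int(len(records) * fraction))
    return rng.sample(records, k)

events = list(range(1000))            # stand-in for unscreened records
half = subsample(events, fraction=0.5, seed=42)
print(len(half))  # 500
```

Because every record is equally likely to be chosen, the fraud rate in the subset should stay close to the rate in the full unscreened data.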


Implement a sampling function with nominal distribution.

I think you mean the normal distribution! If you are using R, call set.seed() first so the draw is reproducible, then use rnorm() with the sample size, mean, and SD, e.g. set.seed(123); rnorm(100, 2, 5)

I'm the original poster, sorry for my typo. I actually meant the multinomial distribution. The advanced question was: if the probabilities form a skewed distribution, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
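
One possible sketch of a multinomial sampler uses inverse-CDF sampling: build the cumulative probabilities once, then binary-search them for each draw. The poster does not say which speed-up was expected. For heavily skewed probabilities, options often mentioned include ordering categories by decreasing probability for a linear scan, or the alias method. The function names below are assumptions.

```python
import bisect
import itertools
import random

def make_sampler(probs, seed=None):
    """Return a function that draws one category index per call."""
    rng = random.Random(seed)
    cumulative = list(itertools.accumulate(probs))
    total = cumulative[-1]  # tolerates weights that do not sum to 1
    last = len(probs) - 1

    def sample():
        u = rng.random() * total
        # bisect_right skips categories whose probability is zero.
        return min(bisect.bisect_right(cumulative, u), last)

    return sample

def multinomial(n, probs, seed=None):
    """Counts per category after n independent draws."""
    sample = make_sampler(probs, seed)
    counts = [0] * len(probs)
    for _ in range(n):
        counts[sample()] += 1
    return counts

print(multinomial(10000, [0.7, 0.2, 0.1], seed=0))
```

Each draw costs O(log K) for K categories after an O(K) setup.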

Tell me about your past experience in engineering.

Provided examples from my education and work.

Bharat Aluminium

What's your favorite subject?

Mine Development.

