# Research scientist interview questions

9K Research Scientist interview questions shared by candidates

## Most frequently asked interview questions

### You're about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining. What is the probability that it's actually raining in Seattle?

33 answers

↳

Bayesian stats: you should estimate the prior probability that it's raining on any given day in Seattle. If you mention this or ask, the interviewer will tell you to use 25%. Then it's straightforward:

P(raining | Yes,Yes,Yes) = P(raining) * P(Yes,Yes,Yes | raining) / P(Yes,Yes,Yes)

P(Yes,Yes,Yes) = P(raining) * P(Yes,Yes,Yes | raining) + P(not-raining) * P(Yes,Yes,Yes | not-raining) = 0.25 * (2/3)^3 + 0.75 * (1/3)^3 = 0.25 * (8/27) + 0.75 * (1/27)

P(raining | Yes,Yes,Yes) = 0.25 * (8/27) / ( 0.25 * (8/27) + 0.75 * (1/27) )

Bonus points if you notice that you don't need a calculator, since all the 27's cancel out and you can multiply top and bottom by 4:

P(raining | Yes,Yes,Yes) = 8 / ( 8 + 3 ) = 8/11

But honestly, you're going to Seattle, so the answer should always be: "YES, I'm bringing an umbrella!" (yeah yeah, unless your friends mess with you ALL the time ;)

↳

Answer from a frequentist perspective: Suppose there was one person. A truthful YES when it is raining (probability 2/3) is twice as likely as a lying YES when it is not raining (probability 1/3), so P(raining | YES) is 2/3. If instead n people all say YES, then they are either all telling the truth or all lying. The outcome that they are all telling the truth is (2/3)^n / (1/3)^n = 2^n times as likely as the outcome that they are all lying. Thus P(raining | ALL YES) = 2^n / (2^n + 1) = 8/9 for n=3. Notice that this corresponds exactly to the Bayesian answer when prior(raining) = 1/2.

↳

26/27 is incorrect. That is the probability that at least one friend tells you the truth (i.e., 1 minus the probability that all three lie, which is 1/27). What you have to figure out is the odds of it raining given that all 3 friends told you the same thing. Because they all say the same thing, they must either all be lying or all be telling the truth. What are the odds that they would all lie versus all tell the truth? 1/27 of the time they would all lie, and 8/27 of the time they would all tell the truth. So there are 9 equally weighted ways in which all your friends would tell you the same thing, and in 8 of them (8 out of 9) they would be telling you the truth.
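
The answers above differ only in the prior they assume for rain. A minimal sketch in Python that makes this explicit, using exact fractions; the function name `posterior_rain` and its parameters are made up for illustration:

```python
from fractions import Fraction


def posterior_rain(prior, n_friends=3, p_truth=Fraction(2, 3)):
    """P(raining | all n friends say yes), computed with Bayes' rule."""
    prior = Fraction(prior)
    # All friends say "yes" if it rains and all tell the truth ...
    yes_given_rain = p_truth ** n_friends
    # ... or if it does not rain and all of them lie.
    yes_given_dry = (1 - p_truth) ** n_friends
    evidence = prior * yes_given_rain + (1 - prior) * yes_given_dry
    return prior * yes_given_rain / evidence


print(posterior_rain(Fraction(1, 4)))  # prior of 25% -> 8/11
print(posterior_rain(Fraction(1, 2)))  # prior of 50% -> 8/9
```

With the interviewer's 25% prior this gives 8/11; the 8/9 answers correspond to an implicit 50% prior.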

### Given a list A of objects and another list B which is identical to A except that one element is removed, find that removed element.

19 answers

↳

All these supposed answers are missing the point, and this question isn't even worded correctly. It should be lists of NUMBERS, not "objects". Anyway, the question is asking how you figure out the number that is missing from list B, which is identical to list A except one number is missing. Before getting into the coding, think about it logically - how would you find this? The answer of course is to sum all the numbers in A, sum all the numbers in B, subtract the sum of B from the sum of A, and that gives you the number.

↳

    select a.element
    from a
    left join b on a.element = b.element
    where b.element is null

↳

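In Python:

(just numbers)

    def rem_elem_num(listA, listB):
        # The removed number is the difference between the two sums.
        return sum(listA) - sum(listB)

(general, for hashable objects)

    from collections import Counter

    def rem_elem(listA, listB):
        # Count B's elements so that duplicates in A are handled correctly.
        remaining = Counter(listB)
        for item in listA:
            if remaining[item] == 0:
                return item
            remaining[item] -= 1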

### Given two tables, Friend_request (requester_id, sent_to_id, time) and Request_accepted (acceptor_id, requestor_id, time), find the overall acceptance rate of requests.

14 answers

↳

Based on "Quick and Dirty"'s assumptions above (e.g. 1 week), here's an example query [using BigQuery's SQL syntax]:

    select round(100 * count(b.requestor_id) / count(a.requester_id), 2) as acceptance_rate
    from Friend_request as a
    left join Request_accepted as b
      on a.sent_to_id = b.acceptor_id and a.requester_id = b.requestor_id
    where date(a.time) < date_add(current_date(), -7, "day")

↳

In both tables, concat the requester and recipient IDs, then do a left join. For example:

Friend_request: [111, aaa, 01-01-15; 222, aaa, 02-01-15]

Request_accepted: [aaa, 111, 02-01-15]

After the concat, your left join searches the second table for 111aaa and 222aaa. It finds the first one, and the second one is null, so you have a 50% acceptance rate. Regarding the dates, a lot can be done with them, but they are not strictly part of the question. The only thing the dates imply is that you could have multiple requests before an accept, so use distinct.

↳

    SELECT CAST(COUNT(r.acceptor_id) AS FLOAT) / CAST(COUNT(f.requester_id) AS FLOAT) AS acceptance_rate
    FROM friend_request f
    LEFT JOIN request_accepted r
      ON f.requester_id = r.requestor_id AND f.sent_to_id = r.acceptor_id
    WHERE f.time > (CURRENT_DATE - INTERVAL '30 day');
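
To check the left-join-with-distinct idea on the toy data from the answer above, here is a rough sketch using Python's built-in `sqlite3` module. The repeated 111 → aaa request (dated 03-01-15) is an added row, not part of the original example, included only to show why distinct matters; the `|` separator in the concatenation is likewise an assumption to keep ID pairs unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE friend_request (requester_id TEXT, sent_to_id TEXT, time TEXT);
    CREATE TABLE request_accepted (acceptor_id TEXT, requestor_id TEXT, time TEXT);
    INSERT INTO friend_request VALUES
        ('111', 'aaa', '01-01-15'),
        ('222', 'aaa', '02-01-15'),
        ('111', 'aaa', '03-01-15');  -- added duplicate request (assumption)
    INSERT INTO request_accepted VALUES ('aaa', '111', '02-01-15');
    """
)

# Count distinct (requester, recipient) pairs; unmatched rows yield NULL
# on the right side, which COUNT(DISTINCT ...) ignores.
acceptance_rate = conn.execute(
    """
    SELECT 1.0 * COUNT(DISTINCT r.requestor_id || '|' || r.acceptor_id)
               / COUNT(DISTINCT f.requester_id || '|' || f.sent_to_id)
    FROM friend_request AS f
    LEFT JOIN request_accepted AS r
      ON f.requester_id = r.requestor_id AND f.sent_to_id = r.acceptor_id
    """
).fetchone()[0]

print(acceptance_rate)  # 0.5
```

Without the distinct, the duplicated request would be counted twice on both sides and the rate would come out as 2/3 instead of 1/2.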

### 1) Provided a table with user_id and the dates they visited the platform, find the top 100 users with the longest continuous streak of visiting the platform as of yesterday. 2) Provided a table with page_id, event timestamp, and a flag for a state (which is on/off), find the number of pages that are currently on.

13 answers

↳

For question 2:

    WITH latest AS (
        SELECT page_id, MAX(timestamp) AS MaxDate
        FROM page_status
        GROUP BY page_id
    )
    SELECT COUNT(*)
    FROM page_status AS P
    INNER JOIN latest AS M
      ON P.page_id = M.page_id AND P.timestamp = M.MaxDate
    WHERE P.state = 'on'

↳

The answer to the first question could look like this (returning more than 100 top users if several tie at 100th place):

    WITH user_streaks AS (
        SELECT user_id,
               level AS lvl,
               rank() OVER (ORDER BY level DESC) rnk
        FROM table
        WHERE connect_by_isleaf = 1
        START WITH date = trunc(sysdate) - 1
        CONNECT BY PRIOR date = date + 1 AND PRIOR user_id = user_id
    )
    SELECT user_id, lvl, rnk
    FROM user_streaks
    WHERE rnk <= 100

↳

Self join on userid and date = date-1
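
For comparison, the same two questions can be sketched in plain Python. This assumes "streak as of yesterday" means a run of consecutive days ending exactly on a given day (the reading the SQL answer above uses), and that rows arrive as `(user_id, date)` and `(page_id, timestamp, state)` tuples; all function names here are made up for illustration.

```python
from datetime import date, timedelta


def streak_ending_on(visit_days, end_day):
    """Count consecutive visited days, walking backwards from end_day."""
    streak = 0
    day = end_day
    while day in visit_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


def top_streaks(visits, end_day, limit=100):
    """Return up to `limit` (user_id, streak) pairs, longest streak first."""
    days_by_user = {}
    for user_id, day in visits:
        days_by_user.setdefault(user_id, set()).add(day)
    streaks = [(u, streak_ending_on(d, end_day)) for u, d in days_by_user.items()]
    # Users who did not visit on end_day have no current streak.
    active = [(u, s) for u, s in streaks if s > 0]
    return sorted(active, key=lambda us: (-us[1], us[0]))[:limit]


def pages_currently_on(events):
    """Count pages whose most recent event has state 'on'."""
    latest = {}
    for page_id, ts, state in events:
        if page_id not in latest or ts > latest[page_id][0]:
            latest[page_id] = (ts, state)
    return sum(1 for _, state in latest.values() if state == "on")


yesterday = date(2024, 1, 10)  # fixed date so the example is reproducible
visits = [
    (1, date(2024, 1, 6)), (1, date(2024, 1, 8)),
    (1, date(2024, 1, 9)), (1, date(2024, 1, 10)),
    (2, date(2024, 1, 10)),
    (3, date(2024, 1, 7)), (3, date(2024, 1, 8)),
]
events = [(1, 1, "on"), (1, 2, "off"), (2, 1, "off"), (2, 3, "on"), (3, 5, "on")]

print(top_streaks(visits, yesterday))  # [(1, 3), (2, 1)]
print(pages_currently_on(events))      # 2
```

Unlike the `rank()` version, this simply cuts off at `limit` and does not keep ties at the last place.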

### Behavioral questions probing about fit

11 answers

↳

Hey Ramii, congratulations on your achievement! I am at the onsite stage now. Did you encounter a pure business case in the onsite interview? For the tech case, would you mind giving me an outline of its structure? Thank you so much!

↳

When you say 3rd round, do you mean onsite? I have my onsite coming up and don't exactly know what to expect in the interview: an analytical case study or an MBA-style case interview?

↳

I didn't apply through school, if that's what you're implying by onsite. I have a Masters degree in a quantitative field and around 4 years' experience in data science. The city I live in has a BCG generalist consulting office, but the nearest GAMMA hub is in another city. All my interviews were VC, where I connected with interviewers in the other city or overseas from my local BCG office. Regarding case studies in interviews, the homework case was analytical. Others varied based on the work the interviewer did. People who work in GAMMA had cases with more numbers and charts; case interviews with generalist consultants were less analytical and more abstract in some sense. In both rounds - second and third - one interviewer was from GAMMA and the other from generalist consulting. Hope this helps! And all the best with your interview!

### Write a function that takes in two sorted lists and outputs a sorted list that is their union.

10 answers

↳

Second part of merge sort. Don't answer with sort(a), etc. Anyone can do that...

    def merge(A, B):
        i = 0
        j = 0
        sorted_list = []
        while i < len(A) and j < len(B):
            if A[i] <= B[j]:
                sorted_list.append(A[i])
                i += 1
            else:
                sorted_list.append(B[j])
                j += 1
        if i < len(A):
            sorted_list.extend(A[i:])
        elif j < len(B):
            sorted_list.extend(B[j:])
        return sorted_list

↳

Write 2 helpers: 1) INSERT(A, b) = put element b within A in the sort order; 2) DEL(A, a) = delete element a from A. Then do this recursion:

    f(A, B):
        if A is empty or B is empty or max(A) <= min(B):
            return [A B]
        else:
            B = INSERT(B, max(A))
            A = DEL(A, max(A))
            return f(A, B)

Something like that. Try coding and testing. I haven't.

↳

I assumed that we cannot use any "sort" function. Here is a recursive version; note that the list slicing and concatenation at each step make it quadratic rather than linear:

    def my_sort(list_a, list_b):
        if len(list_a) == 0:
            return list_b
        elif len(list_b) == 0:
            return list_a
        elif list_a[-1] > list_b[-1]:
            # The largest remaining element is in list_a; it goes last.
            return my_sort(list_a[:-1], list_b) + [list_a[-1]]
        else:
            return my_sort(list_a, list_b[:-1]) + [list_b[-1]]
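
The answers above keep duplicates, as a merge does. If "union" is read in the set sense (an assumption, since the question does not say), a linear-time variant could drop repeated values while merging. A minimal sketch, with `sorted_union` as a made-up name:

```python
def sorted_union(a, b):
    """Merge two sorted lists into one sorted list without duplicates."""
    i = j = 0
    out = []
    while i < len(a) or j < len(b):
        # Take from a if b is exhausted or a's next value is not larger.
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            value = a[i]
            i += 1
        else:
            value = b[j]
            j += 1
        # Inputs are sorted, so a duplicate can only equal the last output.
        if not out or out[-1] != value:
            out.append(value)
    return out


print(sorted_union([1, 2, 2, 5], [2, 3, 5, 8]))  # [1, 2, 3, 5, 8]
```

Each element is visited once, so the running time is linear in the combined length of the inputs.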

### They asked probability questions: 1) The probability that an item is at location A is 0.6, and 0.8 at location B. What is the probability that the item would be found on the Amazon website? 2) I have Table 1 with 1 million records and columns ID and AGE, and Table 2 with 100 records and columns ID and SALARY. The interviewer then gave me the following SQL script: SELECT A.ID, A.AGE, B.SALARY FROM TABLE 1 A LEFT JOIN TABLE 2 B ON A.ID = B.ID WHERE B.SALARY > 50000 (he asked me to modify this last line of the query). How many records would be returned? 3) Given a CSV file with ID and Quantity columns, 50 million records, and 2 GB of data, write a program in any language of your choice to aggregate the QUANTITY column.

9 answers

↳

P(A) = 0.6, P(B) = 0.8. P(A∪B) = P(A) + P(B) - P(A and B) = 0.6 + 0.8 - 0.48 = 0.92 [independent events], i.e., there is an 8% probability that the item is at neither A nor B. So there is a 92% probability that the item listed on the website is from A or B. This assumes that the items available at A and B are listed on the website. We don't have the conditional probability of the item being found on the website given that it is from A or B: P(W|A) and P(W|B) are not available. Hence there is insufficient data to answer what the probability of the item being found on the website is.

↳

Insufficient data to make an inference about the probability that an item is on Amazon, given only that the probability of the item being at A is 0.6 and the probability of it being at B is 0.8. Are A and B associated with Amazon? Is the vendor who owns the item an exclusive vendor of Amazon? Is the item easily available?

↳

The simple explanation is to take the probability of not finding the item at either location (0.4 * 0.2 = 0.08). So the probability of the item showing up at neither location is 8%. That means you have a 92% probability of the item showing up at at least one location.
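
Both routes to 92% can be checked with exact arithmetic. This short sketch assumes, as the answers above do, that the two locations are independent; the variable names are made up for illustration.

```python
from fractions import Fraction

p_a = Fraction(6, 10)  # P(item at location A)
p_b = Fraction(8, 10)  # P(item at location B)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A) * P(B) under independence.
p_incl_excl = p_a + p_b - p_a * p_b

# Complement: 1 - P(not at A) * P(not at B).
p_complement = 1 - (1 - p_a) * (1 - p_b)

print(p_incl_excl, p_complement, float(p_incl_excl))  # 23/25 23/25 0.92
```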

### Mostly situational, guesstimate, and SPSS-related questions to gauge proficiency in statistics and visualizations.

7 answers

↳

Has anyone undergone the certification? Please let me know.

↳

No, I don't know anyone who has undergone this certification. I tried to find out, but I was unable to find someone in such a short time. Try messaging people who already work in this position at Merkle on LinkedIn.

↳

I had the same situation. I guess this is a fake offer. No company will make you pay for the course yourself before giving the offer. I am also thinking of discontinuing the interview process.

### The HR personnel did not ask any questions, he only wanted to schedule an interview.

6 answers

### We have a table called ad_accounts(account_id, date, status). Status can be active/closed/fraud. A) What percent of active accounts are fraud? B) How many accounts became fraud today for the first time? C) What would be the financial impact of letting fraud accounts become active (how would you approach this question)?

6 answers

↳

A) What percent of active accounts are fraud?

    select 100.0 * sum(case when status = 'fraud' then 1 else 0 end) / count(*) as Fraud_percentage
    from ad_accounts
    where status <> 'closed';

B) How many accounts became fraud today for the first time?

    select count(*)
    from (
        select account_id, min(date) as First_fraud
        from ad_accounts
        where status = 'fraud'
        group by account_id
        having min(date) = current_date()
    ) as t;

↳

Yep, should be:

A) What percent of active accounts are fraud?

    SELECT 1.0 * COUNT(DISTINCT t2.account_id) / COUNT(DISTINCT t1.account_id) AS perc_fraud
    FROM ad_accounts AS t1
    LEFT JOIN ad_accounts AS t2
      ON t1.account_id = t2.account_id
     AND t2.status = 'fraud'
     AND t2.date > t1.date
    WHERE t1.status = 'active'

↳

For question B, if I assume I have today's data and yesterday's data in the table, would this work?

    select count(distinct a.account_id)
    from ad_accounts a
    inner join ad_accounts b on a.account_id = b.account_id
    where a.date = current_date
      and b.date = date_add('day', -1, current_date)
      and a.status = 'fraud'
      and b.status != 'fraud'
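
To try parts A and B on a small example, here is a rough sketch using Python's built-in `sqlite3` module. The rows and the fixed `today` value are invented for illustration, and part A is read as "share of accounts that were ever active and later turned fraud", which is one possible interpretation of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_accounts (account_id INTEGER, date TEXT, status TEXT)")
rows = [  # hypothetical daily snapshots
    (1, "2024-01-01", "active"), (1, "2024-01-02", "fraud"),
    (2, "2024-01-01", "fraud"), (2, "2024-01-02", "fraud"),
    (3, "2024-01-01", "active"), (3, "2024-01-02", "active"),
    (4, "2024-01-01", "active"), (4, "2024-01-02", "closed"),
]
conn.executemany("INSERT INTO ad_accounts VALUES (?, ?, ?)", rows)
today = "2024-01-02"  # fixed so the example is reproducible

# A) Share of ever-active accounts with a later fraud record.
perc_fraud = conn.execute(
    """
    SELECT 1.0 * COUNT(DISTINCT t2.account_id) / COUNT(DISTINCT t1.account_id)
    FROM ad_accounts AS t1
    LEFT JOIN ad_accounts AS t2
      ON t1.account_id = t2.account_id
     AND t2.status = 'fraud'
     AND t2.date > t1.date
    WHERE t1.status = 'active'
    """
).fetchone()[0]

# B) Accounts whose earliest fraud record is dated today.
first_time_fraud = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT account_id
        FROM ad_accounts
        WHERE status = 'fraud'
        GROUP BY account_id
        HAVING MIN(date) = ?
    ) AS t
    """,
    (today,),
).fetchone()[0]

print(perc_fraud, first_time_fraud)
```

Here accounts 1, 3, and 4 were active at some point and only account 1 later became fraud, so part A gives one third; account 2 was already fraud on the first day, so only account 1 counts for part B.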