Interview questions for Data Scientist in Utrecht

In an interview for a data scientist (M/F/X) position, you can expect the employer to ask questions that probe your data modeling, problem-solving, and programming skills. Be prepared for general questions that test your knowledge of statistics and data science. Also expect open-ended questions that test your creativity, social skills, and formal training in data modeling and programming.

42,339 interview questions for Data Scientist shared by candidates

Most frequently asked interview questions for a data scientist (M/F/X) and how to answer them

Tips for answering these three frequently asked data scientist interview questions:

Question 1: Which data modeling techniques do you prefer, and why?

How to answer: Translating data into understandable, usable information is an essential part of a data scientist's role. This question lets employers gauge your data modeling skills and background. Name your preferred data modeling techniques and discuss them, for example advantages such as ease of use and flexibility.

Question 2: How would you detect fake accounts on Instagram that are used to scam consumers?

How to answer: Questions like this let an employer test your problem-solving ability. When answering open-ended questions like this one, it is fine to ask for clarification yourself and to use a whiteboard to show that you can program and draw diagrams. Share your reasoning as you work through the steps of the problem.

Question 3: Describe circumstances that call for a list, tuple, or set in Python.

How to answer: Interviewers use such questions to test your knowledge of the Python programming language. Before the interview, review Python fundamentals such as lists, tuples, and sets. You should be able to explain when and how data scientists use each of them.
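As a quick refresher for Question 3, the sketch below shows one typical use for each container. The variable names and sample values are made up for illustration and are not part of the original tips.

```python
# List: ordered and mutable, so it suits data that grows or gets reordered.
daily_sales = [120, 95, 130]
daily_sales.append(110)

# Tuple: ordered and immutable, so it suits fixed records and can serve as a dict key.
utrecht = (52.09, 5.12)  # hypothetical (latitude, longitude) pair
city_names = {utrecht: "Utrecht"}

# Set: unordered with unique elements, so it suits de-duplication and membership tests.
visitor_ids = {"u1", "u2", "u1"}
is_known = "u2" in visitor_ids
```

A short way to phrase it in an interview: use a list when order matters and the contents change, a tuple when the record is fixed, and a set when only uniqueness and membership matter.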

Most frequently asked interview questions

Meta
A Data Scientist was asked... 1 March 2016

Write an SQL query that makes recommendations using the pages that your friends liked. Assume you have two tables: a two-column table of users and their friends, and a two-column table of users and the pages they liked. It should not recommend pages you already like.

40 answers

    CREATE TEMPORARY TABLE likes (userid INT NOT NULL, pageid INT NOT NULL);
    CREATE TEMPORARY TABLE friends (userid INT NOT NULL, friendid INT NOT NULL);
    INSERT INTO likes VALUES (1, 101), (1, 201), (2, 201), (2, 301);
    INSERT INTO friends VALUES (1, 2);
    SELECT f.userid, l.pageid
    FROM friends f
    JOIN likes l ON l.userid = f.friendid
    LEFT JOIN likes r ON (r.userid = f.userid AND r.pageid = l.pageid)
    WHERE r.pageid IS NULL;

    SELECT w.userid, w.pageid
    FROM (
        SELECT f.userid, l.pageid
        FROM rollups_new.friends f
        JOIN rollups_new.likes l ON l.userid = f.friendid
    ) w
    LEFT JOIN rollups_new.likes l ON w.userid = l.userid AND w.pageid = l.pageid
    WHERE l.pageid IS NULL

Use EXCEPT:
    -- for each user, the unique pages that are liked by their friends
    SELECT f.user_id, l.page_id
    FROM friends f
    INNER JOIN likes l ON f.fd_id = l.user_id
    GROUP BY f.user_id, l.page_id
    EXCEPT
    SELECT user_id, page_id FROM likes
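To try the anti-join approach from the answers above without a database server, here is a minimal sketch that runs it through Python's built-in `sqlite3` module. The function name `recommend_pages` is hypothetical, and `DISTINCT` is an addition (not in the original answers) so that a page liked by several friends is listed once.

```python
import sqlite3


def recommend_pages(friends, likes):
    """Return sorted (userid, pageid) pairs liked by a friend but not by the user."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE friends (userid INT NOT NULL, friendid INT NOT NULL)")
    con.execute("CREATE TABLE likes (userid INT NOT NULL, pageid INT NOT NULL)")
    con.executemany("INSERT INTO friends VALUES (?, ?)", friends)
    con.executemany("INSERT INTO likes VALUES (?, ?)", likes)
    rows = con.execute(
        """
        SELECT DISTINCT f.userid, l.pageid
        FROM friends f
        JOIN likes l ON l.userid = f.friendid
        -- anti-join: keep only pages the user has not liked already
        LEFT JOIN likes r ON r.userid = f.userid AND r.pageid = l.pageid
        WHERE r.pageid IS NULL
        ORDER BY f.userid, l.pageid
        """
    ).fetchall()
    con.close()
    return rows


# Sample data from the first answer: user 1 is friends with user 2.
print(recommend_pages([(1, 2)], [(1, 101), (1, 201), (2, 201), (2, 301)]))
# prints [(1, 301)]
```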

Meta

You're about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining. What is the probability that it's actually raining in Seattle?

34 answers

Bayesian stats: you should estimate the prior probability that it's raining on any given day in Seattle. If you mention this or ask, the interviewer will tell you to use 25%. Then it's straightforward:
P(raining | Yes,Yes,Yes) = Prior(raining) * P(Yes,Yes,Yes | raining) / P(Yes,Yes,Yes)
P(Yes,Yes,Yes) = P(raining) * P(Yes,Yes,Yes | raining) + P(not-raining) * P(Yes,Yes,Yes | not-raining) = 0.25*(2/3)^3 + 0.75*(1/3)^3 = 0.25*(8/27) + 0.75*(1/27)
P(raining | Yes,Yes,Yes) = 0.25*(8/27) / ( 0.25*(8/27) + 0.75*(1/27) )
**Bonus points if you notice that you don't need a calculator since all the 27's cancel out and you can multiply top and bottom by 4.
P(raining | Yes,Yes,Yes) = 8 / ( 8 + 3 ) = 8/11
But honestly, you're going to Seattle, so the answer should always be: "YES, I'm bringing an umbrella!" (yeah yeah, unless your friends mess with you ALL the time ;)

Answer from a frequentist perspective: Suppose there was one person. P(YES | raining) is twice (2/3 / 1/3) as likely as P(LIE | not raining), so P(raining) is 2/3. If instead n people all say YES, then they are either all telling the truth or all lying. The outcome that they are all telling the truth is (2/3)^n / (1/3)^n = 2^n times as likely as the outcome that they are all lying. Thus P(raining | ALL YES) = 2^n / (2^n + 1) = 8/9 for n=3. Notice that this corresponds exactly to the Bayesian answer when prior(raining) = 1/2.

26/27 is incorrect. That is the fraction of the time that at least one friend would tell you the truth (i.e., 1 - the probability that they would all lie: 1/27). What you have to figure out is the odds of it raining given that all 3 friends told you the same thing. Because they all say the same thing, they must either all be lying or all be telling the truth. What are the odds that they would all lie or all tell the truth? In 1/27 of cases they would all lie, and in 8/27 of cases they would all tell the truth. So there are 9 ways in which all your friends would tell you the same thing, and in 8 of them (8 out of 9) they would be telling you the truth.
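The answers above differ only in the prior they assume. A small sketch with exact fractions makes that explicit; the function name `posterior_rain` and its parameters are illustrative, not from the thread.

```python
from fractions import Fraction


def posterior_rain(prior, n_yes=3, p_truth=Fraction(2, 3)):
    """P(raining | n_yes independent friends all say yes), by Bayes' rule."""
    all_truthful = prior * p_truth ** n_yes            # raining, everyone honest
    all_lying = (1 - prior) * (1 - p_truth) ** n_yes   # dry, everyone lying
    return all_truthful / (all_truthful + all_lying)


print(posterior_rain(Fraction(1, 4)))  # 8/11, the prior of 25% from the first answer
print(posterior_rain(Fraction(1, 2)))  # 8/9, the implicit 50/50 prior of the other answers
```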

Meta

Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?

24 answers

If you group by parent_id, you'll be leaving out all posts with zero comments.

@ RLeung shouldn't you use a left join? You are effectively losing all posts with zero comments.

Here is the solution. You need a left self join that accounts for posts with zero comments.
    SELECT children, COUNT(submission_id)
    FROM (
        SELECT a.submission_id, COUNT(b.submission_id) AS children
        FROM submissions a
        LEFT JOIN submissions b ON a.submission_id = b.parent_id
        WHERE a.parent_id IS NULL
        GROUP BY a.submission_id
    ) a
    GROUP BY children
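The effect of the left self join on zero-comment posts can be checked with a small `sqlite3` sketch. The schema (a `submissions` table where `parent_id` is NULL for posts and points at the post for comments) is inferred from the answer above, and the sample rows and the name `comment_frequency` are invented.

```python
import sqlite3


def comment_frequency(rows):
    """rows: (submission_id, parent_id); returns (n_comments, n_posts) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE submissions (submission_id INT, parent_id INT)")
    con.executemany("INSERT INTO submissions VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT children, COUNT(submission_id) AS posts
        FROM (
            SELECT a.submission_id, COUNT(b.submission_id) AS children
            FROM submissions a
            LEFT JOIN submissions b ON a.submission_id = b.parent_id
            WHERE a.parent_id IS NULL          -- keep top-level posts only
            GROUP BY a.submission_id
        ) t
        GROUP BY children
        ORDER BY children
        """
    ).fetchall()
    con.close()
    return result


# Posts 1, 2, 3, 7; comments 4 and 5 on post 1, comment 6 on post 2.
sample = [(1, None), (2, None), (3, None), (7, None), (4, 1), (5, 1), (6, 2)]
print(comment_frequency(sample))  # prints [(0, 2), (1, 1), (2, 1)]
```

Because `COUNT(b.submission_id)` skips the NULLs produced by the left join, posts 3 and 7 land in the zero-comment bucket instead of disappearing.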

Meta

Given a list A of objects and another list B which is identical to A except that one element is removed, find that removed element.

19 answers

All these supposed answers are missing the point, and this question isn't even worded correctly. It should be lists of NUMBERS, not "objects". Anyway, the question is asking how you figure out the number that is missing from list B, which is identical to list A except one number is missing. Before getting into the coding, think about it logically - how would you find this? The answer of course is to sum all the numbers in A, sum all the numbers in B, subtract the sum of B from the sum of A, and that gives you the number.

    SELECT a.element FROM a LEFT JOIN b ON a.element = b.element WHERE b.element IS NULL

In Python:
(just numbers)
    def rem_elem_num(listA, listB):
        sumA = 0
        sumB = 0
        for i in listA:
            sumA += i
        for j in listB:
            sumB += j
        return sumA - sumB
(general)
    def rem_elem(listA, listB):
        dictB = {}
        for j in listB:
            dictB[j] = None
        for i in listA:
            if i not in dictB:
                return i
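The dictionary-based version above misses the case where the removed element also appears elsewhere in the list (for example A = [1, 2, 2] and B = [1, 2]). One possible variant, assuming the elements are hashable, counts occurrences instead; `removed_element` is a name chosen here for illustration.

```python
from collections import Counter


def removed_element(list_a, list_b):
    """Return the one element of list_a that is missing from list_b.

    Counter subtraction keeps only positive counts, so duplicates are handled.
    Raises ValueError if the lists do not differ by exactly one element.
    """
    leftover = Counter(list_a) - Counter(list_b)
    (element,) = leftover.elements()
    return element


print(removed_element([1, 2, 2, 3], [1, 2, 3]))  # prints 2
print(removed_element(["a", "b"], ["a"]))        # prints b
```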

Meta

Data challenge: it was very similar to the ads analysis challenge in the collection of data science take-home challenges book, so that was easy (if you have done your homework).
SQL: you have a table with date, user_id, song_id and count. It shows, at the end of each day, how many times in her history a user has listened to a given song, so count is a cumulative sum. You have to update this daily based on a second table that records in real time when a user listens to a given song. Basically, at the end of each day, you go to this second table, pull a count for each user/song combination, and add this count to the first table that holds the lifetime count. If it is the first time a user has listened to a given song, you won't have this pair in the lifetime table, so you have to create the pair there and then add the count of the last day.
Onsite: lots of ads-related and machine learning questions. How to build an ad model, how to test it, describe a model. I didn't do well in some of these.

18 answers

Can't tell you the solution of the ads analysis challenge. I would recommend getting in touch with the book author though. It was really useful to prep for all these interviews. SQL is a full outer join between the lifetime count and the last day count, and then you sum the two.

Can you post here your solution for the ads analysis from the take-home challenge book? I also bought the book and was interested in comparing the solutions. Also, can you post here how you solved the SQL question?

For the SQL, I think both should work: an outer join between the lifetime count and the new day count and then sum the columns, replacing NULLs with 0; or a union all between those two, group by, and then sum.
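The "union all, group by, then sum" variant can be sketched with `sqlite3` as follows. The table names `lifetime` and `listens`, the column name `cnt` (used instead of `count`), and the omission of the date column are simplifying assumptions, not details from the interview.

```python
import sqlite3


def updated_lifetime_counts(lifetime, listens):
    """lifetime: (user_id, song_id, cnt) rows; listens: one (user_id, song_id) per play."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lifetime (user_id INT, song_id INT, cnt INT)")
    con.execute("CREATE TABLE listens (user_id INT, song_id INT)")
    con.executemany("INSERT INTO lifetime VALUES (?, ?, ?)", lifetime)
    con.executemany("INSERT INTO listens VALUES (?, ?)", listens)
    rows = con.execute(
        """
        SELECT user_id, song_id, SUM(cnt) AS total
        FROM (
            SELECT user_id, song_id, cnt FROM lifetime
            UNION ALL
            -- today's plays, aggregated per user/song pair
            SELECT user_id, song_id, COUNT(*) AS cnt
            FROM listens
            GROUP BY user_id, song_id
        ) t
        GROUP BY user_id, song_id
        ORDER BY user_id, song_id
        """
    ).fetchall()
    con.close()
    return rows


print(updated_lifetime_counts([(1, 10, 5), (2, 20, 1)], [(1, 10), (1, 10), (3, 30)]))
# prints [(1, 10, 7), (2, 20, 1), (3, 30, 1)]
```

New user/song pairs such as (3, 30) appear automatically, which is the case the full outer join has to handle by replacing NULLs with 0.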

Meta

We have two options for serving ads within Newsfeed:
1 - out of every 25 stories, one will be an ad
2 - every story has a 4% chance of being an ad
For each option, what is the expected number of ads shown in 100 news stories? If we go with option 2, what is the chance a user will be shown only a single ad in 100 stories? What about no ads at all?

16 answers

For question 1: I think both options have the same expected value of 4. For question 2: use the binomial distribution. For one specific case to happen (the ad in one given position), p(one case) = (0.96)^99 * (0.04)^1. In total, there are 100 positions for the ad, so 100 * p(one case) = 7.03%.

For "MockInterview dot co": The binomial part is correct but you argue that the expected value for option 2 is not 4 but this is false. In both cases E(x) = np = 100*(4/100) = 4 and E(x) = np = 100*(1/25) = 4 again.

The chance of getting exactly one ad is ~7%, as the formula is C(N, K) * (0.04)^K * (0.96)^(N-K), where C(N, K) is the number of combinations, N choose K.
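The binomial figures quoted above can be reproduced in a few lines. This is a sketch for option 2 only; the function name `prob_k_ads` is made up here.

```python
from math import comb


def prob_k_ads(k, n=100, p=0.04):
    """Binomial probability of exactly k ads among n stories (option 2)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


expected_ads = 100 * 0.04           # E[X] = n * p for option 2
print(expected_ads)
print(round(prob_k_ads(1), 4))      # exactly one ad, about 0.0703
print(round(prob_k_ads(0), 4))      # no ads at all, about 0.0169
```

For the "no ads at all" part of the question, which the answers above skip, the same formula reduces to 0.96^100, roughly 1.7%.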

Meta

Consider a game with 2 players, A and B. Player A has 8 stones, player B has 6. Game proceeds as follows. First, A rolls a fair 6-sided die, and the number on the die determines how many stones A takes over from B. Next, B rolls the same die, and the exact same thing happens in reverse. This concludes the round. Whoever has more stones at the end of the round wins and the game is over. If players end up with equal # of stones at the end of the round, it is a tie and another round ensues. What is the probability that B wins in 1, 2, ..., n rounds?

16 answers

At the start, A has 8 and B has 6. Let A roll x and B roll y; then A ends with 8+x-y and B with 6-x+y. So B wins with prob 10/36, A wins with prob 21/36, and the prob of a tie (and so another round) is 5/36. So the prob that B wins in round 1 is 10/36. After a tie, the counts have changed to 7 and 7, so A ends with 7+x-y and B with 7-x+y; this time B wins with prob 15/36 and A wins with prob 15/36, and the prob of a tie is 6/36 = 1/6. So overall, B wins in 2 rounds with prob 5/36 * 15/36. For rounds 3, 4, etc., the counts go back to 7 and 7 after each tied round, so the probs do not change. So B wins in round r = 3, 4, ..., n with prob 5/36 * (6/36)^(r-2) * 15/36, where r is the total number of rounds.

So many answers... Here's my version: For round 1, B wins only if it gets 3 or more stones than A, which is (A,B) = (1,4) (1,5) (1,6) (2,5) (2,6) (3,6), which is 6 cases out of all 36 possible outcomes. So B has a 1/6 chance to win. To draw, B needs to get exactly 2 stones more than A, which is (A,B) = (1,3) (2,4) (3,5) (4,6), or 1/9. Entering the second round, all stones should be equal, so the chance to draw becomes 1/6, and the chance for either to win is 5/12. So the final answer is (1/6, 1/9*5/12, (1/9)^2*5/12, ..., (1/9)^(n-1)*5/12).

I don't get it. Shouldn't the prob of B winning, given a tie in the 1st round, be 15/36? Given a tie in the 1st round, in the 2nd round Nb > Na can happen if (B,A) is (2,1), (3,1/2), (4,1/2/3), (5,1/2/3/4), (6,1/2/3/4/5), which totals 15 out of 36.
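Since the answers above disagree, one way to check them is to enumerate all 36 dice outcomes. The sketch below assumes the reading used in the first answer: A ends a round with a + x - y stones and B with b - x + y, where x and y are the two rolls. The function names are invented for illustration.

```python
from fractions import Fraction
from itertools import product


def round_outcome(a_stones, b_stones):
    """Return (P(B wins), P(tie)) for one round, by enumerating both dice."""
    b_wins = ties = 0
    for x, y in product(range(1, 7), repeat=2):   # x: A's roll, y: B's roll
        a_end = a_stones + x - y
        b_end = b_stones - x + y
        if b_end > a_end:
            b_wins += 1
        elif b_end == a_end:
            ties += 1
    return Fraction(b_wins, 36), Fraction(ties, 36)


def prob_b_wins_in_round(r):
    """P(B wins exactly in round r), starting from 8 vs 6 stones."""
    first_win, first_tie = round_outcome(8, 6)
    if r == 1:
        return first_win
    later_win, later_tie = round_outcome(7, 7)    # every tie leaves 7 vs 7
    return first_tie * later_tie ** (r - 2) * later_win


print(round_outcome(8, 6))  # B wins 10/36 (printed as 5/18), tie 5/36
print(round_outcome(7, 7))  # B wins 15/36 (printed as 5/12), tie 6/36 (printed as 1/6)
print([prob_b_wins_in_round(r) for r in (1, 2, 3)])
```

Under this reading the enumeration gives 10/36 for a first-round win and 5/36 for a first-round tie, and 15/36 for a win in any later round.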

LinkedIn

Find the second largest element in a Binary Search Tree

15 answers

The above answer is also wrong;
    Node findSecondLargest(Node root) {
        // If tree is null or is single node only, return null (no second largest)
        if (root == null || (root.left == null && root.right == null)) return null;
        Node parent = null, child = root;
        // find the rightmost child
        while (child.right != null) {
            parent = child;
            child = child.right;
        }
        // if the rightmost child has no left child, then its parent is second largest
        if (child.left == null) return parent;
        // otherwise, return left child's rightmost child as second largest
        child = child.left;
        while (child.right != null) child = child.right;
        return child;
    }

Find the rightmost element. If it is a right node with no children, return its parent. If not, return the largest element of its left subtree.

One addition is the situation where the tree has no right branch (the root is the largest). In this special case, the largest node does not have a parent. So it's better to keep track of parent and current pointers: if they are different, the original method by the candidate works well; if they are the same (which means the root situation), find the largest element of its left branch.
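For readers who prefer Python, the same walk (go right as far as possible, then look at the left subtree of the largest node or fall back to its parent) might look like the sketch below. The `Node` class is a hypothetical minimal tree node, and distinct values are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def second_largest(root):
    """Return the second largest value in a BST, or None if it has fewer than 2 nodes."""
    if root is None or (root.left is None and root.right is None):
        return None
    parent, node = None, root
    while node.right is not None:          # walk to the largest node
        parent, node = node, node.right
    if node.left is None:                  # largest is a leaf: its parent is the answer
        return parent.value
    node = node.left                       # otherwise: the largest value in its left subtree
    while node.right is not None:
        node = node.right
    return node.value


tree = Node(5, left=Node(3), right=Node(8, left=Node(7)))
print(second_largest(tree))  # prints 7
```

The root-is-largest case raised above is covered by the second branch: when the root has no right child, the loop never runs and the search continues in the root's left subtree.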

Meta

Given the following tables, how would you know who has the most friends?
REQUESTS: date | sender_id | accepter_id
ACCEPTED: accepted_at | accepter_id | sender_id

14 answers

If two people become friends, the request has to have been accepted, so we can use the accepted table alone to count how many friends each id has. However, one person can either send or accept a request, so we need to remove the duplication.
    SELECT a.accepter_id, COUNT(*) AS cnt
    FROM (
        SELECT DISTINCT accepter_id, sender_id FROM accepted
        UNION
        SELECT DISTINCT sender_id AS accepter_id, accepter_id AS sender_id FROM accepted
    ) a
    GROUP BY a.accepter_id
    ORDER BY cnt DESC
    LIMIT 1;

In the vein of answers 6 and 7:
    SELECT a.user, COUNT(DISTINCT a.friend) AS friend_count
    FROM (
        (SELECT accepter_id AS user, sender_id AS friend FROM ACCEPTED)
        UNION
        (SELECT sender_id AS user, accepter_id AS friend FROM ACCEPTED)
    ) a
    GROUP BY a.user
    ORDER BY friend_count DESC
    LIMIT 1;

I think this would be simpler:
    SELECT id
    FROM (
        SELECT requester_id AS id FROM request_accepted
        UNION ALL
        SELECT accepter_id AS id FROM request_accepted
    ) t
    GROUP BY 1
    ORDER BY COUNT(*) DESC
    LIMIT 1
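The de-duplication step discussed above (a friendship counts once, whoever sent the request) can be illustrated outside SQL as well. This is a sketch over in-memory pairs taken from the ACCEPTED table; `most_friends` is a name chosen here, and ties are broken arbitrarily.

```python
from collections import Counter


def most_friends(accepted):
    """accepted: iterable of (accepter_id, sender_id) pairs.

    Returns (user_id, friend_count) for the user with the most distinct
    friends, or None when there are no accepted requests.
    """
    # A frozenset ignores direction, so (1, 2) and (2, 1) are the same friendship.
    friendships = {frozenset(pair) for pair in accepted if pair[0] != pair[1]}
    counts = Counter(user for pair in friendships for user in pair)
    return counts.most_common(1)[0] if counts else None


print(most_friends([(1, 2), (2, 1), (1, 3), (4, 1), (2, 3)]))  # prints (1, 3)
```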

Meta

Given two tables:
Friend_request (requester_id, sent_to_id, time)
Request_accepted (acceptor_id, requestor_id, time)
Find the overall acceptance rate of requests.

14 answers

Based on "Quick and Dirty"'s assumptions above (e.g. 1 week), here's an example query [using Bigquery's SQL syntax]:
    select round(100 * sum(case when b.requestor_id is not null then 1 else 0 end) / count(a.requester_id), 2) as acceptance_rate
    from Friend_requests as a
    left join Request_accepted as b
      on a.sent_to_id = b.acceptor_id and a.requester_id = b.requestor_id
    where date(a.time) < date_add(current_date(), "-7", "day")

In both tables, concat the requestor and the recipient IDs, then do a left join.
Friend_requests: [111, aaa, 01-01-15; 222, aaa, 02-01-15]
Request_accepted: [aaa, 111, 02-01-15]
Concat, and your left join is searching the second table for 111aaa and 222aaa. It finds the first one, and the second one is null. You have a 50% acceptance rate. Regarding the dates, a lot can be done with them, but they are not strictly part of the question. The only thing the dates mean is that you could have multiple requests before an accept, so use distinct.

    SELECT (CAST(COUNT(r.acceptor_id) AS FLOAT) / CAST(COUNT(f.requester_id) AS FLOAT)) AS acceptance_rate
    FROM friend_request f
    FULL OUTER JOIN request_accepted r
      ON (f.requester_id = r.requestor_id AND f.sent_to_id = r.acceptor_id)
    WHERE f.time > (CURRENT_DATE - INTERVAL '30 day');
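As a cross-check of the join logic in the answers above, the overall rate can be computed on plain Python pairs. This sketch ignores the time columns and any waiting window, treats repeated requests between the same two users as one (the "use distinct" point), and uses a hypothetical function name.

```python
def acceptance_rate(requests, accepted):
    """requests: (requester_id, sent_to_id) pairs from Friend_request.
    accepted: (acceptor_id, requestor_id) pairs from Request_accepted.

    Returns the share of distinct requests that were accepted, or None
    when there are no requests at all.
    """
    sent = set(requests)
    if not sent:
        return None
    # Flip the accepted pairs so both sets use (requester, recipient) order.
    accepted_pairs = {(requestor, acceptor) for acceptor, requestor in accepted}
    return len(sent & accepted_pairs) / len(sent)


# Example from the answer above: two requests to "aaa", one accepted.
print(acceptance_rate([(111, "aaa"), (222, "aaa")], [("aaa", 111)]))  # prints 0.5
```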


Glassdoor has 42,339 interview questions and reports for Data Scientist in Utrecht.