Sollicitatievraag

Sollicitatiegesprek voor de functie Data Scientist, Analytics

-

Meta

Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?

Antwoord

Antwoorden op sollicitatievragen

25 antwoorden

5

If you group by parent_id, you'll be leaving out all posts with zero comments.

Anoniem op

3

select number_comments, count(submission_id) as number_posts from ( # more than zero comments select submission_id, count(post_id) as number_comments from ( select submission_id, case when parent_id is null 1 else 0 end as post, case when parent_id is not null parent_id else null end as post_id, body from Submissions )k where post =0 group by submission_id ) k1 group by number_comments union select number_comments, count(submission_id) as number_posts from ( # comments= 0 select submission_id, 0 as number_comments from ( select submission_id, case when parent_id is null 1 else 0 end as post, case when parent_id is not null parent_id else null end as post_id, body from Submissions )k where post =1 group by submission_id ) k1 group by number_comments

Anoniem op

1

@ RLeung shouldn't you use left join? You are effectively losing all posts with zero comment.

anonymous op

2

Here is the solution. You need a left self join that accounts for posts with zero comments. Select children , count(submission_id) from ( Select a.submission_id, count(b.submission_id) as children from Submissions a Left Join submissions b on On a.submission_id=b.parent_id Where a.parent_id is null Group by a.submission_id ) a Group by children

Anoniem op

2

I've tested all these on a mock data set and none of them work! Does anyone have the correct solution? I'm stuck on this one..

QR op

2

Posts and comments in the same table looks weird. Here's my attempt (made easy with CASE) to exclude all the posts from the table and grouping/counting comments. SEL parent_id ,COUNT(*) as comment_count ( SEL * ,CASE WHEN perent_id IS NULL THEN 'Post' ELSE 'comment' END as post_or_comment FROM Submissions ) a WHERE post_or_comment = 'comment'

Alberto op

0

select comments_count, count(submission_id) as post_count from ( select submission_id, count( distinct parent_id) as comments_count from Table A group by submission_id )A group by comments_count

Anoniem op

0

I think all of the Posts are missing Parent_ID. I am editing the code shared above. This will solve the duplicate problem select parent_id as post, count(parent_id) as num_of_comments from submissions group by parent_id union select submission_id as post, 0 as num_of_comments from submissions where parent.id not in (select submission_id from submissions)

Anoniem op

7

Can you explain why you would even need the self-join here? Can you not just group by parent_id and do the COUNT() on each group, since the parent_id values correspond to the post values when they're not null?

Anoniem op

0

select t.post_id, count(t.submission_id) -1 from (select submission_id, case when parent_id is null then submission_id else parent_id end as post_id from submissions) t group by post_id

Anoniem op

0

Not the shortest answer but I think much clearer than anything posted here. Also gives output table that could actually be fed directly into a histogram which was part of the question. SELECT CASE WHEN num_comments IS NULL THEN 0 ELSE num_comments END AS num_comments, COUNT(parent_post_id) AS cnt_posts FROM ( SELECT submission_id AS parent_post_id, comment_count.num_comments FROM Submissions WHERE parent_id IS NULL LEFT JOIN ( SELECT parent_id, COUNT(parent_id) AS num_comments FROM Submissions WHERE parent_id IS NOT NULL GROUP BY 1 ) comment_count ON submission_id = comment_count.parent_id ) GROUP BY 1 ORDER BY 1

Cam op

0

select p.parent_id as posts, count(c.submission_id) as commentcount from submissions c inner join submissions p on c.parent_id = p.submission_id group by p.parent_id;

Harish op

0

select case when parent_id is not null then parent_id else sub_id end as post_id, sum(case when parent_id is not null then 1 else 0 end) as comment_count from submissions group by case when parent_id is not null then parent_id else sub_id end;

Priyanka Shukla op

0

Create table: create table submissions ( submission_id int null, body varchar(500) null, parent_id int null ); Insert records: (change your database name) INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (1, 'POST1', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C1', 1); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (2, 'POST2', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (3, 'POST3', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C2', 3); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C3', 3); Solution: SELECT a.submission_id AS post_id, a.body, sum(CASE WHEN t.parent_id > 0 THEN 1 ELSE IFNULL(t.parent_id,0) END) AS comment_id FROM submissions AS a LEFT JOIN (SELECT b.parent_id FROM submissions AS b) t ON a.submission_id = t.parent_id WHERE a.submission_id IS NOT NULL GROUP BY post_id; Results: 1 POST1 1 2 POST2 0 3 POST3 2

Complete solution op

0

CREATE TABLE users( sid INT , pid INT , body Varchar(255)); insert into users Values ( 2,null, "cover"), (1,2,"Ami is"),(3,2,"hi"),(4,2,"good pic"),(5,null ,"profil pic"),(6,5,"nice"); (select pid , COUNT(pid) as total from users where pid is not null group by pid)

Anoniem op

0

create table subs( sub_id integer, parent_id integer ) insert into subs values(1,null); insert into subs values(2,null); insert into subs values(3,null); insert into subs values(4,null); commit; insert into subs values(5,1); insert into subs values(6,1); insert into subs values(7,1); insert into subs values(8,1); insert into subs values(9,2); insert into subs values(10,2); insert into subs values(11,3); insert into subs values(12,3); insert into subs values(12,4); commit; select * from subs select cc, count(sub_id) from ( select a.sub_id, count(b.sub_id) cc from subs a inner join subs b on(b.parent_id = a.sub_id) group by 1) group by 1

Anoniem op

0

I found it easier to explain when I broke it out into named sub tables to handle the case when there are no comments on a post and you want the end result to be the histogram of the number of comments per post: with parent_comment_ct as ( SELECT parent_id, COUNT(parent_id) AS num_comments FROM submissions WHERE parent_id IS NOT NULL GROUP BY parent_id ), submission_comment_ct as ( SELECT su.submission_id AS parent_post_id, pcc.num_comments AS num_comments FROM submissions su LEFT JOIN parent_comment_ct pcc ON su.submission_id = pcc.parent_id WHERE su.parent_id IS NULL ) SELECT CASE WHEN scc.num_comments IS NULL THEN 0 ELSE scc.num_comments END AS num_comments, COUNT(scc.parent_post_id) AS cnt_posts FROM submission_comment_ct scc GROUP BY 1 ORDER BY 1

icarus op

0

SELECT recommended_page FROM (SELECT f.user1_id as users, f.user2_id as freinds, l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user2_id = l.user_id WHERE f.user1_id = 1 UNION ALL SELECT f.user2_id as users,f.user1_id as friends,l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user1_id = l.user_id WHERE f.user2_id = 1) MINUS (SELECT page_id as recommended_page FROM likes WHERE user_id = 1);

Anoniem op

0

SELECT recommended_page FROM (SELECT f.user1_id as users, f.user2_id as freinds, l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user2_id = l.user_id WHERE f.user1_id = 1 UNION ALL SELECT f.user2_id as users,f.user1_id as friends,l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user1_id = l.user_id WHERE f.user2_id = 1) MINUS (SELECT page_id as recommended_page FROM likes WHERE user_id = 1);

Anoniem op

0

select c.subID as SubmissionID, count(c.body)-1 as Counts_Comments from subm c LEFT JOIN subm b ON c.subID = b.pID where b.pID is null AND c.pID is NULL group by c.subID UNION ALL select a.pID as SubmissionID, count(a.body) as Counts_Comments from ( select *, case when pID IS NULL then 'P' Else 'C' END as P_O_C from subm)a where P_O_C = 'C' group by a.pID Order by SubmissionID;

Aslesha op

0

select a.user_name,b.user_name,page_liked from services_db.pages_liked a, services_db.user_friends b where 1=1 and a.user_name = b.friend_user and a.page_liked not in ( select page_liked from services_db.pages_liked c where 1=1 and c.user_name = b.user_name ) ;

GV op

0

select parent_id as post, count(parent_id) as num_of_comments from submissions group by parent_id union select submission_id as post, 0 as num_of_comments from submissions where parent.id=null

Anoniem op

3

In my case this question was like: 'you have a table Submissions with the submission_id, the body, and the parent_id. Submissions can be posts, or comments to a post. In posts, parent_id is null, and in comments, the parent_id is the post the comment is commenting about. How would you go and make a histogram of number of posts per comment_count?' I think i solved it along the lines of: SELECT comment_counts.n_comments, count distinct(n_comments.submission_id) ( select s1.submission_id, COUNT DISTINCT(s2.parent_id) as n_comments OUTER join submissions on s1.submission_id = s2.parent_id group by submission_id) comment_counts GROUP BY comment_counts.n_comments

Anoniem op

0

select k.post_id, count(submission_id) -1 from (select submission_id, case when parent_id is null then submission_id else parent_id end as post_id from submissions) t group by post_id

Anoniem op

0

I think it is pretty straight forward. All the posts will have null parent_id. Considering the table schema to be something like this: CREATE TABLE submissions ( submission_id INT, body VARCHAR(500), parent_id INT ); SELECT DISTINCT nvl(parent_id::TEXT,'Post with no comments') AS post_id, COUNT(CASE WHEN parent_id IS NOT NULL THEN submission_id ELSE 0 END) AS number_of_comments_or_post FROM submissions GROUP BY 1; This will give results like this: post_id number_of_comments_or_post Post with no comments 8 1 10 7 11 13 8 19 9 25 7 So, the first row will give the number of posts with no comments which is 8 and remaining rows tell the number of comments per post. Is there a flaw in this?

ig op

Voeg antwoorden of opmerkingen toe

Meld u aan of registreer u om hier een opmerking over te maken.