SQL Murder Mystery

Select Star Sql

Beazley’s Last Statement

查找此人的遗言:

1
2
3
4
SELECT first_name, last_name, last_statement
FROM executions
WHERE first_name LIKE '_apoleon'
AND last_name LIKE '%ley'

Claims of Innocence

查找所有拒绝遗言的人:(第二种写法可以查询符合该条件的数量,但需要遍历全表)

1
2
3
4
5
6
7
8
9
10
11
12
13
-- WHERE --
SELECT COUNT(*)
FROM executions
WHERE last_statement IS NULL

-- COUNT CASE WHEN --
SELECT
COUNT(CASE WHEN last_statement IS NULL THEN 1
ELSE NULL END)
FROM executions

-- COUNT functions --
SELECT COUNT(*) - COUNT(last_statement) FROM executions

平均遗言长度:LENGTH 统计字符串长度

1
2
SELECT AVG(LENGTH(last_statement))
FROM executions

去重:DISTINCT

1
2
SELECT DISTINCT county
FROM executions

也可以用 GROUP BY:

1
2
3
SELECT county
FROM executions
GROUP BY county

查询喊冤的人数比:

1
2
3
4
SELECT
1.0 * COUNT(CASE WHEN last_statement LIKE '%innocent%'
THEN 1 ELSE NULL END) / COUNT(*)
FROM executions

The long tail

Strange Query

如果查找的其中一列是聚合,另一列既不是聚合,又不是 GROUP BY 的列,这种现象称作“Strange Query”。比如第一列是 Count,第二列是 first_name,那么查询得到的表格中,first_name 列对应最后一个表项的值。

In this case, our database picks the first name from the last entry in our table.


WHERE 在聚合前执行,而 HAVING 可以看作拿到了聚合的结果,在聚合后执行。

查找 年龄大于50岁被处刑的人数大于2 的县:

1
2
3
4
5
SELECT county, COUNT(*) AS number_over_50
FROM executions
WHERE ex_age > 50
GROUP BY county
HAVING number_over_50 > 2

Nest select

查找遗言最长的人:

1
2
3
4
5
SELECT first_name, last_name
FROM executions
WHERE LENGTH(last_statement) =
(SELECT MAX(LENGTH(last_statement))
FROM executions)

查找各县被处决人数占比:

1
2
3
4
5
6
7
SELECT
county,
100.0 * COUNT(*) / SELECT COUNT(*) FROM executions
AS percentage
FROM executions
GROUP BY county
ORDER BY percentage DESC

Execution Hiatuses

计算经过了多少天:

1
SELECT JULIANDAY('1993-08-10') - JULIANDAY('1989-07-07') AS day_difference

计算每次处决人的天数间隔:

previous is derived from executions, so we’re effectively joining executions to itself. This is called a “self join” and is a powerful technique for allowing rows to get information from other parts of the same table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
last_ex_date AS start,
ex_date AS end,
-- 计算天数间隔 --
JULIANDAY(ex_date) - JULIANDAY(last_ex_date)
AS day_difference
FROM executions
JOIN (
-- self join --
SELECT
-- 相邻 ex_number 间的天数间隔,因此是 ex_number + 1 --
ex_number + 1 AS ex_number,
ex_date as last_ex_date
FROM executions
WHERE ex_number < 553
) previous
ON executions.ex_number = previous.ex_number
ORDER BY day_difference DESC

更加优雅的写法:

1
2
3
4
5
6
7
8
9
10
SELECT
previous.ex_date AS start,
executions.ex_date AS end,
JULIANDAY(executions.ex_date) - JULIANDAY(previous.ex_date)
AS day_difference
FROM executions
-- alias --
JOIN executions previous
ON executions.ex_number = previous.ex_number + 1
ORDER BY day_difference DESC

Sql murder mystery

第一步

查找案件信息:

1
2
3
4
5
SELECT *
FROM crime_scene_report
WHERE date = 20180115
AND type = 'murder'
AND city = 'SQL City'
date type description city
20180115 murder Security footage shows that there were 2 witnesses. The first witness lives at the last house on “Northwestern Dr”. The second witness, named Annabel, lives somewhere on “Franklin Ave”. SQL City

第一个目击者:

1
2
3
4
5
SELECT * 
FROM person
WHERE address_street_name = 'Northwestern Dr'
ORDER BY address_number DESC
LIMIT 1
id name license_id address_number address_street_name ssn
14887 Morty Schapiro 118009 4919 Northwestern Dr 111564949

第二个目击者:

1
2
3
4
SELECT * 
FROM person
WHERE name LIKE '%Annabel%'
AND address_street_name = 'Franklin Ave'
id name license_id address_number address_street_name ssn
16371 Annabel Miller 490173 103 Franklin Ave 318771143

拿到 id 后,可以去 interviews 中查找:

1
2
3
4
SELECT *
FROM interview
WHERE person_id = 14887
OR person_id = 16371
person_id transcript
14887 I heard a gunshot and then saw a man run out. He had a “Get Fit Now Gym” bag. The membership number on the bag started with “48Z”. Only gold members have those bags. The man got into a car with a plate that included “H42W”.
16371 I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th.

由此处信息可知,凶手有以下特征:

  • 拥有 “Get Fit Now Gym” 包,进出 Gym,并且有一次是 1 月 9 日。
  • 包号码以”48Z”开头。
  • 车牌号包含 “H42W”。

先从 “48Z” 入手:

1
2
3
SELECT *
FROM get_fit_now_member
WHERE id LIKE '48Z%'
id person_id name membership_start_date membership_status
48Z38 49550 Tomas Baisley 20170203 silver
48Z7A 28819 Joe Germuska 20160305 gold
48Z55 67318 Jeremy Bowers 20160101 gold

再查找车牌号:

1
2
3
SELECT *
FROM drivers_license
WHERE plate_number LIKE '%H42W%'
id age height eye_color hair_color gender plate_number car_make car_model
183779 21 65 blue blonde female H42W0X Toyota Prius
423327 30 70 brown brown male 0H42W2 Chevrolet Spark LS
664760 21 71 black black male 4H42WR Nissan Altima

最后看看进出日期:

1
2
3
4
SELECT *
FROM get_fit_now_check_in
WHERE check_in_date = 20180109
AND membership_id LIKE '48Z%'
membership_id check_in_date check_in_time check_out_time
48Z7A 20180109 1600 1730
48Z55 20180109 1530 1700

这样直接把凶手范围缩小到两个人了。

看一下这两个人的个人信息:

1
2
3
4
SELECT *
FROM person
WHERE id = 28819
OR id = 67318
id name license_id address_number address_street_name ssn
28819 Joe Germuska 173289 111 Fisk Rd 138909730
67318 Jeremy Bowers 423327 530 Washington Pl, Apt 3A 871539279

第二个人的 license_id 和前面对上了,说明凶手是他!

Congrats, you found the murderer! But wait, there’s more… If you think you’re up for a challenge, try querying the interview transcript of the murderer to find the real villain behind this crime. If you feel especially confident in your SQL skills, try to complete this final step with no more than 2 queries. Use this same INSERT statement with your new suspect to check your answer.

案件还有幕后黑手,查找凶手的 interview,挑战是使用两次以内的查询。

第二步(多步查询)

先用笨办法查一查:

1
2
3
SELECT *
FROM interview
WHERE person_id = 67318
person_id transcript
67318 I was hired by a woman with a lot of money. I don’t know her name but I know she’s around 5’5” (65”) or 5’7” (67”). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.

查找 drivers_license 表:

1
2
3
4
5
6
7
8
SELECT *
FROM drivers_license
WHERE car_make = 'Tesla'
AND car_model = 'Model S'
AND hair_color = 'red'
AND gender = 'female'
AND age >= 60
AND age <= 70
id age height eye_color hair_color gender plate_number car_make car_model
202298 68 66 green red female 500123 Tesla Model S
291182 65 66 blue red female 08CM64 Tesla Model S

从 license_id 得到人的 id:

1
2
3
SELECT * FROM 'person'
WHERE license_id = 202298
OR license_id = 291182
id name license_id address_number address_street_name ssn
90700 Regina George 291182 332 Maple Ave 337169072
99716 Miranda Priestly 202298 1883 Golden Ave 987756388

最后使用”参加音乐会”的条件:

1
2
3
4
5
6
7
SELECT *
FROM facebook_event_checkin
WHERE date >= 20171201
AND date <= 20171231
AND event_name = 'SQL Symphony Concert'
GROUP BY person_id
HAVING COUNT(person_id) = 3
person_id event_id event_name date
24556 1143 SQL Symphony Concert 20171224
99716 1143 SQL Symphony Concert 20171229

对比可知幕后黑手是 Miranda Priestly。

第二步(两步查询)

先查找 interview,因为信息蕴含在他的话语中,这一步没法被合并。

1
2
3
SELECT *
FROM interview
WHERE person_id = 67318
person_id transcript
67318 I was hired by a woman with a lot of money. I don’t know her name but I know she’s around 5’5” (65”) or 5’7” (67”). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.

接下来的查询用一次完成:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT name
FROM drivers_license l
-- 连接写在 WHERE 前面 --
INNER JOIN person p
ON p.license_id = l.id
INNER JOIN (
SELECT *
FROM facebook_event_checkin
WHERE date >= 20171201
AND date <= 20171231
AND event_name = 'SQL Symphony Concert'
GROUP BY person_id
HAVING COUNT(person_id) = 3
) f
ON p.id = f.person_id

WHERE car_make = 'Tesla'
AND car_model = 'Model S'
AND hair_color = 'red'
AND gender = 'female'
AND age >= 60
AND age <= 70
name
Miranda Priestly