Project 1: SQL
环境配置
git clone https://github.com/berkeley-cs186/sp25-proj1.git其余配置见cs186.gitbook
一些简单尝试
(base) PS F:\cs186\sp25-proj1> python test.py
FAIL q0 see diffs/q0.txt
FAIL q1i see diffs/q1i.txt
FAIL q1ii see diffs/q1ii.txt
FAIL q1iii see diffs/q1iii.txt
FAIL q1iv see diffs/q1iv.txt
FAIL q2i see diffs/q2i.txt
FAIL q2ii see diffs/q2ii.txt
FAIL q2iii see diffs/q2iii.txt
FAIL q3i see diffs/q3i.txt
FAIL q3ii see diffs/q3ii.txt
FAIL q3iii see diffs/q3iii.txt
FAIL q4i see diffs/q4i.txt
FAIL q4ii_bins_0_to_8 see diffs/q4ii_bins_0_to_8.txt
FAIL q4ii_bin_9 see diffs/q4ii_bin_9.txt
FAIL q4iii see diffs/q4iii.txt
FAIL q4iv see diffs/q4iv.txt
FAIL q4v see diffs/q4v.txt
(base) PS F:\cs186\sp25-proj1> sqlite3 -header lahman.db
SQLite version 3.45.3 2024-04-15 13:34:05 (UTF-16 console I/O)
Enter ".help" for usage hints.
sqlite> .schema people
CREATE TABLE IF NOT EXISTS "people" (
"playerID" VARCHAR(9) NOT NULL,
"birthYear" INTEGER NULL,
"birthMonth" INTEGER NULL,
"birthDay" INTEGER NULL,
"birthCountry" VARCHAR(255) NULL,
"birthState" VARCHAR(255) NULL,
"birthCity" VARCHAR(255) NULL,
"deathYear" INTEGER NULL,
"deathMonth" INTEGER NULL,
"deathDay" INTEGER NULL,
"deathCountry" VARCHAR(255) NULL,
"deathState" VARCHAR(255) NULL,
"deathCity" VARCHAR(255) NULL,
"nameFirst" VARCHAR(255) NULL,
"nameLast" VARCHAR(255) NULL,
"nameGiven" VARCHAR(255) NULL,
"weight" INTEGER NULL,
"height" INTEGER NULL,
"bats" VARCHAR(255) NULL,
"throws" VARCHAR(255) NULL,
"debut" VARCHAR(255) NULL,
"finalGame" VARCHAR(255) NULL,
"retroID" VARCHAR(255) NULL,
"bbrefID" VARCHAR(255) NULL,
"birth_date" DATE NULL,
"debut_date" DATE NULL,
"finalgame_date" DATE NULL,
"death_date" DATE NULL,
PRIMARY KEY ("playerID")
);
sqlite> SELECT playerid, namefirst, namelast FROM people;
此处省略···
sqlite> SELECT COUNT(*) FROM fielding;
COUNT(*)
143046一个问题
按照gitbook的步骤修改proj1.sql后,运行test.py可以看到PASS。
按照教程中使用python test.py -q 0命令通过
检查/your_output/q0.txt中的输出也同样正确。
Task1:Basics
i. In the
peopletable, find thenamefirst,namelastandbirthyearfor all players withweightgreater than 300 pounds.
ii. Find the
namefirst,namelastandbirthyearof all players whosenamefirstfield contains a space. Order the results bynamefirst, breaking ties withnamelastboth in ascending order.
iii. From the
peopletable, group together players with the samebirthyear, and report thebirthyear, averageheight, and number of players for eachbirthyear. Order the results bybirthyearin ascending order.
iv. Following the results of part iii, now only include groups with an average height >
70. Again order the results bybirthyearin ascending order.
Task 2 准备
在开始任务2的代码前,发现gitbook中并没有提供关于名人堂这个表的相关信息,则首先需要
在数据库中查找。
发现halloffame应该是我们这个任务的主表,再查看一下该表的结构。
现在可以继续了。
Task 2: Hall of Fame Schools
i. Find the
namefirst,namelast,playeridandyearidof all people who were successfully inducted into the Hall of Fame in descending order ofyearid. Break ties onyearidbyplayerid(ascending).
ii. Find the people who were successfully inducted into the Hall of Fame and played in college at a school located in the state of California. For each person, return their
namefirst,namelast,playerid,schoolid, andyearidin descending order ofyearid. Break ties onyearidbyschoolid,playerid(ascending). For this question,yearidrefers to the year of induction into the Hall of Fame.
首先检查一下需要的表的结构
初次尝试使用以下代码,出现问题
发现q2ii中没有输出,进行排查,逐步增加条件。
这时q2ii中正常输出结果,再添加学校筛选即s.state = 'California'无输出。发现
是因为表中州使用的是简写😢,被自己蠢哭🤦♂️
终于通过了(长呼一口气)。。。。。
iii. Find the
playerid,namefirst,namelastandschoolidof all people who were successfully inducted into the Hall of Fame -- whether or not they played in college. Return people in descending order ofplayerid. Break ties onplayeridbyschoolid(ascending). (Note:schoolidshould be NULL if they did not play in college.)
一开始没有用LEFT JOIN没通过。
INNER JOIN(JOIN)
只有当两个表中都有匹配的记录时,查询才会返回结果。
LEFT JOIN
即使
collegeplaying表中没有匹配的记录,仍然会返回halloffame中的全部记录, 并将没有匹配的schoolID设置为NULL。
Task 3: SaberMetrics
i. Find the
playerid,namefirst,namelast,yearidand single-yearslg(Slugging Percentage) of the players with the 10 best annual Slugging Percentage recorded over all time. A player can appear multiple times in the output. For example, if Babe Ruth’sslgin 2000 and 2001 both landed in the top 10 best annual Slugging Percentage of all time, then we should include Babe Ruth twice in the output. For statistical significance, only include players with more than 50 at-bats in the season. Order the results byslgdescending, and break ties byyearid,playerid(ascending).
ii. Following the results from Part i, find the
playerid,namefirst,namelastandlslg(Lifetime Slugging Percentage) for the players with the top 10 Lifetime Slugging Percentage. Lifetime Slugging Percentage (LSLG) uses the same formula as Slugging Percentage (SLG), but it uses the number of singles, doubles, triples, home runs, and at bats each player has over their entire career, rather than just over a single season.
iii. Find the
namefirst,namelastand Lifetime Slugging Percentage (lslg) of batters whose lifetime slugging percentage is higher than that of San Francisco favorite Willie Mays.
这里使用WITH语句定义了一个CTE(公共表达式),注释掉的HAVING有语法错误,需要跟
GROUP BY一起使用。
Task 4: Salaries
i. Find the
yearid,min,maxandaverageof all player salaries for each year recorded, ordered byyearidin ascending order.
ii. For salaries in 2016, compute a histogram. Divide the salary range into 10 equal bins from
mintomax, withbinids0 through 9, and count the salaries in each bin. Return thebinid,lowandhighboundaries for each bin, as well as the number of salaries in each bin, with results sorted from smallest bin to largest.
iii. Now let's compute the Year-over-Year change in min, max and average player salary. For each year with recorded salaries after the first, return the
yearid,mindiff,maxdiff, andavgdiffwith respect to the previous year. Order the output byyearidin ascending order. (You should omit the very first year of recorded salaries from the result.)
iv. In 2001, the max salary went up by over $6 million. Write a query to find the players that had the max salary in 2000 and 2001. Return the
playerid,namefirst,namelast,salaryandyearidfor those two years. If multiple players tied for the max salary in a year, return all of them.
v. Each team has at least 1 All Star and may have multiple. For each team in the year 2016, give the
teamidanddiffAvg(the difference between the team's highest paid all-star's salary and the team's lowest paid all-star's salary).
完结😊
Last updated