说实话,平常我用不到 SQL,所以一直没有去学。因为要面试,在准备 SQL 方面的知识,就快速学习了一下。我的感觉是和 Pandas 没差多少。
我用的是苹果电脑,Windows 的会不太一样。
我选择用 Sqlite ,因为这个最简单,不需要服务器。
首先检查 sqlite3
是否安装:
sqlite3 --version
我假设你已经安装了 Anaconda。接着,我们建一个虚拟环境
conda create --name sql
conda activate sql
conda install pandas jupyter ipython-sql
pip install SQLAlchemy==1.4.46 # 因为最新版本的 sqlalchemy 和 ipython-sql 有冲突
然后创建一个文件夹
mkdir sql-learn
cd sql-learn
下载两个文档:
curl -o sqlite.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite.txt
curl -o sqlite-data.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite-data.txt
如果你怕安全性有问题,我建议你把两个网址打开,确认是单纯的 txt
文档。
然后:
sqlite3 tutorial.db
接着
.read sqlite.txt
继续
.read sqlite-data.txt
检查一下
.tables
结果应该是
sqlite> .tables
countries dependents jobs regions
departments employees locations
SELECT * FROM countries LIMIT 5;
结果应该是
sqlite> SELECT * FROM countries LIMIT 5;
AR|Argentina|2
AU|Australia|3
BE|Belgium|1
BR|Brazil|2
CA|Canada|2
如果没问题,就可以继续开始了。
我在 VS Code 里用 Jupyter Notebook。建一个 ipynb
文档,然后下面的都是在那里面操作。
本文严重参考 https://www.sqltutorial.org/ 上的内容。
首先
%load_ext sql
%sql sqlite:///tutorial.db
但是我这个博客里因为数据在别的地方,所以第二行会不一样:
%load_ext sql
%sql sqlite:///../static/files/cnblog/sql-learn/tutorial.db
'Connected: @../static/files/cnblog/sql-learn/tutorial.db'
All tables #
%%sql
SELECT name FROM sqlite_master WHERE type='table';
* sqlite:///../static/files/cnblog/sql-learn/tutorial.db
Done.
name |
---|
regions |
sqlite_sequence |
countries |
locations |
departments |
jobs |
employees |
dependents |
Query #
%%sql
select * from locations limit 5
Done.
location_id | street_address | postal_code | city | state_province | country_id |
---|---|---|---|---|---|
1400 | 2014 Jabberwocky Rd | 26192 | Southlake | Texas | US |
1500 | 2011 Interiors Blvd | 99236 | South San Francisco | California | US |
1700 | 2004 Charade Rd | 98199 | Seattle | Washington | US |
1800 | 147 Spadina Ave | M5V 2L7 | Toronto | Ontario | CA |
2400 | 8204 Arthur St | None | London | None | UK |
%%sql
select first_name, last_name from employees limit 10
Done.
first_name | last_name |
---|---|
Steven | King |
Neena | Kochhar |
Lex | De Haan |
Alexander | Hunold |
Bruce | Ernst |
David | Austin |
Valli | Pataballa |
Diana | Lorentz |
Nancy | Greenberg |
Daniel | Faviet |
%%sql
select employee_id, salary from employees where salary < 3000;
Done.
employee_id | salary |
---|---|
118 | 2866.5 |
119 | 2756.25 |
126 | 2976.75 |
# %%sql
# delete from employees where hire_date < '1990-01-01';
%%sql
select first_name, last_name, salary, salary * 1.5
from employees limit 10;
Done.
first_name | last_name | salary | salary * 1.5 |
---|---|---|---|
Steven | King | 24000.0 | 36000.0 |
Neena | Kochhar | 17000.0 | 25500.0 |
Lex | De Haan | 17000.0 | 25500.0 |
Alexander | Hunold | 9000.0 | 13500.0 |
Bruce | Ernst | 6000.0 | 9000.0 |
David | Austin | 4800.0 | 7200.0 |
Valli | Pataballa | 4800.0 | 7200.0 |
Diana | Lorentz | 4200.0 | 6300.0 |
Nancy | Greenberg | 12000.0 | 18000.0 |
Daniel | Faviet | 9000.0 | 13500.0 |
%%sql
select
first_name,
last_name,
salary,
salary * 1.5 as new_salary
from employees limit 10;
Done.
first_name | last_name | salary | new_salary |
---|---|---|---|
Steven | King | 24000.0 | 36000.0 |
Neena | Kochhar | 17000.0 | 25500.0 |
Lex | De Haan | 17000.0 | 25500.0 |
Alexander | Hunold | 9000.0 | 13500.0 |
Bruce | Ernst | 6000.0 | 9000.0 |
David | Austin | 4800.0 | 7200.0 |
Valli | Pataballa | 4800.0 | 7200.0 |
Diana | Lorentz | 4200.0 | 6300.0 |
Nancy | Greenberg | 12000.0 | 18000.0 |
Daniel | Faviet | 9000.0 | 13500.0 |
Sort #
%%sql
select
employee_id,
first_name,
last_name,
hire_date,
salary
from
employees
limit
10;
Done.
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
100 | Steven | King | 1987-06-17 | 24000.0 |
101 | Neena | Kochhar | 1989-09-21 | 17000.0 |
102 | Lex | De Haan | 1993-01-13 | 17000.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
106 | Valli | Pataballa | 1998-02-05 | 4800.0 |
107 | Diana | Lorentz | 1999-02-07 | 4200.0 |
108 | Nancy | Greenberg | 1994-08-17 | 12000.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
%%sql
select
employee_id,
first_name,
last_name,
hire_date,
salary
from
employees
order by
first_name
limit
10;
Done.
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
121 | Adam | Fripp | 1997-04-10 | 8200.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
115 | Alexander | Khoo | 1995-05-18 | 3100.0 |
193 | Britney | Everett | 1997-03-03 | 3900.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
179 | Charles | Johnson | 2000-01-04 | 6200.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
114 | Den | Raphaely | 1994-12-07 | 11000.0 |
107 | Diana | Lorentz | 1999-02-07 | 4200.0 |
%%sql
select
employee_id,
first_name,
last_name,
hire_date,
salary
from
employees
order by
first_name,
last_name desc
limit
10;
Done.
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
121 | Adam | Fripp | 1997-04-10 | 8200.0 |
115 | Alexander | Khoo | 1995-05-18 | 3100.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
193 | Britney | Everett | 1997-03-03 | 3900.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
179 | Charles | Johnson | 2000-01-04 | 6200.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
114 | Den | Raphaely | 1994-12-07 | 11000.0 |
107 | Diana | Lorentz | 1999-02-07 | 4200.0 |
Filter data #
DISTINCT #
%%sql
select
salary
from
employees
order by
salary desc
limit 10;
Done.
salary |
---|
24000.0 |
17000.0 |
17000.0 |
14000.0 |
13500.0 |
13000.0 |
12000.0 |
12000.0 |
11000.0 |
10000.0 |
%%sql
select distinct
salary
from
employees
order by
salary desc
limit 10;
Done.
salary |
---|
24000.0 |
17000.0 |
14000.0 |
13500.0 |
13000.0 |
12000.0 |
11000.0 |
10000.0 |
9000.0 |
8600.0 |
%%sql
select
job_id,
salary
from
employees
order by
job_id,
salary desc
limit 10;
Done.
job_id | salary |
---|---|
1 | 8300.0 |
2 | 12000.0 |
3 | 4400.0 |
4 | 24000.0 |
5 | 17000.0 |
5 | 17000.0 |
6 | 9000.0 |
6 | 8200.0 |
6 | 7800.0 |
6 | 7700.0 |
%%sql
select distinct
job_id,
salary
from
employees
order by
job_id,
salary desc
limit 10;
Done.
job_id | salary |
---|---|
1 | 8300.0 |
2 | 12000.0 |
3 | 4400.0 |
4 | 24000.0 |
5 | 17000.0 |
6 | 9000.0 |
6 | 8200.0 |
6 | 7800.0 |
6 | 7700.0 |
6 | 6900.0 |
DISTINCT
only keeps one NULL
.
%%sql
select distinct
phone_number
from
employees
order by
phone_number
limit 10;
Done.
phone_number |
---|
None |
515.123.4444 |
515.123.4567 |
515.123.4568 |
515.123.4569 |
515.123.5555 |
515.123.7777 |
515.123.8080 |
515.123.8181 |
515.123.8888 |
LIMIT & OFFSET #
LIMIT
means to show many many rows. OFFSET
means starting from which row. First row is 0. So OFFSET 3
means starting from the fourth row.
%%sql
select
employee_id,
first_name,
last_name
from
employees
order by
first_name
limit 10;
Done.
employee_id | first_name | last_name |
---|---|---|
121 | Adam | Fripp |
103 | Alexander | Hunold |
115 | Alexander | Khoo |
193 | Britney | Everett |
104 | Bruce | Ernst |
179 | Charles | Johnson |
109 | Daniel | Faviet |
105 | David | Austin |
114 | Den | Raphaely |
107 | Diana | Lorentz |
%%sql
select
employee_id,
first_name,
last_name
from
employees
order by
first_name
limit 5 offset 3;
Done.
employee_id | first_name | last_name |
---|---|---|
193 | Britney | Everett |
104 | Bruce | Ernst |
179 | Charles | Johnson |
109 | Daniel | Faviet |
105 | David | Austin |
Where #
%%sql
select
employee_id,
first_name,
last_name,
salary
from
employees
where
salary > 14000
order by
salary desc
Done.
employee_id | first_name | last_name | salary |
---|---|---|---|
100 | Steven | King | 24000.0 |
101 | Neena | Kochhar | 17000.0 |
102 | Lex | De Haan | 17000.0 |
%%sql
select
employee_id,
first_name,
last_name,
department_id
from
employees
where
department_id = 5
order by
first_name;
Done.
employee_id | first_name | last_name | department_id |
---|---|---|---|
121 | Adam | Fripp | 5 |
193 | Britney | Everett | 5 |
126 | Irene | Mikkilineni | 5 |
120 | Matthew | Weiss | 5 |
122 | Payam | Kaufling | 5 |
192 | Sarah | Bell | 5 |
123 | Shanta | Vollman | 5 |
%%sql
select
employee_id,
first_name,
last_name
from
employees
where
last_name = "Chen"
Done.
employee_id | first_name | last_name |
---|---|---|
110 | John | Chen |
Comparison oprators #
=
: equal<>
: not equal>
>=
<
<=
Logical operators #
ALL
AND
ANY
BETWEEN
EXISTS
IN
LIKE
NOT
OR
SOME
IS NULL
AND #
%%sql
select first_name, last_name, salary
from employees
where salary > 5000 AND salary < 7000
order by salary desc;
Done.
first_name | last_name | salary |
---|---|---|
Luis | Popp | 6900.0 |
Shanta | Vollman | 6500.0 |
Susan | Mavris | 6500.0 |
Charles | Johnson | 6200.0 |
Bruce | Ernst | 6000.0 |
Pat | Fay | 6000.0 |
OR #
%%sql
select first_name, last_name, salary
from employees
where salary = 7000 OR salary = 8000;
Done.
first_name | last_name | salary |
---|---|---|
Matthew | Weiss | 8000.0 |
Kimberely | Grant | 7000.0 |
IS NULL #
%%sql
select first_name, last_name, phone_number
from employees
where phone_number IS NULL
order by first_name, last_name;
Done.
first_name | last_name | phone_number |
---|---|---|
Charles | Johnson | None |
Jack | Livingston | None |
John | Russell | None |
Jonathon | Taylor | None |
Karen | Partners | None |
Kimberely | Grant | None |
BETWEEN #
%%sql
select first_name, last_name, salary
from employees
where salary BETWEEN 9000 and 12000
order by salary desc;
Done.
first_name | last_name | salary |
---|---|---|
Nancy | Greenberg | 12000.0 |
Shelley | Higgins | 12000.0 |
Den | Raphaely | 11000.0 |
Hermann | Baer | 10000.0 |
Alexander | Hunold | 9000.0 |
Daniel | Faviet | 9000.0 |
IN #
%%sql
select first_name, last_name, department_id
from employees
where department_id IN (6, 8, 9)
order by department_id;
Done.
first_name | last_name | department_id |
---|---|---|
Alexander | Hunold | 6 |
Bruce | Ernst | 6 |
David | Austin | 6 |
Valli | Pataballa | 6 |
Diana | Lorentz | 6 |
John | Russell | 8 |
Karen | Partners | 8 |
Jonathon | Taylor | 8 |
Jack | Livingston | 8 |
Kimberely | Grant | 8 |
Charles | Johnson | 8 |
Steven | King | 9 |
Neena | Kochhar | 9 |
Lex | De Haan | 9 |
LIKE #
%%sql
select employee_id, first_name, last_name
from employees
where first_name LIKE '_h%'
order by first_name;
Done.
employee_id | first_name | last_name |
---|---|---|
179 | Charles | Johnson |
123 | Shanta | Vollman |
205 | Shelley | Higgins |
116 | Shelli | Baida |
_
represents a single character.
%
represents 0
, 1
, or multiple characters.
ALL, ANY #
SQLITE does not support ALL
or ANY
. But there are alternatives.
For example
SELECT * FROM employees
WHERE salary >= ALL (
SELECT salary FROM employees WHERE department_id = 8
);
can be done via
SELECT * FROM employees
WHERE salary >= (
SELECT MAX(salary) FROM employees WHERE department_id = 8
)
And
SELECT * FROM employees
WHERE salary >= ANY (
SELECT salary FROM employees WHERE department_id = 8
);
can be achieved through
SELECT * FROM employees
WHERE salary >= (
SELECT MIN(salary) FROM employees WHERE department_id = 8
);
or
SELECT * FROM employees
WHERE salary IN (
SELECT salary FROM employees WHERE department_id = 8
);
EXISTS #
For example, show names of those who have dependents:
%%sql
select first_name, last_name
from employees e
where EXISTS(
select * from dependents d
where d.employee_id = e.employee_id
)
limit 10;
Done.
first_name | last_name |
---|---|
Steven | King |
Neena | Kochhar |
Lex | De Haan |
Alexander | Hunold |
Bruce | Ernst |
David | Austin |
Valli | Pataballa |
Diana | Lorentz |
Nancy | Greenberg |
Daniel | Faviet |
在我看来,上面的这行就说明了 sql 相比于 csv 的优势。通过 employee_id
这一栏,我们把两个数据连在一起,放在了一个数据库。
Conditional Expression #
%%sql
SELECT first_name, last_name,
CASE
when salary < 3000 then 'low'
when salary >= 3000 and salary <= 5000 then 'average'
ELSE 'high'
END AS evaluation
FROM employees
LIMIT 10;
Done.
first_name | last_name | evaluation |
---|---|---|
Steven | King | high |
Neena | Kochhar | high |
Lex | De Haan | high |
Alexander | Hunold | high |
Bruce | Ernst | high |
David | Austin | average |
Valli | Pataballa | average |
Diana | Lorentz | average |
Nancy | Greenberg | high |
Daniel | Faviet | high |
%%sql
SELECT
first_name,
last_name,
phone_number,
CASE
WHEN phone_number IS NULL THEN 'No phone number'
ELSE phone_number
END AS final_phone_number
FROM employees
where phone_number IS NULL
LIMIT 10
Done.
first_name | last_name | phone_number | final_phone_number |
---|---|---|---|
John | Russell | None | No phone number |
Karen | Partners | None | No phone number |
Jonathon | Taylor | None | No phone number |
Jack | Livingston | None | No phone number |
Kimberely | Grant | None | No phone number |
Charles | Johnson | None | No phone number |
Join multiple tables #
Alias #
%%sql
SELECT first_name, last_name, salary * 1.1 AS new_salary
FROM employees
ORDER BY new_salary DESC
LIMIT 10;
Done.
first_name | last_name | new_salary |
---|---|---|
Steven | King | 26400.000000000004 |
Neena | Kochhar | 18700.0 |
Lex | De Haan | 18700.0 |
John | Russell | 15400.000000000002 |
Karen | Partners | 14850.000000000002 |
Michael | Hartstein | 14300.000000000002 |
Nancy | Greenberg | 13200.000000000002 |
Shelley | Higgins | 13200.000000000002 |
Den | Raphaely | 12100.000000000002 |
Hermann | Baer | 11000.0 |
%%sql
SELECT e.first_name, e.last_name
FROM employees AS e
LIMIT 10;
Done.
first_name | last_name |
---|---|
Steven | King |
Neena | Kochhar |
Lex | De Haan |
Alexander | Hunold |
Bruce | Ernst |
David | Austin |
Valli | Pataballa |
Diana | Lorentz |
Nancy | Greenberg |
Daniel | Faviet |
Inner Join #
取共有的。
图片来源: https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-INNER-JOIN.png
我们来看一个例子:
图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/emp_dept_tables.png
%%sql
SELECT department_id, department_name
FROM departments
WHERE department_id IN (1, 2, 3)
Done.
department_id | department_name |
---|---|
1 | Administration |
2 | Marketing |
3 | Purchasing |
%%sql
SELECT first_name, last_name, department_id
FROM employees
WHERE department_id IN (1, 2, 3)
ORDER BY department_id
Done.
first_name | last_name | department_id |
---|---|---|
Jennifer | Whalen | 1 |
Michael | Hartstein | 2 |
Pat | Fay | 2 |
Den | Raphaely | 3 |
Alexander | Khoo | 3 |
Shelli | Baida | 3 |
Sigal | Tobias | 3 |
Guy | Himuro | 3 |
Karen | Colmenares | 3 |
%%sql
SELECT
e.first_name,
e.last_name,
e.department_id,
d.department_id,
d.department_name
FROM
employees AS e
INNER JOIN
departments AS d ON d.department_id = e.department_id
WHERE
e.department_id IN (1, 2, 3)
Done.
first_name | last_name | department_id | department_id_1 | department_name |
---|---|---|---|---|
Jennifer | Whalen | 1 | 1 | Administration |
Michael | Hartstein | 2 | 2 | Marketing |
Pat | Fay | 2 | 2 | Marketing |
Den | Raphaely | 3 | 3 | Purchasing |
Alexander | Khoo | 3 | 3 | Purchasing |
Shelli | Baida | 3 | 3 | Purchasing |
Sigal | Tobias | 3 | 3 | Purchasing |
Guy | Himuro | 3 | 3 | Purchasing |
Karen | Colmenares | 3 | 3 | Purchasing |
LEFT JOIN #
图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-LEFT-JOIN.png
%%sql
SELECT country_id, country_name
FROM countries
WHERE country_id IN ('US', 'UK', 'CN')
Done.
country_id | country_name |
---|---|
CN | China |
UK | United Kingdom |
US | United States of America |
%%sql
SELECT country_id, street_address, city
FROM locations
WHERE country_id IN ('US', 'UK', 'CN')
Done.
country_id | street_address | city |
---|---|---|
US | 2014 Jabberwocky Rd | Southlake |
US | 2011 Interiors Blvd | South San Francisco |
US | 2004 Charade Rd | Seattle |
UK | 8204 Arthur St | London |
UK | Magdalen Centre, The Oxford Science Park | Oxford |
%%sql
SELECT
c.country_name,
c.country_id,
l.country_id,
l.street_address,
l.city
FROM
countries AS c
LEFT JOIN locations AS l on l.country_id = c.country_id
WHERE
c.country_id IN ('US', 'UK', 'CN')
Done.
country_name | country_id | country_id_1 | street_address | city |
---|---|---|---|---|
China | CN | None | None | None |
United Kingdom | UK | UK | 8204 Arthur St | London |
United Kingdom | UK | UK | Magdalen Centre, The Oxford Science Park | Oxford |
United States of America | US | US | 2014 Jabberwocky Rd | Southlake |
United States of America | US | US | 2011 Interiors Blvd | South San Francisco |
United States of America | US | US | 2004 Charade Rd | Seattle |
%%sql
SELECT
c.country_name,
c.country_id,
l.country_id,
l.street_address,
l.city
FROM
locations AS l
LEFT JOIN countries AS c on l.country_id = c.country_id
WHERE
c.country_id IN ('US', 'UK', 'CN')
Done.
country_name | country_id | country_id_1 | street_address | city |
---|---|---|---|---|
United Kingdom | UK | UK | 8204 Arthur St | London |
United Kingdom | UK | UK | Magdalen Centre, The Oxford Science Park | Oxford |
United States of America | US | US | 2014 Jabberwocky Rd | Southlake |
United States of America | US | US | 2011 Interiors Blvd | South San Francisco |
United States of America | US | US | 2004 Charade Rd | Seattle |
%%sql
SELECT
c.country_name,
c.country_id,
l.country_id,
l.street_address,
l.city
FROM
countries AS c
INNER JOIN locations AS l on l.country_id = c.country_id
WHERE
c.country_id IN ('US', 'UK', 'CN')
Done.
country_name | country_id | country_id_1 | street_address | city |
---|---|---|---|---|
United Kingdom | UK | UK | 8204 Arthur St | London |
United Kingdom | UK | UK | Magdalen Centre, The Oxford Science Park | Oxford |
United States of America | US | US | 2014 Jabberwocky Rd | Southlake |
United States of America | US | US | 2011 Interiors Blvd | South San Francisco |
United States of America | US | US | 2004 Charade Rd | Seattle |
Aggregate functions #
待续
#编程最后一次修改于 2024-10-11