SQL 简易教程

郝鸿涛 / 2024-10-05

说实话,平常我用不到 SQL,所以一直没有去学。因为要面试,在准备 SQL 方面的知识,就快速学习了一下。我的感觉是和 Pandas 没差多少。

我用的是苹果电脑,Windows 的会不太一样。

我选择用 Sqlite ,因为这个最简单,不需要服务器。

首先检查 sqlite3 是否安装:

sqlite3 --version

我假设你已经安装了 Anaconda。接着,我们建一个虚拟环境

conda create --name sql
conda activate sql
conda install pandas jupyter ipython-sql 
pip install SQLAlchemy==1.4.46 # 因为最新版本的 sqlalchemy 和 ipython-sql 有冲突

然后创建一个文件夹

mkdir sql-learn
cd sql-learn 

下载两个文档:

curl -o sqlite.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite.txt
curl -o sqlite-data.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite-data.txt

如果你怕安全性有问题,我建议你把两个网址打开,确认是单纯的 txt 文档。

然后:

sqlite3 tutorial.db

接着

.read sqlite.txt

继续

.read sqlite-data.txt

检查一下

.tables

结果应该是

sqlite> .tables
countries    dependents   jobs         regions    
departments  employees    locations  
SELECT * FROM countries LIMIT 5;

结果应该是

sqlite> SELECT * FROM countries LIMIT 5;
AR|Argentina|2
AU|Australia|3
BE|Belgium|1
BR|Brazil|2
CA|Canada|2

如果没问题,就可以继续开始了。

我在 VS Code 里用 Jupyter Notebook。建一个 ipynb 文档,然后下面的都是在那里面操作。

本文严重参考 https://www.sqltutorial.org/ 上的内容。

首先

%load_ext sql
%sql sqlite:///tutorial.db

但是我这个博客里因为数据在别的地方,所以第二行会不一样:

%load_ext sql
%sql sqlite:///../static/files/cnblog/sql-learn/tutorial.db
'Connected: @../static/files/cnblog/sql-learn/tutorial.db'

All tables #

%%sql
SELECT name FROM sqlite_master WHERE type='table';
 * sqlite:///../static/files/cnblog/sql-learn/tutorial.db
Done.
name
regions
sqlite_sequence
countries
locations
departments
jobs
employees
dependents

Query #

%%sql
select * from locations limit 5
Done.
location_id street_address postal_code city state_province country_id
1400 2014 Jabberwocky Rd 26192 Southlake Texas US
1500 2011 Interiors Blvd 99236 South San Francisco California US
1700 2004 Charade Rd 98199 Seattle Washington US
1800 147 Spadina Ave M5V 2L7 Toronto Ontario CA
2400 8204 Arthur St None London None UK
%%sql 
select first_name, last_name from employees limit 10
Done.
first_name last_name
Steven King
Neena Kochhar
Lex De Haan
Alexander Hunold
Bruce Ernst
David Austin
Valli Pataballa
Diana Lorentz
Nancy Greenberg
Daniel Faviet
%%sql
select employee_id, salary from employees where salary < 3000;
Done.
employee_id salary
118 2866.5
119 2756.25
126 2976.75
# %%sql 
# delete from employees where hire_date < '1990-01-01';
%%sql 
select first_name, last_name, salary, salary * 1.5 
from employees limit 10;
Done.
first_name last_name salary salary * 1.5
Steven King 24000.0 36000.0
Neena Kochhar 17000.0 25500.0
Lex De Haan 17000.0 25500.0
Alexander Hunold 9000.0 13500.0
Bruce Ernst 6000.0 9000.0
David Austin 4800.0 7200.0
Valli Pataballa 4800.0 7200.0
Diana Lorentz 4200.0 6300.0
Nancy Greenberg 12000.0 18000.0
Daniel Faviet 9000.0 13500.0
%%sql 
select 
    first_name, 
    last_name, 
    salary, 
    salary * 1.5 as new_salary 
from employees limit 10;
Done.
first_name last_name salary new_salary
Steven King 24000.0 36000.0
Neena Kochhar 17000.0 25500.0
Lex De Haan 17000.0 25500.0
Alexander Hunold 9000.0 13500.0
Bruce Ernst 6000.0 9000.0
David Austin 4800.0 7200.0
Valli Pataballa 4800.0 7200.0
Diana Lorentz 4200.0 6300.0
Nancy Greenberg 12000.0 18000.0
Daniel Faviet 9000.0 13500.0

Sort #

%%sql 
select 
    employee_id,
    first_name,
    last_name,
    hire_date,
    salary 
from 
    employees
limit 
    10;
Done.
employee_id first_name last_name hire_date salary
100 Steven King 1987-06-17 24000.0
101 Neena Kochhar 1989-09-21 17000.0
102 Lex De Haan 1993-01-13 17000.0
103 Alexander Hunold 1990-01-03 9000.0
104 Bruce Ernst 1991-05-21 6000.0
105 David Austin 1997-06-25 4800.0
106 Valli Pataballa 1998-02-05 4800.0
107 Diana Lorentz 1999-02-07 4200.0
108 Nancy Greenberg 1994-08-17 12000.0
109 Daniel Faviet 1994-08-16 9000.0
%%sql 
select 
    employee_id,
    first_name,
    last_name,
    hire_date,
    salary 
from 
    employees
order by   
    first_name
limit 
    10;
Done.
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
103 Alexander Hunold 1990-01-03 9000.0
115 Alexander Khoo 1995-05-18 3100.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0
114 Den Raphaely 1994-12-07 11000.0
107 Diana Lorentz 1999-02-07 4200.0
%%sql 
select 
    employee_id,
    first_name,
    last_name,
    hire_date,
    salary 
from 
    employees
order by   
    first_name,
    last_name desc
limit 
    10;
Done.
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
115 Alexander Khoo 1995-05-18 3100.0
103 Alexander Hunold 1990-01-03 9000.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0
114 Den Raphaely 1994-12-07 11000.0
107 Diana Lorentz 1999-02-07 4200.0

Filter data #

DISTINCT #

%%sql 
select 
    salary 
from 
    employees 
order by 
    salary desc 
limit 10;
Done.
salary
24000.0
17000.0
17000.0
14000.0
13500.0
13000.0
12000.0
12000.0
11000.0
10000.0
%%sql 
select distinct 
    salary 
from 
    employees 
order by 
    salary desc 
limit 10;
Done.
salary
24000.0
17000.0
14000.0
13500.0
13000.0
12000.0
11000.0
10000.0
9000.0
8600.0
%%sql 
select 
    job_id,
    salary
from 
    employees
order by
    job_id,
    salary desc
limit 10;
Done.
job_id salary
1 8300.0
2 12000.0
3 4400.0
4 24000.0
5 17000.0
5 17000.0
6 9000.0
6 8200.0
6 7800.0
6 7700.0
%%sql 
select distinct
    job_id,
    salary
from 
    employees
order by
    job_id,
    salary desc
limit 10;
Done.
job_id salary
1 8300.0
2 12000.0
3 4400.0
4 24000.0
5 17000.0
6 9000.0
6 8200.0
6 7800.0
6 7700.0
6 6900.0

DISTINCT only keeps one NULL.

%%sql 
select distinct
    phone_number
from 
    employees 
order by 
    phone_number
limit 10;
Done.
phone_number
None
515.123.4444
515.123.4567
515.123.4568
515.123.4569
515.123.5555
515.123.7777
515.123.8080
515.123.8181
515.123.8888

LIMIT & OFFSET #

LIMIT means to show many many rows. OFFSET means starting from which row. First row is 0. So OFFSET 3 means starting from the fourth row.

%%sql 
select 
    employee_id,
    first_name,
    last_name
from    
    employees
order by
    first_name
limit 10;
Done.
employee_id first_name last_name
121 Adam Fripp
103 Alexander Hunold
115 Alexander Khoo
193 Britney Everett
104 Bruce Ernst
179 Charles Johnson
109 Daniel Faviet
105 David Austin
114 Den Raphaely
107 Diana Lorentz
%%sql 
select 
    employee_id,
    first_name,
    last_name
from    
    employees
order by
    first_name
limit 5 offset 3;
Done.
employee_id first_name last_name
193 Britney Everett
104 Bruce Ernst
179 Charles Johnson
109 Daniel Faviet
105 David Austin

Where #

%%sql 
select  
    employee_id,
    first_name,
    last_name,
    salary
from 
    employees 
where
    salary > 14000 
order by 
    salary desc 
Done.
employee_id first_name last_name salary
100 Steven King 24000.0
101 Neena Kochhar 17000.0
102 Lex De Haan 17000.0
%%sql 
select  
    employee_id,
    first_name,
    last_name,
    department_id
from 
    employees 
where
    department_id = 5
order by 
    first_name;
Done.
employee_id first_name last_name department_id
121 Adam Fripp 5
193 Britney Everett 5
126 Irene Mikkilineni 5
120 Matthew Weiss 5
122 Payam Kaufling 5
192 Sarah Bell 5
123 Shanta Vollman 5
%%sql 
select  
    employee_id,
    first_name,
    last_name
from 
    employees 
where
    last_name = "Chen"
Done.
employee_id first_name last_name
110 John Chen

Comparison oprators #

Logical operators #

AND #

%%sql 
select first_name, last_name, salary 
from employees 
where salary > 5000 AND salary < 7000 
order by salary desc;
Done.
first_name last_name salary
Luis Popp 6900.0
Shanta Vollman 6500.0
Susan Mavris 6500.0
Charles Johnson 6200.0
Bruce Ernst 6000.0
Pat Fay 6000.0

OR #

%%sql 
select first_name, last_name, salary 
from employees 
where salary = 7000 OR salary = 8000; 
Done.
first_name last_name salary
Matthew Weiss 8000.0
Kimberely Grant 7000.0

IS NULL #

%%sql 
select first_name, last_name, phone_number 
from employees 
where phone_number IS NULL 
order by first_name, last_name;
Done.
first_name last_name phone_number
Charles Johnson None
Jack Livingston None
John Russell None
Jonathon Taylor None
Karen Partners None
Kimberely Grant None

BETWEEN #

%%sql 
select first_name, last_name, salary 
from employees 
where salary BETWEEN 9000 and 12000 
order by salary desc;
Done.
first_name last_name salary
Nancy Greenberg 12000.0
Shelley Higgins 12000.0
Den Raphaely 11000.0
Hermann Baer 10000.0
Alexander Hunold 9000.0
Daniel Faviet 9000.0

IN #

%%sql 
select first_name, last_name, department_id 
from employees 
where department_id IN (6, 8, 9)
order by department_id;
Done.
first_name last_name department_id
Alexander Hunold 6
Bruce Ernst 6
David Austin 6
Valli Pataballa 6
Diana Lorentz 6
John Russell 8
Karen Partners 8
Jonathon Taylor 8
Jack Livingston 8
Kimberely Grant 8
Charles Johnson 8
Steven King 9
Neena Kochhar 9
Lex De Haan 9

LIKE #

%%sql 
select employee_id, first_name, last_name 
from employees 
where first_name LIKE '_h%'
order by first_name;
Done.
employee_id first_name last_name
179 Charles Johnson
123 Shanta Vollman
205 Shelley Higgins
116 Shelli Baida

_ represents a single character.

% represents 0, 1, or multiple characters.

ALL, ANY #

SQLITE does not support ALL or ANY. But there are alternatives.

For example

SELECT * FROM employees
WHERE salary >= ALL (
    SELECT salary FROM employees WHERE department_id = 8
);

can be done via

SELECT * FROM employees
WHERE salary >= (
    SELECT MAX(salary) FROM employees WHERE department_id = 8
)

And

SELECT * FROM employees
WHERE salary >= ANY (
    SELECT salary FROM employees WHERE department_id = 8
);

can be achieved through

SELECT * FROM employees
WHERE salary >= (
    SELECT MIN(salary) FROM employees WHERE department_id = 8
);

or

SELECT * FROM employees
WHERE salary IN (
    SELECT salary FROM employees WHERE department_id = 8
);

EXISTS #

For example, show names of those who have dependents:

%%sql 
select first_name, last_name 
from employees e 
where EXISTS(
    select * from dependents d 
    where d.employee_id = e.employee_id
)
limit 10;
Done.
first_name last_name
Steven King
Neena Kochhar
Lex De Haan
Alexander Hunold
Bruce Ernst
David Austin
Valli Pataballa
Diana Lorentz
Nancy Greenberg
Daniel Faviet

在我看来,上面的这行就说明了 sql 相比于 csv 的优势。通过 employee_id 这一栏,我们把两个数据连在一起,放在了一个数据库。

Conditional Expression #

%%sql 
SELECT first_name, last_name,
CASE 
    when salary < 3000 then 'low'
    when salary >= 3000 and salary <= 5000 then 'average'
    ELSE 'high'
END AS evaluation 
FROM employees
LIMIT 10;
Done.
first_name last_name evaluation
Steven King high
Neena Kochhar high
Lex De Haan high
Alexander Hunold high
Bruce Ernst high
David Austin average
Valli Pataballa average
Diana Lorentz average
Nancy Greenberg high
Daniel Faviet high
%%sql 
SELECT 
    first_name, 
    last_name,
    phone_number,
    CASE
        WHEN phone_number IS NULL THEN 'No phone number'
        ELSE phone_number 
    END AS final_phone_number
FROM employees 
where phone_number IS NULL 
LIMIT 10
Done.
first_name last_name phone_number final_phone_number
John Russell None No phone number
Karen Partners None No phone number
Jonathon Taylor None No phone number
Jack Livingston None No phone number
Kimberely Grant None No phone number
Charles Johnson None No phone number

Join multiple tables #

Alias #

%%sql 

SELECT first_name, last_name, salary * 1.1 AS new_salary 
FROM employees 
ORDER BY new_salary DESC
LIMIT 10;
Done.
first_name last_name new_salary
Steven King 26400.000000000004
Neena Kochhar 18700.0
Lex De Haan 18700.0
John Russell 15400.000000000002
Karen Partners 14850.000000000002
Michael Hartstein 14300.000000000002
Nancy Greenberg 13200.000000000002
Shelley Higgins 13200.000000000002
Den Raphaely 12100.000000000002
Hermann Baer 11000.0
%%sql 

SELECT e.first_name, e.last_name 
FROM employees AS e
LIMIT 10;
Done.
first_name last_name
Steven King
Neena Kochhar
Lex De Haan
Alexander Hunold
Bruce Ernst
David Austin
Valli Pataballa
Diana Lorentz
Nancy Greenberg
Daniel Faviet

Inner Join #

取共有的。

图片来源: https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-INNER-JOIN.png

我们来看一个例子:

图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/emp_dept_tables.png

%%sql 

SELECT department_id, department_name 
FROM departments 
WHERE department_id IN (1, 2, 3)
Done.
department_id department_name
1 Administration
2 Marketing
3 Purchasing
%%sql 
SELECT first_name, last_name, department_id 
FROM employees 
WHERE department_id IN (1, 2, 3)
ORDER BY department_id
Done.
first_name last_name department_id
Jennifer Whalen 1
Michael Hartstein 2
Pat Fay 2
Den Raphaely 3
Alexander Khoo 3
Shelli Baida 3
Sigal Tobias 3
Guy Himuro 3
Karen Colmenares 3
%%sql 

SELECT 
    e.first_name,
    e.last_name, 
    e.department_id, 
    d.department_id,
    d.department_name 
FROM 
    employees AS e
        INNER JOIN
    departments AS d ON d.department_id = e.department_id 
WHERE 
    e.department_id IN (1, 2, 3)
Done.
first_name last_name department_id department_id_1 department_name
Jennifer Whalen 1 1 Administration
Michael Hartstein 2 2 Marketing
Pat Fay 2 2 Marketing
Den Raphaely 3 3 Purchasing
Alexander Khoo 3 3 Purchasing
Shelli Baida 3 3 Purchasing
Sigal Tobias 3 3 Purchasing
Guy Himuro 3 3 Purchasing
Karen Colmenares 3 3 Purchasing

LEFT JOIN #

图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-LEFT-JOIN.png

%%sql 

SELECT country_id, country_name 
FROM countries 
WHERE country_id IN ('US', 'UK', 'CN')
Done.
country_id country_name
CN China
UK United Kingdom
US United States of America
%%sql 

SELECT country_id, street_address, city 
FROM locations 
WHERE country_id IN ('US', 'UK', 'CN')
Done.
country_id street_address city
US 2014 Jabberwocky Rd Southlake
US 2011 Interiors Blvd South San Francisco
US 2004 Charade Rd Seattle
UK 8204 Arthur St London
UK Magdalen Centre, The Oxford Science Park Oxford
%%sql 
SELECT 
    c.country_name,
    c.country_id,
    l.country_id,
    l.street_address,
    l.city 
FROM 
    countries AS c 
LEFT JOIN locations AS l on l.country_id = c.country_id 
WHERE 
    c.country_id IN ('US', 'UK', 'CN')
Done.
country_name country_id country_id_1 street_address city
China CN None None None
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle
%%sql 
SELECT 
    c.country_name,
    c.country_id,
    l.country_id,
    l.street_address,
    l.city 
FROM 
    locations AS l
LEFT JOIN countries AS c on l.country_id = c.country_id 
WHERE 
    c.country_id IN ('US', 'UK', 'CN')
Done.
country_name country_id country_id_1 street_address city
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle
%%sql 
SELECT 
    c.country_name,
    c.country_id,
    l.country_id,
    l.street_address,
    l.city 
FROM 
    countries AS c 
INNER JOIN locations AS l on l.country_id = c.country_id 
WHERE 
    c.country_id IN ('US', 'UK', 'CN')
Done.
country_name country_id country_id_1 street_address city
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle

Aggregate functions #

待续

#编程

最后一次修改于 2024-10-11