SQL 简易教程

郝鸿涛 / 2024-10-05

说实话,平常我用不到 SQL,所以一直没有去学。因为要面试,在准备 SQL 方面的知识,就快速学习了一下。我的感觉是和 Pandas 没差多少。

我用的是苹果电脑,Windows 的会不太一样。

我选择用 Sqlite ,因为这个最简单,不需要服务器。

首先检查 sqlite3 是否安装:

sqlite3 --version

我假设你已经安装了 Anaconda。接着,我们建一个虚拟环境

conda create --name sql
conda activate sql
conda install pandas jupyter ipython-sql 
pip install SQLAlchemy==1.4.46 # 因为最新版本的 sqlalchemy 和 ipython-sql 有冲突

然后创建一个文件夹

mkdir sql-learn
cd sql-learn 

下载两个文档:

curl -o sqlite.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite.txt
curl -o sqlite-data.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite-data.txt

如果你怕安全性有问题,我建议你把两个网址打开,确认是单纯的 txt 文档。

然后:

sqlite3 tutorial.db

接着

.read sqlite.txt

继续

.read sqlite-data.txt

检查一下

.tables

结果应该是

sqlite> .tables
countries    dependents   jobs         regions    
departments  employees    locations  
SELECT * FROM countries LIMIT 5;

结果应该是

sqlite> SELECT * FROM countries LIMIT 5;
AR|Argentina|2
AU|Australia|3
BE|Belgium|1
BR|Brazil|2
CA|Canada|2

如果没问题,就可以继续开始了。

我在 VS Code 里用 Jupyter Notebook。建一个 ipynb 文档,然后下面的都是在那里面操作。

本文严重参考 https://www.sqltutorial.org/ 上的内容。

首先

%load_ext sql
%sql sqlite:///tutorial.db

但是我这个博客里因为数据在别的地方,所以第二行会不一样:

%load_ext sql
%sql sqlite:///../static/files/cnblog/sql-learn/tutorial.db
'Connected: @../static/files/cnblog/sql-learn/tutorial.db'

All tables #

%%sql
SELECT name FROM sqlite_master WHERE type='table';
name
regions
sqlite_sequence
countries
locations
departments
jobs
employees
dependents
* sqlite:///../static/files/cnblog/sql-learn/tutorial.db Done.
## Query
%%sql
select * from locations limit 5
location_id street_address postal_code city state_province country_id
1400 2014 Jabberwocky Rd 26192 Southlake Texas US
1500 2011 Interiors Blvd 99236 South San Francisco California US
1700 2004 Charade Rd 98199 Seattle Washington US
1800 147 Spadina Ave M5V 2L7 Toronto Ontario CA
2400 8204 Arthur St None London None UK
Done.
```sql %%sql select first_name, last_name from employees limit 10 ```
first_name last_name
Steven King
Neena Kochhar
Lex De Haan
Alexander Hunold
Bruce Ernst
David Austin
Valli Pataballa
Diana Lorentz
Nancy Greenberg
Daniel Faviet
Done.
```sql %%sql select employee_id, salary from employees where salary < 3000; ```
employee_id salary
118 2866.5
119 2756.25
126 2976.75
Done.
```python # %%sql # delete from employees where hire_date < '1990-01-01'; ```
%%sql 
select first_name, last_name, salary, salary * 1.5 
from employees limit 10;
first_name last_name salary salary * 1.5
Steven King 24000.0 36000.0
Neena Kochhar 17000.0 25500.0
Lex De Haan 17000.0 25500.0
Alexander Hunold 9000.0 13500.0
Bruce Ernst 6000.0 9000.0
David Austin 4800.0 7200.0
Valli Pataballa 4800.0 7200.0
Diana Lorentz 4200.0 6300.0
Nancy Greenberg 12000.0 18000.0
Daniel Faviet 9000.0 13500.0
Done.
```sql %%sql select first_name, last_name, salary, salary * 1.5 as new_salary from employees limit 10; ```
first_name last_name salary new_salary
Steven King 24000.0 36000.0
Neena Kochhar 17000.0 25500.0
Lex De Haan 17000.0 25500.0
Alexander Hunold 9000.0 13500.0
Bruce Ernst 6000.0 9000.0
David Austin 4800.0 7200.0
Valli Pataballa 4800.0 7200.0
Diana Lorentz 4200.0 6300.0
Nancy Greenberg 12000.0 18000.0
Daniel Faviet 9000.0 13500.0
Done.
## Sort
%%sql 
select 
    employee_id,
    first_name,
    last_name,
    hire_date,
    salary 
from 
    employees
limit 
    10;
employee_id first_name last_name hire_date salary
100 Steven King 1987-06-17 24000.0
101 Neena Kochhar 1989-09-21 17000.0
102 Lex De Haan 1993-01-13 17000.0
103 Alexander Hunold 1990-01-03 9000.0
104 Bruce Ernst 1991-05-21 6000.0
105 David Austin 1997-06-25 4800.0
106 Valli Pataballa 1998-02-05 4800.0
107 Diana Lorentz 1999-02-07 4200.0
108 Nancy Greenberg 1994-08-17 12000.0
109 Daniel Faviet 1994-08-16 9000.0
Done.
```sql %%sql select employee_id, first_name, last_name, hire_date, salary from employees order by first_name limit 10; ```
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
103 Alexander Hunold 1990-01-03 9000.0
115 Alexander Khoo 1995-05-18 3100.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0
114 Den Raphaely 1994-12-07 11000.0
107 Diana Lorentz 1999-02-07 4200.0
Done.
```sql %%sql select employee_id, first_name, last_name, hire_date, salary from employees order by first_name, last_name desc limit 10; ```
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
115 Alexander Khoo 1995-05-18 3100.0
103 Alexander Hunold 1990-01-03 9000.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0
114 Den Raphaely 1994-12-07 11000.0
107 Diana Lorentz 1999-02-07 4200.0
Done.
## Filter data ### DISTINCT
%%sql 
select 
    salary 
from 
    employees 
order by 
    salary desc 
limit 10;
salary
24000.0
17000.0
17000.0
14000.0
13500.0
13000.0
12000.0
12000.0
11000.0
10000.0
Done.
```sql %%sql select distinct salary from employees order by salary desc limit 10; ```
salary
24000.0
17000.0
14000.0
13500.0
13000.0
12000.0
11000.0
10000.0
9000.0
8600.0
Done.
```sql %%sql select job_id, salary from employees order by job_id, salary desc limit 10; ```
job_id salary
1 8300.0
2 12000.0
3 4400.0
4 24000.0
5 17000.0
5 17000.0
6 9000.0
6 8200.0
6 7800.0
6 7700.0
Done.
```sql %%sql select distinct job_id, salary from employees order by job_id, salary desc limit 10; ```
job_id salary
1 8300.0
2 12000.0
3 4400.0
4 24000.0
5 17000.0
6 9000.0
6 8200.0
6 7800.0
6 7700.0
6 6900.0
Done.
`DISTINCT` only keeps one `NULL`.
%%sql 
select distinct
    phone_number
from 
    employees 
order by 
    phone_number
limit 10;
phone_number
None
515.123.4444
515.123.4567
515.123.4568
515.123.4569
515.123.5555
515.123.7777
515.123.8080
515.123.8181
515.123.8888
Done.
### LIMIT & OFFSET

LIMIT means to show many many rows. OFFSET means starting from which row. First row is 0. So OFFSET 3 means starting from the fourth row.

%%sql 
select 
    employee_id,
    first_name,
    last_name
from    
    employees
order by
    first_name
limit 10;
employee_id first_name last_name
121 Adam Fripp
103 Alexander Hunold
115 Alexander Khoo
193 Britney Everett
104 Bruce Ernst
179 Charles Johnson
109 Daniel Faviet
105 David Austin
114 Den Raphaely
107 Diana Lorentz
Done.
```sql %%sql select employee_id, first_name, last_name from employees order by first_name limit 5 offset 3; ```
employee_id first_name last_name
193 Britney Everett
104 Bruce Ernst
179 Charles Johnson
109 Daniel Faviet
105 David Austin
Done.
### Where
%%sql 
select  
    employee_id,
    first_name,
    last_name,
    salary
from 
    employees 
where
    salary > 14000 
order by 
    salary desc 
employee_id first_name last_name salary
100 Steven King 24000.0
101 Neena Kochhar 17000.0
102 Lex De Haan 17000.0
Done.
```sql %%sql select employee_id, first_name, last_name, department_id from employees where department_id = 5 order by first_name; ```
employee_id first_name last_name department_id
121 Adam Fripp 5
193 Britney Everett 5
126 Irene Mikkilineni 5
120 Matthew Weiss 5
122 Payam Kaufling 5
192 Sarah Bell 5
123 Shanta Vollman 5
Done.
```sql %%sql select employee_id, first_name, last_name from employees where last_name = "Chen" ```
employee_id first_name last_name
110 John Chen
Done.
### Comparison oprators

Logical operators #

AND #

%%sql 
select first_name, last_name, salary 
from employees 
where salary > 5000 AND salary < 7000 
order by salary desc;
first_name last_name salary
Luis Popp 6900.0
Shanta Vollman 6500.0
Susan Mavris 6500.0
Charles Johnson 6200.0
Bruce Ernst 6000.0
Pat Fay 6000.0
Done.
#### OR
%%sql 
select first_name, last_name, salary 
from employees 
where salary = 7000 OR salary = 8000; 
first_name last_name salary
Matthew Weiss 8000.0
Kimberely Grant 7000.0
Done.
#### IS NULL
%%sql 
select first_name, last_name, phone_number 
from employees 
where phone_number IS NULL 
order by first_name, last_name;
first_name last_name phone_number
Charles Johnson None
Jack Livingston None
John Russell None
Jonathon Taylor None
Karen Partners None
Kimberely Grant None
Done.
#### BETWEEN
%%sql 
select first_name, last_name, salary 
from employees 
where salary BETWEEN 9000 and 12000 
order by salary desc;
first_name last_name salary
Nancy Greenberg 12000.0
Shelley Higgins 12000.0
Den Raphaely 11000.0
Hermann Baer 10000.0
Alexander Hunold 9000.0
Daniel Faviet 9000.0
Done.
#### IN
%%sql 
select first_name, last_name, department_id 
from employees 
where department_id IN (6, 8, 9)
order by department_id;
first_name last_name department_id
Alexander Hunold 6
Bruce Ernst 6
David Austin 6
Valli Pataballa 6
Diana Lorentz 6
John Russell 8
Karen Partners 8
Jonathon Taylor 8
Jack Livingston 8
Kimberely Grant 8
Charles Johnson 8
Steven King 9
Neena Kochhar 9
Lex De Haan 9
Done.
#### LIKE
%%sql 
select employee_id, first_name, last_name 
from employees 
where first_name LIKE '_h%'
order by first_name;
employee_id first_name last_name
179 Charles Johnson
123 Shanta Vollman
205 Shelley Higgins
116 Shelli Baida
Done.
`_` represents a single character.

% represents 0, 1, or multiple characters.

ALL, ANY #

SQLITE does not support ALL or ANY. But there are alternatives.

For example

SELECT * FROM employees
WHERE salary >= ALL (
    SELECT salary FROM employees WHERE department_id = 8
);

can be done via

SELECT * FROM employees
WHERE salary >= (
    SELECT MAX(salary) FROM employees WHERE department_id = 8
)

And

SELECT * FROM employees
WHERE salary >= ANY (
    SELECT salary FROM employees WHERE department_id = 8
);

can be achieved through

SELECT * FROM employees
WHERE salary >= (
    SELECT MIN(salary) FROM employees WHERE department_id = 8
);

or

SELECT * FROM employees
WHERE salary IN (
    SELECT salary FROM employees WHERE department_id = 8
);

EXISTS #

For example, show names of those who have dependents:

%%sql 
select first_name, last_name 
from employees e 
where EXISTS(
    select * from dependents d 
    where d.employee_id = e.employee_id
)
limit 10;
first_name last_name
Steven King
Neena Kochhar
Lex De Haan
Alexander Hunold
Bruce Ernst
David Austin
Valli Pataballa
Diana Lorentz
Nancy Greenberg
Daniel Faviet
Done.
在我看来,上面的这行就说明了 sql 相比于 csv 的优势。通过 `employee_id` 这一栏,我们把两个数据连在一起,放在了一个数据库。

Conditional Expression #

%%sql 
SELECT first_name, last_name,
CASE 
    when salary < 3000 then 'low'
    when salary >= 3000 and salary <= 5000 then 'average'
    ELSE 'high'
END AS evaluation 
FROM employees
LIMIT 10;
first_name last_name evaluation
Steven King high
Neena Kochhar high
Lex De Haan high
Alexander Hunold high
Bruce Ernst high
David Austin average
Valli Pataballa average
Diana Lorentz average
Nancy Greenberg high
Daniel Faviet high
Done.
```sql %%sql SELECT first_name, last_name, phone_number, CASE WHEN phone_number IS NULL THEN 'No phone number' ELSE phone_number END AS final_phone_number FROM employees where phone_number IS NULL LIMIT 10 ```
first_name last_name phone_number final_phone_number
John Russell None No phone number
Karen Partners None No phone number
Jonathon Taylor None No phone number
Jack Livingston None No phone number
Kimberely Grant None No phone number
Charles Johnson None No phone number
Done.
## Join multiple tables

Alias #

%%sql 

SELECT first_name, last_name, salary * 1.1 AS new_salary 
FROM employees 
ORDER BY new_salary DESC
LIMIT 10;
first_name last_name new_salary
Steven King 26400.000000000004
Neena Kochhar 18700.0
Lex De Haan 18700.0
John Russell 15400.000000000002
Karen Partners 14850.000000000002
Michael Hartstein 14300.000000000002
Nancy Greenberg 13200.000000000002
Shelley Higgins 13200.000000000002
Den Raphaely 12100.000000000002
Hermann Baer 11000.0
Done.
```sql %%sql

SELECT e.first_name, e.last_name FROM employees AS e LIMIT 10;


<table>
    <thead>
        <tr>
            <th>first_name</th>
            <th>last_name</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Steven</td>
            <td>King</td>
        </tr>
        <tr>
            <td>Neena</td>
            <td>Kochhar</td>
        </tr>
        <tr>
            <td>Lex</td>
            <td>De Haan</td>
        </tr>
        <tr>
            <td>Alexander</td>
            <td>Hunold</td>
        </tr>
        <tr>
            <td>Bruce</td>
            <td>Ernst</td>
        </tr>
        <tr>
            <td>David</td>
            <td>Austin</td>
        </tr>
        <tr>
            <td>Valli</td>
            <td>Pataballa</td>
        </tr>
        <tr>
            <td>Diana</td>
            <td>Lorentz</td>
        </tr>
        <tr>
            <td>Nancy</td>
            <td>Greenberg</td>
        </tr>
        <tr>
            <td>Daniel</td>
            <td>Faviet</td>
        </tr>
    </tbody>
</table>
Done.
### Inner Join 取共有的。 ![](https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-INNER-JOIN.png) 图片来源: https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-INNER-JOIN.png 我们来看一个例子: ![](https://www.sqltutorial.org/wp-content/uploads/2016/03/emp_dept_tables.png) 图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/emp_dept_tables.png ```sql %%sql SELECT department_id, department_name FROM departments WHERE department_id IN (1, 2, 3)
department_id department_name
1 Administration
2 Marketing
3 Purchasing
Done.
```sql %%sql SELECT first_name, last_name, department_id FROM employees WHERE department_id IN (1, 2, 3) ORDER BY department_id ```
first_name last_name department_id
Jennifer Whalen 1
Michael Hartstein 2
Pat Fay 2
Den Raphaely 3
Alexander Khoo 3
Shelli Baida 3
Sigal Tobias 3
Guy Himuro 3
Karen Colmenares 3
Done.
```sql %%sql

SELECT e.first_name, e.last_name, e.department_id, d.department_id, d.department_name FROM employees AS e INNER JOIN departments AS d ON d.department_id = e.department_id WHERE e.department_id IN (1, 2, 3)


<table>
    <thead>
        <tr>
            <th>first_name</th>
            <th>last_name</th>
            <th>department_id</th>
            <th>department_id_1</th>
            <th>department_name</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Jennifer</td>
            <td>Whalen</td>
            <td>1</td>
            <td>1</td>
            <td>Administration</td>
        </tr>
        <tr>
            <td>Michael</td>
            <td>Hartstein</td>
            <td>2</td>
            <td>2</td>
            <td>Marketing</td>
        </tr>
        <tr>
            <td>Pat</td>
            <td>Fay</td>
            <td>2</td>
            <td>2</td>
            <td>Marketing</td>
        </tr>
        <tr>
            <td>Den</td>
            <td>Raphaely</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Alexander</td>
            <td>Khoo</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Shelli</td>
            <td>Baida</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Sigal</td>
            <td>Tobias</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Guy</td>
            <td>Himuro</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Karen</td>
            <td>Colmenares</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
    </tbody>
</table>
Done.
### LEFT JOIN ![](https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-LEFT-JOIN.png) 图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-LEFT-JOIN.png ```sql %%sql SELECT country_id, country_name FROM countries WHERE country_id IN ('US', 'UK', 'CN')
country_id country_name
CN China
UK United Kingdom
US United States of America
Done.
```sql %%sql

SELECT country_id, street_address, city FROM locations WHERE country_id IN (‘US’, ‘UK’, ‘CN’)


<table>
    <thead>
        <tr>
            <th>country_id</th>
            <th>street_address</th>
            <th>city</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>US</td>
            <td>2014 Jabberwocky Rd</td>
            <td>Southlake</td>
        </tr>
        <tr>
            <td>US</td>
            <td>2011 Interiors Blvd</td>
            <td>South San Francisco</td>
        </tr>
        <tr>
            <td>US</td>
            <td>2004 Charade Rd</td>
            <td>Seattle</td>
        </tr>
        <tr>
            <td>UK</td>
            <td>8204 Arthur St</td>
            <td>London</td>
        </tr>
        <tr>
            <td>UK</td>
            <td>Magdalen Centre, The Oxford Science Park</td>
            <td>Oxford</td>
        </tr>
    </tbody>
</table>
Done.
```sql %%sql SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries AS c LEFT JOIN locations AS l on l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN')
country_name country_id country_id_1 street_address city
China CN None None None
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle
Done.
```sql %%sql SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM locations AS l LEFT JOIN countries AS c on l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN') ```
country_name country_id country_id_1 street_address city
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle
Done.
```sql %%sql SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries AS c INNER JOIN locations AS l on l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN') ```
country_name country_id country_id_1 street_address city
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle
Done.
## Aggregate functions

待续

#编程

最后一次修改于 2024-12-07