SQL 简易教程

郝鸿涛 / 2024-10-05

说实话,平常我用不到 SQL,所以一直没有去学。因为要面试,在准备 SQL 方面的知识,就快速学习了一下。我的感觉是和 Pandas 没差多少。

我用的是苹果电脑,Windows 的会不太一样。

我选择用 Sqlite ,因为这个最简单,不需要服务器。

首先检查 sqlite3 是否安装:

sqlite3 --version

我假设你已经安装了 Anaconda。接着,我们建一个虚拟环境

conda create --name sql
conda activate sql
conda install pandas jupyter ipython-sql 
pip install SQLAlchemy==1.4.46 # 因为最新版本的 sqlalchemy 和 ipython-sql 有冲突

然后创建一个文件夹

mkdir sql-learn
cd sql-learn 

下载两个文档:

curl -o sqlite.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite.txt
curl -o sqlite-data.txt https://www.sqltutorial.org/wp-content/uploads/2020/04/sqlite-data.txt

如果你怕安全性有问题,我建议你把两个网址打开,确认是单纯的 txt 文档。

然后:

sqlite3 tutorial.db

接着

.read sqlite.txt

继续

.read sqlite-data.txt

检查一下

.tables

结果应该是

sqlite> .tables
countries    dependents   jobs         regions    
departments  employees    locations  
SELECT * FROM countries LIMIT 5;

结果应该是

sqlite> SELECT * FROM countries LIMIT 5;
AR|Argentina|2
AU|Australia|3
BE|Belgium|1
BR|Brazil|2
CA|Canada|2

如果没问题,就可以继续开始了。

我在 VS Code 里用 Jupyter Notebook。建一个 ipynb 文档,然后下面的都是在那里面操作。

本文严重参考 https://www.sqltutorial.org/ 上的内容。

首先

%load_ext sql
%sql sqlite:///tutorial.db

但是我这个博客里因为数据在别的地方,所以第二行会不一样:

%load_ext sql
%sql sqlite:///../static/files/cnblog/sql-learn/tutorial.db
'Connected: @../static/files/cnblog/sql-learn/tutorial.db'

All tables #

%%sql
SELECT name FROM sqlite_master WHERE type='table';
name
regions
sqlite_sequence
countries
locations
departments
jobs
employees
dependents
* sqlite:///../static/files/cnblog/sql-learn/tutorial.db Done.
## Query
%%sql
select * from locations limit 5
location_idstreet_addresspostal_codecitystate_provincecountry_id
14002014 Jabberwocky Rd26192SouthlakeTexasUS
15002011 Interiors Blvd99236South San FranciscoCaliforniaUS
17002004 Charade Rd98199SeattleWashingtonUS
1800147 Spadina AveM5V 2L7TorontoOntarioCA
24008204 Arthur StNoneLondonNoneUK
Done.
```sql %%sql select first_name, last_name from employees limit 10 ```
first_namelast_name
StevenKing
NeenaKochhar
LexDe Haan
AlexanderHunold
BruceErnst
DavidAustin
ValliPataballa
DianaLorentz
NancyGreenberg
DanielFaviet
Done.
```sql %%sql select employee_id, salary from employees where salary < 3000; ```
employee_idsalary
1182866.5
1192756.25
1262976.75
Done.
```python # %%sql # delete from employees where hire_date < '1990-01-01'; ```
%%sql 
select first_name, last_name, salary, salary * 1.5 
from employees limit 10;
first_namelast_namesalarysalary * 1.5
StevenKing24000.036000.0
NeenaKochhar17000.025500.0
LexDe Haan17000.025500.0
AlexanderHunold9000.013500.0
BruceErnst6000.09000.0
DavidAustin4800.07200.0
ValliPataballa4800.07200.0
DianaLorentz4200.06300.0
NancyGreenberg12000.018000.0
DanielFaviet9000.013500.0
Done.
```sql %%sql select first_name, last_name, salary, salary * 1.5 as new_salary from employees limit 10; ```
first_namelast_namesalarynew_salary
StevenKing24000.036000.0
NeenaKochhar17000.025500.0
LexDe Haan17000.025500.0
AlexanderHunold9000.013500.0
BruceErnst6000.09000.0
DavidAustin4800.07200.0
ValliPataballa4800.07200.0
DianaLorentz4200.06300.0
NancyGreenberg12000.018000.0
DanielFaviet9000.013500.0
Done.
## Sort
%%sql 
select 
    employee_id,
    first_name,
    last_name,
    hire_date,
    salary 
from 
    employees
limit 
    10;
employee_idfirst_namelast_namehire_datesalary
100StevenKing1987-06-1724000.0
101NeenaKochhar1989-09-2117000.0
102LexDe Haan1993-01-1317000.0
103AlexanderHunold1990-01-039000.0
104BruceErnst1991-05-216000.0
105DavidAustin1997-06-254800.0
106ValliPataballa1998-02-054800.0
107DianaLorentz1999-02-074200.0
108NancyGreenberg1994-08-1712000.0
109DanielFaviet1994-08-169000.0
Done.
```sql %%sql select employee_id, first_name, last_name, hire_date, salary from employees order by first_name limit 10; ```
employee_idfirst_namelast_namehire_datesalary
121AdamFripp1997-04-108200.0
103AlexanderHunold1990-01-039000.0
115AlexanderKhoo1995-05-183100.0
193BritneyEverett1997-03-033900.0
104BruceErnst1991-05-216000.0
179CharlesJohnson2000-01-046200.0
109DanielFaviet1994-08-169000.0
105DavidAustin1997-06-254800.0
114DenRaphaely1994-12-0711000.0
107DianaLorentz1999-02-074200.0
Done.
```sql %%sql select employee_id, first_name, last_name, hire_date, salary from employees order by first_name, last_name desc limit 10; ```
employee_idfirst_namelast_namehire_datesalary
121AdamFripp1997-04-108200.0
115AlexanderKhoo1995-05-183100.0
103AlexanderHunold1990-01-039000.0
193BritneyEverett1997-03-033900.0
104BruceErnst1991-05-216000.0
179CharlesJohnson2000-01-046200.0
109DanielFaviet1994-08-169000.0
105DavidAustin1997-06-254800.0
114DenRaphaely1994-12-0711000.0
107DianaLorentz1999-02-074200.0
Done.
## Filter data ### DISTINCT
%%sql 
select 
    salary 
from 
    employees 
order by 
    salary desc 
limit 10;
salary
24000.0
17000.0
17000.0
14000.0
13500.0
13000.0
12000.0
12000.0
11000.0
10000.0
Done.
```sql %%sql select distinct salary from employees order by salary desc limit 10; ```
salary
24000.0
17000.0
14000.0
13500.0
13000.0
12000.0
11000.0
10000.0
9000.0
8600.0
Done.
```sql %%sql select job_id, salary from employees order by job_id, salary desc limit 10; ```
job_idsalary
18300.0
212000.0
34400.0
424000.0
517000.0
517000.0
69000.0
68200.0
67800.0
67700.0
Done.
```sql %%sql select distinct job_id, salary from employees order by job_id, salary desc limit 10; ```
job_idsalary
18300.0
212000.0
34400.0
424000.0
517000.0
69000.0
68200.0
67800.0
67700.0
66900.0
Done.
`DISTINCT` only keeps one `NULL`.
%%sql 
select distinct
    phone_number
from 
    employees 
order by 
    phone_number
limit 10;
phone_number
None
515.123.4444
515.123.4567
515.123.4568
515.123.4569
515.123.5555
515.123.7777
515.123.8080
515.123.8181
515.123.8888
Done.
### LIMIT & OFFSET

LIMIT means to show many many rows. OFFSET means starting from which row. First row is 0. So OFFSET 3 means starting from the fourth row.

%%sql 
select 
    employee_id,
    first_name,
    last_name
from    
    employees
order by
    first_name
limit 10;
employee_idfirst_namelast_name
121AdamFripp
103AlexanderHunold
115AlexanderKhoo
193BritneyEverett
104BruceErnst
179CharlesJohnson
109DanielFaviet
105DavidAustin
114DenRaphaely
107DianaLorentz
Done.
```sql %%sql select employee_id, first_name, last_name from employees order by first_name limit 5 offset 3; ```
employee_idfirst_namelast_name
193BritneyEverett
104BruceErnst
179CharlesJohnson
109DanielFaviet
105DavidAustin
Done.
### Where
%%sql 
select  
    employee_id,
    first_name,
    last_name,
    salary
from 
    employees 
where
    salary > 14000 
order by 
    salary desc 
employee_idfirst_namelast_namesalary
100StevenKing24000.0
101NeenaKochhar17000.0
102LexDe Haan17000.0
Done.
```sql %%sql select employee_id, first_name, last_name, department_id from employees where department_id = 5 order by first_name; ```
employee_idfirst_namelast_namedepartment_id
121AdamFripp5
193BritneyEverett5
126IreneMikkilineni5
120MatthewWeiss5
122PayamKaufling5
192SarahBell5
123ShantaVollman5
Done.
```sql %%sql select employee_id, first_name, last_name from employees where last_name = "Chen" ```
employee_idfirst_namelast_name
110JohnChen
Done.
### Comparison oprators

Logical operators #

AND #

%%sql 
select first_name, last_name, salary 
from employees 
where salary > 5000 AND salary < 7000 
order by salary desc;
first_namelast_namesalary
LuisPopp6900.0
ShantaVollman6500.0
SusanMavris6500.0
CharlesJohnson6200.0
BruceErnst6000.0
PatFay6000.0
Done.
#### OR
%%sql 
select first_name, last_name, salary 
from employees 
where salary = 7000 OR salary = 8000; 
first_namelast_namesalary
MatthewWeiss8000.0
KimberelyGrant7000.0
Done.
#### IS NULL
%%sql 
select first_name, last_name, phone_number 
from employees 
where phone_number IS NULL 
order by first_name, last_name;
first_namelast_namephone_number
CharlesJohnsonNone
JackLivingstonNone
JohnRussellNone
JonathonTaylorNone
KarenPartnersNone
KimberelyGrantNone
Done.
#### BETWEEN
%%sql 
select first_name, last_name, salary 
from employees 
where salary BETWEEN 9000 and 12000 
order by salary desc;
first_namelast_namesalary
NancyGreenberg12000.0
ShelleyHiggins12000.0
DenRaphaely11000.0
HermannBaer10000.0
AlexanderHunold9000.0
DanielFaviet9000.0
Done.
#### IN
%%sql 
select first_name, last_name, department_id 
from employees 
where department_id IN (6, 8, 9)
order by department_id;
first_namelast_namedepartment_id
AlexanderHunold6
BruceErnst6
DavidAustin6
ValliPataballa6
DianaLorentz6
JohnRussell8
KarenPartners8
JonathonTaylor8
JackLivingston8
KimberelyGrant8
CharlesJohnson8
StevenKing9
NeenaKochhar9
LexDe Haan9
Done.
#### LIKE
%%sql 
select employee_id, first_name, last_name 
from employees 
where first_name LIKE '_h%'
order by first_name;
employee_idfirst_namelast_name
179CharlesJohnson
123ShantaVollman
205ShelleyHiggins
116ShelliBaida
Done.
`_` represents a single character.

% represents 0, 1, or multiple characters.

ALL, ANY #

SQLITE does not support ALL or ANY. But there are alternatives.

For example

SELECT * FROM employees
WHERE salary >= ALL (
    SELECT salary FROM employees WHERE department_id = 8
);

can be done via

SELECT * FROM employees
WHERE salary >= (
    SELECT MAX(salary) FROM employees WHERE department_id = 8
)

And

SELECT * FROM employees
WHERE salary >= ANY (
    SELECT salary FROM employees WHERE department_id = 8
);

can be achieved through

SELECT * FROM employees
WHERE salary >= (
    SELECT MIN(salary) FROM employees WHERE department_id = 8
);

or

SELECT * FROM employees
WHERE salary IN (
    SELECT salary FROM employees WHERE department_id = 8
);

EXISTS #

For example, show names of those who have dependents:

%%sql 
select first_name, last_name 
from employees e 
where EXISTS(
    select * from dependents d 
    where d.employee_id = e.employee_id
)
limit 10;
first_namelast_name
StevenKing
NeenaKochhar
LexDe Haan
AlexanderHunold
BruceErnst
DavidAustin
ValliPataballa
DianaLorentz
NancyGreenberg
DanielFaviet
Done.
在我看来,上面的这行就说明了 sql 相比于 csv 的优势。通过 `employee_id` 这一栏,我们把两个数据连在一起,放在了一个数据库。

Conditional Expression #

%%sql 
SELECT first_name, last_name,
CASE 
    when salary < 3000 then 'low'
    when salary >= 3000 and salary <= 5000 then 'average'
    ELSE 'high'
END AS evaluation 
FROM employees
LIMIT 10;
first_namelast_nameevaluation
StevenKinghigh
NeenaKochharhigh
LexDe Haanhigh
AlexanderHunoldhigh
BruceErnsthigh
DavidAustinaverage
ValliPataballaaverage
DianaLorentzaverage
NancyGreenberghigh
DanielFaviethigh
Done.
```sql %%sql SELECT first_name, last_name, phone_number, CASE WHEN phone_number IS NULL THEN 'No phone number' ELSE phone_number END AS final_phone_number FROM employees where phone_number IS NULL LIMIT 10 ```
first_namelast_namephone_numberfinal_phone_number
JohnRussellNoneNo phone number
KarenPartnersNoneNo phone number
JonathonTaylorNoneNo phone number
JackLivingstonNoneNo phone number
KimberelyGrantNoneNo phone number
CharlesJohnsonNoneNo phone number
Done.
## Join multiple tables

Alias #

%%sql 

SELECT first_name, last_name, salary * 1.1 AS new_salary 
FROM employees 
ORDER BY new_salary DESC
LIMIT 10;
first_namelast_namenew_salary
StevenKing26400.000000000004
NeenaKochhar18700.0
LexDe Haan18700.0
JohnRussell15400.000000000002
KarenPartners14850.000000000002
MichaelHartstein14300.000000000002
NancyGreenberg13200.000000000002
ShelleyHiggins13200.000000000002
DenRaphaely12100.000000000002
HermannBaer11000.0
Done.
```sql %%sql

SELECT e.first_name, e.last_name FROM employees AS e LIMIT 10;


<table>
    <thead>
        <tr>
            <th>first_name</th>
            <th>last_name</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Steven</td>
            <td>King</td>
        </tr>
        <tr>
            <td>Neena</td>
            <td>Kochhar</td>
        </tr>
        <tr>
            <td>Lex</td>
            <td>De Haan</td>
        </tr>
        <tr>
            <td>Alexander</td>
            <td>Hunold</td>
        </tr>
        <tr>
            <td>Bruce</td>
            <td>Ernst</td>
        </tr>
        <tr>
            <td>David</td>
            <td>Austin</td>
        </tr>
        <tr>
            <td>Valli</td>
            <td>Pataballa</td>
        </tr>
        <tr>
            <td>Diana</td>
            <td>Lorentz</td>
        </tr>
        <tr>
            <td>Nancy</td>
            <td>Greenberg</td>
        </tr>
        <tr>
            <td>Daniel</td>
            <td>Faviet</td>
        </tr>
    </tbody>
</table>
Done.
### Inner Join 取共有的。 ![](https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-INNER-JOIN.png) 图片来源: https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-INNER-JOIN.png 我们来看一个例子: ![](https://www.sqltutorial.org/wp-content/uploads/2016/03/emp_dept_tables.png) 图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/emp_dept_tables.png ```sql %%sql SELECT department_id, department_name FROM departments WHERE department_id IN (1, 2, 3)
department_iddepartment_name
1Administration
2Marketing
3Purchasing
Done.
```sql %%sql SELECT first_name, last_name, department_id FROM employees WHERE department_id IN (1, 2, 3) ORDER BY department_id ```
first_namelast_namedepartment_id
JenniferWhalen1
MichaelHartstein2
PatFay2
DenRaphaely3
AlexanderKhoo3
ShelliBaida3
SigalTobias3
GuyHimuro3
KarenColmenares3
Done.
```sql %%sql

SELECT e.first_name, e.last_name, e.department_id, d.department_id, d.department_name FROM employees AS e INNER JOIN departments AS d ON d.department_id = e.department_id WHERE e.department_id IN (1, 2, 3)


<table>
    <thead>
        <tr>
            <th>first_name</th>
            <th>last_name</th>
            <th>department_id</th>
            <th>department_id_1</th>
            <th>department_name</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Jennifer</td>
            <td>Whalen</td>
            <td>1</td>
            <td>1</td>
            <td>Administration</td>
        </tr>
        <tr>
            <td>Michael</td>
            <td>Hartstein</td>
            <td>2</td>
            <td>2</td>
            <td>Marketing</td>
        </tr>
        <tr>
            <td>Pat</td>
            <td>Fay</td>
            <td>2</td>
            <td>2</td>
            <td>Marketing</td>
        </tr>
        <tr>
            <td>Den</td>
            <td>Raphaely</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Alexander</td>
            <td>Khoo</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Shelli</td>
            <td>Baida</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Sigal</td>
            <td>Tobias</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Guy</td>
            <td>Himuro</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
        <tr>
            <td>Karen</td>
            <td>Colmenares</td>
            <td>3</td>
            <td>3</td>
            <td>Purchasing</td>
        </tr>
    </tbody>
</table>
Done.
### LEFT JOIN ![](https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-LEFT-JOIN.png) 图片来源:https://www.sqltutorial.org/wp-content/uploads/2016/03/SQL-LEFT-JOIN.png ```sql %%sql SELECT country_id, country_name FROM countries WHERE country_id IN ('US', 'UK', 'CN')
country_idcountry_name
CNChina
UKUnited Kingdom
USUnited States of America
Done.
```sql %%sql

SELECT country_id, street_address, city FROM locations WHERE country_id IN (‘US’, ‘UK’, ‘CN’)


<table>
    <thead>
        <tr>
            <th>country_id</th>
            <th>street_address</th>
            <th>city</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>US</td>
            <td>2014 Jabberwocky Rd</td>
            <td>Southlake</td>
        </tr>
        <tr>
            <td>US</td>
            <td>2011 Interiors Blvd</td>
            <td>South San Francisco</td>
        </tr>
        <tr>
            <td>US</td>
            <td>2004 Charade Rd</td>
            <td>Seattle</td>
        </tr>
        <tr>
            <td>UK</td>
            <td>8204 Arthur St</td>
            <td>London</td>
        </tr>
        <tr>
            <td>UK</td>
            <td>Magdalen Centre, The Oxford Science Park</td>
            <td>Oxford</td>
        </tr>
    </tbody>
</table>
Done.
```sql %%sql SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries AS c LEFT JOIN locations AS l on l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN')
country_namecountry_idcountry_id_1street_addresscity
ChinaCNNoneNoneNone
United KingdomUKUK8204 Arthur StLondon
United KingdomUKUKMagdalen Centre, The Oxford Science ParkOxford
United States of AmericaUSUS2014 Jabberwocky RdSouthlake
United States of AmericaUSUS2011 Interiors BlvdSouth San Francisco
United States of AmericaUSUS2004 Charade RdSeattle
Done.
```sql %%sql SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM locations AS l LEFT JOIN countries AS c on l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN') ```
country_namecountry_idcountry_id_1street_addresscity
United KingdomUKUK8204 Arthur StLondon
United KingdomUKUKMagdalen Centre, The Oxford Science ParkOxford
United States of AmericaUSUS2014 Jabberwocky RdSouthlake
United States of AmericaUSUS2011 Interiors BlvdSouth San Francisco
United States of AmericaUSUS2004 Charade RdSeattle
Done.
```sql %%sql SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries AS c INNER JOIN locations AS l on l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN') ```
country_namecountry_idcountry_id_1street_addresscity
United KingdomUKUK8204 Arthur StLondon
United KingdomUKUKMagdalen Centre, The Oxford Science ParkOxford
United States of AmericaUSUS2014 Jabberwocky RdSouthlake
United States of AmericaUSUS2011 Interiors BlvdSouth San Francisco
United States of AmericaUSUS2004 Charade RdSeattle
Done.
## Aggregate functions

待续

#编程

最后一次修改于 2024-12-07