How to Get WOS ID from DOI

Hongtao Hao / 2022-01-11


Many times, we need data from Web of Science, but we don’t know the WOS ID. If you know the paper DOI, then it’s easy to solve.

First of all, I recommend you to read these two posts by John Kitchin:

  1. Accessing web of science entry, citing and related articles from a doi in emacs
  2. Getting a WOS Accession number from a DOI

As John noted, if the paper’s DOI is 10.1021/jp047349j, for example, you can directly access the data of Web of Science regarding this paper via this link: http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:doi/10.1021/jp047349j. And you can access the information about papers that have cited this paper via this link: http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info%3Adoi%2F10.1021/jp047349j&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Asch_svc&svc.citing=yes

You can use this pattern for all other papers with different DOIs.

If you have to get the corresponding WOS ID, you can use this Python script I created:

# loading pacakages

import pandas as pd
import csv
import random
import requests
import re
import time
import numpy as np

# Suppose I have these DOIS and I want WOS ID
dois = ['10.1109/TVCG.2006.143',
 '10.1109/VISUAL.1997.663902',
 '10.1109/TVCG.2006.120',
 '10.1109/TVCG.2006.160',
 '10.1109/INFVIS.2002.1173156',
 '10.1109/VISUAL.1996.568127',
 '10.1109/TVCG.2008.137',
 '10.1109/VISUAL.1996.568113',
 '10.1109/TVCG.2009.122',
 '10.1109/VISUAL.1999.809894']

# initialize a list of dictionaries
doi_wos_dict_list = []

def get_wos_id_from_doi(doi):
    """obtain wos id based on paper doi
    
    Arguments:
       doi: paper doi
    
    Returns:
      The updated doi_wos_dict_list
    """
    url = 'http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:doi/' + doi
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
    }
    response = requests.get(url=url, headers=headers)
    wos_url = response.history[-1].url
    wos_id_list = re.findall(r'(?<=2FWOS%3A)(.*)(?=%3F)', wos_url)
    # In case that the response does not contain WOS ID:
    if wos_id_list:
        wos_id = wos_id_list[0]
    else:
        wos_id = np.NaN
    doi_wos_dict = {
        'DOI': doi,
        'WOS ID': wos_id
    }
    doi_wos_dict_list.append(doi_wos_dict)

# For each doi in the list, run the function 
for doi in dois:
    get_wos_id_from_doi(doi)
    time.sleep(2+random.uniform(0, 2))
    print(f'{dois.index(doi)} is done')

# initiate a dataframe 
doi_wos_df_initiate = pd.DataFrame(columns=['DOI', 'WOS ID'])

def build_df_from_dict_list(df, dict_list):
    """build df from a list of dictionaries
    
    Arguments:
       df: an empty df you just initiated
       
       dict_list: a list of dictionaries containing data you want to form a df
    
    Returns:
      The updated df
    """
    for i in dict_list:
        df_1 = pd.DataFrame([i])
        df = df.append(df_1, ignore_index=True)
    return df

doi_wos_df = build_df_from_dict_list(doi_wos_df_initiate, doi_wos_dict_list)

doi_wos_df will contain the results.

After you know the WOS ID, for example,000270778900093, you can access this paper’s data via this link: https://www.webofscience.com/wos/woscc/full-record/WOS:000270778900093

Last modified on 2022-05-18