"EDirect for PubMed: Part 1: Getting PubMed Data" Sample Code

Below you will find sample code for the examples, in-class exercises and homework presented in the first session of the “EDirect for PubMed” Insider’s Guide class. These examples are written for use with EDirect in a Unix environment. If you need help installing and setting up EDirect, please see our “Installing EDirect” page.

For more examples, please see the sample code from the other parts of “EDirect for PubMed”.

The code below is lightly annotated to explain how it works, but if you are looking for more information, we suggest you review our EDirect documentation.

There are many different ways to answer the questions discussed in class. The sample code below provides some options, but they are by no means the only ones. Feel free to modify, adapt, edit, re-use or completely discard any of the suggestions below when trying to find a solution that works best for you.

esearch

Conduct a simple search of PubMed for articles on seasonal affective disorder

esearch -db pubmed -query "seasonal affective disorder"

This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "seasonal affective disorder").

Conduct a simple search of PubMed for articles on malaria in the journal JAMA

esearch -db pubmed -query "malaria AND jama[journal]"

This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "malaria AND jama[journal]"). Note that the search query can include Boolean operators (AND) and search field tags ([journal]) to help focus our search, just as we can in the web version of PubMed.

Restrict search results by publication date

esearch -db pubmed -query "malaria AND jama[journal]" \
-datetype PDAT -mindate 2015 -maxdate 2017

The first line of code is the same as our previous example, though the “\” character at the end of the line allows us to continue our command on the next line, for easier-to-read formatting.

The second line limits the search results by publication date (-datetype PDAT), including only articles published between 2015 and 2017 (-mindate 2015 -maxdate 2017).
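If you would like to convince yourself that the “\” line continuation does not change the command, the following minimal sketch may help. It assumes a POSIX-style shell and uses echo as a stand-in for actually running the command, so nothing is sent to PubMed. The variable names one_line and two_lines are our own and are used only for illustration.

```shell
# "echo" stands in for running the command, so no search is sent to PubMed.
one_line=$(echo esearch -db pubmed -query "malaria AND jama[journal]" -datetype PDAT -mindate 2015 -maxdate 2017)

# The same command, split across two lines with a trailing backslash.
two_lines=$(echo esearch -db pubmed -query "malaria AND jama[journal]" \
  -datetype PDAT -mindate 2015 -maxdate 2017)

# Compare what the shell saw in each case.
if [ "$one_line" = "$two_lines" ]; then
  echo "The shell sees the same command either way."
fi
```

Note that the “\” must be the very last character on the line. A space after it would prevent the shell from joining the two lines.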

Conduct a PubMed search with a search string that includes quotation marks

esearch -db pubmed -query "cancer AND \"science\"[journal]"

This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "cancer AND \"science\"[journal]"). We need to “escape” the double quotation marks (") in our search query by putting a backslash (\) before each of them. This tells the Unix shell to treat the quotation marks as ordinary characters, and not as special characters that mark the end of the -query argument. Otherwise, the double quotation mark before the word “science” would be interpreted as marking the end of the search query, and the rest of the query would not be searched as intended.
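To see what the query looks like after the shell has processed the escaped quotation marks, we can print it instead of searching with it. The sketch below assumes a POSIX-style shell. It also shows one possible alternative, wrapping the whole query in single quotation marks, which is standard shell behavior rather than anything specific to EDirect. The variable names are our own.

```shell
# Double-quoted string with escaped inner quotation marks, as in the example above.
escaped="cancer AND \"science\"[journal]"

# Alternative: single quotes, inside which double quotation marks need no escaping.
single_quoted='cancer AND "science"[journal]'

# Both lines print: cancer AND "science"[journal]
printf '%s\n' "$escaped"
printf '%s\n' "$single_quoted"
```

Either string could then be passed to esearch with -query "$escaped" or -query "$single_quoted". One trade-off to keep in mind is that shell variables are not expanded inside single quotation marks.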

efetch

Retrieve a single PubMed record in text abstract format

efetch -db pubmed -id 25359968 -format abstract

This line of code uses the efetch command to retrieve a record from PubMed (-db pubmed). We specify that we will retrieve the record for PMID 25359968 (-id 25359968) and that we want the results in the text abstract format (-format abstract).

Retrieve multiple PubMed records in text abstract format

efetch -db pubmed -id 24102982,21171099,17150207 -format abstract

This line of code uses the efetch command to retrieve records from PubMed (-db pubmed). We specify that we will retrieve the records for three PMIDs: 24102982, 21171099, and 17150207 (-id 24102982,21171099,17150207), separated by commas with no spaces, and that we want the results in the text abstract format (-format abstract).
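If your PMIDs are stored one per line in a text file, one possible way to build the comma-separated list for the -id argument is with the standard Unix paste utility. This is only a sketch: the temporary file and the variable names pmid_file and id_list are assumptions made for illustration, and the efetch line is left as a comment so the example runs without contacting PubMed.

```shell
# Create a small example file with one PMID per line (hypothetical input file).
pmid_file=$(mktemp)
printf '%s\n' 24102982 21171099 17150207 > "$pmid_file"

# Join the lines into a single comma-separated string.
id_list=$(paste -s -d, "$pmid_file")
echo "$id_list"   # prints: 24102982,21171099,17150207

# Clean up the example file.
rm -f "$pmid_file"

# The list could then be passed to efetch, for example:
# efetch -db pubmed -id "$id_list" -format abstract
```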

Creating a data pipeline

Conduct a PubMed search and retrieve the results as a list of PMIDs

esearch -db pubmed -query "asthenopia[mh] AND nursing[sh]" | efetch -format uid

This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "asthenopia[mh] AND nursing[sh]"), and then pipes the resulting PMIDs into an efetch command (| efetch), which retrieves the PubMed records, but outputs only the PMIDs (-format uid). For more information about piping data from one EDirect command to another, please review the page on Making data pipelines with the History server in our EDirect overview.
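Because -format uid prints one PMID per line, the output of a pipeline like this is easy to save to a file and count with standard Unix tools. The sketch below uses a hypothetical stand-in function, sample_uid_output, in place of the real pipeline so that it can be run without EDirect or a network connection. The three PMIDs it prints are placeholders, not actual results of the search above.

```shell
# Hypothetical stand-in for: esearch -db pubmed -query "..." | efetch -format uid
# It prints placeholder PMIDs, one per line, in the same shape as the real output.
sample_uid_output() {
  printf '%s\n' 24102982 21171099 17150207
}

# Save the list to a file (a temporary file here, for illustration).
pmid_file=$(mktemp)
sample_uid_output > "$pmid_file"

# Count the lines, which equals the number of PMIDs.
pmid_count=$(wc -l < "$pmid_file" | tr -d ' ')
echo "Saved $pmid_count PMIDs"   # prints: Saved 3 PMIDs

rm -f "$pmid_file"
```

To use this with real data, you would replace the call to sample_uid_output with your own esearch and efetch pipeline.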

In-class exercise solutions

Exercise 1: esearch

How many Spanish-language articles about diabetes are in PubMed?

Solution:

esearch -db pubmed -query "diabetes AND spanish[lang]"

This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "diabetes AND spanish[lang]"). Note that the search query can include Boolean operators (AND) and search field tags ([lang]) to help focus our search, just as we can in the web version of PubMed.

Exercise 2: esearch

How many articles were written by BH Smith between 2012 and 2017, inclusive?

Solutions:

esearch -db pubmed -query "smith bh[author]" -datetype PDAT -mindate 2012 -maxdate 2017

There are multiple possible solutions to this exercise. This solution uses the esearch command to search PubMed (-db pubmed) for our search query (-query "smith bh[author]"). Note that the search query can include search field tags ([author]) to help focus our search, just as we can in the web version of PubMed. The esearch command also limits the search results by publication date (-datetype PDAT), including only articles published between 2012 and 2017 (-mindate 2012 -maxdate 2017).

esearch -db pubmed -query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])"

The second solution is largely the same as the first. Rather than use the -datetype, -mindate, and -maxdate arguments to limit the search by publication date, this solution incorporates the date restriction into the search string itself (-query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])"), just as you would include a date restriction in a search string in the web version of PubMed.
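If you expect to reuse a search with different authors or date ranges, one option is to assemble the search string from shell variables before handing it to esearch. This is a minimal sketch, assuming a POSIX-style shell. The variable names author, start_date, end_date, and query are our own, and the esearch line is left as a comment so the example runs without contacting PubMed.

```shell
# Pieces of the search, kept in variables so they are easy to change.
author="smith bh"
start_date="2012/01/01"
end_date="2017/12/31"

# Assemble the full search string. The braces keep the variable names
# separate from the field tags that follow them.
query="${author}[author] AND (${start_date}[pdat] : ${end_date}[pdat])"

printf '%s\n' "$query"
# prints: smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])

# The assembled string could then be searched with:
# esearch -db pubmed -query "$query"
```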

Exercise 3: efetch

Who is the first author listed on the PubMed record 26287646?

Solution:

efetch -db pubmed -id 26287646 -format abstract

This line of code uses the efetch command to retrieve a record from PubMed (-db pubmed). We specify that we will retrieve the record for PMID 26287646 (-id 26287646). The command retrieves the record in the text abstract format (-format abstract), which allows us to easily see that the first author of the article is PF Brennan. Rather than using the abstract format, we could instead use -format medline or -format xml to retrieve the record in the MEDLINE or XML formats, if we prefer.

Exercise 4: Combining Commands

How do we get a list of PMIDs for all of the articles written by BH Smith between 2012 and 2017?

Solutions:

esearch -db pubmed -query "smith bh[author]" -datetype PDAT -mindate 2012 -maxdate 2017 | \
efetch -format uid

This solution begins the same as the first solution for Exercise 2. The first line concludes by piping (|) the results of the esearch command into a command on the next line (the “\” character at the end of the line allows us to continue our command on the next line, for easier-to-read formatting).

The efetch command in the second line accepts the PMIDs piped from the previous line, and retrieves the PubMed records, but outputs only the PMIDs (-format uid).

esearch -db pubmed -query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])" | \
efetch -format uid

Similarly, this solution begins the same as the second solution for Exercise 2, and then pipes the results of the esearch into the efetch, which retrieves the PubMed records, but outputs only the PMIDs (-format uid).

For more information about piping data from one EDirect command to another, please review the page on Making data pipelines with the History server in our EDirect overview.
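A missing parenthesis is easy to overlook inside a long search string. As a rough sanity check before running a search, the sketch below counts the opening and closing parentheses in a query using standard Unix tools. It is not part of EDirect, it only compares the two counts (it does not check nesting order), and the variable names are our own.

```shell
# The search string we want to check.
query='smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])'

# Keep only the "(" characters and count them, then do the same for ")".
open_count=$(printf '%s' "$query" | tr -cd '(' | wc -c | tr -d ' ')
close_count=$(printf '%s' "$query" | tr -cd ')' | wc -c | tr -d ' ')

if [ "$open_count" -eq "$close_count" ]; then
  echo "Parentheses are balanced"
else
  echo "Check the parentheses in the query" >&2
fi
```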

Homework solutions

Question 1

Using EDirect, write a command to find out how many citations are in PubMed for articles about using melatonin to treat sleep disorders.

Solution:

esearch -db pubmed -query "melatonin sleep disorder"

This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "melatonin sleep disorder").

Question 2

How many of the PubMed citations identified in Question 1 were added to PubMed (i.e. created) between January 1, 2015 and July 1, 2017?

Solution:

There are multiple possible solutions to this question.

esearch -db pubmed -query "melatonin sleep disorder" -datetype CRDT -mindate 2015/01/01 -maxdate 2017/07/01

Both of these solutions use the esearch command to search PubMed (-db pubmed) for our search query (-query "melatonin sleep disorder"). In the first solution, the esearch command also limits the search results by the date citations were added to PubMed, using the “CRDT” date type (-datetype CRDT), including only articles created between January 1, 2015 and July 1, 2017 (-mindate 2015/01/01 -maxdate 2017/07/01).

esearch -db pubmed -query "melatonin sleep disorder AND (2015/01/01[crdt] : 2017/07/01[crdt])"

The second solution is largely the same as the first. Rather than use the -datetype, -mindate, and -maxdate arguments to limit the search by create date, this solution incorporates the date restriction into the search string itself (-query "melatonin sleep disorder AND (2015/01/01[crdt] : 2017/07/01[crdt])"), just as you would include a date restriction in a search string in the web version of PubMed.

Question 3

Write a command to retrieve the abstracts of the following PubMed records:

27240713,27027883,22468771,20121990

Solution:

efetch -db pubmed -id 27240713,27027883,22468771,20121990 -format abstract

This line of code uses the efetch command to retrieve records from PubMed (-db pubmed). We specify that we will retrieve the records for four PMIDs: 27240713, 27027883, 22468771, and 20121990 (-id 27240713,27027883,22468771,20121990). The command retrieves the records in the text abstract format (-format abstract).

Question 4

Modify your answer to Question 3 to retrieve the full XML of all four records.

Solution:

efetch -db pubmed -id 27240713,27027883,22468771,20121990 -format xml

This solution is largely the same as the solution for Question 3, but the -format argument has been changed to retrieve XML instead of the text abstract format (-format xml).

Question 5

Write a series of commands that retrieves a list of PMIDs for all citations for papers written by the author with the ORCID 0000-0002-1141-6306.

Solution:

esearch -db pubmed -query "0000-0002-1141-6306[auid]" | \
efetch -format uid

This solution begins by using the esearch command to search PubMed (-db pubmed) for citations including an author identifier of “0000-0002-1141-6306” (-query "0000-0002-1141-6306[auid]"). The first line concludes by piping (|) the results of the esearch command into a command on the next line (the “\” character at the end of the line allows us to continue our command on the next line, for easier-to-read formatting).

The efetch command in the second line accepts the PMIDs piped from the previous line, and retrieves the PubMed records, but outputs only the PMIDs (-format uid).