"EDirect for PubMed: Part 1: Getting PubMed Data" Sample Code
As new Insider's Guide classes are no longer being offered, this site is not currently being updated. Please refer to NCBI's E-utilities documentation for more up-to-date information.
Below you will find sample code for the examples, in-class exercises and homework presented in the first session of the “EDirect for PubMed” Insider’s Guide class. These examples are written for use with EDirect in a Unix environment. If you need help installing and setting up EDirect, please see our “Installing EDirect” page.
For more examples, please see the sample code from the other parts of “EDirect for PubMed”:
- Part 2: Extracting Data from XML
- Part 3: Formatting Results and Unix Tools
- Part 4: xtract Conditional Arguments
- Part 5: Developing and Building Scripts
The code below is lightly annotated to explain how it works, but if you are looking for more information, we suggest you review our EDirect documentation.
There are many different ways to answer the questions discussed in class. The sample code below provides some options, but by no means the only options. Feel free to modify, adapt, edit, re-use or completely discard any of the suggestions below when trying to find a solution that works best for you.
esearch
Conduct a simple search of PubMed for articles on seasonal affective disorder
esearch -db pubmed -query "seasonal affective disorder"
This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "seasonal affective disorder").
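When run on its own, esearch does not print the matching records. Instead it prints a small XML summary wrapped in ENTREZ_DIRECT tags, and that summary includes the number of hits in a Count element. If you only want that number, the sketch below pulls it out with sed. The function name esearch_count is hypothetical, and the sketch assumes Count appears on a single line, as it normally does. Part 2 introduces xtract, the EDirect tool designed for extracting values from XML, which is the more robust choice.

```bash
# Hypothetical helper: print the <Count> value from esearch's XML summary.
# Reads esearch output on standard input. Assumes <Count>...</Count> sits
# on one line (a sed shortcut, not a real XML parser).
esearch_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p'
}

# Usage (requires EDirect and network access):
#   esearch -db pubmed -query "seasonal affective disorder" | esearch_count
```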
If you want to see the query translation for your search (like you would see in the Search Details box with the web version of PubMed), you can add an additional argument to your command:
esearch -db pubmed -query "seasonal affective disorder" -log
By adding the -log argument to esearch, the command will also output the E-utilities URL and query translation for your search.
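The URL that -log reports points at the ESearch E-utility, with your query passed in its term parameter. To see roughly how a query ends up inside such a URL, here is a minimal sketch that percent-encodes the query and appends it to the ESearch endpoint. Both function names are hypothetical, the encoder handles ASCII input only, and the URL EDirect actually logs may include additional parameters or encode some characters differently.

```bash
# Hypothetical helper: percent-encode a string for a URL query parameter.
# Unreserved characters (letters, digits, . ~ _ -) pass through unchanged;
# everything else becomes %XX. ASCII input only. Uses global variables s
# and c, since POSIX sh has no "local".
urlencode() {
  s=$1
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of s
    s=${s#?}          # drop that character from s
    case $c in
      [A-Za-z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%%%02X' "'$c" ;;   # "'c" gives the character's numeric code
    esac
  done
  printf '\n'
}

# Hypothetical helper: build an ESearch URL for a PubMed query.
esearch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%s\n' \
    "$(urlencode "$1")"
}

# Usage:
#   esearch_url "seasonal affective disorder"
```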
Conduct a simple search of PubMed for articles on malaria in the journal JAMA
esearch -db pubmed -query "malaria AND jama[journal]"
This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "malaria AND jama[journal]"). Note that the search query can include Boolean operators (AND) and search field tags ([journal]) to help focus our search, just as we can in the web version of PubMed.
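Because the whole query is passed to -query as one string, you can also assemble it from parts in the shell. A minimal sketch, using a hypothetical and_join function that combines any number of terms (with or without field tags) using AND. Note the quotes around $(...) in the usage line: they keep the assembled query together as a single argument.

```bash
# Hypothetical helper: join search terms with the Boolean AND operator.
# Each argument may already carry a field tag, e.g. 'jama[journal]'.
and_join() {
  [ "$#" -gt 0 ] || return 1   # need at least one term
  query=$1
  shift
  for term in "$@"; do
    query="$query AND $term"
  done
  printf '%s\n' "$query"
}

# Usage (requires EDirect and network access):
#   esearch -db pubmed -query "$(and_join malaria 'jama[journal]')"
```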
Restrict search results by publication date
esearch -db pubmed -query "malaria AND jama[journal]" \
-datetype PDAT -mindate 2015 -maxdate 2017
The first line of code is the same as our previous example, though the “\” character at the end of the line allows us to continue our command on the next line, for easier-to-read formatting.
The second line limits the search results by publication date (-datetype PDAT), including only articles published between 2015 and 2017 (-mindate 2015 -maxdate 2017).
Conduct a PubMed search with a search string that includes quotation marks
esearch -db pubmed -query "cancer AND \"science\"[journal]"
This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "cancer AND \"science\"[journal]"). We need to “escape” the double quotation marks (") inside our search query by putting a backslash (\) before each one. This tells the shell to treat those quotation marks as ordinary characters, rather than as the special character that marks the end of the -query argument. Otherwise, the shell would read the quotation mark before science as the end of the quoted text and strip both inner quotation marks, so PubMed would receive cancer AND science[journal] instead of the query we intended.
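An alternative that avoids backslashes is to wrap the entire query in single quotes, inside which the shell treats double quotation marks as ordinary characters. The short check below needs neither EDirect nor a network connection; it simply shows that the escaped and single-quoted spellings hand exactly the same text to -query.

```bash
# The escaped form used above: backslashes keep the inner quotes literal.
escaped="cancer AND \"science\"[journal]"
# Equivalent single-quoted form: no escaping is needed inside '...'.
single='cancer AND "science"[journal]'

if [ "$escaped" = "$single" ]; then
  echo "identical query: $single"
fi

# Either form can be passed to esearch (requires EDirect and network access):
#   esearch -db pubmed -query 'cancer AND "science"[journal]'
```

Keep in mind that single quotes also stop the shell from expanding $variables, so if your query needs to include a shell variable, the escaped double-quoted form is the one to use.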
efetch
Retrieve a single PubMed record in text abstract format
efetch -db pubmed -id 25359968 -format abstract
This line of code uses the efetch command to retrieve a record from PubMed (-db pubmed). We specify that we will retrieve the record for PMID 25359968 (-id 25359968) and that we want the results in the text abstract format (-format abstract).
Retrieve multiple PubMed records in text abstract format
efetch -db pubmed -id 24102982,21171099,17150207 -format abstract
This line of code uses the efetch command to retrieve records from PubMed (-db pubmed). We specify that we will retrieve the records for PMIDs 24102982, 21171099, and 17150207 (-id 24102982,21171099,17150207) and that we want the results in the text abstract format (-format abstract).
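The -id argument takes a single comma-separated list. If your PMIDs are stored one per line instead (for example, in a text file), the standard Unix paste command can join them. A minimal sketch, with a hypothetical join_ids function and a hypothetical file name pmids.txt:

```bash
# Hypothetical helper: turn one-PMID-per-line input into the
# comma-separated list that efetch -id expects.
join_ids() {
  paste -s -d, -   # -s: join all lines; -d,: separate with commas; -: stdin
}

# Usage, assuming a hypothetical file pmids.txt with one PMID per line
# (the efetch step requires EDirect and network access):
#   efetch -db pubmed -id "$(join_ids < pmids.txt)" -format abstract
```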
Creating a data pipeline
Conduct a PubMed search and retrieve the results as a list of PMIDs
esearch -db pubmed -query "asthenopia[mh] AND nursing[sh]" | efetch -format uid
This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "asthenopia[mh] AND nursing[sh]"), and then pipes the resulting PMIDs into an efetch command (| efetch), which retrieves the PubMed records but outputs only the PMIDs (-format uid). For more information about piping data from one EDirect command to another, please review the page on Making data pipelines with the History server in our EDirect overview.
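Since -format uid prints one PMID per line, counting lines is a quick sanity check on the pipeline; you would generally expect the result to agree with the count esearch reports. A minimal sketch, using a hypothetical count_pmids function that counts only lines made up entirely of digits:

```bash
# Hypothetical helper: count the PMIDs in "-format uid" output
# (one numeric ID per line); any other lines are ignored.
count_pmids() {
  grep -c '^[0-9][0-9]*$'
}

# Usage (requires EDirect and network access):
#   esearch -db pubmed -query "asthenopia[mh] AND nursing[sh]" |
#     efetch -format uid | count_pmids
```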
In-class exercise solutions
Exercise 1: esearch
How many Spanish-language articles about diabetes are in PubMed?
Solution:
esearch -db pubmed -query "diabetes AND spanish[lang]"
This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "diabetes AND spanish[lang]"). Note that the search query can include Boolean operators (AND) and search field tags ([lang]) to help focus our search, just as we can in the web version of PubMed.
Exercise 2: esearch
How many articles were written by BH Smith between 2012 and 2017, inclusive?
Solutions:
esearch -db pubmed -query "smith bh[author]" -datetype PDAT -mindate 2012 -maxdate 2017
There are multiple possible solutions to this exercise. This solution uses the esearch command to search PubMed (-db pubmed) for our search query (-query "smith bh[author]"). Note that the search query can include search field tags ([author]) to help focus our search, just as we can in the web version of PubMed. The esearch command also limits the search results by publication date (-datetype PDAT), including only articles published between 2012 and 2017 (-mindate 2012 -maxdate 2017).
esearch -db pubmed -query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])"
The second solution is largely the same as the first. Rather than use the -datetype, -mindate, and -maxdate arguments to limit the search by publication date, this solution incorporates the date restriction into the search string itself (-query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])"), just as you would include a date restriction in a search string in the web version of PubMed.
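If you build date-restricted queries often, a small helper can generate the range clause in the same shape as this solution. A minimal sketch, using a hypothetical date_clause function that takes a field tag, a start date, and an end date. The same helper would also work with [crdt], the create-date tag used in Homework Question 2.

```bash
# Hypothetical helper: build a PubMed date-range clause for a query string,
# e.g. (2012/01/01[pdat] : 2017/12/31[pdat]).
# Arguments: field tag (pdat, crdt, ...), start date, end date.
date_clause() {
  printf '(%s[%s] : %s[%s])\n' "$2" "$1" "$3" "$1"
}

# Usage (requires EDirect and network access):
#   esearch -db pubmed \
#     -query "smith bh[author] AND $(date_clause pdat 2012/01/01 2017/12/31)"
```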
Exercise 3: efetch
Who is the first author listed on the PubMed record 26287646?
Solution:
efetch -db pubmed -id 26287646 -format abstract
This line of code uses the efetch command to retrieve a record from PubMed (-db pubmed). We specify that we will retrieve the record for PMID 26287646 (-id 26287646). The command retrieves the record in the text abstract format (-format abstract), which allows us to easily see that the first author of the article is PF Brennan. Rather than using the abstract format, we could instead use -format medline or -format xml to retrieve the record in the MEDLINE or XML formats, if we prefer.
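In MEDLINE format, each author appears on its own line beginning with the AU tag (for example, AU  - Brennan PF), listed in author order, so the first author can also be picked out mechanically. A minimal sketch with awk; the function name first_author is hypothetical, and the sketch assumes the standard MEDLINE layout of a tag padded to four characters followed by "- ".

```bash
# Hypothetical helper: print the first author from MEDLINE-format text.
# Assumes author lines start with "AU  - " (tag padded to four characters).
first_author() {
  awk '/^AU  - / { sub(/^AU  - /, ""); print; exit }'
}

# Usage (requires EDirect and network access):
#   efetch -db pubmed -id 26287646 -format medline | first_author
```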
Exercise 4: Combining Commands
How do we get a list of PMIDs for all of the articles written by BH Smith between 2012 and 2017?
Solutions:
esearch -db pubmed -query "smith bh[author]" -datetype PDAT -mindate 2012 -maxdate 2017 | \
efetch -format uid
This solution begins the same as the first solution for Exercise 2. The first line concludes by piping (|) the results of the esearch command into a command on the next line (the “\” character at the end of the line allows us to continue our command on the next line, for easier-to-read formatting).
The efetch command in the second line accepts the PMIDs piped from the previous line, and retrieves the PubMed records, but outputs only the PMIDs (-format uid).
esearch -db pubmed -query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])" | \
efetch -format uid
Similarly, this solution begins the same as the second solution for Exercise 2, and then pipes the results of the esearch command into the efetch command, which retrieves the PubMed records but outputs only the PMIDs (-format uid).
For more information about piping data from one EDirect command to another, please review the page on Making data pipelines with the History server in our EDirect overview.
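Because the two solutions express the date limit in different ways, you may want to confirm that they return the same PMIDs. One way, assuming you have saved each list to a file (the file names below are hypothetical), is to sort both lists and compare them. A minimal sketch with a hypothetical same_pmids function:

```bash
# Hypothetical helper: succeed (exit status 0) when two files contain the
# same set of PMIDs, ignoring order and duplicate lines.
same_pmids() {
  [ "$(sort -u "$1")" = "$(sort -u "$2")" ]
}

# Usage (requires EDirect and network access):
#   esearch -db pubmed -query "smith bh[author]" -datetype PDAT \
#     -mindate 2012 -maxdate 2017 | efetch -format uid > solution1.txt
#   esearch -db pubmed \
#     -query "smith bh[author] AND (2012/01/01[pdat] : 2017/12/31[pdat])" |
#     efetch -format uid > solution2.txt
#   same_pmids solution1.txt solution2.txt && echo "same results"
```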
Homework solutions
Question 1
Using EDirect, write a command to find out how many citations are in PubMed for articles about using melatonin to treat sleep disorders.
Solution:
esearch -db pubmed -query "melatonin sleep disorder"
This line of code uses the esearch command to search PubMed (-db pubmed) for our search query (-query "melatonin sleep disorder").
Question 2
How many of the PubMed citations identified in Question 1 were added to PubMed (i.e., created) between January 1, 2015 and July 1, 2017?
Solution:
There are multiple possible solutions to this question.
esearch -db pubmed -query "melatonin sleep disorder" -datetype CRDT -mindate 2015/01/01 -maxdate 2017/07/01
Both of these solutions use the esearch command to search PubMed (-db pubmed) for our search query (-query "melatonin sleep disorder"). In the first solution, the esearch command also limits the search results by the date citations were added to PubMed, using the “CRDT” date type (-datetype CRDT), including only articles created between January 1, 2015 and July 1, 2017 (-mindate 2015/01/01 -maxdate 2017/07/01).
esearch -db pubmed -query "melatonin sleep disorder AND (2015/01/01[crdt] : 2017/07/01[crdt])"
The second solution is largely the same as the first. Rather than use the -datetype, -mindate, and -maxdate arguments to limit the search by create date, this solution incorporates the date restriction into the search string itself (-query "melatonin sleep disorder AND (2015/01/01[crdt] : 2017/07/01[crdt])"), just as you would include a date restriction in a search string in the web version of PubMed.
Question 3
Write a command to retrieve the abstracts of the following PubMed records:
27240713,27027883,22468771,20121990
Solution:
efetch -db pubmed -id 27240713,27027883,22468771,20121990 -format abstract
This line of code uses the efetch command to retrieve records from PubMed (-db pubmed). We specify that we will retrieve the records for four PMIDs: 27240713, 27027883, 22468771, and 20121990 (-id 27240713,27027883,22468771,20121990). The command retrieves the records in the text abstract format (-format abstract).
Question 4
Modify your answer to Question 3 to retrieve the full XML of all four records.
Solution:
efetch -db pubmed -id 27240713,27027883,22468771,20121990 -format xml
This solution is largely the same as the solution for Question 3, but the -format argument has been changed to retrieve XML instead of the text abstract format (-format xml).
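A quick way to confirm that all four records came back is to count the record elements in the XML. PubMed journal article records are wrapped in PubmedArticle elements, so the sketch below counts their opening tags with grep. The function name count_articles is hypothetical, the sketch assumes each opening tag sits on its own line (grep -c counts lines, not matches), and it does not count book records, which use a different element. Part 2 covers xtract, the more robust way to work with this XML.

```bash
# Hypothetical helper: count PubmedArticle records in PubMed XML.
# Matches "<PubmedArticle>" or "<PubmedArticle ...>" but not
# "<PubmedArticleSet>" or closing tags; assumes one opening tag per line.
count_articles() {
  grep -c '<PubmedArticle[ >]'
}

# Usage (requires EDirect and network access):
#   efetch -db pubmed -id 27240713,27027883,22468771,20121990 -format xml |
#     count_articles
```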
Question 5
Write a series of commands that retrieves a list of PMIDs for all citations for papers written by the author with the ORCID 0000-0002-1141-6306.
Solution:
esearch -db pubmed -query "0000-0002-1141-6306[auid]" | \
efetch -format uid
This solution begins by using the esearch command to search PubMed (-db pubmed) for citations including an author identifier of “0000-0002-1141-6306” (-query "0000-0002-1141-6306[auid]"). The first line concludes by piping (|) the results of the esearch command into a command on the next line (the “\” character at the end of the line allows us to continue our command on the next line, for easier-to-read formatting).
The efetch command in the second line accepts the PMIDs piped from the previous line, and retrieves the PubMed records, but outputs only the PMIDs (-format uid).
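A mistyped ORCID simply returns no results, so it can help to check the identifier's shape before searching. A minimal sketch, using a hypothetical orcid_query function that accepts four hyphen-separated groups of four characters (the last may be X) and prints the matching [auid] query. It checks the format only and does not verify the ORCID checksum digit.

```bash
# Hypothetical helper: if the argument looks like an ORCID iD, print the
# PubMed [auid] query for it; otherwise print an error and return 1.
# Format check only; the checksum digit is not verified.
orcid_query() {
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9X])
      printf '%s[auid]\n' "$1" ;;
    *)
      echo "not an ORCID iD: $1" >&2
      return 1 ;;
  esac
}

# Usage (requires EDirect and network access); the search runs only if
# the identifier passes the format check:
#   q=$(orcid_query 0000-0002-1141-6306) &&
#     esearch -db pubmed -query "$q" | efetch -format uid
```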