The Insider's Guide to Accessing NLM Data

xtract: Storing and retrieving information with variables


As new Insider's Guide classes are no longer being offered, this site is not currently being updated. Please refer to NCBI's E-utilities documentation for more up-to-date information.


When using the xtract command, you can only use the -element argument to display data that is available in the -pattern or -block you are currently exploring. For example, consider the following xtract command:

xtract -pattern MeshHeading -element LastName Initials

This command will not output anything, because the <MeshHeading> element does not contain descendant <LastName> or <Initials> elements. Since xtract cannot find a <LastName> or <Initials> within -pattern MeshHeading, it produces no output. Similarly, consider the xtract command:

xtract -pattern PubmedArticle -block Author -element PMID LastName Initials

This command will output a new row for each PubMed record, and will output a list of the authors’ last names and initials for each record, but will not output the PMID, because <PMID> is not a descendant element of <Author>. While exploring a -block Author, the -element argument can only display elements and attributes that are contained within that -block.
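You can see this scoping rule without querying PubMed by piping a small hand-written record into xtract. The sketch below assumes EDirect's xtract is installed and on your PATH. The record, its PMID, and the author names are made-up placeholders, not real PubMed data.

```shell
#!/bin/sh
# Made-up, minimal record in PubMed XML structure (hypothetical PMID and authors).
sample='<PubmedArticle><MedlineCitation><PMID>11111111</PMID><Article><AuthorList><Author><LastName>Smith</LastName><Initials>BH</Initials></Author><Author><LastName>Jones</LastName><Initials>A</Initials></Author></AuthorList></Article></MedlineCitation></PubmedArticle>'

# Inside -block Author, only descendants of <Author> are visible,
# so PMID is skipped and only the names are printed.
in_block=$(printf '%s\n' "$sample" |
  xtract -pattern PubmedArticle -block Author -element PMID LastName Initials)
printf '%s\n' "$in_block"
```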

However, you can work around this limitation by storing a value in a variable outside of a -block, then recalling that value from the variable inside the -block. This gives you much greater flexibility in your output.

Declaring variables

An xtract variable name can be any combination of digits and capital letters. To store a value in a variable, use the variable’s name preceded by a dash, in place of an -element argument:

xtract -pattern PubmedArticle -VAR1 MedlineCitation/PMID

The above command will store the contents of the <PMID> element (which is the direct child of the <MedlineCitation> element) into the new variable “VAR1”.

You can store multiple elements into the same variable, for convenient retrieval later:

xtract -pattern PubmedArticle -sep "/" -DATE PubDate/Year,PubDate/Month

The above command will store both the <Year> and <Month> child elements of <PubDate> into the variable “DATE”, separated by a “/” (thanks to the -sep "/" argument).
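One way to check what a variable holds is to print it back out with an -element argument. This sketch assumes xtract is installed and that -sep "/" is applied to the stored Year and Month exactly as described above; the record and its date are invented for illustration.

```shell
#!/bin/sh
# Made-up record with a publication year and month (hypothetical values).
sample='<PubmedArticle><MedlineCitation><PMID>11111111</PMID><Article><Journal><JournalIssue><PubDate><Year>2016</Year><Month>Mar</Month></PubDate></JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>'

# Store Year and Month, joined by "/", in DATE; then print the PMID and DATE.
row=$(printf '%s\n' "$sample" |
  xtract -pattern PubmedArticle -sep "/" -DATE PubDate/Year,PubDate/Month \
    -element MedlineCitation/PMID "&DATE")
printf '%s\n' "$row"
```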

You can also store a literal string of characters in a variable, instead of the value of an element or attribute:

xtract -pattern PubmedArticle -LABEL1 "Author: "

The above command will store the string “Author: ” in the variable “LABEL1”. This technique gives you additional control over output formatting, for example by placing a label in front of retrieved values.

Retrieving data from a variable

To display the contents of an xtract variable with an -element argument, use the variable’s name preceded by an ampersand, enclosed in quotes:

xtract -pattern PubmedArticle -VAR1 MedlineCitation/PMID \
-block Author -element LastName Initials "&VAR1"

The above command will create a new row for each PubMed record, storing the PMID into the variable “VAR1”. The command then uses -block Author to loop through each <Author> element on the record. For each <Author>, the command will output the author’s last name and initials, followed by the contents of the variable “VAR1” (which, in this case, is the PMID for the record in question).
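The same command can be tried offline on a record with two authors, so you can confirm that the PMID is repeated after each author. The sketch assumes xtract is installed; the PMID and author names are hypothetical.

```shell
#!/bin/sh
# Made-up record with two authors (hypothetical PMID and names).
sample='<PubmedArticle><MedlineCitation><PMID>11111111</PMID><Article><AuthorList><Author><LastName>Smith</LastName><Initials>BH</Initials></Author><Author><LastName>Jones</LastName><Initials>A</Initials></Author></AuthorList></Article></MedlineCitation></PubmedArticle>'

# Save the PMID in VAR1 at the record level, then append it after each author.
rows=$(printf '%s\n' "$sample" |
  xtract -pattern PubmedArticle -VAR1 MedlineCitation/PMID \
    -block Author -element LastName Initials "&VAR1")
printf '%s\n' "$rows"
```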

Example with variables

The best way to understand the value of variables is to look at an EDirect script that makes effective use of them.

The following code is designed to output affiliation data for an author, to help analyze the different ways an author’s affiliation is represented in PubMed. This could be useful for author disambiguation (especially in the case of authors with common names), or could provide data to analyze an author’s output as a function of the institution with which they are affiliated.

esearch -db pubmed -query "smith bh[Author]" \
-datetype PDAT -mindate 2014 -maxdate 2017 | \
efetch -format xml | \
xtract -pattern PubmedArticle -VAR1 MedlineCitation/PMID \
-block Author -if LastName -equals Smith \
-and Initials -equals BH -and Affiliation \
-element "&VAR1" Affiliation

The above code searches for all articles by author BH Smith published in a given date range, and outputs the PMID of each record together with the affiliation data listed for BH Smith on that record. Affiliation data for all other authors is suppressed, as are all records where BH Smith is an author but has no listed affiliation data.
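To check the filtering logic before running it against PubMed, you can feed the xtract stage alone a couple of hand-made records: one where BH Smith has an affiliation (next to a co-author who also has one), and one where BH Smith has none. This sketch assumes xtract is installed; all PMIDs, names, and affiliations are invented.

```shell
#!/bin/sh
# Two made-up records (hypothetical PMIDs, names, and affiliations):
# in the first, BH Smith has an affiliation; in the second, BH Smith has none.
sample='<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID>11111111</PMID><Article><AuthorList>
<Author><LastName>Smith</LastName><Initials>BH</Initials><AffiliationInfo><Affiliation>Example University</Affiliation></AffiliationInfo></Author>
<Author><LastName>Jones</LastName><Initials>A</Initials><AffiliationInfo><Affiliation>Other Institute</Affiliation></AffiliationInfo></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>22222222</PMID><Article><AuthorList>
<Author><LastName>Smith</LastName><Initials>BH</Initials></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Same xtract stage as the pipeline above, without esearch/efetch.
result=$(printf '%s\n' "$sample" |
  xtract -pattern PubmedArticle -VAR1 MedlineCitation/PMID \
    -block Author -if LastName -equals Smith \
    -and Initials -equals BH -and Affiliation \
    -element "&VAR1" Affiliation)
printf '%s\n' "$result"
```

Because the input is local, you can adjust the -if and -and conditions and rerun instantly; the same xtract arguments then work unchanged at the end of the esearch and efetch pipeline.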

esearch -db pubmed -query "smith bh[Author]" \
-datetype PDAT -mindate 2014 -maxdate 2017 | \

The first two lines of the above code find records for articles authored by BH Smith and published between 2014 and 2017.

efetch -format xml | \

The third line retrieves all matching records in PubMed XML.

xtract -pattern PubmedArticle -VAR1 MedlineCitation/PMID \

The remaining lines of code, starting with the fourth, form an xtract command which extracts specific data from the PubMed XML retrieved on the previous line. The fourth line uses -pattern PubmedArticle to create a new row for each PubMed record, but does not immediately output any data. Instead, it saves the PMID for each record to a variable “VAR1” (-VAR1 MedlineCitation/PMID).

-block Author -if LastName -equals Smith \
-and Initials -equals BH -and Affiliation \

The next two lines use -block Author to check each author on the record, displaying information only for authors with a last name of “Smith” (-if LastName -equals Smith), initials of “BH” (-and Initials -equals BH), and an <Affiliation> element (-and Affiliation). If any of these conditions is not true (e.g. the author’s name doesn’t match, or the author has no <Affiliation> element), the author is skipped. Based on our search, each record should have at least one author named BH Smith, but not all of those authors will have affiliation data, because <Affiliation> is an optional element in PubMed XML. Together, these conditions limit the output to the author BH Smith on each record, and only when that author has affiliation data listed.

-element "&VAR1" Affiliation

The last line indicates what data should be output for each row: the PMID of the record (recalled from the variable “VAR1”) and the affiliation for the author BH Smith on that record. If BH Smith has no affiliation data on a given record, the conditions in the previous lines prevent anything from being output for that record.

Our end result will be a list of citations where BH Smith was an author, and where BH Smith’s affiliation data was listed. For each citation, the PMID and BH Smith’s affiliation data will be listed.
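Since the goal of this example is to see the different ways an author’s affiliation is written, one simple follow-up is to count how often each distinct affiliation string occurs. This sketch uses standard Unix tools (cut, sort, uniq) rather than an EDirect feature, and feeds them invented rows in place of real xtract output.

```shell
#!/bin/sh
# Made-up two-column output (PMID <TAB> affiliation) standing in for the
# result of the xtract command above; real rows would come from PubMed.
rows=$(printf '%s\t%s\n' \
  11111111 'Dept of Medicine, Example University' \
  22222222 'Dept of Medicine, Example University' \
  33333333 'Example University, Dept of Medicine')

# Drop the PMID column, then count identical affiliation strings,
# most frequent first.
counts=$(printf '%s\n' "$rows" | cut -f 2- | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"
```

To apply this to real data, append `| cut -f 2- | sort | uniq -c | sort -rn` to the end of the full pipeline. Variants that differ only in word order or punctuation will show up as separate lines, which is exactly the kind of inconsistency this analysis is meant to surface.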

Last Reviewed: July 30, 2021