The Insider's Guide to Accessing NLM Data

xtract: Exploration arguments


As new Insider's Guide classes are no longer being offered, this site is not currently being updated. Please refer to NCBI's E-utilities documentation for more up-to-date information.


When using xtract, there may be times when you want to group multiple elements together in your output. This can be especially useful to link multiple child elements of the same parent together. The xtract command includes a series of arguments that help with this. These arguments, including -group, -block, and -subset, are referred to as Exploration arguments, because they help you explore subsections of your XML document.

When to use Exploration arguments

For example, if you are trying to write an xtract command that outputs article PMIDs and author names (including last names and initials), with each row representing a different PubMed article, and with a “|” separating the columns, you might use the following command:

xtract -pattern PubmedArticle -tab "|" -sep "|" -element MedlineCitation/PMID LastName Initials

This command would work if each article only had one author. However, if an article has more than one author, the output may not be what you expect:

PMID1|LastName1.1|LastName1.2|LastName1.3|Initials1.1|Initials1.2|Initials1.3
PMID2|LastName2.1|LastName2.2|Initials2.1|Initials2.2
[...]

Because -element creates one column per argument and fills it with every instance of that element or attribute in the -pattern, xtract will create two columns after the PMID: one with every <LastName> element in the -pattern, and one with every <Initials> element in the -pattern. (In this example, -tab and -sep are both set to "|", so the values within a column are separated by the same character as the columns themselves.) If you wanted to group together each individual <LastName> element with the individual <Initials> element that shares a parent <Author> element, you could use an Exploration argument.
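To see why the values end up grouped by element rather than by author, the behavior described above can be mimicked in a few lines of Python. This is only an illustrative sketch using the standard library's xml.etree.ElementTree, not how xtract is implemented. The sample XML, the author names, and the function name flat_rows are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, trimmed to the elements used in this example.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><AuthorList>
        <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
        <Author><LastName>Jones</LastName><Initials>B</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><AuthorList>
        <Author><LastName>Lee</LastName><Initials>C</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def flat_rows(xml_text, tab="|", sep="|"):
    """Mimic: -pattern PubmedArticle -element MedlineCitation/PMID LastName Initials"""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        # One column per -element argument; each column collects every
        # matching element found anywhere inside the pattern.
        columns = [
            [article.findtext("MedlineCitation/PMID")],
            [e.text for e in article.iter("LastName")],
            [e.text for e in article.iter("Initials")],
        ]
        # Values within a column are joined with sep, columns with tab.
        rows.append(tab.join(sep.join(values) for values in columns if values))
    return rows


for row in flat_rows(SAMPLE_XML):
    print(row)
# 1|Smith|Jones|JA|B
# 2|Lee|C
```

In the first row, both last names come before both sets of initials, which matches the ungrouped output shown above.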

How to use Exploration arguments

Continuing the previous example, we could modify our command to connect each individual <LastName> element with its corresponding <Initials> element by using the -block argument:

xtract -pattern PubmedArticle -tab "|" -sep "|" -element MedlineCitation/PMID -block Author -element LastName Initials

As with most xtract commands, this command scans through the XML input from the beginning. When it encounters an occurrence of the element specified in the -pattern argument, xtract will start a new row of output. The command will then scan through the -pattern until it encounters the first occurrence of the XML element specified in the -block argument (in this case, <Author>).

The command will then scan through the first instance of the Author -block, looking for every instance of the first element or attribute specified in the -element argument. xtract will populate the next column of the row (after the PMID column) with each instance of that element or attribute encountered within that first Author -block.

When xtract reaches the end of the first instance of the Author -block, it goes back to the beginning of that first Author -block and begins looking for every instance of the second element or attribute specified in the -element argument (if there is one). This process repeats, creating new columns for each -element until all of the elements specified in -element have been returned.

The command will then look for the next occurrence of the element specified in the -block argument. If another occurrence of the -block element is found, the command repeats the above process, retrieving occurrences of each element within the second -block, before moving on to look for a new -block.

The result of this process is an output that resembles the following:

PMID1|LastName1.1|Initials1.1|LastName1.2|Initials1.2|LastName1.3|Initials1.3
PMID2|LastName2.1|Initials2.1|LastName2.2|Initials2.2
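The scanning process described above amounts to a nested loop: an outer loop over each -pattern, an inner loop over each -block, and within each block one pass per -element argument. The following Python sketch mimics that logic with the standard library's xml.etree.ElementTree. It is an approximation for illustration, not xtract's actual implementation, and the sample XML and the function name block_rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, trimmed to the elements used in this example.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><AuthorList>
        <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
        <Author><LastName>Jones</LastName><Initials>B</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><AuthorList>
        <Author><LastName>Lee</LastName><Initials>C</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def block_rows(xml_text, tab="|", sep="|"):
    """Mimic: -element MedlineCitation/PMID -block Author -element LastName Initials"""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):  # -pattern
        columns = [article.findtext("MedlineCitation/PMID")]
        for author in article.iter("Author"):                      # -block
            # Within one block, make one pass per -element argument.
            for name in ("LastName", "Initials"):
                values = [e.text for e in author.iter(name)]
                if values:
                    columns.append(sep.join(values))
        rows.append(tab.join(columns))
    return rows


for row in block_rows(SAMPLE_XML):
    print(row)
# 1|Smith|JA|Jones|B
# 2|Lee|C
```

Because the inner loop finishes one author before moving to the next, each last name is immediately followed by its own initials.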

To make this command even more effective, you can combine multiple elements or attributes in a single column by separating them with a comma in the -element argument. By modifying the command slightly (note that -sep is now a space):

xtract -pattern PubmedArticle -tab "|" -sep " " -element MedlineCitation/PMID -block Author -element LastName,Initials

the output will change to:

PMID1|LastName1.1 Initials1.1|LastName1.2 Initials1.2|LastName1.3 Initials1.3
PMID2|LastName2.1 Initials2.1|LastName2.2 Initials2.2
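The effect of the comma can be mimicked in Python as well. In this sketch, the values named in the comma-separated list are joined with the -sep string to form a single column per author, while -tab still separates the columns. As before, this is an illustrative approximation rather than xtract's implementation, and the sample XML and the function name grouped_rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, trimmed to the elements used in this example.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><AuthorList>
        <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
        <Author><LastName>Jones</LastName><Initials>B</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><AuthorList>
        <Author><LastName>Lee</LastName><Initials>C</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def grouped_rows(xml_text, tab="|", sep=" "):
    """Mimic: -element MedlineCitation/PMID -block Author -element LastName,Initials"""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):  # -pattern
        columns = [article.findtext("MedlineCitation/PMID")]
        for author in article.iter("Author"):                      # -block
            # LastName,Initials: both values share one column, joined by sep.
            values = [author.findtext(name) for name in ("LastName", "Initials")]
            columns.append(sep.join(v for v in values if v))
        rows.append(tab.join(columns))
    return rows


for row in grouped_rows(SAMPLE_XML):
    print(row)
# 1|Smith JA|Jones B
# 2|Lee C
```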

It is important to note that, when using an -element argument inside a -block (as demonstrated above), only elements and attributes that appear within that -block element can be retrieved. For example, if you used -block Author, you could not use -element PMID within that -block, as the <Author> element does not contain any descendant elements named <PMID>. If you need to output information within a -block that cannot be found in that -block in the input data, you will need to store that information in a variable.
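The reasoning behind this restriction, and the variable workaround, can be illustrated with a short Python sketch: a lookup that starts from an <Author> element cannot see the PMID, so the value is read once at the article level, kept in a variable, and reused inside the author loop. This shows the idea only and does not use xtract's variable syntax. The record layout (one string per author) is chosen for clarity and is not meant to reproduce xtract's exact output. The sample XML and the function name author_records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, trimmed to the elements used in this example.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><AuthorList>
        <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
        <Author><LastName>Jones</LastName><Initials>B</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><AuthorList>
        <Author><LastName>Lee</LastName><Initials>C</Initials></Author>
      </AuthorList></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def author_records(xml_text):
    """Pair each author with the PMID of the article that contains it."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        # Read the PMID at the article level and keep it in a variable.
        pmid = article.findtext("MedlineCitation/PMID")
        for author in article.iter("Author"):
            # Searching from the author finds no PMID: it is not a descendant.
            pmid_inside_block = author.findtext(".//PMID")  # None
            last = author.findtext("LastName", default="")
            initials = author.findtext("Initials", default="")
            # The stored variable supplies what the block itself cannot.
            records.append(f"{pmid}|{last} {initials}")
    return records


for record in author_records(SAMPLE_XML):
    print(record)
# 1|Smith JA
# 1|Jones B
# 2|Lee C
```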

The Exploration hierarchy

All of the previous examples use only the -block argument. However, for advanced uses, there is a multi-level hierarchy of Exploration arguments, allowing you to explore sections, subsections, and sub-subsections of an XML document in the same xtract command. Technically, -pattern is itself an Exploration argument, as it subdivides the input XML into smaller sections and connects the elements within each section (by placing them in the same row). From largest to smallest, the Exploration arguments are:

-pattern
    -division
        -group
            -branch
                -block
                    -section
                        -subset
                            -unit

For most use cases, only -pattern and -block will be necessary. To explore deeply nested XML, you may wish to use -group and -subset as well. The remaining Exploration arguments are available for unusual cases. For more about using Exploration arguments, please visit NCBI’s EDirect documentation page, “Entrez Direct: E-utilities on the UNIX Command Line”.

Last Reviewed: July 30, 2021