xtract


This documentation reflects EDirect version 7.60, released on 11/14/2017.

We strive to keep this documentation up-to-date with the latest release. If you are looking for documentation on a more recent version of EDirect, or to find out more about new EDirect releases, please see the Release Notes of NCBI's EDirect documentation.


The xtract command selects specific data from an XML file and displays that data in a tabular format. It is highly customizable: you choose which data is output, how it is arranged, and how the table is formatted.

Because xtract is so flexible, it can take some time to understand all of the options available and how they can best be used.

Input

The xtract command accepts as input structured XML data from a variety of sources. The XML data can be:

  • piped in from an efetch -format xml command,
  • piped in from a cat file.xml command (where file.xml is any XML file), or
  • supplied via the xtract argument -input:
xtract -input file.xml

(Note that the -input argument is a more recent addition to xtract. It was added in EDirect version 5.00, in September 2016.)
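
As a rough illustration of the two local input routes, the sketch below writes a small file and reads it once through a pipe and once through -input. The file name sample.xml and its contents are invented for this example, and the sketch assumes EDirect is installed with xtract on your PATH.

```shell
# Write a tiny, made-up XML file to read from (contents are illustrative only).
cat > sample.xml <<'EOF'
<Set>
  <Record><Id>1</Id></Record>
  <Record><Id>2</Id></Record>
</Set>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Route 1: pipe the file in on standard input.
  cat sample.xml | xtract -pattern Record -element Id
  # Route 2: let xtract open the file itself (EDirect 5.00 or later).
  xtract -input sample.xml -pattern Record -element Id
else
  echo "xtract not found; install EDirect to run this example" >&2
fi
```

Both routes should produce the same table: one row per <Record>, containing its <Id>. Piping from efetch -format xml works the same way as the cat route, with efetch taking the place of cat.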

Output

Data extracted from an XML file, arranged in a tabular format.

Working with XML

To use xtract effectively, it helps to have a basic understanding of structured XML data. For a brief overview of XML, you may want to visit W3Schools’ XML Tutorial.

The xtract command has several arguments that identify particular portions of an XML document. xtract uses these arguments to select and arrange data for output.

To specify most XML elements, simply provide the name of the element. Unix and EDirect are case-sensitive, so be sure to check spelling and capitalization. The command below specifies the XML element <Author> in the -pattern argument, and the XML element <LastName> in the -element argument.

xtract -pattern Author -element LastName
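
To see what this command selects, here is a minimal sketch run against a made-up file. The file name authors.xml and the author names are assumptions for illustration, and the sketch assumes xtract is on your PATH.

```shell
# A made-up fragment with two <Author> elements (names are illustrative only).
cat > authors.xml <<'EOF'
<AuthorList>
  <Author>
    <LastName>Kim</LastName>
    <ForeName>Dana</ForeName>
  </Author>
  <Author>
    <LastName>Okafor</LastName>
    <ForeName>Chidi</ForeName>
  </Author>
</AuthorList>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Each <Author> starts a new row; -element LastName fills that row.
  xtract -input authors.xml -pattern Author -element LastName
else
  echo "xtract not found; install EDirect to run this example" >&2
fi
```

With this input the command should print Kim and Okafor on separate lines. Writing -element lastname instead would match nothing, because element names are case-sensitive.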

To specify an attribute of an XML element, provide the name of the element, followed by “@”, followed by the name of the attribute. The command below specifies the XML element <PubmedArticle> in the -pattern argument, and the attribute “Status” of the XML element <MedlineCitation> in the -element argument.

xtract -pattern PubmedArticle -element MedlineCitation@Status
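
The sketch below tries the attribute syntax on a heavily trimmed, made-up record. The file name citation.xml and the PMID value are placeholders, not a real citation, and the sketch assumes xtract is on your PATH.

```shell
# A trimmed, made-up PubMed-style record; the PMID is a placeholder.
cat > citation.xml <<'EOF'
<PubmedArticle>
  <MedlineCitation Status="MEDLINE" Owner="NLM">
    <PMID Version="1">12345678</PMID>
  </MedlineCitation>
</PubmedArticle>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Element@Attribute returns the attribute's value, not the element's text.
  xtract -input citation.xml -pattern PubmedArticle -element MedlineCitation@Status
else
  echo "xtract not found; install EDirect to run this example" >&2
fi
```

For this input the command should print MEDLINE, the value of the Status attribute.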

In some circumstances, an XML document may have multiple elements with the same name located in different parts of the XML hierarchy. To specify an XML element in a specific location in the document, use the Parent/Child construction: provide the name of the parent element, followed by a slash (/), followed by the name of the child element. The command below specifies the XML element <PubmedArticle> in the -pattern argument, and, in the -element argument, the element <Year> that is a child of the element <PubDate>.

xtract -pattern PubmedArticle -element PubDate/Year
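
The following sketch shows why the parent matters, using a trimmed, made-up record that contains <Year> in two places. The file name dates.xml and the year values are assumptions for illustration, and the sketch assumes xtract is on your PATH.

```shell
# A trimmed, made-up record with <Year> under two different parents.
cat > dates.xml <<'EOF'
<PubmedArticle>
  <MedlineCitation>
    <DateCompleted><Year>2018</Year></DateCompleted>
    <Article>
      <Journal>
        <JournalIssue>
          <PubDate><Year>2017</Year></PubDate>
        </JournalIssue>
      </Journal>
    </Article>
  </MedlineCitation>
</PubmedArticle>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Only the <Year> whose parent is <PubDate> is selected.
  xtract -input dates.xml -pattern PubmedArticle -element PubDate/Year
else
  echo "xtract not found; install EDirect to run this example" >&2
fi
```

For this input the command should print 2017. Using -element Year on its own would also pick up the 2018 from <DateCompleted>.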

Working with PubMed XML

When using xtract with PubMed XML, it is important to be familiar with the structure and contents of PubMed data. For more information on PubMed XML, please see MEDLINE/PubMed XML Data Elements.