This documentation reflects EDirect version 9.00, released on 6/6/2018.
We strive to keep this documentation up-to-date with the latest release. If you are looking for documentation on a more recent version of EDirect, or to find out more about new EDirect releases, please see the Release Notes of NCBI's EDirect documentation.
The xtract command lets you choose specific data from an XML file, extracts that data, and displays it in a tabular format. The xtract command is highly customizable: you can choose which data is output, how it is arranged, and how the table is formatted. Because xtract is so flexible, it can take some time to understand all of the available options and how they can best be used.
The xtract command accepts structured XML data as input from a variety of sources. The XML data can be:
- piped in from an efetch -format xml command.
- piped in from a cat file.xml command (where file.xml is any XML file).
- or supplied via the -input argument:
xtract -input file.xml
(Note that the -input argument is a more recent addition to xtract. It was added in EDirect version 5.00, in September 2016.)
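As a rough sketch of the two local input routes listed above, the script below writes a small, made-up XML file and passes it to xtract, first through a pipe and then through -input. The file path, the sample record, and the author names are hypothetical and exist only for illustration; the xtract calls run only if EDirect is installed. The -pattern and -element arguments used here are described in the sections that follow.

```shell
#!/bin/sh
# Hypothetical sample file; the path and the record contents are made up for illustration.
SAMPLE_XML="${TMPDIR:-/tmp}/xtract_sample.xml"

# Write a tiny PubMed-like document to the sample file.
cat > "$SAMPLE_XML" <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <Article>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Route 1: pipe the file contents into xtract.
  cat "$SAMPLE_XML" | xtract -pattern Author -element LastName
  # Route 2: name the file with -input (EDirect 5.00 or later).
  xtract -input "$SAMPLE_XML" -pattern Author -element LastName
else
  echo "xtract not found; install EDirect to run the extraction steps" >&2
fi
```

Both calls should produce the same table, with one row per <Author> element. Piping from efetch -format xml works the same way but needs network access, so it is left out of this sketch.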
Data extracted from an XML file, arranged in a tabular format.
To use xtract effectively, it helps to have a basic understanding of structured XML data. For a brief overview of XML, you may want to visit W3Schools’ XML Tutorial.
The xtract command has several arguments that identify particular portions of an XML document. xtract uses these arguments to select and arrange data for output.
To specify most XML elements, simply provide the name of the element. Unix and EDirect are case-sensitive, so be sure to check spelling and capitalization. The command below specifies the XML element <Author> in the -pattern argument, and the XML element <LastName> in the -element argument:
xtract -pattern Author -element LastName
To specify an attribute of an XML element, provide the name of the element, followed by “@”, followed by the name of the attribute. The command below specifies the XML element <PubmedArticle> in the -pattern argument, and the attribute “Status” of the XML element <MedlineCitation> in the -element argument:
xtract -pattern PubmedArticle -element MedlineCitation@Status
In some circumstances, an XML document may have multiple elements with the same name located in different parts of the XML hierarchy. To specify an XML element in a specific location in the document, use a slash (/) to indicate a Parent/Child construction: provide the name of the parent element, followed by “/”, followed by the name of the child element. The command below specifies the XML element <PubmedArticle> in the -pattern argument, and the element <Year> which is a child of the element <PubDate> in the -element argument:
xtract -pattern PubmedArticle -element PubDate/Year
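To see the three forms of selection side by side, the sketch below runs them against a small, made-up record that contains a <Year> element in two places. The file path and the record contents are hypothetical and heavily trimmed compared with real PubMed XML; the xtract calls run only if EDirect is installed.

```shell
#!/bin/sh
# Hypothetical, trimmed-down record; not real PubMed data.
RECORD_XML="${TMPDIR:-/tmp}/xtract_record.xml"

# <Year> appears under both <DateCompleted> and <PubDate>.
cat > "$RECORD_XML" <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <DateCompleted><Year>2017</Year></DateCompleted>
      <Article>
        <Journal>
          <JournalIssue>
            <PubDate><Year>2016</Year></PubDate>
          </JournalIssue>
        </Journal>
        <AuthorList>
          <Author><LastName>Doe</LastName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Element by name: the last name of each author.
  xtract -input "$RECORD_XML" -pattern Author -element LastName
  # Attribute: the Status attribute of <MedlineCitation>.
  xtract -input "$RECORD_XML" -pattern PubmedArticle -element MedlineCitation@Status
  # Bare name: matches <Year> wherever it occurs inside the record.
  xtract -input "$RECORD_XML" -pattern PubmedArticle -element Year
  # Parent/Child: only the <Year> directly inside <PubDate>.
  xtract -input "$RECORD_XML" -pattern PubmedArticle -element PubDate/Year
else
  echo "xtract not found; install EDirect to run the extraction steps" >&2
fi
```

With this sample, the bare Year selection should return both years, while PubDate/Year should return only 2016.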
When using xtract with PubMed XML, it is important to be familiar with the structure and contents of PubMed data. For more information on PubMed XML, please see MEDLINE/PubMed XML Data Elements.