xtract: Creating rows and columns


This documentation reflects EDirect version 7.60, released on 11/14/2017.

We strive to keep this documentation up-to-date with the latest release. If you are looking for documentation on a more recent version of EDirect, or to find out more about new EDirect releases, please see the Release Notes of NCBI's EDirect documentation.


The xtract command outputs data in tabular format according to your specifications. At the most basic level, you need to tell xtract when to start a new row, how many columns to include, and what data to put in each column.

Creating rows

The -pattern argument specifies when to start a new row. You provide -pattern with the name of an XML element (e.g. -pattern PubmedArticle). The xtract command scans the XML input from the beginning. Each time it encounters an occurrence of the element specified by -pattern, xtract starts a new row of output. All of the data in that row comes from descendants (child elements, children of children, etc.) of that occurrence. When xtract reaches the end of the element, it ends the row and continues scanning for the next occurrence of the -pattern element.
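To make the row rule concrete, here is a toy model in Python using the standard-library xml.etree.ElementTree. It is not how xtract is implemented, and it parses the whole input at once purely for simplicity. The helper name pattern_rows and the trimmed sample XML are hypothetical. Each returned subtree stands for one output row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed input with two records.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>111</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>222</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def pattern_rows(xml_text, pattern):
    """Return the subtree of each `pattern` occurrence, in document order.

    Each subtree stands for one output row: whatever that row holds must
    come from inside it (toy model, not xtract's implementation).
    """
    root = ET.fromstring(xml_text)
    return list(root.iter(pattern))


rows = pattern_rows(SAMPLE, "PubmedArticle")
print(len(rows))                                  # 2: one row per PubmedArticle
print([row.findtext(".//PMID") for row in rows])  # ['111', '222']
```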

Creating a column

To create a column that contains data from an XML element or attribute, specify it in the -element argument (see the xtract overview page for information about specifying elements and attributes).

Once xtract has encountered an occurrence of the -pattern element (see above), it scans the contents of that element for every instance of the element or attribute specified in the -element argument, and populates the column with each instance it finds.

For example, the command:

xtract -pattern PubmedArticle -element ArticleTitle

will create an output table with a new row for each PubMed record (-pattern PubmedArticle) in the XML input. The table will have a single column, which will contain each record’s article title (-element ArticleTitle):

ArticleTitle1
ArticleTitle2
ArticleTitle3
[...]
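The single-column case can be sketched with the same kind of toy Python model: one list of values per -pattern occurrence, gathered from that occurrence's descendants. The helper names column_values and xtract_one_column and the sample titles are hypothetical, and the model ignores attributes. As a modeling choice, it takes all text inside each matching element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed input with two records.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><Article><ArticleTitle>First title</ArticleTitle></Article></PubmedArticle>
  <PubmedArticle><Article><ArticleTitle>Second title</ArticleTitle></Article></PubmedArticle>
</PubmedArticleSet>"""


def column_values(row, name):
    """Text of every descendant named `name` inside one row, in document order."""
    return ["".join(el.itertext()).strip() for el in row.iter(name)]


def xtract_one_column(xml_text, pattern, name):
    """One list of values per `pattern` occurrence (toy model, not xtract itself)."""
    root = ET.fromstring(xml_text)
    return [column_values(row, name) for row in root.iter(pattern)]


for values in xtract_one_column(SAMPLE, "PubmedArticle", "ArticleTitle"):
    print(" ".join(values))  # prints "First title", then "Second title"
```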

A column may contain multiple values if the element or attribute specified in the -element argument occurs more than once within the -pattern element. For example, the command:

xtract -pattern PubmedArticle -element Author/LastName

will create an output table with a new row for each PubMed record (-pattern PubmedArticle) in the XML input. The table will again have a single column, but the column will contain the last name of each author on the record (-element Author/LastName; for more information on Parent/Child construction, see the overview page):

Author/LastName1.1 Author/LastName1.2 Author/LastName1.3
Author/LastName2.1 Author/LastName2.2
Author/LastName3.1
Author/LastName4.1 Author/LastName4.2 Author/LastName4.3 Author/LastName4.4
[...]
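The following toy Python model shows repeated values collecting into one column, together with a simple reading of the Parent/Child form. It assumes that "Parent/Child" means "an element named Child whose immediate parent is named Parent". It does not model deeper paths or attributes; see the overview page for xtract's exact rules. Repeated values are joined with a space only to mirror the illustrative output above. In xtract, the separator within a column is governed by -sep (see Formatting arguments). The helper names and the sample record, including its Investigator entry, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the first also has a LastName whose parent is not Author.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <AuthorList>
      <Author><LastName>Smith</LastName><Initials>J</Initials></Author>
      <Author><LastName>Jones</LastName><Initials>A</Initials></Author>
    </AuthorList>
    <InvestigatorList>
      <Investigator><LastName>Brown</LastName></Investigator>
    </InvestigatorList>
  </PubmedArticle>
  <PubmedArticle>
    <AuthorList><Author><LastName>Lee</LastName></Author></AuthorList>
  </PubmedArticle>
</PubmedArticleSet>"""


def matches(row, spec):
    """Yield elements inside `row` matching "Child" or "Parent/Child".

    Model assumption: "Parent/Child" requires Parent to be the immediate parent.
    """
    if "/" in spec:
        parent_name, child_name = spec.split("/", 1)
        for parent in row.iter(parent_name):
            for child in parent:
                if child.tag == child_name:
                    yield child
    else:
        yield from row.iter(spec)


def xtract_column(xml_text, pattern, spec, sep=" "):
    """One output line per `pattern`; repeated matches share a single column."""
    root = ET.fromstring(xml_text)
    return [sep.join("".join(el.itertext()).strip() for el in matches(row, spec))
            for row in root.iter(pattern)]


print(xtract_column(SAMPLE, "PubmedArticle", "Author/LastName"))  # ['Smith Jones', 'Lee']
print(xtract_column(SAMPLE, "PubmedArticle", "LastName"))         # ['Smith Jones Brown', 'Lee']
```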

Creating multiple columns

A single -element argument can create multiple columns. To do so, follow -element with several element or attribute names, separated by spaces:

xtract -pattern PubmedArticle -element Volume Year

Once xtract has encountered an occurrence of the -pattern element (see above), it scans the contents of that element for every instance of the first element or attribute specified in the -element argument, and populates the first column with each instance it finds.

When xtract reaches the end of the -pattern element, it returns to the beginning of that element and scans it again, this time for every instance of the second element or attribute specified in the -element argument. These values go into a new, second column.

This process repeats, creating a new column for each item in -element, until every item has been processed. At that point the row ends and xtract begins looking for the next occurrence of the -pattern element.

By default, columns are separated by a tab character (commonly written as “\t”), but this separator can be changed with the -tab argument (see Formatting arguments for more information).
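The one-pass-per-column behavior described above has a visible consequence: column order follows the order of names given to -element, not the order in which the elements appear in the XML. The toy Python model below makes a separate pass over the -pattern subtree for each requested name. The helper name xtract_columns, the tab and sep parameters, and the sample record (deliberately written with Year before Volume) are hypothetical. The parameters are loosely modeled on -tab and -sep.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in which Year appears before Volume in the XML.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <JournalIssue><PubDate><Year>2017</Year></PubDate><Volume>12</Volume></JournalIssue>
  </PubmedArticle>
</PubmedArticleSet>"""


def xtract_columns(xml_text, pattern, names, tab="\t", sep=" "):
    """One line per `pattern`; one column per name, in the order given.

    Each column is filled by its own full pass over the pattern's subtree
    (toy model of the process described above, not xtract's code).
    """
    root = ET.fromstring(xml_text)
    lines = []
    for row in root.iter(pattern):
        columns = []
        for name in names:  # a fresh pass over `row` for every requested name
            values = ["".join(el.itertext()).strip() for el in row.iter(name)]
            columns.append(sep.join(values))
        lines.append(tab.join(columns))
    return lines


print(xtract_columns(SAMPLE, "PubmedArticle", ["Volume", "Year"]))  # ['12\t2017']
```

Swapping the names to ["Year", "Volume"] swaps the columns, even though the XML itself is unchanged.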

For example, the command:

xtract -pattern PubmedArticle -element MedlineCitation/PMID Journal/ISOAbbreviation ArticleTitle

will create an output table with a new row for each PubMed record (-pattern PubmedArticle) in the XML input. The table will have three columns: one for the record’s PMID (-element MedlineCitation/PMID), one for the title abbreviation of the record’s journal (Journal/ISOAbbreviation), and one for the record’s article title (ArticleTitle):

PMID1   ISOAbbreviation1  ArticleTitle1
PMID2   ISOAbbreviation2  ArticleTitle2
PMID3   ISOAbbreviation3  ArticleTitle3
[...]
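The Parent/Child qualifier in MedlineCitation/PMID restricts the first column to a PMID directly under MedlineCitation. A bare PMID would also pick up any other PMID elements nested deeper in the record. The toy Python model below shows this difference under a stated assumption: "Parent/Child" means an immediate parent, which may not match every detail of xtract's rules. The sample record is hypothetical and heavily trimmed. It places a second PMID inside a CommentsCorrections element. The helper names are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed record with a second PMID nested in CommentsCorrections.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <Article>
        <Journal><ISOAbbreviation>J Example</ISOAbbreviation></Journal>
        <ArticleTitle>An example title</ArticleTitle>
      </Article>
      <CommentsCorrectionsList>
        <CommentsCorrections><PMID>999</PMID></CommentsCorrections>
      </CommentsCorrectionsList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def matches(row, spec):
    """Elements named by `spec`; "Parent/Child" = immediate parent (model assumption)."""
    if "/" not in spec:
        return list(row.iter(spec))
    parent_name, child_name = spec.split("/", 1)
    return [c for p in row.iter(parent_name) for c in p if c.tag == child_name]


def xtract_table(xml_text, pattern, specs, tab="\t", sep=" "):
    """One line per `pattern`, one column per spec (toy model, not xtract itself)."""
    root = ET.fromstring(xml_text)
    return [tab.join(sep.join("".join(e.itertext()).strip() for e in matches(row, s))
                     for s in specs)
            for row in root.iter(pattern)]


print(xtract_table(SAMPLE, "PubmedArticle",
                   ["MedlineCitation/PMID", "Journal/ISOAbbreviation", "ArticleTitle"]))
# ['111\tJ Example\tAn example title']
print(xtract_table(SAMPLE, "PubmedArticle", ["PMID"]))  # ['111 999']
```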

Putting multiple elements or attributes in a single column

You can group multiple elements or attributes together in the same output column, which can be useful when grouping subsections of an XML document with the Exploration arguments. To do this, separate the elements or attributes with a comma instead of a space.

Columns are still separated by a tab (unless the -tab argument changes this). Multiple elements or attributes within the same column are separated by the separator defined in the -sep argument. For more information on -tab and -sep, see Formatting arguments.
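The difference between comma-grouped and space-separated names can be sketched with a toy Python model. A comma inside an item keeps its values in one column, joined by a sep value modeled on -sep. Separate items become separate columns, joined by a tab value modeled on -tab. Within a comma group, the model appends values name by name; this ordering is an assumption, so check Formatting arguments for xtract's exact behavior. The helper names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed author list.
SAMPLE = """<AuthorList>
  <Author><LastName>Smith</LastName><Initials>J</Initials></Author>
  <Author><LastName>Jones</LastName><Initials>A</Initials></Author>
</AuthorList>"""


def text_of(el):
    """All text inside an element, stripped."""
    return "".join(el.itertext()).strip()


def xtract_grouped(xml_text, pattern, specs, tab="\t", sep=" "):
    """`specs` mirrors the -element list: names joined by a comma share one
    column (values joined by `sep`); separate items are separate columns
    (joined by `tab`). Toy model, not xtract's implementation."""
    root = ET.fromstring(xml_text)
    lines = []
    for row in root.iter(pattern):
        columns = []
        for item in specs:
            values = []
            for name in item.split(","):  # model assumption: name by name
                values.extend(text_of(el) for el in row.iter(name))
            columns.append(sep.join(values))
        lines.append(tab.join(columns))
    return lines


# Loosely: -pattern Author, -element LastName,Initials, with a space as -sep.
print(xtract_grouped(SAMPLE, "Author", ["LastName,Initials"], sep=" "))  # ['Smith J', 'Jones A']
# Loosely: -pattern Author, -element LastName Initials (two columns).
print(xtract_grouped(SAMPLE, "Author", ["LastName", "Initials"]))        # ['Smith\tJ', 'Jones\tA']
```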

Creating advanced xtract tables

Note that the explanation above describes how xtract builds simple tables. Some advanced xtract arguments, such as -block, -if and -unless, alter this process. For more about these arguments, see Exploration Arguments and Conditional Arguments.