Making data pipelines with the History server

One of the advantages of using EDirect to work with E-utilities in a Unix environment is Unix’s built in ability to combine commands together, taking the output of a command and using it as the input for a different command. As we previously mentioned, Unix accomplishes this using the “|” character, which allows you to “pipe” the output of one command into another command to be used as input.

EDirect commands can be combined together in this way, using “|”. For example:

esearch -db pubmed -query "seasonal affective disorder" | efetch -format xml

At first glance, this line of code is simple. We execute an esearch command to search a database (PubMed) for a query (“seasonal affective disorder”). The results of the query (a list of PMIDs that match the search criteria) are then piped into the efetch command, which retrieves full XML records for each PMID.

In reality, however, things are a little more complicated. The PMIDs from the esearch are not being piped directly into the efetch. Both esearch and efetch, as well as several other EDirect commands, make use of the E-utilities History server.

This History server keeps track of your previous queries, just like the History function in the PubMed Advanced Search Builder. Rather than outputting a list of PMIDs, esearch saves that list of PMIDs on the History server, and outputs two pieces of information which let you retrieve that list later: a Web Environment string (which identifies your specific history, as opposed to another user’s history), and a Query Key (identifying which specific set of results you would like to retrieve).

When you pipe the output of an esearch to an efetch, you are actually piping the Web Environment string and Query Key from the esearch to the efetch. The efetch then uses that information to retrieve the correct list of PMIDs from this History server, and uses that list of PMIDs as input.

A diagram showing an esearch-efetch pipeline, using the History server: The esearch command pipes stores a DB identifier and a list of PMIDs on the history server, and pipes the Web Environment string and Query Key to an efetch command; the efetch command uses this information to retrieve the correct DB identifier and PMIDs from the history server

In most cases, you can ignore the History server, and think of “|” as sending PMIDs from one command to the next. However, the History server makes it possible to create some longer and more powerful data pipelines, so it is important to understand what is at work.