Making data pipelines with the History server
One advantage of using EDirect to work with E-utilities in a Unix environment is Unix’s built-in ability to chain commands, taking the output of one command and using it as the input for another. As previously mentioned, Unix does this with the “|” character, which lets you “pipe” the output of one command into the next command as its input.
EDirect commands can be combined in the same way, using “|”. For example:
esearch -db pubmed -query "seasonal affective disorder" | efetch -format xml
At first glance, this line of code is simple. We execute an esearch command to search a database (PubMed) for a query (“seasonal affective disorder”). The results of the query (a list of PMIDs that match the search criteria) are then piped into the efetch command, which retrieves full XML records for each PMID.
In reality, however, things are a little more complicated. The PMIDs from the esearch are not being piped directly into the efetch. Instead, esearch and efetch, as well as several other EDirect commands, make use of the E-utilities History server.
The History server keeps track of your previous queries, just like the History function in the PubMed Advanced Search Builder. Rather than outputting a list of PMIDs, esearch saves that list of PMIDs on the History server, and outputs two pieces of information which let you retrieve that list later: a Web Environment string (which identifies your specific history, as opposed to another user’s history), and a Query Key (identifying which specific set of results you would like to retrieve).
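To make this concrete, the sketch below parses a message of the kind esearch writes to its output. It runs offline: the message is a hand-written sample, the WebEnv value and Count are made-up placeholders, and the exact set and layout of elements is an assumption that may differ between EDirect versions. The get_field helper is a hypothetical name introduced only for this illustration and uses nothing beyond standard Unix tools.

```shell
# Hypothetical sample of what an esearch command prints instead of PMIDs.
# The WebEnv value and Count are invented placeholders, not real output.
sample_message='<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <WebEnv>MCID_EXAMPLE_PLACEHOLDER</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>1234</Count>
</ENTREZ_DIRECT>'

# Print the text between <tag> and </tag> (assumes one element per line).
get_field() {
  printf '%s\n' "$sample_message" | sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p"
}

web_env=$(get_field WebEnv)      # identifies your history on the server
query_key=$(get_field QueryKey)  # identifies one result set in that history

echo "WebEnv:   $web_env"
echo "QueryKey: $query_key"
```

Note that no PMIDs appear anywhere in the message: the two values are only a pointer to the list stored on the History server.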
When you pipe the output of an esearch to an efetch, you are actually piping the Web Environment string and Query Key from the esearch to the efetch. The efetch then uses that information to retrieve the correct list of PMIDs from the History server, and uses that list of PMIDs as its input.
In most cases, you can ignore the History server, and think of “|” as sending PMIDs from one command to the next. However, the History server makes it possible to create some longer and more powerful data pipelines, so it is important to understand what is at work.
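As an illustration of a longer pipeline, the sketch below inserts an efilter step between the search and the fetch. It assumes EDirect is installed and that the machine can reach NCBI; the function name review_abstracts and the filter term are hypothetical choices made for this example, not part of EDirect. Each “|” hands a Web Environment string and Query Key to the next command, so the list of PMIDs stays on the History server until efetch finally retrieves records.

```shell
# Hypothetical helper: search PubMed, narrow the stored result set to
# review articles, then fetch abstracts for what remains.
# Requires EDirect (esearch, efilter, efetch) and network access.
review_abstracts() {
  # $1 is the PubMed query string supplied by the caller.
  esearch -db pubmed -query "$1" |
    efilter -query "review [PT]" |
    efetch -format abstract
}

# Example call (commented out because it contacts NCBI):
# review_abstracts "seasonal affective disorder" > sad_reviews.txt
```

Wrapping the pipeline in a function keeps the definition harmless to load; nothing is sent to NCBI until the function is actually called.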