1 00:00:03,533 --> 00:00:08,233 Welcome back to EDirect for PubMed. Today is Part Four: xtract Conditional Arguments. 2 00:00:08,233 --> 00:00:14,433 My name is Mike Davidson. I'm a librarian here at the National Library of Medicine, and I'm here with Rex Robison 3 00:00:14,433 --> 00:00:24,766 and also with Sarah Helson, who presented part two, last week at this time. As you know, this is 4 00:00:24,766 --> 00:00:26,666 part four out of five parts, so 5 00:00:26,666 --> 00:00:27,966 we're over halfway through. 6 00:00:27,966 --> 00:00:29,166 In the first session, if you 7 00:00:29,166 --> 00:00:31,366 recall last Monday, Pete talked 8 00:00:31,366 --> 00:00:33,766 to you about getting PubMed data 9 00:00:33,766 --> 00:00:35,866 using esearch and efetch. 10 00:00:35,866 --> 00:00:38,699 In the second session Sarah 11 00:00:38,700 --> 00:00:41,300 showed you how to use xtract and 12 00:00:41,300 --> 00:00:44,200 arrange the data in a custom 13 00:00:44,200 --> 00:00:46,100 tabular format. 14 00:00:46,100 --> 00:00:48,200 Last time Kate told you about 15 00:00:48,200 --> 00:00:50,000 more xtract, including how to 16 00:00:50,000 --> 00:00:52,500 customize separators and group 17 00:00:52,500 --> 00:00:53,900 elements together using the -block 18 00:00:53,900 --> 00:00:55,600 argument, and she also covered 19 00:00:55,600 --> 00:00:58,300 some Unix tools for working with 20 00:00:58,300 --> 00:00:58,600 files. 21 00:00:58,600 --> 00:01:01,400 Today we're going to further our 22 00:01:01,400 --> 00:01:02,600 discussion of xtract, which will help us 23 00:01:02,600 --> 00:01:05,500 refine our output to get 24 00:01:05,500 --> 00:01:07,700 precisely the data that we want. 
25 00:01:07,700 --> 00:01:09,100 We're going to start today with 26 00:01:09,100 --> 00:01:11,433 a quick recap of last time, of 27 00:01:11,433 --> 00:01:13,033 part three, then we're going to 28 00:01:13,033 --> 00:01:14,933 really dive into these xtract 29 00:01:14,933 --> 00:01:16,233 conditional arguments, starting 30 00:01:16,233 --> 00:01:17,833 with the -if argument. 31 00:01:17,833 --> 00:01:19,433 We'll build on that, talk about 32 00:01:19,433 --> 00:01:23,133 how to use multiple conditions at 33 00:01:23,133 --> 00:01:25,966 a time using -and and -or. 34 00:01:25,966 --> 00:01:27,466 We'll finish up with a 35 00:01:27,466 --> 00:01:29,166 discussion of an argument that lets you limit 36 00:01:29,166 --> 00:01:33,066 by location in the XML record 37 00:01:33,066 --> 00:01:34,766 and one other argument that lets 38 00:01:34,766 --> 00:01:37,366 you define placeholders for 39 00:01:37,366 --> 00:01:38,366 blank cells. 40 00:01:38,366 --> 00:01:39,866 But first let's do a quick 41 00:01:39,866 --> 00:01:41,066 refresher. 42 00:01:41,066 --> 00:01:42,766 Last session Kate talked about 43 00:01:42,766 --> 00:01:44,399 different ways to group and 44 00:01:44,400 --> 00:01:47,000 format your output, including the 45 00:01:47,000 --> 00:01:48,600 -tab argument, which lets you 46 00:01:48,600 --> 00:01:49,800 set the separator between columns, 47 00:01:49,800 --> 00:01:53,600 and the -sep argument, which lets 48 00:01:53,600 --> 00:01:56,500 you set the separator between multiple values 49 00:01:56,500 --> 00:01:57,800 in the same column. 50 00:01:57,800 --> 00:01:59,900 She also told you about -block, 51 00:01:59,900 --> 00:02:02,900 which is used to collect and 52 00:02:02,900 --> 00:02:03,300 group. 53 00:02:03,300 --> 00:02:05,100 She used that with author data 54 00:02:05,100 --> 00:02:08,100 to group the author's last name 55 00:02:08,100 --> 00:02:10,000 with the corresponding initials. 
56 00:02:10,000 --> 00:02:13,700 She also showed you a couple 57 00:02:13,700 --> 00:02:17,033 of Unix tools for working with files, 58 00:02:17,033 --> 00:02:19,633 such as saving output from a command into a file. 59 00:02:19,633 --> 00:02:21,533 She showed you cat, which she 60 00:02:21,533 --> 00:02:23,666 used to read the contents of a 61 00:02:23,666 --> 00:02:25,566 file, which we can also use to 62 00:02:25,566 --> 00:02:27,966 pull a search string 63 00:02:27,966 --> 00:02:30,166 into a search query from a text 64 00:02:30,166 --> 00:02:32,366 file, and she also showed you how to 65 00:02:32,366 --> 00:02:36,266 epost to the History server so 66 00:02:36,266 --> 00:02:38,066 we could retrieve those records 67 00:02:38,066 --> 00:02:39,466 with efetch. 68 00:02:39,466 --> 00:02:39,766 All right. 69 00:02:39,766 --> 00:02:42,066 Before we get into the new 70 00:02:42,066 --> 00:02:43,466 stuff, does anybody have any 71 00:02:43,466 --> 00:02:45,166 questions from last class or the 72 00:02:45,166 --> 00:02:47,266 homework, anything they would 73 00:02:47,266 --> 00:02:49,566 like a refresher on or to go 74 00:02:49,566 --> 00:02:50,499 over again? 75 00:02:50,500 --> 00:02:52,200 If so, just go ahead and drop 76 00:02:52,200 --> 00:02:54,400 that in the chat box right now. 77 00:02:54,400 --> 00:02:55,900 There was one topic we left 78 00:02:55,900 --> 00:02:58,000 hanging last time that I wanted 79 00:02:58,000 --> 00:02:59,100 to touch on briefly. 80 00:02:59,100 --> 00:03:00,000 There was a question that came 81 00:03:00,000 --> 00:03:01,400 up in the chat about the 82 00:03:01,400 --> 00:03:03,900 difference between the MedlineTA 83 00:03:03,900 --> 00:03:06,100 element and the ISOAbbreviation element. 84 00:03:06,100 --> 00:03:07,800 There's not a lot of difference. 
85 00:03:07,800 --> 00:03:09,400 The reason why there are two of 86 00:03:09,400 --> 00:03:11,300 them is because, as you may be 87 00:03:11,300 --> 00:03:13,800 aware, PubMed and the database 88 00:03:13,800 --> 00:03:16,800 behind it, the database that 89 00:03:16,800 --> 00:03:18,900 fed into it, Medline, are very 90 00:03:18,900 --> 00:03:21,100 old and have a lot of history 91 00:03:21,100 --> 00:03:23,366 and a lot of changes over time, 92 00:03:23,366 --> 00:03:24,066 which means sometimes we have 93 00:03:24,066 --> 00:03:25,166 some things that are still in 94 00:03:25,166 --> 00:03:26,966 there for historical reasons, 95 00:03:26,966 --> 00:03:28,266 for one reason or another, that 96 00:03:28,266 --> 00:03:30,166 are not necessarily being 97 00:03:30,166 --> 00:03:31,666 actively used, and because of 98 00:03:31,666 --> 00:03:33,966 that we have some sort of 99 00:03:33,966 --> 00:03:36,766 idiosyncrasies in the data. 100 00:03:36,766 --> 00:03:38,166 We're actually going to talk 101 00:03:38,166 --> 00:03:40,566 more about that on Monday. 102 00:03:40,566 --> 00:03:43,466 For abbreviations, both should be 103 00:03:43,466 --> 00:03:49,666 on every PubMed record. 104 00:03:49,666 --> 00:03:52,066 ISOAbbreviation, I believe, has punctuation like 105 00:03:52,066 --> 00:03:56,566 periods, whereas MedlineTA does 106 00:03:56,566 --> 00:03:59,399 not, for some records and for 107 00:03:59,400 --> 00:04:00,800 some journals. 108 00:04:00,800 --> 00:04:03,400 If you're using -- if you are 109 00:04:03,400 --> 00:04:04,300 interested in interacting 110 00:04:04,300 --> 00:04:06,500 with the gene database, the 111 00:04:06,500 --> 00:04:10,500 gene database has the ISOAbbreviation 112 00:04:10,500 --> 00:04:13,600 element but not MedlineTA. 
113 00:04:13,600 --> 00:04:16,100 I use ISOAbbreviation because I 114 00:04:16,100 --> 00:04:18,200 find it easier to find visually 115 00:04:18,200 --> 00:04:20,200 within the records, so that's the 116 00:04:20,200 --> 00:04:22,100 one that I see first, so that's 117 00:04:22,100 --> 00:04:23,333 the one that I use. 118 00:04:23,333 --> 00:04:25,333 Again, it's up to you. 119 00:04:25,333 --> 00:04:26,633 There might be different reasons 120 00:04:26,633 --> 00:04:28,333 you might want to use one over 121 00:04:28,333 --> 00:04:28,933 the other. 122 00:04:28,933 --> 00:04:30,266 Any other questions that we 123 00:04:30,266 --> 00:04:34,866 didn't get to last time that 124 00:04:34,866 --> 00:04:36,266 anybody wanted to address? 125 00:04:36,266 --> 00:04:37,966 Looking in the chat. 126 00:04:37,966 --> 00:04:39,666 Looks all clear right now. 127 00:04:39,666 --> 00:04:41,166 We will address them as they 128 00:04:41,166 --> 00:04:44,566 come up. 129 00:04:44,566 --> 00:04:46,766 Let's move on to the new stuff. 130 00:04:46,766 --> 00:04:49,166 So you've seen this slide a 131 00:04:49,166 --> 00:04:50,066 couple of times. 132 00:04:50,066 --> 00:04:51,566 We just want to remind you of 133 00:04:51,566 --> 00:04:52,866 the general theme for this 134 00:04:52,866 --> 00:04:54,466 class: we keep saying that 135 00:04:54,466 --> 00:04:57,066 EDirect is a tool to help you 136 00:04:57,066 --> 00:04:58,666 get the data you need and only 137 00:04:58,666 --> 00:05:02,066 the data you need in the exact 138 00:05:02,066 --> 00:05:03,599 format that you need it in. 139 00:05:03,600 --> 00:05:05,400 EDirect helps you do that a 140 00:05:05,400 --> 00:05:06,800 couple of different ways. 141 00:05:06,800 --> 00:05:09,600 esearch and efetch handle the 142 00:05:09,600 --> 00:05:11,400 "get you the data you need" part. 
143 00:05:11,400 --> 00:05:13,600 A lot of what Kate talked about 144 00:05:13,600 --> 00:05:18,200 last time with -tab and -sep gets 145 00:05:18,200 --> 00:05:20,400 you the data in the format you 146 00:05:20,400 --> 00:05:21,300 need it in. 147 00:05:21,300 --> 00:05:22,400 In order to filter the results 148 00:05:22,400 --> 00:05:24,100 to get just the data you need, 149 00:05:24,100 --> 00:05:25,900 you'll need conditional 150 00:05:25,900 --> 00:05:27,300 arguments. 151 00:05:27,300 --> 00:05:29,000 And the first one, and the one 152 00:05:29,000 --> 00:05:31,500 that you'll be using probably 153 00:05:31,500 --> 00:05:33,600 most frequently, is -if. 154 00:05:33,600 --> 00:05:36,833 The -if argument lets you limit 155 00:05:36,833 --> 00:05:39,233 the data included in the table 156 00:05:39,233 --> 00:05:41,533 that xtract creates and the 157 00:05:41,533 --> 00:05:44,733 output depending on different 158 00:05:44,733 --> 00:05:45,133 conditions. 159 00:05:45,133 --> 00:05:46,633 Data will only be included if 160 00:05:46,633 --> 00:05:47,933 it matches the condition you 161 00:05:47,933 --> 00:05:48,633 specify. 162 00:05:48,633 --> 00:05:50,533 For those of you who have done 163 00:05:50,533 --> 00:05:52,733 even a little bit of computer 164 00:05:52,733 --> 00:05:53,733 programming, 165 00:05:53,733 --> 00:05:54,833 you'll be familiar with the 166 00:05:54,833 --> 00:05:56,833 basic concept of if-then 167 00:05:56,833 --> 00:05:57,133 statements. 168 00:05:57,133 --> 00:05:59,433 If it's been a while, I can give 169 00:05:59,433 --> 00:06:02,433 you a refresher: the basic 170 00:06:02,433 --> 00:06:05,233 idea is if a condition is true, 171 00:06:05,233 --> 00:06:07,833 then follow a specified set of 172 00:06:07,833 --> 00:06:09,133 instructions. 173 00:06:09,133 --> 00:06:11,766 For xtract's -if argument we're 174 00:06:11,766 --> 00:06:14,466 following that same structure. 
175 00:06:14,466 --> 00:06:15,966 If a condition is met for a 176 00:06:15,966 --> 00:06:17,866 pattern, then create a new row 177 00:06:17,866 --> 00:06:19,966 for that pattern and populate 178 00:06:19,966 --> 00:06:22,666 the specified columns in that 179 00:06:22,666 --> 00:06:23,166 row. 180 00:06:23,166 --> 00:06:24,699 If the condition isn't met, 181 00:06:24,700 --> 00:06:26,000 xtract will just skip that 182 00:06:26,000 --> 00:06:28,500 pattern and move on to the next 183 00:06:28,500 --> 00:06:29,000 one. 184 00:06:29,000 --> 00:06:31,300 So let me walk you through an 185 00:06:31,300 --> 00:06:33,600 example of when and why you 186 00:06:33,600 --> 00:06:35,300 might want to do this. 187 00:06:35,300 --> 00:06:37,800 Here's sort of a case study, and 188 00:06:37,800 --> 00:06:39,500 actually one a couple of you 189 00:06:39,500 --> 00:06:45,933 asked about. 190 00:06:45,933 --> 00:06:48,333 We want the names and IDs 191 00:06:48,333 --> 00:06:50,033 for each author. 192 00:06:50,033 --> 00:06:52,033 This efetch that's here, that's 193 00:06:52,033 --> 00:06:54,233 also on your handout, and I think we 194 00:06:54,233 --> 00:06:56,333 are going to put in the chat box 195 00:06:56,333 --> 00:06:59,133 as well, we'll use this one as a 196 00:06:59,133 --> 00:07:00,933 sample because I made sure that 197 00:07:00,933 --> 00:07:02,833 some of these records had some 198 00:07:02,833 --> 00:07:04,933 ORCIDs on them so we'll have 199 00:07:04,933 --> 00:07:06,733 some data to work with. 200 00:07:06,733 --> 00:07:08,733 The first thing we need to do, 201 00:07:08,733 --> 00:07:10,833 as always, is look at the PubMed 202 00:07:10,833 --> 00:07:14,433 XML to figure out where to 203 00:07:14,433 --> 00:07:16,966 get that ID. 204 00:07:16,966 --> 00:07:18,866 I'm going to hop out. 205 00:07:18,866 --> 00:07:20,666 Get to PubMed. 
206 00:07:20,666 --> 00:07:24,299 I'm just going to run a quick 207 00:07:24,300 --> 00:07:27,400 search for those PMIDs, which are 208 00:07:27,400 --> 00:07:30,900 the same as for the efetch. 209 00:07:30,900 --> 00:07:33,400 I will go to the format menu and 210 00:07:33,400 --> 00:07:37,300 go to XML, in case you don't 211 00:07:37,300 --> 00:07:39,200 know how to find the stuff. 212 00:07:39,200 --> 00:07:43,200 I could use Control-F find 213 00:07:43,200 --> 00:07:44,500 to search through this record to 214 00:07:44,500 --> 00:07:46,900 find it, but I actually see it 215 00:07:46,900 --> 00:07:49,100 right here, ORCID. 216 00:07:49,100 --> 00:07:50,833 It's in the Identifier element 217 00:07:50,833 --> 00:07:51,733 right here. 218 00:07:51,733 --> 00:07:52,833 There we go. 219 00:07:52,833 --> 00:07:54,433 And I also happen to know that 220 00:07:54,433 --> 00:07:56,233 the only information in the 221 00:07:56,233 --> 00:07:58,333 Identifier element anywhere in 222 00:07:58,333 --> 00:08:01,933 PubMed is ORCIDs; that's the 223 00:08:01,933 --> 00:08:04,233 only thing that's in there. 224 00:08:04,233 --> 00:08:06,033 If there's an Identifier on the 225 00:08:06,033 --> 00:08:07,933 record, we know that it has the 226 00:08:07,933 --> 00:08:09,033 data that we want. 227 00:08:09,033 --> 00:08:10,533 Starting from there, let's start 228 00:08:10,533 --> 00:08:13,433 with a rough draft of building 229 00:08:13,433 --> 00:08:17,833 this command, this xtract, and 230 00:08:17,833 --> 00:08:21,333 hop over, and I'm going to copy 231 00:08:21,333 --> 00:08:25,299 and paste my efetch with that 232 00:08:25,300 --> 00:08:27,400 right in there. 233 00:08:27,400 --> 00:08:30,500 My backslash, and I have 234 00:08:30,500 --> 00:08:32,600 backslash, Enter, and I'm going to 235 00:08:32,600 --> 00:08:34,500 start building my xtract. 
236 00:08:34,500 --> 00:08:36,500 And remember we wanted a list of 237 00:08:36,500 --> 00:08:40,500 authors and ORCIDs, not PubMed 238 00:08:40,500 --> 00:08:40,900 records. 239 00:08:40,900 --> 00:08:42,800 So we want to make sure that our 240 00:08:42,800 --> 00:08:44,900 pattern is Author, not PubmedArticle like 241 00:08:44,900 --> 00:08:45,900 it often is. 242 00:08:45,900 --> 00:08:48,600 So we're going to have one 243 00:08:48,600 --> 00:08:50,600 author per row instead of one 244 00:08:50,600 --> 00:08:55,400 record per row. 245 00:08:55,400 --> 00:08:57,233 So let's do that. 246 00:08:57,233 --> 00:08:58,933 And now we can talk about what 247 00:08:58,933 --> 00:09:00,833 columns we want using the 248 00:09:00,833 --> 00:09:01,733 -element argument. 249 00:09:01,733 --> 00:09:03,433 I'm going to do LastName and 250 00:09:03,433 --> 00:09:04,733 Initials. 251 00:09:04,733 --> 00:09:07,033 I'm going to group those in the 252 00:09:07,033 --> 00:09:09,433 same column with my comma. 253 00:09:09,433 --> 00:09:11,733 In my next column I'll have the 254 00:09:11,733 --> 00:09:13,633 Identifier, and let me go back 255 00:09:13,633 --> 00:09:17,833 and add in my -sep argument, so 256 00:09:17,833 --> 00:09:20,433 using that -sep space with the 257 00:09:20,433 --> 00:09:22,533 comma makes sure we get a space in that 258 00:09:22,533 --> 00:09:24,266 name column. 259 00:09:24,266 --> 00:09:26,166 I'm just going to do this first 260 00:09:26,166 --> 00:09:27,966 and see what we get with it, 261 00:09:27,966 --> 00:09:30,066 sort of a rough draft. 262 00:09:30,066 --> 00:09:46,399 So we're on the right track. 263 00:09:46,400 --> 00:09:48,400 We just wanted a list of authors 264 00:09:48,400 --> 00:09:51,300 with ORCID IDs, and that's 265 00:09:51,300 --> 00:09:52,900 where this conditional stuff is 266 00:09:52,900 --> 00:09:53,900 going to come in. 
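The rough draft described here (one row per author, name in one column, identifier in the next) can be sketched in Python to show why it is only "on the right track". This is not how xtract is implemented; it is a minimal illustration using the standard library. The sample XML, names, and ORCID-style values are invented, and the xtract command in the comment is a reconstruction of what was typed on screen, so treat its exact form as an assumption.

```python
import xml.etree.ElementTree as ET

# Dummy data for illustration only: invented names and placeholder identifiers.
SAMPLE_XML = """
<PubmedArticle>
  <AuthorList>
    <Author>
      <LastName>Alvarez</LastName><Initials>MJ</Initials>
      <Identifier Source="ORCID">0000-0000-0000-0001</Identifier>
    </Author>
    <Author>
      <LastName>Baker</LastName><Initials>T</Initials>
    </Author>
    <Author>
      <LastName>Chen</LastName><Initials>LQ</Initials>
      <Identifier Source="ORCID">0000-0000-0000-0002</Identifier>
    </Author>
  </AuthorList>
</PubmedArticle>
"""

# Assumed spoken command (reconstructed, not copied from the slides):
#   xtract -pattern Author -sep " " -element LastName,Initials Identifier


def author_rows(xml_text):
    """Return one row per Author: a name column, then Identifier if present."""
    rows = []
    for author in ET.fromstring(xml_text).iter("Author"):
        # Join last name and initials with a space, like the -sep " " step.
        name = " ".join(
            filter(None, (author.findtext("LastName"), author.findtext("Initials")))
        )
        row = [name]
        identifier = author.findtext("Identifier")
        if identifier:
            row.append(identifier)
        rows.append(row)
    return rows


rough_draft = author_rows(SAMPLE_XML)
for row in rough_draft:
    print("\t".join(row))
```

Every author gets a row, including the one with no identifier, which is the part the conditional argument is meant to fix.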
267 00:09:53,900 --> 00:09:56,800 So I'm going to clear my screen 268 00:09:56,800 --> 00:09:58,800 and take that same block of code 269 00:09:58,800 --> 00:10:01,500 we just used, throw that back in 270 00:10:01,500 --> 00:10:04,433 there, and I'm going to add an -if 271 00:10:04,433 --> 00:10:06,533 argument right after my pattern. 272 00:10:06,533 --> 00:10:08,633 It's going to be -if Identifier, 273 00:10:08,633 --> 00:10:11,833 and, as I said, -if tells us to 274 00:10:11,833 --> 00:10:13,433 keep any pattern if the 275 00:10:13,433 --> 00:10:16,133 condition we define is met. 276 00:10:16,133 --> 00:10:18,233 And, in this case, by using -if 277 00:10:18,233 --> 00:10:20,733 Identifier, the condition is the 278 00:10:20,733 --> 00:10:22,233 existence of an Identifier 279 00:10:22,233 --> 00:10:24,566 element inside this author. 280 00:10:24,566 --> 00:10:27,266 So going back to our -- let me 281 00:10:27,266 --> 00:10:29,366 execute this first and you'll 282 00:10:29,366 --> 00:10:30,966 see that we get the results that 283 00:10:30,966 --> 00:10:32,066 we wanted. 284 00:10:32,066 --> 00:10:33,866 We get only a list of authors 285 00:10:33,866 --> 00:10:34,966 that have ORCIDs. 286 00:10:34,966 --> 00:10:37,399 We get last name and initials 287 00:10:37,400 --> 00:10:39,400 and we get the ORCID ID. 288 00:10:39,400 --> 00:10:41,000 We exclude anything that doesn't 289 00:10:41,000 --> 00:10:42,200 have an Identifier. 290 00:10:42,200 --> 00:10:44,500 So now let's go back to our 291 00:10:44,500 --> 00:10:49,400 slides, go back to our if-then 292 00:10:49,400 --> 00:10:50,700 structure that we talked about 293 00:10:50,700 --> 00:10:51,200 before. 294 00:10:51,200 --> 00:10:53,700 If the pattern has an Identifier 295 00:10:53,700 --> 00:10:56,300 element, then create a new row 296 00:10:56,300 --> 00:10:58,700 for the pattern, two columns of 297 00:10:58,700 --> 00:10:59,200 data. 
298 00:10:59,200 --> 00:11:02,900 Last name and initials in one 299 00:11:02,900 --> 00:11:05,700 column, Identifier in the other. 300 00:11:05,700 --> 00:11:10,000 If it doesn't have an Identifier 301 00:11:10,000 --> 00:11:12,933 element, it will skip it 302 00:11:12,933 --> 00:11:13,733 entirely. 303 00:11:13,733 --> 00:11:15,633 We're going to go back to that 304 00:11:15,633 --> 00:11:17,533 triple box thing that we had 305 00:11:17,533 --> 00:11:19,233 on Monday that Kate showed you 306 00:11:19,233 --> 00:11:20,033 with -block. 307 00:11:20,033 --> 00:11:23,166 To refresh your memory, on the 308 00:11:23,166 --> 00:11:25,266 left is the input XML. 309 00:11:25,266 --> 00:11:28,766 It's just a dummy but it will be 310 00:11:28,766 --> 00:11:30,866 our input. 311 00:11:30,866 --> 00:11:32,666 On the right is our output, 312 00:11:32,666 --> 00:11:34,766 and on the bottom is the 313 00:11:34,766 --> 00:11:36,066 actual command itself. 314 00:11:36,066 --> 00:11:38,166 As always, xtract looks for the 315 00:11:38,166 --> 00:11:41,066 first occurrence of our pattern, 316 00:11:41,066 --> 00:11:42,666 Author, which is right up here 317 00:11:42,666 --> 00:11:43,366 at the top. 318 00:11:43,366 --> 00:11:44,899 But rather than creating a new 319 00:11:44,900 --> 00:11:46,900 row right away, first we check 320 00:11:46,900 --> 00:11:48,700 to see whether the author has 321 00:11:48,700 --> 00:11:52,300 an Identifier element, whether 322 00:11:52,300 --> 00:11:54,500 the pattern meets our condition. 323 00:11:54,500 --> 00:11:55,600 This one does. 324 00:11:55,600 --> 00:11:57,300 So in this case we'll create a 325 00:11:57,300 --> 00:11:58,900 new row with the last name, 326 00:11:58,900 --> 00:12:02,800 initials and identifier. 
327 00:12:02,800 --> 00:12:04,700 Then we move on to the next 328 00:12:04,700 --> 00:12:06,600 author, the next pattern, and 329 00:12:06,600 --> 00:12:09,000 we'll check to see if this one 330 00:12:09,000 --> 00:12:10,100 has an Identifier. 331 00:12:10,100 --> 00:12:11,800 This one doesn't. 332 00:12:11,800 --> 00:12:13,600 We're not going to create a new 333 00:12:13,600 --> 00:12:14,600 row for that. 334 00:12:14,600 --> 00:12:16,800 We skip that. 335 00:12:16,800 --> 00:12:18,633 We move on to the third one. 336 00:12:18,633 --> 00:12:20,433 We create another row: last 337 00:12:20,433 --> 00:12:22,633 name, initials, identifier. 338 00:12:22,633 --> 00:12:24,166 It's a great way of making sure 339 00:12:24,166 --> 00:12:25,866 you get only the data that you 340 00:12:25,866 --> 00:12:27,566 want in your output. 341 00:12:27,566 --> 00:12:29,166 Any questions about this basic 342 00:12:29,166 --> 00:12:30,266 concept of -if? 343 00:12:30,266 --> 00:12:31,866 I want to make sure that 344 00:12:31,866 --> 00:12:33,066 everybody's with me right now 345 00:12:33,066 --> 00:12:35,066 because we're going to do a lot of 346 00:12:35,066 --> 00:12:36,866 variations on this over the 347 00:12:36,866 --> 00:12:38,866 course of today, so hopefully 348 00:12:38,866 --> 00:12:40,666 you'll all come along with us if 349 00:12:40,666 --> 00:12:43,166 we're all good on this basic 350 00:12:43,166 --> 00:12:43,866 concept. 351 00:12:43,866 --> 00:12:45,066 If you have any questions, go 352 00:12:45,066 --> 00:12:46,766 ahead and put them in the chat 353 00:12:46,766 --> 00:12:47,366 box. 354 00:12:47,366 --> 00:12:48,866 Looking clear at the moment. 355 00:12:48,866 --> 00:12:51,399 So we are going to move on and 356 00:12:51,400 --> 00:12:55,800 take a look at this exercise. 
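The three-author walkthrough (keep, skip, keep) maps onto a short loop. The Python below is only a sketch of the if-then logic, using invented dummy XML in place of the slide's input box; it is not xtract's code, and the command in the comment is an assumed reconstruction of the spoken one.

```python
import xml.etree.ElementTree as ET

# Dummy input standing in for the left-hand box on the slide (invented values).
SAMPLE_XML = """
<AuthorList>
  <Author>
    <LastName>Alvarez</LastName><Initials>MJ</Initials>
    <Identifier Source="ORCID">0000-0000-0000-0001</Identifier>
  </Author>
  <Author>
    <LastName>Baker</LastName><Initials>T</Initials>
  </Author>
  <Author>
    <LastName>Chen</LastName><Initials>LQ</Initials>
    <Identifier Source="ORCID">0000-0000-0000-0002</Identifier>
  </Author>
</AuthorList>
"""

# Assumed spoken command (reconstructed):
#   xtract -pattern Author -if Identifier -sep " " -element LastName,Initials Identifier


def rows_if_identifier(xml_text):
    """Create a row only for Author patterns that contain an Identifier."""
    rows = []
    for author in ET.fromstring(xml_text).iter("Author"):
        identifier = author.find(".//Identifier")
        if identifier is None:
            continue  # condition not met: skip this pattern, move to the next
        name = " ".join(
            filter(None, (author.findtext("LastName"), author.findtext("Initials")))
        )
        rows.append((name, identifier.text))
    return rows


filtered = rows_if_identifier(SAMPLE_XML)
for name, orcid in filtered:
    print(f"{name}\t{orcid}")
```

The condition is checked once per pattern, before any columns are built, which mirrors the "check first, then create the row" order described for the slide.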
357 00:12:55,800 --> 00:12:57,300 So for this exercise I want you 358 00:12:57,300 --> 00:12:59,900 to write an xtract command that 359 00:12:59,900 --> 00:13:01,700 only includes PubMed records if 360 00:13:01,700 --> 00:13:03,100 they have MeSH headings 361 00:13:03,100 --> 00:13:05,700 attached, and we want one row 362 00:13:05,700 --> 00:13:08,400 per PubMed record, and each row 363 00:13:08,400 --> 00:13:11,100 should have the PMID and 364 00:13:11,100 --> 00:13:12,900 citation status. 365 00:13:12,900 --> 00:13:15,400 These are two pieces of data you 366 00:13:15,400 --> 00:13:16,400 looked at before. 367 00:13:16,400 --> 00:13:18,600 If you look back over some of your 368 00:13:18,600 --> 00:13:20,800 old handouts, you might find 369 00:13:20,800 --> 00:13:21,300 them. 370 00:13:21,300 --> 00:13:28,466 The efetch that's on the slide 371 00:13:28,466 --> 00:13:30,066 will get you some good data to 372 00:13:30,066 --> 00:13:30,666 work with. 373 00:13:30,666 --> 00:13:32,466 You can also get that in the 374 00:13:32,466 --> 00:13:33,366 handout. 375 00:13:33,366 --> 00:13:35,166 At the bottom are all of the 376 00:13:35,166 --> 00:13:36,866 answers to all of the exercises. 377 00:13:36,866 --> 00:13:38,766 If you get stuck, feel free to 378 00:13:38,766 --> 00:13:40,366 look at those. 379 00:13:40,366 --> 00:13:42,166 There will also be hints in the 380 00:13:42,166 --> 00:13:42,766 chat box. 381 00:13:42,766 --> 00:13:44,866 Either way, if you get it by 382 00:13:44,866 --> 00:13:45,966 looking at the answer at the 383 00:13:45,966 --> 00:13:47,466 bottom or if you actually 384 00:13:47,466 --> 00:13:49,366 figured it out on your own, go 385 00:13:49,366 --> 00:13:51,266 ahead and give us a green check 386 00:13:51,266 --> 00:13:53,766 mark when you're all set. 387 00:13:53,766 --> 00:13:55,366 I will give you a few minutes to 388 00:13:55,366 --> 00:13:57,166 work on this and let's see how 389 00:13:57,166 --> 00:13:58,866 you do. 
390 00:13:58,866 --> 00:14:03,932 Let's go through this one 391 00:14:03,933 --> 00:14:04,333 together. 392 00:14:04,333 --> 00:14:05,933 Looks like some folks have it, 393 00:14:05,933 --> 00:14:07,633 some are still working on it. 394 00:14:07,633 --> 00:14:08,433 That's okay. 395 00:14:08,433 --> 00:14:10,533 We'll walk through it together. 396 00:14:10,533 --> 00:14:12,033 You guys are too smart. 397 00:14:12,033 --> 00:14:13,833 You're jumping ahead to things 398 00:14:13,833 --> 00:14:15,733 and figuring out stuff that we 399 00:14:15,733 --> 00:14:16,933 haven't even talked about. 400 00:14:16,933 --> 00:14:17,733 So stand by. 401 00:14:17,733 --> 00:14:19,533 We'll get there and I will 402 00:14:19,533 --> 00:14:20,933 address those questions in a 403 00:14:20,933 --> 00:14:21,233 second. 404 00:14:21,233 --> 00:14:24,533 But first let's go back, clear 405 00:14:24,533 --> 00:14:31,866 my screen here. 406 00:14:31,866 --> 00:14:33,466 I'm going to go ahead and grab 407 00:14:33,466 --> 00:14:56,366 my efetch, so we start off: we 408 00:14:56,366 --> 00:14:58,999 wanted one row per PubMed 409 00:14:59,000 --> 00:14:59,700 record. 410 00:14:59,700 --> 00:15:01,533 We're going to use PubmedArticle 411 00:15:01,533 --> 00:15:05,433 as our pattern. Then we wanted two 412 00:15:05,433 --> 00:15:09,733 columns. The first should be the PMID, which 413 00:15:09,733 --> 00:15:12,133 you might remember is using the 414 00:15:12,133 --> 00:15:18,033 Parent/Child construction. 415 00:15:18,033 --> 00:15:20,433 The citation status, though, is an attribute. 416 00:15:20,433 --> 00:15:22,133 It's the Status attribute. 417 00:15:22,133 --> 00:15:29,533 So I'm going to just run this, 418 00:15:29,533 --> 00:15:31,633 and I get five records, which is 419 00:15:31,633 --> 00:15:32,333 good. 420 00:15:32,333 --> 00:15:34,966 So five rows, and I get PMID and 421 00:15:34,966 --> 00:15:37,466 I get citation status. 
422 00:15:37,466 --> 00:15:39,466 Nancy points out the funny 423 00:15:39,466 --> 00:15:41,966 spacing in the columns; that is 424 00:15:41,966 --> 00:15:44,366 because one of the UIDs has a 425 00:15:44,366 --> 00:15:46,866 smaller number of digits. 426 00:15:46,866 --> 00:15:49,666 That is an example I did not 427 00:15:49,666 --> 00:15:51,566 change and it throws off the 428 00:15:51,566 --> 00:15:53,966 alignment a little bit. 429 00:15:53,966 --> 00:15:55,866 Now that we've done that part, 430 00:15:55,866 --> 00:15:57,766 let's go back and figure out 431 00:15:57,766 --> 00:16:01,399 how to exclude the rows that 432 00:16:01,400 --> 00:16:04,400 don't have MeSH headings. 433 00:16:04,400 --> 00:16:06,300 So I'm going to go back to 434 00:16:06,300 --> 00:16:08,533 my pattern because I want to 435 00:16:08,533 --> 00:16:11,633 include patterns, include 436 00:16:11,633 --> 00:16:16,033 records only if they have a MeshHeading 437 00:16:16,033 --> 00:16:17,633 element. 438 00:16:17,633 --> 00:16:19,633 If you used DescriptorName 439 00:16:19,633 --> 00:16:22,033 instead of MeshHeading, that 440 00:16:22,033 --> 00:16:23,233 would also work. 441 00:16:23,233 --> 00:16:25,633 You'll definitely be seeing a 442 00:16:25,633 --> 00:16:27,133 lot more on Monday. 443 00:16:27,133 --> 00:16:29,433 There are often multiple right 444 00:16:29,433 --> 00:16:32,133 ways to do these things and 445 00:16:32,133 --> 00:16:34,233 this is a big example of that. 446 00:16:34,233 --> 00:16:37,133 When I execute this, I'm down to 447 00:16:37,133 --> 00:16:38,633 three rows. 
448 00:16:38,633 --> 00:16:40,366 And you'll notice that the 449 00:16:40,366 --> 00:16:44,266 status, the citation status of 450 00:16:44,266 --> 00:16:45,966 all of these records is Medline, 451 00:16:45,966 --> 00:16:47,466 which, if you know about Medline and 452 00:16:47,466 --> 00:16:49,566 you know about PubMed, makes a 453 00:16:49,566 --> 00:16:51,066 lot of sense because only 454 00:16:51,066 --> 00:16:52,566 Medline records have MeSH 455 00:16:52,566 --> 00:16:55,066 headings, and this exercise was 456 00:16:55,066 --> 00:16:57,366 one way of limiting your results 457 00:16:57,366 --> 00:16:58,766 to only Medline records. 458 00:16:58,766 --> 00:17:00,599 There are plenty of other ways 459 00:17:00,600 --> 00:17:01,400 to do this. 460 00:17:01,400 --> 00:17:04,900 You can include a tagged MeSH 461 00:17:04,900 --> 00:17:08,200 heading in your search, or restrict your 462 00:17:08,200 --> 00:17:10,200 search to the Medline subset 463 00:17:10,200 --> 00:17:13,533 by adding a tag to your 464 00:17:13,533 --> 00:17:14,533 query. 465 00:17:14,533 --> 00:17:16,333 And the feature that I'm about to 466 00:17:16,333 --> 00:17:18,533 show you, which is also going to 467 00:17:18,533 --> 00:17:20,933 answer Kate's question, gives you 468 00:17:20,933 --> 00:17:22,633 yet another way to include only 469 00:17:22,633 --> 00:17:23,833 Medline records, if that's what 470 00:17:23,833 --> 00:17:25,433 you want to do. 
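The exercise answer (one row per record, PMID plus citation status, kept only when MeSH headings are present) can be sketched the same way. The Python below illustrates the logic on invented records with made-up PMIDs; the xtract line in the comment is an assumed reconstruction of the answer described aloud, and the handout remains the reference for the exact command.

```python
import xml.etree.ElementTree as ET

# Invented records for illustration; PMIDs are made up.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <PMID>10000001</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="PubMed-not-MEDLINE">
      <PMID>10000002</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <PMID>10000003</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Animals</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

# Assumed answer, reconstructed from the spoken walkthrough:
#   xtract -pattern PubmedArticle -if MeshHeading \
#     -element MedlineCitation/PMID MedlineCitation@Status


def rows_with_mesh(xml_text):
    """One row per PubmedArticle that has at least one MeshHeading element."""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        if article.find(".//MeshHeading") is None:
            continue  # no MeSH headings attached: skip the whole record
        citation = article.find("MedlineCitation")
        # PMID is a child element; Status is an attribute of MedlineCitation.
        rows.append((citation.findtext("PMID"), citation.get("Status")))
    return rows


mesh_rows = rows_with_mesh(SAMPLE_XML)
for pmid, status in mesh_rows:
    print(f"{pmid}\t{status}")
```

In this dummy set, as in the live demo, every surviving row carries the MEDLINE status, because only those records were given MeSH headings.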
471 00:17:25,433 --> 00:17:27,033 I thought it was a great idea, 472 00:17:27,033 --> 00:17:30,233 as those of you searching PubMed 473 00:17:30,233 --> 00:17:32,933 and Medline know. If you're still 474 00:17:32,933 --> 00:17:35,033 working on Exercise 1, I will 475 00:17:35,033 --> 00:17:37,433 ask you to pause for now and 476 00:17:37,433 --> 00:17:39,233 make sure you pay attention to 477 00:17:39,233 --> 00:17:41,333 this next part because we're 478 00:17:41,333 --> 00:17:42,833 really going to open up what we 479 00:17:42,833 --> 00:17:44,333 can do with -if and with 480 00:17:44,333 --> 00:17:45,333 conditional arguments. 481 00:17:45,333 --> 00:17:47,066 Remember the answers are always 482 00:17:47,066 --> 00:17:48,866 at the bottom so you can go back 483 00:17:48,866 --> 00:17:50,366 and look at them later. 484 00:17:50,366 --> 00:17:51,566 We'll probably have a little 485 00:17:51,566 --> 00:17:53,966 time to talk about any extra 486 00:17:53,966 --> 00:17:55,266 questions at the end as well. 487 00:17:55,266 --> 00:17:56,766 Make sure I didn't miss any 488 00:17:56,766 --> 00:17:58,766 questions except for Kate's, 489 00:17:58,766 --> 00:17:59,999 which I'm going to answer in 490 00:18:00,000 --> 00:18:01,500 just a second. 491 00:18:01,500 --> 00:18:03,300 And let's move on. 492 00:18:03,300 --> 00:18:04,200 Okay. 493 00:18:04,200 --> 00:18:07,000 So in that exercise our 494 00:18:07,000 --> 00:18:09,200 condition was the presence of an 495 00:18:09,200 --> 00:18:11,100 element in a pattern, so the 496 00:18:11,100 --> 00:18:13,300 presence of the MeshHeading 497 00:18:13,300 --> 00:18:15,800 element. 498 00:18:15,800 --> 00:18:18,000 However, we can also include 499 00:18:18,000 --> 00:18:20,333 data based on the value of an 500 00:18:20,333 --> 00:18:22,033 element or attribute as opposed 501 00:18:22,033 --> 00:18:24,633 to just its simple presence. 502 00:18:24,633 --> 00:18:27,133 So let me switch over to the 503 00:18:27,133 --> 00:18:30,933 slides. 
504 00:18:30,933 --> 00:18:35,233 There we go. 505 00:18:35,233 --> 00:18:37,433 Like before, we use the -if 506 00:18:37,433 --> 00:18:39,533 argument to specify an element. 507 00:18:39,533 --> 00:18:41,033 But now we follow that up with 508 00:18:41,033 --> 00:18:43,233 an -equals argument to specify 509 00:18:43,233 --> 00:18:44,933 what the value of that element 510 00:18:44,933 --> 00:18:45,633 should be. 511 00:18:45,633 --> 00:18:47,833 So we will include the pattern 512 00:18:47,833 --> 00:18:52,233 if ISOAbbreviation equals 513 00:18:52,233 --> 00:18:52,733 JAMA. 514 00:18:52,733 --> 00:18:54,766 Again, following our same 515 00:18:54,766 --> 00:18:56,966 if-then structure, if the 516 00:18:56,966 --> 00:19:00,499 element ISOAbbreviation equals 517 00:19:00,500 --> 00:19:02,800 JAMA, then we will create a new 518 00:19:02,800 --> 00:19:04,700 row for the pattern with the two 519 00:19:04,700 --> 00:19:06,800 columns, volume and issue, and 520 00:19:06,800 --> 00:19:08,200 this also highlights another 521 00:19:08,200 --> 00:19:10,200 useful thing about -if that you 522 00:19:10,200 --> 00:19:11,700 may have already figured out, 523 00:19:11,700 --> 00:19:13,800 but it doesn't hurt to reiterate 524 00:19:13,800 --> 00:19:14,100 it. 525 00:19:14,100 --> 00:19:16,300 The data in your columns doesn't 526 00:19:16,300 --> 00:19:18,900 have to be from the same 527 00:19:18,900 --> 00:19:20,800 elements used in your condition. 528 00:19:20,800 --> 00:19:23,400 So your -if was ISOAbbreviation 529 00:19:23,400 --> 00:19:26,100 but the output is two 530 00:19:26,100 --> 00:19:26,600 different elements. 531 00:19:26,600 --> 00:19:29,033 If we want to use attributes in 532 00:19:29,033 --> 00:19:30,533 our condition instead of 533 00:19:30,533 --> 00:19:32,833 elements, we can do that, in 534 00:19:32,833 --> 00:19:34,533 the same way we used 535 00:19:34,533 --> 00:19:35,933 attributes before. 
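Testing a value rather than mere presence is a small change to the same loop. The sketch below covers both an element value and an attribute value, on invented records. Assumptions to note: the pattern is taken to be PubmedArticle (the transcript does not say), the xtract lines in the comments are reconstructions, and the comparison here is an exact, case-sensitive match chosen for simplicity, which may differ from xtract's own matching rules.

```python
import xml.etree.ElementTree as ET

# Invented records for illustration; PMIDs, volumes, and issues are made up.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <PMID>10000001</PMID>
      <Article><Journal>
        <JournalIssue><Volume>320</Volume><Issue>4</Issue></JournalIssue>
        <ISOAbbreviation>JAMA</ISOAbbreviation>
      </Journal></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="PubMed-not-MEDLINE">
      <PMID>10000002</PMID>
      <Article><Journal>
        <JournalIssue><Volume>12</Volume><Issue>1</Issue></JournalIssue>
        <ISOAbbreviation>Example J</ISOAbbreviation>
      </Journal></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def volume_issue_if_journal(xml_text, abbreviation):
    """Assumed command: xtract -pattern PubmedArticle
    -if ISOAbbreviation -equals JAMA -element Volume Issue"""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        values = [el.text for el in article.iter("ISOAbbreviation")]
        if abbreviation in values:
            # The output columns are different elements from the condition.
            rows.append((article.findtext(".//Volume"), article.findtext(".//Issue")))
    return rows


def pmids_if_status(xml_text, status):
    """Assumed command: xtract -pattern PubmedArticle
    -if MedlineCitation@Status -equals MEDLINE -element MedlineCitation/PMID"""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is not None and citation.get("Status") == status:
            rows.append(citation.findtext("PMID"))
    return rows


jama_rows = volume_issue_if_journal(SAMPLE_XML, "JAMA")
medline_pmids = pmids_if_status(SAMPLE_XML, "MEDLINE")
print(jama_rows)
print(medline_pmids)
```

The first function shows that the condition element and the output elements are independent; the second shows the attribute form of the same test.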
536 00:19:35,933 --> 00:19:38,733 In this case MedlineCitation at 537 00:19:38,733 --> 00:19:40,833 Status, so our condition 538 00:19:40,833 --> 00:19:42,433 is going to be based on that 539 00:19:42,433 --> 00:19:45,433 Status attribute of the MedlineCitation 540 00:19:45,433 --> 00:19:46,633 element. 541 00:19:46,633 --> 00:19:48,333 Going into a little bit more 542 00:19:48,333 --> 00:19:49,733 depth on that. 543 00:19:49,733 --> 00:19:51,933 If the attribute Status from the 544 00:19:51,933 --> 00:19:54,933 element MedlineCitation equals 545 00:19:54,933 --> 00:19:57,033 MEDLINE, then create a new row 546 00:19:57,033 --> 00:19:59,166 for the pattern with just the 547 00:19:59,166 --> 00:20:00,199 PMID. 548 00:20:00,200 --> 00:20:01,900 As I said, this is another way 549 00:20:01,900 --> 00:20:04,000 to restrict your results to only 550 00:20:04,000 --> 00:20:09,000 Medline records. 551 00:20:09,000 --> 00:20:10,700 There are a couple of other 552 00:20:10,700 --> 00:20:13,100 alternatives to equals. 553 00:20:13,100 --> 00:20:15,500 Equals works great if you know 554 00:20:15,500 --> 00:20:16,400 exactly what the value should be 555 00:20:16,400 --> 00:20:19,700 equal to, but there are other 556 00:20:19,700 --> 00:20:21,600 options, including partial 557 00:20:21,600 --> 00:20:24,100 matches. 558 00:20:24,100 --> 00:20:26,700 And I will show you an example 559 00:20:26,700 --> 00:20:29,400 now with contains instead, using 560 00:20:29,400 --> 00:20:31,900 that if-then structure. 561 00:20:31,900 --> 00:20:33,400 A little bit different than our 562 00:20:33,400 --> 00:20:34,433 previous ones because our 563 00:20:34,433 --> 00:20:35,933 pattern is Author again. 564 00:20:35,933 --> 00:20:38,033 We're talking about one row per 565 00:20:38,033 --> 00:20:40,733 author rather than one row per 566 00:20:40,733 --> 00:20:41,233 record.
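The attribute condition on MedlineCitation Status can be sketched like this. The two records are hypothetical and trimmed to the bare minimum; the Element@Attribute form is the same one used for attributes elsewhere in xtract, and xtract is assumed to be installed.

```shell
# Hypothetical records: one indexed for MEDLINE, one not.
sample_xml='<PubmedArticleSet>
<PubmedArticle><MedlineCitation Status="MEDLINE"><PMID>11111111</PMID></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation Status="PubMed-not-MEDLINE"><PMID>22222222</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Keep a record only if the Status attribute of MedlineCitation equals MEDLINE,
# then print just its PMID.
printf '%s\n' "$sample_xml" |
  xtract -pattern PubmedArticle -if MedlineCitation@Status -equals MEDLINE \
    -element MedlineCitation/PMID
```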
567 00:20:41,233 --> 00:20:43,233 For this one, if the Affiliation 568 00:20:43,233 --> 00:20:45,133 element in an Author pattern 569 00:20:45,133 --> 00:20:47,833 contains the string, then create 570 00:20:47,833 --> 00:20:50,033 a new row for that Author pattern 571 00:20:50,033 --> 00:20:51,933 with three columns: 572 00:20:51,933 --> 00:20:54,033 last name, initials and 573 00:20:54,033 --> 00:20:55,133 affiliation. 574 00:20:55,133 --> 00:20:56,433 As always, if the condition is 575 00:20:56,433 --> 00:20:58,733 not met, xtract skips the 576 00:20:58,733 --> 00:21:00,566 pattern and moves on to the next 577 00:21:00,566 --> 00:21:01,266 author. 578 00:21:01,266 --> 00:21:02,666 Now this one is especially 579 00:21:02,666 --> 00:21:04,566 useful for the Affiliation 580 00:21:04,566 --> 00:21:04,866 element. 581 00:21:04,866 --> 00:21:06,899 I know a lot of you were asking 582 00:21:06,900 --> 00:21:08,600 in your case study ideas about 583 00:21:08,600 --> 00:21:10,000 affiliation. 584 00:21:10,000 --> 00:21:11,500 It's especially useful for 585 00:21:11,500 --> 00:21:14,300 affiliation because affiliation 586 00:21:14,300 --> 00:21:15,700 data is unstructured and not 587 00:21:15,700 --> 00:21:18,500 even everyone from the same institution 588 00:21:18,500 --> 00:21:20,300 formats their data the same way, 589 00:21:20,300 --> 00:21:24,000 so you can use a partial match 590 00:21:24,000 --> 00:21:26,200 to capture variations. 591 00:21:26,200 --> 00:21:27,500 Are there any questions about 592 00:21:27,500 --> 00:21:29,500 equals, contains, any of the 593 00:21:29,500 --> 00:21:31,100 same stuff that we've talked 594 00:21:31,100 --> 00:21:32,200 about? 595 00:21:32,200 --> 00:21:34,400 Kate, did I get your question 596 00:21:34,400 --> 00:21:35,400 more or less? 597 00:21:35,400 --> 00:21:37,200 We'll get more variations on 598 00:21:37,200 --> 00:21:39,500 these in a second.
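Here is a minimal sketch of the contains example with Author as the pattern. The author names, affiliations and the search string "Example Univ" are all invented for illustration (the string used on the slide is not captured in the transcript), and xtract is assumed to be installed.

```shell
# Hypothetical record with two authors at different institutions.
sample_xml='<PubmedArticle><MedlineCitation><PMID>11111111</PMID><Article><AuthorList>
<Author><LastName>Doe</LastName><Initials>JA</Initials>
<AffiliationInfo><Affiliation>Dept. of Medicine, Example University, Springfield.</Affiliation></AffiliationInfo></Author>
<Author><LastName>Roe</LastName><Initials>RB</Initials>
<AffiliationInfo><Affiliation>Sample Institute of Technology.</Affiliation></AffiliationInfo></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle>'

# One row per Author, kept only when the Affiliation text contains the string.
# A partial string catches variants such as "Example Univ." and "Example University".
printf '%s\n' "$sample_xml" |
  xtract -pattern Author -if Affiliation -contains "Example Univ" \
    -element LastName Initials Affiliation
```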
599 00:21:39,500 --> 00:21:44,733 It looks like Susan had a 600 00:21:44,733 --> 00:21:47,233 question about that unrecognized 601 00:21:47,233 --> 00:21:48,433 argument if. 602 00:21:48,433 --> 00:21:50,433 Susan, I'm going to handle you 603 00:21:50,433 --> 00:21:52,933 privately during this exercise 604 00:21:52,933 --> 00:21:56,133 because I think I know what your 605 00:21:56,133 --> 00:21:57,433 problem might be but it's a 606 00:21:57,433 --> 00:21:58,833 little more complicated and I 607 00:21:58,833 --> 00:22:02,133 don't want to bog everybody else 608 00:22:02,133 --> 00:22:04,133 down with it, so we'll get the 609 00:22:04,133 --> 00:22:05,533 second exercise started first 610 00:22:05,533 --> 00:22:07,333 and I'll address this issue. 611 00:22:07,333 --> 00:22:09,433 So in the second exercise we again 612 00:22:09,433 --> 00:22:11,733 want to write an xtract command 613 00:22:11,733 --> 00:22:13,033 that only includes PubMed 614 00:22:13,033 --> 00:22:14,766 records for articles published 615 00:22:14,766 --> 00:22:16,966 in one of the JAMA journals. 616 00:22:16,966 --> 00:22:19,666 This could be JAMA itself or it 617 00:22:19,666 --> 00:22:22,566 could be JAMA Cardiology, JAMA 618 00:22:22,566 --> 00:22:25,866 Oncology, JAMA Dermatology, any 619 00:22:25,866 --> 00:22:28,466 of those, but we do know it's 620 00:22:28,466 --> 00:22:30,766 going to start with JAMA. 621 00:22:30,766 --> 00:22:32,966 We want one row per PubMed 622 00:22:32,966 --> 00:22:35,466 record and we want two columns, 623 00:22:35,466 --> 00:22:39,466 PMID and ISOAbbreviation, which we talked 624 00:22:39,466 --> 00:22:41,166 about earlier. 625 00:22:41,166 --> 00:22:43,866 We can use the Efetch here. 626 00:22:43,866 --> 00:22:45,366 There might be a couple of 627 00:22:45,366 --> 00:22:48,299 different ways to do this one. 628 00:22:48,300 --> 00:22:52,100 Feel free to look at the bottom. 629 00:22:52,100 --> 00:22:55,800 If your answer works, then 630 00:22:55,800 --> 00:22:56,500 that's great.
631 00:22:56,500 --> 00:22:57,900 I'll give you a little bit of 632 00:22:57,900 --> 00:22:59,633 time to work on this, and then 633 00:22:59,633 --> 00:23:01,333 if you do have questions at any 634 00:23:01,333 --> 00:23:04,133 time, drop them in the chat 635 00:23:04,133 --> 00:23:06,733 box, and give me a green check when 636 00:23:06,733 --> 00:23:10,266 you're all set. 637 00:23:10,266 --> 00:23:14,399 Folks, let's go through this one 638 00:23:14,400 --> 00:23:14,800 together. 639 00:23:14,800 --> 00:23:16,800 Wow, everybody figured it out. 640 00:23:16,800 --> 00:23:17,633 That's great. 641 00:23:17,633 --> 00:23:19,433 Well, I'll go through it quick 642 00:23:19,433 --> 00:23:19,833 then. 643 00:23:19,833 --> 00:23:22,333 All right. 644 00:23:22,333 --> 00:23:23,733 Here, clear this out. 645 00:23:23,733 --> 00:23:25,533 And again I'm going to do that 646 00:23:25,533 --> 00:23:27,533 same sort of pattern I've done 647 00:23:27,533 --> 00:23:28,133 before. 648 00:23:28,133 --> 00:23:30,333 I'm going to take my Efetch 649 00:23:30,333 --> 00:23:32,233 first and build my xtract 650 00:23:32,233 --> 00:23:33,133 without the condition. 651 00:23:33,133 --> 00:23:34,933 I'm just going to make sure I 652 00:23:34,933 --> 00:23:36,866 get all of the data first. 653 00:23:36,866 --> 00:23:39,066 So pattern is going to be PubmedArticle 654 00:23:39,066 --> 00:23:45,666 and we wanted PMID and 655 00:23:45,666 --> 00:23:50,366 ISOAbbreviation. 656 00:23:50,366 --> 00:23:53,099 Hopefully spelled right. 657 00:23:53,100 --> 00:23:55,200 When I execute that I should get 658 00:23:55,200 --> 00:23:58,000 one row per record and the two 659 00:23:58,000 --> 00:23:58,800 columns, 660 00:23:58,800 --> 00:24:01,100 PMID and the ISOAbbreviation.
661 00:24:01,100 --> 00:24:03,300 So now we should be only getting 662 00:24:03,300 --> 00:24:05,000 two rows if we do our condition 663 00:24:05,000 --> 00:24:06,600 correctly because we only want 664 00:24:06,600 --> 00:24:07,600 those first two. 665 00:24:07,600 --> 00:24:09,200 I'm going to take again that 666 00:24:09,200 --> 00:24:11,100 same code block, and I'm going 667 00:24:11,100 --> 00:24:13,700 to go back here and put in my 668 00:24:13,700 --> 00:24:18,900 condition, if ISOAbbreviation, 669 00:24:18,900 --> 00:24:21,400 and I can't do equals because we 670 00:24:21,400 --> 00:24:24,633 don't know that JAMA -- equals 671 00:24:24,633 --> 00:24:26,033 would only get this second row. 672 00:24:26,033 --> 00:24:29,933 We want both of those first two. 673 00:24:29,933 --> 00:24:31,833 I could do contains but I'm 674 00:24:31,833 --> 00:24:34,433 going to do starts with. 675 00:24:34,433 --> 00:24:36,466 Either would probably work fine. 676 00:24:36,466 --> 00:24:38,866 I do starts with, JAMA. 677 00:24:38,866 --> 00:24:40,866 Make sure I get my space in 678 00:24:40,866 --> 00:24:41,266 there. 679 00:24:41,266 --> 00:24:43,366 And when I execute that I'm down 680 00:24:43,366 --> 00:24:45,866 to just those two rows. 681 00:24:45,866 --> 00:24:46,266 Okay. 682 00:24:46,266 --> 00:24:48,266 Any questions about any of that? 683 00:24:48,266 --> 00:24:50,666 Clear the feedback. 684 00:24:50,666 --> 00:24:52,366 Okay, no questions for now. 685 00:24:52,366 --> 00:24:53,866 That's good because we're going 686 00:24:53,866 --> 00:24:55,366 to move on to something that's a 687 00:24:55,366 --> 00:24:57,066 little bit trickier.
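One way the exercise 2 answer might look, sketched against invented sample records rather than the PMIDs used in the session (those are not in the transcript). It assumes xtract is installed and uses the starts-with condition mentioned above.

```shell
# Hypothetical records from three journals, two of them in the JAMA family.
sample_xml='<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID>11111111</PMID><Article><Journal>
<ISOAbbreviation>JAMA</ISOAbbreviation></Journal></Article></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>22222222</PMID><Article><Journal>
<ISOAbbreviation>JAMA Cardiol</ISOAbbreviation></Journal></Article></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>33333333</PMID><Article><Journal>
<ISOAbbreviation>N Engl J Med</ISOAbbreviation></Journal></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Build the xtract without the condition first (all three rows come back),
# then add -if ... -starts-with to keep only the JAMA family.
printf '%s\n' "$sample_xml" |
  xtract -pattern PubmedArticle -if ISOAbbreviation -starts-with "JAMA" \
    -element MedlineCitation/PMID ISOAbbreviation
```

Swapping -starts-with for -contains would give the same rows for this sample; -equals would drop the "JAMA Cardiol" record.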
688 00:24:57,066 --> 00:24:58,399 If you are still working on 689 00:24:58,400 --> 00:25:01,100 that, put that one aside for now 690 00:25:01,100 --> 00:25:03,200 because so far all of our 691 00:25:03,200 --> 00:25:04,600 conditional examples have been 692 00:25:04,600 --> 00:25:06,600 in a pattern, and we're only 693 00:25:06,600 --> 00:25:09,000 including that pattern if the 694 00:25:09,000 --> 00:25:10,200 condition is met. 695 00:25:10,200 --> 00:25:12,400 However, we can also put an 696 00:25:12,400 --> 00:25:14,900 if inside a block, and if you 697 00:25:14,900 --> 00:25:17,000 remember back to last time, Kate 698 00:25:17,000 --> 00:25:18,700 showed you how to use block very 699 00:25:18,700 --> 00:25:20,300 effectively when we were dealing 700 00:25:20,300 --> 00:25:22,200 with authors to keep the 701 00:25:22,200 --> 00:25:23,900 corresponding last name and 702 00:25:23,900 --> 00:25:25,500 initials together. 703 00:25:25,500 --> 00:25:27,000 What xtract did was look for 704 00:25:27,000 --> 00:25:28,800 the first occurrence of the 705 00:25:28,800 --> 00:25:30,933 block Author and then moved on 706 00:25:30,933 --> 00:25:32,433 to the next block and then the 707 00:25:32,433 --> 00:25:34,133 next block and the next block 708 00:25:34,133 --> 00:25:36,266 until there were no more blocks 709 00:25:36,266 --> 00:25:37,366 in the pattern. 710 00:25:37,366 --> 00:25:40,266 However, if we put an if inside 711 00:25:40,266 --> 00:25:42,766 the block, we can include only 712 00:25:42,766 --> 00:25:44,666 certain blocks based on if they 713 00:25:44,666 --> 00:25:45,866 meet a condition. 714 00:25:45,866 --> 00:25:47,566 Instead of filtering out 715 00:25:47,566 --> 00:25:51,266 entire patterns that don't 716 00:25:51,266 --> 00:25:53,266 meet our condition, we'll 717 00:25:53,266 --> 00:25:55,266 include the pattern. 718 00:25:55,266 --> 00:25:57,166 We're going to clear the screen 719 00:25:57,166 --> 00:25:59,766 and talk through this example.
720 00:25:59,766 --> 00:26:02,766 Let's say we want to see the 721 00:26:02,766 --> 00:26:04,199 PMIDs for all of the records in 722 00:26:04,200 --> 00:26:07,900 our Efetch and also the DOI. 723 00:26:07,900 --> 00:26:10,500 The DOI, if there is one for a 724 00:26:10,500 --> 00:26:10,733 record. 725 00:26:10,733 --> 00:26:13,799 The DOI is in the ArticleId 726 00:26:13,800 --> 00:26:15,800 element but we have to be 727 00:26:15,800 --> 00:26:18,300 careful because while it can 728 00:26:18,300 --> 00:26:23,600 include the DOI, it can also hold a number of 729 00:26:23,600 --> 00:26:24,800 different identifiers. 730 00:26:24,800 --> 00:26:26,400 Many records have multiple ArticleId 731 00:26:26,400 --> 00:26:28,000 elements, each with a 732 00:26:28,000 --> 00:26:29,500 different kind of identifier. 733 00:26:29,500 --> 00:26:31,700 What we need to look for is the 734 00:26:31,700 --> 00:26:34,600 IdType attribute which tells us 735 00:26:34,600 --> 00:26:41,766 what kind of ID is in there. 736 00:26:41,766 --> 00:26:43,066 Let's get started. 737 00:26:43,066 --> 00:26:45,166 I'm just going to copy and paste 738 00:26:45,166 --> 00:26:47,466 the whole block in here and 739 00:26:47,466 --> 00:26:49,366 we'll talk through it. 740 00:26:49,366 --> 00:26:50,766 Start with our Efetch. 741 00:26:50,766 --> 00:26:52,866 And then if you look at our 742 00:26:52,866 --> 00:26:54,566 xtract, we're going to create a 743 00:26:54,566 --> 00:26:56,466 new row for every PubMed record 744 00:26:56,466 --> 00:26:58,666 since PubmedArticle is our 745 00:26:58,666 --> 00:26:59,166 pattern. 746 00:26:59,166 --> 00:27:01,166 We're going to have the PMID for 747 00:27:01,166 --> 00:27:01,966 each record.
748 00:27:01,966 --> 00:27:03,166 Since there's no condition 749 00:27:03,166 --> 00:27:04,266 before the element argument, 750 00:27:04,266 --> 00:27:06,166 there's no if there, so every 751 00:27:06,166 --> 00:27:07,466 record that we put in is 752 00:27:07,466 --> 00:27:09,266 going to generate a row in the 753 00:27:09,266 --> 00:27:10,066 output. 754 00:27:10,066 --> 00:27:11,899 Each of them is going to have a 755 00:27:11,900 --> 00:27:12,800 PMID. 756 00:27:12,800 --> 00:27:15,000 Then we have our block argument 757 00:27:15,000 --> 00:27:17,200 and block ArticleId, which 758 00:27:17,200 --> 00:27:19,100 allows us to focus on the 759 00:27:19,100 --> 00:27:22,300 contents of one ArticleId element at a 760 00:27:22,300 --> 00:27:22,800 time. 761 00:27:22,800 --> 00:27:25,200 Inside that block we're going to 762 00:27:25,200 --> 00:27:28,500 put our if and equals and our 763 00:27:28,500 --> 00:27:29,100 element. 764 00:27:29,100 --> 00:27:31,700 So xtract is going to check 765 00:27:31,700 --> 00:27:34,500 each ArticleId block in a pattern to 766 00:27:34,500 --> 00:27:38,333 see if the IdType attribute equals 767 00:27:38,333 --> 00:27:40,833 doi for that element. 768 00:27:40,833 --> 00:27:42,833 If it does, we'll put it in our 769 00:27:42,833 --> 00:27:43,666 second column. 770 00:27:43,666 --> 00:27:45,566 If it doesn't, we're going to 771 00:27:45,566 --> 00:27:46,766 skip the block and check the 772 00:27:46,766 --> 00:27:47,766 next one. 773 00:27:47,766 --> 00:27:49,566 We keep checking through each of 774 00:27:49,566 --> 00:27:54,966 the article's IDs. 775 00:27:54,966 --> 00:27:57,166 We'll see what it looks like.
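The if-inside-a-block command being talked through can be sketched as follows. The three records and their identifiers are invented for illustration (the session's own PMIDs are not in the transcript), and xtract is assumed to be installed.

```shell
# Hypothetical records: the first two have a DOI, the third has none.
sample_xml='<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID>11111111</PMID></MedlineCitation><PubmedData><ArticleIdList>
<ArticleId IdType="pubmed">11111111</ArticleId>
<ArticleId IdType="doi">10.1000/example.1</ArticleId>
</ArticleIdList></PubmedData></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>22222222</PMID></MedlineCitation><PubmedData><ArticleIdList>
<ArticleId IdType="pubmed">22222222</ArticleId>
<ArticleId IdType="doi">10.1000/example.2</ArticleId>
<ArticleId IdType="pmc">PMC0000002</ArticleId>
</ArticleIdList></PubmedData></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>33333333</PMID></MedlineCitation><PubmedData><ArticleIdList>
<ArticleId IdType="pubmed">33333333</ArticleId>
</ArticleIdList></PubmedData></PubmedArticle>
</PubmedArticleSet>'

# No -if before the first -element, so every record gets a row with its PMID.
# Inside each ArticleId block, the ID is printed only when IdType equals doi.
printf '%s\n' "$sample_xml" |
  xtract -pattern PubmedArticle -element MedlineCitation/PMID \
    -block ArticleId -if ArticleId@IdType -equals doi -element ArticleId
```

Three records go in and three rows should come out; the third row has a PMID but nothing in the DOI column.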
776 00:27:57,166 --> 00:27:59,966 So, as you can see, we get three 777 00:27:59,966 --> 00:28:01,566 rows because we have three 778 00:28:01,566 --> 00:28:03,066 records going in, we're going to 779 00:28:03,066 --> 00:28:04,666 have three records coming out 780 00:28:04,666 --> 00:28:05,666 because there's no condition on 781 00:28:05,666 --> 00:28:06,866 the pattern. 782 00:28:06,866 --> 00:28:09,166 Each row is going to have a PMID 783 00:28:09,166 --> 00:28:10,266 in the first column. 784 00:28:10,266 --> 00:28:12,066 The second column in each row 785 00:28:12,066 --> 00:28:14,266 will either be the article ID, 786 00:28:14,266 --> 00:28:17,999 if there's a DOI, or it's going 787 00:28:18,000 --> 00:28:20,500 to be blank if none of those for 788 00:28:20,500 --> 00:28:23,300 that record have a DOI in them. 789 00:28:23,300 --> 00:28:26,500 I know that's a complicated use 790 00:28:26,500 --> 00:28:28,700 and complicated situation, it is 791 00:28:28,700 --> 00:28:31,200 a very useful one though. 792 00:28:31,200 --> 00:28:33,000 So if anybody has any questions, 793 00:28:33,000 --> 00:28:34,900 I'll be happy to address those 794 00:28:34,900 --> 00:28:35,500 now. 795 00:28:35,500 --> 00:28:36,600 Looks okay for now. 796 00:28:36,600 --> 00:28:41,100 So we are going to go on. 797 00:28:41,100 --> 00:28:45,300 Can we add no for no values? 798 00:28:45,300 --> 00:28:47,800 That's a really good question, 799 00:28:47,800 --> 00:28:51,833 Tom. 800 00:28:51,833 --> 00:28:52,833 Not exactly. 801 00:28:52,833 --> 00:28:57,733 In that case you'd probably want 802 00:28:57,733 --> 00:28:59,033 to use is not. 803 00:28:59,033 --> 00:29:01,433 There's a couple of semi 804 00:29:01,433 --> 00:29:04,233 advanced options but a little 805 00:29:04,233 --> 00:29:05,733 outside the scope. 806 00:29:05,733 --> 00:29:07,133 We can maybe talk about that 807 00:29:07,133 --> 00:29:08,433 later, Tom. 808 00:29:08,433 --> 00:29:10,833 So let me look into that for you.
809 00:29:10,833 --> 00:29:18,133 Kate is asking: when I repeat a 810 00:29:18,133 --> 00:29:19,333 previous command that was across 811 00:29:19,333 --> 00:29:21,033 two lines using slash, it 812 00:29:21,033 --> 00:29:22,733 remains across two lines and 813 00:29:22,733 --> 00:29:23,666 easy to read. 814 00:29:23,666 --> 00:29:25,366 When I do that with the up 815 00:29:25,366 --> 00:29:27,566 arrow, the line break 816 00:29:27,566 --> 00:29:27,866 disappears. 817 00:29:27,866 --> 00:29:29,266 Any ideas? 818 00:29:29,266 --> 00:29:31,666 Yes, I'm just pasting it over 819 00:29:31,666 --> 00:29:34,466 and over again which looks nicer 820 00:29:34,466 --> 00:29:38,699 on the screen. 821 00:29:38,700 --> 00:29:39,300 All right. 822 00:29:39,300 --> 00:29:39,700 Good. 823 00:29:39,700 --> 00:29:41,600 Let's move on because we've got 824 00:29:41,600 --> 00:29:42,900 some other interesting stuff to 825 00:29:42,900 --> 00:29:43,900 talk about. 826 00:29:43,900 --> 00:29:44,400 Okay. 827 00:29:44,400 --> 00:29:49,900 Get back to my slides. 828 00:29:49,900 --> 00:29:51,800 So there will be times when you 829 00:29:51,800 --> 00:29:53,500 want to include multiple 830 00:29:53,500 --> 00:29:55,200 conditions in the same xtract 831 00:29:55,200 --> 00:29:57,000 and include only patterns or 832 00:29:57,000 --> 00:29:58,933 blocks that meet at least one of 833 00:29:58,933 --> 00:30:01,133 a few different conditions or 834 00:30:01,133 --> 00:30:03,033 that meet all of a few different 835 00:30:03,033 --> 00:30:05,733 conditions, and in order to do 836 00:30:05,733 --> 00:30:08,433 that, you'll use the or and and 837 00:30:08,433 --> 00:30:13,133 arguments, and if you're familiar 838 00:30:13,133 --> 00:30:16,633 with Boolean logic, you know where I'm going 839 00:30:16,633 --> 00:30:17,233 with this. 840 00:30:17,233 --> 00:30:19,233 With and, all of the conditions 841 00:30:19,233 --> 00:30:23,733 you specify must be true.
842 00:30:23,733 --> 00:30:27,033 Let me show you the syntax for 843 00:30:27,033 --> 00:30:27,533 this. 844 00:30:27,533 --> 00:30:29,133 As you can see here, we're going 845 00:30:29,133 --> 00:30:31,266 to follow the same pattern for 846 00:30:31,266 --> 00:30:35,899 our first condition: if, and then we 847 00:30:35,900 --> 00:30:37,100 have an argument here that 848 00:30:37,100 --> 00:30:39,700 spells out an element or 849 00:30:39,700 --> 00:30:41,900 attribute, and it can have an 850 00:30:41,900 --> 00:30:44,600 equals argument or contains. 851 00:30:44,600 --> 00:30:46,800 Now for a second condition 852 00:30:46,800 --> 00:30:48,900 rather than starting with if, we 853 00:30:48,900 --> 00:30:51,400 start with or, and then we can 854 00:30:51,400 --> 00:30:54,800 also use equals or contains as 855 00:30:54,800 --> 00:30:56,900 we would with if. 856 00:30:56,900 --> 00:30:58,800 This is a modified version of 857 00:30:58,800 --> 00:31:00,300 our previous example. 858 00:31:00,300 --> 00:31:01,900 However, this time we will be 859 00:31:01,900 --> 00:31:04,233 including the article ID, if 860 00:31:04,233 --> 00:31:14,633 it's a DOI or if it's a PMCID. 861 00:31:14,633 --> 00:31:19,033 If the value of the IdType 862 00:31:19,033 --> 00:31:23,233 attribute equals doi -- jumped 863 00:31:23,233 --> 00:31:26,833 ahead of myself -- or the IdType 864 00:31:26,833 --> 00:31:32,633 attribute equals pmc, then put 865 00:31:32,633 --> 00:31:35,766 the article ID in 866 00:31:35,766 --> 00:31:37,366 the second column. 867 00:31:37,366 --> 00:31:40,799 Otherwise we'll skip the ArticleId block 868 00:31:40,800 --> 00:31:43,200 and check the next one, to see whether that 869 00:31:43,200 --> 00:31:44,800 one meets either of these 870 00:31:44,800 --> 00:31:45,100 conditions.
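A sketch of the or version of the DOI example follows. The record and its identifiers are hypothetical; pmc is assumed to be the IdType value PubMed uses for PMCIDs, and xtract is assumed to be installed.

```shell
# Hypothetical record carrying a PubMed ID, a DOI and a PMCID.
sample_xml='<PubmedArticle><MedlineCitation><PMID>22222222</PMID></MedlineCitation><PubmedData><ArticleIdList>
<ArticleId IdType="pubmed">22222222</ArticleId>
<ArticleId IdType="doi">10.1000/example.2</ArticleId>
<ArticleId IdType="pmc">PMC0000002</ArticleId>
</ArticleIdList></PubmedData></PubmedArticle>'

# The second condition starts with -or instead of -if. An ArticleId block is
# printed when its IdType is doi or pmc; the pubmed one is skipped.
printf '%s\n' "$sample_xml" |
  xtract -pattern PubmedArticle -element MedlineCitation/PMID \
    -block ArticleId -if ArticleId@IdType -equals doi \
      -or ArticleId@IdType -equals pmc -element ArticleId
```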
871 00:31:45,100 --> 00:31:46,800 So let me show you what that 872 00:31:46,800 --> 00:31:48,000 looks like especially in 873 00:31:48,000 --> 00:31:49,900 comparison to our previous 874 00:31:49,900 --> 00:31:56,200 example. 875 00:31:56,200 --> 00:31:58,300 So we basically are doing a very 876 00:31:58,300 --> 00:31:59,300 similar thing. 877 00:31:59,300 --> 00:32:01,700 We're just adding in this extra 878 00:32:01,700 --> 00:32:04,100 or here, and when I execute 879 00:32:04,100 --> 00:32:06,400 this, we'll see that because the 880 00:32:06,400 --> 00:32:08,300 process repeats for every 881 00:32:08,300 --> 00:32:10,833 ArticleId in the pattern, if a 882 00:32:10,833 --> 00:32:13,633 PubMed record has both a DOI and 883 00:32:13,633 --> 00:32:17,633 a PMCID, then both will be 884 00:32:17,633 --> 00:32:19,433 presented in that row in the 885 00:32:19,433 --> 00:32:21,033 second column. 886 00:32:21,033 --> 00:32:24,433 So, again, or means meeting one or the 887 00:32:24,433 --> 00:32:28,133 other or both of the conditions. 888 00:32:28,133 --> 00:32:34,133 Any questions about that? 889 00:32:34,133 --> 00:32:36,766 Okay. 890 00:32:36,766 --> 00:32:39,366 Going to slide right on into and 891 00:32:39,366 --> 00:32:40,966 then because and is very 892 00:32:40,966 --> 00:32:41,666 similar. 893 00:32:41,666 --> 00:32:44,799 Its syntax is exactly the same. 894 00:32:44,800 --> 00:32:46,700 In order for a block or pattern 895 00:32:46,700 --> 00:32:49,900 to be included, it must meet all 896 00:32:49,900 --> 00:32:51,700 of the conditions that you and 897 00:32:51,700 --> 00:32:52,100 together. 898 00:32:52,100 --> 00:32:54,600 For this example, I'll also point 899 00:32:54,600 --> 00:32:57,400 out that our if is not followed by 900 00:32:57,400 --> 00:33:00,200 an equals argument. 901 00:33:00,200 --> 00:33:03,300 Remember we don't need equals.
902 00:33:03,300 --> 00:33:06,000 We can just have if or and by 903 00:33:06,000 --> 00:33:07,800 itself and I'll show you what 904 00:33:07,800 --> 00:33:11,000 that does, using our 905 00:33:11,000 --> 00:33:11,800 familiar structure. 906 00:33:11,800 --> 00:33:16,500 If the pattern has a LastName 907 00:33:16,500 --> 00:33:18,433 element and the pattern has an 908 00:33:18,433 --> 00:33:20,233 Affiliation element, because 909 00:33:20,233 --> 00:33:21,933 we're just checking for the 910 00:33:21,933 --> 00:33:23,233 presence. 911 00:33:23,233 --> 00:33:25,333 If both of those things are 912 00:33:25,333 --> 00:33:27,733 true, then create a new row for the 913 00:33:27,733 --> 00:33:29,633 pattern with last name, 914 00:33:29,633 --> 00:33:32,133 initials and affiliation. 915 00:33:32,133 --> 00:33:34,633 If either one of those things 916 00:33:34,633 --> 00:33:37,366 is not true, then we'll skip the 917 00:33:37,366 --> 00:33:37,766 pattern. 918 00:33:37,766 --> 00:33:39,866 I have one more and example to 919 00:33:39,866 --> 00:33:40,466 show you. 920 00:33:40,466 --> 00:33:42,666 For this one we only want to 921 00:33:42,666 --> 00:33:44,266 include records that have any 922 00:33:44,266 --> 00:33:46,266 MeSH heading that contains the 923 00:33:46,266 --> 00:33:48,666 words Zika virus and that have 924 00:33:48,666 --> 00:33:52,999 the MeSH heading Microcephaly. 925 00:33:53,000 --> 00:33:54,700 Both conditions must be true. 926 00:33:54,700 --> 00:33:57,200 Here's my xtract, a little bit 927 00:33:57,200 --> 00:33:59,300 longer but just mostly because 928 00:33:59,300 --> 00:34:01,000 we have longer element names.
929 00:34:01,000 --> 00:34:02,900 If the pattern has a DescriptorName 930 00:34:02,900 --> 00:34:04,600 element that contains the 931 00:34:04,600 --> 00:34:06,700 string Zika virus and the 932 00:34:06,700 --> 00:34:08,800 pattern has a DescriptorName 933 00:34:08,800 --> 00:34:10,100 element that equals 934 00:34:10,100 --> 00:34:12,800 Microcephaly, then we will 935 00:34:12,800 --> 00:34:15,200 create a new row for the pattern 936 00:34:15,200 --> 00:34:17,600 with PMID and article title. 937 00:34:17,600 --> 00:34:19,400 If either of those things is not 938 00:34:19,400 --> 00:34:20,900 true, we're going to skip the 939 00:34:20,900 --> 00:34:22,400 pattern and move on. 940 00:34:22,400 --> 00:34:24,433 I want to take a closer look at 941 00:34:24,433 --> 00:34:30,433 what's happening here. 942 00:34:30,433 --> 00:34:31,833 This record would actually be 943 00:34:31,833 --> 00:34:33,533 included in our results because 944 00:34:33,533 --> 00:34:37,066 it has a DescriptorName that 945 00:34:37,066 --> 00:34:41,266 contains the string Zika virus, 946 00:34:41,266 --> 00:34:43,166 and because it has a DescriptorName 947 00:34:43,166 --> 00:34:45,566 element that equals 948 00:34:45,566 --> 00:34:46,766 Microcephaly. 949 00:34:46,766 --> 00:34:48,266 Both things are true. 950 00:34:48,266 --> 00:34:50,966 This is why we use contains for 951 00:34:50,966 --> 00:34:53,266 Zika virus instead of equals. 952 00:34:53,266 --> 00:34:56,566 We want to make sure we include records 953 00:34:56,566 --> 00:34:58,599 like this one that don't have the 954 00:34:58,600 --> 00:35:00,600 exact heading Zika virus. 955 00:35:00,600 --> 00:35:07,100 Any questions about and, or, 956 00:35:07,100 --> 00:35:08,800 anything else that I have not 957 00:35:08,800 --> 00:35:12,400 addressed? 958 00:35:12,400 --> 00:35:15,700 Why is Microcephaly not in quotation 959 00:35:15,700 --> 00:35:17,300 marks?
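The and example with MeSH headings might be written roughly as below. The two records, their titles and their headings are invented for illustration, and xtract is assumed to be installed. Note that the value with a space in it is quoted while the single-word value is not.

```shell
# Hypothetical records: only the first has both kinds of MeSH heading.
sample_xml='<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID>11111111</PMID>
<Article><ArticleTitle>A hypothetical study of Zika and microcephaly.</ArticleTitle></Article>
<MeshHeadingList>
<MeshHeading><DescriptorName>Zika Virus Infection</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName>Microcephaly</DescriptorName></MeshHeading>
</MeshHeadingList></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>22222222</PMID>
<Article><ArticleTitle>A hypothetical study of Zika alone.</ArticleTitle></Article>
<MeshHeadingList>
<MeshHeading><DescriptorName>Zika Virus</DescriptorName></MeshHeading>
</MeshHeadingList></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Both conditions must hold somewhere in the record: a DescriptorName that
# contains "Zika Virus" and a DescriptorName that equals Microcephaly.
printf '%s\n' "$sample_xml" |
  xtract -pattern PubmedArticle -if DescriptorName -contains "Zika Virus" \
    -and DescriptorName -equals Microcephaly \
    -element MedlineCitation/PMID ArticleTitle
```

Using -contains for the first condition lets the "Zika Virus Infection" heading qualify, which -equals would not.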
960 00:35:17,300 --> 00:35:18,900 Always wonder if somebody's 961 00:35:18,900 --> 00:35:22,400 going to pick up on that. 962 00:35:22,400 --> 00:35:24,000 Zika virus needs them because there's 963 00:35:24,000 --> 00:35:25,000 a space in it. 964 00:35:25,000 --> 00:35:27,800 Because there's a space, if you 965 00:35:27,800 --> 00:35:30,400 did not put it in quotes, 966 00:35:30,400 --> 00:35:32,233 you'd encounter some trouble. 967 00:35:32,233 --> 00:35:35,566 It would try contains Zika and 968 00:35:35,566 --> 00:35:38,966 then virus would just be hanging 969 00:35:38,966 --> 00:35:40,866 out there so you need to quote 970 00:35:40,866 --> 00:35:41,066 it. 971 00:35:41,066 --> 00:35:49,166 Any other questions? 972 00:35:49,166 --> 00:35:50,266 Is there a way to group 973 00:35:50,266 --> 00:35:51,466 conditionals if there are more 974 00:35:51,466 --> 00:35:52,766 than two? 975 00:35:52,766 --> 00:35:55,066 I don't believe there is at the 976 00:35:55,066 --> 00:35:55,766 current time. 977 00:35:55,766 --> 00:35:57,366 I was talking to a developer 978 00:35:57,366 --> 00:35:57,966 about that. 979 00:35:57,966 --> 00:36:02,766 I think that the syntax gets a 980 00:36:02,766 --> 00:36:05,499 little too complicated. 981 00:36:05,500 --> 00:36:07,200 If you're talking about complex 982 00:36:07,200 --> 00:36:08,900 stuff, what you're better off 983 00:36:08,900 --> 00:36:11,800 doing is using some non-EDirect 984 00:36:11,800 --> 00:36:13,300 programming tools but that's a 985 00:36:13,300 --> 00:36:14,800 little bit beyond our scope 986 00:36:14,800 --> 00:36:15,600 right here. 987 00:36:15,600 --> 00:36:17,400 There's also other ways to think 988 00:36:17,400 --> 00:36:18,100 about it.
989 00:36:18,100 --> 00:36:19,700 You can put some of your 990 00:36:19,700 --> 00:36:20,900 conditions in your search, for 991 00:36:20,900 --> 00:36:23,100 example, or you can run a set of 992 00:36:23,100 --> 00:36:25,400 conditions on a list of PMIDs 993 00:36:25,400 --> 00:36:27,900 and feed that back in and run 994 00:36:27,900 --> 00:36:29,400 another set of conditions, 995 00:36:29,400 --> 00:36:30,700 so there's a couple of different 996 00:36:30,700 --> 00:36:32,600 ways you could do that. 997 00:36:32,600 --> 00:36:35,633 Grouping things like that is not 998 00:36:35,633 --> 00:36:39,266 currently possible. 999 00:36:39,266 --> 00:36:39,766 Okay. 1000 00:36:39,766 --> 00:36:41,266 Going to go on to exercise 1001 00:36:41,266 --> 00:36:41,566 three. 1002 00:36:41,566 --> 00:36:44,166 This one is a little bit of a 1003 00:36:44,166 --> 00:36:47,866 tricky one. 1004 00:36:47,866 --> 00:36:49,766 Trying to build an entire 1005 00:36:49,766 --> 00:36:51,866 solution to a problem, an entire 1006 00:36:51,866 --> 00:36:52,166 script. 1007 00:36:52,166 --> 00:36:53,566 You did a little of that in 1008 00:36:53,566 --> 00:36:55,866 previous exercises but I want to 1009 00:36:55,866 --> 00:36:57,966 have you do one of these right 1010 00:36:57,966 --> 00:36:58,266 here. 1011 00:36:58,266 --> 00:37:02,366 We want to do a search for BH Smith 1012 00:37:02,366 --> 00:37:04,066 as an author and see different 1013 00:37:04,066 --> 00:37:06,566 affiliations listed for that 1014 00:37:06,566 --> 00:37:07,166 author. 1015 00:37:07,166 --> 00:37:08,866 We're going to limit it to 1016 00:37:08,866 --> 00:37:11,599 publications from 2011-2016. 1017 00:37:11,600 --> 00:37:13,700 We only want to see the 1018 00:37:13,700 --> 00:37:16,200 affiliation data for BH Smith 1019 00:37:16,200 --> 00:37:17,800 regardless of how many authors 1020 00:37:17,800 --> 00:37:19,000 are on the citation.
1021 00:37:19,000 --> 00:37:21,200 We want our output to be a table 1022 00:37:21,200 --> 00:37:22,900 of citations. 1023 00:37:22,900 --> 00:37:24,500 We want the PMID. 1024 00:37:24,500 --> 00:37:27,800 We want the author last name and 1025 00:37:27,800 --> 00:37:30,700 initials which should always be 1026 00:37:30,700 --> 00:37:32,800 Smith space BH because of our 1027 00:37:32,800 --> 00:37:33,300 condition. 1028 00:37:33,300 --> 00:37:34,800 It's nice to put it in there to 1029 00:37:34,800 --> 00:37:35,833 be sure your condition is 1030 00:37:35,833 --> 00:37:37,233 working properly and also we 1031 00:37:37,233 --> 00:37:39,233 want the affiliation. 1032 00:37:39,233 --> 00:37:41,033 Again, we want the whole script, 1033 00:37:41,033 --> 00:37:42,833 not just the xtract command. 1034 00:37:42,833 --> 00:37:44,366 If you remember back to some of 1035 00:37:44,366 --> 00:37:45,966 our previous examples, we did 1036 00:37:45,966 --> 00:37:47,766 some searches for authors. 1037 00:37:47,766 --> 00:37:50,166 If you are not as familiar with 1038 00:37:50,166 --> 00:37:51,966 searching for authors in PubMed, 1039 00:37:51,966 --> 00:37:54,966 you should search last name 1040 00:37:54,966 --> 00:37:56,566 space initials tagged with 1041 00:37:56,566 --> 00:37:57,666 author in brackets. 1042 00:37:57,666 --> 00:37:59,866 If you search in other ways, you 1043 00:37:59,866 --> 00:38:01,366 will not get the results you're 1044 00:38:01,366 --> 00:38:02,066 looking for. 1045 00:38:02,066 --> 00:38:03,466 I'll give you more time to work 1046 00:38:03,466 --> 00:38:04,666 on this because it's a little 1047 00:38:04,666 --> 00:38:05,966 bit trickier. 1048 00:38:05,966 --> 00:38:08,066 Give me a green check mark when 1049 00:38:08,066 --> 00:38:09,566 you're all set. 1050 00:38:09,566 --> 00:38:10,866 Have any questions, throw them 1051 00:38:10,866 --> 00:38:11,866 in the chat box. 1052 00:38:11,866 --> 00:38:13,066 Hints at the bottom.
1053 00:38:13,066 --> 00:38:13,866 Usual deal. 1054 00:38:13,866 --> 00:38:14,166 All right. 1055 00:38:14,166 --> 00:38:15,432 Go to it. 1056 00:38:15,433 --> 00:38:18,299 >>> I know this one's a tricky 1057 00:38:18,300 --> 00:38:18,700 one. 1058 00:38:18,700 --> 00:38:20,400 We're going to talk through this 1059 00:38:20,400 --> 00:38:22,600 one together. 1060 00:38:22,600 --> 00:38:25,933 So let me hop over and clear my 1061 00:38:25,933 --> 00:38:26,333 screen. 1062 00:38:26,333 --> 00:38:28,533 I'm just going to -- rather than 1063 00:38:28,533 --> 00:38:31,733 type it out, I'm going to copy 1064 00:38:31,733 --> 00:38:33,533 and paste it because it's a 1065 00:38:33,533 --> 00:38:38,833 little on the long side. 1066 00:38:38,833 --> 00:38:40,333 Okay. 1067 00:38:40,333 --> 00:38:42,033 So again this is another one of 1068 00:38:42,033 --> 00:38:43,333 those things that has a couple 1069 00:38:43,333 --> 00:38:45,133 of different ways to do it. 1070 00:38:45,133 --> 00:38:46,833 I'll show you my way, but, if 1071 00:38:46,833 --> 00:38:48,866 you got answers, then you're 1072 00:38:48,866 --> 00:38:49,466 fine. 1073 00:38:49,466 --> 00:38:51,466 It doesn't matter if you did it 1074 00:38:51,466 --> 00:38:53,566 exactly the way I did it. 1075 00:38:53,566 --> 00:38:54,766 Though you're generally going to 1076 00:38:54,766 --> 00:38:56,566 want to start off with an 1077 00:38:56,566 --> 00:38:59,966 Esearch. 1078 00:38:59,966 --> 00:39:05,166 My query, as I suggested, is Smith 1079 00:39:05,166 --> 00:39:09,366 space BH. 1080 00:39:09,366 --> 00:39:11,266 The one thing that I would 1081 00:39:11,266 --> 00:39:13,066 caution you about that is that 1082 00:39:13,066 --> 00:39:14,866 will only work if all of 1083 00:39:14,866 --> 00:39:19,266 BH Smith's papers have his or 1084 00:39:19,266 --> 00:39:21,599 her ORCID on them.
1085 00:39:21,600 --> 00:39:23,500 We did not start accepting 1086 00:39:23,500 --> 00:39:26,133 ORCIDs until somewhat more 1087 00:39:26,133 --> 00:39:27,833 recently, in the last couple of years, 1088 00:39:27,833 --> 00:39:30,733 and not all publishers submit 1089 00:39:30,733 --> 00:39:32,433 them and not all authors put 1090 00:39:32,433 --> 00:39:34,833 them on all papers and authors 1091 00:39:34,833 --> 00:39:36,233 who don't get them until later in 1092 00:39:36,233 --> 00:39:38,633 their career don't put them on 1093 00:39:38,633 --> 00:39:39,833 earlier papers. 1094 00:39:39,833 --> 00:39:42,133 So ORCID is great. 1095 00:39:42,133 --> 00:39:44,433 We're going to talk a little bit 1096 00:39:44,433 --> 00:39:46,333 more about ORCID in one of the 1097 00:39:46,333 --> 00:39:48,633 case studies actually next time. 1098 00:39:48,633 --> 00:39:49,033 All right. 1099 00:39:49,033 --> 00:39:51,533 But after that digression, we 1100 00:39:51,533 --> 00:39:54,333 need our date restriction to 1101 00:39:54,333 --> 00:39:56,366 2011-2016. 1102 00:39:56,366 --> 00:39:57,666 This is another place you have 1103 00:39:57,666 --> 00:39:58,066 options. 1104 00:39:58,066 --> 00:40:00,266 You could put them in the search 1105 00:40:00,266 --> 00:40:01,966 query or use the arguments like 1106 00:40:01,966 --> 00:40:03,066 I have. 1107 00:40:03,066 --> 00:40:05,366 I usually prefer those because I 1108 00:40:05,366 --> 00:40:06,966 like to experiment with 1109 00:40:06,966 --> 00:40:08,466 different date restrictions 1110 00:40:08,466 --> 00:40:09,966 while keeping my search 1111 00:40:09,966 --> 00:40:11,666 strategy the same, especially as 1112 00:40:11,666 --> 00:40:12,766 you'll see next time. 1113 00:40:12,766 --> 00:40:14,666 I do artificially small date 1114 00:40:14,666 --> 00:40:17,066 restrictions so I can get sample 1115 00:40:17,066 --> 00:40:17,366 data.
1116 00:40:17,366 --> 00:40:19,466 So it's easier for me to do that 1117 00:40:19,466 --> 00:40:21,266 if I keep them outside. 1118 00:40:21,266 --> 00:40:23,766 If you keep them inside, that's 1119 00:40:23,766 --> 00:40:24,899 totally fine. 1120 00:40:24,900 --> 00:40:27,833 Pipe the XML into our xtract 1121 00:40:27,833 --> 00:40:29,433 pattern PubMed article. 1122 00:40:29,433 --> 00:40:31,233 That's pretty familiar. 1123 00:40:31,233 --> 00:40:33,233 PMID, we got that part and then 1124 00:40:33,233 --> 00:40:37,533 we wanted to get only BH Smith's 1125 00:40:37,533 --> 00:40:39,233 affiliation information, and the 1126 00:40:39,233 --> 00:40:40,833 way we can do that is by 1127 00:40:40,833 --> 00:40:43,333 checking each author to see if 1128 00:40:43,333 --> 00:40:46,433 it's BH Smith. 1129 00:40:46,433 --> 00:40:49,133 We would go block author to look 1130 00:40:49,133 --> 00:40:51,233 at one author at a time, but 1131 00:40:51,233 --> 00:40:52,833 we'll only include a block if 1132 00:40:52,833 --> 00:40:55,633 the last name element equals 1133 00:40:55,633 --> 00:40:58,133 Smith and the initials element 1134 00:40:58,133 --> 00:40:59,333 equals BH. 1135 00:40:59,333 --> 00:41:01,233 If both of those things are 1136 00:41:01,233 --> 00:41:04,466 true, then we will include our 1137 00:41:04,466 --> 00:41:09,666 data with the SEP space and then 1138 00:41:09,666 --> 00:41:11,966 affiliation in the second 1139 00:41:11,966 --> 00:41:12,766 column. 1140 00:41:12,766 --> 00:41:14,566 When I execute this, it's probably 1141 00:41:14,566 --> 00:41:16,366 going to take a moment or two to 1142 00:41:16,366 --> 00:41:17,066 run. 1143 00:41:17,066 --> 00:41:18,266 Not too bad. 1144 00:41:18,266 --> 00:41:22,566 We have a list of PMIDs which is 1145 00:41:22,566 --> 00:41:23,566 good.
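The block-plus-condition logic just described (look at one author at a time, keep the block only if the last name equals Smith and the initials equal BH) can be mimicked in a few lines of Python. This is only a sketch of the selection logic, not how xtract itself is implemented: the sample XML, the PMID, the names, and the affiliations are all invented, the function name `matching_author_rows` is hypothetical, and the comparison here is a plain exact match, which may differ in detail from xtract's behavior.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like PubMed XML; the PMID, names, and affiliations are made up.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>100</PMID>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Jones</LastName>
            <Initials>AB</Initials>
            <AffiliationInfo><Affiliation>Example University</Affiliation></AffiliationInfo>
          </Author>
          <Author>
            <LastName>Smith</LastName>
            <Initials>BH</Initials>
            <AffiliationInfo><Affiliation>Sample Institute</Affiliation></AffiliationInfo>
          </Author>
          <Author>
            <LastName>Smith</LastName>
            <Initials>J</Initials>
            <AffiliationInfo><Affiliation>Other College</Affiliation></AffiliationInfo>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def matching_author_rows(xml_text, last_name="Smith", initials="BH"):
    """Return one tab-delimited row per record: PMID, matching author, affiliation."""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        row = [article.findtext("MedlineCitation/PMID")]
        for author in article.iter("Author"):  # visit one author block at a time
            # Keep this block only if BOTH conditions are true (if ... and ...).
            if (author.findtext("LastName") == last_name
                    and author.findtext("Initials") == initials):
                row.append(last_name + " " + initials)  # name joined with a space
                affiliations = [a.text for a in author.iter("Affiliation") if a.text]
                if affiliations:
                    row.append(" ".join(affiliations))
        rows.append("\t".join(row))
    return rows


for line in matching_author_rows(SAMPLE_XML):
    print(line)
```

Note that the other Smith (initials J) and the author Jones are skipped, because each fails one of the two conditions; only the author block that satisfies both contributes columns to the row.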
1146 00:41:23,566 --> 00:41:25,499 We probably don't really need 1147 00:41:25,500 --> 00:41:27,200 that but it's good to just 1148 00:41:27,200 --> 00:41:32,600 confirm to make sure your 1149 00:41:32,600 --> 00:41:34,100 condition is working as it seems to be, and 1150 00:41:34,100 --> 00:41:37,933 then we have your affiliation 1151 00:41:37,933 --> 00:41:39,833 data if there are multiple 1152 00:41:39,833 --> 00:41:42,533 BH Smiths, to see which of the 1153 00:41:42,533 --> 00:41:43,433 records is the one that you 1154 00:41:43,433 --> 00:41:43,833 want. 1155 00:41:43,833 --> 00:41:45,533 Any questions about any of that 1156 00:41:45,533 --> 00:41:46,033 stuff? 1157 00:41:46,033 --> 00:41:48,333 And actually Tom's question 1158 00:41:48,333 --> 00:41:53,633 about adding no for no values. 1159 00:41:53,633 --> 00:41:55,333 I may have misunderstood your 1160 00:41:55,333 --> 00:41:57,333 question as one of my colleagues 1161 00:41:57,333 --> 00:41:58,233 pointed out. 1162 00:41:58,233 --> 00:42:00,033 We may be addressing your 1163 00:42:00,033 --> 00:42:01,133 question in just a few minutes. 1164 00:42:01,133 --> 00:42:02,933 If you still have your question 1165 00:42:02,933 --> 00:42:05,333 at the end, let me know and I'll 1166 00:42:05,333 --> 00:42:07,033 maybe come back to it. 1167 00:42:07,033 --> 00:42:10,766 Kate asked what if BH Smith had 1168 00:42:10,766 --> 00:42:14,666 two affiliations in one article. 1169 00:42:14,666 --> 00:42:15,766 >> If they have multiple 1170 00:42:15,766 --> 00:42:17,666 affiliations, they will all be 1171 00:42:17,666 --> 00:42:20,166 within that same element.
1172 00:42:20,166 --> 00:42:22,166 Even if it was repeated, it 1173 00:42:22,166 --> 00:42:24,366 would do like we saw, remember 1174 00:42:24,366 --> 00:42:26,666 back when Sarah was showing you 1175 00:42:26,666 --> 00:42:29,666 parent child construction, it 1176 00:42:29,666 --> 00:42:31,166 puts them all one after the 1177 00:42:31,166 --> 00:42:32,566 other so that's what you would 1178 00:42:32,566 --> 00:42:34,666 see, but they would both be on the 1179 00:42:34,666 --> 00:42:35,566 same line. 1180 00:42:35,566 --> 00:42:38,066 They would all be one line 1181 00:42:38,066 --> 00:42:39,466 running out there, and this is 1182 00:42:39,466 --> 00:42:41,299 another great example of when 1183 00:42:41,300 --> 00:42:42,900 you might want to save your 1184 00:42:42,900 --> 00:42:44,100 results to a file or 1185 00:42:44,100 --> 00:42:49,300 something like Excel because it 1186 00:42:49,300 --> 00:42:51,200 will be a little bit easier to 1187 00:42:51,200 --> 00:42:51,900 read. 1188 00:42:51,900 --> 00:42:52,166 All right. 1189 00:42:52,166 --> 00:42:53,399 A little low on time. 1190 00:42:53,400 --> 00:42:54,800 If there are any other questions, 1191 00:42:54,800 --> 00:42:56,600 I will address them momentarily. 1192 00:42:56,600 --> 00:42:57,800 We're going to move on because 1193 00:42:57,800 --> 00:42:58,900 there's another conditional 1194 00:42:58,900 --> 00:43:00,300 argument that we have not talked 1195 00:43:00,300 --> 00:43:01,200 about yet. 1196 00:43:01,200 --> 00:43:02,400 It works a little bit 1197 00:43:02,400 --> 00:43:04,000 differently than if, but it can 1198 00:43:04,000 --> 00:43:06,600 be pretty useful for isolating 1199 00:43:06,600 --> 00:43:07,900 certain parts of your data. 1200 00:43:07,900 --> 00:43:10,800 Let me hop back into my slides 1201 00:43:10,800 --> 00:43:12,300 to talk about this example.
1202 00:43:12,300 --> 00:43:14,000 Let's say you want to include 1203 00:43:14,000 --> 00:43:15,733 author data in your table but 1204 00:43:15,733 --> 00:43:18,633 you only want the first author, 1205 00:43:18,633 --> 00:43:20,433 and you can't really do this 1206 00:43:20,433 --> 00:43:22,733 with equals or contains unless 1207 00:43:22,733 --> 00:43:25,466 you know the name of the first 1208 00:43:25,466 --> 00:43:26,166 author. 1209 00:43:26,166 --> 00:43:27,666 However, you can do this using 1210 00:43:27,666 --> 00:43:29,266 the position argument, and you 1211 00:43:29,266 --> 00:43:30,866 use position in combination with 1212 00:43:30,866 --> 00:43:32,866 a block argument. 1213 00:43:32,866 --> 00:43:34,266 Earlier today when we talked 1214 00:43:34,266 --> 00:43:36,766 about if and block, we were 1215 00:43:36,766 --> 00:43:38,666 including a block only if it met 1216 00:43:38,666 --> 00:43:39,366 our condition. 1217 00:43:39,366 --> 00:43:40,966 With position we can include a 1218 00:43:40,966 --> 00:43:42,466 block only if it is the first 1219 00:43:42,466 --> 00:43:43,766 occurrence of that block or the 1220 00:43:43,766 --> 00:43:45,466 second occurrence of the block, 1221 00:43:45,466 --> 00:43:45,966 etc. 1222 00:43:45,966 --> 00:43:47,666 So if we wanted to include only 1223 00:43:47,666 --> 00:43:50,299 the first block, we could add a 1224 00:43:50,300 --> 00:43:51,500 position first argument right 1225 00:43:51,500 --> 00:43:53,200 after the block argument, as you 1226 00:43:53,200 --> 00:43:54,700 can see here, just where we 1227 00:43:54,700 --> 00:43:57,500 would put an if, we put the 1228 00:43:57,500 --> 00:43:58,600 position argument instead and we 1229 00:43:58,600 --> 00:44:00,800 can specify we want the first 1230 00:44:00,800 --> 00:44:03,200 block by saying position first.
1231 00:44:03,200 --> 00:44:05,000 If we wanted to include only the 1232 00:44:05,000 --> 00:44:07,400 last block like the last author, 1233 00:44:07,400 --> 00:44:08,900 we can do position last. 1234 00:44:08,900 --> 00:44:10,400 That last one is especially 1235 00:44:10,400 --> 00:44:12,900 useful since we can specify that 1236 00:44:12,900 --> 00:44:15,000 we want the last block even if 1237 00:44:15,000 --> 00:44:16,900 we don't know how many blocks 1238 00:44:16,900 --> 00:44:17,400 there are. 1239 00:44:17,400 --> 00:44:18,700 When using the position argument 1240 00:44:18,700 --> 00:44:21,233 you can also specify a block by 1241 00:44:21,233 --> 00:44:22,233 number if you want. 1242 00:44:22,233 --> 00:44:23,833 So position one is the same 1243 00:44:23,833 --> 00:44:24,966 thing as position first. 1244 00:44:24,966 --> 00:44:28,066 Let me show you what this looks 1245 00:44:28,066 --> 00:44:38,766 like. 1246 00:44:38,766 --> 00:44:40,366 First I'm going to run a script 1247 00:44:40,366 --> 00:44:43,066 that you saw a version of last time 1248 00:44:43,066 --> 00:44:47,466 when we first introduced -- when 1249 00:44:47,466 --> 00:44:49,266 we first introduced the concept 1250 00:44:49,266 --> 00:44:56,599 of block. 1251 00:44:56,600 --> 00:44:57,600 And here it is. 1252 00:44:57,600 --> 00:45:01,000 I'm going to execute this. 1253 00:45:01,000 --> 00:45:03,600 Pairing together the 1254 00:45:03,600 --> 00:45:04,900 corresponding last name and 1255 00:45:04,900 --> 00:45:06,400 initials, the last name and 1256 00:45:06,400 --> 00:45:09,400 initials are grouped into a 1257 00:45:09,400 --> 00:45:10,000 column. 1258 00:45:10,000 --> 00:45:12,100 For each record we have a PMID 1259 00:45:12,100 --> 00:45:13,800 and a list of authors.
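The position idea described above (keep a block only if it is the first, the last, or the Nth occurrence) can be sketched in Python against a small hand-made sample. This is an illustration of the selection logic only, under stated assumptions: the XML, PMIDs, and author names are invented, and `author_at_position` is a hypothetical helper, not part of any NLM tool.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like PubMed XML; PMIDs and names are made up.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>201</PMID>
      <Article>
        <AuthorList>
          <Author><LastName>Adams</LastName><Initials>A</Initials></Author>
          <Author><LastName>Brown</LastName><Initials>BC</Initials></Author>
          <Author><LastName>Clark</LastName><Initials>D</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>202</PMID>
      <Article>
        <AuthorList>
          <Author><LastName>Diaz</LastName><Initials>E</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def author_at_position(xml_text, position):
    """position is "first", "last", or a 1-based integer, mirroring the talk."""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        row = [article.findtext("MedlineCitation/PMID")]
        authors = list(article.iter("Author"))
        if position == "first":
            chosen = authors[:1]
        elif position == "last":
            chosen = authors[-1:]  # works without knowing how many authors there are
        else:
            chosen = authors[position - 1:position]  # empty if there is no Nth author
        for author in chosen:
            row.append(author.findtext("LastName") + " " + author.findtext("Initials"))
        rows.append("\t".join(row))
    return rows


for line in author_at_position(SAMPLE_XML, "first"):
    print(line)
for line in author_at_position(SAMPLE_XML, "last"):
    print(line)
```

For a record with a single author, first and last pick the same block, and asking for a position that does not exist simply leaves that column out of the row.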
1260 00:45:13,800 --> 00:45:16,000 If instead of all of the authors 1261 00:45:16,000 --> 00:45:17,900 though, we just want the first 1262 00:45:17,900 --> 00:45:20,000 author, we could do that same 1263 00:45:20,000 --> 00:45:21,700 block of code. 1264 00:45:21,700 --> 00:45:23,700 You should notice I was kind of 1265 00:45:23,700 --> 00:45:26,333 cheating and added in the position 1266 00:45:26,333 --> 00:45:31,166 argument. 1267 00:45:31,166 --> 00:45:33,866 So this should restrict our 1268 00:45:33,866 --> 00:45:35,866 retrieval to just that first 1269 00:45:35,866 --> 00:45:36,866 author. 1270 00:45:36,866 --> 00:45:38,766 We're going to check each 1271 00:45:38,766 --> 00:45:40,966 author one at a time but 1272 00:45:40,966 --> 00:45:43,066 only retrieve the one that's the 1273 00:45:43,066 --> 00:45:45,366 first so we have PMID and first 1274 00:45:45,366 --> 00:45:47,466 author, and I can just as easily 1275 00:45:47,466 --> 00:45:49,566 do that same block of code 1276 00:45:49,566 --> 00:45:51,866 except changing first to last 1277 00:45:51,866 --> 00:45:53,766 and I get the last author 1278 00:45:53,766 --> 00:45:56,866 instead. 1279 00:45:56,866 --> 00:45:59,866 Any questions about position? 1280 00:45:59,866 --> 00:46:01,499 I went through it kind of 1281 00:46:01,500 --> 00:46:03,800 quickly, but it can be pretty 1282 00:46:03,800 --> 00:46:06,100 useful. 1283 00:46:06,100 --> 00:46:07,900 Don't see any right now. 1284 00:46:07,900 --> 00:46:08,800 So hang on. 1285 00:46:08,800 --> 00:46:10,900 I'm going to clear my screen. 1286 00:46:10,900 --> 00:46:22,300 Can you do a range? 1287 00:46:22,300 --> 00:46:24,933 >> You would do block one, block 1288 00:46:24,933 --> 00:46:28,333 two, block position one, block 1289 00:46:28,333 --> 00:46:30,133 position two, block position 1290 00:46:30,133 --> 00:46:32,033 three as separate lines of code.
1291 00:46:32,033 --> 00:46:33,933 There's no way to do a range 1292 00:46:33,933 --> 00:46:34,833 built into that. 1293 00:46:34,833 --> 00:46:36,266 I will ask the developer about 1294 00:46:36,266 --> 00:46:36,566 that. 1295 00:46:36,566 --> 00:46:38,166 That's an interesting question. 1296 00:46:38,166 --> 00:46:40,466 Okay. 1297 00:46:40,466 --> 00:46:42,266 We're running a little short of 1298 00:46:42,266 --> 00:46:44,866 time so I'm going to move on. 1299 00:46:44,866 --> 00:46:46,166 And this is the thing that might 1300 00:46:46,166 --> 00:46:47,366 answer Tom's question. 1301 00:46:47,366 --> 00:46:49,466 If it doesn't, let me know. 1302 00:46:49,466 --> 00:46:51,366 We're just about done but I do 1303 00:46:51,366 --> 00:46:52,766 want to show you one more 1304 00:46:52,766 --> 00:46:54,066 argument that's not technically a 1305 00:46:54,066 --> 00:46:55,266 conditional argument. 1306 00:46:55,266 --> 00:46:57,066 It's close to the other 1307 00:46:57,066 --> 00:46:59,266 formatting arguments like Kate was 1308 00:46:59,266 --> 00:47:00,966 showing you like tab and SEP. 1309 00:47:00,966 --> 00:47:03,766 It works more like those but 1310 00:47:03,766 --> 00:47:05,466 fits thematically. 1311 00:47:05,466 --> 00:47:08,299 It's a way of adjusting your 1312 00:47:08,300 --> 00:47:10,600 output automatically based on 1313 00:47:10,600 --> 00:47:11,700 your input. 1314 00:47:11,700 --> 00:47:13,400 I'm going to go back to that 1315 00:47:13,400 --> 00:47:15,800 previous example we had, with 1316 00:47:15,800 --> 00:47:18,100 the author name, but this time 1317 00:47:18,100 --> 00:47:19,800 instead of just getting the first 1318 00:47:19,800 --> 00:47:22,100 author's last name and initials, 1319 00:47:22,100 --> 00:47:24,000 I'm also going to get the ORCID 1320 00:47:24,000 --> 00:47:24,700 ID.
1321 00:47:24,700 --> 00:47:26,333 So when I execute this, we can 1322 00:47:26,333 --> 00:47:29,433 see we get PMID, author name and 1323 00:47:29,433 --> 00:47:31,033 ORCID if there is one, and this 1324 00:47:31,033 --> 00:47:33,633 is all well and good, but if we 1325 00:47:33,633 --> 00:47:35,733 now wanted to add in a fourth 1326 00:47:35,733 --> 00:47:37,333 column after the identifier, we 1327 00:47:37,333 --> 00:47:39,733 might get into a little bit of 1328 00:47:39,733 --> 00:47:41,033 trouble because of those blanks 1329 00:47:41,033 --> 00:47:42,466 in those rows that don't have 1330 00:47:42,466 --> 00:47:43,066 ORCIDs. 1331 00:47:43,066 --> 00:47:45,066 So let me show you what I mean 1332 00:47:45,066 --> 00:47:45,566 by that. 1333 00:47:45,566 --> 00:47:47,466 I'm going to take almost that 1334 00:47:47,466 --> 00:47:49,766 same exact code, except now I'm 1335 00:47:49,766 --> 00:47:53,466 going to add a fourth column. 1336 00:47:53,466 --> 00:47:55,166 Fourth column is going to be 1337 00:47:55,166 --> 00:47:58,166 last name again but it's really 1338 00:47:58,166 --> 00:48:00,966 going to help us understand why this is 1339 00:48:00,966 --> 00:48:01,666 a problem. 1340 00:48:01,666 --> 00:48:03,566 When I execute this, we can see 1341 00:48:03,566 --> 00:48:10,166 we have some alignment issues. 1342 00:48:10,166 --> 00:48:12,766 xtract just skips it, and that 1343 00:48:12,766 --> 00:48:14,899 causes these alignment problems 1344 00:48:14,900 --> 00:48:16,600 because when there's a blank in 1345 00:48:16,600 --> 00:48:19,300 the third column, xtract puts 1346 00:48:19,300 --> 00:48:20,700 the data that should go into the 1347 00:48:20,700 --> 00:48:22,000 fourth column into that 1348 00:48:22,000 --> 00:48:24,200 blank, so everything gets 1349 00:48:24,200 --> 00:48:26,333 shifted over and your table gets 1350 00:48:26,333 --> 00:48:28,133 out of alignment.
1351 00:48:28,133 --> 00:48:34,933 The solution to this is the 1352 00:48:34,933 --> 00:48:35,733 def argument. 1353 00:48:35,733 --> 00:48:38,033 Wherever a blank would be 1354 00:48:38,033 --> 00:48:39,633 xtract would put the default 1355 00:48:39,633 --> 00:48:40,333 instead. 1356 00:48:40,333 --> 00:48:41,833 Following up on our last 1357 00:48:41,833 --> 00:48:49,266 example, I'm going to copy and 1358 00:48:49,266 --> 00:48:50,966 paste this in. 1359 00:48:50,966 --> 00:48:55,366 We're going to make our def 1360 00:48:55,366 --> 00:48:56,466 N/A. 1361 00:48:56,466 --> 00:48:58,966 You can see we put the def right 1362 00:48:58,966 --> 00:48:59,966 next to the SEP. 1363 00:48:59,966 --> 00:49:01,766 The placement works just the 1364 00:49:01,766 --> 00:49:02,266 same way. 1365 00:49:02,266 --> 00:49:04,366 It has to be inside the block 1366 00:49:04,366 --> 00:49:05,966 because block resets our 1367 00:49:05,966 --> 00:49:07,966 formatting arguments, and it has 1368 00:49:07,966 --> 00:49:09,666 to be -- and it should be after 1369 00:49:09,666 --> 00:49:10,566 the conditional arguments as 1370 00:49:10,566 --> 00:49:12,366 well, so right there next to 1371 00:49:12,366 --> 00:49:14,266 SEP, that's where we put it, and 1372 00:49:14,266 --> 00:49:17,366 when I execute that, well, the 1373 00:49:17,366 --> 00:49:19,366 alignment is a little bit weird 1374 00:49:19,366 --> 00:49:20,566 because of the tabs. 1375 00:49:20,566 --> 00:49:22,299 But you can see that we have 1376 00:49:22,300 --> 00:49:25,033 these N/As holding the place of 1377 00:49:25,033 --> 00:49:26,933 the ORCID IDs. 1378 00:49:26,933 --> 00:49:29,433 We have PMID, last name and 1379 00:49:29,433 --> 00:49:33,333 initials, ORCID ID, or N/A if none is 1380 00:49:33,333 --> 00:49:34,733 there, and last names.
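The alignment problem and the placeholder fix can be reproduced with a short Python sketch. It only imitates the behavior described in the session (a missing value is skipped unless a default is supplied); the sample XML, the PMIDs, the names, and the all-zero identifier are invented, and `first_author_rows` and its `placeholder` parameter are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like PubMed XML; the identifier value is a dummy, not a real ORCID.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>301</PMID>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Lee</LastName>
            <Initials>K</Initials>
            <Identifier Source="ORCID">0000-0000-0000-0000</Identifier>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>302</PMID>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Park</LastName>
            <Initials>M</Initials>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def first_author_rows(xml_text, placeholder=None):
    """Columns: PMID, first author name, identifier, last name again."""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        row = [article.findtext("MedlineCitation/PMID")]
        for author in list(article.iter("Author"))[:1]:  # first author only
            name = author.findtext("LastName") + " " + author.findtext("Initials")
            values = [name, author.findtext("Identifier"), author.findtext("LastName")]
            for value in values:
                if value:
                    row.append(value)
                elif placeholder is not None:
                    row.append(placeholder)  # hold the column open
                # With no placeholder the blank is skipped and later columns shift left.
        rows.append("\t".join(row))
    return rows


print(first_author_rows(SAMPLE_XML))                      # second row has only 3 columns
print(first_author_rows(SAMPLE_XML, placeholder="N/A"))   # both rows have 4 columns
```

Without a placeholder the second record's repeated last name slides into the identifier column; with one, every row keeps the same number of columns, which is what makes the table safe to open in a spreadsheet.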
1381 00:49:34,733 --> 00:49:37,233 If I took that and put that into 1382 00:49:37,233 --> 00:49:39,833 Excel, those would be grouped in 1383 00:49:39,833 --> 00:49:41,733 the appropriate columns. 1384 00:49:41,733 --> 00:49:42,133 All right. 1385 00:49:42,133 --> 00:49:44,633 Any questions about that? 1386 00:49:44,633 --> 00:49:46,333 Okay. 1387 00:49:46,333 --> 00:49:47,633 This is wonderful. 1388 00:49:47,633 --> 00:49:48,133 Excellent. 1389 00:49:48,133 --> 00:49:49,933 I'm glad you like it. 1390 00:49:49,933 --> 00:49:51,433 Tom says that's what he meant. 1391 00:49:51,433 --> 00:49:52,433 That's also great. 1392 00:49:52,433 --> 00:49:55,033 Let me start my wrapup stuff and 1393 00:49:55,033 --> 00:49:56,566 then we will see if there are any 1394 00:49:56,566 --> 00:49:58,066 other questions left over at the 1395 00:49:58,066 --> 00:49:59,366 end. 1396 00:49:59,366 --> 00:50:00,966 Because we are a little short on 1397 00:50:00,966 --> 00:50:02,966 time and I want to save time for 1398 00:50:02,966 --> 00:50:03,366 questions. 1399 00:50:03,366 --> 00:50:04,466 Thank you very much again for 1400 00:50:04,466 --> 00:50:05,766 sticking with us so far. 1401 00:50:05,766 --> 00:50:07,766 You've done a great job. 1402 00:50:07,766 --> 00:50:09,466 Next Monday is our last 1403 00:50:09,466 --> 00:50:10,266 session. 1404 00:50:10,266 --> 00:50:11,666 You're almost rid of us but 1405 00:50:11,666 --> 00:50:13,166 you're not rid of me because 1406 00:50:13,166 --> 00:50:14,166 I'll be back again. 1407 00:50:14,166 --> 00:50:15,866 We're going to try to put all of 1408 00:50:15,866 --> 00:50:18,166 this together and look at 1409 00:50:18,166 --> 00:50:21,266 strategies and techniques of 1410 00:50:21,266 --> 00:50:23,166 building from the ground up. 1411 00:50:23,166 --> 00:50:24,266 There were some really good 1412 00:50:24,266 --> 00:50:24,566 ideas.
1413 00:50:24,566 --> 00:50:26,099 A lot of people had similar 1414 00:50:26,100 --> 00:50:28,733 questions along a similar vein 1415 00:50:28,733 --> 00:50:31,033 so we tried to group those 1416 00:50:31,033 --> 00:50:32,433 together into a couple of case 1417 00:50:32,433 --> 00:50:32,733 studies. 1418 00:50:32,733 --> 00:50:34,833 We'll see how much time we get. 1419 00:50:34,833 --> 00:50:36,933 I think you'll really find them 1420 00:50:36,933 --> 00:50:38,033 interesting. 1421 00:50:38,033 --> 00:50:39,633 As always, the homework 1422 00:50:39,633 --> 00:50:40,633 questions are at the bottom of 1423 00:50:40,633 --> 00:50:42,033 the handout for today's 1424 00:50:42,033 --> 00:50:43,433 class, and the answers are on 1425 00:50:43,433 --> 00:50:45,033 the sample code page. 1426 00:50:45,033 --> 00:50:46,933 If you do get stuck, there's 1427 00:50:46,933 --> 00:50:48,733 absolutely no shame in looking 1428 00:50:48,733 --> 00:50:49,733 at those answers. 1429 00:50:49,733 --> 00:50:51,833 You can see how we did it and 1430 00:50:51,833 --> 00:50:53,333 figure out what the answer 1431 00:50:53,333 --> 00:50:53,933 means. 1432 00:50:53,933 --> 00:50:56,533 If you do get stuck and still 1433 00:50:56,533 --> 00:50:58,533 don't understand the annotated 1434 00:50:58,533 --> 00:51:00,233 solutions, make sure you send us 1435 00:51:00,233 --> 00:51:01,633 an e-mail or ask us at the 1436 00:51:01,633 --> 00:51:04,766 beginning of next time's class. 1437 00:51:04,766 --> 00:51:06,466 As always, you can check out the 1438 00:51:06,466 --> 00:51:08,566 website for more documentation 1439 00:51:08,566 --> 00:51:10,466 to look at the slides and 1440 00:51:10,466 --> 00:51:12,166 scripts and sample code and all 1441 00:51:12,166 --> 00:51:13,166 of that stuff.
1442 00:51:13,166 --> 00:51:14,166 Any questions that you have for 1443 00:51:14,166 --> 00:51:18,166 us, you can ask us by going to 1444 00:51:18,166 --> 00:51:20,866 the contact page, or if you have 1445 00:51:20,866 --> 00:51:22,366 any questions right now, I would 1446 00:51:22,366 --> 00:51:24,066 be more than happy to answer 1447 00:51:24,066 --> 00:51:25,566 them.