HPR4404: Kevie nerd snipes Ken by grepping xml

More Command line fun: downloading a podcast

In the show hpr4398 :: Command line fun: downloading a podcast, Kevie walked us through a command to download a podcast. He used some techniques that I hadn't used before, and it's always great to see how other people approach a problem. Let's have a look at the script and walk through what it does, then we'll have a look at some "traps for young players", as the EEVBlog is fond of saying.

Analysis of the Script

    wget `curl https://tuxjam.otherside.network/feed/podcast/ | grep -o 'https*://[^"]*ogg' | head -1`

It chains four different commands together to "Save the latest file from a feed". Let's break it down so we can have checkpoints between each step. I often do this when writing a complex one-liner: first do it as steps, and then combine them.

The curl command gets https://tuxjam.otherside.network/feed/podcast/. To do this ourselves we will call

    curl https://tuxjam.otherside.network/feed/podcast/ --output tuxjam.xml

as curl would otherwise write the feed to standard output rather than to a file. This gives us an XML file, and we can confirm that it parses as well-formed XML with the xmllint command.

    $ xmllint --format tuxjam.xml >/dev/null
    $ echo $?
    0

Here the output of the command is ignored by redirecting it to /dev/null. Then we check the exit code of the last command. As it's 0, it completed successfully.

Kevie then passes the output to the grep search command with the option -o, and looks for any string that starts with http, optionally followed by s, then a colon and two forward slashes, then any run of characters that are not a double quote, and finally ogg. The man page describes the option as follows:

    -o, --only-matching
        Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line

We can do the same with our saved file. I was not aware that grep defaulted to regex, as I tend to add --perl-regexp to request it explicitly.
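That grep treats its pattern as a regular expression without any extra flag can be checked on a single line of text. What follows is a minimal sketch, not part of Kevie's script: the sample line is made up for illustration, and it assumes a grep that supports -o (the short form of --only-matching), -E and -F. By default grep uses the basic regular expression syntax, in which * is already a quantifier, so the pattern works unchanged. With -E (extended syntax) this particular pattern means the same thing, while -F treats it as a literal string and so finds nothing.

```shell
#!/bin/sh
# Made-up sample line in the style of a feed enclosure (an assumption, not real feed data).
line='<enclosure url="http://example.com/show_1.ogg" type="audio/ogg"/>'

# Default mode (basic regular expressions): 's*' means zero or more 's'.
basic=$(printf '%s\n' "$line" | grep -o 'https*://[^"]*ogg')

# Extended regular expressions: same result for this pattern.
extended=$(printf '%s\n' "$line" | grep -E -o 'https*://[^"]*ogg')

# Fixed-string mode: the pattern is taken literally, so nothing matches.
# '|| true' keeps the script going when grep exits non-zero on no match.
fixed=$(printf '%s\n' "$line" | grep -F -o 'https*://[^"]*ogg' || true)

echo "basic:    $basic"
echo "extended: $extended"
echo "fixed:    ${fixed:-<no match>}"
```

The basic and extended runs print the same URL, which is why the original one-liner needs no regex option at all.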
Using the long form of the option, the command becomes:

    grep --only-matching 'https*://[^"]*ogg' tuxjam.xml

The pattern breaks down as follows:

    http     matches the characters http literally (case sensitive)
    s*       matches the character s literally (case sensitive), between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    :        matches the character : literally
    /        matches the character / literally
    /        matches the character / literally
    [^"]*    matches any single character other than ", between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    ogg      matches the characters ogg literally (case sensitive)

When we run this ourselves we get the following:

    $ grep --only-matching 'https*://[^"]*ogg' tuxjam.xml
    https://archive.org/download/tuxjam-121/tuxjam_121.ogg
    https://archive.org/download/tuxjam-120/TuxJam_120.ogg
    https://archive.org/download/tux-jam-119/TuxJam_119.ogg
    https://archive.org/download/tuxjam_118/tuxjam_118.ogg
    https://archive.org/download/tux-jam-117-uncut/TuxJam_117.ogg
    https://tuxjam.otherside.network/tuxjam-115-ogg
    https://archive.org/download/tuxjam_116/tuxjam_116.ogg
    https://tuxjam.otherside.network/tuxjam-115-ogg
    https://tuxjam.otherside.network/tuxjam-115-ogg
    https://t
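The checkpoint approach can also be rehearsed without touching the network. The sketch below is an illustration rather than the script from the show: it writes a tiny stand-in feed to a temporary file (the XML is made up, with three URLs borrowed from the output above), runs the same grep pattern, and then applies head -1 as the original one-liner does. It uses -o, the short form of --only-matching.

```shell
#!/bin/sh
# Hypothetical stand-in for tuxjam.xml; the real feed is much larger.
feed=$(mktemp)
cat > "$feed" <<'EOF'
<rss><channel>
<item><enclosure url="https://archive.org/download/tuxjam-121/tuxjam_121.ogg" type="audio/ogg"/></item>
<item><link>https://tuxjam.otherside.network/tuxjam-115-ogg</link></item>
<item><enclosure url="https://archive.org/download/tuxjam-120/TuxJam_120.ogg" type="audio/ogg"/></item>
</channel></rss>
EOF

# Checkpoint 1: every match, one per line.
matches=$(grep -o 'https*://[^"]*ogg' "$feed")
echo "$matches"

# Checkpoint 2: keep only the first match, as head -1 does in the one-liner.
latest=$(echo "$matches" | head -1)
echo "latest: $latest"

# Clean up the temporary file.
rm -f "$feed"
```

Printing the full list before trimming it shows that the pattern only requires a match to end in ogg, so the page link ending in -ogg is listed alongside the two audio files. head -1 then simply keeps whichever match comes first in the file.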

About the podcast

Hacker Public Radio is a podcast that releases shows every weekday, Monday through Friday. Our shows are produced by the community (you) and can be on any topic that is of interest to hackers and hobbyists.