With a Corn Cob Pyp
The other day, Dr. Drang ruminated on different sorting possibilities based on a post by T.J. Luoma. The doctor pulled out all the stops with a bunch of command-line kung-fu. It’s impressive, but as I confessed to him over Twitter, I find awk mystifying. Things like awk fall into the same category as sed or regex, where it looks like someone fell asleep on their keyboard, but then magic happens. Spoiler Alert! Dr. Drang ultimately composes a Python script to accomplish the task. I asked if he had ever tried pyp, a Python-based command-line tool I’m quite fond of.
@joesteel Thanks! I haven’t used pyp because I think Python and one-liners are antithetical.
— Dr. Drang (@drdrang) May 4, 2014
This, of course, led me to ask why he would jump through all those other hoops, many of them inscrutable, particularly since Python has built-in support for one-liners (semicolons, list comprehensions, lambdas):
@joesteel I am large. I contain multitudes.
— Dr. Drang (@drdrang) May 4, 2014
Well then! Can’t argue with that.
Arguing With That
I learned about pyp through work, though I had no interaction with the developers. It is particularly well-suited to slicing and dicing standardized directories to pick out tokens and sort them. I mostly gravitate toward pyp because it uses a syntax I’m familiar with in a lazy way. Uh, laziness is totally alien to me, sure.
I copied Doc’s example list:
foo.tjluoma.com
a.luo.ma
bar.luo.ma
b.tjluoma.com
leancrew.com
drdrang.com
daringfireball.net
6by6.5by5.fm
4by4.5by5.fm
5by5.fm
tjluoma.com
atp.fm
wordpress.com
wordpress.net
wordpress.co
The nice thing about pyp is that you can mash out a few little tests fast. Each time you add a pipe you’re modifying the incoming stream, filtering a bit at a time until you get what you’re looking for. There is no jumping to a text editor, nor the need to run the python console.
Pyp provides you with p, which represents each line of the input, and pp (giggle), which represents the array of all the lines.
Start off simply with your input, in this case drangs_domains.txt. Then read it in with the standard Unix command, cat. Pipe it to pyp, just like you were going to grep for directories or some such. This is where it gets interesting, because you start to form your own little chain of Python commands that operate one after the other on the preceding input.
cat drangs_domains.txt | pyp "p.split('.')"
That accomplishes the split.
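If you think in scripts rather than pipes, here is a rough plain-Python picture of that step. It is only a sketch of the idea, not how pyp is actually implemented: the expression gets evaluated once per input line, with p bound to that line. The helper name split_each and the sample list are made up for illustration.

```python
# Rough model of `pyp "p.split('.')"`: run the expression on every line.
# split_each is a hypothetical helper for illustration, not part of pyp.
def split_each(lines):
    # p plays the same role here as it does inside the pyp expression.
    return [p.split('.') for p in lines]


sample = ["foo.tjluoma.com", "a.luo.ma", "5by5.fm"]
for tokens in split_each(sample):
    print(tokens)
```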
cat drangs_domains.txt | pyp "p.split('.')[::-1]"
I’m using python’s slicing abilities to reverse each line. Now I’ll just skip ahead:
cat drangs_domains.txt | pyp "' '.join(p.split('.')[::-1]) + ' ' + p |pp.sort()"
Pyp even outputs the result as colorful terminal output. Each line starts with an array number, in case I need it for reference or further slicing. To print it as plain-jane text, just add | p to the end and it’ll print the whole thing without the numbers and color.
You can also just do the easy thing and add | p.split(' ')[-1], which keeps only the last space-separated piece of each line: the original domain.
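For anyone who wants to see the whole chain outside of pyp, it boils down to decorate, sort, undecorate. Below is a minimal plain-Python sketch of that idea, assuming one domain per line and no spaces inside a domain. The function name sort_by_reversed_domain is made up for this example; it is not something pyp provides.

```python
# Plain-Python sketch of the pyp chain: decorate, sort, undecorate.
# sort_by_reversed_domain is a hypothetical name used only for illustration.
def sort_by_reversed_domain(domains):
    # Decorate: reversed tokens first, then the original domain at the end,
    # e.g. "a.luo.ma" becomes "ma luo a a.luo.ma".
    decorated = [' '.join(d.split('.')[::-1]) + ' ' + d for d in domains]
    # Sort the decorated strings, like the pp.sort() step.
    decorated.sort()
    # Undecorate: keep only the last space-separated piece, the original domain.
    return [line.split(' ')[-1] for line in decorated]


if __name__ == "__main__":
    for domain in sort_by_reversed_domain(["b.tjluoma.com", "a.luo.ma", "atp.fm"]):
        print(domain)
```

Python's sorted() with a key of d.split('.')[::-1] would skip the decoration entirely, though a parent domain and its subdomains may then land in a slightly different order. The decorated version is here because it mirrors the pipe-by-pipe approach.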
Some might criticize relying on slices as the same kind of inscrutable stuff as awk, but in this case, pyp helpfully prints out which selectors you can use after each of your splits. So you don’t know which index you want? Run it without specifying one, then run it again once you have the number. So what?
I hope someday Dr. Drang and I will be able to bridge our differences.
Get it? I said bridge because…
Category: text