CMOS #4: Tom Hudson - gron for making JSON greppable
Interview with Tom Hudson about gron, the command line tool that makes JSON greppable.
Tom Hudson
Links
- PHP
- Go lang
- The Etherpad (formerly Etherpad Lite) project.
- The PHP client for Etherpad.
- Node.js
- The window manager Tom mentioned: goomwwm, written by Sean Pringle.
- Blog post on fixing a segfault in it despite not knowing C: Debugging a segfault in goomwwm
- gron
- JSON
- AWK
- Extended Backus–Naur Form (EBNF)
- The errors package written by Dave Cheney.
- The jq tool for manipulating JSON.
- Some advanced examples of gronning and ungronning
Transcript
Other than that, I've made some small contributions here and there. I made a fix to the window manager that I use, one called Get Out of My Way Window Manager, written by a guy called Sean Pringle. It's a really great lightweight tiling window manager, or rather floating window manager, with great support for tiling. Just small bits here and there, usually when I find something that's broken, and I need it to work, I'll try to fix it.
Debugging a segfault in goomwwm. It's on tomnomnom.com. And again, I don't know C so it's probably laughable to anyone who does, but it's mostly an adventure in finding a problem and fixing it for me.
So you don't get any context to go with that value, and it can be really difficult to reason about the structure of that JSON. So originally, I think it's about three or four years ago, I wrote gron in PHP, and the idea was to take a JSON structure and output it as a series of individual assignments but as valid JavaScript, so you would end up with something like json.city.name = "Leeds", for example.
And that means if you were grepping for that value, Leeds, you can see the whole path, all the way through. And you just get that context that allows you to see, what code do I actually need to write to access this value, what things do I need to traverse over? So mostly I wrote it because I needed it, but it didn't really gain any traction, initially, I think mostly because I'd written it in PHP.
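To make the idea concrete, here is a minimal Python sketch of the flattening step described above. It is not gron's actual implementation (which is written in Go); the function name `gron`, the sample document, and the choice to keep keys in input order are all assumptions made for illustration.

```python
import json
import re

# Keys that are valid bare identifiers use dot notation; others use brackets.
IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def gron(value, path="json"):
    """Yield one assignment statement per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            if IDENTIFIER.match(key):
                child_path = f"{path}.{key}"
            else:
                child_path = f"{path}[{json.dumps(key)}]"
            yield from gron(child, child_path)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from gron(child, f"{path}[{index}]")
    else:
        # Strings, numbers, booleans and null are written as JSON literals.
        yield f"{path} = {json.dumps(value)};"


# Hypothetical sample input, not taken from the interview.
document = json.loads('{"city": {"name": "Leeds", "tags": ["north", "uk"]}}')
statements = list(gron(document))

# The "grep" step: every matching line carries its full path.
matches = [s for s in statements if "Leeds" in s]
print("\n".join(matches))  # prints: json.city.name = "Leeds";
```

Because each statement stands alone, a plain substring search is enough to recover the full path to a value.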
The main feedback I had from people was, well, I'm writing Node.JS, or something like that. Or even, I'm writing C# or Perl. I don't have PHP installed, so I can't use this thing. It wasn't of use to that many people.
So Go produces statically-linked binaries, by default, which means I can build for different operating systems, and just upload a binary and people can just download it and run it. It just works. So I kind of took it from there, started adding some more features, made it a bit more robust, a bit more user-friendly, and initially it was all just about turning the JSON into these discrete assignments, which at some point became a verb, to gron.
So JSON gets gronned, is the official way of putting it now. I then decided that, for the tool to be really powerful, maybe a little bit more than just exploration, it'd be really great if you could do the other, go in the other direction, which has become ungronning or norging, some people would have it put. There was a bit of debate about that early on.
But that means that you can alter the structure of the data in its intermediate state, when it's a list of assignments, with things like grep and sed and awk, if you like, and then turn the result back in to JSON again. I mean, it's not the kind of thing you should really be relying on in scripts, but when you just need like a quick fix for something, hacking on the command-line, it turns out to be actually pretty powerful.
Someone had already come up with that format for me, I just needed to make it work. The fact that it is executable JavaScript is almost more of a curiosity in some ways, but it means that I could get away with not defining the grammar properly initially. I just sort of said, the grammar is anything that's valid JavaScript, which was maybe a bit too vague. I have now started to define the grammar properly, particularly when I've been dealing with some bugs and things in the ungronning phase.
Which is fun, it's a real experience for me learning EBNF, I think it's Extended Backus-Naur Form. I'm self-educated, so I didn't do computer science or compiler theory or anything, so it's all a lot of intensive Googling to figure it out. But yeah, I think there's an example in the README: you can pipe the output into a .js file that's got console.log on the end of it, and it will output the object.
And then before the ungronning mode existed, I suppose that was kind of useful because you could do the things where you would grep -v and remove certain statements, or you'd use sed to change the paths in things, and then you could pipe into a JavaScript file to get it back into JSON. But it was always a bit flaky, because you need every step of the way, so your top-level statement has to sort of say, well, this equals an empty object, and only then can you refer to properties of that object.
Whereas, when gron does its own ungron process, it can infer all of that stuff from a single statement and you don't need those preceding statements to say how things are set up, if that makes sense.
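A rough Python sketch of that inference might look like the following. It is an illustration under stated assumptions, not gron's real parser: the helper names `parse_path`, `set_path` and `ungron` are hypothetical, statements are split naively on the first " = ", and skipped array indexes are padded with null, which is a choice made for this sketch only.

```python
import json
import re

# One path step: ".name", "[3]" or '["quoted key"]'.
STEP = re.compile(r'\.([A-Za-z_$][A-Za-z0-9_$]*)|\[(\d+)\]|\[("(?:[^"\\]|\\.)*")\]')


def parse_path(path):
    """Split 'json.a[0]["b c"]' into the keys ['a', 0, 'b c']."""
    if not path.startswith("json"):
        raise ValueError(f"path must start with 'json': {path!r}")
    keys, pos = [], len("json")
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unexpected text at offset {pos} in {path!r}")
        name, index, quoted = m.groups()
        if name is not None:
            keys.append(name)
        elif index is not None:
            keys.append(int(index))
        else:
            keys.append(json.loads(quoted))
        pos = m.end()
    return keys


def set_path(node, keys, value):
    """Return node with value stored at keys, creating missing containers."""
    if not keys:
        # An explicit "= {}" or "= []" must not wipe children already built.
        if isinstance(value, (dict, list)) and not value and isinstance(node, type(value)):
            return node
        return value
    key, rest = keys[0], keys[1:]
    if isinstance(key, int):
        if not isinstance(node, list):
            node = []  # container implied by the index, never declared
        while len(node) <= key:
            node.append(None)
        node[key] = set_path(node[key], rest, value)
    else:
        if not isinstance(node, dict):
            node = {}  # container implied by the key, never declared
        node[key] = set_path(node.get(key), rest, value)
    return node


def ungron(statements):
    """Rebuild a JSON-like value from 'path = literal;' statements."""
    root = None
    for statement in statements:
        # Naive split: assumes " = " does not occur inside a quoted key.
        path, _, literal = statement.strip().rstrip(";").partition(" = ")
        root = set_path(root, parse_path(path.strip()), json.loads(literal))
    return root


# A single statement is enough; "json = {};" and "json.city = {};" are implied.
print(json.dumps(ungron(['json.city.name = "Leeds";'])))
# prints: {"city": {"name": "Leeds"}}
```

This is the difference from running the statements as JavaScript, where assigning to json.city.name fails unless json and json.city have already been set up.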
couldn't parse the input statements. Which is pretty useless, particularly if you're putting half a megabyte of input into it. Somewhere in here is an error.
But really I think my lexer that does the ungronning needs some attention, in terms of actually giving the user some context in terms of what character was it that caused the lexer to choke or what exactly it was that was unexpected so that people can figure out what their problem is a bit more easily.
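As an illustration of the kind of context being asked for, here is a small Python sketch of a lexer that reports the line, column and offending character when it chokes. The token set is heavily simplified and the names `TOKEN`, `LexError` and `lex` are hypothetical; this is not gron's lexer or its actual error format.

```python
import re

# Simplified token set for statements such as: json.city[0] = "Leeds";
TOKEN = re.compile(r'''
    (?P<bare>[A-Za-z_$][A-Za-z0-9_$]*)   # bare word: json, city, true, null
  | (?P<number>-?\d+(?:\.\d+)?)          # simplified number literal
  | (?P<string>"(?:[^"\\]|\\.)*")        # double-quoted string
  | (?P<punct>[.\[\]={};])               # punctuation
  | (?P<space>\s+)                       # whitespace, skipped below
''', re.VERBOSE)


class LexError(ValueError):
    """Raised with enough context to locate the problem in the input."""


def lex(statement, line_number=1):
    """Return (kind, text) tokens, or raise LexError pointing at the bad character."""
    tokens, pos = [], 0
    while pos < len(statement):
        m = TOKEN.match(statement, pos)
        if m is None:
            pointer = " " * pos + "^"
            raise LexError(
                f"line {line_number}, column {pos + 1}: "
                f"unexpected character {statement[pos]!r}\n{statement}\n{pointer}"
            )
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


try:
    lex("json.city = @Leeds;")
except LexError as err:
    print(err)  # first line: line 1, column 13: unexpected character '@'
```

With half a megabyte of input, a line and column plus a caret under the offending character turns "somewhere in here is an error" into something a person can act on.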
So that needs some work. There are a few bits of the code, I think, that just need refactoring, where I've set things globally because it's easy, and really it makes it a bit difficult to test because I've got to change global state when I'm in the test to make things work. And I'd like to change how adding color to the gron output works; it's kind of added inline as the statements are built up at the moment.
But then I need to do a sort, to make sure things are in the right order, and a natural or human sort as well, not just a standard less-than or greater-than comparison, but in order to do that, I have to strip the colors back out of the statements again, which is a bit of a pain. It's a bit inefficient. I mean, performance has never really been a primary goal of the tool, but if it's unbearably slow, then people aren't going to use it.
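The problem described here can be sketched in a few lines of Python. This assumes the color is applied with ANSI escape sequences and is only meant to show why the stripping is needed before a natural sort; the names `strip_colors` and `natural_key` are hypothetical and this is not how gron itself is structured.

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*m")  # color escape sequences such as "\x1b[34m"
DIGITS = re.compile(r"([0-9]+)")


def strip_colors(text):
    """Remove ANSI color codes so they cannot affect comparisons."""
    return ANSI.sub("", text)


def natural_key(statement):
    """Sort key that compares runs of digits as numbers, not as text."""
    parts = DIGITS.split(strip_colors(statement))
    # re.split with one capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


# Statements that were colored inline as they were built (assumed format).
colored = [
    "\x1b[34mjson\x1b[0m[10] = 1;",
    "\x1b[34mjson\x1b[0m[2] = 1;",
    "\x1b[34mjson\x1b[0m[1] = 1;",
]

for line in sorted(colored, key=natural_key):
    print(strip_colors(line))
# prints json[1], then json[2], then json[10]
```

The escape codes themselves contain digits, which is one reason they have to come out before the digit runs are compared. Sorting plain statements first and adding color only at output time would avoid the extra stripping pass.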
I'm not too keen on adding many more features to it. I mean the whole idea was that the tool would let you use things that you already knew, like grep and sed and awk, to get things done. So, one of the common questions I get is, why didn't I just use jq? So jq is a tool for manipulating JSON and it's an amazing tool, it's really, really powerful, does a lot more things than gron, but it's only really useful if you're already able to understand the structure of the JSON itself.
So if you know the path to the key that you want, there's not much in the way of discovery there. I mean, it'll do pretty printing and things like that but again, if you've got 0.5 megabytes of JSON and you grep for something and it's eight levels deep, you're doing a lot of scrolling to figure out where that actually is. So really I see it as a complement to jq, which yeah, I think they can work together.
I think probably not like in a script or something like that but certainly, I find myself using gron to figure out what the structure of the JSON is, and then I'll probably use something like jq to do the actual transformations unless I'm feeling lazy. And then I'll just use that.
Published on 2016-09-06