stdout.be

A blog about programming, information architecture and journalism


Git in the newsroom

Summary: A quick overview of why journalists shouldn’t take version control principles and metaphors too literally.

I have no idea what it is, but every writer or budding techie in the newspaper industry who stumbles on Git and GitHub, or any version control system really — enjoying some of that computational thinking, are we? — suddenly goes “Oh. My. God. We should write stories like you guys write code in Git, with forks and branches and commits and issue tracking and history.”

It’s one of those wonderful moments when different fields of study come together and cross-pollinate.

Applying some of the best practices of the IT world to journalism is actually a good (if half-baked) idea. But I sometimes fear people might be taking the version control metaphor a bit too literally, unaware of why you cannot simply use version control as-is in a journalistic context.

One

Version control systems diff and merge line by line. For narrative text, where each paragraph is one long line, that actually means paragraph by paragraph. Not only does this make it difficult to find the actual changes in a piece of text, it also makes it vastly impractical to, say, merge a spelling fix you’ve made in a separate branch into the master branch: if anyone has touched that paragraph in the meantime, the entire paragraph conflicts, and picking either side wholesale throws away the other side’s updates.

My blog is actually version-controlled. Feel free to check on GitHub how absolutely useless the diffs are when I update a post.
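To make the line-based problem concrete, here is a minimal sketch using Python’s standard difflib module. The sample sentence is made up, and the word-level comparison is just one possible alternative, not something a version control system does for you when merging.

```python
import difflib

# Two versions of one paragraph. In a text file, a paragraph is one long line.
old = "The mayor said the budjet would be cut by ten percent next year."
new = "The mayor said the budget would be cut by ten percent next year."

# Line-based diff, roughly what a version control system reports by default:
# the whole paragraph shows up as removed and then added again.
line_diff = [
    d for d in difflib.unified_diff([old], [new], lineterm="", n=0)
    if not d.startswith(("---", "+++", "@@"))  # drop the diff headers
]

# Word-based diff: split on whitespace and compare word by word instead.
old_words, new_words = old.split(), new.split()
matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
word_diff = [
    (tag, old_words[i1:i2], new_words[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]

print(line_diff)  # two entries: the full old paragraph and the full new one
print(word_diff)  # a single entry: 'budjet' replaced by 'budget'
```

Git can display changes this way too, with git diff --word-diff, but that only affects how a diff is shown. Merging still happens line by line.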

Two

To get the most out of version control, you’re supposed to make atomic commits, which means that every batch of changes you make should have one specific purpose and one specific purpose only. If you fix a typo, reorder a couple of paragraphs and change the title, those changes merit three different commits.

For code, atomic commits work really well and are hardly any trouble; you’ll have to commit your changes maybe every half an hour and force yourself not to switch to a different task too often. Reasonable enough. For editing prose, be prepared to do a commit including a descriptive message of your changes about every five seconds. Let’s see how long that stays fun for.

Three

Merging code can sometimes be a challenge, although it’s relatively painless in Git. You can easily isolate a block of code and transplant it onto other code. A sentence, however, is a fragile thing that can imply all sorts of things and needs to fit with the next and previous sentences and the paragraph in general.

You might want to carry over a sentence, such as a crucial fact you corrected in one branch (say, the web edition of a story), to another branch (say, the longer, in-the-works print edition). Merging at its best. But think about how time-intensive that would actually be: you have to cherry-pick exactly those commits you want to merge in. Not only that, you will never be able to avoid doing a bit of double work, because the two branches will likely differ enough that the same sentence or paragraph makes perfect sense in one version but looks weird in the other.

Code can take some manhandling; writing can’t. Copy-pasting followed by a light rewrite suddenly doesn’t seem so bad.
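Here is a toy illustration of why the web-to-print scenario hurts. The function merge_paragraphs is hypothetical and deliberately simplistic: it treats every paragraph as one line and assumes all three versions have the same number of paragraphs, which real merge tools do not require. The story text is invented.

```python
def merge_paragraphs(base, web, print_):
    """Toy three-way merge at paragraph granularity.

    base is the common ancestor; web and print_ are the two editions.
    Returns the merged paragraphs (None marks a conflict) and the
    indexes of the conflicting paragraphs.
    """
    merged, conflicts = [], []
    for i, (b, w, p) in enumerate(zip(base, web, print_)):
        if w == p or p == b:
            merged.append(w)       # both agree, or only the web edition changed
        elif w == b:
            merged.append(p)       # only the print edition changed
        else:
            merged.append(None)    # both changed it, in different ways
            conflicts.append(i)
    return merged, conflicts


base = [
    "The council voted 5-4 to close the library.",
    "Residents protested outside.",
]
# Web edition: one corrected fact in the first paragraph.
web = [
    "The council voted 6-3 to close the library.",
    "Residents protested outside.",
]
# Print edition: both paragraphs expanded, still carrying the wrong vote count.
print_ = [
    "After a three-hour debate, the council voted 5-4 to close the library.",
    "Residents protested outside city hall.",
]

merged, conflicts = merge_paragraphs(base, web, print_)
print(conflicts)  # the first paragraph cannot be merged automatically
```

The corrected vote count is a change of three characters, yet the first paragraph as a whole ends up in conflict, and someone has to rewrite it by hand for the print edition anyway.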

Four

Collaboration on stories is really, really hard. Much harder than collaborating on code, even though that’s not always easy either. Look at how hard it is to work together on a piece even when you have the real-time feedback you get in a Google doc. It’s something you generally only want to do if you absolutely can’t avoid it, say, to get a bit of a turbo boost for breaking news.

Now imagine having to do that same collaborative writing exercise in total isolation from each other, like you’d do in a Git-based workflow. Then merge the results and see what happens. Writing code alongside each other is more like writing different stories about the same thing than actually collaborating on one story, which is why it works for code, and why it won’t work for journalists or wiki authors.

It’s a metaphor, people!

Journalism may need version control, but it needs its own special kind of version control, and that’s something we haven’t invented yet.


2 comments

Have to disagree on one point, though I think you're generally right that git isn't a drop-in solution for prose-based journalism:

“Journalism may need version control, but it needs its own special kind of version control, and that’s something we haven’t invented yet.”

We actually have plenty of forms of version control: WordPress has built-in revisions, as does MediaWiki. Google Docs (and MS Word) can track changes, revert and show diffs.

What we don't have is distributed version control, where everyone has a complete history of whatever they're working on, but I'm not sure that's necessary, and it may actually run counter to the goals of collaboration, which tends to mean people talking and working together in real time.

I think we can go a little bit more advanced with those forms of version control, though.

For example, if you have both a web and a print version of a story in your CMS, like in the example I gave above, you'd expect those to remain sort of logically grouped like a branch. However, once you've split them up, for WordPress or Drupal or Joomla or any CMS, they count as entirely separate stories, and if you make a change to one you won't even be informed that there are other versions you might like to change as well.

Also, some CMSes have log messages where you can detail the nature of your update, but others don't, even though logs can be a great help in seeing how a story evolved.

So those are two examples of how we could approach the question of what kick-ass version control for journalism should look like. Seems like an interesting avenue to explore, even though there are probably more important workflow-related challenges in the newsroom.
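As a rough sketch of those two ideas together, here is what a story that keeps its editions grouped and its log messages attached could look like. Every name in it (Story, Revision, save, history) is made up for illustration and does not correspond to any existing CMS.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    text: str
    log: str  # short note on what changed and why


@dataclass
class Story:
    slug: str
    # Edition name (e.g. "web", "print") mapped to its list of revisions.
    editions: dict = field(default_factory=dict)

    def save(self, edition, text, log):
        """Store a new revision and return the sibling editions to review."""
        self.editions.setdefault(edition, []).append(Revision(text, log))
        return sorted(name for name in self.editions if name != edition)

    def history(self, edition):
        """Log messages for one edition, oldest first."""
        return [rev.log for rev in self.editions[edition]]


story = Story("library-closure")
story.save("web", "The council voted 5-4.", "First version")
story.save("print", "After a long debate, the council voted 5-4.", "Longer print cut")

# Correcting the web edition tells you which other editions may need the fix.
to_review = story.save("web", "The council voted 6-3.", "Correct vote count")
print(to_review)             # the print edition is flagged for review
print(story.history("web"))  # how the web edition evolved, message by message
```

Nothing here merges anything automatically. It only keeps the editions together and tells a human where to look, which may be all a newsroom needs.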