Tuesday, September 1, 2009

Dotting the i in review

Any project with a code repository needs to be diligent with its revision history. Design decisions are made (and sometimes reversed), people move on, and the origin and rationale for artifacts in code needs preservation for posterity. Some tools and code review styles lend better than others for subsequent code forensics and traceability. While the RTC development style (Review-Then-Commit) offers reasonable benefits at ensuring immediate quality over CTR (Commit-Then-Review), it may cause loss of valuable information if iterations to a proposed change happen in a medium that doesn't preserve history.

Lets compare a few techniques.

CTR. Apache Commons.

In Apache Commons, committers change code at will. Luckily, we have a bunch of subscribers that look over the SVN diff emails that get sent out and some errors get caught due to community oversight and fixed in ensuing discussion. For bugs and fixes, the ASF JIRA is set up to annotate issues with relevant SVN commit messages containing the issue identifier.

JIRA showing commit messages

Not all commits are related to JIRA issues, and we haven't gotten into the habit of insisting the same, so often there are what I call standalone commits -- islands of activity that cannot be tied into the issue tracker, discussions or other project tools.

RTC. Apache Tomcat.

In Apache Tomcat, some parts of the repository need enough developer confidence before a change can be committed to trunk. A low overhead version of RTC is used. A text file (see the Tomcat 5.5.x STATUS.txt file) is used for bookkeeping. A code change is proposed with a pointer to the diff and opinions are collected via votes to proposed changes. When a change goes into trunk, the proposal in the STATUS.txt file is removed.

In some cases, patches are in the ASF Bugzilla so there is a better record of them. In other cases, patches are posted in committer's personal web spaces. If there is ensuing discussion and a patch is updated, multiple times even, that bit of the code evolution is not recorded for the most part at worst and harder to retrieve at best.

RTC++. Ubiquity XForms.

In the Ubiquity XForms project, we use what I refer to as RTC++. The two pluses stand for:

+ Effective use of tracker - Each change must be attached to an issue in the tracker. Each issue in the tracker must bear the correct status at all times (Accepted, Started, InReview, Resolved etc.) and pointers to all related commits must appear in the issue comments.

+ Commit stream as a doubly linked list - Every proposed change from a working copy must first be committed to SVN in the changes/ space. During the code review stage, any iteration in the changes/ space based on feedback during the review process must serve as a node in a doubly linked list like so: the commit message should point to the previous iteration which is being improved on (if any) and the code review space should contain a comment about the next iteration (or commit to trunk). All commit messages must contain the issue identifier.

While it may seem like a bit more process, its actually a trivial overhead to RTC when you get it, and having just spent a couple of hours on code forensics I'm personally glad for every occassion where it was used correctly.

For an actual example in action, see Mark's blog post using issue 515 I recently proposed as an example.