A few years ago, I read about Benford’s law, which describes the natural distribution of leading digits in many real-world sets of numbers. The normal use of this rule is to distinguish data sets with fabricated numbers from those with real numbers.

Today, I ended up with two sets of sixteen numbers, and was curious how the leading digits were spread out.

The first data set had values between 561 and 8224. The second had values between 39 and 576. The second set was a function of the first. The leading digit frequencies were as follows:

First Digit   Set 1   Set 2   Total   Frequency   Benford’s Law
1             3       8       11      34.4%       30.1%
2             5       4       9       28.1%       17.6%
3             4       1       5       15.6%       12.5%
4             0       0       0       0%          9.7%
5             1       2       3       9.4%        7.9%
6             0       1       1       3.1%        6.7%
7             1       0       1       3.1%        5.8%
8             1       0       1       3.1%        5.1%
9             1       0       1       3.1%        4.6%

I was impressed with how front-loaded that table is, and how closely it tracked with Benford’s law. There doesn’t seem to be any reason for “1” or “2” to be more common than, say, “4” as a leading digit in either set, but “1” and “2” (2 of the 9 possible leading digits, or 22%) accounted for more than half of the leading digits overall (34% and 28% respectively), and for at least half in each set.
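For reference, the Benford’s Law column in the table follows from the formula P(d) = log10(1 + 1/d). Here is a minimal Python sketch that recomputes both percentage columns from the Total counts. The function names are my own, and the leading-digit helper assumes positive integers.

```python
import math


def benford_expected(digit):
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_counts(values):
    """Count how often each digit 1-9 appears as a leading digit."""
    counts = {d: 0 for d in range(1, 10)}
    for value in values:
        counts[leading_digit(value)] += 1
    return counts


# Combined counts (the Total column) from the table above.
observed = {1: 11, 2: 9, 3: 5, 4: 0, 5: 3, 6: 1, 7: 1, 8: 1, 9: 1}
total = sum(observed.values())

for d in range(1, 10):
    print(f"{d}  {observed[d] / total:6.1%}  {benford_expected(d):6.1%}")
```

With only 32 values the match is rough, so this is a curiosity rather than a statistical test.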

I spoke last night at IndyALT.NET about source control systems. I had a pretty good time, and I’d like to thank the group for inviting me. For any who are interested, I put my slides on slideshare. The git repo graphs were generated by git-graph-objects.

I’m trying to get started with Ragel, and found a hello world example for Ruby that got me past Go. Since the C# version turned out a bit different, I thought I’d share what I came up with.


%%{
  machine hello;
  expr = 'h';
  main := expr @ { Console.Out.WriteLine("greetings!"); } ;
}%%

using System;

namespace Mab.Test
{
  public class Hello
  {
%% write data;
    public static void Main(string [] args)
    {
      foreach(var arg in args)
      {
        Console.WriteLine("***** " + arg + " ******");
        Run(arg);
      }
    }

    private static void Run(String data)
    {
      int cs;
      int p = 0;
      int pe = data.Length;
      // init:
      %% write init;
      // exec:
      %% write exec;
    }
  }
}

To see how it works, save the code above to Hello.rl and run these commands:

ragel -A Hello.rl
csc /t:exe /out:test.exe Hello.cs
test.exe a h z

I’m pretty impressed with Hudson. It’s very easy to administer, and has had a plugin available for everything I’ve wanted to do so far. Despite being from outside of the Windows world, it’s pretty easy to get VS stuff running in it.

However, one thing that would make it better is the ability to watch several source control sources, similar to CCNET’s Multi Source Control plugin. I’m not up for writing an entire plugin. (The one time I tried to do that, I was overwhelmed by how much harder it is to create a plugin from scratch for Hudson than for CCNET.)

So I patched the Hudson TFS plugin to support multiple paths. In addition to using a simple source control path (e.g. $/myproject/some/directory) that’s mapped to the job’s workspace, you can now map several TFS paths into Hudson’s workspace (e.g. $/path1 : path1 ; $/path2/a/b/c/d : c\d). You can get this from the dev.java.net issue, github, or download a snapshot build I created.
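To make the mapping syntax concrete, here is a rough Python sketch of how a string like the one above could be split into (server path, local path) pairs. This is not the plugin's code (the plugin is written in Java); the function name is hypothetical, and treating a bare path as a mapping to the workspace root "." is my assumption based on the description above.

```python
def parse_tfs_mappings(spec):
    """Split '$/server/path : local ; ...' into (server, local) pairs.

    Hypothetical helper for illustration only. A bare server path with
    no ':' is assumed to map to the workspace root ('.').
    """
    mappings = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        server, sep, local = entry.partition(":")
        local = local.strip() if sep else "."
        mappings.append((server.strip(), local))
    return mappings


print(parse_tfs_mappings(r"$/path1 : path1 ; $/path2/a/b/c/d : c\d"))
print(parse_tfs_mappings("$/myproject/some/directory"))
```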

Ruby objects have a nice-for-debugging method called inspect. Dumping objects isn’t something that I only ever want to do in Ruby. In fact, I think the bug I’m looking at right now is related to a settings object that doesn’t get updated, and the easiest way to watch the settings object is to Inspect() it.

So, I present, Inspect.cs (and tests).

I’m sure I’m not the first person to discover this, but it has come up twice in the past few days (once for the code review tool I maintain, and the other for a project-to-be-named later), so I thought I’d toss it out as interesting.

Let’s say you’re writing an app that needs to process TF changesets. You might pull some changesets with a history query …

changesets = VersionControl.QueryHistory("$/Project", VersionSpec.Latest, 0, RecursionType.Full, null, 1, VersionSpec.Latest, int.MaxValue, true, true, true);

… or you might grab the changeset directly …

changeset = VersionControl.GetChangeset(12345);

Regardless, you now have a shiny Changeset object that will tell you all about what happened at that point in time. So you start looking through the changes in the changeset, and you see some Adds, some Renames, some Deletes, some Rename | Edits, etc. So now you can present a list to the user with the things that got updated, one item per change in the changeset. Except…

I found that I need to pay attention to more than just the change type. For example, if you have a directory with a bunch of deleted files, and you move the directory, the changeset for the directory move will include a change with ChangeType.Rename for every file under that directory, including deleted files! The only indication you’ll get is that the change.Item will have a DeletionId value of something other than 0.

I saw this issue on the code review tool when a user asked why an apparent rename was giving a diff of two empty files. The answer was that the file had been deleted before the rename, so it didn’t really exist on either side of the rename. It was just dragged along. So, I should have checked DeletionId!

Also because of this behavior, the project-to-be-named-later ended up adding files to a directory that weren’t actually supposed to be there. Before I write the new version of files in the changeset, I check DeletionId… if it’s not 0, then I ignore the file.

So, if you’re consuming TF changesets, and you want to ignore files that were deleted in the past, be sure to check the change.Item.DeletionId property!
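The check itself is tiny. Here is a language-neutral sketch in Python, using stand-in classes rather than the real TFS client types: the ChangeType, Item, and Change classes below are hypothetical and only mirror the names mentioned above (ChangeType, change.Item, DeletionId), and the paths and deletion ID are made up.

```python
from dataclasses import dataclass
from enum import Flag, auto


class ChangeType(Flag):
    """Stand-in for the TFS change type flags (not the real enum)."""
    ADD = auto()
    EDIT = auto()
    RENAME = auto()
    DELETE = auto()


@dataclass
class Item:
    server_item: str
    deletion_id: int = 0  # non-zero means the item was already deleted


@dataclass
class Change:
    change_type: ChangeType
    item: Item


def live_changes(changes):
    """Keep only changes whose item was not deleted at some earlier point."""
    return [c for c in changes if c.item.deletion_id == 0]


# A directory move: every file under it shows up as a Rename,
# including one that had been deleted before the move.
changes = [
    Change(ChangeType.RENAME, Item("$/Project/new_dir/kept.cs")),
    Change(ChangeType.RENAME, Item("$/Project/new_dir/old.cs", deletion_id=4321)),
    Change(ChangeType.RENAME | ChangeType.EDIT, Item("$/Project/new_dir/edited.cs")),
]

for change in live_changes(changes):
    print(change.item.server_item)
```

In the real API the same filter would be a comparison of change.Item.DeletionId against 0 inside whatever loop walks the changeset.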

In my earlier post on getting an Ubuntu VM running on VPC, I neglected one important, though usually insignificant, detail: how to disable udev for the VM’s NICs.

The problem with udev is that it effectively disables the VM’s NIC under certain conditions. To understand why, it helps to make some observations about the various modules involved.

Udev is the device manager for Linux 2.6 systems. It is responsible for populating /dev with only the devices that are present. It provides notification to other software when devices are changed. And, most importantly for this blog post, it provides “persistent naming for devices when they move around the device tree.” So, if you move your USB cabling and routers around, your machine should still be able to identify, say, your printer. Or if you add a new NIC and your system detects it before your existing eth0, your old NIC will still be eth0 and your new one will be eth1. This is pretty awesome, and it helps resolve some of the problems I remember having with old PCs, trying to guess which device name maps to which physical device.

On Hyper-V (or Virtual Server, or Virtual PC), NICs are usually virtual, so they get assigned virtual MAC addresses. The default configuration is to let Windows pick the MAC address. These MAC addresses are typically stable, unless you do something like recreate the NIC or move the VM to a new host.

The scenario that this blog post is concerned about is this: you set up a VM on one host, and then you move it to another host. You might move it for any number of reasons: you set it up on VPC and want to permanently host it on Hyper-V, or your Hyper-V hosting situation changes and you need to migrate VMs around, or whatever. Following the move, your VM is no longer on the network! The new host gives the NIC a new MAC address, so udev treats it as a new device and names it eth1, while your network configuration still refers to eth0. Bad news.

So, when you’re setting up your VM, you need to consider how udev will interact with Hyper-V. Some possible solutions include assigning a static MAC address to the NIC, or making udev smart enough to realize that if the old NIC went away and a new one is in its place, it should use the new one in place of the old one. My solution is to make udev ignore eth*, since I am usually configuring a Linux VM to run some particular service, and I will not be changing the network configuration in any significant way. My solution is likely to be affected by upgrades because it modifies files that are generated by scripts.

And now, to the point of this post: how to make udev ignore your eth* devices:

  1. Open /etc/udev/rules.d/75-persistent-net-generator.rules in your favorite text editor.
  2. On line 21, where it says KERNEL!="eth*|ath*… remove eth*|. For example, on my system, I start with
    KERNEL!="eth*|ath*|wlan*[0-9]|ra*|sta*|ctc*|lcs*|hsi*", GOTO="persistent_net_generator_end"
    and end up with
    KERNEL!="ath*|wlan*[0-9]|ra*|sta*|ctc*|lcs*|hsi*", GOTO="persistent_net_generator_end"
  3. If udev has already renamed your nic, you’ll also need to open /etc/udev/rules.d/70-persistent-net.rules and remove any eth* entries you find.
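If you have several VMs to fix, the edit in step 2 can be scripted. Here is a minimal Python sketch, assuming the rule line looks like the one on my system; the function name is made up, and it works on the file's text so you can try it on a copy before touching the real file.

```python
def drop_eth_from_generator_rule(rules_text):
    """Remove the 'eth*|' alternative from the KERNEL!= match line.

    Lines that do not start with KERNEL!=" or do not contain 'eth*|'
    are returned unchanged.
    """
    updated = []
    for line in rules_text.splitlines(keepends=True):
        if line.lstrip().startswith('KERNEL!="') and "eth*|" in line:
            line = line.replace("eth*|", "", 1)
        updated.append(line)
    return "".join(updated)


before = (
    'KERNEL!="eth*|ath*|wlan*[0-9]|ra*|sta*|ctc*|lcs*|hsi*", '
    'GOTO="persistent_net_generator_end"\n'
)
print(drop_eth_from_generator_rule(before), end="")
```

To apply it for real, read /etc/udev/rules.d/75-persistent-net-generator.rules, keep a backup, and write the result back as root. Step 3 still has to be done separately.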

This should work on any Linux with udev (kernel >= 2.6). Line 21 is the right one to change on Ubuntu 8.10 and 8.04.2 systems that I’ve configured this way.

Every once in a while, I need to merge something using ClearCase’s command-line tool, cleartool. It always takes me a few minutes to remember exactly how to tell ClearCase to do what I want, so I’m writing it down in hopes that I remember to look here next time I need this.

So, I ran “cleartool ci”, and got an error about the version I was trying to check in not having the right parent version. To fix, I ran

cleartool merge -to my_file.txt my_file.txt@@/main/LATEST

After that, my checkin worked just fine.

Whilst we may muse about engineering, architectural or craft metaphors for software development, there is no denying that, in essence, programming is writing. It is a form of communication that has two distinct audiences: us and the machine. Although there are times when it might not feel this way, the machine is easily pleased, demanding little more than well-formed code. We, however, are a little more complex and discerning: we demand that our communication communicate.

As a discipline of composition much of what can be said for writing natural language is directly applicable to code. There is no virtue in long-windedness and, by way of balance, there is also no virtue in code that is unreadably terse. As ever the appeal is for well-written code. Code that is simple and clear, brief and direct.

– Kevlin Henney in “Minimalism: Omit Needless Code”

While reading the article quoted above, I was reminded of the first big leap I made professionally. Engaging the part of my brain that deals with composing and editing English prose enabled me to move from being an entry-level developer to a “Senior Software Engineer”.

The circumstance for my “aha” moment was the transition between two very different projects. I had just returned to the home office after being offsite for 14 months. My offsite assignment had been a testing job, during which I had written scripts for doing some test script prep and data analysis, and to maintain my sanity. I came back and started working on a .NET web service that served as an integration point between a legacy Visual C++ app, some legacy C++ code extracted from the app, and a Perl/CGI/Oracle web application.

I was working on a team with a couple of Senior Engineers. My first task was to write a component that would handle the compression needs of the application. I wrote a class that abstracted zip files, and let you list a zip file’s contents, extract one or all of the files in the zip, add files to the zip, etc. Essentially, I wrote a subset of #ziplib. It was really sweet, and had 0 (that’s right, zero) unit tests.

Unfortunately, all the application needed was two functions: Zip and Unzip. It needed to add a single file to a zip at one end, and pull it out on the other end. Added to that, my code was generally sloppy. The code review was a bit painful.

During the project, there were many more code reviews. This was the first project that I was on that had code reviews, and I really hadn’t gotten the hang of it at that point.

When we were close to finished, one of the engineers asked me to review a paper he had written that gave guidance about exception handling. I like reviewing and marking up documents — I’m one of those people who thinks that bad grammar and bad spelling detract from the value of written things — so I had no problem printing out a copy and taking a red pen to it.

During the document review, something clicked for me. I realized that during code reviews, I needed to engage the same part of my brain that I used when reviewing prose. Code, like prose, tells a story. There is some reason that you’ve written the thing. You’re starting in one place, and going to another. You’re building arguments that you can fall back on later. You’re using the results of other works to build up something significant, at least in the current context. I was now able to use that part of my self that wanted to be a writer, and my professional growth took off.

I gave a presentation at work about CouchDB. I put the slides on slideshare. The CouchDB example is captured in this gist (more or less; I made a few changes to the views from what I talked about today). In the talk, I demonstrated all of the stuff in that script either in the browser with Futon or on the command line with curl.
