On a recent project, we built an OLAP piece to satisfy some of the reporting requirements. The app was a Rails app, but since we used MS SQL Server, we just built the OLAP piece using standard MS tooling (SSIS, SSAS, etc.).

During implementation of the OLAP thing, I recognized that 32 bits was way overkill for some of the primary keys in the app database. For instance, there’s one table that had 6 rows when we deployed, and there will probably never be more than 50 possible entries in it, let alone 5,000,000. So I took the liberty of “optimizing” the post-ETL database to use smaller, 16-bit primary keys.
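For context, here’s the size difference in plain numbers (a quick irb check; nothing project-specific, just the standard signed-integer maxima):

# The app's original 32-bit int PKs vs. my "optimized" 16-bit (smallint) keys.
2**31 - 1   # => 2147483647
2**15 - 1   # => 32767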

Things ran just fine for almost six months.

Right before Christmas, I got an email from the DBA saying that the processing job had just started reporting an error:

… Description: Data conversion failed while converting column “id” (199) to column “small_id” (142).  The conversion returned status value 6 and status text “Conversion failed because the data value overflowed the specified type.”. …

It was obvious where that was coming from. At this point, I started asking myself, “Why in the world did I decide to truncate those integers?” I was a bit embarrassed — such a n00b error.

But I was also curious why this problem happened so quickly. Like I said, the production database that we started with had max(id) == 6, and this table hardly ever got new rows. How did it get over 32,767 so quickly? Did we have a weird migration? Or did the customer churn through that many tries at setting up a new value?

I found the answer while I was trying to set up a database to reproduce the error. I thought it was going to be a pain to get a value high enough into the PK, since PKs are usually a little tricky to insert arbitrarily. I decided the easy way to do it would be to bump the sequence up to 100,000 or so, and then use the app to generate some bad data. So the first step was to check what the next sequence value was.

dbcc checkident('things', noreseed)

Checking identity information: current identity value '1460103721', ...

Huh?

select max(id) from things

6

So how did a “6” get translated to “1460103721”?

Well, remember that this was a Rails app. To initially populate this table, we used a fixture. We loaded the fixture in a migration. There were several iterations of the initial data, so several migrations loaded several fixtures on top of each other. Early in the project, we used Rails’s magic autonumbering fixtures for the initial data. Rails basically creates a hash of the fixture name to generate the id. Near the 1.0 release, I got paranoid about Rails generating an ID close enough to MAX_INT that the customer wouldn’t be able to add a reasonable number of new entries to the table, so we switched to hand-numbered fixtures. Evidently, the last fixture load (the one with max(id) == 6) should have also reset the sequence so the next value would be 7, not 1460103722.
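To make that concrete, here’s a rough sketch of both halves: how a fixture label turns into one of those huge ids, and the reseed that the last fixture-loading migration should have done. The identify logic below mirrors what newer Rails ships as ActiveRecord::FixtureSet.identify; the exact hash and modulus have changed across Rails versions, so treat the constants as illustrative.

require 'zlib'

# Roughly how Rails turns a fixture label into a "magic" id: hash the label
# into a large integer. (Illustrative; older Rails used a slightly different hash.)
MAX_ID = 2**30 - 1
def fixture_id(label)
  Zlib.crc32(label.to_s) % MAX_ID
end

fixture_id(:default_thing)  # => some huge, essentially arbitrary integer

# What the last fixture load should also have done: reseed the identity so the
# next generated id is 7. (Run from the app against its SQL Server connection.)
ActiveRecord::Base.connection.execute("DBCC CHECKIDENT('things', RESEED, 6)")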

Looking through the tables now, most of them are on the order of 1,000,000,000 away from the 32-bit max. The closest one is the admin users table, whose identity value is currently 2,126,185,674. So the app will only support 21,297,973 more admin users. I think I’m comfortable with that.
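(That’s just the 32-bit signed max minus the current identity value; quick check:)

2**31 - 1 - 2_126_185_674   # => 21297973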

Anyway, back to my main point: Don’t truncate data! Even if you think you’re being all slick and saving all kinds of space (I was saving 500 KB!), don’t do it!

On the server that I set up a couple of days ago, there’s a Rails app running with Passenger on Ubuntu 8.10. It’s using SQL Server as its database.

Today, I got a report from a user that it was serving up 500 errors. Not good.

Looking at log/production.log, I found that the DB connection had gone stale. (The actual error was “08S01 (0) [unixODBC][FreeTDS][SQL Server]Communication link failure”.) Evidently, Patch Tuesday happened this week (witness the OpenNMS notifications in my inbox and the Windows Update dialog on my PC demanding that I RESTART NOW), so the SQL cluster had been rebooted.

I restarted the app, which resolved the problem, but I wasn’t looking forward to future restarts due to SQL cluster activity. I wanted to make it so the app would never fail due to a stale DB connection. Could I tell Passenger to restart the app if it got a 500 error? Could I tell Rails to retry if the connection was stale?

Thinking back to my recent JRuby on Rails work, I remembered a suggestion from Nick Sieger for releasing ActiveRecord connections. It’s more necessary when you’re using a connection pool (i.e. when running on a J2EE server with a JDBC connection pool), but it should also keep my connections from going stale because, well, a connection won’t be around long enough to go stale.
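If memory serves, the suggestion boils down to an after_filter in ApplicationController that hands the connection back at the end of every request, roughly like this (a sketch from memory, not copied from Nick’s post):

class ApplicationController < ActionController::Base
  # Return the ActiveRecord connection to the pool after each request, so no
  # request holds onto a connection long enough for it to go stale.
  after_filter { ActiveRecord::Base.clear_active_connections! }
end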