WTF Was That?! (Part 1)

“I’m a reasonable guy. But, I’ve just experienced some very unreasonable things.”

–Jack Burton, “Big Trouble In Little China” (1986)

You haven’t done professional software development if you haven’t had moments that make you step back and question your sanity, the sanity of your team, or perhaps reality. These are just some stories we’ve encountered over the years. Enjoy!

“But it’s durable, right?”

Years ago I worked on a system that needed to synchronize user information from one set of servers to a different set of servers. Occasionally the data would stop synchronizing and someone had to figure out where the problem lay. Once I’d been at the company long enough, one of these bugs came my way and I had to troubleshoot the issue. As I dug into things I saw references to odd things in the configuration data, like “SMTP server” and “mail fetch time interval.” The way the system worked was that the remote servers dumped a user’s data into a file when they logged out. A daemon polled the file system for changes and picked up the files deposited into that directory. It would then connect to an SMTP server, create an email, attach the exported data file, and send the email to a pre-configured address on an internal mail server. The receiving server had its own daemon that would poll the mailbox, loop through the incoming email messages, grab the attachment from each one, and dump it into a monitored directory where yet another daemon polled for new files. That last daemon was responsible for the last-mile processing of the user’s data and made it available on a portal. When I expressed … skepticism? … concern? … bewilderment? … I received a snarky remark of “It’s clunky, but the messages are durable, right?” They weren’t being ironic. This wasn’t during the ’80s. Or the ’90s. Or even the early ’00s.

“We’re using Perl. No, I’m not joking.”

Once upon a time I worked at a company where we were building a major new version of one of our flagship products. There had been a major change to the data formats across many systems, and we found we needed to update thousands of files to match. The files were all text based, so we decided that Perl would be a fast and reliable language to use for this mass conversion. When we told one of our colleagues that we were going to use Perl, he laughed because he thought we were joking. You see, Perl had gone out of vogue and was considered ancient. I told him, “No, I’m not joking.” Despite Perl being archaic and out of favor, it got the job done beautifully. In less than two weeks we had scripts that ran in seconds. But wait, two weeks for Perl! Clearly that shows how cringe it is. Except that those two weeks covered analysis of the data format changes, coming up with a plan, writing code with unit tests, extensive testing, and then migrating the files over. Oh yeah, with documentation.

The NoSQL SQL Pattern

Once I was at a company where one of our teams was having performance issues with their database queries. Everyone was surprised and frustrated by this, since it was in an area that just kept track of simple counters. I asked the Engineering Manager to look into things so I could better understand what was happening. Well, it turns out that this particular bit of code had originally been written in a different language than the current system. The code had simply been re-implemented line for line in the new language without any consideration of whether it made sense. Why would this bother me so much? You see, the original system used a NoSQL store and stored a JSON document with counters. Whenever something happened, the values in the document would be updated accordingly. The new system had since done away with the NoSQL store and had moved to a traditional RDBMS. However, when they ported this, they created a text column. This meant that the JSON document was serialized into text and then inserted into the database. Whenever a counter needed to be updated, the code fetched the string from the database, deserialized it into a JSON document, increased the appropriate counter by 1, serialized the JSON document back into a string, and then updated the database with that string. This happened frequently. Everyone on the team, from the Product Manager down to the Engineers, complained about how slow this particular functionality was, but no one did anything about it. It literally took me going “WTF!” and pushing for this to be addressed as tech debt before it was cleaned up.

“I know this! This is [git]!”

There are moments in your life where you wonder how it is they let you anywhere near a keyboard and then pay you for it. This is one such moment. I had joined a company that was using git for its Version Control System (VCS). I had only used non-distributed VCSs up to this point in my career (e.g. Subversion, CVS, ClearCase) and I knew nothing about git other than its name and that it was distributed and decentralized. I was accustomed to performing all my version control operations via the built-in client of my IDE. Lucky for me it had integration with git! Right? Right. One day I went to commit some code and there was an error. I had no clue what it meant, but I had the option of forcing the change. Always a good idea! Right? Right. The forced change went through beautifully. I was so proud of myself. A few minutes later I got a message from an engineer on the team asking me, “Hey, did you mean to delete 5,000 commits worth of history?” I very much did not mean to do that. I answered no, and described to him what I was trying to do. I was so embarrassed. Luckily for me, several engineers had local copies that had everything except my change, so they were able to restore the repository to where it should have been. Whew! Kids, this is why you should always learn to use a tool on the command line first, so you can really be sure you know what it’s doing and how it works without any crutches from a GUI.

Of Well-meaning HR Violations

Long before the world had thought to add “CD” to “CI,” “breaking the build” was something that happened frequently in web application development. If you had a solid team you just needed to discuss why people should test the build locally and poke at things before checking in their code. On teams with a wide distribution of talent things were rockier. At one company we found ourselves with the team grinding to a halt due to multiple build breakages per day. In case you’re not a software engineer: if someone checks in a bad build, it eventually trips everyone up. You see, part of the development workflow is to update your local copy of the code from the remote server before you check things in, to make sure it’s all going to work. If the server copy is broken, then once you update, your local copy will no longer work either. In order to encourage the engineers to be more rigorous about their updates we instituted a policy: anyone who broke the build had to wear an 18″ tall conical cap with the word “DUNCE” written in large block print on it. The effects of this were pretty staggering: within a week build breakages went from multiple times per day to almost zero per week. I’m proud to say I only had to wear the cap once during the year that the project lasted. (Never mind that I was one of the few people working on the actual build scripts and project configuration and was therefore statistically more likely to break the build.) However, it turns out that when HR found out what we were doing they were not happy; something about hazing rituals and some other nonsense.

Let’s get creative

Startups provide an amazing atmosphere to learn on an accelerated schedule. Furthermore, they provide a great platform for wearing many hats and developing a great sense of ownership. They’re also incredibly creative at squeezing the most out of the last round of funding. In the early 2000s, prior to virtualization, on-prem and public cloud, bare bones was the modus operandi. After migrating our servers (all hilariously named after Star Trek characters) to a data center, the CTO and founder was unhappy with the monthly bills. So we decided to do a data center migration to one of the rooms in our office. Think about an average 1970s nondescript and not-up-to-code small office building. Think about a room with poor ventilation, terrible electrical work and giant windows facing the relentless sun for a good portion of the day. That became our data center for dozens of servers, a massive database and other production equipment. No more pesky data center bill! Needless to say, we had multiple outages due to power spikes, thanks to the electrical load we were placing on the poor building. We pushed through those as part of “the cost of doing business.” But lo and behold, one hot summer day, our online service (and main source of revenue) came to a full stop. As part of troubleshooting, we noticed that the database refused to come back online. After many attempts and waiting for the vendor to return our call, we decided to take the server case apart. Upon close inspection, we were shocked to realize that the hard drive head had MELTED onto the disk. Needless to say, we migrated back to a proper data center a few weeks after that. Getting creative is cool and all, but at some point you need to pause and question the sanity of your choices. Lesson learned!


I am Cranky Old, born four hundred years ago in the Highlands of Scotland. I am Immortal and I am not alone.

I come from a dimension where Engineers are thoughtful, mature and take their craft seriously. I've been sent to this reality to fight a cosmic battle to ensure that technology always works. What I've learned: you die a good Engineer or live long enough to bury others in your tech debt.
