Brian MacMillan

10 The Human in Human Error


AKA Hoshi reboots the server

Intake: The nonsense virus is not stopped. I’m sent to Japan to investigate … Hoshi reboots the server.

The man was a bully.

I realized he had no ability to negotiate. All his life he’d given orders and they’d been taken. (One of the many ways being rich prepares you for being a boss).

The Human in Human Error

Mainframe computers are like boilers on a transatlantic steamship. They do a lot of work, but require constant maintenance, and when they blow up people get hurt.

Data flows downhill from mainframes to AS400s and to Unix, where it enters the internet and Windows, through which it reaches our users’ desktops.

There is one exception in my world to this downhill flow of data from mainframes to Windows. One of my systems actually sends data upstream to the AS400. In this instance it is a great idea from a cost perspective, but it creates a flow chart that looks like it has a short circuit in it.

The mainframes are run by men with white lab coats and clipboards, who segregate themselves from us not only through their dress but also with their weird data models and weirder naming conventions.

It is more mediaeval than you would expect at the beating heart of the digital world. A guild. We mockingly call them the keepers of the sacred fire. Or data trolls, when we’re feeling mean.

˙

Many things only have meaning in context. Like all spoken words.

Computers, like people, have many types of memory. Likewise, the phrase “reboot the server” has many meanings depending on context. Application servers, hardware servers, file servers. There is a big difference between restarting a program that runs a web page and rebooting a mainframe.

The biggest machines in the world, when I was growing up, were a pair of excavators in Athabasca, Alberta. The Athabasca tar sands are so huge and flat that the bigger the excavator you can use, the better. The limit on size, interestingly enough, was not mechanical but energetic. It took a noticeable amount of the entire electrical grid of northern Alberta to jump start one of these suckers. Turning one on dimmed the lights in Edmonton. I always think of this when I think of rebooting mainframes.

“Hoshi-san, are you ready to reboot the WebLogic server?”

“The server? Yes. Are you sure you’re authorized? Jun-san insists that only Ansulm can authorize this. Shall I call him? It’s 3 a.m. your time.”

“Just reboot the server. You’ve got the scripts, right?”

“Here I go.”

“Hoshi, I’ve just been kicked out of telnet.”

“Yes. As expected. I believe that you will have to restart the telnet application once the server comes up.”

This information is like a safe dropping on my head. “You rebooted the Linux server tokfidops?”

“Yes.”

“I thought that you were rebooting the infohub server.”

“No. I tried that. It didn’t work. We just discussed this.”

“Hoshi, we’re in trouble. This Unix machine runs 5, maybe 10 systems. They are all down now.”

On cue, the Asian pager on my batbelt begins to receive a stream of pages that do not stop.

“I think that you should be more concerned about the batch, Patrick-san. We may have to start that from the beginning.”

The thing about the batch – a nightly reconciliation of London and New Trades – is that it has to happen before the next day starts. For a bunch of reasons. Like position reports. And compliance rules. And convention. And law. Most trades have to be completed no later than T+1. Why so little slack? Because it’s a slippery slope. One day late compounds into 2 days, which in 1968 caused a paperwork crisis that killed Wall Street and launched Ross Perot’s career, two outcomes no one wants to repeat.

And thus the batch going down is not unlike a boiler exploding on the Lusitania.  There is an outer shell of people with surface wounds. But near the damage, it’s a mess. Body parts and blood, probably mine and certainly Hoshi’s. Thank goodness we’re peers and I’m not his boss.

“Hoshi, I’ll take care of the batch from my end. You ensure that all of our systems have rebooted. When you’re done, ping me. I’ll need your help.”

“I have already begun executing the reboot scripts.”

“Hoshi, do you realize that rebooting the mainframe was a mistake?”

“Now I do. I will almost certainly be fired.”
