Mitigating human error in the SP
- Subject: Mitigating human error in the SP
- From: nanog at 85d5b20a518b8f6864949bd940457dc124746ddc.nosense.org (Mark Smith)
- Date: Tue, 2 Feb 2010 23:46:29 +1030
- In-reply-to: <[email protected]>
- References: <[email protected]>
On Mon, 1 Feb 2010 21:21:52 -0500
Chadwick Sorrell <mirotrem at gmail.com> wrote:
> Hello NANOG,
>
> Long time listener, first time caller.
>
> A recent organizational change at my company has put someone in charge
> who is determined to make things perfect. We are a service provider,
> not an enterprise company, and our business is doing provisioning work
> during the day. We recently experienced an outage when an engineer,
> troubleshooting a failed turn-up, changed the ethertype on the wrong
> port losing both management and customer data on said device. This
> isn't a common occurrence, and the engineer in question has a pristine
> track record.
>
Why didn't the customer have a backup link, if their service was so
important to them and, indirectly, to your upper management? If your
upper management are taking this problem that seriously, then your
*sales people* didn't do their job properly - they should be ensuring
that customers with high availability requirements have a backup link,
or aren't led to believe that the single-point-of-failure service will
be highly available.
> This outage, of a high profile customer, triggered upper management to
> react by calling a meeting just days after. Put bluntly, we've been
> told "Human errors are unacceptable, and they will be completely
> eliminated. One is too many."
>
If upper management don't understand that human error is a risk factor
that can't be completely eliminated, then I suggest "self-eliminating"
and finding yourself a job somewhere else. The only way you'll avoid
human error having any impact on production services is to not change
anything - which pretty much means not having a job anyway ...
> I am asking the respectable NANOG engineers....
>
> What measures have you taken to mitigate human mistakes?
>
> Have they been successful?
>
> Any other comments on the subject would be appreciated, we would like
> to come to our next meeting armed and dangerous.
>
> Thanks!
> Chad
>