From lance at osuosl.org Fri Aug 6 04:29:44 2021
From: lance at osuosl.org (Lance Albertson)
Date: Thu, 5 Aug 2021 21:29:44 -0700
Subject: [opencompute-hosting-announce] Unplanned Power Event

FYI:

It looks like we had another power event that impacted our primary data
center along with our OpenCompute hosts in another data center. I'm taking a
look to see what might be down, but this time it seems to be not nearly as
widespread. I don't think we had any issues with any of the OSL managed
services.

Please let me know if you do have any issues.

Thanks-

On Tue, Aug 3, 2021 at 3:26 PM Lance Albertson wrote:

> I received an update on the issues we had in the primary data center. It
> appears that there was a battery cell problem on one of the UPSes. Prior
> to the outage, OSU issued a purchase order for battery replacements and is
> waiting for them to arrive before scheduling the installation. The projected
> arrival date for the batteries is September 10th. When they arrive, we will
> schedule the installation as a priority.
>
> In the meantime, this may happen again; however, I did fix a few systems we
> had issues with related to how their power was configured.
>
> If you have any questions or concerns, please let me know.
>
> Thank you!
>
> On Sun, Aug 1, 2021 at 12:28 PM Lance Albertson wrote:
>
>> I got word that this outage was more of a campus-wide event, which also
>> impacted the OpenCompute hosts. I went through those hosts and ensured they
>> are back online, but let me know if I missed anything.
>>
>> OSU will be sending in a tech in a few days to see why the UPS didn't
>> fail over properly in our primary data center, which caused the power event.
>> I'm also going to spot check a few hosts' power when I go in on Tuesday to
>> ensure power is split properly between the power feeds. If you had any
>> hosts that went down with dual power, please let me know ASAP so I can add
>> them to the list of hosts to check.
>>
>> Thanks for your patience!
>>
>> On Sun, Aug 1, 2021 at 8:15 AM Lance Albertson wrote:
>>
>>> It seems as though we had an unplanned power event in our primary data
>>> center early this morning at 3:03 AM PDT (10:03 UTC) that affected one of
>>> the two power feeds. Virtually every system that has a dual power supply
>>> should have remained online. The one exception is a set of systems located
>>> in a row fed only by that power feed, which includes:
>>>
>>> - power8-aix
>>> - pieta.debian.org
>>> - gcc2-power8
>>> - All Buildbot/RTEMS systems
>>> - gcc113
>>> - gcc114
>>> - gcc115
>>> - gcc116
>>> - gcc117
>>> - gcc118
>>>
>>> I believe every system that we monitor should be back online, but there
>>> might be others we aren't monitoring that are still down. If that's the
>>> case, please send an email to support and we'll take a look at it as soon
>>> as possible.
>>>
>>> I'm still waiting to hear back about what happened and why it happened,
>>> and will pass that information along once I learn more.
>>>
>>> Thanks for your patience.
>>>
>>> --
>>> Lance Albertson
>>> Director
>>> Oregon State University | Open Source Lab
>>>
>>
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>

--
Lance Albertson
Director
Oregon State University | Open Source Lab