[osuosl-openpower] MAINTENANCE: OpenStack cluster server moves: Mar 8, 9 & 12

Lance Albertson lance at osuosl.org
Fri Mar 9 01:25:58 UTC 2018


The move for openpower2 has been completed. Sorry it took a little longer
than planned. Please let me know if any VMs are still unreachable.

On Thu, Mar 8, 2018 at 11:15 AM, Lance Albertson <lance at osuosl.org> wrote:

> The move for openpower1 has been completed, and all VMs that were on that
> hypervisor should be booting up or already back online. Please let us know
> if you have an issue with one of your VMs. We'll be moving openpower2 later
> this afternoon as planned.
>
> Thanks-
>
> On Tue, Mar 6, 2018 at 2:36 PM, Lance Albertson <lance at osuosl.org> wrote:
>
>> Service(s) affected:
>>
>> All VMs hosted on the OpenPOWER OpenStack cluster will be offline for
>> approximately 2-4 hours during each server move. In addition, any VMs which
>> have block storage attached to the affected nodes will have an outage.
>>
>> For a list of affected VMs per hypervisor node, please see the following
>> spreadsheet which includes the UUID for each instance as it stands today.
>> You can see what UUID your VM has by looking at the
>> /run/cloud-init/.instance-id file on your VM. In addition, if you're using
>> a block storage (Cinder) volume, I have a sheet which shows the mappings by
>> UUID to the host.
>>
>> OpenStack Cluster Server Moves
>> <https://docs.google.com/a/osuosl.org/spreadsheets/d/15D3VE13chSn0jmGWpf5wsPsin6ex0B3I6FTwS74T5uY/edit?usp=drive_web>
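
To compare a VM against the spreadsheet, the instance ID lookup described
above could be scripted roughly as follows. This is only a sketch: the file
path comes from the announcement, while the helper names and the idea of
pasting UUIDs from the spreadsheet into a list are assumptions made for
illustration.

```python
from pathlib import Path

# Path given in the announcement; the file is written by cloud-init on the VM.
INSTANCE_ID_FILE = Path("/run/cloud-init/.instance-id")


def read_instance_id(path=INSTANCE_ID_FILE):
    """Return the instance UUID stored in `path`, or None if the file is missing."""
    try:
        return Path(path).read_text().strip()
    except FileNotFoundError:
        return None


def is_affected(instance_id, affected_ids):
    """Check a UUID against UUIDs copied by hand from the spreadsheet.

    The comparison ignores case and surrounding whitespace, since values
    pasted from a spreadsheet may differ in either.
    """
    if instance_id is None:
        return False
    wanted = instance_id.strip().lower()
    return any(wanted == candidate.strip().lower() for candidate in affected_ids)


if __name__ == "__main__":
    # Hypothetical usage: replace the list with UUIDs for one hypervisor.
    uuid = read_instance_id()
    print("instance id:", uuid)
    print("affected:", is_affected(uuid, []))
```

Run on the VM itself, this prints the UUID to look up in the spreadsheet; on
a machine without cloud-init it prints None.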
>>
>> Outage Windows:
>>
>> openpower1
>> Start:  Thu, Mar 8, 9:00AM PST (Thu Mar 8 1700 UTC)
>> End:    Thu, Mar 8, 11:00AM PST (Thu Mar 8 1900 UTC)
>>
>> openpower2
>> Start:  Thu, Mar 8, 3:00PM PST (Thu Mar 8 2300 UTC)
>> End:    Thu, Mar 8, 5:00PM PST (Fri Mar 9 0100 UTC)
>>
>> openpower3
>> Start:  Fri, Mar 9, 8:30AM PST (Fri Mar 9 1630 UTC)
>> End:    Fri, Mar 9, 10:30AM PST (Fri Mar 9 1830 UTC)
>>
>> openpower5
>> Start:  Fri, Mar 9, 1:00PM PST (Fri Mar 9 2100 UTC)
>> End:    Fri, Mar 9, 3:00PM PST (Fri Mar 9 2300 UTC)
>>
>> openpower6 (note DST change for us)
>> Start:  Mon, Mar 12, 1:00PM PDT (Mon Mar 12 2000 UTC)
>> End:    Mon, Mar 12, 3:00PM PDT (Mon Mar 12 2200 UTC)
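
The local-to-UTC conversions in the windows above can be double-checked with
a few lines of Python. This is a minimal sketch that assumes fixed offsets of
UTC-8 for PST and UTC-7 for PDT; the function names are made up for this
example and are not part of any OSL tooling.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the two abbreviations used in the schedule.
OFFSETS = {
    "PST": timezone(timedelta(hours=-8), "PST"),
    "PDT": timezone(timedelta(hours=-7), "PDT"),
}


def to_utc(year, month, day, hour, minute, zone):
    """Convert a local wall-clock time in `zone` ("PST" or "PDT") to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=OFFSETS[zone])
    return local.astimezone(timezone.utc)


def utc_label(dt):
    """Format a UTC datetime in the style used above, e.g. 'Thu Mar 8 1700 UTC'."""
    return f"{dt:%a %b} {dt.day} {dt:%H%M} UTC"


if __name__ == "__main__":
    # The openpower6 window falls after the DST change, so it uses PDT.
    print(utc_label(to_utc(2018, 3, 12, 13, 0, "PDT")))  # Mon Mar 12 2000 UTC
    print(utc_label(to_utc(2018, 3, 12, 15, 0, "PDT")))  # Mon Mar 12 2200 UTC
```

Because the offset shrinks from eight hours to seven after the DST change, a
1:00PM local start on Monday lands at 2000 UTC, an hour earlier than the same
local time on Friday.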
>>
>> Reason for outage:
>>
>> We are in the process of migrating the storage backend of the cluster
>> from local storage to Ceph. The migration to Ceph should improve I/O
>> bandwidth and capacity and also give us more flexibility with server
>> maintenance, since we can do live migrations on VMs. Thanks to a donation
>> from IBM, we have a new five-node Ceph cluster with 292TB of capacity,
>> including SSDs for journal caching. In addition, to support Ceph we're
>> going to be upgrading the networking layer from 1Gbps to 40Gbps, thanks to
>> several donations from Mellanox. Since we're going to be incurring an
>> outage for the server move, we wanted to do a few other items at the same
>> time to avoid additional outages.
>>
>> The first phase of this migration includes the following (which this
>> outage covers):
>>
>> 1. Moving each compute server to a different rack closer to a Mellanox
>> 40G switch
>> 2. Installing and configuring a Mellanox 40G NIC
>> 3. Upgrading the system firmware (which includes Meltdown/Spectre fixes)
>> 4. Switching over to a 4.14 mainline kernel on the host to provide better
>> feature support on ppc64le (also provides fixes for Meltdown/Spectre)
>>
>> We have five compute nodes and we're planning on doing two server moves a
>> day starting on Thursday of this week. We're going to need to bring the
>> nodes up and down several times, so we'll be disabling the OpenStack
>> services on those nodes until the process is complete.
>>
>> The second phase of the migration will happen in a few weeks and should
>> only have per-VM impacts while we migrate them over to the new Ceph
>> cluster. I'll send a separate announcement once we're ready for that
>> phase.
>>
>> If you have any questions or concerns please let me know directly via
>> email or IRC.
>>
>> Thanks!
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>



-- 
Lance Albertson
Director
Oregon State University | Open Source Lab