[osuosl-openpower] MAINTENANCE: OpenStack cluster server moves: Mar 8, 9 & 12

Lance Albertson lance at osuosl.org
Mon Mar 12 23:32:40 UTC 2018


It looks like we need to replace the backplane on openpower6. I'm hoping
the part will arrive tomorrow so we can get the system back online then.
I'll send another update once I have more news.

On Mon, Mar 12, 2018 at 2:52 PM, Lance Albertson <lance at osuosl.org> wrote:

> The move for openpower6 has hit a hardware issue and I need to call
> IBM support to try to resolve it. I don't have an ETA for when this server
> will come back online at the moment.
>
> On Fri, Mar 9, 2018 at 3:18 PM, Lance Albertson <lance at osuosl.org> wrote:
>
>> The move for openpower5 has been completed. Please let me know if any VMs
>> are still unreachable.
>>
>>
>> On Fri, Mar 9, 2018 at 10:03 AM, Lance Albertson <lance at osuosl.org>
>> wrote:
>>
>>> The move for openpower3 has been completed. Please let me know if any
>>> VMs are still unreachable.
>>>
>>>
>>> On Thu, Mar 8, 2018 at 5:25 PM, Lance Albertson <lance at osuosl.org>
>>> wrote:
>>>
>>>> The move for openpower2 has been completed. Sorry it took a little
>>>> longer than planned. Please let me know if any VMs are still unreachable.
>>>>
>>>> On Thu, Mar 8, 2018 at 11:15 AM, Lance Albertson <lance at osuosl.org>
>>>> wrote:
>>>>
>>>>> The move for openpower1 has been completed, and all VMs that were on
>>>>> that hypervisor should be booting up or already back online.
>>>>> Please let us know if you have an issue with one of your VMs. We'll be
>>>>> moving openpower2 later this afternoon as planned.
>>>>>
>>>>> Thanks-
>>>>>
>>>>> On Tue, Mar 6, 2018 at 2:36 PM, Lance Albertson <lance at osuosl.org>
>>>>> wrote:
>>>>>
>>>>>> Service(s) affected:
>>>>>>
>>>>>> All VMs hosted on the OpenPOWER OpenStack cluster will be offline
>>>>>> for approximately 2-4 hours during each server move. In addition, any VMs
>>>>>> which have block storage attached to the affected nodes will have an outage.
>>>>>>
>>>>>> For a list of affected VMs per hypervisor node, please see the
>>>>>> following spreadsheet, which includes the UUID of each instance as it
>>>>>> stands today. You can find your VM's UUID in the
>>>>>> /run/cloud-init/.instance-id file on the VM. In addition, if you're using
>>>>>> a block storage (Cinder) volume, I've included a sheet that maps each
>>>>>> volume UUID to its host.
>>>>>> OpenStack Cluster Server Moves
>>>>>> <https://docs.google.com/a/osuosl.org/spreadsheets/d/15D3VE13chSn0jmGWpf5wsPsin6ex0B3I6FTwS74T5uY/edit?usp=drive_web>
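If you want to script that lookup, here is a minimal Python sketch. It reads the instance UUID from /run/cloud-init/.instance-id (the path given above) and searches a CSV export of the spreadsheet for it. The CSV export and its column names (`uuid`, `hypervisor`) are assumptions for illustration only; the real sheet's headers may differ, and `lookup_hypervisor` is a hypothetical helper, not an OSL tool.

```python
import csv
import io
from pathlib import Path
from typing import Optional

# Path given in the announcement; cloud-init records the instance ID here at boot.
INSTANCE_ID_FILE = Path("/run/cloud-init/.instance-id")


def read_instance_uuid(path: Path = INSTANCE_ID_FILE) -> str:
    """Return the instance UUID stored by cloud-init, without trailing whitespace."""
    return path.read_text(encoding="utf-8").strip()


def lookup_hypervisor(uuid: str, csv_text: str) -> Optional[str]:
    """Find the hypervisor for `uuid` in a CSV export of the spreadsheet.

    Hypothetical column names "uuid" and "hypervisor"; rename them to
    match the headers of the actual sheet.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("uuid") or "").strip() == uuid:
            return row.get("hypervisor")
    return None  # UUID not listed in the export


if __name__ == "__main__":
    # Run this on the VM itself, where cloud-init has written the file.
    if INSTANCE_ID_FILE.exists():
        print(read_instance_uuid())
    else:
        print(f"{INSTANCE_ID_FILE} not found; is this a cloud-init VM?")
```

Plain `csv` from the standard library keeps the sketch dependency-free; a volume-to-host sheet could be searched the same way with different column names.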
>>>>>> Outage Windows:
>>>>>>
>>>>>> openpower1
>>>>>> Start:  Thu, Mar 8, 9:00AM PST (Thu Mar 8 1700 UTC)
>>>>>> End:    Thu, Mar 8, 11:00AM PST (Thu Mar 8 1900 UTC)
>>>>>>
>>>>>> openpower2
>>>>>> Start:  Thu, Mar 8, 3:00PM PST (Thu Mar 8 2300 UTC)
>>>>>> End:    Thu, Mar 8, 5:00PM PST (Fri Mar 9 0100 UTC)
>>>>>>
>>>>>> openpower3
>>>>>> Start:  Fri, Mar 9, 8:30AM PST (Fri Mar 9 1630 UTC)
>>>>>> End:    Fri, Mar 9, 10:30AM PST (Fri Mar 9 1830 UTC)
>>>>>>
>>>>>> openpower5
>>>>>> Start:  Fri, Mar 9, 1:00PM PST (Fri Mar 9 2100 UTC)
>>>>>> End:    Fri, Mar 9, 3:00PM PST (Fri Mar 9 2300 UTC)
>>>>>>
>>>>>> openpower6 (note the DST change for us)
>>>>>> Start:  Mon, Mar 12, 1:00PM PDT (Mon Mar 12 2000 UTC)
>>>>>> End:    Mon, Mar 12, 3:00PM PDT (Mon Mar 12 2200 UTC)
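Because this schedule straddles the US switch from PST (UTC-8) to PDT (UTC-7) on Sun Mar 11, 2018, the UTC equivalents are easy to get wrong by hand. Here is a minimal Python sketch for double-checking them with the standard `zoneinfo` module. It assumes Python 3.9+ with a time zone database available, and it treats every window as America/Los_Angeles wall-clock time; `pacific_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database or the tzdata package

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_to_utc(year: int, month: int, day: int, hour: int, minute: int = 0) -> datetime:
    """Interpret a wall-clock time in US Pacific time and return it in UTC.

    ZoneInfo picks PST (UTC-8) or PDT (UTC-7) from the date, so the DST
    switch on Sun Mar 11, 2018 is handled automatically.
    """
    local = datetime(year, month, day, hour, minute, tzinfo=PACIFIC)
    return local.astimezone(timezone.utc)


# Start times from the schedule, as (node, Pacific wall-clock time).
STARTS = [
    ("openpower1", (2018, 3, 8, 9, 0)),
    ("openpower2", (2018, 3, 8, 15, 0)),
    ("openpower3", (2018, 3, 9, 8, 30)),
    ("openpower5", (2018, 3, 9, 13, 0)),
    ("openpower6", (2018, 3, 12, 13, 0)),
]

if __name__ == "__main__":
    for node, when in STARTS:
        utc = pacific_to_utc(*when)
        print(f"{node}: {utc:%a %b %d %H%M} UTC")
```

Using a named zone rather than a fixed offset is the point of the sketch: a hard-coded UTC-8 would silently put the openpower6 window an hour off.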
>>>>>>
>>>>>> Reason for outage:
>>>>>>
>>>>>> We are in the process of migrating the storage backend of the
>>>>>> cluster from local storage to Ceph. The migration to Ceph should
>>>>>> improve I/O bandwidth and capacity, and it should also give us more
>>>>>> flexibility for server maintenance since we'll be able to live-migrate
>>>>>> VMs. Thanks to a donation from IBM, we have a new five-node Ceph cluster
>>>>>> with 292TB of capacity, including SSDs for journal caching. In addition,
>>>>>> to support Ceph we're upgrading the networking layer from 1Gbps to
>>>>>> 40Gbps, thanks to several donations from Mellanox. Since the server move
>>>>>> already requires an outage, we wanted to do a few other items at the
>>>>>> same time to avoid additional outages.
>>>>>>
>>>>>> The first phase of this migration includes the following (which this
>>>>>> outage covers):
>>>>>>
>>>>>> 1. Moving each compute server to a different rack closer to a
>>>>>> Mellanox 40G switch
>>>>>> 2. Installing and configuring a Mellanox 40G NIC
>>>>>> 3. Upgrading the system firmware (which includes Meltdown/Spectre
>>>>>> fixes)
>>>>>> 4. Switching the host to a 4.14 mainline kernel for better feature
>>>>>> support on ppc64le (which also includes Meltdown/Spectre fixes)
>>>>>>
>>>>>> We have five compute nodes, and we're planning on doing two server
>>>>>> moves a day starting on Thursday of this week. We'll need to bring
>>>>>> the nodes up and down several times, so we'll be disabling the OpenStack
>>>>>> services on those nodes until the process is complete.
>>>>>>
>>>>>> The second phase of the migration will happen in a few weeks and
>>>>>> should only have per-VM impacts while we migrate them over to the new
>>>>>> Ceph cluster. I'll send a separate announcement about that once we're
>>>>>> ready.
>>>>>>
>>>>>> If you have any questions or concerns please let me know directly via
>>>>>> email or IRC.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> Lance Albertson
>>>>>> Director
>>>>>> Oregon State University | Open Source Lab
>>>>>>