[osuosl-openpower] Ongoing VM network connectivity issues since Pike upgrade

Lance Albertson lance at osuosl.org
Fri Jun 7 17:07:08 UTC 2019


All,

I wanted to send you yet another update regarding this networking issue. I
have been unable to find a solution to the problem and have decided to move
forward with the upgrade to Queens to see if the issue is resolved. I will
be sending an email shortly regarding when we'll do this upgrade.

Thanks-

On Tue, May 21, 2019 at 12:16 PM Lance Albertson <lance at osuosl.org> wrote:

> All,
>
> I wanted to send you an update on where we are with this issue. So far
> I've narrowed the problem down to this: when a VM using a private network
> is removed, certain iptables rules on the hypervisor get out of order. It
> only seems to affect inbound connections to the VM, as outbound
> connections still seem to work. Unfortunately, I haven't been able to
> reproduce the issue easily, which makes it difficult to troubleshoot. I've
> looked through the source code and searched online to see if anyone else
> had run into this, without success.
>
> I've rebooted all of the hypervisors on our x86 cluster and two on our ppc
> cluster (which was needed for the MDS updates). So far we haven't seen any
> issues on the rebooted nodes, but I need to let them run for a few days to
> confirm that a reboot resolves the problem. These machines were also due
> for a reboot because of the CentOS 7.5 -> 7.6 upgrade, so perhaps it's
> related to that.
>
> At any rate, I've deployed a temporary cron job on the nodes that haven't
> been rebooted, which should "fix" the networking issue. I have it set to
> run every minute so that any downtime should be minimal.
>
> I'll send another update when I have one.
>
> Thanks-
>
> On Thu, May 16, 2019 at 8:58 AM Lance Albertson <lance at osuosl.org> wrote:
>
>> All,
>>
>> Since the upgrade to Pike we've noticed virtual machines suddenly losing
>> network connectivity. The issue sometimes resolves on its own, or clears
>> when we restart the neutron-linuxbridge-agent service on the hypervisors.
>> We are doing our best to track down why this is happening and how to fix
>> it. Since we're not monitoring every host on the cluster, it's difficult
>> for us to know when it happens, so if you do have a problem with one of
>> your VMs, please let us know either via IRC in #osuosl on Freenode or via
>> a support email.
>>
>> I'll be sending further updates as we have them.
>>
>> Thanks for your patience!
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>


-- 
Lance Albertson
Director
Oregon State University | Open Source Lab