[Intel-wired-lan] [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC
alexander.duyck at gmail.com
Tue Dec 1 18:36:33 UTC 2015
On Tue, Dec 1, 2015 at 9:37 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:
>> On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
>> > There are several components to this:
>> > - dma_map_* needs to prevent page from
>> > being migrated while device is running.
>> > For example, expose some kind of bitmap from guest
>> > to host, set bit there while page is mapped.
>> > What happens if we stop the guest and some
>> > bits are still set? See dma_alloc_coherent below
>> > for some ideas.
>> Yeah, I could see something like this working. Maybe we could do
>> something like what was done for the NX bit and make use of the upper
>> order bits beyond the limits of the memory range to mark pages as
>> I'm curious. What we have with a DMA mapped region is essentially
>> shared memory between the guest and the device. How would we resolve
>> something like this with IVSHMEM, or are we blocked there as well in
>> terms of migration?
> I have some ideas. Will post later.
I look forward to it.
>> > - dma_unmap_* needs to mark page as dirty
>> > This can be done by writing into a page.
>> > - dma_sync_* needs to mark page as dirty
>> > This is trickier as we can not change the data.
>> > One solution is using atomics.
>> > For example:
>> > int x = ACCESS_ONCE(*p);
>> > cmpxchg(p, x, x);
>> > Seems to do a write without changing page
>> > contents.
>> Like I said we can probably kill 2 birds with one stone by just
>> implementing our own dma_mark_clean() for x86 virtualized
>> I'd say we could take your solution one step further and just use 0
>> instead of bothering to read the value. After all it won't write the
>> area if the value at the offset is not 0.
> Really almost any atomic that has no side effect will do.
> atomic or with 0
> atomic and with ffffffff
> It's just that cmpxchg already happens to have a portable
I was originally thinking maybe an atomic_add with 0 would be the way
to go. Either way though we still are using a locked prefix and
having to dirty a cache line per page which is going to come at some
>> > - dma_alloc_coherent memory (e.g. device rings)
>> > must be migrated after device stopped modifying it.
>> > Just stopping the VCPU is not enough:
>> > you must make sure device is not changing it.
>> > Or maybe the device has some kind of ring flush operation,
>> > if there was a reasonably portable way to do this
>> > (e.g. a flush capability could maybe be added to SRIOV)
>> > then hypervisor could do this.
>> This is where things start to get messy. I was suggesting the
>> suspend/resume to resolve this bit, but it might be possible to also
>> deal with this via something like this via clearing the bus master
>> enable bit for the VF. If I am not mistaken that should disable MSI-X
>> interrupts and halt any DMA. That should work as long as you have
>> some mechanism that is tracking the pages in use for DMA.
> A bigger issue is recovering afterwards.
>> > In case you need to resume on source, you
>> > really need to follow the same path
>> > as on destination, preferably detecting
>> > device reset and restoring the device
>> > state.
>> The problem with detecting the reset is that you would likely have to
>> be polling to do something like that.
> We could some event to guest to notify it about this event
> through a new or existing channel.
> Or we could make it possible for userspace to trigger this,
> then notify guest through the guest agent.
The first thing that comes to mind would be to use something like PCIe
Advanced Error Reporting, however I don't know if we can put a
requirement on the system supporting the q35 machine type or not in
order to support migration.
>> I believe the fm10k driver
>> already has code like that in place where it will detect a reset as a
>> part of its watchdog, however the response time is something like 2
>> seconds for that. That was one of the reasons I preferred something
>> like hot-plug as that should be functioning as soon as the guest is up
>> and it is a mechanism that operates outside of the VF drivers.
> That's pretty minor.
> A bigger issue is making sure guest does not crash
> when device is suddenly reset under it's legs.
I know the ixgbevf driver should already have logic to address some of
that. If you look through the code there should be logic there for a
surprise removal support in ixgbevf. The only issue is that unlike
fm10k it will not restore itself after a resume or slot_reset call.
More information about the Intel-wired-lan