1. 18 October 2018, 5 commits
  2. 11 October 2018, 1 commit
      PCI/P2PDMA: Support peer-to-peer memory · 52916982
      Committed by Logan Gunthorpe
      Some PCI devices may have memory mapped in a BAR space that's intended for
      use in peer-to-peer transactions.  To enable such transactions the memory
      must be registered with ZONE_DEVICE pages so it can be used by DMA
      interfaces in existing drivers.
      
      Add an interface for other subsystems to find and allocate chunks of P2P
      memory as necessary to facilitate transfers between two PCI peers:
      
        struct pci_dev *pci_p2pmem_find[_many]();
        int pci_p2pdma_distance[_many]();
        void *pci_alloc_p2pmem();
      
      The new interface requires a driver to collect a list of client devices
      involved in the transaction, then call pci_p2pmem_find() to obtain any
      suitable P2P memory.  Alternatively, if the caller knows a device which
      provides P2P memory, they can use pci_p2pdma_distance() to determine if it
      is usable.  With a suitable p2pmem device, memory can then be allocated
      with pci_alloc_p2pmem() for use in DMA transactions.
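      
      A minimal sketch of that flow (illustrative only; it assumes the
      array-based _many variants listed above and the usual pci_dev_put()
      refcounting, exact prototypes live in the new header):
      
        /* Illustrative sketch, not part of the patch. */
        #include <linux/pci-p2pdma.h>
      
        static void *setup_p2p_buffer(struct device **clients, int num_clients,
                                      size_t size, struct pci_dev **provider)
        {
                void *buf;
      
                /* Find a p2pmem provider usable by every client involved. */
                *provider = pci_p2pmem_find_many(clients, num_clients);
                if (!*provider)
                        return NULL;    /* fall back to regular system memory */
      
                /* Carve the DMA buffer out of the provider's BAR memory. */
                buf = pci_alloc_p2pmem(*provider, size);
                if (!buf)
                        pci_dev_put(*provider);
      
                return buf;
        }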
      
      Depending on hardware, using peer-to-peer memory may reduce the bandwidth
      of the transfer but can significantly reduce pressure on system memory.
      This may be desirable in many cases: for example a system could be designed
      with a small CPU connected to a PCIe switch by a small number of lanes
      which would maximize the number of lanes available to connect to NVMe
      devices.
      
      The code is designed to only utilize the p2pmem device if all the devices
      involved in a transfer are behind the same PCI bridge.  This is because we
      have no way of knowing whether peer-to-peer routing between PCIe Root Ports
      is supported (PCIe r4.0, sec 1.3.1).  Additionally, the benefits of P2P
      transfers that go through the RC are limited to reducing DRAM usage
      and, in some cases, coding convenience.  The PCI-SIG may be exploring
      adding a new capability bit to advertise whether this is possible for
      future hardware.
      
      This commit includes significant rework and feedback from Christoph
      Hellwig.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>:
      https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com,
      to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in
      https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
  3. 14 September 2018, 5 commits
      xen/gntdev: fix up blockable calls to mn_invl_range_start · 58a57569
      Committed by Michal Hocko
      Patch series "mmu_notifiers follow ups".
      
      Tetsuo has noticed some fallout from 93065ac7 ("mm, oom: distinguish
      blockable mode for mmu notifiers").  One issue has been fixed and picked
      up by the AMD/DRM maintainer [1].  The XEN issue is fixed by patch 1.  I
      have also clarified expectations about the blockable semantics of
      invalidate_range_end.  Finally, the last patch removes
      MMU_INVALIDATE_DOES_NOT_BLOCK, which is no longer used nor needed.
      
      [1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz
      
      This patch (of 3):
      
      93065ac7 ("mm, oom: distinguish blockable mode for mmu notifiers")
      introduced a blockable parameter to all mmu_notifiers: a notifier has to
      back off when it is called in the !blockable case and it could block
      down the road.
      
      The above commit implemented that for mn_invl_range_start, but both
      in_range checks are done unconditionally, regardless of the blockable
      mode, and as such they would fail all the time for regular calls.  Fix
      this by checking the blockable parameter as well.
      
      Once we are there we can remove the stale TODO.  The lock has to be
      sleepable because we wait for completion down in gnttab_unmap_refs_sync.
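      
      A sketch of the resulting check for one of the two loops (illustrative;
      the mn_invl_range_start()/unmap_if_in_range() structure is assumed from
      the description above, the real diff may differ in detail):
      
        /* Illustrative sketch of mn_invl_range_start() after the fix. */
        if (blockable)
                mutex_lock(&priv->lock);
        else if (!mutex_trylock(&priv->lock))
                return -EAGAIN;
      
        list_for_each_entry(map, &priv->maps, next) {
                /* Bail out only when we must not block and the range is in use. */
                if (!blockable && in_range(map, start, end)) {
                        ret = -EAGAIN;
                        goto out_unlock;
                }
                ret = unmap_if_in_range(map, start, end, blockable);
                if (ret)
                        goto out_unlock;
        }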
      
      Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org
      Fixes: 93065ac7 ("mm, oom: distinguish blockable mode for mmu notifiers")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      xen: fix GCC warning and remove duplicate EVTCHN_ROW/EVTCHN_COL usage · 4dca864b
      Committed by Josh Abraham
      This patch removes duplicate macro usage in events_base.c.
      
      It also fixes the gcc warning:
      variable ‘col’ set but not used [-Wunused-but-set-variable]
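      
      Roughly, the pattern is to expand EVTCHN_ROW()/EVTCHN_COL() once into
      locals and index with those, dropping any local that is never read
      (illustrative sketch only; the exact function touched is assumed):
      
        /* Illustrative sketch, not the actual diff. */
        unsigned row = EVTCHN_ROW(evtchn);
        unsigned col = EVTCHN_COL(evtchn);
      
        if (evtchn_to_irq[row] == NULL)
                return -ENOMEM;         /* hypothetical error handling */
      
        evtchn_to_irq[row][col] = irq;
        return 0;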
      Signed-off-by: Joshua Abraham <j.abraham1776@gmail.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      xen: avoid crash in disable_hotplug_cpu · 3366cdb6
      Committed by Olaf Hering
      The command 'xl vcpu-set 0 0', issued in dom0, will crash dom0:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 7 PID: 65 Comm: xenwatch Not tainted 4.19.0-rc2-1.ga9462db-default #1 openSUSE Tumbleweed (unreleased)
      Hardware name: Intel Corporation S5520UR/S5520UR, BIOS S5500.86B.01.00.0050.050620101605 05/06/2010
      RIP: e030:device_offline+0x9/0xb0
      Code: 77 24 00 e9 ce fe ff ff 48 8b 13 e9 68 ff ff ff 48 8b 13 e9 29 ff ff ff 48 8b 13 e9 ea fe ff ff 90 66 66 66 66 90 41 54 55 53 <f6> 87 d8 02 00 00 01 0f 85 88 00 00 00 48 c7 c2 20 09 60 81 31 f6
      RSP: e02b:ffffc90040f27e80 EFLAGS: 00010203
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff8801f3800000 RSI: ffffc90040f27e70 RDI: 0000000000000000
      RBP: 0000000000000000 R08: ffffffff820e47b3 R09: 0000000000000000
      R10: 0000000000007ff0 R11: 0000000000000000 R12: ffffffff822e6d30
      R13: dead000000000200 R14: dead000000000100 R15: ffffffff8158b4e0
      FS:  00007ffa595158c0(0000) GS:ffff8801f39c0000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000002d8 CR3: 00000001d9602000 CR4: 0000000000002660
      Call Trace:
       handle_vcpu_hotplug_event+0xb5/0xc0
       xenwatch_thread+0x80/0x140
       ? wait_woken+0x80/0x80
       kthread+0x112/0x130
       ? kthread_create_worker_on_cpu+0x40/0x40
       ret_from_fork+0x3a/0x50
      
      This happens because handle_vcpu_hotplug_event is called twice.  In the
      first iteration cpu_present is still true; in the second iteration
      cpu_present is false, which causes get_cpu_device to return NULL.
      In the case of cpu#0, cpu_online is apparently always true.
      
      Fix this crash by checking if the cpu can be hotplugged, which is false
      for a cpu that was just removed.
      
      Also check if the cpu was actually offlined by device_remove, otherwise
      leave the cpu_present state as it is.
      
      Rearrange the code to do all work with device_hotplug_lock held.
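      
      A sketch of the resulting shape of disable_hotplug_cpu() (illustrative;
      the helpers are standard kernel/Xen interfaces, the exact diff may
      differ):
      
        /* Illustrative sketch, not the actual diff. */
        static void disable_hotplug_cpu(int cpu)
        {
                if (!cpu_is_hotpluggable(cpu))
                        return;
      
                lock_device_hotplug();
                if (cpu_online(cpu))
                        device_offline(get_cpu_device(cpu));
                if (!cpu_online(cpu) && cpu_present(cpu)) {
                        xen_arch_unregister_cpu(cpu);
                        set_cpu_present(cpu, false);
                }
                unlock_device_hotplug();
        }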
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      xen/balloon: add runtime control for scrubbing ballooned out pages · 197ecb38
      Committed by Marek Marczykowski-Górecki
      Scrubbing pages on the initial balloon down can take some time, especially
      in the nested virtualization case (nested EPT is slow).  When an HVM/PVH
      guest is started with memory= significantly lower than maxmem=, all the
      extra pages will be scrubbed before being returned to Xen.  But since most
      of them weren't used at all at that point, Xen needs to populate them first
      (from the populate-on-demand pool).  In the nested virt case (Xen inside
      KVM) this slows down guest boot by 15-30s with just 1.5GB needing to be
      returned to Xen.
      
      Add a runtime parameter to enable/disable it, to allow initially disabling
      scrubbing and then enabling it back during boot (for example in the
      initramfs).  Such usage relies on the assumption that a) most pages
      ballooned out during initial boot weren't used at all, and b) even if they
      were, very few secrets are in the guest at that time (before any serious
      userspace kicks in).
      
      Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
      enabled by default), controlling the default value for the new runtime
      switch.
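      
      A minimal sketch of such a switch (illustrative; the xen_scrub_pages
      variable name, the core_param() use and the helper are assumptions,
      only the Kconfig symbol comes from the text above):
      
        /* Illustrative sketch: runtime switch defaulting from Kconfig. */
        bool __read_mostly xen_scrub_pages =
                IS_ENABLED(CONFIG_XEN_SCRUB_PAGES_DEFAULT);
        core_param(xen_scrub_pages, xen_scrub_pages, bool, 0);
      
        static inline void xenmem_reservation_scrub_page(struct page *page)
        {
                if (xen_scrub_pages)
                        clear_highpage(page);
        }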
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      xen/manage: don't complain about an empty value in control/sysrq node · 87dffe86
      Committed by Vitaly Kuznetsov
      When the guest receives a sysrq request from the host, it acknowledges it
      by writing '\0' to the control/sysrq xenstore node.  This, however, makes
      the xenstore watch fire again, but xenbus_scanf() fails to parse the empty
      value with the "%c" format string:
      
       sysrq: SysRq : Emergency Sync
       Emergency Sync complete
       xen:manage: Error -34 reading sysrq code in control/sysrq
      
      Ignore -ERANGE the same way we already ignore -ENOENT; an empty value in
      control/sysrq is totally legal.
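      
      A sketch of the relaxed error check in the sysrq watch handler
      (illustrative; the surrounding transaction code is assumed):
      
        /* Illustrative sketch, not the actual diff. */
        err = xenbus_scanf(xbt, "control", "sysrq", "%c", &sysrq_key);
        if (err < 0) {
                /* -ENOENT and -ERANGE (empty value) are expected here. */
                if (err != -ENOENT && err != -ERANGE)
                        pr_err("Error %d reading sysrq code in control/sysrq\n",
                               err);
                xenbus_transaction_end(xbt, 1);
                return;
        }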
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  4. 13 September 2018, 8 commits
  5. 12 September 2018, 20 commits
  6. 11 September 2018, 1 commit