  1. 19 Aug, 2016 5 commits
  2. 16 Aug, 2016 1 commit
  3. 13 Aug, 2016 3 commits
  4. 11 Aug, 2016 1 commit
  5. 09 Aug, 2016 12 commits
  6. 08 Aug, 2016 1 commit
  7. 04 Aug, 2016 3 commits
    • Soft RoCE driver · 8700e3e7
      Committed by Moni Shoua
      Soft RoCE (RXE) - The software RoCE driver
      
      ib_rxe implements the RDMA transport and registers with the RDMA core
      device as a kernel verbs provider. It also implements the packet IO
      layer. In addition, ib_rxe registers with the Linux netdev stack
      as a UDP encapsulating protocol, in this case for RDMA, for sending and
      receiving packets over any Ethernet device.  This yields an RDMA
      transport over the UDP/Ethernet network layer, forming a RoCEv2
      compatible device.
      
      Configuring the Soft RoCE driver requires binding it to an existing
      Ethernet network device. This is done through the /sys interface.
      
      A userspace Soft RoCE library (librxe) provides user applications
      the ability to run with Soft RoCE devices.  The use of rxe verbs in
      user space requires the inclusion of librxe as a device-specific
      plug-in to libibverbs. librxe is packaged separately.
      
      Architecture:
      
           +-----------------------------------------------------------+
           |                          Application                      |
           +-----------------------------------------------------------+
                                  +-----------------------------------+
                                  |             libibverbs            |
      User                        +-----------------------------------+
                                  +----------------+ +----------------+
                                  | librxe         | | HW RoCE lib    |
                                  +----------------+ +----------------+
      +---------------------------------------------------------------+
           +--------------+                           +------------+
           | Sockets      |                           | RDMA ULP   |
           +--------------+                           +------------+
           +--------------+                  +---------------------+
           | TCP/IP       |                  | ib_core             |
           +--------------+                  +---------------------+
                                   +------------+ +----------------+
      Kernel                       | ib_rxe     | | HW RoCE driver |
                                   +------------+ +----------------+
           +------------------------------------+
           | NIC driver                         |
           +------------------------------------+
      
      Soft RoCE resources:
      
      [1] https://github.com/SoftRoCE/librxe-dev - librxe source code on GitHub
      [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE wiki page
      [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
      Signed-off-by: Kamal Heib <kamalh@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      8700e3e7
    • IB/core: Support for CMA multicast join flags · ab15c95a
      Committed by Alex Vesker
      Added UCMA and CMA support for multicast join flags. Flags are
      passed in previously reserved fields of the UCMA CM join command.
      Two join flags are currently supported, indicating two different
      multicast JoinStates:

      1. Full Member:
         The initiator creates the multicast group (MCG) if it wasn't
         previously created, can send multicast messages to the group
         and receive messages from the MCG.

      2. Send Only Full Member:
         The initiator creates the multicast group (MCG) if it wasn't
         previously created, can send multicast messages to the group
         but doesn't receive any messages from the MCG.
      
         IB: Send Only Full Member requires a query of ClassPortInfo
             to determine if SM/SA supports this option. If SM/SA
             doesn't support Send-Only there will be no join request
             sent and an error will be returned.
      
         ETH: When Send Only Full Member is requested, no IGMP join
              will be sent.
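
The join rules above can be sketched as a small decision function. This is only an illustrative model, not the kernel's code: every enum and function name below is hypothetical, and `sa_supports_send_only` stands in for the result of the ClassPortInfo query. The IGMP join for a Full Member on Ethernet is inferred from the text rather than stated by it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; the real UAPI constants may differ. */
enum mc_join_state {
    MC_JOIN_FULL_MEMBER,           /* may send to and receive from the MCG */
    MC_JOIN_SEND_ONLY_FULL_MEMBER  /* may send to the MCG, never receives */
};

enum link_type { LINK_IB, LINK_ETH };

enum join_action {
    JOIN_SEND_SA_REQUEST,  /* IB: send a join request to the SM/SA */
    JOIN_SEND_IGMP,        /* ETH: send an IGMP join (inferred from the text) */
    JOIN_NO_IGMP,          /* ETH send-only: no IGMP join is sent */
    JOIN_ERROR             /* IB send-only, but the SM/SA lacks support */
};

/* Decide what a join should do for a given link type and JoinState. */
enum join_action plan_join(enum link_type link, enum mc_join_state state,
                           bool sa_supports_send_only)
{
    if (link == LINK_ETH)
        return state == MC_JOIN_SEND_ONLY_FULL_MEMBER ? JOIN_NO_IGMP
                                                      : JOIN_SEND_IGMP;

    assert(link == LINK_IB); /* only two link types are modeled here */
    if (state == MC_JOIN_SEND_ONLY_FULL_MEMBER && !sa_supports_send_only)
        return JOIN_ERROR;   /* no join request is sent at all */
    return JOIN_SEND_SA_REQUEST;
}
```

For example, `plan_join(LINK_IB, MC_JOIN_SEND_ONLY_FULL_MEMBER, false)` yields `JOIN_ERROR`, matching the case where no join request is sent and an error is returned.
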
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Reviewed-by: Hal Rosenstock <hal@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      ab15c95a
    • modules: add ro_after_init support · 444d13ff
      Committed by Jessica Yu
      Add ro_after_init support for modules by adding a new page-aligned section
      in the module layout (after rodata) for ro_after_init data and enabling RO
      protection for that section after module init runs.
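
A rough userspace analogy may help picture the idea: data lives in its own page-aligned region, stays writable while initialization runs, and is made read-only afterwards. The sketch below uses `mmap` and `mprotect` for this. It is not how the module loader is implemented, and `struct ro_section` and both functions are hypothetical names.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a module's ro_after_init section. */
struct ro_section {
    void  *base; /* page-aligned start of the region */
    size_t size; /* length, a multiple of the page size */
};

/* Map one writable, page-aligned page. Returns 0 on success, -1 on error. */
int ro_section_alloc(struct ro_section *s)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    s->base = p;
    s->size = (size_t)page;
    return 0;
}

/* Call once initialization is done: the region becomes read-only. */
int ro_section_seal(const struct ro_section *s)
{
    assert(s->base != NULL);
    return mprotect(s->base, s->size, PROT_READ);
}
```

Values written before `ro_section_seal()` remain readable afterwards, while a later write to the region would fault, which is the property the commit provides for module data.
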
      Signed-off-by: Jessica Yu <jeyu@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      444d13ff
  8. 03 Aug, 2016 4 commits
  9. 02 Aug, 2016 5 commits
    • vhost: new device IOTLB API · 6b1e6cc7
      Committed by Jason Wang
      This patch tries to implement a device IOTLB for vhost. This could be
      used with a userspace (QEMU) implementation of DMA remapping
      to emulate an IOMMU for the guest.
      
      The idea is simple: cache the translation in a software device IOTLB
      (which is implemented as an interval tree) in vhost and use vhost_net
      file descriptor for reporting IOTLB miss and IOTLB
      update/invalidation. When vhost meets an IOTLB miss, the fault
      address, size and access can be read from the file. After userspace
      finishes the translation, it writes the translated address to the
      vhost_net file to update the device IOTLB.
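
The miss/update cycle described above can be sketched as follows. This is a simplified model under stated assumptions: it uses a small linear array instead of the interval tree the patch mentions, keeps the miss record in a struct field rather than exposing it through a file descriptor, and records only the fault address (not size or access). All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IOTLB_MAX 16 /* arbitrary capacity for the sketch */

struct iotlb_entry {
    uint64_t iova;  /* start of the I/O virtual range */
    uint64_t size;  /* length of the range in bytes */
    uint64_t uaddr; /* translated address corresponding to iova */
};

struct iotlb {
    struct iotlb_entry e[IOTLB_MAX];
    size_t n;
    bool miss_pending;  /* a miss is waiting to be read by userspace */
    uint64_t miss_iova; /* fault address of that miss */
};

/* Look up iova; on a miss, record the fault for userspace and fail. */
bool iotlb_translate(struct iotlb *t, uint64_t iova, uint64_t *uaddr)
{
    assert(uaddr != NULL);
    for (size_t i = 0; i < t->n; i++) {
        const struct iotlb_entry *e = &t->e[i];
        if (iova >= e->iova && iova - e->iova < e->size) {
            *uaddr = e->uaddr + (iova - e->iova);
            return true;
        }
    }
    t->miss_pending = true;
    t->miss_iova = iova;
    return false;
}

/* Userspace answers a miss by writing a translation back. */
bool iotlb_update(struct iotlb *t, uint64_t iova, uint64_t size, uint64_t uaddr)
{
    if (size == 0 || t->n == IOTLB_MAX)
        return false;
    t->e[t->n++] = (struct iotlb_entry){ iova, size, uaddr };
    if (t->miss_pending && t->miss_iova >= iova && t->miss_iova - iova < size)
        t->miss_pending = false; /* the pending fault is now covered */
    return true;
}

/* Drop every cached entry that overlaps [iova, iova + size). */
void iotlb_invalidate(struct iotlb *t, uint64_t iova, uint64_t size)
{
    size_t i = 0;
    while (i < t->n) {
        struct iotlb_entry *e = &t->e[i];
        /* Assumes ranges do not wrap around the 64-bit address space. */
        if (e->iova < iova + size && iova < e->iova + e->size)
            *e = t->e[--t->n]; /* swap-remove, then recheck slot i */
        else
            i++;
    }
}
```

A typical round trip is: `iotlb_translate()` fails and records the miss, userspace resolves it and calls `iotlb_update()`, and the retried translation then succeeds until the range is invalidated.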
      
      When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM, all vq
      addresses set by ioctl are treated as iova instead of virtual addresses, and
      they can only be accessed through the IOTLB instead of by direct userspace
      memory access. Before each round of vq processing, all vq metadata is
      prefetched into the device IOTLB to make sure no translation fault happens
      during vq processing.
      
      In most cases, virtqueues are contiguous even in virtual address space.
      The IOTLB translation for virtqueue itself may make it a little
      slower. We might add fast path cache on top of this patch.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      [mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
      [mst: fix build warnings ]
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      [ weiyj.lk: missing unlock on error ]
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      6b1e6cc7
    • VSOCK: Introduce vhost_vsock.ko · 433fc58e
      Committed by Asias He
      VM sockets vhost transport implementation.  This driver runs on the
      host.
      Signed-off-by: Asias He <asias@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      433fc58e
    • VSOCK: Introduce virtio_vsock_common.ko · 06a8fc78
      Committed by Asias He
      This module contains the common code and header files for the
      virtio_transport and vhost_vsock kernel modules.
      Signed-off-by: Asias He <asias@redhat.com>
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      06a8fc78
    • virtio: new feature to detect IOMMU device quirk · 1a937693
      Committed by Michael S. Tsirkin
      The interaction between virtio and IOMMUs is messy.
      
      On most systems with virtio, physical addresses match bus addresses,
      and it doesn't particularly matter which one we use to program
      the device.
      
      On some systems, including Xen and any system with a physical device
      that speaks virtio behind a physical IOMMU, we must program the IOMMU
      for virtio DMA to work at all.
      
      On other systems, including SPARC and PPC64, virtio-pci devices are
      enumerated as though they are behind an IOMMU, but the virtio host
      ignores the IOMMU, so we must either pretend that the IOMMU isn't
      there or somehow map everything as the identity.
      
      Add a feature bit to detect that quirk: VIRTIO_F_IOMMU_PLATFORM.
      
      Any device with this feature bit set to 0 needs a quirk and has to be
      passed physical addresses (as opposed to bus addresses) even though
      the device is behind an IOMMU.
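
The per-device check can be pictured as a test on the negotiated feature bits. The sketch below assumes the bit number 33 used for VIRTIO_F_IOMMU_PLATFORM in the kernel's UAPI header; the helper names are hypothetical and are not the wrapper this commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number assumed from the kernel UAPI header (virtio_config.h). */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Test one bit of a 64-bit negotiated feature set. */
static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1ULL;
}

/*
 * Hypothetical helper: true when the device needs the quirk, i.e. it must
 * be handed physical addresses even though it sits behind an IOMMU.
 */
bool needs_iommu_quirk(uint64_t features)
{
    return !has_feature(features, VIRTIO_F_IOMMU_PLATFORM);
}
```

Because the decision depends only on each device's own feature bits, passed-through and virtual virtio devices on the same system can get different answers.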
      
      Note: it has to be a per-device quirk because for example, there could
      be a mix of passed-through and virtual virtio devices. As another
      example, some devices could be implemented by an out of process
      hypervisor backend (in case of qemu vhost, or vhost-user) and so support
      for an IOMMU needs to be coded up separately.
      
      It would be cleanest to handle this in IOMMU core code, but that needs
      per-device DMA ops. While we are waiting for that to be implemented, use
      a work-around in virtio core.
      
      Note: a "noiommu" feature is a quirk - add a wrapper to make
      that clear.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      1a937693
    • KVM: PPC: Introduce KVM_CAP_PPC_HTM · 23528bb2
      Committed by Sam Bobroff
      Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
      determine if a PowerPC KVM guest should use HTM (Hardware Transactional
      Memory).
      
      This will be used by QEMU to populate the pa-features bits in the
      guest's device tree.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      23528bb2
  10. 01 Aug, 2016 1 commit
  11. 27 Jul, 2016 4 commits
    • zsmalloc: page migration support · 48b4800a
      Committed by Minchan Kim
      This patch introduces a run-time migration feature for zspage.
      
      For migration, the VM uses the page.lru field, so zsmalloc should not use
      page.next, which shares storage with page.lru, for its own purposes.  To
      that end, the first object offset of a page is now obtained by run-time
      calculation instead of from page.index, which frees page.index to serve
      as the link for page chaining in place of page.next.
      
      A huge object stores its handle in page.index instead of a next link,
      because a huge object does not need a next link for page chaining.  So
      get_next_page needs to identify a huge object and return NULL.  For
      that, this patch uses the PG_owner_priv_1 page flag.
      
      For migration, it supports three functions
      
      * zs_page_isolate
      
      It isolates the zspage containing the subpage the VM wants to migrate,
      removing it from its class so that nobody can allocate a new object
      from the zspage.
      
      Isolation may be attempted once per subpage, so a subsequent attempt to
      isolate another subpage of the same zspage should not fail.  For that,
      we introduce the zspage.isolated count.  With it, zs_page_isolate can
      tell whether the zspage is already isolated for migration; if it is, a
      subsequent isolation attempt succeeds without isolating again.
      
      * zs_page_migrate
      
      First, it takes the write side of zspage->lock to prevent migration of
      other subpages in the zspage.  Then it locks all objects in the page the
      VM wants to migrate.  All objects in the page must be locked because of
      a race between zs_map_object and zs_page_migrate:
      
        zs_map_object				zs_page_migrate
      
        pin_tag(handle)
        obj = handle_to_obj(handle)
        obj_to_location(obj, &page, &obj_idx);
      
      					write_lock(&zspage->lock)
      					if (!trypin_tag(handle))
      						goto unpin_object
      
        zspage = get_zspage(page);
        read_lock(&zspage->lock);
      
      If zs_page_migrate did not do trypin_tag, the page used by zs_map_object
      could become stale due to migration, leading to a crash.
      
      If it locks all objects successfully, it copies the content from the old
      page to the new one and finally creates a new zspage chain with the new
      page.  If this is the last isolated subpage in the zspage, it puts the
      zspage back into its class.
      
      * zs_page_putback
      
      It returns an isolated zspage to the right fullness_group list if
      migrating a page fails.  If it finds that the zspage is ZS_EMPTY, it
      queues the zspage for freeing on a workqueue.  See below about
      asynchronous zspage freeing.
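
The isolation accounting shared by the three functions above can be modeled with a counter. This is a minimal sketch, not the zsmalloc code: `struct zspage_model` and both helpers are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the isolation state kept per zspage. */
struct zspage_model {
    int  isolated;      /* number of currently isolated subpages */
    bool on_class_list; /* true while the zspage is linked into its class */
};

/* Isolate one subpage; only the first isolation unlinks the zspage. */
void model_isolate(struct zspage_model *z)
{
    if (z->isolated == 0)
        z->on_class_list = false; /* no new objects can be allocated */
    z->isolated++;
}

/*
 * One subpage is done, either migrated or put back. When the last
 * isolated subpage is released, the zspage returns to its class.
 */
void model_release(struct zspage_model *z)
{
    assert(z->isolated > 0);
    if (--z->isolated == 0)
        z->on_class_list = true;
}
```

Isolating two subpages and releasing one leaves the zspage off its class list; only the second release links it back.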
      
      This patch introduces asynchronous zspage freeing.  It is needed because
      clearing PG_movable requires page_lock, but the zs_free path must be
      atomic, so the approach is to try to grab page_lock.  If it gets
      page_lock for all of the pages, it can free the zspage immediately.
      Otherwise, it queues a free request and frees the zspage via a workqueue
      in process context.
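
The try-lock-or-defer pattern can be sketched as below. It is a single-threaded illustration only: plain booleans stand in for page_lock, a `deferred` flag stands in for queuing work on a workqueue, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUBPAGES 4 /* arbitrary bound for the sketch */

struct page_model { bool locked; };

struct async_zspage {
    struct page_model *pages[MAX_SUBPAGES];
    size_t n;
    bool freed;    /* freed immediately in the atomic path */
    bool deferred; /* stands in for a queued workqueue request */
};

/* Non-blocking lock attempt on one page. */
static bool trylock_page_model(struct page_model *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

/* Try to lock every page; on failure, undo the locks taken so far. */
static bool trylock_all(struct async_zspage *z)
{
    for (size_t i = 0; i < z->n; i++) {
        if (!trylock_page_model(z->pages[i])) {
            while (i--)
                z->pages[i]->locked = false;
            return false;
        }
    }
    return true;
}

/* Free now if all page locks were obtained, otherwise defer. */
void free_zspage_model(struct async_zspage *z)
{
    assert(z->n <= MAX_SUBPAGES);
    if (!trylock_all(z)) {
        z->deferred = true;
        return;
    }
    z->freed = true;
    for (size_t i = 0; i < z->n; i++)
        z->pages[i]->locked = false;
}
```

The caller never blocks: either every lock is obtained and the free happens at once, or nothing stays locked and the work is left for process context.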
      
      If zs_free finds that the zspage is isolated when it tries to free it,
      it delays the freeing until zs_page_putback finds the zspage and finally
      frees it.
      
      In this patch, we expand fullness_list to cover ZS_EMPTY through
      ZS_FULL.  The ZS_EMPTY list is used for delayed freeing, and adding the
      ZS_FULL list makes it possible to tell whether a zspage is isolated via
      a list_empty(&zspage->list) test.
      
      [minchan@kernel.org: zsmalloc: keep first object offset in struct page]
        Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
      [minchan@kernel.org: zsmalloc: zspage sanity check]
        Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
      Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      48b4800a
    • mm: balloon: use general non-lru movable page feature · b1123ea6
      Committed by Minchan Kim
      Now, VM has a feature to migrate non-lru movable pages so balloon
      doesn't need custom migration hooks in migrate.c and compaction.c.
      
      Instead, this patch implements the page->mapping->a_ops->
      {isolate|migrate|putback} functions.
      
      With that, we could remove hooks for ballooning in general migration
      functions and make balloon compaction simple.
      
      [akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
      Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
      Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1123ea6
    • tipc: dump monitor attributes · cf6f7e1d
      Committed by Parthasarathy Bhuvaragan
      In this commit, we dump the monitor attributes when queried.
      The link monitor attributes are separated into two kinds:
      1. general attributes per bearer
      2. specific attributes per node/peer
      This style resembles the socket attributes and the nametable
      publications per socket.
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf6f7e1d
    • tipc: get monitor threshold for the cluster · bf1035b2
      Committed by Parthasarathy Bhuvaragan
      In this commit, we add support to fetch the configured
      cluster monitoring threshold.
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf1035b2