1. 19 Feb, 2015 2 commits
  2. 18 Feb, 2015 1 commit
  3. 06 Feb, 2015 1 commit
  4. 16 Dec, 2014 4 commits
    • IB/core: Implement support for MMU notifiers regarding on demand paging regions · 882214e2
      Haggai Eran committed
      * Add an interval tree implementation for ODP umems. Create an
        interval tree for each ucontext (including a count of the number of
        ODP MRs in this context, semaphore, etc.), and register ODP umems in
        the interval tree.
      * Add MMU notifiers handling functions, using the interval tree to
        notify only the relevant umems and underlying MRs.
      * Register to receive MMU notifier events from the MM subsystem upon
        ODP MR registration (and unregister accordingly).
      * Add a completion object to synchronize the destruction of ODP umems.
      * Add mechanism to abort page faults when there's a concurrent invalidation.
      
      The way we synchronize between concurrent invalidations and page
      faults is by keeping a counter of currently running invalidations, and
      a sequence number that is incremented whenever an invalidation is
      caught. The page fault code checks the counter and also verifies that
      the sequence number hasn't progressed before it updates the umem's
      page tables. This is similar to what the kvm module does.
      
      In order to prevent the case where we register a umem in the middle of
      an ongoing notifier, we also keep a per ucontext counter of the total
      number of active mmu notifiers. We only enable new umems when all the
      running notifiers complete.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Yuval Dagan <yuvalda@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      882214e2
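The synchronization described in commit 882214e2 (a counter of running invalidations plus a sequence number checked by the page fault path) can be sketched as follows. This is a minimal single-threaded model, not the kernel code: the struct and function names are hypothetical, locking is omitted, and bumping the sequence number when an invalidation finishes is an assumption borrowed from the kvm-style scheme the message refers to.

```c
#include <stdbool.h>

/* Hypothetical per-umem state; names are illustrative, not the kernel's. */
struct odp_sync {
    int notifiers_count;         /* invalidations currently running */
    unsigned long notifiers_seq; /* bumped once per completed invalidation */
};

/* An invalidation begins: faults must not install pages while it runs. */
static void invalidate_start(struct odp_sync *s)
{
    s->notifiers_count++;
}

/* An invalidation ends: any fault that sampled the old sequence is stale. */
static void invalidate_end(struct odp_sync *s)
{
    s->notifiers_seq++;
    s->notifiers_count--;
}

/* The fault path samples the sequence number before resolving pages. */
static unsigned long fault_begin(const struct odp_sync *s)
{
    return s->notifiers_seq;
}

/* Returns true when the fault must be aborted and retried. */
static bool fault_must_retry(const struct odp_sync *s, unsigned long seq)
{
    if (s->notifiers_count > 0)
        return true;                /* an invalidation is in flight */
    return s->notifiers_seq != seq; /* one completed since we sampled */
}
```

In real code both checks would have to run under the same lock that protects the page table update, so that no invalidation can slip in between the check and the update.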
    • IB/core: Add support for on demand paging regions · 8ada2c1c
      Shachar Raindel committed
      * Extend the umem struct to keep the ODP related data.
      * Allocate and initialize the ODP related information in the umem
        (page_list, dma_list) and free it as needed at the end of the run.
      * Store a reference to the process PID struct in the ucontext.  Used to
        safely obtain the task_struct and the mm during fault handling,
        without preventing the task destruction if needed.
      * Add 2 helper functions: ib_umem_odp_map_dma_pages and
        ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
        of specific pages of the umem (and, currently, pin them).
      * Support for page faults only - IB core will keep the reference on
        the pages used and call put_page when freeing an ODP umem
        area. Invalidations support will be added in a later patch.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      8ada2c1c
    • IB/core: Add flags for on demand paging support · 860f10a7
      Sagi Grimberg committed
      * Add a configuration option to enable on-demand paging support in
        the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
        later patch, this configuration option will select the MMU_NOTIFIER
        configuration option to enable mmu notifiers.
      * Add a flag for on demand paging (ODP) support in the IB device capabilities.
      * Add a flag to request ODP MR in the access flags to reg_mr.
      * Fail registrations done with the ODP flag when the low-level driver
        doesn't support this.
      * Change the conditions in which an MR will be writable to explicitly
        specify the access flags.  This is to avoid making an MR writable just
        because it is an ODP MR.
      * Add ODP capabilities to the extended query device verb.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      860f10a7
    • IB/core: Add support for extended query device caps · 5a77abf9
      Eli Cohen committed
      Add extensible query device capabilities verb to allow adding new features.
      ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
      copy capability fields to be used by both ib_uverbs_query_device and
      ib_uverbs_ex_query_device.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      5a77abf9
  5. 14 Oct, 2014 1 commit
  6. 02 Aug, 2014 1 commit
  7. 20 Jan, 2014 1 commit
  8. 21 Dec, 2013 4 commits
  9. 18 Nov, 2013 5 commits
    • IB/core: Re-enable create_flow/destroy_flow uverbs · 69ad5da4
      Matan Barak committed
      This commit reverts commit 7afbddfa ("IB/core: Temporarily disable
      create_flow/destroy_flow uverbs").  Since the uverbs extensions
      functionality was experimental for v3.12, this patch re-enables the
      support for them and flow-steering for v3.13.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      69ad5da4
    • IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2
      Yann Droneaud committed
      Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
      commands") added an infrastructure for extensible uverbs commands
      while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
      through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
      using this new infrastructure.
      
      According to commit 400dbc96, the purpose of this
      infrastructure is to support passing around provider (e.g. hardware)
      specific buffers when userspace issues commands to the kernel, so that
      it would be possible to extend uverbs (e.g. core) buffers independently
      from the provider buffers.
      
      But the new kernel command function prototypes were not modified to
      take advantage of this extension. This issue was exposed by Roland
      Dreier in a previous review[1].
      
      So the following patch is an attempt at a revised extensible command
      infrastructure.
      
      This improved extensible command infrastructure distinguishes
      core (e.g. legacy) command/response buffers from provider
      (e.g. hardware) command/response buffers: each function
      implementing an extended command is given a struct ib_udata to hold core
      (e.g. uverbs) input and output buffers, and another struct ib_udata to
      hold the hw (e.g. provider) input and output buffers.
      
      Having those buffers identified separately makes it easier to enlarge
      one buffer to support an extension without having to add code to
      guess the exact size of each command/response part. This should make
      the extended functions more reliable.
      
      Additionally, instead of relying on the command identifier being greater
      than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
      unused bits in the command field: of the 32 bits provided by the command
      field, only 6 bits are really needed to encode the identifiers of the
      commands currently supported by the kernel. (Even using only 6 bits
      leaves room for about 23 new commands).
      
      So this patch makes use of some high order bits in the command field to
      store flags, leaving enough room for more command identifiers than one
      will ever need (e.g. 256).
      
      The new flags are used to specify if the command should be processed
      as an extended one or a legacy one. While designing the new command
      format, care was taken to make usage of flags itself extensible.
      
      Using high order bits of the command field ensures that a newer
      libibverbs on an older kernel will properly fail when trying to call
      extended commands. On the other hand, an older libibverbs on a newer
      kernel will never be able to issue calls to extended commands.
      
      The extended command header includes the optional response pointer so
      that output buffer length and output buffer pointer are located
      together in the command, allowing proper parameters checking. This
      should make implementing functions easier and safer.
      
      Additionally, the extended header ensures 64-bit alignment while making
      all sizes multiples of 8 bytes, extending the maximum buffer size:
      
                                   legacy      extended
      
         Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
        Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
      
      For the purpose of doing proper buffer size accounting, the header
      sizes are no longer taken into account in "in_words".
      
      One of the oddities of the current extensible infrastructure, reading
      the "legacy" command header twice, is fixed by removing the "legacy"
      command header from the extended command header. They are processed as
      two different parts of the command: memory is read once and
      information is not duplicated, which makes it clear that this is an
      extended command scheme and not a different command scheme.
      
      The proposed scheme will format input (command) and output (response)
      buffers this way:
      
      - command:
      
        legacy header +
        extended header +
        command data (core + hw):
      
          +----------------------------------------+
          | flags     |   00      00    |  command |
          |        in_words    |   out_words       |
          +----------------------------------------+
          |                 response               |
          |                 response               |
          | provider_in_words | provider_out_words |
          |                 padding                |
          +----------------------------------------+
          |                                        |
          .              <uverbs input>            .
          .              (in_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .             <provider input>           .
          .          (provider_in_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      - response, if present:
      
          +----------------------------------------+
          |                                        |
          .          <uverbs output space>         .
          .             (out_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .         <provider output space>        .
          .         (provider_out_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      The overall design is to ensure that the extensible infrastructure is
      itself extensible while being more reliable with more input and bounds
      checking.
      
      Note:
      
      The unused field in the extended header would be a perfect candidate to
      hold the command "comp_mask" (e.g. a bit field used to handle
      compatibility).  This was suggested by Roland Dreier in a previous
      review[2].  But a "comp_mask" field is likely to be present in the uverbs
      input and/or provider input, likewise for the response, as noted by
      Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
      header.
      
      [1]:
      http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
      
      [2]:
      http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
      
      [3]:
      http://marc.info/?i=525C1149.6000701@mellanox.com
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      
      [ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      f21519b2
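The command word layout drawn in commit f21519b2 (flags in the high-order byte, two zero bytes, command identifier in the low byte) can be decoded with a few masks. The sketch below is illustrative only: the macro names, the mask values and the "extended" flag bit are assumptions chosen to match the diagram, not constants quoted from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: | flags (8 bits) | 00 | 00 | command (8 bits) | */
#define CMD_ID_MASK       0x000000ffu
#define CMD_FLAGS_MASK    0xff000000u
#define CMD_FLAGS_SHIFT   24
#define CMD_FLAG_EXTENDED 0x80u /* hypothetical "extended command" flag */

/* Command identifier stored in the low-order byte. */
static uint32_t cmd_id(uint32_t command)
{
    return command & CMD_ID_MASK;
}

/* Flags stored in the high-order byte. */
static uint32_t cmd_flags(uint32_t command)
{
    return (command & CMD_FLAGS_MASK) >> CMD_FLAGS_SHIFT;
}

/* The two middle bytes must be zero, so unknown bits are rejected. */
static bool cmd_is_well_formed(uint32_t command)
{
    return (command & ~(CMD_ID_MASK | CMD_FLAGS_MASK)) == 0;
}

static bool cmd_is_extended(uint32_t command)
{
    return (cmd_flags(command) & CMD_FLAG_EXTENDED) != 0;
}

/* Extended sizes are counted in 8-byte words, per the diagram. */
static size_t ext_payload_bytes(uint16_t words)
{
    return (size_t)words * 8;
}
```

With a 16-bit word count, `ext_payload_bytes(0xffff)` gives 524280 bytes, which is the "512KBytes" per buffer quoted in the size table.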
    • IB/core: Make uverbs flow structure use names like verbs ones · b68c9560
      Yann Droneaud committed
      This patch adds "flow" prefix to most of the data structures added as part
      of commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
      uverbs") to keep those names in sync with the data structures added in
      commit 319a441d ("IB/core: Add receive flow steering support").
      
      It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      b68c9560
    • IB/core: Rename 'flow' structs to match other uverbs structs · d82693da
      Yann Droneaud committed
      Commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
      uverbs") added public data structures to support receive flow
      steering.  The new structs are not following the 'uverbs' pattern:
      they're lacking the common prefix 'ib_uverbs'.
      
      This patch replaces ib_kern prefix by ib_uverbs.
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      d82693da
    • IB/core: clarify overflow/underflow checks on ib_create/destroy_flow · f8848274
      Matan Barak committed
      This patch fixes the following issues:
      
      1. Unneeded checks were removed
      
      2. Removed the fixed size out of flow_attr.size, thus simplifying the checks.
      
      3. Remove a 32bit hole on 64bit systems with strict alignment in
         struct ib_kern_flow_att by adding a reserved field.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      f8848274
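The "32bit hole" fixed in commit f8848274 is the implicit padding a compiler may insert before a 64-bit member, which can make a structure's layout differ between 32-bit and 64-bit ABIs. The sketch below uses two hypothetical structs (not the real flow attribute struct) to show how an explicit reserved field pins the layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 32-bit member directly before a 64-bit one.
 * On ABIs that align uint64_t to 8 bytes the compiler inserts 4 bytes
 * of invisible padding here; on ABIs that align it to 4 it does not. */
struct attr_with_hole {
    uint32_t type;
    uint64_t data;
};

/* Same fields with the padding made explicit as a reserved member. */
struct attr_fixed {
    uint32_t type;
    uint32_t reserved; /* fills the would-be hole; should be zero */
    uint64_t data;
};

/* Bytes of implicit padding before 'data' in the unfixed layout. */
static size_t hole_before_data_unfixed(void)
{
    return offsetof(struct attr_with_hole, data) - sizeof(uint32_t);
}

/* Bytes of implicit padding before 'data' in the fixed layout. */
static size_t hole_before_data_fixed(void)
{
    return offsetof(struct attr_fixed, data) - 2 * sizeof(uint32_t);
}
```

The fixed layout places `data` at offset 8 regardless of whether the ABI aligns 64-bit integers to 4 or 8 bytes, which is what a structure shared between userspace and the kernel needs.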
  10. 16 Nov, 2013 1 commit
  11. 09 Nov, 2013 1 commit
  12. 22 Oct, 2013 1 commit
  13. 03 Sep, 2013 1 commit
  14. 29 Aug, 2013 1 commit
  15. 14 Aug, 2013 1 commit
  16. 09 Jul, 2013 1 commit
    • IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() · da183c7a
      Roland Dreier committed
      The macro get_unused_fd() is used to allocate a file descriptor with
      default flags.  Those default flags (0) can be "unsafe": O_CLOEXEC must
      be used by default so that file descriptors are not leaked across exec().
      
      Replace calls to get_unused_fd() in uverbs with calls to
      get_unused_fd_flags(O_CLOEXEC).  Inheriting uverbs fds across exec()
      cannot be used to do anything useful.
      
      Based on a patch/suggestion from Yann Droneaud <ydroneaud@opteya.com>.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      da183c7a
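Commit da183c7a changes kernel-side descriptor allocation, but the effect of O_CLOEXEC is easiest to observe from userspace. The sketch below is a userspace analogue, not the kernel change itself: it opens a file with O_CLOEXEC and reads the descriptor flags back with fcntl(). The helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for O_CLOEXEC under strict -std modes */
#endif

#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open a file so the descriptor is closed automatically on exec(). */
static int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Report whether the close-on-exec flag is set on a descriptor. */
static bool fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

A descriptor opened without O_CLOEXEC reports the flag as clear and would be inherited by any program started with exec(), which is the leak the commit message describes.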
  17. 28 Feb, 2013 1 commit
  18. 23 Feb, 2013 1 commit
  19. 22 Feb, 2013 1 commit
  20. 27 Sep, 2012 2 commits
  21. 09 May, 2012 3 commits
    • IB/core: Add raw packet QP type · c938a616
      Or Gerlitz committed
      IB_QPT_RAW_PACKET allows applications to build a complete packet,
      including L2 headers, when sending; on the receive side, the HW will
      not strip any headers.
      
      This QP type is designed for userspace direct access to Ethernet; for
      example by applications that do TCP/IP themselves.  Only processes
      with the NET_RAW capability are allowed to create raw packet QPs (the
      name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
      SOL_RAW sockets).
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      c938a616
    • IB/uverbs: Lock SRQ / CQ / PD objects in a consistent order · 5909ce54
      Roland Dreier committed
      Since XRC support was added, the uverbs code has locked SRQ, CQ and PD
      objects needed during QP and SRQ creation in different orders
      depending on the code path.  This leads to the (at least
      theoretical) possibility of deadlock, and triggers the lockdep splat
      below.
      
      Fix this by making sure we always lock the SRQ first, then CQs and
      finally the PD.
      
          ======================================================
          [ INFO: possible circular locking dependency detected ]
          3.4.0-rc5+ #34 Not tainted
          -------------------------------------------------------
          ibv_srq_pingpon/2484 is trying to acquire lock:
           (SRQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
      
          but task is already holding lock:
           (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #2 (CQ-uobj){+++++.}:
                 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
                 [<ffffffff81384f28>] down_read+0x34/0x43
                 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
                 [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
                 [<ffffffffa00b16c3>] ib_uverbs_create_qp+0x180/0x684 [ib_uverbs]
                 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
                 [<ffffffff810fe47f>] vfs_write+0xa7/0xee
                 [<ffffffff810fe65f>] sys_write+0x45/0x69
                 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      
          -> #1 (PD-uobj){++++++}:
                 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
                 [<ffffffff81384f28>] down_read+0x34/0x43
                 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
                 [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
                 [<ffffffffa00af8ad>] __uverbs_create_xsrq+0x96/0x386 [ib_uverbs]
                 [<ffffffffa00b31b9>] ib_uverbs_detach_mcast+0x1cd/0x1e6 [ib_uverbs]
                 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
                 [<ffffffff810fe47f>] vfs_write+0xa7/0xee
                 [<ffffffff810fe65f>] sys_write+0x45/0x69
                 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      
          -> #0 (SRQ-uobj){+++++.}:
                 [<ffffffff81070898>] __lock_acquire+0xa29/0xd06
                 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
                 [<ffffffff81384f28>] down_read+0x34/0x43
                 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
                 [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
                 [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
                 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
                 [<ffffffff810fe47f>] vfs_write+0xa7/0xee
                 [<ffffffff810fe65f>] sys_write+0x45/0x69
                 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      
          other info that might help us debug this:
      
          Chain exists of:
            SRQ-uobj --> PD-uobj --> CQ-uobj
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(CQ-uobj);
                                         lock(PD-uobj);
                                         lock(CQ-uobj);
            lock(SRQ-uobj);
      
           *** DEADLOCK ***
      
          3 locks held by ibv_srq_pingpon/2484:
           #0:  (QP-uobj){+.+...}, at: [<ffffffffa00b162c>] ib_uverbs_create_qp+0xe9/0x684 [ib_uverbs]
           #1:  (PD-uobj){++++++}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
           #2:  (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
      
          stack backtrace:
          Pid: 2484, comm: ibv_srq_pingpon Not tainted 3.4.0-rc5+ #34
          Call Trace:
           [<ffffffff8137eff0>] print_circular_bug+0x1f8/0x209
           [<ffffffff81070898>] __lock_acquire+0xa29/0xd06
           [<ffffffffa00af37c>] ? __idr_get_uobj+0x20/0x5e [ib_uverbs]
           [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
           [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffff81070eee>] ? lock_release+0x166/0x189
           [<ffffffff81384f28>] down_read+0x34/0x43
           [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
           [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
           [<ffffffff81070fec>] ? lock_acquire+0xdb/0xfe
           [<ffffffff81070c09>] ? lock_release_non_nested+0x94/0x213
           [<ffffffff810d470f>] ? might_fault+0x40/0x90
           [<ffffffff810d470f>] ? might_fault+0x40/0x90
           [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
           [<ffffffff810fe47f>] vfs_write+0xa7/0xee
           [<ffffffff810ff736>] ? fget_light+0x3b/0x99
           [<ffffffff810fe65f>] sys_write+0x45/0x69
           [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      5909ce54
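The fix in commit 5909ce54 is a fixed acquisition order: SRQ first, then CQs, then the PD. One way to picture that rule is a rank check, sketched below as a toy model. The enum, struct and function names are hypothetical and no real locks are taken; the code only verifies that a sequence of acquisitions never goes back to a lower rank.

```c
#include <stdbool.h>

/* Hypothetical ranks mirroring the order stated in the commit message. */
enum uobj_rank {
    RANK_NONE = 0,
    RANK_SRQ  = 1,
    RANK_CQ   = 2,
    RANK_PD   = 3
};

/* Tracks the highest rank acquired so far on one code path. */
struct held_locks {
    int highest_rank;
};

/* Returns false if taking 'rank' now would violate the fixed order.
 * Equal ranks are allowed, since a QP may use two CQs. */
static bool try_acquire_in_order(struct held_locks *h, enum uobj_rank rank)
{
    if ((int)rank < h->highest_rank)
        return false;
    h->highest_rank = (int)rank;
    return true;
}
```

If every code path passes this check, no two paths can take the same pair of object classes in opposite orders, so the circular dependency shown in the lockdep report cannot form.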
    • IB/uverbs: Make lockdep output more readable · 3bea57a5
      Roland Dreier committed
      Add names for our lockdep classes, so instead of having to decipher
      lockdep output with mysterious names:
      
          Chain exists of:
            key#14 --> key#11 --> key#13
      
      lockdep will give us something nicer:
      
          Chain exists of:
            SRQ-uobj --> PD-uobj --> CQ-uobj
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      3bea57a5
  22. 28 Jan, 2012 1 commit
  23. 05 Jan, 2012 1 commit
  24. 04 Jan, 2012 1 commit
    • IB/uverbs: Protect QP multicast list · e214a0fe
      Eli Cohen committed
      Userspace verbs multicast attach/detach operations on a QP are done
      while holding the rwsem of the QP for reading.  That's not sufficient
      since a reader lock allows more than one reader to acquire the
      lock.  However, multicast attach/detach does list manipulation that
      can corrupt the list if multiple threads run in parallel.
      
      Fix this by acquiring the rwsem as a writer to serialize attach/detach
      operations.  Add idr_write_qp() and put_qp_write() to encapsulate
      this.
      
      This fixes oops seen when running applications that perform multicast
      joins/leaves.
      
      Reported by: Mike Dubman <miked@mellanox.com>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      e214a0fe
  25. 14 Oct, 2011 2 commits