1. 20 Jan, 2014 1 commit
  2. 21 Dec, 2013 4 commits
  3. 18 Nov, 2013 5 commits
    • IB/core: Re-enable create_flow/destroy_flow uverbs · 69ad5da4
      Committed by Matan Barak
      This commit reverts commit 7afbddfa ("IB/core: Temporarily disable
      create_flow/destroy_flow uverbs").  Since the uverbs extensions
      functionality was experimental for v3.12, this patch re-enables the
      support for them and flow-steering for v3.13.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2
      Committed by Yann Droneaud
      Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
      commands") added an infrastructure for extensible uverbs commands
      while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
      through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
      using this new infrastructure.
      
      According to commit 400dbc96, the purpose of this
      infrastructure is to support passing around provider (e.g. hardware)
      specific buffers when userspace issues commands to the kernel, so that
      it would be possible to extend uverbs (e.g. core) buffers independently
      from the provider buffers.
      
      But the new kernel command function prototypes were not modified to
      take advantage of this extension. This issue was exposed by Roland
      Dreier in a previous review[1].
      
      So the following patch is an attempt at a revised extensible command
      infrastructure.
      
      This improved extensible command infrastructure distinguishes the
      core (e.g. legacy) command/response buffers from the provider
      (e.g. hardware) command/response buffers: each function implementing
      an extended command is given a struct ib_udata to hold core
      (e.g. uverbs) input and output buffers, and another struct ib_udata to
      hold the hw (e.g. provider) input and output buffers.
      
      Having those buffers identified separately makes it easier to grow
      one buffer to support an extension without having to add code to
      guess the exact size of each command/response part.  This should make
      the extended functions more reliable.
      
      Additionally, instead of relying on the command identifier being greater
      than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
      unused bits in the command field: of the 32 bits provided by the command
      field, only 6 bits are really needed to encode the identifiers of
      commands currently supported by the kernel. (Even using only 6 bits
      leaves room for about 23 new commands.)
      
      So this patch makes use of some high-order bits in the command field to
      store flags, leaving enough room for more command identifiers than one
      will ever need (e.g. 256).
      
      The new flags are used to specify if the command should be processed
      as an extended one or a legacy one. While designing the new command
      format, care was taken to make usage of flags itself extensible.
      
      Using high-order bits of the command field ensures that a newer
      libibverbs on an older kernel will properly fail when trying to call
      extended commands. On the other hand, an older libibverbs on a newer
      kernel will never be able to issue calls to extended commands.
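
To make the flag/identifier split concrete, here is a minimal C sketch of decoding such a command word. The macro names, the 8-bit widths and the value of the "extended" flag are assumptions chosen for illustration; the commit message only states that high-order bits carry flags and that about 256 identifiers remain available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only: flags in the top 8 bits,
 * command identifier in the low 8 bits, the 16 bits in between zero. */
#define CMD_FLAGS_SHIFT   24
#define CMD_FLAGS_MASK    0xff000000u
#define CMD_ID_MASK       0x000000ffu
#define CMD_FLAG_EXTENDED 0x80u   /* hypothetical "extended command" flag */

/* Extract the flag byte from a 32-bit command word. */
static inline uint32_t cmd_flags(uint32_t command)
{
    return (command & CMD_FLAGS_MASK) >> CMD_FLAGS_SHIFT;
}

/* Extract the command identifier from a 32-bit command word. */
static inline uint32_t cmd_id(uint32_t command)
{
    return command & CMD_ID_MASK;
}

/* Reject any word with bits set outside the two known fields. */
static inline bool cmd_is_well_formed(uint32_t command)
{
    return (command & ~(CMD_FLAGS_MASK | CMD_ID_MASK)) == 0;
}

/* True if the word asks for extended processing. */
static inline bool cmd_is_extended(uint32_t command)
{
    return (cmd_flags(command) & CMD_FLAG_EXTENDED) != 0;
}

/* Build an extended command word from an identifier. */
static inline uint32_t cmd_make_extended(uint32_t id)
{
    return ((uint32_t)CMD_FLAG_EXTENDED << CMD_FLAGS_SHIFT) | (id & CMD_ID_MASK);
}
```

Under this sketch, a receiver that only understands identifiers sees an extended command as a very large, unknown command number and rejects it, which is the failure behaviour described above for a newer library on an older kernel.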
      
      The extended command header includes the optional response pointer so
      that output buffer length and output buffer pointer are located
      together in the command, allowing proper parameters checking. This
      should make implementing functions easier and safer.
      
      Additionally, the extended header ensures 64-bit alignment and makes
      all sizes multiples of 8 bytes, which extends the maximum buffer size:
      
                                   legacy      extended
      
         Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
        Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
      
      For the purpose of doing proper buffer size accounting, the header
      sizes are no longer taken into account in "in_words".
      
      One of the oddities of the current extensible infrastructure, reading
      the "legacy" command header twice, is fixed by removing the "legacy"
      command header from the extended command header.  They are processed as
      two different parts of the command: memory is read once and
      information is not duplicated.  This makes it clear that this is an
      extended command scheme and not a different command scheme.
      
      The proposed scheme will format input (command) and output (response)
      buffers this way:
      
      - command:
      
        legacy header +
        extended header +
        command data (core + hw):
      
          +----------------------------------------+
          | flags     |   00      00    |  command |
          |        in_words    |   out_words       |
          +----------------------------------------+
          |                 response               |
          |                 response               |
          | provider_in_words | provider_out_words |
          |                 padding                |
          +----------------------------------------+
          |                                        |
          .              <uverbs input>            .
          .              (in_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .             <provider input>           .
          .          (provider_in_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      - response, if present:
      
          +----------------------------------------+
          |                                        |
          .          <uverbs output space>         .
          .             (out_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .         <provider output space>        .
          .         (provider_out_words * 8)       .
          |                                        |
          +----------------------------------------+
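
The two diagrams can be read as a pair of fixed headers followed by variable-length data. The following C sketch mirrors the field names in the diagrams; the struct and function names are hypothetical, and the consistency rule (a response pointer and a non-zero response length must appear together) is an assumption about what "proper parameters checking" would involve, not a quote from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First 8 bytes of the command, as in the top box of the diagram. */
struct legacy_hdr {
    uint32_t command;    /* flags | 00 00 | command identifier */
    uint16_t in_words;   /* uverbs input size, in 8-byte words */
    uint16_t out_words;  /* uverbs output size, in 8-byte words */
};

/* Next 16 bytes of the command, as in the second box of the diagram. */
struct extended_hdr {
    uint64_t response;            /* user pointer to the response, or 0 */
    uint16_t provider_in_words;   /* provider input size, in 8-byte words */
    uint16_t provider_out_words;  /* provider output size, in 8-byte words */
    uint32_t padding;
};

/* Total bytes the caller must write: both headers plus both inputs. */
static inline size_t expected_cmd_len(const struct legacy_hdr *h,
                                      const struct extended_hdr *ex)
{
    return sizeof(*h) + sizeof(*ex) +
           ((size_t)h->in_words + ex->provider_in_words) * 8;
}

/* Total bytes of response space: uverbs output plus provider output. */
static inline size_t expected_resp_len(const struct legacy_hdr *h,
                                       const struct extended_hdr *ex)
{
    return ((size_t)h->out_words + ex->provider_out_words) * 8;
}

/* Assumed check: the written length must match the headers, and a
 * response pointer is present exactly when response space is requested. */
static inline bool ex_cmd_is_consistent(const struct legacy_hdr *h,
                                        const struct extended_hdr *ex,
                                        size_t written)
{
    if (written != expected_cmd_len(h, ex))
        return false;
    return (ex->response != 0) == (expected_resp_len(h, ex) != 0);
}
```

Because each word count is a 16-bit field counted in 8-byte units, each part tops out at 65535 * 8 bytes, just under 512 KBytes, which matches the "512KBytes + 512KBytes" figures in the table above.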
      
      The overall design is to ensure that the extensible infrastructure is
      itself extensible while being more reliable, with more input and bounds
      checking.
      
      Note:
      
      The unused field in the extended header would be a perfect candidate to
      hold the command "comp_mask" (e.g. a bit field used to handle
      compatibility).  This was suggested by Roland Dreier in a previous
      review[2].  But a "comp_mask" field is likely to be present in the uverbs
      input and/or provider input, and likewise for the response, as noted by
      Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
      header.
      
      [1]:
      http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
      
      [2]:
      http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
      
      [3]:
      http://marc.info/?i=525C1149.6000701@mellanox.com

      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      
      [ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Make uverbs flow structure use names like verbs ones · b68c9560
      Committed by Yann Droneaud
      This patch adds a "flow" prefix to most of the data structures added as
      part of commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
      uverbs") to keep those names in sync with the data structures added in
      commit 319a441d ("IB/core: Add receive flow steering support").
      
      It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Rename 'flow' structs to match other uverbs structs · d82693da
      Committed by Yann Droneaud
      Commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
      uverbs") added public data structures to support receive flow
      steering.  The new structs are not following the 'uverbs' pattern:
      they're lacking the common prefix 'ib_uverbs'.
      
      This patch replaces the ib_kern prefix with ib_uverbs.
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: clarify overflow/underflow checks on ib_create/destroy_flow · f8848274
      Committed by Matan Barak
      This patch makes the following changes:
      
      1. Remove unneeded checks.
      
      2. Remove the fixed size from flow_attr.size, thus simplifying the checks.
      
      3. Remove a 32-bit hole on 64-bit systems with strict alignment in
         struct ib_kern_flow_attr by adding a reserved field.
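
As a generic illustration of item 3, the sketch below shows how a 64-bit member placed after a 32-bit member can leave an implicit hole, and how an explicit reserved field closes it. Both structs are hypothetical and do not reproduce the layout of the kernel's flow attribute structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: on ABIs that align uint64_t to 8 bytes the
 * compiler inserts 4 bytes of padding after 'a'; on ABIs that align it
 * to 4 bytes it does not, so the layout differs between the two. */
struct implicit_hole {
    uint32_t a;
    uint64_t b;
};

/* Same data with the hole made explicit: every ABI agrees on offsets. */
struct explicit_reserved {
    uint32_t a;
    uint32_t reserved;  /* fills the hole; callers would set it to zero */
    uint64_t b;
};

/* Size of the implicit hole on the ABI this file is compiled for. */
static inline size_t implicit_hole_size(void)
{
    return offsetof(struct implicit_hole, b) - sizeof(uint32_t);
}
```

With the reserved field in place, a 32-bit userspace and a 64-bit kernel agree on where every member sits, and no uninitialized padding bytes cross the user/kernel boundary.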
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  4. 16 Nov, 2013 1 commit
  5. 09 Nov, 2013 1 commit
  6. 22 Oct, 2013 1 commit
  7. 03 Sep, 2013 1 commit
  8. 29 Aug, 2013 1 commit
  9. 14 Aug, 2013 1 commit
  10. 09 Jul, 2013 1 commit
    • IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() · da183c7a
      Committed by Roland Dreier
      The macro get_unused_fd() is used to allocate a file descriptor with
      default flags.  Those default flags (0) can be "unsafe": O_CLOEXEC must
      be used by default to avoid leaking file descriptors across exec().
      
      Replace calls to get_unused_fd() in uverbs with calls to
      get_unused_fd_flags(O_CLOEXEC).  Inheriting uverbs fds across exec()
      cannot be used to do anything useful.
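
The effect of O_CLOEXEC can be observed from userspace. The sketch below is a userspace analogue only (the commit itself changes an in-kernel allocation call): it opens /dev/null with and without the flag and reads the descriptor flags back with fcntl(). The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd will be closed on exec(), 0 if it would be inherited,
 * and -1 if fcntl() fails. */
static inline int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Opens /dev/null read-only, optionally with extra open flags such as
 * O_CLOEXEC. Returns the new descriptor or -1 on error. */
static inline int open_devnull(int extra_flags)
{
    return open("/dev/null", O_RDONLY | extra_flags);
}
```

A descriptor from open_devnull(O_CLOEXEC) reports the close-on-exec flag as set, while one from open_devnull(0) does not and would survive into any program the process later executes.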
      
      Based on a patch/suggestion from Yann Droneaud <ydroneaud@opteya.com>.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  11. 28 Feb, 2013 1 commit
  12. 23 Feb, 2013 1 commit
  13. 22 Feb, 2013 1 commit
  14. 27 Sep, 2012 2 commits
  15. 09 May, 2012 3 commits
    • IB/core: Add raw packet QP type · c938a616
      Committed by Or Gerlitz
      IB_QPT_RAW_PACKET allows applications to build a complete packet,
      including L2 headers, when sending; on the receive side, the HW will
      not strip any headers.
      
      This QP type is designed for userspace direct access to Ethernet; for
      example by applications that do TCP/IP themselves.  Only processes
      with the NET_RAW capability are allowed to create raw packet QPs (the
      name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
      SOL_RAW sockets).
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/uverbs: Lock SRQ / CQ / PD objects in a consistent order · 5909ce54
      Committed by Roland Dreier
      Since XRC support was added, the uverbs code has locked SRQ, CQ and PD
      objects needed during QP and SRQ creation in different orders
      depending on the code path.  This leads to the (at least
      theoretical) possibility of deadlock, and triggers the lockdep splat
      below.
      
      Fix this by making sure we always lock the SRQ first, then CQs and
      finally the PD.
      
          ======================================================
          [ INFO: possible circular locking dependency detected ]
          3.4.0-rc5+ #34 Not tainted
          -------------------------------------------------------
          ibv_srq_pingpon/2484 is trying to acquire lock:
           (SRQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
      
          but task is already holding lock:
           (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #2 (CQ-uobj){+++++.}:
                 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
                 [<ffffffff81384f28>] down_read+0x34/0x43
                 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
                 [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
                 [<ffffffffa00b16c3>] ib_uverbs_create_qp+0x180/0x684 [ib_uverbs]
                 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
                 [<ffffffff810fe47f>] vfs_write+0xa7/0xee
                 [<ffffffff810fe65f>] sys_write+0x45/0x69
                 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      
          -> #1 (PD-uobj){++++++}:
                 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
                 [<ffffffff81384f28>] down_read+0x34/0x43
                 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
                 [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
                 [<ffffffffa00af8ad>] __uverbs_create_xsrq+0x96/0x386 [ib_uverbs]
                 [<ffffffffa00b31b9>] ib_uverbs_detach_mcast+0x1cd/0x1e6 [ib_uverbs]
                 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
                 [<ffffffff810fe47f>] vfs_write+0xa7/0xee
                 [<ffffffff810fe65f>] sys_write+0x45/0x69
                 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      
          -> #0 (SRQ-uobj){+++++.}:
                 [<ffffffff81070898>] __lock_acquire+0xa29/0xd06
                 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
                 [<ffffffff81384f28>] down_read+0x34/0x43
                 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
                 [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
                 [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
                 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
                 [<ffffffff810fe47f>] vfs_write+0xa7/0xee
                 [<ffffffff810fe65f>] sys_write+0x45/0x69
                 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
      
          other info that might help us debug this:
      
          Chain exists of:
            SRQ-uobj --> PD-uobj --> CQ-uobj
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(CQ-uobj);
                                         lock(PD-uobj);
                                         lock(CQ-uobj);
            lock(SRQ-uobj);
      
           *** DEADLOCK ***
      
          3 locks held by ibv_srq_pingpon/2484:
           #0:  (QP-uobj){+.+...}, at: [<ffffffffa00b162c>] ib_uverbs_create_qp+0xe9/0x684 [ib_uverbs]
           #1:  (PD-uobj){++++++}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
           #2:  (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
      
          stack backtrace:
          Pid: 2484, comm: ibv_srq_pingpon Not tainted 3.4.0-rc5+ #34
          Call Trace:
           [<ffffffff8137eff0>] print_circular_bug+0x1f8/0x209
           [<ffffffff81070898>] __lock_acquire+0xa29/0xd06
           [<ffffffffa00af37c>] ? __idr_get_uobj+0x20/0x5e [ib_uverbs]
           [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
           [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffff81070eee>] ? lock_release+0x166/0x189
           [<ffffffff81384f28>] down_read+0x34/0x43
           [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
           [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
           [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
           [<ffffffff81070fec>] ? lock_acquire+0xdb/0xfe
           [<ffffffff81070c09>] ? lock_release_non_nested+0x94/0x213
           [<ffffffff810d470f>] ? might_fault+0x40/0x90
           [<ffffffff810d470f>] ? might_fault+0x40/0x90
           [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
           [<ffffffff810fe47f>] vfs_write+0xa7/0xee
           [<ffffffff810ff736>] ? fget_light+0x3b/0x99
           [<ffffffff810fe65f>] sys_write+0x45/0x69
           [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
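
The ordering rule stated in this commit (SRQ first, then CQs, finally the PD) can be modelled as a rank per lock class. The sketch below is a toy checker written for illustration; it is not lockdep and not the uverbs code, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rank encodes the order given in the commit message:
 * SRQ first, then CQs, finally the PD. */
enum uobj_class { UOBJ_SRQ = 0, UOBJ_CQ = 1, UOBJ_PD = 2 };

/* True if the acquisition sequence never takes a lower-ranked lock after
 * a higher-ranked one. Equal ranks are allowed, since a QP may take both
 * a send CQ and a receive CQ. */
static inline bool lock_order_ok(const enum uobj_class *seq, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (seq[i] < seq[i - 1])
            return false;
    }
    return true;
}
```

The sequence in the splat above (PD and CQ already held, then SRQ requested) fails this check, while SRQ, CQ, CQ, PD passes. When every code path follows one global rank order, no cycle between these lock classes can form.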
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/uverbs: Make lockdep output more readable · 3bea57a5
      Committed by Roland Dreier
      Add names for our lockdep classes, so instead of having to decipher
      lockdep output with mysterious names:
      
          Chain exists of:
            key#14 --> key#11 --> key#13
      
      lockdep will give us something nicer:
      
          Chain exists of:
            SRQ-uobj --> PD-uobj --> CQ-uobj
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  16. 28 Jan, 2012 1 commit
  17. 05 Jan, 2012 1 commit
  18. 04 Jan, 2012 1 commit
    • IB/uverbs: Protect QP multicast list · e214a0fe
      Committed by Eli Cohen
      Userspace verbs multicast attach/detach operations on a QP are done
      while holding the rwsem of the QP for reading.  That's not sufficient
      since a reader lock allows more than one reader to acquire the
      lock.  However, multicast attach/detach does list manipulation that
      can corrupt the list if multiple threads run in parallel.
      
      Fix this by acquiring the rwsem as a writer to serialize attach/detach
      operations.  Add idr_write_qp() and put_qp_write() to encapsulate
      this.
      
      This fixes oops seen when running applications that perform multicast
      joins/leaves.
      
      Reported by: Mike Dubman <miked@mellanox.com>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  19. 14 Oct, 2011 7 commits
  20. 09 Dec, 2010 1 commit
    • IB/uverbs: Handle large number of entries in poll CQ · 7182afea
      Committed by Dan Carpenter
      In ib_uverbs_poll_cq() code there is a potential integer overflow if
      userspace passes in a large cmd.ne.  The calls to kmalloc() would
      allocate smaller buffers than intended, leading to memory corruption.
      There is also an information leak if resp wasn't all used.
      Unprivileged userspace may call this function, although only if an
      RDMA device that uses this function is present.
      
      Fix this by copying CQ entries one at a time, which avoids the
      allocation entirely, and also by moving this copying into a function
      that makes sure to initialize all memory copied to userspace.
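
The overflow described above arises when a user-supplied count is multiplied by an element size before allocation. The commit avoids the allocation altogether; the sketch below instead shows the guard that such a multiplication would otherwise need, purely to illustrate the failure mode. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stores count * elem_size in *out and returns true, or returns false
 * (leaving *out untouched) if the product would not fit in size_t. */
static inline bool array_bytes(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;
    *out = count * elem_size;
    return true;
}
```

Without such a check, a huge count wraps the product to a small number, the allocator returns a buffer that is too small, and the subsequent writes run past its end.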
      
      Special thanks to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      for his help and advice.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      
      [ Monkey around with things a bit to avoid bad code generation by gcc
        when designated initializers are used.  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  21. 26 Oct, 2010 1 commit
  22. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  23. 04 Mar, 2010 1 commit
  24. 10 Dec, 2009 1 commit