1. 21 12月, 2013 1 次提交
  2. 18 11月, 2013 2 次提交
    • M
      IB/core: Re-enable create_flow/destroy_flow uverbs · 69ad5da4
      Matan Barak 提交于
      This commit reverts commit 7afbddfa ("IB/core: Temporarily disable
      create_flow/destroy_flow uverbs").  Since the uverbs extensions
      functionality was experimental for v3.12, this patch re-enables the
      support for them and flow-steering for v3.13.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      69ad5da4
    • Y
      IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2
      Yann Droneaud 提交于
      Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
      commands") added an infrastructure for extensible uverbs commands
      while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
      through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
      using this new infrastructure.
      
      According to the commit 400dbc96, the purpose of this
      infrastructure is to support passing around provider (eg. hardware)
      specific buffers when userspace issue commands to the kernel, so that
      it would be possible to extend uverbs (eg. core) buffers independently
      from the provider buffers.
      
      But the new kernel command function prototypes were not modified to
      take advantage of this extension. This issue was exposed by Roland
      Dreier in a previous review[1].
      
      So the following patch is an attempt to a revised extensible command
      infrastructure.
      
      This improved extensible command infrastructure distinguish between
      core (eg. legacy)'s command/response buffers from provider
      (eg. hardware)'s command/response buffers: each extended command
      implementing function is given a struct ib_udata to hold core
      (eg. uverbs) input and output buffers, and another struct ib_udata to
      hold the hw (eg. provider) input and output buffers.
      
      Having those buffers identified separately make it easier to increase
      one buffer to support extension without having to add some code to
      guess the exact size of each command/response parts: This should make
      the extended functions more reliable.
      
      Additionally, instead of relying on command identifier being greater
      than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
      unused bits in command field: on the 32 bits provided by command
      field, only 6 bits are really needed to encode the identifier of
      commands currently supported by the kernel. (Even using only 6 bits
      leaves room for about 23 new commands).
      
      So this patch makes use of some high order bits in command field to
      store flags, leaving enough room for more command identifiers than one
      will ever need (eg. 256).
      
      The new flags are used to specify if the command should be processed
      as an extended one or a legacy one. While designing the new command
      format, care was taken to make usage of flags itself extensible.
      
      Using high order bits of the commands field ensure that newer
      libibverbs on older kernel will properly fail when trying to call
      extended commands. On the other hand, older libibverbs on newer kernel
      will never be able to issue calls to extended commands.
      
      The extended command header includes the optional response pointer so
      that output buffer length and output buffer pointer are located
      together in the command, allowing proper parameters checking. This
      should make implementing functions easier and safer.
      
      Additionally the extended header ensure 64bits alignment, while making
      all sizes multiple of 8 bytes, extending the maximum buffer size:
      
                                   legacy      extended
      
         Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
        Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
      
      For the purpose of doing proper buffer size accounting, the headers
      size are no more taken in account in "in_words".
      
      One of the odds of the current extensible infrastructure, reading
      twice the "legacy" command header, is fixed by removing the "legacy"
      command header from the extended command header: they are processed as
      two different parts of the command: memory is read once and
      information are not duplicated: it's making clear that's an extended
      command scheme and not a different command scheme.
      
      The proposed scheme will format input (command) and output (response)
      buffers this way:
      
      - command:
      
        legacy header +
        extended header +
        command data (core + hw):
      
          +----------------------------------------+
          | flags     |   00      00    |  command |
          |        in_words    |   out_words       |
          +----------------------------------------+
          |                 response               |
          |                 response               |
          | provider_in_words | provider_out_words |
          |                 padding                |
          +----------------------------------------+
          |                                        |
          .              <uverbs input>            .
          .              (in_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .             <provider input>           .
          .          (provider_in_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      - response, if present:
      
          +----------------------------------------+
          |                                        |
          .          <uverbs output space>         .
          .             (out_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .         <provider output space>        .
          .         (provider_out_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      The overall design is to ensure that the extensible infrastructure is
      itself extensible while begin more reliable with more input and bound
      checking.
      
      Note:
      
      The unused field in the extended header would be perfect candidate to
      hold the command "comp_mask" (eg. bit field used to handle
      compatibility).  This was suggested by Roland Dreier in a previous
      review[2].  But "comp_mask" field is likely to be present in the uverb
      input and/or provider input, likewise for the response, as noted by
      Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
      header.
      
      [1]:
      http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
      
      [2]:
      http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
      
      [3]:
      http://marc.info/?i=525C1149.6000701@mellanox.comSigned-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      
      [ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      f21519b2
  3. 22 10月, 2013 1 次提交
  4. 29 8月, 2013 2 次提交
    • H
      IB/core: Export ib_create/destroy_flow through uverbs · 436f2ad0
      Hadar Hen Zion 提交于
      Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
      support flow steering for user space applications.
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      436f2ad0
    • I
      IB/core: Infrastructure for extensible uverbs commands · 400dbc96
      Igor Ivanov 提交于
      Add infrastructure to support extended uverbs capabilities in a
      forward/backward manner.  Uverbs command opcodes which are based on
      the verbs extensions approach should be greater or equal to
      IB_USER_VERBS_CMD_THRESHOLD.  They have new header format and
      processed a bit differently.
      
      Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means
      it needs to have additional arguments, we will be able to add them without creating
      a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version.
      
      This patch for itself doesn't provide the whole scheme which is also dependent
      on adding a comp_mask field to each extended uverbs command struct.
      
      The new header framework allows for future extension of the CMD arguments
      (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing
      new command (that is a command that supports the new uverbs command header format
      suggested in this patch) w/o bumping ABI version and with maintaining backward
      and formward compatibility to new and old libibverbs versions.
      
      In the uverbs command we are passing both uverbs arguments and the provider arguments.
      We split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only
      uverbs input argument struct size and  ib_uverbs_cmd_hdr.provider_in_words that will carry
      the provider input argument size. Same goes for the response (the uverbs CMD output argument).
      
      For example take the create_cq call and the mlx4_ib provider:
      
      The uverbs layer gets libibverb's struct ibv_create_cq (named struct ib_uverbs_create_cq
      in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct
      ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel) and
      in_words = sizeof(mlx4_create_cq)/4 .
      
      Thus ib_uverbs_cmd_hdr.in_words carry both uverbs plus mlx4_ib input argument sizes,
      where uverbs assumes it knows the size of its input argument - struct ibv_create_cq.
      
      Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field
      to the struct which is basically bit field indicating which fields exists in the struct
      (as done for the libibverbs API extension), but we need a way to tell what is the total
      size of the struct and not assume the struct size is predefined (since we may get different
      struct sizes from different user libibverbs versions). So we know at which point the
      provider input argument (struct mlx4_create_cq) begins. Same goes for extending the
      provider struct mlx4_create_cq. Thus we split the ib_uverbs_cmd_hdr.in_words to
      ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and
      ib_uverbs_cmd_hdr.provider_in_words that will carry the provider (mlx4_ib) input argument size.
      Signed-off-by: NIgor Ivanov <Igor.Ivanov@itseez.com>
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      400dbc96
  5. 22 2月, 2013 1 次提交
  6. 27 9月, 2012 2 次提交
  7. 04 1月, 2012 1 次提交
  8. 14 10月, 2011 5 次提交
  9. 07 10月, 2011 1 次提交
  10. 05 7月, 2011 1 次提交
  11. 24 5月, 2011 1 次提交
  12. 22 4月, 2010 1 次提交
    • R
      IB: Explicitly rule out llseek to avoid BKL in default_llseek() · bc1db9af
      Roland Dreier 提交于
      Several RDMA user-access drivers have file_operations structures with
      no .llseek method set.  None of the drivers actually do anything with
      f_pos, so this means llseek is essentially a NOP, instead of returning
      an error as leaving other file_operations methods unimplemented would
      do.  This is mostly harmless, except that a NULL .llseek means that
      default_llseek() is used, and this function grabs the BKL, which we
      would like to avoid.
      
      Since llseek does nothing useful on these files, we would like it to
      return an error to userspace instead of silently grabbing the BKL and
      succeeding.  For nearly all of the file types, we take the
      belt-and-suspenders approach of setting the .llseek method to
      no_llseek and also calling nonseekable_open(); the exception is the
      uverbs_event files, which are created with anon_inode_getfile(), which
      already sets f_mode the same way as nonseekable_open() would.
      
      This work is motivated by Arnd Bergmann's bkl-removal tree.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      bc1db9af
  13. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  14. 08 3月, 2010 2 次提交
  15. 04 3月, 2010 1 次提交
  16. 25 2月, 2010 7 次提交
  17. 17 12月, 2009 1 次提交
  18. 05 10月, 2009 1 次提交
  19. 06 9月, 2009 2 次提交
    • J
      IB/uverbs: Return ENOSYS for unimplemented commands (not EINVAL) · b1b8afb8
      Jack Morgenstein 提交于
      Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
      methods allowed for userspace"), the uverbs core returns EINVAL for
      commands not implemented by a specific low-level driver.
      
      This creates a problem that there is no way to tell the difference
      between an unimplemented command and an implemented one which is
      incorrectly invoked (which also returns EINVAL).
      
      The fix is to have unimplemented commands return ENOSYS.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      b1b8afb8
    • R
      IB: Use DEFINE_SPINLOCK() for static spinlocks · 6276e08a
      Roland Dreier 提交于
      Rather than just defining static spinlock_t variables and then
      initializing them later in init functions, simply define them with
      DEFINE_SPINLOCK() and remove the calls to spin_lock_init().  This cleans
      up the source a tad and also shrinks the compiled code; eg on x86-64:
      
      add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
      function                                     old     new   delta
      ib_uverbs_init                               336     326     -10
      ib_mad_init_module                           147     137     -10
      ib_sa_init                                   123     103     -20
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      6276e08a
  20. 02 11月, 2008 1 次提交
    • A
      saner FASYNC handling on file close · 233e70f4
      Al Viro 提交于
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict file from fasync lists; forgetting that
      creates a hole and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      233e70f4
  21. 17 10月, 2008 1 次提交
  22. 15 7月, 2008 1 次提交
  23. 05 7月, 2008 1 次提交
  24. 21 6月, 2008 1 次提交
  25. 19 6月, 2008 1 次提交