1. 09 Sep, 2017 3 commits
  2. 07 Sep, 2017 8 commits
    • R
      mm,fork: introduce MADV_WIPEONFORK · d2cd9ede
      Rik van Riel committed
      Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
      in the child process after fork.  This differs from MADV_DONTFORK in one
      important way.
      
      If a child process accesses memory that was MADV_WIPEONFORK, it will get
      zeroes.  The address ranges are still valid, they are just empty.
      
      If a child process accesses memory that was MADV_DONTFORK, it will get a
      segmentation fault, since those address ranges are no longer valid in
      the child after fork.
      
      Since MADV_DONTFORK also seems to be used to allow very large programs
      to fork in systems with strict memory overcommit restrictions, changing
      the semantics of MADV_DONTFORK might break existing programs.
      
      MADV_WIPEONFORK only works on private, anonymous VMAs.
      
      The use case is libraries that store or cache information, and want to
      know that they need to regenerate it in the child process after fork.
      
      Examples of this would be:
       - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
         check, which is too slow without a PID cache)
       - PKCS#11 API reinitialization check (mandated by specification)
       - glibc's upcoming PRNG (reseed after fork)
       - OpenSSL PRNG (reseed after fork)
      
      The security benefits of a forking server having a re-initialized PRNG in
      every child process are pretty obvious.  However, due to libraries
      having all kinds of internal state, and programs getting compiled with
      many different versions of each library, it is unreasonable to expect
      calling programs to re-initialize everything manually after fork.
      
      A further complication is the proliferation of clone flags, programs
      bypassing glibc's functions to call clone directly, and programs calling
      unshare, causing the glibc pthread_atfork hook to not get called.
      
      It would be better to have the kernel take care of this automatically.
      
      The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
      MADV_WIPEONFORK.
      
      This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
      
          https://man.openbsd.org/minherit.2
      
      [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
      Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reported-by: Florian Weimer <fweimer@redhat.com>
      Reported-by: Colm MacCártaigh <colm@allcosts.net>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm/shmem: add hugetlbfs support to memfd_create() · 749df87b
      Mike Kravetz committed
      This patch came out of discussions in this e-mail thread:
        http://lkml.kernel.org/r/1499357846-7481-1-git-send-email-mike.kravetz%40oracle.com
      
      The Oracle JVM team is developing a new garbage collection model.  This
      new model requires multiple mappings of the same anonymous memory.  One
      straightforward way to accomplish this is with memfd_create.  They can
      use the returned fd to create multiple mappings of the same memory.
      
      The JVM today has an option to use (static hugetlb) huge pages.  If this
      option is specified, they would like to use the same garbage collection
      model requiring multiple mappings to the same memory.  Using hugetlbfs,
      it is possible to explicitly mount a filesystem and specify file paths
      in order to get an fd that can be used for multiple mappings.  However,
      this introduces additional system admin work and coordination.
      
      Ideally they would like to get a hugetlbfs fd without requiring explicit
      mounting of a filesystem.  Today, mmap and shmget can make use of
      hugetlbfs without explicitly mounting a filesystem.  The patch adds this
      functionality to memfd_create.
      
      Add a new flag MFD_HUGETLB to memfd_create() that will specify the file
      to be created resides in the hugetlbfs filesystem.  This is the generic
      hugetlbfs filesystem not associated with any specific mount point.  As
      with other system calls that request hugetlbfs backed pages, there is
      the ability to encode huge page size in the flag arguments.
      
      hugetlbfs does not support sealing operations, therefore specifying
      MFD_ALLOW_SEALING with MFD_HUGETLB will result in EINVAL.
      
      Of course, the memfd_create man page would need updating if this type
      of functionality moves forward.
      
      Link: http://lkml.kernel.org/r/1502149672-7759-2-git-send-email-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      userfaultfd: provide pid in userfault msg - add feat union · a36985d3
      Andrea Arcangeli committed
      No ABI change, but this will make it more explicit to software that ptid
      is only available if requested by passing UFFD_FEATURE_THREAD_ID to
      UFFDIO_API.  The fact that it is a union also self-documents that
      software should not take for granted that there is a ptid there.
      
      Link: http://lkml.kernel.org/r/20170802165145.22628-7-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Alexey Perevalov <a.perevalov@samsung.com>
      Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      userfaultfd: provide pid in userfault msg · 9d4ac934
      Alexey Perevalov committed
      This could be useful for calculating per-vCPU downtime during postcopy
      live migration.  A side observer, or the application itself, will be
      informed about the task's sleep during userfaultfd processing.
      
      The thread id is provided when the user requests it by setting the
      UFFD_FEATURE_THREAD_ID bit in uffdio_api.features.
      
      Link: http://lkml.kernel.org/r/20170802165145.22628-6-aarcange@redhat.com
      Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      mm: userfaultfd: add feature to request for a signal delivery · 2d6d6f5a
      Prakash Sangappa committed
      In some cases, the userfaultfd mechanism should just deliver a SIGBUS
      signal to the faulting process instead of a page-fault event.  Handling
      page-fault events with a monitor thread adds unnecessary overhead in
      these cases.  For example, applications such as databases could use the
      signaling mechanism for robustness.
      
      The database uses hugetlbfs for performance reasons.  Files on the
      hugetlbfs filesystem are created and huge pages allocated using the
      fallocate() API.
      Pages are deallocated/freed using fallocate() hole punching support.
      These files are mmapped and accessed by many processes as shared memory.
      The database keeps track of which offsets in the hugetlbfs file have
      pages allocated.
      
      Any access to a mapped address over a hole in the file, which can occur
      due to bugs in the application, is considered invalid, and the process
      is expected to simply receive a SIGBUS.  However, currently when a hole in the file
      is accessed via the mapped address, kernel/mm attempts to automatically
      allocate a page at page fault time, resulting in implicitly filling the
      hole in the file.  This may not be the desired behavior for applications
      like the database that want to explicitly manage page allocations of
      hugetlbfs files.
      
      Using the userfaultfd mechanism with this support to get a signal, a
      database application can prevent pages from being allocated implicitly
      when processes access mapped addresses over holes in the file.
      
      This patch adds the UFFD_FEATURE_SIGBUS feature to the userfaultfd
      mechanism to request a SIGBUS signal.
      
      See following for previous discussion about the database requirement
      leading to this proposal as suggested by Andrea.
      
      http://www.spinics.net/lists/linux-mm/msg129224.html
      
      Link: http://lkml.kernel.org/r/1501552446-748335-2-git-send-email-prakash.sangappa@oracle.com
      Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
      Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm: shm: use new hugetlb size encoding definitions · 4da243ac
      Mike Kravetz committed
      Use the common definitions from hugetlb_encode.h header file for
      encoding hugetlb size definitions in shmget system call flags.
      
      In addition, move these definitions from the internal (kernel) to user
      (uapi) header file.
      
      Link: http://lkml.kernel.org/r/1501527386-10736-4-git-send-email-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm: arch: consolidate mmap hugetlb size encodings · aafd4562
      Mike Kravetz committed
      A non-default huge page size can be encoded in the flags argument of the
      mmap system call.  The definitions for these encodings are in arch
      specific header files.  However, all architectures use the same values.
      
      Consolidate all the definitions in the primary user header file
      (uapi/linux/mman.h).  Include definitions for all known huge page sizes.
      Use the generic encoding definitions in hugetlb_encode.h as the basis
      for these definitions.
      
      Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm: hugetlb: define system call hugetlb size encodings in single file · e652f694
      Mike Kravetz committed
      Patch series "Consolidate system call hugetlb page size encodings".
      
      These patches are the result of discussions in
      https://lkml.org/lkml/2017/3/8/548.  The following changes are made in the
      patch set:
      
      1) Put all the log2 encoded huge page size definitions in a common
         header file.  The idea is to have a set of definitions that can be
         used as the basis for system call specific definitions such as
         MAP_HUGE_* and SHM_HUGE_*.
      
      2) Remove MAP_HUGE_* definitions in arch specific files.  All these
         definitions are the same.  Consolidate all definitions in the primary
         user header file (uapi/linux/mman.h).
      
      3) Remove SHM_HUGE_* definitions intended for user space from kernel
         header file, and add to user (uapi/linux/shm.h) header file.  Add
         definitions for all known huge page size encodings as in mmap.
      
      This patch (of 3):
      
      If hugetlb pages are requested in mmap or shmget system calls, a huge
      page size other than default can be requested.  This is accomplished by
      encoding the log2 of the huge page size in the upper bits of the flag
      argument.  asm-generic and arch specific headers all define the same
      values for these encodings.
      
      Put common definitions in a single header file.  The primary uapi header
      files for mmap and shm will use these definitions as a basis for
      definitions specific to those system calls.
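
      The log2 encoding described above can be illustrated with a short
      sketch.  It is written in Python for brevity and is not part of the
      patch; the shift (26) and mask (0x3f) are assumed to match
      HUGETLB_FLAG_ENCODE_SHIFT and HUGETLB_FLAG_ENCODE_MASK from
      hugetlb_encode.h, and the helper names are hypothetical.

```python
# Assumed to mirror HUGETLB_FLAG_ENCODE_SHIFT / _MASK in hugetlb_encode.h.
HUGETLB_FLAG_ENCODE_SHIFT = 26
HUGETLB_FLAG_ENCODE_MASK = 0x3F


def encode_huge_page_size(size_bytes: int) -> int:
    """Return the flag bits that request a huge page of size_bytes.

    Hypothetical helper: the kernel headers define one constant per size
    instead of computing the value at run time.
    """
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("huge page size must be a power of two")
    log2_size = size_bytes.bit_length() - 1
    if log2_size > HUGETLB_FLAG_ENCODE_MASK:
        raise ValueError("size does not fit in the encoding field")
    return log2_size << HUGETLB_FLAG_ENCODE_SHIFT


def decode_huge_page_size(flags: int) -> int:
    """Return the encoded huge page size in bytes, or 0 for the default."""
    log2_size = (flags >> HUGETLB_FLAG_ENCODE_SHIFT) & HUGETLB_FLAG_ENCODE_MASK
    return (1 << log2_size) if log2_size else 0


# The same values serve as the basis for MAP_HUGE_* and SHM_HUGE_*.
HUGE_2MB = encode_huge_page_size(2 << 20)   # log2 = 21
HUGE_1GB = encode_huge_page_size(1 << 30)   # log2 = 30
```

      Because the field holds only the exponent, one set of definitions can
      cover every power-of-two page size without per-architecture tables.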
      
      Link: http://lkml.kernel.org/r/1501527386-10736-2-git-send-email-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 05 Sep, 2017 16 commits
  4. 04 Sep, 2017 3 commits
    • P
      netlink: add NLM_F_NONREC flag for deletion requests · 2335ba70
      Pablo Neira Ayuso committed
      At the last NFWS in Faro, Portugal, we discussed that netlink lacks
      the semantics to request non-recursive deletions, i.e. do not delete an
      object if it has child objects that hang from the parent object that
      the user requests to be deleted.
      
      We need this new flag to solve a problem for the iptables-compat
      backward compatibility utility, that runs iptables commands using the
      existing nf_tables netlink interface. Specifically, a custom chain in
      iptables cannot be deleted if it contains rules, whereas nf_tables
      allows removing any chain, even one populated with content. To sort out
      this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
      flag to obtain the same semantics that iptables provides.
      
      This new flag should only be used for deletion requests. Note this new
      flag value overlaps with the existing:
      
      * NLM_F_ROOT for get requests.
      * NLM_F_REPLACE for new requests.
      
      However, those flags should not ever be used in deletion requests.
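
      The overlap and the intended semantics can be sketched as follows.
      This is an illustrative Python model, not kernel code: the flag values
      are assumed to be 0x100 as in uapi/linux/netlink.h, while the request
      kinds, the chain table and delete_chain() are hypothetical.

```python
# Assumed values from uapi/linux/netlink.h: the same bit, three meanings.
NLM_F_ROOT = 0x100     # get requests
NLM_F_REPLACE = 0x100  # new requests
NLM_F_NONREC = 0x100   # delete requests


def meaning_of_0x100(request_kind: str) -> str:
    """Name the flag that bit 0x100 stands for, given the request kind."""
    return {
        "get": "NLM_F_ROOT",
        "new": "NLM_F_REPLACE",
        "delete": "NLM_F_NONREC",
    }[request_kind]


def delete_chain(chains: dict, name: str, flags: int = 0) -> None:
    """Toy model of a deletion request (hypothetical helper).

    With NLM_F_NONREC set, deleting a chain that still holds rules fails,
    which matches the iptables behaviour described above.
    """
    if flags & NLM_F_NONREC and chains[name]:
        raise OSError("chain is not empty")
    del chains[name]


chains = {"custom": ["rule1"], "empty": []}
delete_chain(chains, "empty", NLM_F_NONREC)       # succeeds: no rules
try:
    delete_chain(chains, "custom", NLM_F_NONREC)  # refused: has rules
    refused = False
except OSError:
    refused = True
delete_chain(chains, "custom")                    # nf_tables default: removed
```

      The bit is only unambiguous because the request type selects its
      interpretation, which is why it must not be combined across types.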
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • P
      netfilter: nft_limit: add stateful object type · a6912055
      Pablo M. Bermudo Garay committed
      Register a new limit stateful object type into the stateful object
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • V
      netfilter: xt_hashlimit: add rate match mode · bea74641
      Vishwanath Pai committed
      This patch adds a new feature to hashlimit that allows matching on the
      current packet/byte rate without rate limiting. This can be enabled
      with a new flag --hashlimit-rate-match. The match returns true if the
      current rate of packets is above/below the user specified value.
      
      The main difference between the existing algorithm and the new one is
      that the existing algorithm rate-limits the flow whereas the new
      algorithm does not. Instead it *classifies* the flow based on whether
      it is above or below a certain rate. I will demonstrate this with an
      example below. Let us assume this rule:
      
      iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain
      
      If the packet rate is 15/s, the existing algorithm would ACCEPT 10
      packets every second and send 5 packets to "new_chain".
      
      But with the new algorithm, as long as the rate of 15/s is sustained,
      all packets will continue to match and every packet is sent to new_chain.
      
      This new functionality will let us classify different flows based on
      their current rate, so that further decisions can be made on them based on
      what the current rate is.
      
      This is how the new algorithm works:
      We divide time into intervals of 1 (sec/min/hour) as specified by
      the user. We keep track of the number of packets/bytes processed in the
      current interval. After each interval we reset the counter to 0.
      
      When we receive a packet for match, we look at the packet rate
      during the current interval and the previous interval to make a
      decision:
      
      if [ prev_rate < user and cur_rate < user ]
              return Below
      else
              return Above
      
      Where cur_rate is the number of packets/bytes seen in the current
      interval, prev_rate is the number of packets/bytes seen in the previous
      interval and 'user' is the rate specified by the user.
      
      We also give the user the flexibility to choose the time interval
      with the option --hashlimit-rate-interval. For example, the user can
      keep a low rate like x/hour but still keep the interval as small as 1
      second.
      
      To preserve backwards compatibility we have to add this feature in a new
      revision, so I've created revision 3 for hashlimit. The two new options
      we add are:
      
      --hashlimit-rate-match
      --hashlimit-rate-interval
      
      I have updated the help text to add these new options. Also added a few
      tests for the new options.
      Suggested-by: Igor Lubashev <ilubashe@akamai.com>
      Reviewed-by: Josh Hunt <johunt@akamai.com>
      Signed-off-by: Vishwanath Pai <vpai@akamai.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 02 Sep, 2017 2 commits
  6. 01 Sep, 2017 4 commits
  7. 31 Aug, 2017 4 commits
    • M
      IB/core: Add completion queue (cq) object actions · 9ee79fce
      Matan Barak committed
      Adding CQ ioctl actions:
      1. create_cq
      2. destroy_cq
      
      This requires adding the following:
      1. A specification describing the method
         a. Handler
         b. Attributes specification
            Each attribute is one of the following:
            a. PTR_IN - input data
               Note: This could be encoded inline for data < 64 bits
            b. PTR_OUT - response data
            c. IDR - idr based object
            d. FD - fd based object
            Blob attributes (clauses a and b) contain their type, while
            object specifications (clauses c and d) contain the expected
            object type (for example, the given id should be
            UVERBS_TYPE_PD) and the required access (READ, WRITE, NEW or
            DESTROY). If NEW is required, the new object's id will be
            assigned to this attribute. All attributes could get a
            UA_FLAGS attribute. Currently we support stating that an
            attribute is mandatory or that the specification size
            corresponds to a lower bound (and that this attribute could
            be extended).
            We currently add both default attributes and the two generic
            UHW_IN and UHW_OUT driver specific attributes.
      2. Handler
         A handler gets a uverbs_attr_bundle. The handler developer uses
         uverbs_attr_get to fetch an attribute of a given id.
         Each of these attribute groups corresponds to the specification
         group defined in the action (clauses 1.b and 1.c respectively).
         The indices of these arrays correspond to the attribute ids
         declared in the specifications (clause 2).
      
         The handler is quite simple. It assumes the infrastructure fetched
         all objects and locked, created or destroyed them as required by
         the specification. Pointer (or blob) attributes were validated to
         match their required sizes. After the handler has finished, the
         infrastructure commits or rolls back the objects.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • M
      IB/core: Add legacy driver's user-data · d70724f1
      Matan Barak committed
      In this phase, we don't want to change all the drivers to use
      flexible driver's specific attributes. Therefore, we add two default
      attributes: UHW_IN and UHW_OUT. These attributes are optional in some
      methods and they encode the driver specific command data. We add
      a function that extracts this data and creates the legacy udata over
      it.
      
      Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This
      turns on the first bit of the namespace, indicating this attribute
      belongs to the driver's namespace.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • M
      IB/core: Export ioctl enum types to user-space · 64b19e13
      Matan Barak committed
      Add a new ib_user_ioctl_verbs.h which exports all required ABI
      enums and structs to the user-space.
      Export the default types to user-space through this file.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • M
      IB/core: Add new ioctl interface · fac9658c
      Matan Barak committed
      In this ioctl interface, processing the command starts from
      properties of the command and fetching the appropriate user objects
      before calling the handler.
      
      Parsing and validation is done according to a specifier declared by
      the driver's code. In the driver, all supported objects are declared.
      These objects are separated into different object namespaces. Dividing
      objects into namespaces is done at initialization by using the higher
      bits of the object ids. This initialization can mix objects declared
      in different places into one parsing tree used in this ioctl interface.
      
      For each object we list all supported methods. Similarly to objects,
      methods are separated to method namespaces too. Namespacing is done
      similarly to the objects case. This could be used in order to add
      methods to an existing object.
      
      Each method has a specific handler, which could be either a default
      handler or a driver specific handler.
      Along with the handler, a set of attributes is specified as well.
      Similarly to objects and methods, attributes are namespaced and hashed
      by their ids at initialization too. All supported attributes are
      subject to automatic fetching and validation. These attributes include
      the command, response and the method's related objects' ids.
      
      When these entities (objects, methods and attributes) are used, the
      high bits of the entity ids are used in order to calculate the hash
      bucket index. Then, these high bits are masked out in order to have a
      zero based index. Since we use these high bits for both bucketing and
      namespacing, we get a compact representation and O(1) array access.
      This is mandatory for efficient dispatching.
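
      The bucketing scheme above can be sketched as follows.  This is an
      illustrative Python model rather than the uverbs code: the 16-bit id
      width, the 4-bit namespace field and all names are assumptions chosen
      for the example.

```python
# Assumed layout: the top 4 bits of a 16-bit id select the namespace.
NS_SHIFT = 12
NS_MASK = 0xF000

CORE_NS = 0x0    # hypothetical namespace for default (core) entities
DRIVER_NS = 0x1  # hypothetical namespace for driver specific entities


def make_id(namespace: int, index: int) -> int:
    """Build an entity id from a namespace and a zero based index."""
    return (namespace << NS_SHIFT) | index


def split_id(entity_id: int) -> tuple:
    """Return (bucket, index): high bits pick the bucket, the rest index it."""
    bucket = (entity_id & NS_MASK) >> NS_SHIFT
    index = entity_id & ~NS_MASK & 0xFFFF
    return bucket, index


def lookup(buckets: list, entity_id: int):
    """Two plain array accesses, so dispatch cost is O(1)."""
    bucket, index = split_id(entity_id)
    return buckets[bucket][index]


# One array per namespace; the entries stand in for method specifications.
methods = [
    ["create_cq", "destroy_cq"],  # core namespace
    ["driver_method"],            # driver namespace
]
```

      Since the namespace bits double as the bucket index, no separate hash
      function or collision handling is needed.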
      
      Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
      Attributes could be validated through some attributes, like:
      (*) Minimum size / Exact size
      (*) Fops for FD
      (*) Object type for IDR
      
      If an IDR/fd attribute is specified, the kernel also states the object
      type and the required access (NEW, WRITE, READ or DESTROY).
      All uobject/fd management is done automatically by the infrastructure.
      That is, the infrastructure will fail concurrent commands when at
      least one of them requires exclusive access (WRITE/DESTROY),
      synchronize actions with device removals (dissociate context events)
      and take care of reference counting (increase/decrease) for concurrent
      action invocations. The reference counts on the actual kernel objects
      shall be handled by the handlers.
      
       objects
      +--------+
      |        |
      |        |   methods                                                                +--------+
      |        |   ns         method      method_spec                           +-----+   |len     |
      +--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
      | object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
      +--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
      |        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
      |        |  |      |                                    +------------+
      |        |  +------+
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      +--------+
      
      [d] = Hash ids to groups using the high order bits
      
      The right types table is also chosen by using the high bits from
      the ids. Currently we have either default or driver specific groups.
      
      Once validation and object fetching (or creation) are completed, we
      call the handler:
      int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
                     struct uverbs_attr_bundle *ctx);
      
      ctx bundles attributes of different namespaces. Each element there
      is an array of attributes which corresponds to one namespace of
      attributes. For example, in the common case:
      
       ctx                               core
      +----------------------------+     +------------+
      | core:                      +---> | valid      |
      +----------------------------+     | cmd_attr   |
      | driver:                    |     +------------+
      |----------------------------+--+  | valid      |
                                      |  | cmd_attr   |
                                      |  +------------+
                                      |  | valid      |
                                      |  | obj_attr   |
                                      |  +------------+
                                      |
                                      |  drivers
                                      |  +------------+
                                      +> | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | obj_attr   |
                                         +------------+
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>