1. 08 May, 2013 1 commit
  2. 06 Apr, 2013 1 commit
  3. 23 Mar, 2013 1 commit
  4. 04 Mar, 2013 1 commit
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman committed
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems, trivially
      making things safer at no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives, allowing simple, safe,
      well-understood workarounds for known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases, the most significant being autofs, which lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep,
      which means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
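As a rough illustration of the prefixing described in this commit, the userspace sketch below builds the alias string that a mount request for a given filesystem type would ask the module loader for. Only the "fs-" prefix comes from the commit message; `fs_module_alias()` is a hypothetical helper invented for this sketch and is not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch (assumption: not kernel code). Builds the module alias
 * "fs-<type>" for a filesystem type string supplied by the caller.
 * Returns 0 on success, -1 if the alias does not fit in buf.
 */
static int fs_module_alias(char *buf, size_t buflen, const char *fstype)
{
    int n = snprintf(buf, buflen, "fs-%s", fstype);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Because every request carries the prefix, a user-supplied type string can only ever resolve to a module that has declared a matching "fs-" alias, which is the narrowing the commit message argues for.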
  5. 28 Feb, 2013 1 commit
  6. 23 Feb, 2013 1 commit
  7. 15 Feb, 2013 1 commit
    • M
      IB/qib: Fix QP locate/remove race · bcc9b67a
      Mike Marciniszyn committed
      remove_qp() can execute concurrently with a qib_lookup_qpn() on
      another CPU, which in and of itself is OK, given the RCU locking.
      
      The issue is that remove_qp() NULLs out the qp->next field so that a
      qib_lookup_qpn() might fail to find a qp if it occurs after the one
      that is being deleted.  This is a momentary issue and subsequent
      qib_lookup_qpn() calls would find the qp's since the search restarts
      from the bucket head.  At scale, the issue might cause dropped
      packets and unnecessary retransmissions.
      
      The fix just deletes the qp->next NULL assignment to prevent the
      remove_qp() from hiding qp's from qib_lookup_qpn().
      Reviewed-by: Dean Luick <dean.luick@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
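The race described in this commit can be pictured with a single-threaded sketch: a reader that is already standing on the node being removed loses the rest of the chain if the remover clears that node's next pointer. The types and helpers below (`struct qp`, `lookup_from()`, `remove_qp_sketch()`) are hypothetical stand-ins, not the driver's code, and the sketch ignores RCU primitives, memory ordering, and grace periods entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified hash-chain entry. */
struct qp {
    unsigned qpn;
    struct qp *next;
};

/* Continue a lookup from wherever the reader currently stands. */
static struct qp *lookup_from(struct qp *pos, unsigned qpn)
{
    for (; pos != NULL; pos = pos->next)
        if (pos->qpn == qpn)
            return pos;
    return NULL;
}

/*
 * Unlink victim from the chain. With clear_next set, also NULL the
 * victim's next pointer (the behavior the commit removes).
 */
static void remove_qp_sketch(struct qp **head, struct qp *victim, int clear_next)
{
    struct qp **pp = head;

    while (*pp != NULL && *pp != victim)
        pp = &(*pp)->next;
    if (*pp == victim) {
        *pp = victim->next;
        if (clear_next)
            victim->next = NULL;
    }
}
```

With `clear_next` set, a reader paused on the removed entry cannot reach later entries; with it clear, the reader still walks off the removed entry into the rest of the chain, while new lookups from the head no longer see the removed entry.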
  8. 06 Feb, 2013 1 commit
  9. 04 Jan, 2013 1 commit
    • G
      Drivers: infinband: remove __dev* attributes. · 1e6d9abe
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Tom Tucker <tom@opengridcomputing.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
      Cc: Christoph Raisch <raisch@de.ibm.com>
      Cc: Mike Marciniszyn <infinipath@intel.com>
      Cc: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  10. 09 Oct, 2012 1 commit
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov committed
      A long time ago, in v2.4, VM_RESERVED kept the swapout process off a VMA;
      it has since lost its original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes the reserved_vm counter from mm_struct.  It seems nobody
      cares about it: it is not exported to userspace directly, it only
      reduces the total_vm shown in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
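The replacement rule stated in this commit (VM_RESERVED becomes either VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP) might be sketched as a plain bitmask rewrite. The `SK_` constants use made-up bit values, and the `is_io_mapping` switch is an assumption of this sketch: the commit message does not spell out how a given driver chooses between the two alternatives.

```c
/* Illustrative bit values only; the real definitions live in kernel headers. */
#define SK_VM_IO         0x1u
#define SK_VM_DONTEXPAND 0x2u
#define SK_VM_DONTDUMP   0x4u
#define SK_VM_RESERVED   0x8u  /* stands in for the flag being removed */

/*
 * Hypothetical helper: drop the reserved flag and substitute one of the
 * two alternatives named in the commit message. Flags that never had the
 * reserved bit set are returned unchanged.
 */
static unsigned replace_vm_reserved(unsigned flags, int is_io_mapping)
{
    if (!(flags & SK_VM_RESERVED))
        return flags;

    flags &= ~SK_VM_RESERVED;
    flags |= is_io_mapping ? SK_VM_IO
                           : (SK_VM_DONTEXPAND | SK_VM_DONTDUMP);
    return flags;
}
```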
  11. 02 Oct, 2012 1 commit
  12. 01 Oct, 2012 1 commit
  13. 21 Sep, 2012 1 commit
  14. 15 Sep, 2012 1 commit
  15. 08 Sep, 2012 1 commit
  16. 24 Aug, 2012 1 commit
  17. 16 Aug, 2012 2 commits
  18. 30 Jul, 2012 1 commit
  19. 20 Jul, 2012 4 commits
  20. 18 Jul, 2012 1 commit
  21. 11 Jul, 2012 1 commit
  22. 09 Jul, 2012 3 commits
    • M
      IB/qib: RCU locking for MR validation · 8aac4cc3
      Mike Marciniszyn committed
      Profiling indicates that MR validation locking is expensive.  The MR
      table is largely read-only and is a suitable candidate for RCU locking.
      
      The patch uses RCU locking during validation to eliminate one
      lock/unlock during that validation.
      Reviewed-by: Mike Heinz <michael.william.heinz@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • M
      IB/qib: Avoid returning EBUSY from MR deregister · 6a82649f
      Mike Marciniszyn committed
      A timing issue can occur where qib_mr_dereg can return -EBUSY if the
      MR use count is not zero.
      
      This can occur if the MR is de-registered while RDMA read response
      packets are being progressed from the SDMA ring.  The suspicion is
      that the peer sent an RDMA read request, which has already been copied
      across to the peer.  The peer sees the completion of his request and
      then communicates to the responder that the MR is not needed any
      longer.  The responder tries to de-register the MR, catching some
      responses remaining in the SDMA ring holding the MR use count.
      
      The code now uses a get/put paradigm to track MR use counts and
      coordinates with the MR de-registration process using a completion
      when the count has reached zero.  A timeout on the delay is in place
      to catch other EBUSY issues.
      
      The reference count protocol is as follows:
      - The return to the user counts as 1
      - A reference from the lk_table or the qib_ibdev counts as 1.
      - Transient I/O operations increase/decrease as necessary
      
      A lot of code duplication has been folded into the new routines
      init_qib_mregion() and deinit_qib_mregion().  Additionally, explicit
      initialization of fields to zero is now handled by kzalloc().
      
      Also, the duplicated 'while.*num_sge' code that decrements reference
      counts has been consolidated in qib_put_ss().
      Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
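The get/put protocol listed in this commit could look roughly like the sketch below. It is a single-threaded model with hypothetical names (`struct mr_sketch`, `mr_get()`, `mr_put()`, `mr_dereg()`): a plain int stands in for an atomic counter, and a boolean stands in for the completion that the real code waits on with a timeout.

```c
#include <stdbool.h>

/* Hypothetical model of a memory region's use count. */
struct mr_sketch {
    int refcount;
    bool done;  /* stands in for a completion signalled at zero */
};

/* The return to the user counts as one reference. */
static void mr_init(struct mr_sketch *mr)
{
    mr->refcount = 1;
    mr->done = false;
}

/* Taken by the table entry and by each transient I/O operation. */
static void mr_get(struct mr_sketch *mr)
{
    mr->refcount++;
}

/* Dropping the last reference signals the waiter. */
static void mr_put(struct mr_sketch *mr)
{
    if (--mr->refcount == 0)
        mr->done = true;
}

/*
 * Deregistration drops the table reference and the user reference.
 * Returns 0 if the region is idle, -1 if I/O still holds references
 * (the point where the real code would wait, bounded by a timeout).
 */
static int mr_dereg(struct mr_sketch *mr)
{
    mr_put(mr);  /* table reference */
    mr_put(mr);  /* user reference */
    return mr->done ? 0 : -1;
}
```

In this model, an in-flight I/O reference keeps the count above zero after deregistration starts, and the final `mr_put()` from the I/O path is what marks the region as safe to free.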
    • M
      IB/qib: Fix UC MR refs for immediate operations · 354dff1b
      Mike Marciniszyn committed
      An MR reference leak exists when handling UC RDMA writes with
      immediate data because we manipulate the reference counts as if the
      operation had been a send.
      
      This patch moves the last_imm label so that the RDMA write operations
      with immediate data converge at the cq building code.  The copy/mr
      deref code is now done correctly prior to the branch to last_imm.
      Reviewed-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  23. 15 May, 2012 8 commits
  24. 09 Mar, 2012 1 commit
    • O
      IB: Change CQE "csum_ok" field to a bit flag · d927d505
      Or Gerlitz committed
      Use a bit in wc_flags rather than a whole integer to hold the
      "checksum OK" flag.  By itself, this change doesn't reduce the size of
      struct ib_wc on 64bit machines -- it stays at 56 bytes because of
      padding.  However, it will allow adding more fields in the future
      without enlarging the struct.  Also, it will let us have a unified
      approach with future libibverbs checksum offload reporting, because a
      bit flag doesn't break the library ABI.
      
      This patch was suggested during conversation with Liran Liss
      <liranl@mellanox.com>.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
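The idea of folding a whole-integer "checksum OK" field into one bit of an existing flags word can be sketched as follows. The struct, the enum values, and the helper names are hypothetical and carry an `SK_`/`_sketch` marker; they do not reproduce the layout or bit positions of the real struct ib_wc.

```c
/* Hypothetical flag bits; positions are illustrative only. */
enum wc_flags_sketch {
    SK_WC_GRH        = 1 << 0,
    SK_WC_WITH_IMM   = 1 << 1,
    SK_WC_IP_CSUM_OK = 1 << 2,  /* replaces a separate int field */
};

/* Hypothetical, cut-down work completion. */
struct wc_sketch {
    int wc_flags;
};

/* Record the checksum result without disturbing the other flag bits. */
static void wc_set_csum_ok(struct wc_sketch *wc, int ok)
{
    if (ok)
        wc->wc_flags |= SK_WC_IP_CSUM_OK;
    else
        wc->wc_flags &= ~SK_WC_IP_CSUM_OK;
}

/* Returns 1 if the checksum bit is set, 0 otherwise. */
static int wc_csum_ok(const struct wc_sketch *wc)
{
    return (wc->wc_flags & SK_WC_IP_CSUM_OK) != 0;
}
```

Since the flags word already exists, a new bit adds no storage, which is why further flags can be introduced later without growing the struct.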
  25. 26 Feb, 2012 2 commits
  26. 28 Jan, 2012 1 commit