1. 03 Nov, 2017 3 commits
    • mm: Remove VM_FAULT_HWPOISON_LARGE_MASK · d81b8a72
      Jan Kara committed
      It is unused.
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags · 1c972597
      Dan Williams committed
      The mmap(2) syscall suffers from the ABI anti-pattern of not validating
      unknown flags. However, proposals like MAP_SYNC need a mechanism to
      define new behavior that is known to fail on older kernels without the
      support. Define a new MAP_SHARED_VALIDATE flag pattern that is
      guaranteed to fail on all legacy mmap implementations.
      
      It is worth noting that the original proposal was for a standalone
      MAP_VALIDATE flag. However, when that could not be supported by all
      archs, Linus observed:
      
          I see why you *think* you want a bitmap. You think you want
          a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC
          etc, so that people can do
      
          ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED
      		    | MAP_SYNC, fd, 0);
      
          and "know" that MAP_SYNC actually takes.
      
          And I'm saying that whole wish is bogus. You're fundamentally
          depending on special semantics, just make it explicit. It's already
          not portable, so don't try to make it so.
      
          Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value
          of 0x3, and make people do
      
          ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE
      		    | MAP_SYNC, fd, 0);
      
          and then the kernel side is easier too (none of that random garbage
          playing games with looking at the "MAP_VALIDATE bit", but just another
          case statement in that map type thing.
      
          Boom. Done.
      
      Similar to ->fallocate() we also want the ability to validate the
      support for new flags on a per ->mmap() 'struct file_operations'
      instance basis.  Towards that end arrange for flags to be generically
      validated against a mmap_supported_flags exported by 'struct
      file_operations'. By default all existing flags are implicitly
      supported, but new flags require MAP_SHARED_VALIDATE and
      per-instance-opt-in.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
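The two checks described in the MAP_SHARED_VALIDATE message can be modelled in userspace roughly as follows. This is only a sketch: the `SK_`/`sk_` names, both functions, the flag values other than the 0x3 quoted above, and the specific error codes are assumptions for illustration, not the kernel's code.

```c
#include <errno.h>

/* Illustrative stand-ins for mmap flag values (assumptions, not kernel headers). */
#define SK_MAP_SHARED          0x01UL
#define SK_MAP_PRIVATE         0x02UL
#define SK_MAP_SHARED_VALIDATE 0x03UL     /* the value proposed in the quote */
#define SK_MAP_TYPE            0x0fUL     /* mask selecting the mapping type */
#define SK_MAP_SYNC            0x80000UL  /* example of a "new" flag */

/* Model of a legacy kernel: only the two classic mapping types are known,
 * so a mapping type of 0x3 fails no matter which other bits are set.
 * The -EINVAL return value is an assumption of this sketch. */
static int sk_legacy_mmap_check(unsigned long flags)
{
	switch (flags & SK_MAP_TYPE) {
	case SK_MAP_SHARED:
	case SK_MAP_PRIVATE:
		return 0;	/* unknown high bits are silently ignored */
	default:
		return -EINVAL;
	}
}

/* Model of a newer kernel: with MAP_SHARED_VALIDATE, every flag outside the
 * legacy set must appear in the per-file mmap_supported_flags mask.
 * The -EOPNOTSUPP return value is an assumption of this sketch. */
static int sk_new_mmap_check(unsigned long flags, unsigned long legacy_mask,
			     unsigned long mmap_supported_flags)
{
	switch (flags & SK_MAP_TYPE) {
	case SK_MAP_SHARED_VALIDATE:
		if (flags & ~(legacy_mask | mmap_supported_flags))
			return -EOPNOTSUPP;
		return 0;
	case SK_MAP_SHARED:
	case SK_MAP_PRIVATE:
		return 0;	/* existing flags stay implicitly supported */
	default:
		return -EINVAL;
	}
}
```

With plain MAP_SHARED the legacy model accepts an unknown flag without acting on it, which is the failure mode the new mapping type is meant to rule out.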
    • mm: Handle 0 flags in _calc_vm_trans() macro · 592e2545
      Jan Kara committed
      _calc_vm_trans() does not handle the situation when some of the passed
      flags are 0 (which can happen if these VM flags do not make sense for
      the architecture). Improve the _calc_vm_trans() macro to return 0 in
      such a situation. Since all passed flags are constant, this does not add
      any runtime overhead.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
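One way a flag-translation macro can return 0 for a zero flag is sketched below. The macro name, the `SK_` constants, and `sk_translate()` are hypothetical; the patch itself is not shown on this page, so this is an illustration of the idea rather than the committed code.

```c
/* Move bit1 of x to the position of bit2. Both bits are expected to be
 * compile-time constants that are either 0 or a single power of two.
 * If either one is 0 the result is 0, and the division in the other arm
 * is never evaluated. */
#define sk_calc_vm_trans(x, bit1, bit2) \
	((!(bit1) || !(bit2)) ? 0 : \
	 ((bit1) <= (bit2) ? ((x) & (bit1)) * ((bit2) / (bit1)) \
			   : ((x) & (bit1)) / ((bit1) / (bit2))))

/* Hypothetical flag values: flag A exists on this "architecture",
 * flag B has no VM counterpart and is therefore defined as 0. */
#define SK_MAP_A 0x00100UL
#define SK_VM_A  0x00400UL
#define SK_MAP_B 0x80000UL
#define SK_VM_B  0x0UL

static unsigned long sk_translate(unsigned long map_flags)
{
	return sk_calc_vm_trans(map_flags, SK_MAP_A, SK_VM_A) |
	       sk_calc_vm_trans(map_flags, SK_MAP_B, SK_VM_B);
}
```

Because every bit argument is a constant, a compiler can fold the conditional away, which matches the message's point about no runtime overhead.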
  2. 04 Oct, 2017 10 commits
  3. 29 Sep, 2017 6 commits
  4. 28 Sep, 2017 3 commits
    • timer: Prepare to change timer callback argument type · 686fef92
      Kees Cook committed
      Modern kernel callback systems pass the structure associated with a
      given callback to the callback function. The timer callback remains one
      of the legacy cases where an arbitrary unsigned long argument continues
      to be passed as the callback argument. This has several problems:
      
      - This bloats the timer_list structure with a normally redundant
        .data field.
      
      - No type checking is being performed, forcing callbacks to do
        explicit type casts of the unsigned long argument into the object
        that was passed, rather than using container_of(), as done in most
        of the other callback infrastructure.
      
      - Neighboring buffer overflows can overwrite both the .function and
        the .data field, providing attackers with a way to elevate from a buffer
        overflow into a simplistic ROP-like mechanism that allows calling
        arbitrary functions with a controlled first argument.
      
      - For future Control Flow Integrity work, this creates a unique function
        prototype for timer callbacks, instead of allowing them to continue to
        be clustered with other void functions that take a single unsigned long
        argument.
      
      This adds a new timer initialization API, which will ultimately replace
      the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family,
      named timer_setup() (to mirror hrtimer_setup(), making instances of its
      use much easier to grep for).
      
      In order to support the migration of existing timers into the new
      callback arguments, timer_setup() casts its arguments to the existing
      legacy types, and explicitly passes the timer pointer as the legacy
      data argument. Once all setup_*timer() callers have been replaced with
      timer_setup(), the casts can be removed, and the data argument can be
      dropped with the timer expiration code changed to just pass the timer
      to the callback directly.
      
      Since the regular pattern of using container_of() during local variable
      declaration repeats the need for the variable type declaration
      to be included, this adds a helper modeled after other from_*()
      helpers that wrap container_of(), named from_timer(). This helper uses
      typeof(*variable), removing the type redundancy and minimizing the need
      for line wraps in forthcoming conversions from "unsigned long data" to
      "struct timer_list *" in the timer callbacks:
      
      -void callback(unsigned long data)
      +void callback(struct timer_list *t)
      {
      -   struct some_data_structure *local = (struct some_data_structure *)data;
      +   struct some_data_structure *local = from_timer(local, t, timer);
      
      Finally, in order to support the handful of timer users that perform
      open-coded assignments of the .function (and .data) fields, provide
      cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used
      temporarily. Once conversion has been completed, these can be
      trivially removed globally.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
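The callback pattern from the timer message can be tried out in userspace with the sketch below. `struct timer_list`, `timer_setup()`, `container_of()`, and `from_timer()` here are simplified stand-ins written for this example (the real kernel definitions carry more fields and arguments), and `struct heartbeat` and its functions are hypothetical.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's timer structure. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Derive the container type from the variable itself, so the type name
 * does not have to be repeated at the call site. */
#define from_timer(var, callback_timer, timer_fieldname) \
	container_of(callback_timer, __typeof__(*var), timer_fieldname)

/* Stand-in initializer: only records the callback in this sketch. */
static void timer_setup(struct timer_list *t,
			void (*fn)(struct timer_list *), unsigned int flags)
{
	(void)flags;
	t->function = fn;
}

/* Hypothetical timer user embedding a timer in its own structure. */
struct heartbeat {
	int ticks;
	struct timer_list timer;
};

static void heartbeat_cb(struct timer_list *t)
{
	struct heartbeat *hb = from_timer(hb, t, timer);

	hb->ticks++;
}

/* Simulate one expiry and report the resulting tick count. */
static int heartbeat_fire_once(int start)
{
	struct heartbeat hb = { .ticks = start };

	timer_setup(&hb.timer, heartbeat_cb, 0);
	hb.timer.function(&hb.timer);
	return hb.ticks;
}
```

The callback receives the timer pointer itself, so no cast from an `unsigned long` is needed and the compiler can check the callback's prototype.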
    • net/mlx5: Check device capability for maximum flow counters · 16f1c5bb
      Raed Salem committed
      Add a check for the maximum number of flow counters attached
      to a rule (FTE).
      
      Fixes: bd5251db ('net/mlx5_core: Introduce flow steering destination of type counter')
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix FPGA capability location · 99d3cd27
      Inbar Karmy committed
      Currently, the FPGA capability is located in (mdev)->caps.hca_cur.
      Change the location to (mdev)->caps.fpga,
      since hca_cur is reserved for HCA device capabilities.
      
      Fixes: e29341fb ("net/mlx5: FPGA, Add basic support for Innova")
      Signed-off-by: Inbar Karmy <inbark@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  5. 27 Sep, 2017 1 commit
  6. 26 Sep, 2017 4 commits
  7. 25 Sep, 2017 4 commits
    • nvme: add transport SGL definitions · d85cf207
      James Smart committed
      Add transport SGL definitions from NVMe TP 4008, required for
      the final NVMe-FC standard.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme.h: remove FC transport-specific error values · c98cb3bd
      James Smart committed
      The NVM Express group rescinded the reserved range for the transport.
      Remove the FC-centric values that had been defined.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: Fix potential deadlock between delete & sysfs ops · 5acb3cc2
      Waiman Long committed
      The lockdep code had reported the following unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(s_active#228);
                                     lock(&bdev->bd_mutex/1);
                                     lock(s_active#228);
        lock(&bdev->bd_mutex);
      
       *** DEADLOCK ***
      
      The deadlock may happen when one task (CPU1) is trying to delete a
      partition in a block device and another task (CPU0) is accessing a
      tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
      partition.
      
      The s_active isn't an actual lock. It is a reference count (kn->count)
      on the sysfs (kernfs) file. Removal of a sysfs file, however, requires
      a wait until all the references are gone. The reference count is
      treated like a rwsem using lockdep instrumentation code.
      
      The fact that a thread is in the sysfs callback method or in the
      ioctl call means there is a reference to the opened sysfs or device
      file. That should prevent the underlying block structure from being
      removed.
      
      Instead of using bd_mutex in the block_device structure, a new
      blk_trace_mutex is now added to the request_queue structure to protect
      access to the blk_trace structure.
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      
      Fix typo in patch subject line, and prune a comment detailing how
      the code used to work.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
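The locking change in the blktrace message, a dedicated mutex inside the queue structure instead of the block device's bd_mutex, can be pictured with the userspace sketch below. It uses pthread mutexes, and every `sk_` name is hypothetical; only the field name blk_trace_mutex comes from the message.

```c
#include <pthread.h>
#include <stddef.h>

struct sk_blk_trace {
	unsigned int act_mask;
};

/* Hypothetical queue structure carrying its own trace mutex. */
struct sk_request_queue {
	pthread_mutex_t blk_trace_mutex;	/* protects ->blk_trace only */
	struct sk_blk_trace *blk_trace;
};

static void sk_queue_init(struct sk_request_queue *q)
{
	pthread_mutex_init(&q->blk_trace_mutex, NULL);
	q->blk_trace = NULL;
}

static void sk_trace_attach(struct sk_request_queue *q, struct sk_blk_trace *bt)
{
	pthread_mutex_lock(&q->blk_trace_mutex);
	q->blk_trace = bt;
	pthread_mutex_unlock(&q->blk_trace_mutex);
}

/* sysfs-style "show" handler: it takes only the queue-local mutex, so it
 * never nests inside the lock used by partition deletion. */
static int sk_act_mask_show(struct sk_request_queue *q, unsigned int *out)
{
	int ret = -1;

	pthread_mutex_lock(&q->blk_trace_mutex);
	if (q->blk_trace) {
		*out = q->blk_trace->act_mask;
		ret = 0;
	}
	pthread_mutex_unlock(&q->blk_trace_mutex);
	return ret;
}

/* Small driver: returns the mask read back through the handler, or 0. */
static unsigned int sk_demo_read_mask(unsigned int mask)
{
	struct sk_request_queue q;
	struct sk_blk_trace bt = { .act_mask = mask };
	unsigned int out = 0;

	sk_queue_init(&q);
	sk_trace_attach(&q, &bt);
	if (sk_act_mask_show(&q, &out) != 0)
		out = 0;
	pthread_mutex_destroy(&q.blk_trace_mutex);
	return out;
}
```

Since the handler no longer touches the device-wide mutex, the lock cycle reported by lockdep cannot form on this path in the sketch.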
    • KEYS: prevent creating a different user's keyrings · 237bbd29
      Eric Biggers committed
      It was possible for an unprivileged user to create the user and user
      session keyrings for another user.  For example:
      
          sudo -u '#3000' sh -c 'keyctl add keyring _uid.4000 "" @u
                                 keyctl add keyring _uid_ses.4000 "" @u
                                 sleep 15' &
          sleep 1
          sudo -u '#4000' keyctl describe @u
          sudo -u '#4000' keyctl describe @us
      
      This is problematic because these "fake" keyrings won't have the right
      permissions.  In particular, the user who created them first will own
      them and will have full access to them via the possessor permissions,
      which can be used to compromise the security of a user's keys:
      
          -4: alswrv-----v------------  3000     0 keyring: _uid.4000
          -5: alswrv-----v------------  3000     0 keyring: _uid_ses.4000
      
      Fix it by marking user and user session keyrings with a flag
      KEY_FLAG_UID_KEYRING.  Then, when searching for a user or user session
      keyring by name, skip all keyrings that don't have the flag set.
      
      Fixes: 69664cf1 ("keys: don't generate user and user session keyrings unless they're accessed")
      Cc: <stable@vger.kernel.org>	[v2.6.26+]
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
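The lookup rule described in the KEYS fix, skipping any keyring that lacks the flag when searching for a user keyring by name, might look like this in a simplified userspace model. The structure layout, the flag's bit position, and all `sk_` names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

#define SK_KEY_FLAG_UID_KEYRING (1UL << 0)	/* bit position is an assumption */

struct sk_keyring {
	const char *name;
	unsigned int owner_uid;
	unsigned long flags;
};

/* Find a keyring by name. When uid_keyring is non-zero, entries without
 * the flag are skipped, so a look-alike created by another user with the
 * same name can never be returned. */
static const struct sk_keyring *
sk_find_keyring_by_name(const struct sk_keyring *rings, size_t n,
			const char *name, int uid_keyring)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(rings[i].name, name) != 0)
			continue;
		if (uid_keyring && !(rings[i].flags & SK_KEY_FLAG_UID_KEYRING))
			continue;
		return &rings[i];
	}
	return NULL;
}

/* Demo data: a fake "_uid.4000" owned by uid 3000 sits before the genuine
 * one. Returns the owner of the keyring the search selects, or 0. */
static unsigned int sk_demo_owner(int uid_keyring)
{
	static const struct sk_keyring rings[] = {
		{ "_uid.4000", 3000, 0 },
		{ "_uid.4000", 4000, SK_KEY_FLAG_UID_KEYRING },
	};
	const struct sk_keyring *k =
		sk_find_keyring_by_name(rings, 2, "_uid.4000", uid_keyring);

	return k ? k->owner_uid : 0;
}
```

Without the flag check the first match wins, which is the uid 3000 look-alike in this demo data.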
  8. 24 Sep, 2017 1 commit
  9. 22 Sep, 2017 1 commit
    • Input: uinput - avoid FF flush when destroying device · e8b95728
      Dmitry Torokhov committed
      Normally, when an input device supporting force feedback effects is being
      destroyed, we try to "flush" currently playing effects, so that the
      physical device does not continue vibrating (or executing other effects).
      Unfortunately this does not work well for uinput as flushing of the effects
      deadlocks with the destroy action:
      
      - if device is being destroyed because the file descriptor is being closed,
        then there is no one to even service FF requests;
      
      - if device is being destroyed because userspace sent UI_DEV_DESTROY,
        while theoretically it could be possible to service FF requests,
        userspace is unlikely to do so (they'd need to make sure FF handling
        happens on a separate thread) even if kernel solves the issue with FF
        ioctls deadlocking with UI_DEV_DESTROY ioctl on udev->mutex.
      
      To avoid lockups like the one below, let's install a custom input device
      flush handler, and avoid trying to flush force feedback effects when we
      are destroying the device, and instead rely on uinput to shut off the device
      properly.
      
      NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
      ...
       <<EOE>>  [<ffffffff817a0307>] _raw_spin_lock_irqsave+0x37/0x40
       [<ffffffff810e633d>] complete+0x1d/0x50
       [<ffffffffa00ba08c>] uinput_request_done+0x3c/0x40 [uinput]
       [<ffffffffa00ba587>] uinput_request_submit.part.7+0x47/0xb0 [uinput]
       [<ffffffffa00bb62b>] uinput_dev_erase_effect+0x5b/0x76 [uinput]
       [<ffffffff815d91ad>] erase_effect+0xad/0xf0
       [<ffffffff815d929d>] flush_effects+0x4d/0x90
       [<ffffffff815d4cc0>] input_flush_device+0x40/0x60
       [<ffffffff815daf1c>] evdev_cleanup+0xac/0xc0
       [<ffffffff815daf5b>] evdev_disconnect+0x2b/0x60
       [<ffffffff815d74ac>] __input_unregister_device+0xac/0x150
       [<ffffffff815d75f7>] input_unregister_device+0x47/0x70
       [<ffffffffa00bac45>] uinput_destroy_device+0xb5/0xc0 [uinput]
       [<ffffffffa00bb2de>] uinput_ioctl_handler.isra.9+0x65e/0x740 [uinput]
       [<ffffffff811231ab>] ? do_futex+0x12b/0xad0
       [<ffffffffa00bb3f8>] uinput_ioctl+0x18/0x20 [uinput]
       [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
       [<ffffffff81337553>] ? security_file_ioctl+0x43/0x60
       [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
       [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      Reported-by: Rodrigo Rivas Costa <rodrigorivascosta@gmail.com>
      Reported-by: Clément VUCHENER <clement.vuchener@gmail.com>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=193741
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
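The uinput message does not show the custom flush handler, so the following is only one plausible shape for it, modelled in userspace. All names are hypothetical, and treating a NULL `file` as "the device is being torn down" is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced input device. */
struct sk_input_dev {
	int (*flush)(struct sk_input_dev *dev, void *file);
	int ff_flush_calls;	/* counts force-feedback flushes for the demo */
};

/* Stand-in for the generic force-feedback flush. */
static int sk_ff_flush(struct sk_input_dev *dev, void *file)
{
	(void)file;
	dev->ff_flush_calls++;
	return 0;
}

/* Custom handler: flush effects for an ordinary close, but do nothing when
 * the device itself is going away, because nobody can service the erase
 * requests at that point. */
static int sk_uinput_dev_flush(struct sk_input_dev *dev, void *file)
{
	return file ? sk_ff_flush(dev, file) : 0;
}

/* Demo: one ordinary close followed by device destruction. */
static int sk_demo_flush_calls(void)
{
	struct sk_input_dev dev = { .flush = sk_uinput_dev_flush };
	int dummy_file = 0;

	dev.flush(&dev, &dummy_file);	/* a client closes its handle */
	dev.flush(&dev, NULL);		/* device teardown: no FF flush */
	return dev.ff_flush_calls;
}
```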
  10. 21 Sep, 2017 2 commits
  11. 20 Sep, 2017 1 commit
    • of: provide inline helper for of_find_device_by_node · aa767cfb
      Arnd Bergmann committed
      The ipmmu-vmsa driver fails in compile-testing on non-OF platforms:
      
      drivers/iommu/ipmmu-vmsa.o: In function `ipmmu_of_xlate':
      ipmmu-vmsa.c:(.text+0x740): undefined reference to `of_find_device_by_node'
      
      It would be reasonable to assume that this interface works but
      returns failure on non-OF builds, like it does on machines that
      have been booted in another way, so this adds another inline
      function helper.
      
      Fixes: 7b2d5961 ("iommu/ipmmu-vmsa: Replace local utlb code with fwspec ids")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rob Herring <robh@kernel.org>
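The usual pattern for such an inline helper is a stub that compiles away on builds without the feature. The sketch below is a userspace approximation: the two structures are left opaque, CONFIG_OF is assumed to be undefined, and `sk_xlate()` with its -ENODEV return is a hypothetical caller added for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types. */
struct device_node;
struct platform_device;

#ifdef CONFIG_OF
extern struct platform_device *of_find_device_by_node(struct device_node *np);
#else
/* Non-OF build: the lookup always fails, so callers still compile and link
 * and simply take their error path. */
static inline struct platform_device *
of_find_device_by_node(struct device_node *np)
{
	(void)np;
	return NULL;
}
#endif

/* Hypothetical caller treating "no device" as an ordinary error. */
static int sk_xlate(struct device_node *np)
{
	struct platform_device *pdev = of_find_device_by_node(np);

	return pdev ? 0 : -ENODEV;
}
```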
  12. 19 Sep, 2017 1 commit
  13. 18 Sep, 2017 2 commits
  14. 15 Sep, 2017 1 commit