1. 16 1月, 2017 5 次提交
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · aa2797b3
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Bugfixes for I2C. Mostly core this time which is a bit unusual but
        nothing really scary in there"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: piix4: Avoid race conditions with IMC
        i2c: fix spelling mistake: "insufficent" -> "insufficient"
        i2c: print correct device invalid address
        i2c: do not enable fall back to Host Notify by default
        i2c: fix kernel memory disclosure in dev interface
      aa2797b3
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 83346fbc
      Linus Torvalds 提交于
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - unwinder fixes
         - AMD CPU topology enumeration fixes
         - microcode loader fixes
         - x86 embedded platform fixes
         - fix for a bootup crash that may trigger when clearcpuid= is used
           with invalid values"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mpx: Use compatible types in comparison to fix sparse error
        x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
        x86/entry: Fix the end of the stack for newly forked tasks
        x86/unwind: Include __schedule() in stack traces
        x86/unwind: Disable KASAN checks for non-current tasks
        x86/unwind: Silence warnings for non-current tasks
        x86/microcode/intel: Use correct buffer size for saving microcode data
        x86/microcode/intel: Fix allocation size of struct ucode_patch
        x86/microcode/intel: Add a helper which gives the microcode revision
        x86/microcode: Use native CPUID to tickle out microcode revision
        x86/CPU: Add native CPUID variants returning a single datum
        x86/boot: Add missing declaration of string functions
        x86/CPU/AMD: Fix Bulldozer topology
        x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
        x86/cpu: Fix typo in the comment for Anniedale
        x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
      83346fbc
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a11ce3a4
      Linus Torvalds 提交于
      Pull NOHZ fix from Ingo Molnar:
       "This fixes an old NOHZ race where we incorrectly calculate the next
        timer interrupt in certain circumstances where hrtimers are pending,
        that can cause hard to reproduce stalled-values artifacts in
        /proc/stat"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        nohz: Fix collision between tick and other hrtimers
      a11ce3a4
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 79078c53
      Linus Torvalds 提交于
      Pull perf fixes from Ingo Molnar:
       "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
        driver fixes, plus miscellanous tooling fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Reject non sampling events with precise_ip
        perf/x86/intel: Account interrupts for PEBS errors
        perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
        perf/core: Fix sys_perf_event_open() vs. hotplug
        perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
        perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
        perf/x86: Set pmu->module in Intel PMU modules
        perf probe: Fix to probe on gcc generated symbols for offline kernel
        perf probe: Fix --funcs to show correct symbols for offline module
        perf symbols: Robustify reading of build-id from sysfs
        perf tools: Install tools/lib/traceevent plugins with install-bin
        tools lib traceevent: Fix prev/next_prio for deadline tasks
        perf record: Fix --switch-output documentation and comment
        perf record: Make __record_options static
        tools lib subcmd: Add OPT_STRING_OPTARG_SET option
        perf probe: Fix to get correct modname from elf header
        samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
        samples/bpf sock_example: Avoid getting ethhdr from two includes
        perf sched timehist: Show total scheduling time
      79078c53
    • L
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 255e6140
      Linus Torvalds 提交于
      Pull EFI fixes from Ingo Molnar:
       "A number of regression fixes:
      
         - Fix a boot hang on machines that have somewhat unusual memory map
           entries of phys_addr=0x0 num_pages=0, which broke due to a recent
           commit. This commit got cherry-picked from the v4.11 queue because
           the bug is affecting real machines.
      
         - Fix a boot hang also reported by KASAN, caused by incorrect init
           ordering introduced by a recent optimization.
      
         - Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
           that introduced an invalid assumption. Neither bugs were seen in
           the wild AFAIK"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/x86: Prune invalid memory map entries and fix boot regression
        x86/efi: Don't allocate memmap through memblock after mm_init()
        efi/libstub/arm*: Pass latest memory map to the kernel
      255e6140
  2. 15 1月, 2017 6 次提交
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f4d3935e
      Linus Torvalds 提交于
      Pull vfs fixes from Al Viro.
      
      The most notable fix here is probably the fix for a splice regression
      ("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix a fencepost error in pipe_advance()
        coredump: Ensure proper size of sparse core files
        aio: fix lock dep warning
        tmpfs: clear S_ISGID when setting posix ACLs
      f4d3935e
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 34241af7
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
      
       - the virtio_blk stack DMA corruption fix from Christoph, fixing and
         issue with VMAP stacks.
      
       - O_DIRECT blkbits calculation fix from Chandan.
      
       - discard regression fix from Christoph.
      
       - queue init error handling fixes for nbd and virtio_blk, from Omar and
         Jeff.
      
       - two small nvme fixes, from Christoph and Guilherme.
      
       - rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
         to more closely follow what we do in other places in the block layer.
         This interface is new for this series, so let's get the naming right
         before releasing a kernel with this feature. From Damien.
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: don't try to discard from __blkdev_issue_zeroout
        sd: remove __data_len hack for WRITE SAME
        nvme: use blk_rq_payload_bytes
        scsi: use blk_rq_payload_bytes
        block: add blk_rq_payload_bytes
        block: Rename blk_queue_zone_size and bdev_zone_size
        nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
        nvme-rdma: fix nvme_rdma_queue_is_ready
        virtio_blk: fix panic in initialization error path
        nbd: blk_mq_init_queue returns an error code on failure, not NULL
        virtio_blk: avoid DMA to stack for the sense buffer
        do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
      34241af7
    • A
      fix a fencepost error in pipe_advance() · b9dc6f65
      Al Viro 提交于
      The logics in pipe_advance() used to release all buffers past the new
      position failed in cases when the number of buffers to release was equal
      to pipe->buffers.  If that happened, none of them had been released,
      leaving pipe full.  Worse, it was trivial to trigger and we end up with
      pipe full of uninitialized pages.  IOW, it's an infoleak.
      
      Cc: stable@vger.kernel.org # v4.9
      Reported-by: N"Alan J. Wylie" <alan@wylie.me.uk>
      Tested-by: N"Alan J. Wylie" <alan@wylie.me.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b9dc6f65
    • D
      coredump: Ensure proper size of sparse core files · 4d22c75d
      Dave Kleikamp 提交于
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated and can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4d22c75d
    • S
      aio: fix lock dep warning · a12f1ae6
      Shaohua Li 提交于
      lockdep reports a warnning. file_start_write/file_end_write only
      acquire/release the lock for regular files. So checking the files in aio
      side too.
      
      [  453.532141] ------------[ cut here ]------------
      [  453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
      [  453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0)
      [  453.533011] Modules linked in:
      [  453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964
      [  453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
      [  453.533011]  ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000
      [  453.533011]  ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008
      [  453.533011]  ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef
      [  453.533011] Call Trace:
      [  453.533011]  [<ffffffff8196cffb>] dump_stack+0x67/0x9c
      [  453.533011]  [<ffffffff81091ee1>] __warn+0x111/0x130
      [  453.533011]  [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0
      [  453.533011]  [<ffffffff81091f00>] ? __warn+0x130/0x130
      [  453.533011]  [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60
      [  453.533011]  [<ffffffff811205d4>] lock_release+0x434/0x670
      [  453.533011]  [<ffffffff8198af94>] ? import_single_range+0xd4/0x110
      [  453.533011]  [<ffffffff81322195>] ? rw_verify_area+0x65/0x140
      [  453.533011]  [<ffffffff813aa696>] ? aio_write+0x1f6/0x280
      [  453.533011]  [<ffffffff813aa6c9>] aio_write+0x229/0x280
      [  453.533011]  [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640
      [  453.533011]  [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0
      [  453.533011]  [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
      [  453.533011]  [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40
      [  453.533011]  [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0
      [  453.533011]  [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10
      [  453.533011]  [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10
      [  453.533011]  [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270
      [  453.533011]  [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0
      [  453.533011]  [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  453.533011]  [<ffffffff813acb90>] SyS_io_submit+0x10/0x20
      [  453.533011]  [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad
      [  453.533011]  [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110
      [  453.533011] ---[ end trace b2fbe664d1cc0082 ]---
      
      Cc: Dmitry Monakhov <dmonakhov@openvz.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a12f1ae6
    • L
      Merge tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma · f0ad1771
      Linus Torvalds 提交于
      Pull dmaengine fixes from Vinod Koul:
       "The fixes this time around are spread over drivers, pretty normal
        update:
      
         - PCI ID for SKL ioatdma, workaround for SKX and
           ioat_alloc_chan_resources sleepy allocation fix
      
         - dw kconfig typo fix
      
         - null pointer deref for stm32
      
         - MAINTAINERS Update for at_hdmac
      
         - pl330 runtime pm fixes
      
         - omap-dma port window fix
      
         - rcar-dmac unmap slave resource fix"
      
      * tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: rcar-dmac: unmap slave resource when channel is freed
        dmaengine: omap-dma: Fix the port_window support
        dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
        dmaengine: pl330: Fix runtime PM support for terminated transfers
        MAINTAINERS: dmaengine: Update + Hand over the at_hdmac driver to Ludovic
        dmaengine: omap-dma: Fix dynamic lch_map allocation
        dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
        dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
        dmaengine: stm32-dma: Set correct args number for DMA request from DT
        dmaengine: dw: fix typo in Kconfig
        dmaengine: ioatdma: workaround SKX ioatdma version
        dmaengine: ioatdma: Add Skylake PCI Dev ID
      f0ad1771
  3. 14 1月, 2017 19 次提交
    • P
      efi/x86: Prune invalid memory map entries and fix boot regression · 0100a3e6
      Peter Jones 提交于
      Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
      (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
      
      These machines fail to boot after the following commit,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      Fix this by removing such bogus entries from the memory map.
      
      Furthermore, currently the log output for this case (with efi=debug)
      looks like:
      
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
      
      This is clearly wrong, and also not as informative as it could be.  This
      patch changes it so that if we find obviously invalid memory map
      entries, we print an error and skip those entries.  It also detects the
      display of the address range calculation overflow, so the new output is:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)
      
      It also detects memory map sizes that would overflow the physical
      address, for example phys_addr=0xfffffffffffff000 and
      num_pages=0x0200000000000001, and prints:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
      
      It then removes these entries from the memory map.
      Signed-off-by: NPeter Jones <pjones@redhat.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      [Matt: Include bugzilla info in commit log]
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0100a3e6
    • J
      perf/x86: Reject non sampling events with precise_ip · 18e7a45a
      Jiri Olsa 提交于
      As Peter suggested [1] rejecting non sampling PEBS events,
      because they dont make any sense and could cause bugs
      in the NMI handler [2].
      
        [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
        [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.orgSigned-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20170103142454.GA26251@kravaSigned-off-by: NIngo Molnar <mingo@kernel.org>
      18e7a45a
    • J
      perf/x86/intel: Account interrupts for PEBS errors · 475113d9
      Jiri Olsa 提交于
      It's possible to set up PEBS events to get only errors and not
      any data, like on SNB-X (model 45) and IVB-EP (model 62)
      via 2 perf commands running simultaneously:
      
          taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
      
      This leads to a soft lock up, because the error path of the
      intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
      for error PEBS interrupts, so in case you're getting ONLY
      errors you don't have a way to stop the event when it's over
      the max_samples_per_tick limit:
      
        NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
        ...
        RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
        ...
        Call Trace:
         ? trace_hardirqs_on_caller+0xf5/0x1b0
         ? perf_cgroup_attach+0x70/0x70
         perf_install_in_context+0x199/0x1b0
         ? ctx_resched+0x90/0x90
         SYSC_perf_event_open+0x641/0xf90
         SyS_perf_event_open+0x9/0x10
         do_syscall_64+0x6c/0x1f0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      Add perf_event_account_interrupt() which does the interrupt
      and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
      error path.
      
      We keep the pending_kill and pending_wakeup logic only in the
      __perf_event_overflow() path, because they make sense only if
      there's any data to deliver.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      475113d9
    • P
      perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race · 321027c1
      Peter Zijlstra 提交于
      Di Shen reported a race between two concurrent sys_perf_event_open()
      calls where both try and move the same pre-existing software group
      into a hardware context.
      
      The problem is exactly that described in commit:
      
        f63a8daa ("perf: Fix event->ctx locking")
      
      ... where, while we wait for a ctx->mutex acquisition, the event->ctx
      relation can have changed under us.
      
      That very same commit failed to recognise sys_perf_event_context() as an
      external access vector to the events and thereby didn't apply the
      established locking rules correctly.
      
      So while one sys_perf_event_open() call is stuck waiting on
      mutex_lock_double(), the other (which owns said locks) moves the group
      about. So by the time the former sys_perf_event_open() acquires the
      locks, the context we've acquired is stale (and possibly dead).
      
      Apply the established locking rules as per perf_event_ctx_lock_nested()
      to the mutex_lock_double() for the 'move_group' case. This obviously means
      we need to validate state after we acquire the locks.
      
      Reported-by: Di Shen (Keen Lab)
      Tested-by: NJohn Dias <joaodias@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Min Chong <mchong@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: f63a8daa ("perf: Fix event->ctx locking")
      Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      321027c1
    • P
      perf/core: Fix sys_perf_event_open() vs. hotplug · 63cae12b
      Peter Zijlstra 提交于
      There is problem with installing an event in a task that is 'stuck' on
      an offline CPU.
      
      Blocked tasks are not dis-assosciated from offlined CPUs, after all, a
      blocked task doesn't run and doesn't require a CPU etc.. Only on
      wakeup do we ammend the situation and place the task on a available
      CPU.
      
      If we hit such a task with perf_install_in_context() we'll loop until
      either that task wakes up or the CPU comes back online, if the task
      waking depends on the event being installed, we're stuck.
      
      While looking into this issue, I also spotted another problem, if we
      hit a task with perf_install_in_context() that is in the middle of
      being migrated, that is we observe the old CPU before sending the IPI,
      but run the IPI (on the old CPU) while the task is already running on
      the new CPU, things also go sideways.
      
      Rework things to rely on task_curr() -- outside of rq->lock -- which
      is rather tricky. Imagine the following scenario where we're trying to
      install the first event into our task 't':
      
      CPU0            CPU1            CPU2
      
                      (current == t)
      
      t->perf_event_ctxp[] = ctx;
      smp_mb();
      cpu = task_cpu(t);
      
                      switch(t, n);
                                      migrate(t, 2);
                                      switch(p, t);
      
                                      ctx = t->perf_event_ctxp[]; // must not be NULL
      
      smp_function_call(cpu, ..);
      
                      generic_exec_single()
                        func();
                          spin_lock(ctx->lock);
                          if (task_curr(t)) // false
      
                          add_event_to_ctx();
                          spin_unlock(ctx->lock);
      
                                      perf_event_context_sched_in();
                                        spin_lock(ctx->lock);
                                        // sees event
      
      So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
      Because if CPU2's load of that variable were to observe NULL, it would
      not try to schedule the ctx and we'd have a task running without its
      counter, which would be 'bad'.
      
      As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
      first and not see the event yet, then CPU0 must observe task_curr()
      and retry. If the install happens first, then we must see the event on
      sched-in and all is well.
      
      I think we can translate the first part (until the 'must not be NULL')
      of the scenario to a litmus test like:
      
        C C-peterz
      
        {
        }
      
        P0(int *x, int *y)
        {
                int r1;
      
                WRITE_ONCE(*x, 1);
                smp_mb();
                r1 = READ_ONCE(*y);
        }
      
        P1(int *y, int *z)
        {
                WRITE_ONCE(*y, 1);
                smp_store_release(z, 1);
        }
      
        P2(int *x, int *z)
        {
                int r1;
                int r2;
      
                r1 = smp_load_acquire(z);
      	  smp_mb();
                r2 = READ_ONCE(*x);
        }
      
        exists
        (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
      
      Where:
        x is perf_event_ctxp[],
        y is our tasks's CPU, and
        z is our task being placed on the rq of CPU2.
      
      The P0 smp_mb() is the one added by this patch, ordering the store to
      perf_event_ctxp[] from find_get_context() and the load of task_cpu()
      in task_function_call().
      
      The smp_store_release/smp_load_acquire model the RCpc locking of the
      rq->lock and the smp_mb() of P2 is the context switch switching from
      whatever CPU2 was running to our task 't'.
      
      This litmus test evaluates into:
      
        Test C-peterz Allowed
        States 7
        0:r1=0; 2:r1=0; 2:r2=0;
        0:r1=0; 2:r1=0; 2:r2=1;
        0:r1=0; 2:r1=1; 2:r2=1;
        0:r1=1; 2:r1=0; 2:r2=0;
        0:r1=1; 2:r1=0; 2:r2=1;
        0:r1=1; 2:r1=1; 2:r2=0;
        0:r1=1; 2:r1=1; 2:r2=1;
        No
        Witnesses
        Positive: 0 Negative: 7
        Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
        Observation C-peterz Never 0 7
        Hash=e427f41d9146b2a5445101d3e2fcaa34
      
      And the strong and weak model agree.
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: jeremy.linton@arm.com
      Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63cae12b
    • T
      x86/mpx: Use compatible types in comparison to fix sparse error · 45382862
      Tobias Klauser 提交于
      info->si_addr is of type void __user *, so it should be compared against
      something from the same address space.
      
      This fixes the following sparse error:
      
        arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      45382862
    • L
      x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() · 695085b4
      Len Brown 提交于
      The Intel Denverton microserver uses a 25 MHz TSC crystal,
      so we can derive its exact [*] TSC frequency
      using CPUID and some arithmetic, eg.:
      
        TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
      
      [*] 'exact' is only as good as the crystal, which should be +/- 20ppm
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      695085b4
    • L
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds 提交于
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from  from btrfs_uuid_tree_iterate
      e96f8f18
    • L
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds 提交于
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
      04e39627
    • L
      Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio · af54efa4
      Linus Torvalds 提交于
      Pull VFIO fixes from Alex Williamson:
      
       - Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
      
       - Export and make use of has_capability() to fix incorrect use of
         ns_capable() for testing task capabilities (Jike Song)
      
      * tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Remove pid_namespace.h include
        vfio iommu type1: fix the testing of capability for remote task
        capability: export has_capability
        vfio-mdev: remove some dead code
        vfio-mdev: buffer overflow in ioctl()
        vfio-mdev: return -EFAULT if copy_to_user() fails
      af54efa4
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 406732c9
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
      
       - fix for module unload vs deferred jump labels (note: there might be
         other buggy modules!)
      
       - two NULL pointer dereferences from syzkaller
      
       - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
         made worse during this merge window, "just" kernel memory leak on
         releases
      
       - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix emulation of "MOV SS, null selector"
        KVM: x86: fix NULL deref in vcpu_scan_ioapic
        KVM: eventfd: fix NULL deref irqbypass consumer
        KVM: x86: Introduce segmented_write_std
        KVM: x86: flush pending lapic jump label updates on module unload
        jump_labels: API for flushing deferred jump label updates
      406732c9
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a65c9259
      Linus Torvalds 提交于
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix huge_ptep_set_access_flags() to return "changed" when any of the
         ptes in the contiguous range is changed, not just the last one
      
       - Fix the adr_l assembly macro to work in modules under KASLR
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: assembler: make adr_l work in modules under KASLR
        arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
      a65c9259
    • C
      block: don't try to discard from __blkdev_issue_zeroout · bef13315
      Christoph Hellwig 提交于
      Discard can return -EIO asynchronously if the alignment for the request
      isn't suitable for the driver, which makes a proper fallback to other
      methods in __blkdev_issue_zeroout impossible.  Thus only issue a sync
      discard from blkdev_issue_zeroout an don't try discard at all from
      __blkdev_issue_zeroout as a non-invasive workaround.
      
      One more reason why abusing discard for zeroing must die..
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NEryu Guan <eguan@redhat.com>
      Fixes: e73c23ff ("block: add async variant of blkdev_issue_zeroout")
      Signed-off-by: NJens Axboe <axboe@fb.com>
      bef13315
    • C
      sd: remove __data_len hack for WRITE SAME · f80de881
      Christoph Hellwig 提交于
      Now that we have the blk_rq_payload_bytes helper available to determine
      the actual I/O size we don't need to mess around with __data_len for
      WRITE SAME.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f80de881
    • C
      nvme: use blk_rq_payload_bytes · b131c61d
      Christoph Hellwig 提交于
      The new blk_rq_payload_bytes generalizes the payload length hacks
      that nvme_map_len did before.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b131c61d
    • C
      scsi: use blk_rq_payload_bytes · fd102b12
      Christoph Hellwig 提交于
      Without that we'll pass a wrong payload size in cmd->sdb, which
      can lead to hangs with drivers that need the total transfer size.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NChris Valean <v-chvale@microsoft.com>
      Reported-by: NDexuan Cui <decui@microsoft.com>
      Fixes: f9d03f96 ("block: improve handling of the magic discard payload")
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      fd102b12
    • C
      block: add blk_rq_payload_bytes · 2e3258ec
      Christoph Hellwig 提交于
      Add a helper to calculate the actual data transfer size for special
      payload requests.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2e3258ec
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c79d47f1
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "The major fix is the bfa firmware, since the latest 10Gb cards fail
        probing with the current firmware.
      
        The rest is a set of minor fixes: one missed Kconfig dependency
        causing randconfig failures, a missed error return on an error leg, a
        change for how multiqueue waits on a blocked device and a don't reset
        while in reset fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: bfa: Increase requested firmware version to 3.2.5.1
        scsi: snic: Return error code on memory allocation failure
        scsi: fnic: Avoid sending reset to firmware when another reset is in progress
        scsi: qedi: fix build, depends on UIO
        scsi: scsi-mq: Wait for .queue_rq() if necessary
      c79d47f1
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6d90b4f9
      Linus Torvalds 提交于
      Pull input updates from Dmitry Torokhov:
       "Small driver fixups"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
        Input: adxl34x - make it enumerable in ACPI environment
        Input: ALPS - fix TrackStick Y axis handling for SS5 hardware
        Input: synaptics-rmi4 - fix F03 build error when serio is module
        Input: xpad - use correct product id for x360w controllers
        Input: synaptics_i2c - change msleep to usleep_range for small msecs
        Input: i8042 - add Pegatron touchpad to noloop table
        Input: joydev - remove unused linux/miscdevice.h include
      6d90b4f9
  4. 13 1月, 2017 10 次提交