1. 02 8月, 2018 6 次提交
  2. 31 7月, 2018 5 次提交
  3. 28 7月, 2018 1 次提交
  4. 27 7月, 2018 1 次提交
  5. 26 7月, 2018 8 次提交
  6. 25 7月, 2018 3 次提交
    • A
      tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure · 57ea2a34
      Artem Savkov 提交于
      If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
      it returns an error, but does not unset the tp flags it set previously.
      This results in a probe being considered enabled and failures like being
      unable to remove the probe through kprobe_events file since probes_open()
      expects every probe to be disabled.
      
      Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
      Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 41a7dd42 ("tracing/kprobes: Support ftrace_event_file base multibuffer")
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NArtem Savkov <asavkov@redhat.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      57ea2a34
    • M
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 73c8d894
      Masami Hiramatsu 提交于
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but snapshot changes tracing_on
      setting each time. This is an anomaly, because user doesn't know
      that each "ring_buffer" stores its own tracing-enable state and
      the snapshot is done by swapping ring buffers.
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      73c8d894
    • S
      tracing: Fix double free of event_trigger_data · 1863c387
      Steven Rostedt (VMware) 提交于
      Running the following:
      
       # cd /sys/kernel/debug/tracing
       # echo 500000 > buffer_size_kb
      [ Or some other number that takes up most of memory ]
       # echo snapshot > events/sched/sched_switch/trigger
      
      Triggers the following bug:
      
       ------------[ cut here ]------------
       kernel BUG at mm/slub.c:296!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:kfree+0x16c/0x180
       Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
       RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
       RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
       RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
       RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
       R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
       R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
       FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
       Call Trace:
        event_trigger_callback+0xee/0x1d0
        event_trigger_write+0xfc/0x1a0
        __vfs_write+0x33/0x190
        ? handle_mm_fault+0x115/0x230
        ? _cond_resched+0x16/0x40
        vfs_write+0xb0/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x5a/0x160
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f363e16ab50
       Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
       RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
       RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
       RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
       R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
       Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
      86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
       ---[ end trace d301afa879ddfa25 ]---
      
      The cause is because the register_snapshot_trigger() call failed to
      allocate the snapshot buffer, and then called unregister_trigger()
      which freed the data that was passed to it. Then on return to the
      function that called register_snapshot_trigger(), as it sees it
      failed to register, it frees the trigger_data again and causes
      a double free.
      
      By calling event_trigger_init() on the trigger_data (which only ups
      the reference counter for it), and then event_trigger_free() afterward,
      the trigger_data would not get freed by the registering trigger function
      as it would only up and lower the ref count for it. If the register
      trigger function fails, then the event_trigger_free() called after it
      will free the trigger data normally.
      
      Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
      
      Cc: stable@vger.kerne.org
      Fixes: 93e31ffb ("tracing: Add 'snapshot' event trigger command")
      Reported-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      1863c387
  7. 22 7月, 2018 3 次提交
    • L
      mm: make vm_area_alloc() initialize core fields · 490fc053
      Linus Torvalds 提交于
      Like vm_area_dup(), it initializes the anon_vma_chain head, and the
      basic mm pointer.
      
      The rest of the fields end up being different for different users,
      although the plan is to also initialize the 'vm_ops' field to a dummy
      entry.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      490fc053
    • L
      mm: make vm_area_dup() actually copy the old vma data · 95faf699
      Linus Torvalds 提交于
      .. and re-initialize th eanon_vma_chain head.
      
      This removes some boiler-plate from the users, and also makes it clear
      why it didn't need use the 'zalloc()' version.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95faf699
    • L
      mm: use helper functions for allocating and freeing vm_area structs · 3928d4f5
      Linus Torvalds 提交于
      The vm_area_struct is one of the most fundamental memory management
      objects, but the management of it is entirely open-coded evertwhere,
      ranging from allocation and freeing (using kmem_cache_[z]alloc and
      kmem_cache_free) to initializing all the fields.
      
      We want to unify this in order to end up having some unified
      initialization of the vmas, and the first step to this is to at least
      have basic allocation functions.
      
      Right now those functions are literally just wrappers around the
      kmem_cache_*() calls.  This is a purely mechanical conversion:
      
          # new vma:
          kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
      
          # copy old vma
          kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
      
          # free vma
          kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
      
      to the point where the old vma passed in to the vm_area_dup() function
      isn't even used yet (because I've left all the old manual initialization
      alone).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3928d4f5
  8. 18 7月, 2018 1 次提交
    • L
      Mark HI and TASKLET softirq synchronous · 3c53776e
      Linus Torvalds 提交于
      Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do
      its job"), and ever since we've had small nagging issues with it.  For
      example, we've had:
      
        1ff68820 ("watchdog: core: make sure the watchdog_worker is not deferred")
        8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
        217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      
      all of which worked around some of the effects of that commit.
      
      The DVB people have also complained that the commit causes excessive USB
      URB latencies, which seems to be due to the USB code using tasklets to
      schedule USB traffic.  This seems to be an issue mainly when already
      living on the edge, but waiting for ksoftirqd to handle it really does
      seem to cause excessive latencies.
      
      Now Hanna Hawa reports that this issue isn't just limited to USB URB and
      DVB, but also causes timeout problems for the Marvell SoC team:
      
       "I'm facing kernel panic issue while running raid 5 on sata disks
        connected to Macchiatobin (Marvell community board with Armada-8040
        SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
        and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
        uses a tasklet to clean the done descriptors from the queue"
      
      The latency problem causes a panic:
      
        mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
        Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
      
      We've discussed simply just reverting the original commit entirely, and
      also much more involved solutions (with per-softirq threads etc).  This
      patch is intentionally stupid and fairly limited, because the issue
      still remains, and the other solutions either got sidetracked or had
      other issues.
      
      We should probably also consider the timer softirqs to be synchronous
      and not be delayed to ksoftirqd (since they were the issue with the
      earlier watchdog problems), but that should be done as a separate patch.
      This does only the tasklet cases.
      Reported-and-tested-by: NHanna Hawa <hannah@marvell.com>
      Reported-and-tested-by: NJosef Griebichler <griebichler.josef@gmx.at>
      Reported-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c53776e
  9. 16 7月, 2018 1 次提交
    • J
      sched/deadline: Fix switched_from_dl() warning · e117cb52
      Juri Lelli 提交于
      Mark noticed that syzkaller is able to reliably trigger the following warning:
      
        dl_rq->running_bw > dl_rq->this_bw
        WARNING: CPU: 1 PID: 153 at kernel/sched/deadline.c:124 switched_from_dl+0x454/0x608
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 1 PID: 153 Comm: syz-executor253 Not tainted 4.18.0-rc3+ #29
        Hardware name: linux,dummy-virt (DT)
        Call trace:
         dump_backtrace+0x0/0x458
         show_stack+0x20/0x30
         dump_stack+0x180/0x250
         panic+0x2dc/0x4ec
         __warn_printk+0x0/0x150
         report_bug+0x228/0x2d8
         bug_handler+0xa0/0x1a0
         brk_handler+0x2f0/0x568
         do_debug_exception+0x1bc/0x5d0
         el1_dbg+0x18/0x78
         switched_from_dl+0x454/0x608
         __sched_setscheduler+0x8cc/0x2018
         sys_sched_setattr+0x340/0x758
         el0_svc_naked+0x30/0x34
      
      syzkaller reproducer runs a bunch of threads that constantly switch
      between DEADLINE and NORMAL classes while interacting through futexes.
      
      The splat above is caused by the fact that if a DEADLINE task is setattr
      back to NORMAL while in non_contending state (blocked on a futex -
      inactive timer armed), its contribution to running_bw is not removed
      before sub_rq_bw() gets called (!task_on_rq_queued() branch) and the
      latter sees running_bw > this_bw.
      
      Fix it by removing a task contribution from running_bw if the task is
      not queued and in non_contending state while switched to a different
      class.
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NJuri Lelli <juri.lelli@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Reviewed-by: NLuca Abeni <luca.abeni@santannapisa.it>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: claudio@evidence.eu.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/20180711072948.27061-1-juri.lelli@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e117cb52
  10. 15 7月, 2018 1 次提交
    • I
      stop_machine: Disable preemption when waking two stopper threads · 9fb8d5dc
      Isaac J. Manjarres 提交于
      When cpu_stop_queue_two_works() begins to wake the stopper threads, it does
      so without preemption disabled, which leads to the following race
      condition:
      
      The source CPU calls cpu_stop_queue_two_works(), with cpu1 as the source
      CPU, and cpu2 as the destination CPU. When adding the stopper threads to
      the wake queue used in this function, the source CPU stopper thread is
      added first, and the destination CPU stopper thread is added last.
      
      When wake_up_q() is invoked to wake the stopper threads, the threads are
      woken up in the order that they are queued in, so the source CPU's stopper
      thread is woken up first, and it preempts the thread running on the source
      CPU.
      
      The stopper thread will then execute on the source CPU, disable preemption,
      and begin executing multi_cpu_stop(), and wait for an ack from the
      destination CPU's stopper thread, with preemption still disabled. Since the
      worker thread that woke up the stopper thread on the source CPU is affine
      to the source CPU, and preemption is disabled on the source CPU, that
      thread will never run to dequeue the destination CPU's stopper thread from
      the wake queue, and thus, the destination CPU's stopper thread will never
      run, causing the source CPU's stopper thread to wait forever, and stall.
      
      Disable preemption when waking the stopper threads in
      cpu_stop_queue_two_works().
      
      Fixes: 0b26351b ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
      Co-Developed-by: NPrasad Sodagudi <psodagud@codeaurora.org>
      Signed-off-by: NPrasad Sodagudi <psodagud@codeaurora.org>
      Co-Developed-by: NPavankumar Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: NPavankumar Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: NIsaac J. Manjarres <isaacm@codeaurora.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Cc: matt@codeblueprint.co.uk
      Cc: bigeasy@linutronix.de
      Cc: gregkh@linuxfoundation.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1530655334-4601-1-git-send-email-isaacm@codeaurora.org
      9fb8d5dc
  11. 13 7月, 2018 2 次提交
    • J
      tracing: Reorder display of TGID to be after PID · f8494fa3
      Joel Fernandes (Google) 提交于
      Currently ftrace displays data in trace output like so:
      
                                             _-----=> irqs-off
                                            / _----=> need-resched
                                           | / _---=> hardirq/softirq
                                           || / _--=> preempt-depth
                                           ||| /     delay
                  TASK-PID   CPU    TGID   ||||    TIMESTAMP  FUNCTION
                     | |       |      |    ||||       |         |
                  bash-1091  [000] ( 1091) d..2    28.313544: sched_switch:
      
      However Android's trace visualization tools expect a slightly different
      format due to an out-of-tree patch patch that was been carried for a
      decade, notice that the TGID and CPU fields are reversed:
      
                                             _-----=> irqs-off
                                            / _----=> need-resched
                                           | / _---=> hardirq/softirq
                                           || / _--=> preempt-depth
                                           ||| /     delay
                  TASK-PID    TGID   CPU   ||||    TIMESTAMP  FUNCTION
                     | |        |      |   ||||       |         |
                  bash-1091  ( 1091) [002] d..2    64.965177: sched_switch:
      
      From kernel v4.13 onwards, during which TGID was introduced, tracing
      with systrace on all Android kernels will break (most Android kernels
      have been on 4.9 with Android patches, so this issues hasn't been seen
      yet). From v4.13 onwards things will break.
      
      The chrome browser's tracing tools also embed the systrace viewer which
      uses the legacy TGID format and updates to that are known to be
      difficult to make.
      
      Considering this, I suggest we make this change to the upstream kernel
      and backport it to all Android kernels. I believe this feature is merged
      recently enough into the upstream kernel that it shouldn't be a problem.
      Also logically, IMO it makes more sense to group the TGID with the
      TASK-PID and the CPU after these.
      
      Link: http://lkml.kernel.org/r/20180626000822.113931-1-joel@joelfernandes.org
      
      Cc: jreck@google.com
      Cc: tkjos@google.com
      Cc: stable@vger.kernel.org
      Fixes: 441dae8f ("tracing: Add support for display of tgid in trace output")
      Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      f8494fa3
    • D
      bpf: don't leave partial mangled prog in jit_subprogs error path · c7a89784
      Daniel Borkmann 提交于
      syzkaller managed to trigger the following bug through fault injection:
      
        [...]
        [  141.043668] verifier bug. No program starts at insn 3
        [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                       get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
        [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                       fixup_call_args kernel/bpf/verifier.c:5587 [inline]
        [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                       bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
        [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
        [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
        [  141.049877] Call Trace:
        [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
        [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
        [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
        [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
        [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
        [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
        [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
        [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
        [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
        [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
        [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
        [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
        [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
        [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
        [...]
      
      What happens in jit_subprogs() is that kcalloc() for the subprog func
      buffer is failing with NULL where we then bail out. Latter is a plain
      return -ENOMEM, and this is definitely not okay since earlier in the
      loop we are walking all subprogs and temporarily rewrite insn->off to
      remember the subprog id as well as insn->imm to temporarily point the
      call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
      out in such state and handing this over to the interpreter is troublesome
      since later/subsequent e.g. find_subprog() lookups are based on wrong
      insn->imm.
      
      Therefore, once we hit this point, we need to jump to out_free path
      where we undo all changes from earlier loop, so that interpreter can
      work on unmodified insn->{off,imm}.
      
      Another point is that should find_subprog() fail in jit_subprogs() due
      to a verifier bug, then we also should not simply defer the program to
      the interpreter since also here we did partial modifications. Instead
      we should just bail out entirely and return an error to the user who is
      trying to load the program.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      c7a89784
  12. 12 7月, 2018 2 次提交
  13. 11 7月, 2018 5 次提交
    • M
      rseq: uapi: Declare rseq_cs field as union, update includes · ec9c82e0
      Mathieu Desnoyers 提交于
      Declaring the rseq_cs field as a union between __u64 and two __u32
      allows both 32-bit and 64-bit kernels to read the full __u64, and
      therefore validate that a 32-bit user-space cleared the upper 32
      bits, thus ensuring a consistent behavior between native 32-bit
      kernels and 32-bit compat tasks on 64-bit kernels.
      
      Check that the rseq_cs value read is < TASK_SIZE.
      
      The asm/byteorder.h header needs to be included by rseq.h, now
      that it is not using linux/types_32_64.h anymore.
      
      Considering that only __32 and __u64 types are declared in linux/rseq.h,
      the linux/types.h header should always be included for both kernel and
      user-space code: including stdint.h is just for u64 and u32, which are
      not used in this header at all.
      
      Use copy_from_user()/clear_user() to interact with a 64-bit field,
      because arm32 does not implement 64-bit __get_user, and ppc32 does not
      64-bit get_user. Considering that the rseq_cs pointer does not need to
      be loaded/stored with single-copy atomicity from the kernel anymore, we
      can simply use copy_from_user()/clear_user().
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-api@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180709195155.7654-5-mathieu.desnoyers@efficios.com
      ec9c82e0
    • M
      rseq: uapi: Update uapi comments · 0fb9a1ab
      Mathieu Desnoyers 提交于
      Update rseq uapi header comments to reflect that user-space need to do
      thread-local loads/stores from/to the struct rseq fields.
      
      As a consequence of this added requirement, the kernel does not need
      to perform loads/stores with single-copy atomicity.
      
      Update the comment associated to the "flags" fields to describe
      more accurately that it's only useful to facilitate single-stepping
      through rseq critical sections with debuggers.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-api@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180709195155.7654-4-mathieu.desnoyers@efficios.com
      0fb9a1ab
    • M
      rseq: Use get_user/put_user rather than __get_user/__put_user · 8f281770
      Mathieu Desnoyers 提交于
      __get_user()/__put_user() is used to read values for address ranges that
      were already checked with access_ok() on rseq registration.
      
      It has been recognized that __get_user/__put_user are optimizing the
      wrong thing. Replace them by get_user/put_user across rseq instead.
      
      If those end up showing up in benchmarks, the proper approach would be to
      use user_access_begin() / unsafe_{get,put}_user() / user_access_end()
      anyway.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-api@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lkml.kernel.org/r/20180709195155.7654-3-mathieu.desnoyers@efficios.com
      8f281770
    • M
      rseq: Use __u64 for rseq_cs fields, validate user inputs · e96d7135
      Mathieu Desnoyers 提交于
      Change the rseq ABI so rseq_cs start_ip, post_commit_offset and abort_ip
      fields are seen as 64-bit fields by both 32-bit and 64-bit kernels rather
      that ignoring the 32 upper bits on 32-bit kernels. This ensures we have a
      consistent behavior for a 32-bit binary executed on 32-bit kernels and in
      compat mode on 64-bit kernels.
      
      Validating the value of abort_ip field to be below TASK_SIZE ensures the
      kernel don't return to an invalid address when returning to userspace
      after an abort. I don't fully trust each architecture code to consistently
      deal with invalid return addresses.
      
      Validating the value of the start_ip and post_commit_offset fields
      prevents overflow on arithmetic performed on those values, used to
      check whether abort_ip is within the rseq critical section.
      
      If validation fails, the process is killed with a segmentation fault.
      
      When the signature encountered before abort_ip does not match the expected
      signature, return -EINVAL rather than -EPERM to be consistent with other
      input validation return codes from rseq_get_rseq_cs().
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-api@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180709195155.7654-2-mathieu.desnoyers@efficios.com
      e96d7135
    • S
      Revert "tick: Prefer a lower rating device only if it's CPU local device" · 5b5ccbc2
      Sudeep Holla 提交于
      This reverts commit 1332a905.
      
      The original issue was not because of incorrect checking of cpumask for
      both new and old tick device. It was incorrectly analysed was due to the
      misunderstanding of the comment and misinterpretation of the return value
      from tick_check_preferred. The main issue is with the clockevent driver
      that sets the cpumask to cpu_all_mask instead of cpu_possible_mask.
      Signed-off-by: NSudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NKevin Hilman <khilman@baylibre.com>
      Tested-by: NMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/1531151136-18297-1-git-send-email-sudeep.holla@arm.com
      5b5ccbc2
  14. 08 7月, 2018 1 次提交