1. 19 May, 2018 3 commits
    • A
      timekeeping: Standardize on ktime_get_*() naming · fb7fcc96
      Arnd Bergmann committed
      The current_kernel_time64, get_monotonic_coarse64, getrawmonotonic64,
      get_monotonic_boottime64 and timekeeping_clocktai64 interfaces have
      rather inconsistent naming, and they differ in the calling conventions
      by passing the output either by reference or as a return value.
      
      Rename them to ktime_get_coarse_real_ts64, ktime_get_coarse_ts64,
      ktime_get_raw_ts64, ktime_get_boottime_ts64 and ktime_get_clocktai_ts64
      respectively, and provide the interfaces with macros or inline
      functions as needed.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: y2038@lists.linaro.org
      Cc: John Stultz <john.stultz@linaro.org>
      Link: https://lkml.kernel.org/r/20180427134016.2525989-4-arnd@arndb.de
      fb7fcc96
    • A
      timekeeping: Clean up ktime_get_real_ts64 · edca71fe
      Arnd Bergmann committed
      In a move to make ktime_get_*() the preferred driver interface into the
      timekeeping code, sanitize ktime_get_real_ts64() to be a proper exported
      symbol rather than an alias for getnstimeofday64().
      
      The internal __getnstimeofday64() is no longer used, so remove that
      and merge it into ktime_get_real_ts64().
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: y2038@lists.linaro.org
      Cc: John Stultz <john.stultz@linaro.org>
      Link: https://lkml.kernel.org/r/20180427134016.2525989-3-arnd@arndb.de
      edca71fe
    • A
      timekeeping: Remove timespec64 hack · 4f0fad9a
      Arnd Bergmann committed
      At this point, we have converted most of the kernel to use timespec64
      consistently in place of timespec, so it seems it's time to make
      timespec64 the native structure and define timespec in terms of that
      one on 64-bit architectures.
      
      Starting with gcc-5, the compiler can completely optimize away the
      timespec_to_timespec64 and timespec64_to_timespec functions on 64-bit
      architectures. With older compilers, we introduce a couple of extra
      copies of local variables, but those are easily avoided by using
      the timespec64 based interfaces consistently, as we do in most of the
      important code paths already.
      
      The main upside of removing the hack is that printing the tv_sec
      field of a timespec64 structure can now use the %lld format
      string on all architectures without a cast to time64_t. Without
      this patch, the field is a 'long' type and would have to be printed
      using %ld on 64-bit architectures.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: y2038@lists.linaro.org
      Cc: John Stultz <john.stultz@linaro.org>
      Link: https://lkml.kernel.org/r/20180427134016.2525989-2-arnd@arndb.de
      4f0fad9a
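The printing benefit described in this commit message can be sketched in user space. The types below are hypothetical stand-ins (`demo_time64_t` is assumed to be `long long` everywhere, mirroring the idea of a 64-bit seconds field); the point is that one `%lld` format string works without a cast regardless of the width of `long`.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for time64_t and struct timespec64. */
typedef long long demo_time64_t;

struct demo_timespec64 {
	demo_time64_t tv_sec;
	long tv_nsec;
};

/* Format "seconds.nanoseconds"; tv_sec needs no cast because it is long long. */
static int demo_format_ts(char *buf, size_t len, const struct demo_timespec64 *ts)
{
	return snprintf(buf, len, "%lld.%09ld", ts->tv_sec, ts->tv_nsec);
}
```

If `tv_sec` were a plain `long` on some architectures and `long long` on others, the same call would need either a cast or a per-architecture format string.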
  2. 12 May, 2018 2 commits
  3. 11 May, 2018 2 commits
  4. 09 May, 2018 2 commits
  5. 05 May, 2018 3 commits
  6. 04 May, 2018 3 commits
  7. 03 May, 2018 9 commits
    • Z
      tracing: Fix the file mode of stack tracer · 0c5a9acc
      Zhengyuan Liu committed
      It looks weird that the stack_trace_filter file can be written by root
      but is shown without write permission by the ll command.
      
      Link: http://lkml.kernel.org/r/1518054113-28096-1-git-send-email-liuzhengyuan@kylinos.cn
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0c5a9acc
    • C
      ftrace: Have set_graph_* files have normal file modes · 1ce0500d
      Chen LinX committed
      The set_graph_function and set_graph_notrace file mode should be 0644
      instead of 0444 as they are writeable. Note, the mode appears to be ignored
      regardless, but they should at least look sane.
      
      Link: http://lkml.kernel.org/r/1409725869-4501-1-git-send-email-linx.z.chen@intel.com
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Chen LinX <linx.z.chen@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1ce0500d
    • P
      kthread, sched/wait: Fix kthread_parkme() completion issue · 85f1abe0
      Peter Zijlstra committed
      Even with the wait-loop fixed, there is a further issue with
      kthread_parkme(). Upon hotplug, when we do takedown_cpu(),
      smpboot_park_threads() can return before all those threads are in fact
      blocked, due to the placement of the complete() in __kthread_parkme().
      
      When that happens, sched_cpu_dying() -> migrate_tasks() can end up
      migrating such a still runnable task onto another CPU.
      
      Normally the task will have hit schedule() and gone to sleep by the
      time we do kthread_unpark(), which will then do __kthread_bind() to
      re-bind the task to the correct CPU.
      
      However, when we lose the initial TASK_PARKED store to the concurrent
      wakeup issue described previously, do the complete(), get migrated, it
      is possible to either:
      
       - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate
         the park and set TASK_RUNNING, or
      
       - __kthread_bind()'s wait_task_inactive() to observe the competing
         TASK_RUNNING store.
      
      Either way the WARN() in __kthread_bind() will trigger and fail to
      correctly set the CPU affinity.
      
      Fix this by only issuing the complete() when the kthread has scheduled
      out. This does away with all the icky 'still running' nonsense.
      
      The alternative is to promote TASK_PARKED to a special state, this
      guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING
      and we'll end up doing the right thing, but this preserves the whole
      icky business of potentially migrating the still runnable thing.
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      85f1abe0
    • P
      kthread, sched/wait: Fix kthread_parkme() wait-loop · 741a76b3
      Peter Zijlstra committed
      Gaurav reported a problem with __kthread_parkme() where a concurrent
      try_to_wake_up() could result in competing stores to ->state; when
      the TASK_PARKED store got lost, bad things would happen.
      
      The comment near set_current_state() actually mentions this competing
      store, but only mentions the case against TASK_RUNNING. This same
      store, with different timing, can happen against a subsequent !RUNNING
      store.
      
      This normally is not a problem, because as per that same comment, the
      !RUNNING state store is inside a condition based wait-loop:
      
        for (;;) {
          set_current_state(TASK_UNINTERRUPTIBLE);
          if (!need_sleep)
            break;
          schedule();
        }
        __set_current_state(TASK_RUNNING);
      
      If we lose the (first) TASK_UNINTERRUPTIBLE store to a previous
      (concurrent) wakeup, the schedule() will NO-OP and we'll go around the
      loop once more.
      
      The problem here is that the TASK_PARKED store is not inside the
      KTHREAD_SHOULD_PARK condition wait-loop.
      
      There is a genuine issue with sleeps that do not have a condition;
      this is addressed in a subsequent patch.
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      741a76b3
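Why a condition-based wait-loop tolerates a lost state store can be shown with a deterministic, single-threaded simulation. Everything here is a hypothetical model: `demo_set_state()` pretends that a pending stray wakeup overwrites the store, and `demo_schedule()` is a no-op while the task is still marked running. `demo_park_once()` is a deliberately simplified sleeper with no loop, not the kernel's pre-fix code.

```c
#include <stdbool.h>

enum demo_state { DEMO_RUNNING, DEMO_PARKED };

struct demo_task {
	enum demo_state state;
	bool should_park;     /* stand-in for KTHREAD_SHOULD_PARK */
	int stray_wakeups;    /* pending concurrent wakeups that clobber ->state */
};

/* Model of a state store that a pending stray wakeup overwrites. */
static void demo_set_state(struct demo_task *t, enum demo_state s)
{
	t->state = s;
	if (t->stray_wakeups > 0) {
		t->stray_wakeups--;
		t->state = DEMO_RUNNING;   /* the !RUNNING store is lost */
	}
}

/*
 * Model of schedule(): returns immediately if the task is still RUNNING;
 * otherwise "sleeps" until the unparker clears the condition and wakes it.
 */
static void demo_schedule(struct demo_task *t)
{
	if (t->state == DEMO_RUNNING)
		return;
	t->should_park = false;
	t->state = DEMO_RUNNING;
}

/* No loop: a lost store means we return while we should still be parked. */
static void demo_park_once(struct demo_task *t)
{
	demo_set_state(t, DEMO_PARKED);
	demo_schedule(t);
}

/* Store inside the condition loop: a lost store only costs one extra pass. */
static void demo_park_loop(struct demo_task *t)
{
	for (;;) {
		demo_set_state(t, DEMO_PARKED);
		if (!t->should_park)
			break;
		demo_schedule(t);
	}
	t->state = DEMO_RUNNING;
}
```

Starting both variants with `should_park = true` and one stray wakeup, the loop-free version returns with `should_park` still set, while the loop version goes around once more and only returns after the condition has been cleared.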
    • V
      sched/fair: Fix the update of blocked load when newly idle · 457be908
      Vincent Guittot committed
      With commit:
      
        31e77c93 ("sched/fair: Update blocked load when newly idle")
      
      ... we release the rq->lock when updating blocked load of idle CPUs.
      
      This opens a time window during which another CPU can add a task to this
      CPU's cfs_rq.
      
      The check for newly added task of idle_balance() is not in the common path.
      Move the out label to include this check.
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 31e77c93 ("sched/fair: Update blocked load when newly idle")
      Link: http://lkml.kernel.org/r/20180426103133.GA6953@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      457be908
    • P
      stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock · 0b26351b
      Peter Zijlstra committed
      Matt reported the following deadlock:
      
      CPU0					CPU1
      
      schedule(.prev=migrate/0)		<fault>
        pick_next_task()			  ...
          idle_balance()			    migrate_swap()
            active_balance()			      stop_two_cpus()
      						spin_lock(stopper0->lock)
      						spin_lock(stopper1->lock)
      						ttwu(migrate/0)
      						  smp_cond_load_acquire() -- waits for schedule()
              stop_one_cpu(1)
      	  spin_lock(stopper1->lock) -- waits for stopper lock
      
      Fix this deadlock by taking the wakeups out from under stopper->lock.
      This allows the active_balance() to queue the stop work and finish the
      context switch, which in turn allows the wakeup from migrate_swap() to
      observe the context and complete the wakeup.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0b26351b
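The general shape of "taking the wakeups out from under the lock" is to record who needs waking while the locks are held and to issue the wakeups only after they are dropped. The model below is a hypothetical single-threaded sketch (boolean "locks", a fixed-size wake list, `demo_*` names) that only demonstrates the ordering; it is not the stop_machine code.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_WAKE_MAX 2

struct demo_stopper {
	bool locked;          /* stand-in for stopper->lock */
	int pending_work;
	int wakeups;
};

struct demo_wake_list {
	struct demo_stopper *entry[DEMO_WAKE_MAX];
	size_t count;
};

static int demo_locks_held;          /* how many model locks are currently held */
static int demo_wakeups_under_lock;  /* wakeups issued while any lock was held */

static void demo_lock(struct demo_stopper *s)   { s->locked = true;  demo_locks_held++; }
static void demo_unlock(struct demo_stopper *s) { s->locked = false; demo_locks_held--; }

static void demo_wake(struct demo_stopper *s)
{
	s->wakeups++;
	if (demo_locks_held > 0)
		demo_wakeups_under_lock++;
}

/* Queue work on two stoppers; defer both wakeups until both locks are dropped. */
static void demo_queue_two(struct demo_stopper *a, struct demo_stopper *b)
{
	struct demo_wake_list wl = { { NULL, NULL }, 0 };
	size_t i;

	demo_lock(a);
	demo_lock(b);
	a->pending_work++;
	wl.entry[wl.count++] = a;   /* remember the wakeup, do not issue it yet */
	b->pending_work++;
	wl.entry[wl.count++] = b;
	demo_unlock(b);
	demo_unlock(a);

	for (i = 0; i < wl.count; i++)
		demo_wake(wl.entry[i]);   /* no lock is held here */
}
```

Because no wakeup can block on a context switch while a lock is held, another CPU that needs one of those locks to queue its own work is never stuck behind the waker.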
    • J
      bpf: sockmap, fix error handling in redirect failures · abaeb096
      John Fastabend committed
      When a redirect failure happens we release the buffers in-flight
      without calling a sk_mem_uncharge(), the uncharge is called before
      dropping the sock lock for the redirect, however we missed updating
      the ring start index. When no apply actions are in progress this
      is OK because we uncharge the entire buffer before the redirect.
      But, when we have apply logic running it's possible that only a
      portion of the buffer is being redirected. In this case we only
      do memory accounting for the buffer slice being redirected and
      expect to be able to loop over the BPF program again and/or if
      a sock is closed uncharge the memory at sock destruct time.
      
      With an invalid start index however the program logic looks at
      the start pointer index, checks the length, and when seeing the
      length is zero (from the initial release and failure to update
      the pointer) aborts without uncharging/releasing the remaining
      memory.
      
      The fix for this is simply to update the start index. To avoid
      fixing this error in two locations we do a small refactor and
      remove one case where it is open-coded. Then fix it in the
      single function.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      abaeb096
    • J
      bpf: sockmap, zero sg_size on error when buffer is released · fec51d40
      John Fastabend committed
      When an error occurs during a redirect we have two cases that need
      to be handled (i) we have a cork'ed buffer (ii) we have a normal
      sendmsg buffer.
      
      In the cork'ed buffer case we don't currently support recovering from
      errors in a redirect action. So the buffer is released and the error
      should _not_ be pushed back to the caller of sendmsg/sendpage. The
      rationale here is the user will get an error that relates to old
      data that may have been sent by some arbitrary thread on that sock.
      Instead we simply consume the data and tell the user that the data
      has been consumed. We may add proper error recovery in the future.
      However, this patch fixes a bug where the bytes outstanding counter
      sg_size was not zeroed. This could result in a case where if the user
      has both a cork'ed action and apply action in progress we may
      incorrectly call into the BPF program when the user expected an
      old verdict to be applied via the apply action. I don't have a use
      case where using apply and cork at the same time is valid but we
      never explicitly reject it because it should work fine. This patch
      ensures the sg_size is zeroed so we don't have this case.
      
      In the normal sendmsg buffer case (no cork data) we also do not
      zero sg_size. Again this can confuse the apply logic when the logic
      calls into the BPF program when the BPF programmer expected the old
      verdict to remain. So ensure we set sg_size to zero here as well. And
      additionally to keep the psock state in-sync with the sk_msg_buff
      release all the memory as well. Previously we did this before
      returning to the user but this left a gap where psock and sk_msg_buff
      states were out of sync which seems fragile. No additional overhead
      is taken here except for a call to check the length and realize it's
      already been freed. This is in the error path as well so in my
      opinion let's have robust code over optimized error paths.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      fec51d40
    • J
      bpf: sockmap, fix scatterlist update on error path in send with apply · 3cc9a472
      John Fastabend committed
      When the call to do_tcp_sendpage() fails to send the complete block
      requested we either retry if only a partial send was completed or
      abort if we receive an error less than or equal to zero. Before
      returning though we must update the scatterlist length/offset to
      account for any partial send completed.
      
      Before this patch we did this at the end of the retry loop, but
      this was buggy when used while applying a verdict to fewer bytes
      than in the scatterlist. When the scatterlist length was being set
      we forgot to account for the apply logic reducing the size variable.
      So the result was we chopped off some bytes in the scatterlist without
      doing proper cleanup on them. This results in a WARNING when the
      sock is torn down because the bytes have previously been charged to
      the socket but are never uncharged.
      
      The fix is simply to do the accounting inside the retry loop
      subtracting from the absolute scatterlist values rather than trying
      to accumulate the totals and subtract at the end.
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      3cc9a472
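The accounting approach described in this commit message (adjust the scatterlist entry inside the retry loop, against its absolute values) can be sketched as follows. The structure, the callback type and the chunked sender are all hypothetical user-space stand-ins; the sketch assumes the sender reports how many bytes it actually sent, or a value less than or equal to zero on error.

```c
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct demo_sg {
	size_t offset;
	size_t length;
};

/* Hypothetical sender: returns bytes sent, or <= 0 on error. */
typedef long (*demo_send_fn)(size_t offset, size_t size, void *ctx);

/* Example sender that sends at most `chunk` bytes and fails when calls run out. */
struct demo_sender {
	size_t chunk;
	int calls_left;
};

static long demo_chunk_send(size_t offset, size_t size, void *ctx)
{
	struct demo_sender *s = ctx;

	(void)offset;
	if (s->calls_left-- <= 0)
		return -1;
	return (long)(size < s->chunk ? size : s->chunk);
}

/*
 * Send up to apply_bytes from sg, retrying after partial sends. The entry is
 * updated after every successful send, so it stays correct even if a later
 * attempt fails. Returns bytes sent, or the sender's error if nothing was sent.
 */
static long demo_send_apply(struct demo_sg *sg, size_t apply_bytes,
			    demo_send_fn send, void *ctx)
{
	size_t size = apply_bytes < sg->length ? apply_bytes : sg->length;
	long sent_total = 0;

	while (size > 0) {
		long ret = send(sg->offset, size, ctx);

		if (ret <= 0)
			return sent_total ? sent_total : ret;
		if ((size_t)ret > size)
			ret = (long)size;   /* defensive clamp against a misbehaving sender */

		sg->offset += (size_t)ret;  /* account against absolute values */
		sg->length -= (size_t)ret;
		size -= (size_t)ret;
		sent_total += ret;
	}
	return sent_total;
}
```

With a 10-byte entry, an apply size of 6 and a sender that manages 4 bytes per call, the entry ends at offset 6 with 4 bytes left; if the second call fails instead, it ends at offset 4 with 6 bytes left, so nothing is silently chopped off.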
  8. 02 May, 2018 4 commits
  9. 01 May, 2018 1 commit
  10. 27 April, 2018 5 commits
  11. 26 April, 2018 2 commits
    • T
      Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME · a3ed0e43
      Thomas Gleixner committed
      Revert commits
      
      92af4dcb ("tracing: Unify the "boot" and "mono" tracing clocks")
      127bfa5f ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
      7250a404 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
      d6c7270e ("timekeeping: Remove boot time specific code")
      f2d6fdbf ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
      d6ed449a ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
      72199320 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
      
      As stated in the pull request for the unification of CLOCK_MONOTONIC and
      CLOCK_BOOTTIME, it was clear that we might have to revert the change.
      
      As reported by several folks systemd and other applications rely on the
      documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
      changes. After resume daemons time out and other timeout related issues are
      observed. Rafael compiled this list:
      
      * systemd kills daemons on resume, after >WatchdogSec seconds
        of suspending (Genki Sky).  [Verified that that's because systemd uses
        CLOCK_MONOTONIC and expects it to not include the suspend time.]
      
      * systemd-journald misbehaves after resume:
        systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
        corrupted or uncleanly shut down, renaming and replacing.
        (Mike Galbraith).
      
      * NetworkManager reports "networking disabled" and networking is broken
        after resume 50% of the time (Pavel).  [May be because of systemd.]
      
      * MATE desktop dims the display and starts the screensaver right after
        system resume (Pavel).
      
      * Full system hang during resume (me).  [May be due to systemd or NM or both.]
      
      That happens on debian and open suse systems.
      
      It's sad that these problems were neither caught in -next nor by those
      folks who expressed interest in this change.
      Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
      Reported-by: Genki Sky <sky@genki.is>
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kevin Easton <kevin@guarana.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      a3ed0e43
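The distinction that applications such as systemd rely on can be observed from user space: on Linux, CLOCK_MONOTONIC does not advance while the system is suspended, whereas CLOCK_BOOTTIME does. The helper below is a minimal sketch (`demo_suspended_seconds` is a hypothetical name) that estimates the accumulated suspend time as the difference between the two clocks, and returns -1.0 where CLOCK_BOOTTIME is unavailable.

```c
#include <time.h>

/*
 * Rough estimate of time spent suspended since boot: CLOCK_BOOTTIME minus
 * CLOCK_MONOTONIC. Returns -1.0 if the clocks cannot be read.
 */
static double demo_suspended_seconds(void)
{
#ifdef CLOCK_BOOTTIME
	struct timespec mono, boot;

	if (clock_gettime(CLOCK_MONOTONIC, &mono) != 0 ||
	    clock_gettime(CLOCK_BOOTTIME, &boot) != 0)
		return -1.0;

	return (double)(boot.tv_sec - mono.tv_sec) +
	       (double)(boot.tv_nsec - mono.tv_nsec) / 1e9;
#else
	return -1.0;
#endif
}
```

A watchdog that measures its timeout with CLOCK_MONOTONIC therefore does not count suspend time, which is the documented behaviour the revert restores.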
    • T
      tick/sched: Do not mess with an enqueued hrtimer · 1f71addd
      Thomas Gleixner committed
      Kaike reported that in tests, rdma hrtimers occasionally stopped working. He
      did great debugging, which provided enough context to decode the problem.
      
      CPU 3			     	      	     CPU 2
      
      idle
      start sched_timer expires = 712171000000
       queue->next = sched_timer
      					    start rdmavt timer. expires = 712172915662
      					    lock(baseof(CPU3))
      tick_nohz_stop_tick()
      tick = 716767000000			    timerqueue_add(tmr)
      
      hrtimer_set_expires(sched_timer, tick);
        sched_timer->expires = 716767000000  <---- FAIL
      					     if (tmr->expires < queue->next->expires)
      hrtimer_start(sched_timer)		          queue->next = tmr;
      lock(baseof(CPU3))
      					     unlock(baseof(CPU3))
      timerqueue_remove()
      timerqueue_add()
      
      ts->sched_timer is queued and queue->next is pointing to it, but then
      ts->sched_timer.expires is modified.
      
      This not only corrupts the ordering of the timerqueue RB tree, it also
      makes CPU2 see the new expiry time of timerqueue->next->expires when
      checking whether timerqueue->next needs to be updated. So CPU2 sees that
      the rdma timer is earlier than timerqueue->next and sets the rdma timer as
      new next.
      
      Depending on whether it had also seen the new time at RB tree enqueue, it
      might have queued the rdma timer at the wrong place and then after removing
      the sched_timer the RB tree is completely hosed.
      
      The problem was introduced with a commit which tried to solve inconsistency
      between the hrtimer in the tick_sched data and the underlying hardware
      clockevent. It split out hrtimer_set_expires() to store the new tick time
      in both the NOHZ and the NOHZ + HIGHRES case, but missed the fact that in
      the NOHZ + HIGHRES case the hrtimer might still be queued.
      
      Use hrtimer_start(timer, tick...) for the NOHZ + HIGHRES case which sets
      timer->expires after canceling the timer and move the hrtimer_set_expires()
      invocation into the NOHZ only code path which is not affected as it merely
      uses the hrtimer as next event storage so code paths can be shared with
      the NOHZ + HIGHRES case.
      
      Fixes: d4af6d93 ("nohz: Fix spurious warning when hrtimer and clockevent get out of sync")
      Reported-by: "Wan Kaike" <kaike.wan@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: "Marciniszyn Mike" <mike.marciniszyn@intel.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: linux-rdma@vger.kernel.org
      Cc: "Dalessandro Dennis" <dennis.dalessandro@intel.com>
      Cc: "Fleck John" <john.fleck@intel.com>
      Cc: stable@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "Weiny Ira" <ira.weiny@intel.com>
      Cc: "linux-rdma@vger.kernel.org"
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804241637390.1679@nanos.tec.linutronix.de
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804242119210.1597@nanos.tec.linutronix.de
      
      1f71addd
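The rule behind this fix, never change the sort key of an element while it is still linked into an ordered queue, can be sketched with a small sorted list in place of the timerqueue RB tree. All `demo_*` names are hypothetical; the restart helper dequeues first, then updates the expiry, then enqueues again.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_timer {
	uint64_t expires;
	struct demo_timer *next;
	int queued;
};

struct demo_queue {
	struct demo_timer *head;   /* earliest expiry first */
};

/* Insert in expiry order. */
static void demo_queue_add(struct demo_queue *q, struct demo_timer *t)
{
	struct demo_timer **pp = &q->head;

	while (*pp && (*pp)->expires <= t->expires)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	t->queued = 1;
}

/* Unlink the timer if it is on the queue. */
static void demo_queue_remove(struct demo_queue *q, struct demo_timer *t)
{
	struct demo_timer **pp = &q->head;

	while (*pp && *pp != t)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = t->next;
		t->next = NULL;
		t->queued = 0;
	}
}

/* Safe re-arm: dequeue, change the key, enqueue. */
static void demo_timer_restart(struct demo_queue *q, struct demo_timer *t,
			       uint64_t expires)
{
	if (t->queued)
		demo_queue_remove(q, t);
	t->expires = expires;
	demo_queue_add(q, t);
}
```

Using the expiry values from the trace above, re-arming the tick timer from 712171000000 to 716767000000 through `demo_timer_restart()` leaves the timer expiring at 712172915662 at the head of the queue, with the ordering intact.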
  12. 25 April, 2018 3 commits
    • P
      tracing: Fix missing tab for hwlat_detector print format · 9a0fd675
      Peter Xu committed
      It's been missing for a while but no one is touching that up.  Fix it.
      
      Link: http://lkml.kernel.org/r/20180315060639.9578-1-peterx@redhat.com
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 7b2c8625 ("tracing: Add NMI tracing in hwlat detector")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      9a0fd675
    • T
      kprobes: Fix random address output of blacklist file · bcbd385b
      Thomas Richter committed
      File /sys/kernel/debug/kprobes/blacklist displays random addresses:
      
      [root@s8360046 linux]# cat /sys/kernel/debug/kprobes/blacklist
      0x0000000047149a90-0x00000000bfcb099a	print_type_x8
      ....
      
      This breaks 'perf probe' which uses the blacklist file to prohibit
      probes on certain functions by checking the address range.
      
      Fix this by printing the correct (unhashed) address.
      
      The file mode is read all but this is not an issue as the file
      hierarchy points out:
       # ls -ld /sys/ /sys/kernel/ /sys/kernel/debug/ /sys/kernel/debug/kprobes/
      	/sys/kernel/debug/kprobes/blacklist
      dr-xr-xr-x 12 root root 0 Apr 19 07:56 /sys/
      drwxr-xr-x  8 root root 0 Apr 19 07:56 /sys/kernel/
      drwx------ 16 root root 0 Apr 19 06:56 /sys/kernel/debug/
      drwxr-xr-x  2 root root 0 Apr 19 06:56 /sys/kernel/debug/kprobes/
      -r--r--r--  1 root root 0 Apr 19 06:56 /sys/kernel/debug/kprobes/blacklist
      
      Everything in and below /sys/kernel/debug is rwx to root only,
      no group or others have access.
      
      Background:
      Directory /sys/kernel/debug/kprobes is created by debugfs_create_dir()
      which sets the mode bits to rwxr-xr-x. Maybe change that to use the
      parent's directory mode bits instead?
      
      Link: http://lkml.kernel.org/r/20180419105556.86664-1-tmricht@linux.ibm.com
      
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Cc: stable@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.15+
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S Miller <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: acme@kernel.org
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      bcbd385b
    • R
      tracing: Fix kernel crash while using empty filter with perf · ba16293d
      Ravi Bangoria committed
      Kernel is crashing when user tries to record 'ftrace:function' event
      with empty filter:
      
        # perf record -e ftrace:function --filter="" ls
      
        # dmesg
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        Oops: 0000 [#1] SMP PTI
        ...
        RIP: 0010:ftrace_profile_set_filter+0x14b/0x2d0
        RSP: 0018:ffffa4a7c0da7d20 EFLAGS: 00010246
        RAX: ffffa4a7c0da7d64 RBX: 0000000000000000 RCX: 0000000000000006
        RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff8c48ffc968f0
        ...
        Call Trace:
         _perf_ioctl+0x54a/0x6b0
         ? rcu_all_qs+0x5/0x30
        ...
      
      After patch:
        # perf record -e ftrace:function --filter="" ls
        failed to set filter "" on event ftrace:function with 22 (Invalid argument)
      
      Also, if user tries to echo "" > filter, it used to throw an error.
      This behavior got changed by commit 80765597 ("tracing: Rewrite
      filter logic to be simpler and faster"). This patch restores the
      behavior as a side effect:
      
      Before patch:
        # echo "" > filter
        #
      
      After patch:
        # echo "" > filter
        bash: echo: write error: Invalid argument
        #
      
      Link: http://lkml.kernel.org/r/20180420150758.19787-1-ravi.bangoria@linux.ibm.com
      
      Fixes: 80765597 ("tracing: Rewrite filter logic to be simpler and faster")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      ba16293d
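The behaviour shown in the before/after transcripts (an empty filter is rejected with "Invalid argument" instead of being accepted or dereferenced later) amounts to an early validation step. The function below is a hypothetical user-space sketch of such a check; treating a whitespace-only string as empty is an assumption made here because `echo ""` writes a lone newline.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical filter setter: reject NULL, empty or whitespace-only input
 * with -EINVAL before anything tries to parse or dereference it.
 * On success, *active points at the first non-blank character.
 */
static int demo_set_filter(const char *filter_str, const char **active)
{
	const char *p = filter_str;

	if (p == NULL)
		return -EINVAL;
	while (isspace((unsigned char)*p))
		p++;
	if (*p == '\0')
		return -EINVAL;

	*active = p;
	return 0;
}
```

Callers that propagate the negative errno unchanged produce the "Invalid argument" message seen in the "After patch" output.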
  13. 24 April, 2018 1 commit
    • J
      bpf: sockmap, fix double page_put on ENOMEM error in redirect path · 4fcfdfb8
      John Fastabend committed
      In the case where the socket memory boundary is hit the redirect
      path returns an ENOMEM error. However, before checking for this
      condition the redirect scatterlist buffer is setup with a valid
      page and length. This is never unwound so when the buffers are
      released later in the error path we do a put_page() and clear
      the scatterlist fields. But, because the initial error happens
      before completing the scatterlist buffer we end up with both the
      original buffer and the redirect buffer pointing to the same page
      resulting in duplicate put_page() calls.
      
      To fix this simply move the initial configuration of the redirect
      scatterlist buffer below the sock memory check.
      
      Found this while running TCP_STREAM test with netperf using Cilium.
      
      Fixes: fa246693 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4fcfdfb8
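The ordering fix described here (do the memory check first, and only then point the redirect entry at the page) can be modelled with a simple reference count. The types and helpers are hypothetical stand-ins; the sketch shows that when the check fails early, only one entry references the page, so the error path drops exactly one reference.

```c
#include <errno.h>
#include <stddef.h>

struct demo_page {
	int refcount;
};

struct demo_sg_entry {
	struct demo_page *page;
	size_t length;
};

static void demo_put_page(struct demo_page *p)
{
	p->refcount--;
}

/* Error-path cleanup: drop the reference and clear the entry. */
static void demo_release(struct demo_sg_entry *e)
{
	if (e->page) {
		demo_put_page(e->page);
		e->page = NULL;
		e->length = 0;
	}
}

/*
 * Fixed ordering: check the memory limit before touching the redirect entry.
 * On -ENOMEM the redirect entry is still empty and the original entry still
 * owns the page, so a later release of both entries puts the page once.
 */
static int demo_redirect(struct demo_sg_entry *orig, struct demo_sg_entry *redir,
			 size_t mem_avail)
{
	if (orig->length > mem_avail)
		return -ENOMEM;

	redir->page = orig->page;
	redir->length = orig->length;
	orig->page = NULL;
	orig->length = 0;
	return 0;
}
```

Had the redirect entry been filled in before the check, both entries would point at the same page on the error path and releasing them would call the put helper twice.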