1. 23 6月, 2017 27 次提交
  2. 21 6月, 2017 3 次提交
  3. 13 6月, 2017 2 次提交
  4. 12 6月, 2017 1 次提交
  5. 11 6月, 2017 2 次提交
  6. 08 6月, 2017 4 次提交
    • P
      srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604
      Paolo Bonzini 提交于
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      only has users in irq context, and you can mix process and irq context
      as long as process context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Cc: stable@vger.kernel.org
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: NLinu Cherian <linuc.decode@gmail.com>
      Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1123a604
    • P
      srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context · cdf7abc4
      Paolo Bonzini 提交于
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      only has users in irq context, and you can mix process and irq context
      as long as process context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
      a single variable.  Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
      implementation already supports mixed-context use of srcu_read_lock()
      and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
      and srcu_read_unlock() in each handler are nested and paired properly.
      In other words, it is still illegal to (say) invoke srcu_read_lock()
      in an interrupt handler and to invoke the matching srcu_read_unlock()
      in a softirq handler.  Therefore, the only change required for Tiny SRCU
      is to its comments.
      
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: NLinu Cherian <linuc.decode@gmail.com>
      Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NPaolo Bonzini <pbonzini@redhat.com>
      cdf7abc4
    • P
      Revert "printk: fix double printing with earlycon" · dac8bbba
      Petr Mladek 提交于
      This reverts commit cf39bf58.
      
      The commit regression to users that define both console=ttyS1
      and console=ttyS0 on the command line, see
      https://lkml.kernel.org/r/20170509082915.GA13236@bistromath.localdomain
      
      The kernel log messages always appeared only on one serial port. It is
      even documented in Documentation/admin-guide/serial-console.rst:
      
      "Note that you can only define one console per device type (serial,
      video)."
      
      The above mentioned commit changed the order in which the command line
      parameters are searched. As a result, the kernel log messages go to
      the last mentioned ttyS* instead of the first one.
      
      We long thought that using two console=ttyS* on the command line
      did not make sense. But then we realized that console= parameters
      were handled also by systemd, see
      http://0pointer.de/blog/projects/serial-console.html
      
      "By default systemd will instantiate one serial-getty@.service on
      the main kernel console, if it is not a virtual terminal."
      
      where
      
      "[4] If multiple kernel consoles are used simultaneously, the main
      console is the one listed first in /sys/class/tty/console/active,
      which is the last one listed on the kernel command line."
      
      This puts the original report into another light. The system is running
      in qemu. The first serial port is used to store the messages into a file.
      The second one is used to login to the system via a socket. It depends
      on systemd and the historic kernel behavior.
      
      By other words, systemd causes that it makes sense to define both
      console=ttyS1 console=ttyS0 on the command line. The kernel fix
      caused regression related to userspace (systemd) and need to be
      reverted.
      
      In addition, it went out that the fix helped only partially.
      The messages still were duplicated when the boot console was
      removed early by late_initcall(printk_late_init). Then the entire
      log was replayed when the same console was registered as a normal one.
      
      Link: 20170606160339.GC7604@pathway.suse.cz
      Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Robin Murphy <robin.murphy@arm.com>,
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Nair, Jayachandran" <Jayachandran.Nair@cavium.com>
      Cc: linux-serial@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      dac8bbba
    • J
      perf/core: Drop kernel samples even though :u is specified · cc1582c2
      Jin Yao 提交于
      When doing sampling, for example:
      
        perf record -e cycles:u ...
      
      On workloads that do a lot of kernel entry/exits we see kernel
      samples, even though :u is specified. This is due to skid existing.
      
      This might be a security issue because it can leak kernel addresses even
      though kernel sampling support is disabled.
      
      The patch drops the kernel samples if exclude_kernel is specified.
      
      For example, test on Haswell desktop:
      
        perf record -e cycles:u <mgen>
        perf report --stdio
      
      Before patch applied:
      
          99.77%  mgen     mgen              [.] buf_read
           0.20%  mgen     mgen              [.] rand_buf_init
           0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
           0.00%  mgen     mgen              [.] last_free_elem
           0.00%  mgen     libc-2.23.so      [.] __random_r
           0.00%  mgen     libc-2.23.so      [.] _int_malloc
           0.00%  mgen     mgen              [.] rand_array_init
           0.00%  mgen     [kernel.vmlinux]  [k] page_fault
           0.00%  mgen     libc-2.23.so      [.] __random
           0.00%  mgen     libc-2.23.so      [.] __strcasestr
           0.00%  mgen     ld-2.23.so        [.] strcmp
           0.00%  mgen     ld-2.23.so        [.] _dl_start
           0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so        [.] _start
      
      We can see kernel symbols apic_timer_interrupt and page_fault.
      
      After patch applied:
      
          99.79%  mgen     mgen           [.] buf_read
           0.19%  mgen     mgen           [.] rand_buf_init
           0.00%  mgen     libc-2.23.so   [.] __random_r
           0.00%  mgen     mgen           [.] rand_array_init
           0.00%  mgen     mgen           [.] last_free_elem
           0.00%  mgen     libc-2.23.so   [.] vfprintf
           0.00%  mgen     libc-2.23.so   [.] rand
           0.00%  mgen     libc-2.23.so   [.] __random
           0.00%  mgen     libc-2.23.so   [.] _int_malloc
           0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
           0.00%  mgen     ld-2.23.so     [.] do_lookup_x
           0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
           0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
           0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so     [.] _start
      
      There are only userspace symbols.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Cc: kan.liang@intel.com
      Cc: mark.rutland@arm.com
      Cc: will.deacon@arm.com
      Cc: yao.jin@intel.com
      Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cc1582c2
  7. 07 6月, 2017 1 次提交