1. 04 Mar, 2014 1 commit
  2. 21 Feb, 2014 17 commits
  3. 14 Feb, 2014 1 commit
    • tick: Clear broadcast pending bit when switching to oneshot · dd5fd9b9
      Thomas Gleixner authored
      AMD systems which use the C1E workaround in the amd_e400_idle routine
      trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.
      
      The reason is that the idle routine of those AMD systems switches the
      cpu into forced broadcast mode early on before the newly brought up
      CPU can switch over to high resolution / NOHZ mode. The timer related
      CPU1 bringup looks like this:
      
        clockevent_register_device(local_apic);
        tick_setup(local_apic);
        ...
        idle()
      	tick_broadcast_on_off(FORCE);
      	tick_broadcast_oneshot_control(ENTER)
      	  cpumask_set(cpu, broadcast_oneshot_mask);
      	halt();
      
      Now the broadcast interrupt on CPU0 sets CPU1 in the
      broadcast_pending_mask and wakes CPU1. So CPU1 continues:
      
      	local_apic_timer_interrupt()
      	   tick_handle_periodic();
      	   softirq()
      	     tick_init_highres();
      	       cpumask_clr(cpu, broadcast_oneshot_mask);
      	
      	tick_broadcast_oneshot_control(ENTER)
      	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask));
      
      So while we remove CPU1 from the broadcast_oneshot_mask when we switch
      over to highres mode, we do not clear the pending bit, which then
      triggers the warning when we go back to idle.
      
      The reason why this is only visible on C1E affected AMD systems is
      that the other machines enter the deep sleep states via
      acpi_idle/intel_idle and exit the broadcast mode before executing the
      remote triggered local_apic_timer_interrupt. So the pending bit is
      already cleared when the switch over to highres mode is clearing the
      oneshot mask.
      
      The solution is simple: Clear the pending bit together with the mask
      bit when we switch over to highres mode.
      
      Stanislaw came up independently with the same patch by enforcing the
      C1E workaround and debugging the fallout. I picked mine, because mine
      has a changelog :)
      Reported-by: poma <pomidorabelisima@gmail.com>
      Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Justin M. Forbes <jforbes@redhat.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 12 Feb, 2014 1 commit
    • ring-buffer: Fix first commit on sub-buffer having non-zero delta · d651aa1d
      Steven Rostedt (Red Hat) authored
      Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
      that page use a 27 bit delta against that timestamp in order to save on
      bits written to the ring buffer. If the time between events is larger than
      what the 27 bits can hold, a "time extend" event is added to hold the
      entire 64 bit timestamp again and the events after that hold a delta from
      that timestamp.
      
      As a "time extend" is always paired with an event, it is logical to just
      allocate the event with the time extend, to make things a bit more efficient.
      
      Unfortunately, when the pairing code was written, it removed the "delta = 0"
      from the first commit on a page, causing the events on the page to be
      slightly skewed.
      
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Cc: stable@vger.kernel.org # 2.6.37+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 11 Feb, 2014 1 commit
  6. 09 Feb, 2014 1 commit
  7. 06 Feb, 2014 2 commits
    • time: Fix overflow when HZ is smaller than 60 · 80d767d7
      Mikulas Patocka authored
      When compiling for the IA-64 ski emulator, HZ is set to 32 because the
      emulation is slow and we don't want to waste too many cycles processing
      timers. Alpha also has an option to set HZ to 32.
      
      This causes integer overflow in kernel/time/jiffies.c:
      kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
        .mult  = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
        ^
      
      This patch reduces the JIFFIES_SHIFT value to avoid the overflow.
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds authored
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 05 Feb, 2014 1 commit
  9. 31 Jan, 2014 2 commits
  10. 28 Jan, 2014 6 commits
  11. 25 Jan, 2014 4 commits
  12. 24 Jan, 2014 3 commits
    • kdump: fix exported size of vmcoreinfo note · 77019967
      Vivek Goyal authored
      Right now we seem to be exporting the max data size contained inside
      vmcoreinfo note.  But this does not include the size of the metadata
      around the vmcoreinfo data, such as the name of the note and the starting
      and ending elf_note.
      
      I think user space expects total size and that size is put in PT_NOTE elf
      header.  Things seem to be fine so far because we are not using vmcoreinfo
      note to the maximum capacity.  But as it starts filling up, to capacity,
      at some point of time, problem will be visible.
      
      I don't think user space will be broken with this change.  So there is no
      need to introduce vmcoreinfo2.  This change is safe and backward
      compatible.  More explanation on why this change is safe is below.
      
      vmcoreinfo contains information about kernel which user space needs to
      know to do things like filtering.  For example, various kernel config
      options or information about size or offset of some data structures etc.
      All this information is communicated to user space with an ELF note
      present in ELF /proc/vmcore file.
      
      Currently vmcoreinfo data size is 4096.  With some elf note meta data
      around it, actual size is 4132 bytes.  But we are using barely 25% of that
      size.  Rest is empty.  So even if we tell user space that size of ELF note
      is 4096 and not 4132, nothing will be broken because after around 1000
      bytes, everything is zero anyway.
      
      But once we start filling up the note to the capacity, and not report the
      full size of note, bad things will start happening.  Either some data will
      be lost or tools will be confused that they did not find the zero note at
      the end.
      
      So I think this change is safe and should not break existing tools.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Cc: Dan Aloni <da-x@monatomic.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: add sysctl to disable kexec_load · 7984754b
      Kees Cook authored
      For general-purpose (i.e.  distro) kernel builds it makes sense to build
      with CONFIG_KEXEC to allow end users to choose what kind of things they
      want to do with kexec.  However, in the face of trying to lock down a
      system with such a kernel, there needs to be a way to disable kexec_load
      (much like module loading can be disabled).  Without this, it is too easy
      for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
      and modules_disabled are set.  With this change, it is still possible to
      load an image for use later, then disable kexec_load so the image (or lack
      of image) can't be altered.
      
      The intention is for using this in environments where "perfect"
      enforcement is hard.  Without a verified boot, along with verified
      modules, and along with verified kexec, this is trying to give a system a
      better chance to defend itself (or at least grow the window of
      discoverability) against attack in the face of a privilege escalation.
      
      In my mind, I consider several boot scenarios:
      
      1) Verified boot of read-only verified root fs loading fd-based
         verification of kexec images.
      2) Secure boot of writable root fs loading signed kexec images.
      3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
      4) Regular boot with no control of kexec image at all.
      
      1 and 2 don't exist yet, but will soon once the verified kexec series has
      landed.  4 is the state of things now.  The gap between 2 and 4 is too
      large, so this change creates scenario 3, a middle-ground above 4 when 2
      and 1 are not possible for a system.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread() · 8d38f203
      Oleg Nesterov authored
      Change do_signal_stop() and do_sigaction() to avoid next_thread() and use
      while_each_thread() instead.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Reviewed-by: Sameer Nanda <snanda@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>