1. 11 12月, 2014 1 次提交
    • P
      kernel: add panic_on_warn · 9e3961a0
      Prarit Bhargava 提交于
      There have been several times where I have had to rebuild a kernel to
      cause a panic when hitting a WARN() in the code in order to get a crash
      dump from a system.  Sometimes this is easy to do, other times (such as
      in the case of a remote admin) it is not trivial to send new images to
      the user.
      
      A much easier method would be a switch to change the WARN() over to a
      panic.  This makes debugging easier in that I can now test the actual
      image the WARN() was seen on and I do not have to engage in remote
      debugging.
      
      This patch adds a panic_on_warn kernel parameter and
      /proc/sys/kernel/panic_on_warn calls panic() in the
      warn_slowpath_common() path.  The function will still print out the
      location of the warning.
      
      An example of the panic_on_warn output:
      
      The first line below is from the WARN_ON() to output the WARN_ON()'s
      location.  After that the panic() output is displayed.
      
          WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
          Kernel panic - not syncing: panic_on_warn set ...
      
          CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ #57
          Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
           0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
           0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
           ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
          Call Trace:
           [<ffffffff81665190>] dump_stack+0x46/0x58
           [<ffffffff8165e2ec>] panic+0xd0/0x204
           [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
           [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
           [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
           [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
           [<ffffffff81002144>] do_one_initcall+0xd4/0x210
           [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
           [<ffffffff810f8889>] load_module+0x16a9/0x1b30
           [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
           [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
           [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
           [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17
      
      Successfully tested by me.
      
      hpa said: There is another very valid use for this: many operators would
      rather a machine shuts down than being potentially compromised either
      functionally or security-wise.
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e3961a0
  2. 10 12月, 2014 1 次提交
  3. 09 12月, 2014 11 次提交
  4. 08 12月, 2014 1 次提交
  5. 05 12月, 2014 1 次提交
  6. 04 12月, 2014 2 次提交
  7. 02 12月, 2014 2 次提交
  8. 28 11月, 2014 1 次提交
    • A
      arm64: ptrace: add NT_ARM_SYSTEM_CALL regset · 766a85d7
      AKASHI Takahiro 提交于
      This regeset is intended to be used to get and set a system call number
      while tracing.
      There was some discussion about possible approaches to do so:
      
      (1) modify x8 register with ptrace(PTRACE_SETREGSET) indirectly,
          and update regs->syscallno later on in syscall_trace_enter(), or
      (2) define a dedicated regset for this purpose as on s390, or
      (3) support ptrace(PTRACE_SET_SYSCALL) as on arch/arm
      
      Thinking of the fact that user_pt_regs doesn't expose 'syscallno' to
      tracer as well as that secure_computing() expects a changed syscall number,
      especially case of -1, to be visible before this function returns in
      syscall_trace_enter(), (1) doesn't work well.
      We will take (2) since it looks much cleaner.
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      766a85d7
  9. 25 11月, 2014 1 次提交
  10. 21 11月, 2014 1 次提交
  11. 20 11月, 2014 3 次提交
    • D
      dlm: adopt orphan locks · 2ab4bd8e
      David Teigland 提交于
      A process may exit, leaving an orphan lock in the lockspace.
      This adds the capability for another process to acquire the
      orphan lock.  Acquiring the orphan just moves the lock from
      the orphan list onto the acquiring process's list of locks.
      
      An adopting process must specify the resource name and mode
      of the lock it wants to adopt.  If a matching lock is found,
      the lock is moved to the caller's 's list of locks, and the
      lkid of the lock is returned like the lkid of a new lock.
      
      If an orphan with a different mode is found, then -EAGAIN is
      returned.  If no orphan lock is found on the resource, then
      -ENOENT is returned.  No async completion is used because
      the result is immediately available.
      
      Also, when orphans are purged, allow a zero nodeid to refer
      to the local nodeid so the caller does not need to look up
      the local nodeid.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      2ab4bd8e
    • M
      dm: enhance internal suspend and resume interface · ffcc3936
      Mike Snitzer 提交于
      Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast
      -- dm-stats will continue using these methods to avoid all the extra
      suspend/resume logic that is not needed in order to quickly flush IO.
      
      Introduce dm_internal_suspend_noflush() variant that actually calls the
      mapped_device's target callbacks -- otherwise target-specific hooks are
      avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend).  Common
      code between dm_internal_{suspend_noflush,resume} and
      dm_{suspend,resume} was factored out as __dm_{suspend,resume}.
      
      Update dm_internal_{suspend_noflush,resume} to always take and release
      the mapped_device's suspend_lock.  Also update dm_{suspend,resume} to be
      aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond
      accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to
      be cleared.  Add lockdep annotation to dm_suspend() and dm_resume().
      
      The existing DM_SUSPEND_FLAG remains unchanged.
      DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and
      cleared by dm_internal_resume().
      
      Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device
      was already suspended when dm_internal_suspend_noflush() was called --
      this can be thought of as a "nested suspend".  A "nested suspend" can
      occur with legacy userspace dm-thin code that might suspend all active
      thin volumes before suspending the pool for resize.
      
      But otherwise, in the normal dm-thin-pool suspend case moving forward:
      the thin-pool will have DM_SUSPEND_FLAG set and all active thins from
      that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.
      
      Also add DM_INTERNAL_SUSPEND_FLAG to status report.  This new
      DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with
      debugging (e.g. 'dmsetup info' will report an internally suspended
      device accordingly).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      ffcc3936
    • M
      dm: add presuspend_undo hook to target_type · d67ee213
      Mike Snitzer 提交于
      The DM thin-pool target now must undo the changes performed during
      pool_presuspend() so introduce presuspend_undo hook in target_type.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      d67ee213
  12. 18 11月, 2014 2 次提交
    • D
      x86, mpx: On-demand kernel allocation of bounds tables · fe3d197f
      Dave Hansen 提交于
      This is really the meat of the MPX patch set.  If there is one patch to
      review in the entire series, this is the one.  There is a new ABI here
      and this kernel code also interacts with userspace memory in a
      relatively unusual manner.  (small FAQ below).
      
      Long Description:
      
      This patch adds two prctl() commands to provide enable or disable the
      management of bounds tables in kernel, including on-demand kernel
      allocation (See the patch "on-demand kernel allocation of bounds tables")
      and cleanup (See the patch "cleanup unused bound tables"). Applications
      do not strictly need the kernel to manage bounds tables and we expect
      some applications to use MPX without taking advantage of this kernel
      support. This means the kernel can not simply infer whether an application
      needs bounds table management from the MPX registers.  The prctl() is an
      explicit signal from userspace.
      
      PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
      require kernel's help in managing bounds tables.
      
      PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
      want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
      won't allocate and free bounds tables even if the CPU supports MPX.
      
      PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
      directory out of a userspace register (bndcfgu) and then cache it into
      a new field (->bd_addr) in  the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
      will set "bd_addr" to an invalid address.  Using this scheme, we can
      use "bd_addr" to determine whether the management of bounds tables in
      kernel is enabled.
      
      Also, the only way to access that bndcfgu register is via an xsaves,
      which can be expensive.  Caching "bd_addr" like this also helps reduce
      the cost of those xsaves when doing table cleanup at munmap() time.
      Unfortunately, we can not apply this optimization to #BR fault time
      because we need an xsave to get the value of BNDSTATUS.
      
      ==== Why does the hardware even have these Bounds Tables? ====
      
      MPX only has 4 hardware registers for storing bounds information.
      If MPX-enabled code needs more than these 4 registers, it needs to
      spill them somewhere. It has two special instructions for this
      which allow the bounds to be moved between the bounds registers
      and some new "bounds tables".
      
      They are similar conceptually to a page fault and will be raised by
      the MPX hardware during both bounds violations or when the tables
      are not present. This patch handles those #BR exceptions for
      not-present tables by carving the space out of the normal processes
      address space (essentially calling the new mmap() interface indroduced
      earlier in this patch set.) and then pointing the bounds-directory
      over to it.
      
      The tables *need* to be accessed and controlled by userspace because
      the instructions for moving bounds in and out of them are extremely
      frequent. They potentially happen every time a register pointing to
      memory is dereferenced. Any direct kernel involvement (like a syscall)
      to access the tables would obviously destroy performance.
      
      ==== Why not do this in userspace? ====
      
      This patch is obviously doing this allocation in the kernel.
      However, MPX does not strictly *require* anything in the kernel.
      It can theoretically be done completely from userspace. Here are
      a few ways this *could* be done. I don't think any of them are
      practical in the real-world, but here they are.
      
      Q: Can virtual space simply be reserved for the bounds tables so
         that we never have to allocate them?
      A: As noted earlier, these tables are *HUGE*. An X-GB virtual
         area needs 4*X GB of virtual space, plus 2GB for the bounds
         directory. If we were to preallocate them for the 128TB of
         user virtual address space, we would need to reserve 512TB+2GB,
         which is larger than the entire virtual address space today.
         This means they can not be reserved ahead of time. Also, a
         single process's pre-popualated bounds directory consumes 2GB
         of virtual *AND* physical memory. IOW, it's completely
         infeasible to prepopulate bounds directories.
      
      Q: Can we preallocate bounds table space at the same time memory
         is allocated which might contain pointers that might eventually
         need bounds tables?
      A: This would work if we could hook the site of each and every
         memory allocation syscall. This can be done for small,
         constrained applications. But, it isn't practical at a larger
         scale since a given app has no way of controlling how all the
         parts of the app might allocate memory (think libraries). The
         kernel is really the only place to intercept these calls.
      
      Q: Could a bounds fault be handed to userspace and the tables
         allocated there in a signal handler instead of in the kernel?
      A: (thanks to tglx) mmap() is not on the list of safe async
         handler functions and even if mmap() would work it still
         requires locking or nasty tricks to keep track of the
         allocation state there.
      
      Having ruled out all of the userspace-only approaches for managing
      bounds tables that we could think of, we create them on demand in
      the kernel.
      Based-on-patch-by: NQiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      fe3d197f
    • Q
      mpx: Extend siginfo structure to include bound violation information · ee1b58d3
      Qiaowei Ren 提交于
      This patch adds new fields about bound violation into siginfo
      structure. si_lower and si_upper are respectively lower bound
      and upper bound when bound violation is caused.
      Signed-off-by: NQiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151819.1908C900@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      ee1b58d3
  13. 16 11月, 2014 2 次提交
  14. 15 11月, 2014 3 次提交
  15. 11 11月, 2014 1 次提交
  16. 06 11月, 2014 1 次提交
  17. 04 11月, 2014 1 次提交
  18. 29 10月, 2014 1 次提交
  19. 28 10月, 2014 2 次提交
  20. 24 10月, 2014 2 次提交