1. 28 1月, 2013 7 次提交
  2. 25 1月, 2013 1 次提交
    • A
      x86/MSI: Support multiple MSIs in presense of IRQ remapping · 51906e77
      Alexander Gordeev 提交于
      The MSI specification has several constraints in comparison with
      MSI-X, most notable of them is the inability to configure MSIs
      independently. As a result, it is impossible to dispatch
      interrupts from different queues to different CPUs. This is
      largely devalues the support of multiple MSIs in SMP systems.
      
      Also, a necessity to allocate a contiguous block of vector
      numbers for devices capable of multiple MSIs might cause a
      considerable pressure on x86 interrupt vector allocator and
      could lead to fragmentation of the interrupt vectors space.
      
      This patch overcomes both drawbacks in presense of IRQ remapping
      and lets devices take advantage of multiple queues and per-IRQ
      affinity assignments.
      Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c8bd86ff56b5fc118257436768aaa04489ac0a4c.1353324359.git.agordeev@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      51906e77
  3. 24 1月, 2013 2 次提交
  4. 23 1月, 2013 2 次提交
    • O
      ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 9899d11f
      Oleg Nesterov 提交于
      putreg() assumes that the tracee is not running and pt_regs_access() can
      safely play with its stack.  However a killed tracee can return from
      ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
      that debugger can actually read/modify the kernel stack until the tracee
      does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too and in some sense this
      race is even worse, the very fact the tracee can be woken up breaks the
      logic.
      
      As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
      call, this ensures that nobody can ever wakeup the tracee while the
      debugger looks at it.  Not only this fixes the mentioned problems, we
      can do some cleanups/simplifications in arch_ptrace() paths.
      
      Probably ptrace_unfreeze_traced() needs more callers, for example it
      makes sense to make the tracee killable for oom-killer before
      access_process_vm().
      
      While at it, add the comment into may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      Reported-by: NSalman Qazi <sqazi@google.com>
      Reported-by: NSuleiman Souhlal <suleiman@google.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9899d11f
    • W
      arm64: elf: fix core dumping to match what glibc expects · 9cf2b72b
      Will Deacon 提交于
      The kernel's internal definition of ELF_NGREG uses struct pt_regs, which
      means that we disagree with userspace on the size of coredumps since
      glibc correctly uses the user-visible struct user_pt_regs.
      
      This patch fixes our ELF_NGREG definition to use struct user_pt_regs
      and introduces our own ELF_CORE_COPY_REGS to convert between the user
      and kernel structure definitions.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      9cf2b72b
  5. 21 1月, 2013 1 次提交
  6. 18 1月, 2013 2 次提交
  7. 17 1月, 2013 1 次提交
    • A
      xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests. · 9174adbe
      Andrew Cooper 提交于
      This fixes CVE-2013-0190 / XSA-40
      
      There has been an error on the xen_failsafe_callback path for failed
      iret, which causes the stack pointer to be wrong when entering the
      iret_exc error path.  This can result in the kernel crashing.
      
      In the classic kernel case, the relevant code looked a little like:
      
              popl %eax      # Error code from hypervisor
              jz 5f
              addl $16,%esp
              jmp iret_exc   # Hypervisor said iret fault
      5:      addl $16,%esp
                             # Hypervisor said segment selector fault
      
      Here, there are two identical addls on either option of a branch which
      appears to have been optimised by hoisting it above the jz, and
      converting it to an lea, which leaves the flags register unaffected.
      
      In the PVOPS case, the code looks like:
      
              popl_cfi %eax         # Error from the hypervisor
              lea 16(%esp),%esp     # Add $16 before choosing fault path
              CFI_ADJUST_CFA_OFFSET -16
              jz 5f
              addl $16,%esp         # Incorrectly adjust %esp again
              jmp iret_exc
      
      It is possible unprivileged userspace applications to cause this
      behaviour, for example by loading an LDT code selector, then changing
      the code selector to be not-present.  At this point, there is a race
      condition where it is possible for the hypervisor to return back to
      userspace from an interrupt, fault on its own iret, and inject a
      failsafe_callback into the kernel.
      
      This bug has been present since the introduction of Xen PVOPS support
      in commit 5ead97c8 (xen: Core Xen implementation), in 2.6.23.
      Signed-off-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9174adbe
  8. 16 1月, 2013 4 次提交
  9. 14 1月, 2013 6 次提交
  10. 13 1月, 2013 3 次提交
  11. 12 1月, 2013 3 次提交
  12. 11 1月, 2013 3 次提交
  13. 10 1月, 2013 4 次提交
    • D
      perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side · a706d965
      David Ahern 提交于
      This patch is brought to you by the letter 'H'.
      
      Commit 20b279 breaks compatiblity with older perf binaries when run with
      precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
      set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
      samples) unless host only profiling is requested (:H modifier). The workaround
      for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
      toggles exclude_guest to 1). This was deemed unacceptable by Linus:
      
      https://lkml.org/lkml/2012/12/12/570
      
      Between family in town and the fresh snow in Breckenridge there is no time left
      to be working on the proper fix for this over the holidays. In the New Year I
      have more pressing problems to resolve -- like some memory leaks in perf which
      are proving to be elusive -- although the aforementioned snow is probably why
      they are proving to be elusive. Either way I do not have any spare time to work
      on this and from the time I have managed to spend on it the solution is more
      difficult than just moving to a new exclude_guest flag (does not work) or
      flipping the logic to include_guest (which is not as trivial as one would
      think).
      
      So, two options: silently force exclude_guest on as suggested by Gleb which
      means no impact to older perf binaries or revert the original patch which
      caused the breakage.
      
      This patch does the latter -- reverts the original patch that introduced the
      regression. The problem can be revisited in the future as time allows.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a706d965
    • W
      arm64: mm: introduce present, faulting entries for PAGE_NONE · a6fadf7e
      Will Deacon 提交于
      This is mostly a port of dbf62d50 ("ARM: mm: introduce L_PTE_VALID
      for page table entries") and 26ffd0d4 ("ARM: mm: introduce present,
      faulting entries for PAGE_NONE") from ARM, which makes use of present,
      faulting page table entries for page table entries mapped as PROT_NONE.
      
      The main difference with this implementation is that we can make use of
      the two pte type bits in order to avoid allocating a software bit for
      identifying PROT_NONE pages, instead reserving the 10b suffix for these
      types of mappings.
      
      This is required to prevent users from accessing such pages via syscalls
      such as read/write over a pipe.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      a6fadf7e
    • W
      arm64: mm: only wrprotect clean ptes if they are present · 02522463
      Will Deacon 提交于
      Marking non-present ptes as read-only can corrupt file ptes, breaking
      things like swap and file mappings.
      
      This patch ensures that we only manipulate user pte bits when the pte
      is marked present.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      02522463
    • W
      arm64: vdso: remove broken, redundant sequence counting for timezones · bdba0051
      Will Deacon 提交于
      This patch is an arm64 version of ce73ec6d ("powerpc/vdso: Remove
      redundant locking in update_vsyscall_tz()").
      
      Timezone data is not protected, so the sequence counter is not required
      to ensure consistency. Furthermore, having multiple paths updating the
      counter leads to a race between update_vsyscall and update_vsyscall_tz,
      so remove the timezone sequence counting from both the kernel and the
      vdso.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      bdba0051
  14. 08 1月, 2013 1 次提交