1. 08 Dec 2014: 2 commits
  2. 01 Dec 2014: 1 commit
  3. 25 Nov 2014: 1 commit
  4. 24 Nov 2014: 3 commits
    • x86_64, traps: Rework bad_iret · b645af2d
      Committed by Andy Lutomirski
      It's possible for iretq to userspace to fail.  This can happen because
      of a bad CS, SS, or RIP.
      
      Historically, we've handled it by fixing up an exception from iretq to
      land at bad_iret, which pretends that the failed iret frame was really
      the hardware part of #GP(0) from userspace.  To make this work, there's
      an extra fixup to fudge the gs base into a usable state.
      
      This is suboptimal because it loses the original exception.  It's also
      buggy because there's no guarantee that we were on the kernel stack to
      begin with.  For example, if the failing iret happened on return from an
      NMI, then we'll end up executing general_protection on the NMI stack.
      This is bad for several reasons, the most immediate of which is that
      general_protection, as a non-paranoid idtentry, will try to deliver
      signals and/or schedule from the wrong stack.
      
      This patch throws out bad_iret entirely.  As a replacement, it augments
      the existing swapgs fudge into a full-blown iret fixup, mostly written
      in C.  It should be clearer and more correct.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b645af2d
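      A minimal C sketch of the replacement fixup described above, closely
      modeled on the approach this commit takes (the struct layout and
      helper names are assumptions here, not a verbatim copy of the
      committed code):

        struct bad_iret_stack {
        	void *error_entry_ret;
        	struct pt_regs regs;
        };

        asmlinkage __visible
        struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
        {
        	/*
        	 * The failed iret may have happened on a per-CPU exception
        	 * stack (e.g. the NMI stack), so move our frame to the
        	 * task's normal kernel stack and pretend the exception
        	 * came from the iret target in userspace.
        	 */
        	struct bad_iret_stack *new_stack =
        		container_of(task_pt_regs(current),
        			     struct bad_iret_stack, regs);

        	/* Copy the hardware iret frame (RIP, CS, RFLAGS, RSP, SS). */
        	memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5 * 8);

        	/* Copy the rest of the saved register state. */
        	memmove(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));

        	/* The frame must now describe a return to user mode. */
        	BUG_ON(!user_mode_vm(&new_stack->regs));
        	return new_stack;
        }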
    • x86_64, traps: Stop using IST for #SS · 6f442be2
      Committed by Andy Lutomirski
      On a 32-bit kernel, this has no effect, since there are no IST stacks.
      
      On a 64-bit kernel, #SS can only happen in user code, on a failed iret
      to user space, on a canonical violation on access via RSP or RBP, or on
      a genuine stack segment violation in 32-bit kernel code.  The first two
      cases don't need IST, and the latter two are unlikely, fatal bugs;
      promoting them to double faults would be fine.
      
      This fixes a bug in which the espfix64 code mishandles a stack segment
      violation.
      
      This saves 4k of memory per CPU and a tiny bit of code.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f442be2
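      With #SS off the IST, the gate registration reduces to the ordinary
      form. A sketch of what the change amounts to (the helper names follow
      the kernel conventions of that era and are assumptions here):

        /* Before: #SS ran on its own 4k per-CPU IST stack. */
        set_intr_gate_ist(X86_TRAP_SS, &stack_segment, STACKFAULT_STACK);

        /* After: #SS uses the regular kernel stack like most traps. */
        set_intr_gate(X86_TRAP_SS, stack_segment);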
    • x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C · af726f21
      Committed by Andy Lutomirski
      There's nothing special enough about the espfix64 double fault fixup to
      justify writing it in assembly.  Move it to C.
      
      This also fixes a bug: if the double fault came from an IST stack, the
      old asm code would return to a partially uninitialized stack frame.
      
      Fixes: 3891a04a
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af726f21
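      A hedged sketch of the fixup as it might look once moved into the C
      double fault handler (the symbol names are assumptions based on the
      espfix64 design, not a verbatim copy of the committed code):

        #ifdef CONFIG_X86_ESPFIX64
        	extern unsigned char native_irq_return_iret[];

        	/*
        	 * If iret faulted while running on the espfix64 stack, the
        	 * non-IST fault was promoted to this double fault. Redirect
        	 * it to look like a #GP(0) from userspace, on the task's
        	 * normal stack rather than a partially initialized frame.
        	 */
        	if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
        	    regs->cs == __KERNEL_CS &&
        	    regs->ip == (unsigned long)native_irq_return_iret) {
        		struct pt_regs *normal_regs = task_pt_regs(current);

        		/* Fake a #GP(0) from userspace: copy the iret frame. */
        		memmove(&normal_regs->ip, (void *)regs->sp, 5 * 8);
        		normal_regs->orig_ax = 0;	/* error code 0 */
        		regs->ip = (unsigned long)general_protection;
        		regs->sp = (unsigned long)&normal_regs->orig_ax;
        		return;
        	}
        #endif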
  5. 23 Nov 2014: 2 commits
    • PCI/MSI: Rename mask/unmask_msi_irq treewide · 280510f1
      Committed by Thomas Gleixner
      The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
      to pci_msi_mask/unmask_irq to mark them as PCI-specific. Rename all
      usage sites. The conversion helper functions are kept around to avoid
      conflicts in linux-next and will be removed after merging into mainline.
      
      Coccinelle assisted conversion. No functional change.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: x86@kernel.org
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: Mohit Kumar <mohit.kumar@st.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
      280510f1
    • PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg() · 83a18912
      Committed by Jiang Liu
      Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
      specific.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      83a18912
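      Taken together, the two renames above are mechanical at every call
      site. A before/after sketch (the irq chip wiring is illustrative; the
      new names follow the commit messages above):

        /* Before: generic-sounding names. */
        chip->irq_mask   = mask_msi_irq;
        chip->irq_unmask = unmask_msi_irq;
        write_msi_msg(irq, &msg);

        /* After: explicitly PCI-specific names; no functional change. */
        chip->irq_mask   = pci_msi_mask_irq;
        chip->irq_unmask = pci_msi_unmask_irq;
        pci_write_msi_msg(irq, &msg);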
  6. 21 Nov 2014: 1 commit
  7. 20 Nov 2014: 2 commits
    • x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll · fa92c586
      Committed by Chen Yucong
      Uncorrected no action required (UCNA) is an uncorrected recoverable
      machine check error that is not signaled via a machine check exception
      and, instead, is reported to system software as a corrected machine
      check error. UCNA errors indicate that some data in the system is
      corrupted, but the data has not been consumed and the processor state
      is valid and you may continue execution on this processor. UCNA errors
      require no action from system software to continue execution. Note that
      UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
      (MCG_SER_P) is set.
                                                     -- Intel SDM Volume 3B
      
      Deferred errors are errors that cannot be corrected by hardware, but
      do not cause an immediate interruption in program flow, loss of data
      integrity, or corruption of processor state. These errors indicate
      that data has been corrupted but not consumed. Hardware writes information
      to the status and address registers in the corresponding bank that
      identifies the source of the error if deferred errors are enabled for
      logging. Deferred errors are not reported via machine check exceptions;
      they can be seen by polling the MCi_STATUS registers.
                                                      -- AMD64 APM Volume 2
      
      As the two excerpts above describe, both UCNA and Deferred errors are
      detected errors that cannot be corrected by hardware, which makes them
      very similar to Software Recoverable Action Optional (SRAO) errors.
      Therefore, we can reuse some of the actions already used for handling
      SRAO errors to handle UCNA and Deferred errors.
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Chen Yucong <slaoub@gmail.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      fa92c586
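      A hedged sketch of the classification the poller can now make (bit
      positions follow the quoted manuals; the helpers are illustrative
      kernel-context code, not the commit's actual implementation):

        #define MCG_SER_P           (1ULL << 24) /* IA32_MCG_CAP[24] */
        #define MCI_STATUS_UC       (1ULL << 61)
        #define MCI_STATUS_PCC      (1ULL << 57)
        #define MCI_STATUS_S        (1ULL << 56)
        #define MCI_STATUS_AR       (1ULL << 55)
        #define MCI_STATUS_DEFERRED (1ULL << 44) /* AMD deferred error */

        /* UCNA: uncorrected (UC=1) but not signaled (S=0), no action
         * required (AR=0), processor context valid (PCC=0); only
         * meaningful when the CPU advertises MCG_SER_P. */
        static bool mce_is_ucna(u64 status, u64 mcg_cap)
        {
        	return (mcg_cap & MCG_SER_P) && (status & MCI_STATUS_UC) &&
        	       !(status & (MCI_STATUS_S | MCI_STATUS_AR |
        			   MCI_STATUS_PCC));
        }

        /* AMD deferred errors never raise #MC; they are only visible
         * when machine_check_poll() reads the MCi_STATUS registers. */
        static bool mce_is_deferred(u64 status)
        {
        	return !!(status & MCI_STATUS_DEFERRED);
        }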
    • x86, mce, severity: Extend the mce_severity mechanism to handle UCNA/DEFERRED error · e3480271
      Committed by Chen Yucong
      Until now, the mce_severity mechanism could only identify the severity
      of a UCNA error as MCE_KEEP_SEVERITY, and it was not able to filter
      out DEFERRED errors on AMD platforms.
      
      This patch extends the mce_severity mechanism to handle
      UCNA/DEFERRED errors. In order to do this, the patch introduces a new
      severity level: MCE_UCNA/DEFERRED_SEVERITY.

      In addition, mce_severity is specific to the machine check exception,
      and it checks the MCIP/EIPV/RIPV bits. In order to use the
      mce_severity mechanism in non-exception context, the patch also
      introduces a new argument (is_excp) for mce_severity. `is_excp' is
      used to explicitly specify the calling context of mce_severity.
      Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Signed-off-by: Chen Yucong <slaoub@gmail.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      e3480271
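      A sketch of the extended interface implied by the message (the alias
      between the two new severity names and the exact enum layout are
      assumptions):

        enum severity_level {
        	MCE_NO_SEVERITY,
        	MCE_DEFERRED_SEVERITY,				/* new */
        	MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY,	/* new alias */
        	MCE_KEEP_SEVERITY,
        	MCE_SOME_SEVERITY,
        	MCE_AO_SEVERITY,
        	MCE_UC_SEVERITY,
        	MCE_AR_SEVERITY,
        	MCE_PANIC_SEVERITY,
        };

        /*
         * is_excp is true when called from the #MC exception handler,
         * where MCIP/EIPV/RIPV are meaningful, and false when called
         * from polling context, where those bits must not be graded.
         */
        int mce_severity(struct mce *m, int tolerant, char **msg,
        		 bool is_excp);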
  8. 19 Nov 2014: 1 commit
  9. 18 Nov 2014: 2 commits
    • x86, mpx: On-demand kernel allocation of bounds tables · fe3d197f
      Committed by Dave Hansen
      This is really the meat of the MPX patch set.  If there is one patch to
      review in the entire series, this is the one.  There is a new ABI here
      and this kernel code also interacts with userspace memory in a
      relatively unusual manner.  (small FAQ below).
      
      Long Description:
      
      This patch adds two prctl() commands to enable or disable kernel
      management of bounds tables, including on-demand kernel allocation
      (see the patch "on-demand kernel allocation of bounds tables") and
      cleanup (see the patch "cleanup unused bound tables"). Applications
      do not strictly need the kernel to manage bounds tables, and we expect
      some applications to use MPX without taking advantage of this kernel
      support. This means the kernel cannot simply infer whether an application
      needs bounds table management from the MPX registers.  The prctl() is an
      explicit signal from userspace.
      
      PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace that
      it requires the kernel's help in managing bounds tables.

      PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace no
      longer wants the kernel's help. With PR_MPX_DISABLE_MANAGEMENT, the
      kernel won't allocate and free bounds tables even if the CPU supports
      MPX.
      
      PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
      directory out of a userspace register (bndcfgu) and then cache it in
      a new field (->bd_addr) in the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
      will set "bd_addr" to an invalid address.  Using this scheme, we can
      use "bd_addr" to determine whether kernel management of bounds tables
      is enabled.

      Also, the only way to access the bndcfgu register is via xsaves,
      which can be expensive.  Caching "bd_addr" like this also helps reduce
      the cost of those xsaves when doing table cleanup at munmap() time.
      Unfortunately, we cannot apply this optimization to #BR fault time
      because we need an xsave to get the value of BNDSTATUS.
      
      ==== Why does the hardware even have these Bounds Tables? ====
      
      MPX only has 4 hardware registers for storing bounds information.
      If MPX-enabled code needs more than these 4 registers, it needs to
      spill them somewhere. It has two special instructions for this
      which allow the bounds to be moved between the bounds registers
      and some new "bounds tables".
      
      These #BR exceptions are conceptually similar to page faults: the MPX
      hardware raises them both on bounds violations and when the tables
      are not present. This patch handles the #BR exceptions for
      not-present tables by carving the space out of the normal process's
      address space (essentially calling the new mmap() interface introduced
      earlier in this patch set) and then pointing the bounds directory
      at it.
      
      The tables *need* to be accessed and controlled by userspace because
      the instructions for moving bounds in and out of them are extremely
      frequent. They potentially happen every time a register pointing to
      memory is dereferenced. Any direct kernel involvement (like a syscall)
      to access the tables would obviously destroy performance.
      
      ==== Why not do this in userspace? ====
      
      This patch is obviously doing this allocation in the kernel.
      However, MPX does not strictly *require* anything in the kernel.
      It can theoretically be done completely from userspace. Here are
      a few ways this *could* be done. I don't think any of them are
      practical in the real-world, but here they are.
      
      Q: Can virtual space simply be reserved for the bounds tables so
         that we never have to allocate them?
      A: As noted earlier, these tables are *HUGE*. An X-GB virtual
         area needs 4*X GB of virtual space, plus 2GB for the bounds
         directory. If we were to preallocate them for the 128TB of
         user virtual address space, we would need to reserve 512TB+2GB,
         which is larger than the entire virtual address space today.
         This means they cannot be reserved ahead of time. Also, a
         single process's pre-populated bounds directory consumes 2GB
         of virtual *AND* physical memory. IOW, it's completely
         infeasible to prepopulate bounds directories.
      
      Q: Can we preallocate bounds table space at the same time memory
         is allocated which might contain pointers that might eventually
         need bounds tables?
      A: This would work if we could hook the site of each and every
         memory allocation syscall. This can be done for small,
         constrained applications. But, it isn't practical at a larger
         scale since a given app has no way of controlling how all the
         parts of the app might allocate memory (think libraries). The
         kernel is really the only place to intercept these calls.
      
      Q: Could a bounds fault be handed to userspace and the tables
         allocated there in a signal handler instead of in the kernel?
      A: (thanks to tglx) mmap() is not on the list of safe async
         handler functions and even if mmap() would work it still
         requires locking or nasty tricks to keep track of the
         allocation state there.
      
      Having ruled out all of the userspace-only approaches for managing
      bounds tables that we could think of, we create them on demand in
      the kernel.
      Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fe3d197f
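      A hedged userspace sketch of the new ABI (the prctl command values
      below match the ones merged for MPX, but treat them as assumptions if
      building against other headers):

        #include <stdio.h>
        #include <sys/prctl.h>

        #ifndef PR_MPX_ENABLE_MANAGEMENT
        #define PR_MPX_ENABLE_MANAGEMENT  43
        #define PR_MPX_DISABLE_MANAGEMENT 44
        #endif

        int main(void)
        {
        	/* Ask the kernel to manage our bounds tables: it reads the
        	 * bounds-directory base from bndcfgu and caches it in
        	 * mm->bd_addr. */
        	if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0) != 0) {
        		perror("PR_MPX_ENABLE_MANAGEMENT");
        		return 1;
        	}

        	/* ... run MPX-instrumented code; bounds tables are now
        	 * allocated on demand from the #BR handler ... */

        	/* Opt back out; the kernel stops managing the tables. */
        	prctl(PR_MPX_DISABLE_MANAGEMENT, 0, 0, 0, 0);
        	return 0;
        }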
    • x86: Remove arbitrary instruction size limit in instruction decoder · 6ba48ff4
      Committed by Dave Hansen
      The current x86 instruction decoder steps along through the
      instruction stream but always ensures that it never steps farther
      than the largest possible instruction size (MAX_INSN_SIZE).
      
      The MPX code is now going to be doing some decoding of userspace
      instructions.  We copy those from userspace in to the kernel and
      they're obviously completely untrusted coming from userspace.  In
      addition to the constraint that instructions can only be so long,
      we also have to be aware of how long the buffer is that came in
      from userspace.  This _looks_ to be similar to what the perf and
      kprobes code is doing, but it's unclear to me whether they are
      affected.
      
      The whole reason we need this is that it is perfectly valid to be
      executing an instruction within MAX_INSN_SIZE bytes of an
      unreadable page. We should be able to gracefully handle short
      reads in those cases.
      
      This adds support to the decoder to record how long the buffer
      being decoded is and to refuse to "validate" the instruction if
      we would have gone over the end of the buffer to decode it.
      
      The kprobes code probably needs to be looked at here a bit more
      carefully.  This patch still respects the MAX_INSN_SIZE limit
      there but the kprobes code does look like it might be able to
      be a bit more strict than it currently is.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Jim Keniston <jkenisto@us.ibm.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6ba48ff4
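      A sketch of the idea (the kernel's insn_init() gained a buffer-length
      parameter in this patch; the cursor type and fetch helper below are
      illustrative, not the decoder's actual internals):

        #include <errno.h>

        struct insn_cursor {
        	const unsigned char *kaddr; /* bytes copied from userspace */
        	int buf_len;                /* how many of them are valid */
        	int next_byte;              /* current decode position    */
        };

        /* Refuse to step past the bytes that were actually copied in,
         * even when MAX_INSN_SIZE would permit further decoding: a
         * short read near an unreadable page must fail validation
         * rather than consume uninitialized bytes. */
        static int insn_next_byte(struct insn_cursor *c, unsigned char *out)
        {
        	if (c->next_byte >= c->buf_len)
        		return -ENODATA;
        	*out = c->kaddr[c->next_byte++];
        	return 0;
        }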
  10. 16 Nov 2014: 8 commits
  11. 12 Nov 2014: 4 commits
  12. 10 Nov 2014: 3 commits
  13. 06 Nov 2014: 1 commit
  14. 05 Nov 2014: 5 commits
  15. 04 Nov 2014: 3 commits
  16. 03 Nov 2014: 1 commit