1. 23 6月, 2017 14 次提交
  2. 19 6月, 2017 1 次提交
    • H
      mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins 提交于
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
      used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
      which is 256kB or stack strings with MAX_ARG_STRLEN.
      
      This will become especially dangerous for suid binaries and the default
      no limit for the stack size limit because those applications can be
      tricked to consume a large portion of the stack and a single glibc call
      could jump over the guard page. These attacks are not theoretical,
      unfortunatelly.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and and the vma tree's subtree_gap support for that.
      Original-patch-by: NOleg Nesterov <oleg@redhat.com>
      Original-patch-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1be7107f
  3. 13 6月, 2017 1 次提交
  4. 08 6月, 2017 1 次提交
  5. 06 6月, 2017 1 次提交
  6. 05 6月, 2017 1 次提交
    • C
      x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC · ae1d557d
      Christian Sünkenberg 提交于
      A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
      report an inverted Device ID in its DIR0 configuration register,
      specifically 0xb instead of the expected 0x4.
      
      Catch this presumably quirky version so it's properly recognized
      as GX1 and has its cache switched to write-back mode, which provides
      a significant performance boost in most workloads.
      
      SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
      states in section 1.1.7.1 "Device ID" that device identification
      values are specified in SC1100's device errata. These, however,
      seem to not have been publicly released.
      
      Wading through a number of boot logs and /proc/cpuinfo dumps found on
      pastebin and blogs, this patch should mostly be relevant for a number
      of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
      the latter being the platform this issue was discovered on.
      Performance impact was verified using "openssl speed", with
      write-back caching scaling throughput between -3% and +41%.
      Signed-off-by: NChristian Sünkenberg <christian.suenkenberg@student.kit.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ae1d557d
  7. 29 5月, 2017 2 次提交
  8. 27 5月, 2017 2 次提交
  9. 26 5月, 2017 1 次提交
  10. 24 5月, 2017 2 次提交
  11. 22 5月, 2017 1 次提交
    • B
      x86/MCE: Export memory_error() · 2d1f4061
      Borislav Petkov 提交于
      Export the function which checks whether an MCE is a memory error to
      other users so that we can reuse the logic. Drop the boot_cpu_data use,
      while at it, as mce.cpuvendor already has the CPU vendor in there.
      
      Integrate a piece from a patch from Vishal Verma
      <vishal.l.verma@intel.com> to export it for modules (nfit).
      
      The main reason we're exporting it is that the nfit handler
      nfit_handle_mce() needs to detect a memory error properly before doing
      its recovery actions.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2d1f4061
  12. 15 5月, 2017 1 次提交
    • W
      KVM: x86: Fix load damaged SSEx MXCSR register · a575813b
      Wanpeng Li 提交于
      Reported by syzkaller:
      
         BUG: unable to handle kernel paging request at ffffffffc07f6a2e
         IP: report_bug+0x94/0x120
         PGD 348e12067
         P4D 348e12067
         PUD 348e14067
         PMD 3cbd84067
         PTE 80000003f7e87161
      
         Oops: 0003 [#1] SMP
         CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G           OE   4.11.0+ #8
         task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
         RIP: 0010:report_bug+0x94/0x120
         RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
          do_trap+0x156/0x170
          do_error_trap+0xa3/0x170
          ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
          ? mark_held_locks+0x79/0xa0
          ? retint_kernel+0x10/0x10
          ? trace_hardirqs_off_thunk+0x1a/0x1c
          do_invalid_op+0x20/0x30
          invalid_op+0x1e/0x30
         RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
          ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
          kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
          kvm_vcpu_ioctl+0x384/0x780 [kvm]
          ? kvm_vcpu_ioctl+0x384/0x780 [kvm]
          ? sched_clock+0x13/0x20
          ? __do_page_fault+0x2a0/0x550
          do_vfs_ioctl+0xa4/0x700
          ? up_read+0x1f/0x40
          ? __do_page_fault+0x2a0/0x550
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
      
      SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
      a 1 to any of these bits will cause a general-protection exception(#GP) to be
      generated". The syzkaller forks' testcase overrides xsave area w/ random values
      and steps on the reserved bits of MXCSR register. The damaged MXCSR register
      values of guest will be restored to SSEx MXCSR register before vmentry. This
      patch fixes it by catching userspace override MXCSR register reserved bits w/
      random values and bails out immediately.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a575813b
  13. 13 5月, 2017 1 次提交
  14. 11 5月, 2017 1 次提交
  15. 09 5月, 2017 4 次提交
  16. 08 5月, 2017 2 次提交
    • X
      x86/kexec/64: Use gbpages for identity mappings if available · 8638100c
      Xunlei Pang 提交于
      Kexec sets up all identity mappings before booting into the new
      kernel, and this will cause extra memory consumption for paging
      structures which is quite considerable on modern machines with
      huge memory sizes.
      
      E.g. on a 32TB machine that is kdumping, it could waste around
      128MB (around 4MB/TB) from the reserved memory after kexec sets
      all the identity mappings using the current 2MB page.
      
      Add to that the memory needed for the loaded kdump kernel, initramfs,
      etc., and it causes a kexec syscall -NOMEM failure.
      
      As a result, we had to enlarge reserved memory via "crashkernel=X"
      to work around this problem.
      
      This causes some trouble for distributions that use policies
      to evaluate the proper "crashkernel=X" value for users.
      
      So enable gbpages for kexec mappings.
      Signed-off-by: NXunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8638100c
    • X
      x86/mm: Add support for gbpages to kernel_ident_mapping_init() · 66aad4fd
      Xunlei Pang 提交于
      Kernel identity mappings on x86-64 kernels are created in two
      ways: by the early x86 boot code, or by kernel_ident_mapping_init().
      
      Native kernels (which is the dominant usecase) use the former,
      but the kexec and the hibernation code uses kernel_ident_mapping_init().
      
      There's a subtle difference between these two ways of how identity
      mappings are created, the current kernel_ident_mapping_init() code
      creates identity mappings always using 2MB page(PMD level) - while
      the native kernel boot path also utilizes gbpages where available.
      
      This difference is suboptimal both for performance and for memory
      usage: kernel_ident_mapping_init() needs to allocate pages for the
      page tables when creating the new identity mappings.
      
      This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
      to address these concerns.
      
      The primary advantage would be better TLB coverage/performance,
      because we'd utilize 1GB TLBs instead of 2MB ones.
      
      It is also useful for machines with large number of memory to
      save paging structure allocations(around 4MB/TB using 2MB page)
      when setting identity mappings for all the memory, after using
      1GB page it will consume only 8KB/TB.
      
      ( Note that this change alone does not activate gbpages in kexec,
        we are doing that in a separate patch. )
      Signed-off-by: NXunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      66aad4fd
  17. 02 5月, 2017 4 次提交