1. 19 6月, 2017 1 次提交
    • H
      mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins 提交于
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
      used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
      which is 256kB or stack strings with MAX_ARG_STRLEN.
      
      This will become especially dangerous for suid binaries and the default
      no limit for the stack size limit because those applications can be
      tricked to consume a large portion of the stack and a single glibc call
      could jump over the guard page. These attacks are not theoretical,
      unfortunatelly.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and and the vma tree's subtree_gap support for that.
      Original-patch-by: NOleg Nesterov <oleg@redhat.com>
      Original-patch-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1be7107f
  2. 16 6月, 2017 1 次提交
  3. 15 6月, 2017 3 次提交
  4. 14 6月, 2017 1 次提交
  5. 13 6月, 2017 2 次提交
  6. 11 6月, 2017 2 次提交
    • W
      KVM: async_pf: avoid async pf injection when in guest mode · 9bc1f09f
      Wanpeng Li 提交于
       INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
             Not tainted 4.12.0-rc4+ #8
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       gnome-terminal- D    0  1734   1015 0x00000000
       Call Trace:
        __schedule+0x3cd/0xb30
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? __vfs_read+0x37/0x150
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
      
      This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
      and then gives stress to memory on L1, I can observed this hang on L1 when
      at least ~70% swap area is occupied on L0.
      
      This is due to async pf was injected to L2 which should be injected to L1,
      L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
      actually), and L1 guest starts accumulating tasks stuck in D state in
      kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
      
      This patch fixes the hang by doing async pf when executing L1 guest.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9bc1f09f
    • G
      hexagon: Use raw_copy_to_user · 4d801cca
      Guenter Roeck 提交于
      Commit ac4691fa ("hexagon: switch to RAW_COPY_USER") replaced
      __copy_to_user_hexagon() with raw_copy_to_user(), but did not catch
      all callers, resulting in the following build error.
      
      arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon':
      arch/hexagon/mm/uaccess.c:40:3: error:
      	implicit declaration of function '__copy_to_user_hexagon'
      
      Fixes: ac4691fa ("hexagon: switch to RAW_COPY_USER")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: NRichard Kuo <rkuo@codeaurora.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      4d801cca
  7. 09 6月, 2017 1 次提交
  8. 08 6月, 2017 13 次提交
  9. 07 6月, 2017 13 次提交
  10. 06 6月, 2017 3 次提交
    • W
      KVM: nVMX: Fix exception injection · d4912215
      Wanpeng Li 提交于
       WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
       RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       Call Trace:
        ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
        ? rcu_read_lock_sched_held+0x79/0x80
        vmx_queue_exception+0x104/0x160 [kvm_intel]
        ? vmx_queue_exception+0x104/0x160 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This is triggered occasionally by running both win7 and win2016 in L2, in
      addition, EPT is disabled on both L1 and L2. It can't be reproduced easily.
      
      Commit 0b6ac343 (KVM: nVMX: Correct handling of exception injection) mentioned
      that "KVM wants to inject page-faults which it got to the guest. This function
      assumes it is called with the exit reason in vmcs02 being a #PF exception".
      Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to
      L2) allows to check all exceptions for intercept during delivery to L2. However,
      there is no guarantee the exit reason is exception currently, when there is an
      external interrupt occurred on host, maybe a time interrupt for host which should
      not be injected to guest, and somewhere queues an exception, then the function
      nested_vmx_check_exception() will be called and the vmexit emulation codes will
      try to emulate the "Acknowledge interrupt on exit" behavior, the warning is
      triggered.
      
      Reusing the exit reason from the L2->L0 vmexit is wrong in this case,
      the reason must always be EXCEPTION_NMI when injecting an exception into
      L1 as a nested vmexit.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Fixes: e011c663 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d4912215
    • P
      kvm: async_pf: fix rcu_irq_enter() with irqs enabled · bbaf0e2b
      Paolo Bonzini 提交于
      native_safe_halt enables interrupts, and you just shouldn't
      call rcu_irq_enter() with interrupts enabled.  Reorder the
      call with the following local_irq_disable() to respect the
      invariant.
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      bbaf0e2b
    • M
      powerpc/perf: Fix Power9 test_adder fields · 8c218578
      Madhavan Srinivasan 提交于
      Commit 8d911904 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
      was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
      the use of PMC5 using raw event code. But instead of updating the
      power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
      power9_pmu structure. Fix it.
      
      Fixes: 8d911904 ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
      Reported-by: NShriya <shriyak@linux.vnet.ibm.com>
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Tested-by: NShriya <shriyak@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8c218578