1. 02 3月, 2018 4 次提交
  2. 24 2月, 2018 12 次提交
    • B
      KVM: SVM: Fix SEV LAUNCH_SECRET command · 9c5e0afa
      Brijesh Singh 提交于
      The SEV LAUNCH_SECRET command fails with error code 'invalid param'
      because we missed filling the guest and header system physical address
      while issuing the command.
      
      Fixes: 9f5b5b95 (KVM: SVM: Add support for SEV LAUNCH_SECRET command)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9c5e0afa
    • B
      KVM: SVM: install RSM intercept · 7607b717
      Brijesh Singh 提交于
      RSM instruction is used by the SMM handler to return from SMM mode.
      Currently, rsm causes a #UD - which results in instruction fetch, decode,
      and emulate. By installing the RSM intercept we can avoid the instruction
      fetch since we know that #VMEXIT was due to rsm.
      
      The patch is required for the SEV guest, because in case of SEV guest
      memory is encrypted with guest-specific key and hypervisor will not
      able to fetch the instruction bytes from the guest memory.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7607b717
    • B
      KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command · 3e233385
      Brijesh Singh 提交于
      Using the access_ok() to validate the input before issuing the SEV
      command does not buy us anything in this case. If userland is
      giving us a garbage pointer then copy_to_user() will catch it when we try
      to return the measurement.
      Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Fixes: 0d0736f7 (KVM: SVM: Add support for KVM_SEV_LAUNCH_MEASURE ...)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3e233385
    • W
      KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled · 4f2f61fc
      Wanpeng Li 提交于
      Avoid traversing all the cpus for pv tlb flush when steal time
      is disabled since pv tlb flush depends on the field in steal time
      for shared data.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim KrÄmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4f2f61fc
    • D
      x86/kvm: Make parse_no_xxx __init for kvm · afdc3f58
      Dou Liyang 提交于
      The early_param() is only called during kernel initialization, So Linux
      marks the functions of it with __init macro to save memory.
      
      But it forgot to mark the parse_no_kvmapf/stealacc/kvmclock_vsyscall,
      So, Make them __init as well.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: rkrcmar@redhat.com
      Cc: kvm@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: x86@kernel.org
      Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      afdc3f58
    • R
      KVM: x86: fix backward migration with async_PF · fe2a3027
      Radim Krčmář 提交于
      Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
      bit when enabling async_PF, but this bit is reserved on old hypervisors,
      which results in a failure upon migration.
      
      To avoid breaking different cases, we are checking for CPUID feature bit
      before enabling the feature and nothing else.
      
      Fixes: 52a5c155 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NWanpeng Li <wanpengli@tencent.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fe2a3027
    • S
      kvm: fix warning for non-x86 builds · f75e4924
      Sebastian Ott 提交于
      Fix the following sparse warning by moving the prototype
      of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h .
      
        CHECK   arch/s390/kvm/../../../virt/kvm/kvm_main.c
      arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
      Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f75e4924
    • W
      KVM: X86: Fix SMRAM accessing even if VM is shutdown · 95e057e2
      Wanpeng Li 提交于
      Reported by syzkaller:
      
         WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
         RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         Call Trace:
          vmx_handle_exit+0xbd/0xe20 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
          kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
          do_vfs_ioctl+0xa4/0x6a0
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x25/0x9c
      
      The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
      a second thread to mmap and operate on the same vCPU.  This triggers a race
      condition when running the testcase with multiple threads. Sometimes one thread
      exits with a triple fault while another thread mmaps and operates on the same
      vCPU.  Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
      results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
      in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
      exit with KVM_EXIT_INTERNAL_ERROR.
      
      Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      95e057e2
    • C
      KVM: nVMX: Don't halt vcpu when L1 is injecting events to L2 · 135a06c3
      Chao Gao 提交于
      Although L2 is in halt state, it will be in the active state after
      VM entry if the VM entry is vectoring according to SDM 26.6.2 Activity
      State. Halting the vcpu here means the event won't be injected to L2
      and this decision isn't reported to L1. Thus L0 drops an event that
      should be injected to L2.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NChao Gao <chao.gao@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      135a06c3
    • E
      KVM/x86: remove WARN_ON() for when vm_munmap() fails · 103c763c
      Eric Biggers 提交于
      On x86, special KVM memslots such as the TSS region have anonymous
      memory mappings created on behalf of userspace, and these mappings are
      removed when the VM is destroyed.
      
      It is however possible for removing these mappings via vm_munmap() to
      fail.  This can most easily happen if the thread receives SIGKILL while
      it's waiting to acquire ->mmap_sem.   This triggers the 'WARN_ON(r < 0)'
      in __x86_set_memory_region().  syzkaller was able to hit this, using
      'exit()' to send the SIGKILL.  Note that while the vm_munmap() failure
      results in the mapping not being removed immediately, it is not leaked
      forever but rather will be freed when the process exits.
      
      It's not really possible to handle this failure properly, so almost
      every other caller of vm_munmap() doesn't check the return value.  It's
      a limitation of having the kernel manage these mappings rather than
      userspace.
      
      So just remove the WARN_ON() so that users can't spam the kernel log
      with this warning.
      
      Fixes: f0d648bd ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      103c763c
    • R
      KVM: nVMX: preserve SECONDARY_EXEC_DESC without UMIP · 99158246
      Radim Krčmář 提交于
      L1 might want to use SECONDARY_EXEC_DESC, so we must not clear the VMCS
      bit if UMIP is not being emulated.
      
      We must still set the bit when emulating UMIP as the feature can be
      passed to L2 where L0 will do the emulation and because L2 can change
      CR4 without a VM exit, we should clear the bit if UMIP is disabled.
      
      Fixes: 0367f205 ("KVM: vmx: add support for emulating UMIP")
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      99158246
    • P
      KVM: x86: move LAPIC initialization after VMCS creation · 0b2e9904
      Paolo Bonzini 提交于
      The initial reset of the local APIC is performed before the VMCS has been
      created, but it tries to do a vmwrite:
      
       vmwrite error: reg 810 value 4a00 (err 18944)
       CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G        W I      4.16.0-0.rc2.git0.1.fc28.x86_64 #1
       Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
       Call Trace:
        vmx_set_rvi [kvm_intel]
        vmx_hwapic_irr_update [kvm_intel]
        kvm_lapic_reset [kvm]
        kvm_create_lapic [kvm]
        kvm_arch_vcpu_init [kvm]
        kvm_vcpu_init [kvm]
        vmx_create_vcpu [kvm_intel]
        kvm_vm_ioctl [kvm]
      
      Move it later, after the VMCS has been created.
      
      Fixes: 4191db26 ("KVM: x86: Update APICv on APIC reset")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b2e9904
  3. 23 2月, 2018 13 次提交
    • P
      arm64: fix unwind_frame() for filtered out fn for function graph tracing · 9f416319
      Pratyush Anand 提交于
      do_task_stat() calls get_wchan(), which further does unwind_frame().
      unwind_frame() restores frame->pc to original value in case function
      graph tracer has modified a return address (LR) in a stack frame to hook
      a function return. However, if function graph tracer has hit a filtered
      function, then we can't unwind it as ftrace_push_return_trace() has
      biased the index(frame->graph) with a 'huge negative'
      offset(-FTRACE_NOTRACE_DEPTH).
      
      Moreover, arm64 stack walker defines index(frame->graph) as unsigned
      int, which can not compare a -ve number.
      
      Similar problem we can have with calling of walk_stackframe() from
      save_stack_trace_tsk() or dump_backtrace().
      
      This patch fixes unwind_frame() to test the index for -ve value and
      restore index accordingly before we can restore frame->pc.
      
      Reproducer:
      
      cd /sys/kernel/debug/tracing/
      echo schedule > set_graph_notrace
      echo 1 > options/display-graph
      echo wakeup > current_tracer
      ps -ef | grep -i agent
      
      Above commands result in:
      Unable to handle kernel paging request at virtual address ffff801bd3d1e000
      pgd = ffff8003cbe97c00
      [ffff801bd3d1e000] *pgd=0000000000000000, *pud=0000000000000000
      Internal error: Oops: 96000006 [#1] SMP
      [...]
      CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
      [...]
      task: ffff8003c21ba000 task.stack: ffff8003cc6c0000
      PC is at unwind_frame+0x12c/0x180
      LR is at get_wchan+0xd4/0x134
      pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145
      sp : ffff8003cc6c3ab0
      x29: ffff8003cc6c3ab0 x28: 0000000000000001
      x27: 0000000000000026 x26: 0000000000000026
      x25: 00000000000012d8 x24: 0000000000000000
      x23: ffff8003c1c04000 x22: ffff000008c83000
      x21: ffff8003c1c00000 x20: 000000000000000f
      x19: ffff8003c1bc0000 x18: 0000fffffc593690
      x17: 0000000000000000 x16: 0000000000000001
      x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f
      x13: 0000000000000001 x12: 0000000000000000
      x11: 00000000e8f4883e x10: 0000000154f47ec8
      x9 : 0000000070f367c0 x8 : 0000000000000000
      x7 : 00008003f7290000 x6 : 0000000000000018
      x5 : 0000000000000000 x4 : ffff8003c1c03cb0
      x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000
      x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000
      
      Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000)
      Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000)
      [...]
      [<ffff00000808892c>] unwind_frame+0x12c/0x180
      [<ffff000008305008>] do_task_stat+0x864/0x870
      [<ffff000008305c44>] proc_tgid_stat+0x3c/0x48
      [<ffff0000082fde0c>] proc_single_show+0x5c/0xb8
      [<ffff0000082b27e0>] seq_read+0x160/0x414
      [<ffff000008289e6c>] __vfs_read+0x58/0x164
      [<ffff00000828b164>] vfs_read+0x88/0x144
      [<ffff00000828c2e8>] SyS_read+0x60/0xc0
      [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
      
      Fixes: 20380bb3 (arm64: ftrace: fix a stack tracer's output under function graph tracer)
      Signed-off-by: NPratyush Anand <panand@redhat.com>
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      [catalin.marinas@arm.com: replace WARN_ON with WARN_ON_ONCE]
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      9f416319
    • S
      x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations · 45967493
      Samuel Neves 提交于
      Without this fix, /proc/cpuinfo will display an incorrect amount
      of CPU cores, after bringing them offline and online again, as
      exemplified below:
      
        $ cat /proc/cpuinfo | grep cores
        cpu cores	: 4
        cpu cores	: 8
        cpu cores	: 8
        cpu cores	: 20
        cpu cores	: 4
        cpu cores	: 3
        cpu cores	: 2
        cpu cores	: 2
      
      This patch fixes this by always zeroing the booted_cores variable
      upon turning off a logical CPU.
      Tested-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: NSamuel Neves <sneves@dei.uc.pt>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jgross@suse.com
      Cc: luto@kernel.org
      Cc: prarit@redhat.com
      Cc: vkuznets@redhat.com
      Link: http://lkml.kernel.org/r/20180221205036.5244-1-sneves@dei.uc.ptSigned-off-by: NIngo Molnar <mingo@kernel.org>
      45967493
    • A
      locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs · 472e8c55
      Andrea Parri 提交于
      Successful RMW operations are supposed to be fully ordered, but
      Alpha's xchg() and cmpxchg() do not meet this requirement.
      
      Will Deacon noticed the bug:
      
        > So MP using xchg:
        >
        > WRITE_ONCE(x, 1)
        > xchg(y, 1)
        >
        > smp_load_acquire(y) == 1
        > READ_ONCE(x) == 0
        >
        > would be allowed.
      
      ... which thus violates the above requirement.
      
      Fix it by adding a leading smp_mb() to the xchg() and cmpxchg() implementations.
      Reported-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrea Parri <parri.andrea@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-alpha@vger.kernel.org
      Link: http://lkml.kernel.org/r/1519291488-5752-1-git-send-email-parri.andrea@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      472e8c55
    • A
      locking/xchg/alpha: Clean up barrier usage by using smp_mb() in place of __ASM__MB · 79d44246
      Andrea Parri 提交于
      Replace each occurrence of __ASM__MB with a (trailing) smp_mb() in
      xchg(), cmpxchg(), and remove the now unused __ASM__MB definitions;
      this improves readability, with no additional synchronization cost.
      Suggested-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrea Parri <parri.andrea@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-alpha@vger.kernel.org
      Link: http://lkml.kernel.org/r/1519291469-5702-1-git-send-email-parri.andrea@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      79d44246
    • W
      x86/intel_rdt: Fix incorrect returned value when creating rdgroup... · 36e74d35
      Wang Hui 提交于
      x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
      
      If no monitoring feature is detected because all monitoring features are
      disabled during boot time or there is no monitoring feature in hardware,
      creating rdtgroup sub-directory by "mkdir" command reports error:
      
        mkdir: cannot create directory ‘/sys/fs/resctrl/p1’: No such file or directory
      
      But the sub-directory actually is generated and content is correct:
      
        cpus  cpus_list  schemata  tasks
      
      The error is because rdtgroup_mkdir_ctrl_mon() returns non zero value after
      the sub-directory is created and the returned value is reported as an error
      to user.
      
      Clear the returned value to report to user that the sub-directory is
      actually created successfully.
      Signed-off-by: NWang Hui <john.wanghui@huawei.com>
      Signed-off-by: NZhang Yanfei <yanfei.zhang@huawei.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vikas <vikas.shivappa@intel.com>
      Cc: Xiaochen Shen <xiaochen.shen@intel.com>
      Link: http://lkml.kernel.org/r/1519356363-133085-1-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      36e74d35
    • T
      x86/apic/vector: Handle vector release on CPU unplug correctly · e84cf6aa
      Thomas Gleixner 提交于
      When a irq vector is replaced, then the previous vector is normally
      released when the first interrupt happens on the new vector. If the target
      CPU of the previous vector is already offline when the new vector is
      installed, then the previous vector is silently discarded, which leads to
      accounting issues causing suspend failures and other problems.
      
      Adjust the logic so that the previous vector is freed in the underlying
      matrix allocator to ensure that the accounting stays correct.
      
      Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
      Reported-by: NYuriy Vostrikov <delamonpansie@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NYuriy Vostrikov <delamonpansie@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180222112316.930791749@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e84cf6aa
    • M
      powerpc/powernv: Support firmware disable of RFI flush · eb0a2d26
      Michael Ellerman 提交于
      Some versions of firmware will have a setting that can be configured
      to disable the RFI flush, add support for it.
      
      Fixes: 6e032b35 ("powerpc/powernv: Check device-tree for RFI flush settings")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      eb0a2d26
    • M
      powerpc/pseries: Support firmware disable of RFI flush · 582605a4
      Michael Ellerman 提交于
      Some versions of firmware will have a setting that can be configured
      to disable the RFI flush, add support for it.
      
      Fixes: 8989d568 ("powerpc/pseries: Query hypervisor for RFI flush settings")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      582605a4
    • B
      powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2 · 2f7d03e0
      Bharata B Rao 提交于
      Memory addtion and removal by count and indexed-count methods
      temporarily mark the LMBs that are being added/removed by a special
      flag value DRMEM_LMB_RESERVED. Accessing flags value directly at a few
      places without proper accessor method is causing two unexpected
      side-effects:
      
      - DRMEM_LMB_RESERVED bit is becoming part of the flags word of
        drconf_cell_v2 entries in ibm,dynamic-memory-v2 DT property.
      - This results in extra drconf_cell entries in ibm,dynamic-memory-v2.
        For example if 1G memory is added, it leads to one entry for 3 LMBs
        and 1 separate entry for the last LMB. All the 4 LMBs should be
        defined by one entry here.
      
      Fix this by always accessing the flags by its accessor method
      drmem_lmb_flags().
      
      Fixes: 2b31e3ae ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
      Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2f7d03e0
    • K
      MIPS: boot: Define __ASSEMBLY__ for its.S build · 0f9da844
      Kees Cook 提交于
      The MIPS %.its.S compiler command did not define __ASSEMBLY__, which meant
      when compiler_types.h was added to kconfig.h, unexpected things appeared
      (e.g. struct declarations) which should not have been present. As done in
      the general %.S compiler command, __ASSEMBLY__ is now included here too.
      
      The failure was:
      
          Error: arch/mips/boot/vmlinux.gz.its:201.1-2 syntax error
          FATAL ERROR: Unable to parse input tree
          /usr/bin/mkimage: Can't read arch/mips/boot/vmlinux.gz.itb.tmp: Invalid argument
          /usr/bin/mkimage Can't add hashes to FIT blob
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Fixes: 28128c61 ("kconfig.h: Include compiler types to avoid missed struct attributes")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f9da844
    • D
      bpf, arm64: fix out of bounds access in tail call · 16338a9b
      Daniel Borkmann 提交于
      I recently noticed a crash on arm64 when feeding a bogus index
      into BPF tail call helper. The crash would not occur when the
      interpreter is used, but only in case of JIT. Output looks as
      follows:
      
        [  347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
        [...]
        [  347.043065] [fffb850e96492510] address between user and kernel address ranges
        [  347.050205] Internal error: Oops: 96000004 [#1] SMP
        [...]
        [  347.190829] x13: 0000000000000000 x12: 0000000000000000
        [  347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
        [  347.201427] x9 : 0000000000000000 x8 : 0000000000000000
        [  347.206726] x7 : 0000000000000000 x6 : 001c991738000000
        [  347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
        [  347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
        [  347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
        [  347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
        [  347.235221] Call trace:
        [  347.237656]  0xffff000002f3a4fc
        [  347.240784]  bpf_test_run+0x78/0xf8
        [  347.244260]  bpf_prog_test_run_skb+0x148/0x230
        [  347.248694]  SyS_bpf+0x77c/0x1110
        [  347.251999]  el0_svc_naked+0x30/0x34
        [  347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
        [...]
      
      In this case the index used in BPF r3 is the same as in r1
      at the time of the call, meaning we fed a pointer as index;
      here, it had the value 0xffff808fd7cf0500 which sits in x2.
      
      While I found tail calls to be working in general (also for
      hitting the error cases), I noticed the following in the code
      emission:
      
        # bpftool p d j i 988
        [...]
        38:   ldr     w10, [x1,x10]
        3c:   cmp     w2, w10
        40:   b.ge    0x000000000000007c              <-- signed cmp
        44:   mov     x10, #0x20                      // #32
        48:   cmp     x26, x10
        4c:   b.gt    0x000000000000007c
        50:   add     x26, x26, #0x1
        54:   mov     x10, #0x110                     // #272
        58:   add     x10, x1, x10
        5c:   lsl     x11, x2, #3
        60:   ldr     x11, [x10,x11]                  <-- faulting insn (f86b694b)
        64:   cbz     x11, 0x000000000000007c
        [...]
      
      Meaning, the tests passed because commit ddb55992 ("arm64:
      bpf: implement bpf_tail_call() helper") was using signed compares
      instead of unsigned which as a result had the test wrongly passing.
      
      Change this but also the tail call count test both into unsigned
      and cap the index as u32. Latter we did as well in 90caccdd
      ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here,
      too. Tested on HiSilicon Hi1616.
      
      Result after patch:
      
        # bpftool p d j i 268
        [...]
        38:	ldr	w10, [x1,x10]
        3c:	add	w2, w2, #0x0
        40:	cmp	w2, w10
        44:	b.cs	0x0000000000000080
        48:	mov	x10, #0x20                  	// #32
        4c:	cmp	x26, x10
        50:	b.hi	0x0000000000000080
        54:	add	x26, x26, #0x1
        58:	mov	x10, #0x110                 	// #272
        5c:	add	x10, x1, x10
        60:	lsl	x11, x2, #3
        64:	ldr	x11, [x10,x11]
        68:	cbz	x11, 0x0000000000000080
        [...]
      
      Fixes: ddb55992 ("arm64: bpf: implement bpf_tail_call() helper")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      16338a9b
    • D
      bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann 提交于
      Implement a retpoline [0] for the BPF tail call JIT'ing that converts
      the indirect jump via jmp %rax that is used to make the long jump into
      another JITed BPF image. Since this is subject to speculative execution,
      we need to control the transient instruction sequence here as well
      when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
      The latter aligns also with what gcc / clang emits (e.g. [1]).
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a493a87f
    • H
      x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 · b21ebf2f
      H.J. Lu 提交于
      On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared
      objects must use PIC PLT.  To use PIC PLT, you need to load
      _GLOBAL_OFFSET_TABLE_ into EBX first.  There is no need for that on
      x86-64 since x86-64 uses PC-relative PLT.
      
      On x86-64, for 32-bit PC-relative branches, we can generate PLT32
      relocation, instead of PC32 relocation, which can also be used as
      a marker for 32-bit PC-relative branches.  Linker can always reduce
      PLT32 relocation to PC32 if function is defined locally.   Local
      functions should use PC32 relocation.  As far as Linux kernel is
      concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32
      since Linux kernel doesn't use PLT.
      
      R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in
      binutils master branch which will become binutils 2.31.
      
      [ hjl is working on having better documentation on this all, but a few
        more notes from him:
      
         "PLT32 relocation is used as marker for PC-relative branches. Because
          of EBX, it looks odd to generate PLT32 relocation on i386 when EBX
          doesn't have GOT.
      
          As for symbol resolution, PLT32 and PC32 relocations are almost
          interchangeable. But when linker sees PLT32 relocation against a
          protected symbol, it can resolved locally at link-time since it is
          used on a branch instruction. Linker can't do that for PC32
          relocation"
      
        but for the kernel use, the two are basically the same, and this
        commit gets things building and working with the current binutils
        master   - Linus ]
      Signed-off-by: NH.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b21ebf2f
  4. 22 2月, 2018 6 次提交
    • W
      arm64: Enforce BBM for huge IO/VMAP mappings · 15122ee2
      Will Deacon 提交于
      ioremap_page_range doesn't honour break-before-make and attempts to put
      down huge mappings (using p*d_set_huge) over the top of pre-existing
      table entries. This leads to us leaking page table memory and also gives
      rise to TLB conflicts and spurious aborts, which have been seen in
      practice on Cortex-A75.
      
      Until this has been resolved, refuse to put block mappings when the
      existing entry is found to be present.
      
      Fixes: 324420bf ("arm64: add support for ioremap() block mappings")
      Reported-by: NHanjun Guo <hanjun.guo@linaro.org>
      Reported-by: NLei Li <lious.lilei@hisilicon.com>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      15122ee2
    • I
      treewide/trivial: Remove ';;$' typo noise · ed7158ba
      Ingo Molnar 提交于
      On lkml suggestions were made to split up such trivial typo fixes into per subsystem
      patches:
      
        --- a/arch/x86/boot/compressed/eboot.c
        +++ b/arch/x86/boot/compressed/eboot.c
        @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
                struct efi_uga_draw_protocol *uga = NULL, *first_uga;
                efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
                unsigned long nr_ugas;
        -       u32 *handles = (u32 *)uga_handle;;
        +       u32 *handles = (u32 *)uga_handle;
                efi_status_t status = EFI_INVALID_PARAMETER;
                int i;
      
      This patch is the result of the following script:
      
        $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
      
      ... followed by manual review to make sure it's all good.
      
      Splitting this up is just crazy talk, let's get over with this and just do it.
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed7158ba
    • M
      powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access · 083b2090
      Mark Lord 提交于
      I am using SECCOMP to filter syscalls on a ppc32 platform, and noticed
      that the JIT compiler was failing on the BPF even though the
      interpreter was working fine.
      
      The issue was that the compiler was missing one of the instructions
      used by SECCOMP, so here is a patch to enable JIT for that
      instruction.
      
      Fixes: eb84bab0 ("ppc: Kconfig: Enable BPF JIT on ppc32")
      Signed-off-by: NMark Lord <mlord@pobox.com>
      Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      083b2090
    • M
      powerpc/pseries: Revert support for ibm,drc-info devtree property · c7a3275e
      Michael Bringmann 提交于
      This reverts commit 02ef6dd8.
      
      The earlier patch tried to enable support for a new property
      "ibm,drc-info" on powerpc systems.
      
      Unfortunately, some errors in the associated patch set break things
      in some of the DLPAR operations.  In particular when attempting to
      hot-add a new CPU or set of CPUs, the original patch failed to
      properly calculate the available resources, and aborted the operation.
      In addition, the original set missed several opportunities to compress
      and reuse common code.
      
      As the associated patch set was meant to provide an optimization of
      storage and performance of a set of device-tree properties for future
      systems with large amounts of resources, reverting just restores
      the previous behavior for existing systems.  It seems unnecessary
      to enable this feature and introduce the consequent problems in the
      field that it will cause at this time, so please revert it for now
      until testing of the corrections are finished properly.
      
      Fixes: 02ef6dd8 ("powerpc: Enable support for ibm,drc-info devtree property")
      Signed-off-by: NMichael W. Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c7a3275e
    • M
      powerpc/pseries: Fix duplicate firmware feature for DRC_INFO · 5539d31a
      Michael Ellerman 提交于
      We had a mid-air collision between two new firmware features, DRMEM_V2
      and DRC_INFO, and they ended up with the same value.
      
      No one's actually reported any problems, presumably because the new
      firmware that supports both properties is not widely available, and
      the two properties tend to be enabled together.
      
      Still if we ever had one enabled but not the other, the bugs that
      could result are many and varied. So fix it.
      
      Fixes: 3f38000e ("powerpc/firmware: Add definitions for new drc-info firmware feature")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      5539d31a
    • A
      bug.h: work around GCC PR82365 in BUG() · 173a3efd
      Arnd Bergmann 提交于
      Looking at functions with large stack frames across all architectures
      led me discovering that BUG() suffers from the same problem as
      fortify_panic(), which I've added a workaround for already.
      
      In short, variables that go out of scope by calling a noreturn function
      or __builtin_unreachable() keep using stack space in functions
      afterwards.
      
      A workaround that was identified is to insert an empty assembler
      statement just before calling the function that doesn't return.  I'm
      adding a macro "barrier_before_unreachable()" to document this, and
      insert calls to that in all instances of BUG() that currently suffer
      from this problem.
      
      The files that saw the largest change from this had these frame sizes
      before, and much less with my patch:
      
        fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
        drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
      
      In case of ARC and CRIS, it turns out that the BUG() implementation
      actually does return (or at least the compiler thinks it does),
      resulting in lots of warnings about uninitialized variable use and
      leaving noreturn functions, such as:
      
        block/cfq-iosched.c: In function 'cfq_async_queue_prio':
        block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
        include/linux/dmaengine.h: In function 'dma_maxpq':
        include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
      
      This makes them call __builtin_trap() instead, which should normally
      dump the stack and kill the current process, like some of the other
      architectures already do.
      
      I tried adding barrier_before_unreachable() to panic() and
      fortify_panic() as well, but that had very little effect, so I'm not
      submitting that patch.
      
      Vineet said:
      
      : For ARC, it is double win.
      :
      : 1. Fixes 3 -Wreturn-type warnings
      :
      : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
      : non-void function [-Wreturn-type]
      :
      : 2.  bloat-o-meter reports code size improvements as gcc elides the
      :    generated code for stack return.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
      Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Tested-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      173a3efd
  5. 21 2月, 2018 5 次提交