1. 22 7月, 2019 1 次提交
  2. 20 7月, 2019 9 次提交
    • T
      x86/entry/64: Prevent clobbering of saved CR2 value · 6879298b
      Thomas Gleixner 提交于
      The recent fix for CR2 corruption introduced a new way to reliably corrupt
      the saved CR2 value.
      
      CR2 is saved early in the entry code in RDX, which is the third argument to
      the fault handling functions. But it missed that between saving and
      invoking the fault handler enter_from_user_mode() can be called. RDX is a
      caller saved register so the invoked function can freely clobber it with
      the obvious consequences.
      
      The TRACE_IRQS_OFF call is safe as it calls through the thunk which
      preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into
      C-code outside of the thunk.
      
      Store CR2 in R12 instead which is a callee saved register and move R12 to
      RDX just before calling the fault handler.
      
      Fixes: a0d14b89 ("x86/mm, tracing: Fix CR2 corruption")
      Reported-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
      6879298b
    • E
      KVM: x86: Add fixed counters to PMU filter · 30cd8604
      Eric Hankland 提交于
      Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
      fixed counters.
      Signed-off-by: NEric Hankland <ehankland@google.com>
      [No need to check padding fields for zero. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      30cd8604
    • P
      KVM: nVMX: do not use dangling shadow VMCS after guest reset · 88dddc11
      Paolo Bonzini 提交于
      If a KVM guest is reset while running a nested guest, free_nested will
      disable the shadow VMCS execution control in the vmcs01.  However,
      on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
      the VMCS12 to the shadow VMCS which has since been freed.
      
      This causes a vmptrld of a NULL pointer on my machime, but Jan reports
      the host to hang altogether.  Let's see how much this trivial patch fixes.
      Reported-by: NJan Kiszka <jan.kiszka@siemens.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      88dddc11
    • P
      KVM: VMX: dump VMCS on failed entry · 3b20e03a
      Paolo Bonzini 提交于
      This is useful for debugging, and is ratelimited nowadays.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3b20e03a
    • L
      KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed · 6fc3977c
      Like Xu 提交于
      If a perf_event creation fails due to any reason of the host perf
      subsystem, it has no chance to log the corresponding event for guest
      which may cause abnormal sampling data in guest result. In debug mode,
      this message helps to understand the state of vPMC and we may not
      limit the number of occurrences but not in a spamming style.
      Suggested-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NLike Xu <like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6fc3977c
    • W
      KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup · d9847409
      Wanpeng Li 提交于
      Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup().
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d9847409
    • W
      KVM: Boost vCPUs that are delivering interrupts · d73eb57b
      Wanpeng Li 提交于
      Inspired by commit 9cac38dd (KVM/s390: Set preempted flag during
      vcpu wakeup and interrupt delivery), we want to also boost not just
      lock holders but also vCPUs that are delivering interrupts. Most
      smp_call_function_many calls are synchronous, so the IPI target vCPUs
      are also good yield candidates.  This patch introduces vcpu->ready to
      boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
      not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
      into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
      (VT-d PI handles voluntary preemption separately, in pi_pre_block).
      
      Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
      ebizzy -M
      
                  vanilla     boosting    improved
      1VM          21443       23520         9%
      2VM           2800        8000       180%
      3VM           1800        3100        72%
      
      Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
      one running ebizzy -M, the other running 'stress --cpu 2':
      
      w/ boosting + w/o pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1570         4000      155%
      
      w/ boosting + w/ pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1844         5157      179%
      
      w/o boosting, perf top in VM:
      
       72.33%  [kernel]       [k] smp_call_function_many
        4.22%  [kernel]       [k] call_function_i
        3.71%  [kernel]       [k] async_page_fault
      
      w/ boosting, perf top in VM:
      
       38.43%  [kernel]       [k] smp_call_function_many
        6.31%  [kernel]       [k] async_page_fault
        6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
        4.88%  [kernel]       [k] call_function_interrupt
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d73eb57b
    • L
      KVM: SVM: Fix detection of AMD Errata 1096 · 118154bd
      Liran Alon 提交于
      When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is
      possible that CPU microcode implementing DecodeAssist will fail
      to read bytes of instruction which caused #NPF. This is AMD errata
      1096 and it happens because CPU microcode reading instruction bytes
      incorrectly attempts to read code as implicit supervisor-mode data
      accesses (that is, just like it would read e.g. a TSS), which are
      susceptible to SMAP faults. The microcode reads CS:RIP and if it is
      a user-mode address according to the page tables, the processor
      gives up and returns no instruction bytes.  In this case,
      GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly
      return 0 instead of the correct guest instruction bytes.
      
      Current KVM code attemps to detect and workaround this errata, but it
      has multiple issues:
      
      1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1,
      which is required for encountering a SMAP fault.
      
      2) It assumes SMAP faults can only occur when guest CPL==3.
      However, in case guest CR4.SMEP=0, the guest can execute an instruction
      which reside in a user-accessible page with CPL<3 priviledge. If this
      instruction raise a #NPF on it's data access, then CPU DecodeAssist
      microcode will still encounter a SMAP violation.  Even though no sane
      OS will do so (as it's an obvious priviledge escalation vulnerability),
      we still need to handle this semanticly correct in KVM side.
      
      Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easy
      triggerable condition and guests usually enable SMAP together with SMEP.
      If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt
      at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest
      instead of #NPF.  We keep this condition to avoid false positives in
      the detection of the errata.
      
      In addition, to avoid future confusion and improve code readbility,
      include details of the errata in code and not just in commit message.
      
      Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
      Cc: Singh Brijesh <brijesh.singh@amd.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      118154bd
    • W
      KVM: LAPIC: Inject timer interrupt via posted interrupt · 0c5f81da
      Wanpeng Li 提交于
      Dedicated instances are currently disturbed by unnecessary jitter due
      to the emulated lapic timers firing on the same pCPUs where the
      vCPUs reside.  There is no hardware virtual timer on Intel for guest
      like ARM, so both programming timer in guest and the emulated timer fires
      incur vmexits.  This patch tries to avoid vmexit when the emulated timer
      fires, at least in dedicated instance scenario when nohz_full is enabled.
      
      In that case, the emulated timers can be offload to the nearest busy
      housekeeping cpus since APICv has been found for several years in server
      processors. The guest timer interrupt can then be injected via posted interrupts,
      which are delivered by the housekeeping cpu once the emulated timer fires.
      
      The host should tuned so that vCPUs are placed on isolated physical
      processors, and with several pCPUs surplus for busy housekeeping.
      If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode,
      ~3% redis performance benefit can be observed on Skylake server, and the
      number of external interrupt vmexits drops substantially.  Without patch
      
                  VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
      EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
      
      While with patch:
      
                  VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
      EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0c5f81da
  3. 19 7月, 2019 30 次提交
    • D
      x86/hyper-v: Zero out the VP ASSIST PAGE on allocation · e320ab3c
      Dexuan Cui 提交于
      The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
      5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt:
      
      "The hypervisor defines several special pages that "overlay" the guest's
       Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are
       not included in the normal GPA map maintained internally by the hypervisor.
       Conceptually, they exist in a separate map that overlays the GPA map.
      
       If a page within the GPA space is overlaid, any SPA page mapped to the
       GPA page is effectively "obscured" and generally unreachable by the
       virtual processor through processor memory accesses.
      
       If an overlay page is disabled, the underlying GPA page is "uncovered",
       and an existing mapping becomes accessible to the guest."
      
      SPA = System Physical Address = the final real physical address.
      
      When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST
      PAGE and enables the EOI optimization for this CPU by writing the MSR
      HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the
      special SPA page, and this CPU *always* uses hvp->apic_assist (which is
      shared with the hypervisor) to decide if it needs to write the EOI MSR.
      
      When a CPU is offlined then on the outgoing CPU:
      1. hv_cpu_die() disables the EOI optimizaton for this CPU, and from
         now on hvp->apic_assist belongs to the original "normal" SPA page;
      2. the remaining work of stopping this CPU is done
      3. this CPU is completely stopped.
      
      Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule
      IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write
      the EOI MSR for every interrupt received, otherwise the hypervisor may not
      deliver further interrupts, which may be needed to completely stop the CPU.
      
      So, after the EOI optimization is disabled in hv_cpu_die(), it's required
      that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the
      current allocation mode because it lacks __GFP_ZERO. As a consequence the
      bit might be set and interrupt handling would not write the EOI MSR causing
      interrupt delivery to become stuck.
      
      Add the missing __GFP_ZERO to the allocation.
      
      Note 1: after the "normal" SPA page is allocted and zeroed out, neither the
      hypervisor nor the guest writes into the page, so the page remains with
      zeros.
      
      Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI
      optimization. When the optimization is enabled, the guest can still write
      the EOI MSR register irrespective of the "No EOI required" value, but
      that's slower than the optimized assist based variant.
      
      Fixes: ba696429 ("x86/hyper-v: Implement EOI assist")
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/ <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
      e320ab3c
    • G
      csky: Fixup abiv1 memset error · bdfeb0cc
      Guo Ren 提交于
      Current memset implementation in abiv1 is wrong and it'll cause unalign
      access. Just remove it and use the generic one. This patch will cause
      performance degradation and we will improve it with a new design in next
      patchset.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      bdfeb0cc
    • G
      csky: Improve tlb operation with help of asid · 4e562c11
      Guo Ren 提交于
      There are two generations of tlb operation instruction for C-SKY.
      First generation is use mcr register and it need software do more
      things, second generation is use specific instructions, eg:
       tlbi.va, tlbi.vas, tlbi.alls
      
      We implemented the following functions:
      
       - flush_tlb_range (a range of entries)
       - flush_tlb_page (one entry)
      
       Above functions use asid from vma->mm to invalid tlb entries and
       we could use tlbi.vas instruction for newest generation csky cpu.
      
       - flush_tlb_kernel_range
       - flush_tlb_one
      
       Above functions don't care asid and it invalid the tlb entries only
       with vpn and we could use tlbi.vaas instruction for newest generat-
       ion csky cpu.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      4e562c11
    • G
      csky: Use generic asid algorithm to implement switch_mm · 22d55f02
      Guo Ren 提交于
      Use linux generic asid/vmid algorithm to implement csky
      switch_mm function. The algorithm is from arm and it could
      work with SMP system. It'll help reduce tlb flush for
      switch_mm in task/vm switch.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      22d55f02
    • G
      csky: Add new asid lib code from arm · a231b883
      Guo Ren 提交于
      This patch only contains asid help code from arm for next patch to
      use.
      
      The asid allocator use five level check to reduce the cost of
      switch_mm.
      
       1. Check if the asid version is the same (it's general)
       2. Check reserved_asid which is set in rollover flush_context()
          and key point is to keep the same bit position with the current
          asid version instead of input version.
       3. Check if the position of bitmap is free then it could be set &
          used directly.
       4. find_next_zero_bit() (a little performance cost)
       5. flush_context  (this is the worst cost with increase current asid
          version)
      
      Check is level by level and cost is also higher with the next level.
      The reserved_asid and bitmap mechanism prevent unnecessary
      find_next_zero_bit().
      
      The atomic 64 bit asid is also suitable for 32-bit system and it
      won't cost a lot in 1th 2th 3th level check.
      
      The operation of set/clear mm_cpumask was removed in arm64 compared to
      arm32. It seems no side effect on current arm64 system, but from
      software meaning it's wrong. Although csky also needn't it, we add it
      back for csky.
      
      The asid_per_ctxt is no use for csky and it reserves the lowest bits for
      other use, maybe: trust zone ? Ok, just keep it in csky copy.
      
      Seems it also could be used by other archs and it's worth to move asid
      code to generic in future.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Julien Grall <julien.grall@arm.com>
      a231b883
    • G
      csky: Revert mmu ASID mechanism · 9d35dc30
      Guo Ren 提交于
      Current C-SKY ASID mechanism is from mips and it doesn't work well
      with multi-cores. ASID per core mechanism is not suitable for C-SKY
      SMP tlb maintain operations, eg: tlbi.vas need share the same asid
      in all processors and it'll invalid the tlb entry in all cores with
      the same asid.
      
      This patch is prepare for new ASID mechanism.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      9d35dc30
    • G
      csky: Fixup some error count in 810 & 860. · e7534198
      Guo Ren 提交于
      CK810 pmu only support event with index 0-8 and 0xd; CK860 only
      support event 1~4, 0xa~0x1b. So do not register unsupport event
      to hardware cache event, which may leader to unknown behavior.
      Signed-off-by: NMao Han <han_mao@c-sky.com>
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      e7534198
    • M
      csky: Fix perf record in kernel/user space · d41435d9
      Mao Han 提交于
      csky_pmu_event_init is called several times during the perf record
      initialzation. After configure the event counter in either kernel
      space or user space, csky_pmu_event_init is called twice with no
      attr specified. Configuration will be overwritten with sampling in
      both kernel space and user space. --all-kernel/--all-user is
      useless without this patch applied.
      Signed-off-by: NMao Han <han_mao@c-sky.com>
      Signed-off-by: NGuo Ren <guoren@kernel.org>
      d41435d9
    • M
      csky: Add pmu interrupt support · f622fbf2
      Mao Han 提交于
      This patch add interrupt request and handler for csky pmu.
      perf can record on hardware event with this patch applied.
      Signed-off-by: NMao Han <han_mao@c-sky.com>
      Signed-off-by: NGuo Ren <guoren@kernel.org>
      f622fbf2
    • M
      csky: Add count-width property for csky pmu · ccffa1ad
      Mao Han 提交于
      The csky pmu counter may have different io width. When the counter is
      smaller then 64 bits and counter value is smaller than the old value, it
      will result to a extremely large delta value. So the sampled value should
      be extend to 64 bits to avoid this, the extension bits base on the
      count-width property from dts.
      Signed-off-by: NMao Han <han_mao@c-sky.com>
      Signed-off-by: NGuo Ren <guoren@kernel.org>
      ccffa1ad
    • M
      csky: Init pmu as a device · f132076c
      Mao Han 提交于
      This patch change the csky pmu initialization from arch init to
      device init. The pmu can be configued with information from
      device tree(pmu device name, irq number and etc.).
      Signed-off-by: NMao Han <han_mao@c-sky.com>
      Signed-off-by: NGuo Ren <guoren@kernel.org>
      f132076c
    • G
      csky: Fixup no panic in kernel for some traps · 3158d289
      Guo Ren 提交于
      These traps couldn't be hanppen in kernel and we must panic there not
      send a signal to userspace.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      3158d289
    • G
      csky: Select intc & timer drivers · 1994cc49
      Guo Ren 提交于
      Let arch help to select interrupt controller's and timer's drivers
      instead of people using menuconfig to select. This help the mini system
      boot up.
      Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      1994cc49
    • M
      proc/sysctl: add shared variables for range check · eec4844f
      Matteo Croce 提交于
      In the sysctl code the proc_dointvec_minmax() function is often used to
      validate the user supplied value between an allowed range.  This
      function uses the extra1 and extra2 members from struct ctl_table as
      minimum and maximum allowed value.
      
      On sysctl handler declaration, in every source file there are some
      readonly variables containing just an integer which address is assigned
      to the extra1 and extra2 members, so the sysctl range is enforced.
      
      The special values 0, 1 and INT_MAX are very often used as range
      boundary, leading duplication of variables like zero=0, one=1,
      int_max=INT_MAX in different source files:
      
          $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
          248
      
      Add a const int array containing the most commonly used values, some
      macros to refer more easily to the correct array member, and use them
      instead of creating a local one for every object file.
      
      This is the bloat-o-meter output comparing the old and new binary
      compiled with the default Fedora config:
      
          # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
          add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
          Data                                         old     new   delta
          sysctl_vals                                    -      12     +12
          __kstrtab_sysctl_vals                          -      12     +12
          max                                           14      10      -4
          int_max                                       16       -     -16
          one                                           68       -     -68
          zero                                         128      28    -100
          Total: Before=20583249, After=20583085, chg -0.00%
      
      [mcroce@redhat.com: tipc: remove two unused variables]
        Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
      [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
      [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
        Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
      [akpm@linux-foundation.org: fix fs/eventpoll.c]
      Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.comSigned-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eec4844f
    • D
      mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() · e9c0a3f0
      Dan Williams 提交于
      Allow sub-section sized ranges to be added to the memmap.
      
      populate_section_memmap() takes an explict pfn range rather than
      assuming a full section, and those parameters are plumbed all the way
      through to vmmemap_populate().  There should be no sub-section usage in
      current deployments.  New warnings are added to clarify which memmap
      allocation paths are sub-section capable.
      
      Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9c0a3f0
    • D
      mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns · fbcf73ce
      David Hildenbrand 提交于
      walk_memory_range() was once used to iterate over sections.  Now, it
      iterates over memory blocks.  Rename the function, fixup the
      documentation.
      
      Also, pass start+size instead of PFNs, which is what most callers
      already have at hand.  (we'll rework link_mem_sections() most probably
      soon)
      
      Follow-up patches will rework, simplify, and move walk_memory_blocks()
      to drivers/base/memory.c.
      
      Note: walk_memory_blocks() only works correctly right now if the
      start_pfn is aligned to a section start.  This is the case right now,
      but we'll generalize the function in a follow up patch so the semantics
      match the documentation.
      
      [akpm@linux-foundation.org: remove unused variable]
      Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fbcf73ce
    • D
      mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand 提交于
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80ec922d
    • D
      arm64/mm: add temporary arch_remove_memory() implementation · 22eb6346
      David Hildenbrand 提交于
      A proper arch_remove_memory() implementation is on its way, which also
      cleanly removes page tables in arch_add_memory() in case something goes
      wrong.
      
      As we want to use arch_remove_memory() in case something goes wrong
      during memory hotplug after arch_add_memory() finished, let's add a
      temporary hack that is sufficient enough until we get a proper
      implementation that cleans up page table entries.
      
      We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
      patches.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-5-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22eb6346
    • D
      s390x/mm: implement arch_remove_memory() · 18c86506
      David Hildenbrand 提交于
      Will come in handy when wanting to handle errors after
      arch_add_memory().
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-4-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18c86506
    • D
      s390x/mm: fail when an altmap is used for arch_add_memory() · 973de24a
      David Hildenbrand 提交于
      ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
      don't forget arch_add_memory()/arch_remove_memory() when unlocking
      support.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Suggested-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      973de24a
    • T
      sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT · a50a3f4b
      Thomas Gleixner 提交于
      Add a new entry to the preemption menu which enables the real-time support
      for the kernel. The choice is only enabled when an architecture supports
      it.
      
      It selects PREEMPT as the RT features depend on it. To achieve that the
      existing PREEMPT choice is renamed to PREEMPT_LL which select PREEMPT as
      well.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPaul E. McKenney <paulmck@linux.ibm.com>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NClark Williams <williams@redhat.com>
      Acked-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: NFrederic Weisbecker <frederic@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NDaniel Wagner <wagi@monom.org>
      Acked-by: NLuis Claudio R. Goncalves <lgoncalv@redhat.com>
      Acked-by: NJulia Cartwright <julia@ni.com>
      Acked-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Acked-by: NGratian Crisan <gratian.crisan@ni.com>
      Acked-by: NSebastian Siewior <bigeasy@linutronix.de>
      Cc: Andrew Morton <akpm@linuxfoundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907172200190.1778@nanos.tec.linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a50a3f4b
    • Z
      x86, boot: Remove multiple copy of static function sanitize_boot_params() · 8c5477e8
      Zhenzhong Duan 提交于
      Kernel build warns:
       'sanitize_boot_params' defined but not used [-Wunused-function]
      
      at below files:
        arch/x86/boot/compressed/cmdline.c
        arch/x86/boot/compressed/error.c
        arch/x86/boot/compressed/early_serial_console.c
        arch/x86/boot/compressed/acpi.c
      
      That's becausethey each include misc.h which includes a definition of
      sanitize_boot_params() via bootparam_utils.h.
      
      Remove the inclusion from misc.h and have the c file including
      bootparam_utils.h directly.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.com
      8c5477e8
    • Z
      x86/boot/compressed/64: Remove unused variable · 449f3286
      Zhenzhong Duan 提交于
      Fix gcc warning:
      
      arch/x86/boot/compressed/pgtable_64.c: In function 'find_trampoline_placement':
      arch/x86/boot/compressed/pgtable_64.c:43:16: warning: unused variable 'trampoline_start' [-Wunused-variable]
        unsigned long trampoline_start;
                     ^
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/1563283040-31101-1-git-send-email-zhenzhong.duan@oracle.com
      449f3286
    • Z
      x86/boot/efi: Remove unused variables · cd6697b8
      Zhenzhong Duan 提交于
      Fix gcc warnings:
      
      arch/x86/boot/compressed/eboot.c: In function 'make_boot_params':
      arch/x86/boot/compressed/eboot.c:394:6: warning: unused variable 'i' [-Wunused-variable]
        int i;
            ^
      arch/x86/boot/compressed/eboot.c:393:6: warning: unused variable 's1' [-Wunused-variable]
        u8 *s1;
            ^
      arch/x86/boot/compressed/eboot.c:392:7: warning: unused variable 's2' [-Wunused-variable]
        u16 *s2;
             ^
      arch/x86/boot/compressed/eboot.c:387:8: warning: unused variable 'options' [-Wunused-variable]
        void *options, *handle;
              ^
      arch/x86/boot/compressed/eboot.c: In function 'add_e820ext':
      arch/x86/boot/compressed/eboot.c:498:16: warning: unused variable 'size' [-Wunused-variable]
        unsigned long size;
                      ^
      arch/x86/boot/compressed/eboot.c:497:15: warning: unused variable 'status' [-Wunused-variable]
        efi_status_t status;
                     ^
      arch/x86/boot/compressed/eboot.c: In function 'exit_boot_func':
      arch/x86/boot/compressed/eboot.c:681:15: warning: unused variable 'status' [-Wunused-variable]
        efi_status_t status;
                     ^
      arch/x86/boot/compressed/eboot.c:680:8: warning: unused variable 'nr_desc' [-Wunused-variable]
        __u32 nr_desc;
              ^
      arch/x86/boot/compressed/eboot.c: In function 'efi_main':
      arch/x86/boot/compressed/eboot.c:750:22: warning: unused variable 'image' [-Wunused-variable]
        efi_loaded_image_t *image;
                            ^
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1563282957-26898-1-git-send-email-zhenzhong.duan@oracle.com
      cd6697b8
    • J
      x86/uaccess: Remove redundant CLACs in getuser/putuser error paths · 82e844a6
      Josh Poimboeuf 提交于
      The same getuser/putuser error paths are used regardless of whether AC
      is set.  In non-exception failure cases, this results in an unnecessary
      CLAC.
      
      Fixes the following warnings:
      
        arch/x86/lib/getuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
        arch/x86/lib/putuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/bc14ded2755ae75bd9010c446079e113dbddb74b.1563413318.git.jpoimboe@redhat.com
      82e844a6
    • J
      x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail() · 5e307a6b
      Josh Poimboeuf 提交于
      After adding mcsafe_handle_tail() to the objtool uaccess safe list,
      objtool reports:
      
        arch/x86/lib/usercopy_64.o: warning: objtool: mcsafe_handle_tail()+0x0: call to __fentry__() with UACCESS enabled
      
      With SMAP, this function is called with AC=1, so it needs to be careful
      about which functions it calls.  Disable the ftrace entry hook, which
      can potentially pull in a lot of extra code.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/8e13d6f0da1c8a3f7603903da6cbf6d582bbfe10.1563413318.git.jpoimboe@redhat.com
      5e307a6b
    • J
      x86/uaccess: Remove ELF function annotation from copy_user_handle_tail() · 3a6ab4bc
      Josh Poimboeuf 提交于
      After an objtool improvement, it's complaining about the CLAC in
      copy_user_handle_tail():
      
        arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable
        arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x6: (alt)
        arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x2: (alt)
        arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x0: <=== (func)
      
      copy_user_handle_tail() is incorrectly marked as a callable function, so
      objtool is rightfully concerned about the CLAC with no corresponding
      STAC.
      
      Remove the ELF function annotation.  The copy_user_handle_tail() code
      path is already verified by objtool because it's jumped to by other
      callable asm code (which does the corresponding STAC).
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com
      3a6ab4bc
    • J
      x86/head/64: Annotate start_cpu0() as non-callable · 61a73f5c
      Josh Poimboeuf 提交于
      After an objtool improvement, it complains about the fact that
      start_cpu0() jumps to code which has an LRET instruction.
      
        arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function
      
      Technically, start_cpu0() is callable, but it acts nothing like a
      callable function.  Prevent objtool from treating it like one by
      removing its ELF function annotation.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com
      61a73f5c
    • J
      x86/entry: Fix thunk function ELF sizes · e6dd4739
      Josh Poimboeuf 提交于
      Fix the following warnings:
      
        arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation
        arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation
        arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com
      e6dd4739
    • J
      x86/kvm: Don't call kvm_spurious_fault() from .fixup · 3901336e
      Josh Poimboeuf 提交于
      After making a change to improve objtool's sibling call detection, it
      started showing the following warning:
      
        arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame
      
      The problem is the ____kvm_handle_fault_on_reboot() macro.  It does a
      fake call by pushing a fake RIP and doing a jump.  That tricks the
      unwinder into printing the function which triggered the exception,
      rather than the .fixup code.
      
      Instead of the hack to make it look like the original function made the
      call, just change the macro so that the original function actually does
      make the call.  This allows removal of the hack, and also makes objtool
      happy.
      
      I triggered a vmx instruction exception and verified that the stack
      trace is still sane:
      
        kernel BUG at arch/x86/kvm/x86.c:358!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
        Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
        RIP: 0010:kvm_spurious_fault+0x5/0x10
        Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
        RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
        RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
        RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
        RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
        R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
        R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
        FS:  00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         loaded_vmcs_init+0x4f/0xe0
         alloc_loaded_vmcs+0x38/0xd0
         vmx_create_vcpu+0xf7/0x600
         kvm_vm_ioctl+0x5e9/0x980
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         ? free_one_page+0x13f/0x4e0
         do_vfs_ioctl+0xa4/0x630
         ksys_ioctl+0x60/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x55/0x1c0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fa349b1ee5b
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
      3901336e