1. 28 7月, 2017 1 次提交
  2. 27 7月, 2017 7 次提交
    • W
      drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU · a3287c41
      Will Deacon 提交于
      Since the PMU register interface is banked per CPU, CPU PMU interrrupts
      cannot be handled by a CPU other than the one with the PMU asserting the
      interrupt. This means that migrating PMU SPIs, as we do during a CPU
      hotplug operation doesn't make any sense and can lead to the IRQ being
      disabled entirely if we route a spurious IRQ to the new affinity target.
      
      This has been observed in practice on AMD Seattle, where CPUs on the
      non-boot cluster appear to take a spurious PMU IRQ when coming online,
      which is routed to CPU0 where it cannot be handled.
      
      This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
      affinity prior to requesting them, ensuring that they cannot
      be migrated during hotplug events. This interacts badly with the DB8500
      erratum workaround that ping-pongs the interrupt affinity from the handler,
      so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
      to be overridden in the platdata.
      
      Fixes: 3cf7ee98 ("drivers/perf: arm_pmu: move irq request/free into probe")
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      a3287c41
    • A
      powerpc/mm/hash: Free the subpage_prot_table correctly · 0da12a7a
      Aneesh Kumar K.V 提交于
      Fixes: dad6f37c ("powerpc: subpage_protect: Increase the array size to take care of 64TB")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Tested-by: NRam Pai <linuxram@us.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0da12a7a
    • W
      KVM: LAPIC: Fix reentrancy issues with preempt notifiers · 1d518c68
      Wanpeng Li 提交于
      Preempt can occur in the preemption timer expiration handler:
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
            hv_timer_is_use == true
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   kvm_lapic_restart_hv_timer
                                     restart_apic_timer
                                       start_hv_timer
                                         already-expired timer or sw timer triggerd in the window
                                       start_sw_timer
                                         cancel_hv_timer
                                 /* back in kvm_lapic_expired_hv_timer */
                                 cancel_hv_timer
                                   WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
      
      This can be reproduced if CONFIG_PREEMPT is enabled.
      
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #16
       RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
      Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
       ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ #16
       RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
      Call Trace:
        kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This patch fixes it by making the caller of cancel_hv_timer, start_hv_timer
      and start_sw_timer be in preemption-disabled regions, which trivially
      avoid any reentrancy issue with preempt notifier.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      [Add more WARNs. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1d518c68
    • W
      KVM: nVMX: Fix loss of L2's NMI blocking state · 2d6144e3
      Wanpeng Li 提交于
      Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1:
      
      Before NMI IRET test
      Sending NMI to self
      NMI isr running stack 0x461000
      Sending nested NMI to self
      After nested NMI to self
      Nested NMI isr running rip=40038e
      After iret
      After NMI to self
      FAIL: NMI
      
      Commit 4c4a6f79 (KVM: nVMX: track NMI blocking state separately
      for each VMCS) tracks NMI blocking state separately for vmcs01 and
      vmcs02. However it is not enough:
      
       - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
         on IRET, so the L2 can generate #PF which can be intercepted by L0.
       - L0 walks L1's guest page table and sees the mapping is invalid, it
         resumes the L1 guest and injects the #PF into L1.  At this point the
         vmcs02 has nmi_known_unmasked=true.
       - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
         of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
       - L1 executes VMRESUME to resume L2, causing a vmexit to L0
       - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
         interruptibility-state field of vmcs02, but nmi_known_unmasked is
         still true.
       - L2 immediately exits to L0 with another page fault, because L0 still has
         not updated the NGVA->HPA page tables.  However, nmi_known_unmasked is
         true so vmx_recover_nmi_blocking does not do anything.
      
      The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2d6144e3
    • W
      KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode · 06a5524f
      Wincy Van 提交于
      The PI vector for L0 and L1 must be different. If dest vcpu0
      is in guest mode while vcpu1 is delivering a non-nested PI to
      vcpu0, there wont't be any vmexit so that the non-nested interrupt
      will be delayed.
      Signed-off-by: NWincy Van <fanwenyi0529@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      06a5524f
    • W
      x86: irq: Define a global vector for nested posted interrupts · 210f84b0
      Wincy Van 提交于
      We are using the same vector for nested/non-nested posted
      interrupts delivery, this may cause interrupts latency in
      L1 since we can't kick the L2 vcpu out of vmx-nonroot mode.
      
      This patch introduces a new vector which is only for nested
      posted interrupts to solve the problems above.
      Signed-off-by: NWincy Van <fanwenyi0529@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      210f84b0
    • P
      KVM: x86: do mask out upper bits of PAE CR3 · a512177e
      Paolo Bonzini 提交于
      This reverts the change of commit f85c758d,
      as the behavior it modified was intended.
      
      The VM is running in 32-bit PAE mode, and Table 4-7 of the Intel manual
      says:
      
      Table 4-7. Use of CR3 with PAE Paging
      Bit Position(s)	Contents
      4:0		Ignored
      31:5		Physical address of the 32-Byte aligned
      		page-directory-pointer table used for linear-address
      		translation
      63:32		Ignored (these bits exist only on processors supporting
      		the Intel-64 architecture)
      
      To placate the static checker, write the mask explicitly as an
      unsigned long constant instead of using a 32-bit unsigned constant.
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: f85c758dSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a512177e
  3. 26 7月, 2017 10 次提交
    • D
      arm64: sysreg: Fix unprotected macro argmuent in write_sysreg · d0153c7f
      Dave Martin 提交于
      write_sysreg() may misparse the value argument because it is used
      without parentheses to protect it.
      
      This patch adds the ( ) in order to avoid any surprises.
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      [will: same change to write_sysreg_s]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      d0153c7f
    • M
      powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain · b40b2386
      Michael Ellerman 提交于
      In commit efe0160c ("powerpc/64: Linker on-demand sfpr functions
      for modules"), we added an ld version check early in the powerpc
      top-level Makefile.
      
      Because the Makefile runs before the kernel config is setup, the
      checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
      we end up configuring ld for 32-bit big endian.
      
      That would be OK, except that for historical (or perhaps no) reason,
      we use 'override LD' to add the endian flags to the LD variable
      itself, rather than the normal approach of adding them to LDFLAGS.
      
      The end result is that when we check the ld version we run it as:
      
        $(CROSS_COMPILE)ld -EB -m elf32ppc --version
      
      This often works, unless you are using a 64-bit only and/or little
      endian only, toolchain. In which case you see something like:
      
        $ make defconfig
        powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
        Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
        /bin/sh: 1: [: -ge: unexpected operator
      
      The proper fix is to stop using 'override LD', but that will require a
      fair bit of testing. Instead we can fix it for now just by reordering
      the Makefile to do the version check earlier.
      
      Fixes: efe0160c ("powerpc/64: Linker on-demand sfpr functions for modules")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b40b2386
    • L
      powerpc/pseries: Fix of_node_put() underflow during reconfig remove · 4fd1bd44
      Laurent Vivier 提交于
      As for commit 68baf692 ("powerpc/pseries: Fix of_node_put()
      underflow during DLPAR remove"), the call to of_node_put() must be
      removed from pSeries_reconfig_remove_node().
      
      dlpar_detach_node() and pSeries_reconfig_remove_node() both call
      of_detach_node(), and thus the node should not be released in both
      cases.
      
      Fixes: 0829f6d1 ("of: device_node kobject lifecycle fixes")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4fd1bd44
    • B
      powerpc/mm/radix: Workaround prefetch issue with KVM · a25bd72b
      Benjamin Herrenschmidt 提交于
      There's a somewhat architectural issue with Radix MMU and KVM.
      
      When coming out of a guest with AIL (Alternate Interrupt Location, ie,
      MMU enabled), we start executing hypervisor code with the PID register
      still containing whatever the guest has been using.
      
      The problem is that the CPU can (and will) then start prefetching or
      speculatively load from whatever host context has that same PID (if
      any), thus bringing translations for that context into the TLB, which
      Linux doesn't know about.
      
      This can cause stale translations and subsequent crashes.
      
      Fixing this in a way that is neither racy nor a huge performance
      impact is difficult. We could just make the host invalidations always
      use broadcast forms but that would hurt single threaded programs for
      example.
      
      We chose to fix it instead by partitioning the PID space between guest
      and host. This is possible because today Linux only use 19 out of the
      20 bits of PID space, so existing guests will work if we make the host
      use the top half of the 20 bits space.
      
      We additionally add support for a property to indicate to Linux the
      size of the PID register which will be useful if we eventually have
      processors with a larger PID space available.
      
      There is still an issue with malicious guests purposefully setting the
      PID register to a value in the hosts PID range. Hopefully future HW
      can prevent that, but in the meantime, we handle it with a pair of
      kludges:
      
       - On the way out of a guest, before we clear the current VCPU in the
         PACA, we check the PID and if it's outside of the permitted range
         we flush the TLB for that PID.
      
       - When context switching, if the mm is "new" on that CPU (the
         corresponding bit was set for the first time in the mm cpumask), we
         check if any sibling thread is in KVM (has a non-NULL VCPU pointer
         in the PACA). If that is the case, we also flush the PID for that
         CPU (core).
      
      This second part is needed to handle the case where a process is
      migrated (or starts a new pthread) on a sibling thread of the CPU
      coming out of KVM, as there's a window where stale translations can
      exist before we detect it and flush them out.
      
      A future optimization could be added by keeping track of whether the
      PID has ever been used and avoid doing that for completely fresh PIDs.
      We could similarily mark PIDs that have been the subject of a global
      invalidation as "fresh". But for now this will do.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
            unneeded include of kvm_book3s_asm.h]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a25bd72b
    • J
      parisc: Extend disabled preemption in copy_user_page · 56008c04
      John David Anglin 提交于
      It's always bothered me that we only disable preemption in
      copy_user_page around the call to flush_dcache_page_asm.
      This patch extends this to after the copy.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      56008c04
    • J
      parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases · ae7a609c
      John David Anglin 提交于
      Helge noticed that we flush the TLB page in flush_cache_page but not in
      flush_cache_range or flush_cache_mm.
      
      For a long time, we have had random segmentation faults building
      packages on machines with PA8800/8900 processors.  These machines only
      support equivalent aliases.  We don't see these faults on machines that
      don't require strict coherency.  So, it appears TLB speculation
      sometimes leads to cache corruption on machines that require coherency.
      
      This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
      coherency is required.  We only flush the TLB in flush_cache_page when
      coherency is required.
      
      The patch also optimizes flush_cache_range.  It turns out we always have
      the right context to use flush_user_dcache_range_asm and
      flush_user_icache_range_asm.
      
      The patch has been tested for some time on rp3440, rp3410 and A500-44.
      It's been boot tested on c8000.  No random segmentation faults were
      observed during testing.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      ae7a609c
    • H
      parisc: Suspend lockup detectors before system halt · 56188832
      Helge Deller 提交于
      Some machines can't power off the machine, so disable the lockup detectors to
      avoid this watchdog BUG to show up every few seconds:
      watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # 4.9+
      56188832
    • H
      parisc: Show DIMM slot number which holds broken memory module · c46bafc4
      Helge Deller 提交于
      The Page Deallocation Table (PDT) holds the physical addresses of all broken
      memory addresses. With the physical address we now are able to show which DIMM
      slot (e.g. 1a, 3c) actually holds the broken memory module so that users are
      able to replace it.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      c46bafc4
    • H
      parisc: Add function to return DIMM slot of physical address · 25a9b765
      Helge Deller 提交于
      Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a
      physical address. This is needed to show users which DIMM module needs
      replacement in case a broken DIMM was encountered.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      25a9b765
    • H
      parisc: Fix crash when calling PDC_PAT_MEM PDT firmware function · f520e552
      Helge Deller 提交于
      Commit c9c2877d ("parisc: Add Page Deallocation Table (PDT) support")
      introduced the pdc_pat_mem_read_pd_pdt() firmware helper function, which
      crashed the system because it trashed the stack if the
      pdc_pat_mem_read_pd_retinfo struct was located on the stack (and which is
      in size less than the required 32 64-bit values).
      
      Fix it by using the pdc_result struct instead when calling firmware and copy
      the return values back into the result struct when finished sucessfully.
      
      While debugging this code I noticed that the pdc_type wasn't set correctly
      either, so let's fix that too.
      
      Fixes: c9c2877d ("parisc: Add Page Deallocation Table (PDT) support")
      Signed-off-by: NHelge Deller <deller@gmx.de>
      f520e552
  4. 25 7月, 2017 2 次提交
    • C
      KVM: s390: take srcu lock when getting/setting storage keys · 4f899147
      Christian Borntraeger 提交于
      The following warning was triggered by missing srcu locks around
      the storage key handling functions.
      
      =============================
      WARNING: suspicious RCU usage
      4.12.0+ #56 Not tainted
      -----------------------------
      ./include/linux/kvm_host.h:572 suspicious rcu_dereference_check() usage!
      rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by live_migration/4936:
        #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
      kvm_arch_vm_ioctl+0x6b8/0x22d0
      
       CPU: 8 PID: 4936 Comm: live_migration Not tainted 4.12.0+ #56
       Hardware name: IBM 2964 NC9 704 (LPAR)
       Call Trace:
       ([<000000000011378a>] show_stack+0xea/0xf0)
        [<000000000055cc4c>] dump_stack+0x94/0xd8
        [<000000000012ee70>] gfn_to_memslot+0x1a0/0x1b8
        [<0000000000130b76>] gfn_to_hva+0x2e/0x48
        [<0000000000141c3c>] kvm_arch_vm_ioctl+0x714/0x22d0
        [<000000000013306c>] kvm_vm_ioctl+0x11c/0x7b8
        [<000000000037e2c0>] do_vfs_ioctl+0xa8/0x6c8
        [<000000000037e984>] SyS_ioctl+0xa4/0xb8
        [<00000000008b20a4>] system_call+0xc4/0x27c
       1 lock held by live_migration/4936:
        #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
      kvm_arch_vm_ioctl+0x6b8/0x22d0
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
      4f899147
    • A
      arm64/lib: copy_page: use consistent prefetch stride · 288be97c
      Ard Biesheuvel 提交于
      The optional prefetch instructions in the copy_page() routine are
      inconsistent: at the start of the function, two cachelines are
      prefetched beyond the one being loaded in the first iteration, but
      in the loop, the prefetch is one more line ahead. This appears to
      be unintentional, so let's fix it.
      
      While at it, fix the comment style and white space.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      288be97c
  5. 24 7月, 2017 8 次提交
    • D
      ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[] · ce184a0d
      Dave Martin 提交于
      In kernels with CONFIG_IWMMXT=y running on non-iWMMXt hardware, the
      signal frame can be left partially uninitialised in such a way
      that userspace cannot parse uc_regspace[] safely.  In particular,
      this means that the VFP registers cannot be located reliably in the
      signal frame when a multi_v7_defconfig kernel is run on the
      majority of platforms.
      
      The cause is that the uc_regspace[] is laid out statically based on
      the kernel config, but the decision of whether to save/restore the
      iWMMXt registers must be a runtime decision.
      
      To minimise breakage of software that may assume a fixed layout,
      this patch emits a dummy block of the same size as iwmmxt_sigframe,
      for non-iWMMXt threads.  However, the magic and size of this block
      are now filled in to help parsers skip over it.  A new DUMMY_MAGIC
      is defined for this purpose.
      
      It is probably legitimate (if non-portable) for userspace to
      manufacture its own sigframe for sigreturn, and there is no obvious
      reason why userspace should be required to insert a DUMMY_MAGIC
      block when running on non-iWMMXt hardware, when omitting it has
      worked just fine forever in other configurations.  So in this case,
      sigreturn does not require this block to be present.
      Reported-by: NEdmund Grimley-Evans <Edmund.Grimley-Evans@arm.com>
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      ce184a0d
    • D
      ARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors · 26958355
      Dave Martin 提交于
      preserve_iwmmxt_context() and restore_iwmmxt_context() lack __user
      accessors on their arguments pointing to the user signal frame.
      
      There does not be appear to be a bug here, but this omission is
      inconsistent with the crunch and vfp sigframe access functions.
      
      This patch adds the annotations, for consistency.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      26958355
    • P
      KVM: VMX: remove unused field · fa19871a
      Paolo Bonzini 提交于
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fa19871a
    • P
      KVM: PPC: Book3S HV: Fix host crash on changing HPT size · ef427198
      Paul Mackerras 提交于
      Commit f98a8bf9 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB
      ioctl() to change HPT size", 2016-12-20) changed the behaviour of
      the KVM_PPC_ALLOCATE_HTAB ioctl so that it now allocates a new HPT
      and new revmap array if there was a previously-allocated HPT of a
      different size from the size being requested.  In this case, we need
      to reset the rmap arrays of the memslots, because the rmap arrays
      will contain references to HPTEs which are no longer valid.  Worse,
      these references are also references to slots in the new revmap
      array (which parallels the HPT), and the new revmap array contains
      random contents, since it doesn't get zeroed on allocation.
      
      The effect of having these stale references to slots in the revmap
      array that contain random contents is that subsequent calls to
      functions such as kvmppc_add_revmap_chain will crash because they
      will interpret the non-zero contents of the revmap array as HPTE
      indexes and thus index outside of the revmap array.  This leads to
      host crashes such as the following.
      
      [ 7072.862122] Unable to handle kernel paging request for data at address 0xd000000c250c00f8
      [ 7072.862218] Faulting instruction address: 0xc0000000000e1c78
      [ 7072.862233] Oops: Kernel access of bad area, sig: 11 [#1]
      [ 7072.862286] SMP NR_CPUS=1024
      [ 7072.862286] NUMA
      [ 7072.862325] PowerNV
      [ 7072.862378] Modules linked in: kvm_hv vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 mlx5_ib ib_core ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel i2c_opal nfsd auth_rpcgss oid_registry
      [ 7072.863085]  nfs_acl lockd grace sunrpc kvm_pr kvm xfs libcrc32c scsi_dh_alua dm_service_time radeon lpfc nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 drm_kms_helper ptp pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm dm_multipath i2c_core cxgb3 mlx5_core mdio [last unloaded: kvm_hv]
      [ 7072.863381] CPU: 72 PID: 56929 Comm: qemu-system-ppc Not tainted 4.12.0-kvm+ #59
      [ 7072.863457] task: c000000fe29e7600 task.stack: c000001e3ffec000
      [ 7072.863520] NIP: c0000000000e1c78 LR: c0000000000e2e3c CTR: c0000000000e25f0
      [ 7072.863596] REGS: c000001e3ffef560 TRAP: 0300   Not tainted  (4.12.0-kvm+)
      [ 7072.863658] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>
      [ 7072.863667]   CR: 44082882  XER: 20000000
      [ 7072.863767] CFAR: c0000000000e2e38 DAR: d000000c250c00f8 DSISR: 42000000 SOFTE: 1
      GPR00: c0000000000e2e3c c000001e3ffef7e0 c000000001407d00 d000000c250c00f0
      GPR04: d00000006509fb70 d00000000b3d2048 0000000003ffdfb7 0000000000000000
      GPR08: 00000001007fdfb7 00000000c000000f d0000000250c0000 000000000070f7bf
      GPR12: 0000000000000008 c00000000fdad000 0000000010879478 00000000105a0d78
      GPR16: 00007ffaf4080000 0000000000001190 0000000000000000 0000000000010000
      GPR20: 4001ffffff000415 d00000006509fb70 0000000004091190 0000000ee1881190
      GPR24: 0000000003ffdfb7 0000000003ffdfb7 00000000007fdfb7 c000000f5c958000
      GPR28: d00000002d09fb70 0000000003ffdfb7 d00000006509fb70 d00000000b3d2048
      [ 7072.864439] NIP [c0000000000e1c78] kvmppc_add_revmap_chain+0x88/0x130
      [ 7072.864503] LR [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
      [ 7072.864566] Call Trace:
      [ 7072.864594] [c000001e3ffef7e0] [c000001e3ffef830] 0xc000001e3ffef830 (unreliable)
      [ 7072.864671] [c000001e3ffef830] [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
      [ 7072.864751] [c000001e3ffef920] [d00000000b38d878] kvmppc_map_vrma+0x168/0x200 [kvm_hv]
      [ 7072.864831] [c000001e3ffef9e0] [d00000000b38a684] kvmppc_vcpu_run_hv+0x1284/0x1300 [kvm_hv]
      [ 7072.864914] [c000001e3ffefb30] [d00000000f465664] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [ 7072.865008] [c000001e3ffefb60] [d00000000f461864] kvm_arch_vcpu_ioctl_run+0x114/0x290 [kvm]
      [ 7072.865152] [c000001e3ffefbe0] [d00000000f453c98] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
      [ 7072.865292] [c000001e3ffefd40] [c000000000389328] do_vfs_ioctl+0xd8/0x8c0
      [ 7072.865410] [c000001e3ffefde0] [c000000000389be4] SyS_ioctl+0xd4/0x130
      [ 7072.865526] [c000001e3ffefe30] [c00000000000b760] system_call+0x58/0x6c
      [ 7072.865644] Instruction dump:
      [ 7072.865715] e95b2110 793a0020 7b4926e4 7f8a4a14 409e0098 807c000c 786326e4 7c6a1a14
      [ 7072.865857] 935e0008 7bbd0020 813c000c 913e000c <93a30008> 93bc000c 48000038 60000000
      [ 7072.866001] ---[ end trace 627b6e4bf8080edc ]---
      
      Note that to trigger this, it is necessary to use a recent upstream
      QEMU (or other userspace that resizes the HPT at CAS time), specify
      a maximum memory size substantially larger than the current memory
      size, and boot a guest kernel that does not support HPT resizing.
      
      This fixes the problem by resetting the rmap arrays when the old HPT
      is freed.
      
      Fixes: f98a8bf9 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
      Cc: stable@vger.kernel.org # v4.11+
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef427198
    • P
      KVM: PPC: Book3S HV: Enable TM before accessing TM registers · e4705715
      Paul Mackerras 提交于
      Commit 46a704f8 ("KVM: PPC: Book3S HV: Preserve userspace HTM state
      properly", 2017-06-15) added code to read transactional memory (TM)
      registers but forgot to enable TM before doing so.  The result is
      that if userspace does have live values in the TM registers, a KVM_RUN
      ioctl will cause a host kernel crash like this:
      
      [  181.328511] Unrecoverable TM Unavailable Exception f60 at d00000001e7d9980
      [  181.328605] Oops: Unrecoverable TM Unavailable Exception, sig: 6 [#1]
      [  181.328613] SMP NR_CPUS=2048
      [  181.328613] NUMA
      [  181.328618] PowerNV
      [  181.328646] Modules linked in: vhost_net vhost tap nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs
      +fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
      +nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables
      +ip6table_filter ip6_tables iptable_filter bridge stp llc kvm_hv kvm nfsd ses enclosure scsi_transport_sas ghash_generic
      +auth_rpcgss gf128mul xts sg ctr nfs_acl lockd vmx_crypto shpchp ipmi_powernv i2c_opal grace ipmi_devintf i2c_core
      +powernv_rng sunrpc ipmi_msghandler ibmpowernv uio_pdrv_genirq uio leds_powernv powernv_op_panel ip_tables xfs sd_mod
      +lpfc ipr bnx2x libata mdio ptp pps_core scsi_transport_fc libcrc32c dm_mirror dm_region_hash dm_log dm_mod
      [  181.329278] CPU: 40 PID: 9926 Comm: CPU 0/KVM Not tainted 4.12.0+ #1
      [  181.329337] task: c000003fc6980000 task.stack: c000003fe4d80000
      [  181.329396] NIP: d00000001e7d9980 LR: d00000001e77381c CTR: d00000001e7d98f0
      [  181.329465] REGS: c000003fe4d837e0 TRAP: 0f60   Not tainted  (4.12.0+)
      [  181.329523] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
      [  181.329527]   CR: 24022448  XER: 00000000
      [  181.329608] CFAR: d00000001e773818 SOFTE: 1
      [  181.329608] GPR00: d00000001e77381c c000003fe4d83a60 d00000001e7ef410 c000003fdcfe0000
      [  181.329608] GPR04: c000003fe4f00000 0000000000000000 0000000000000000 c000003fd7954800
      [  181.329608] GPR08: 0000000000000001 c000003fc6980000 0000000000000000 d00000001e7e2880
      [  181.329608] GPR12: d00000001e7d98f0 c000000007b19000 00000001295220e0 00007fffc0ce2090
      [  181.329608] GPR16: 0000010011886608 00007fff8c89f260 0000000000000001 00007fff8c080028
      [  181.329608] GPR20: 0000000000000000 00000100118500a6 0000010011850000 0000010011850000
      [  181.329608] GPR24: 00007fffc0ce1b48 0000010011850000 00000000d673b901 0000000000000000
      [  181.329608] GPR28: 0000000000000000 c000003fdcfe0000 c000003fdcfe0000 c000003fe4f00000
      [  181.330199] NIP [d00000001e7d9980] kvmppc_vcpu_run_hv+0x90/0x6b0 [kvm_hv]
      [  181.330264] LR [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [  181.330322] Call Trace:
      [  181.330351] [c000003fe4d83a60] [d00000001e773478] kvmppc_set_one_reg+0x48/0x340 [kvm] (unreliable)
      [  181.330437] [c000003fe4d83b30] [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [  181.330513] [c000003fe4d83b50] [d00000001e7700b4] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
      [  181.330586] [c000003fe4d83bd0] [d00000001e7642f8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
      [  181.330658] [c000003fe4d83d40] [c0000000003451b8] do_vfs_ioctl+0xc8/0x8b0
      [  181.330717] [c000003fe4d83de0] [c000000000345a64] SyS_ioctl+0xc4/0x120
      [  181.330776] [c000003fe4d83e30] [c00000000000b004] system_call+0x58/0x6c
      [  181.330833] Instruction dump:
      [  181.330869] e92d0260 e9290b50 e9290108 792807e3 41820058 e92d0260 e9290b50 e9290108
      [  181.330941] 792ae8a4 794a1f87 408204f4 e92d0260 <7d4022a6> f9490ff0 e92d0260 7d4122a6
      [  181.331013] ---[ end trace 6f6ddeb4bfe92a92 ]---
      
      The fix is just to turn on the TM bit in the MSR before accessing the
      registers.
      
      Cc: stable@vger.kernel.org # v3.14+
      Fixes: 46a704f8 ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly")
      Reported-by: NJan Stancek <jstancek@redhat.com>
      Tested-by: NJan Stancek <jstancek@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e4705715
    • H
      parisc: regenerate defconfig files · 108ea187
      Helge Deller 提交于
      Signed-off-by: NHelge Deller <deller@gmx.de>
      108ea187
    • H
      parisc: Merge millicode routines via linker script · 6cd819e8
      Helge Deller 提交于
      When compiling the 4.13-rc kernel I got those linker errors:
      libgcc2.c:(.text+0x110): relocation truncated to fit: R_PARISC_PCREL22F against symbol `$$divU'
      	defined in .text.div section in /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_divU.o)
      hppa64-linux-gnu-ld: /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_moddi3.o)(.text+0x174): cannot reach $$divU
      
      Avoid such errors by bundling the millicode routines in the linker script.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      6cd819e8
    • H
      parisc: Disable further stack checks when panic occurs during stack check · 5bc64bd2
      Helge Deller 提交于
      Before the irq handler detects a low stack and then panics the kernel, disable
      further stack checks to avoid recursive panics.
      Reported-by: NJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      5bc64bd2
  6. 23 7月, 2017 2 次提交
  7. 21 7月, 2017 5 次提交
    • R
      x86/devicetree: Convert to using %pOF instead of ->full_name · db15e7f2
      Rob Herring 提交于
      Now that we have a custom printf format specifier, convert users of
      full_name to use %pOF instead. This is preparation to remove storing
      of the full path string for each device node.
      Signed-off-by: NRob Herring <robh@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devicetree@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org
      [ Clarify the error message while at it, as 'node' is ambiguous. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      db15e7f2
    • J
      perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8
      Jiri Olsa 提交于
      We have 2 functions using the same sched_task callback:
      
        - PEBS drain for free running counters
        - LBR save/store
      
      Both of them are called from intel_pmu_sched_task() and
      either of them can be unwillingly triggered when the
      other one is configured to run.
      
      Let's say there's PEBS drain configured in sched_task
      callback for the event, but in the callback itself
      (intel_pmu_sched_task()) we will also run the code for
      LBR save/restore, which we did not ask for, but the
      code in intel_pmu_sched_task() does not check for that.
      
      This can lead to extra cycles in some perf monitoring,
      like when we monitor PEBS event without LBR data.
      
        # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000
      
        (We need PEBS, non freq/non timestamp event to enable
         the sched_task callback)
      
      The perf stat of cycles and msr:write_msr for above
      command before the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,519,557,441      cycles:k
              91,195,527      msr:write_msr
      
            29.334476406 seconds time elapsed
      
      And after the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,704,973,540      cycles:k
              27,184,720      msr:write_msr
      
            16.977875900 seconds time elapsed
      
      There's no affect on cycles:k because the sched_task happens
      with events switched off, however the msr:write_msr tracepoint
      counter together with almost 50% of time speedup show the
      improvement.
      
      Monitoring LBR event and having extra PEBS drain processing
      in sched_task callback showed just a little speedup, because
      the drain function does not do much extra work in case there
      is no PEBS data.
      
      Adding conditions to recognize the configured work that needs
      to be done in the x86_pmu's sched_task callback.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170719075247.GA27506@kravaSigned-off-by: NIngo Molnar <mingo@kernel.org>
      df6c3db8
    • A
      x86/platform/uv/BAU: Disable BAU on single hub configurations · 2fe9a5c6
      Andrew Banman 提交于
      The BAU confers no benefit to a UV system running with only one hub/socket.
      Permanently disable the BAU driver if there are less than two hubs online
      to avoid BAU overhead. We have observed failed boots on single-socket UV4
      systems caused by BAU that are avoided with this patch.
      
      Also, while at it, consolidate initialization error blocks and fix a
      memory leak.
      Signed-off-by: NAndrew Banman <abanman@hpe.com>
      Acked-by: NRuss Anderson <rja@hpe.com>
      Acked-by: NMike Travis <mike.travis@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: tony.ernst@hpe.com
      Link: http://lkml.kernel.org/r/1500588351-78016-1-git-send-email-abanman@hpe.com
      [ Minor cleanups. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2fe9a5c6
    • L
      x86: mark kprobe templates as character arrays, not single characters · 54a7d50b
      Linus Torvalds 提交于
      They really are, and the "take the address of a single character" makes
      the string fortification code unhappy (it believes that you can now only
      acccess one byte, rather than a byte range, and then raises errors for
      the memory copies going on in there).
      
      We could now remove a few 'addressof' operators (since arrays naturally
      degrade to pointers), but this is the minimal patch that just changes
      the C prototypes of those template arrays (the templates themselves are
      defined in inline asm).
      Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
      Acked-and-tested-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54a7d50b
    • P
      arm64/numa: Drop duplicate message · ece4b206
      Punit Agrawal 提交于
      When booting linux on a system without CONFIG_NUMA enabled, the
      following messages are printed during boot -
      
      NUMA: Faking a node at [mem 0x0000000000000000-0x00000083ffffffff]
      NUMA: Adding memblock [0x8000000000 - 0x8000e7ffff] on node 0
      NUMA: Adding memblock [0x8000e80000 - 0x83f65cffff] on node 0
      NUMA: Adding memblock [0x83f65d0000 - 0x83f665ffff] on node 0
      NUMA: Adding memblock [0x83f6660000 - 0x83f676ffff] on node 0
      NUMA: Adding memblock [0x83f6770000 - 0x83f678ffff] on node 0
      NUMA: Adding memblock [0x83f6790000 - 0x83fb82ffff] on node 0
      NUMA: Adding memblock [0x83fb830000 - 0x83fbc0ffff] on node 0
      NUMA: Adding memblock [0x83fbc10000 - 0x83fbdfffff] on node 0
      NUMA: Adding memblock [0x83fbe00000 - 0x83fbffffff] on node 0
      NUMA: Adding memblock [0x83fc000000 - 0x83fffbffff] on node 0
      NUMA: Adding memblock [0x83fffc0000 - 0x83fffdffff] on node 0
      NUMA: Adding memblock [0x83fffe0000 - 0x83ffffffff] on node 0
      NUMA: Initmem setup node 0 [mem 0x8000000000-0x83ffffffff]
      NUMA: NODE_DATA [mem 0x83fffec500-0x83fffedfff]
      
      The information is then duplicated by core kernel messages right after
      the above output.
      
      Early memory node ranges
        node   0: [mem 0x0000008000000000-0x0000008000e7ffff]
        node   0: [mem 0x0000008000e80000-0x00000083f65cffff]
        node   0: [mem 0x00000083f65d0000-0x00000083f665ffff]
        node   0: [mem 0x00000083f6660000-0x00000083f676ffff]
        node   0: [mem 0x00000083f6770000-0x00000083f678ffff]
        node   0: [mem 0x00000083f6790000-0x00000083fb82ffff]
        node   0: [mem 0x00000083fb830000-0x00000083fbc0ffff]
        node   0: [mem 0x00000083fbc10000-0x00000083fbdfffff]
        node   0: [mem 0x00000083fbe00000-0x00000083fbffffff]
        node   0: [mem 0x00000083fc000000-0x00000083fffbffff]
        node   0: [mem 0x00000083fffc0000-0x00000083fffdffff]
        node   0: [mem 0x00000083fffe0000-0x00000083ffffffff]
      Initmem setup node 0 [mem 0x0000008000000000-0x00000083ffffffff]
      
      Remove the duplication of memblock layout information printed during
      boot by dropping the messages from arm64 numa initialisation.
      Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      ece4b206
  8. 20 7月, 2017 5 次提交
    • R
      kvm: x86: hyperv: avoid livelock in oneshot SynIC timers · f1ff89ec
      Roman Kagan 提交于
      If the SynIC timer message delivery fails due to SINT message slot being
      busy, there's no point to attempt starting the timer again until we're
      notified of the slot being released by the guest (via EOM or EOI).
      
      Even worse, when a oneshot timer fails to deliver its message, its
      re-arming with an expiration time in the past leads to immediate retry
      of the delivery, and so on, without ever letting the guest vcpu to run
      and release the slot, which results in a livelock.
      
      To avoid that, only start the timer when there's no timer message
      pending delivery.  When there is, meaning the slot is busy, the
      processing will be restarted upon notification from the guest that the
      slot is released.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f1ff89ec
    • W
      KVM: VMX: Fix invalid guest state detection after task-switch emulation · f244deed
      Wanpeng Li 提交于
      This can be reproduced by EPT=1, unrestricted_guest=N, emulate_invalid_state=Y
      or EPT=0, the trace of kvm-unit-tests/taskswitch2.flat is like below, it tries
      to emulate invalid guest state task-switch:
      
      kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
      kvm_emulate_insn: 42000:0:0f 0b (0x2)
      kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
      kvm_inj_exception: #UD (0x0)
      kvm_entry: vcpu 0
      kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
      kvm_emulate_insn: 42000:0:0f 0b (0x2)
      kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
      kvm_inj_exception: #UD (0x0)
      ......................
      
      It appears that the task-switch emulation updates rflags (and vm86
      flag) only after the segments are loaded, causing vmx->emulation_required
      to be set, when in fact invalid guest state emulation is not needed.
      
      This patch fixes it by updating vmx->emulation_required after the
      rflags (and vm86 flag) is updated in task-switch emulation.
      
      Thanks Radim for moving the update to vmx__set_flags and adding Paolo's
      suggestion for the check.
      Suggested-by: NNadav Amit <nadav.amit@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f244deed
    • V
      ARM: NOMMU: Wire-up default DMA interface · 878ec367
      Vladimir Murzin 提交于
      The way how default DMA pool is exposed has changed and now we need to
      use dedicated interface to work with it. This patch makes alloc/release
      operations to use such interface. Since, default DMA pool is not
      handled by generic code anymore we have to implement our own mmap
      operation.
      Tested-by: NAndras Szemzo <sza@esh.hu>
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      878ec367
    • V
      dma-coherent: introduce interface for default DMA pool · 43fc509c
      Vladimir Murzin 提交于
      Christoph noticed [1] that default DMA pool in current form overload
      the DMA coherent infrastructure. In reply, Robin suggested [2] to
      split the per-device vs. global pool interfaces, so allocation/release
      from default DMA pool is driven by dma ops implementation.
      
      This patch implements Robin's idea and provide interface to
      allocate/release/mmap the default (aka global) DMA pool.
      
      To make it clear that existing *_from_coherent routines work on
      per-device pool rename them to *_from_dev_coherent.
      
      [1] https://lkml.org/lkml/2017/7/7/370
      [2] https://lkml.org/lkml/2017/7/7/431
      
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Suggested-by: NRobin Murphy <robin.murphy@arm.com>
      Tested-by: NAndras Szemzo <sza@esh.hu>
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      43fc509c
    • R
      ARM: kexec: fix failure to boot crash kernel · 0d70262a
      Russell King 提交于
      When kexec was converted to DTB, the dtb address was passed between
      machine_kexec_prepare() and machine_kexec() using a static variable.
      This is bad news if you load a crash kernel followed by a normal
      kernel or vice versa - the last loaded kernel overwrites the dtb
      address.
      
      This can result in kexec failures, as (eg) we try to boot the crash
      kernel with the last loaded dtb.  For example, with:
      
      the crash kernel fails to find the dtb.
      
      Avoid this by defining a kimage architecture structure, and store
      the address to be passed in r2 there, which will either be the ATAGs
      or the dtb blob.
      
      Fixes: 4cabd1d9 ("ARM: 7539/1: kexec: scan for dtb magic in segments")
      Fixes: 42d720d1 ("ARM: kexec: Make .text R/W in machine_kexec")
      Reported-by: NKeerthy <j-keerthy@ti.com>
      Tested-by: NKeerthy <j-keerthy@ti.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      0d70262a