1. 18 March 2020: 36 commits
    • KVM: nVMX: Refactor IO bitmap checks into helper function · f26796b0
      Committed by Oliver Upton
      commit e71237d3ff1abf9f3388337cfebf53b96df2020d upstream.
      
      [ Fixes: CVE-2020-2732 ]
      
      Checks against the IO bitmap are useful for both instruction emulation
      and VM-exit reflection. Refactor the IO bitmap checks into a helper
      function.
      Signed-off-by: Oliver Upton <oupton@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      f26796b0
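      For orientation, a minimal sketch of what an I/O-bitmap check in the entry above amounts to (illustrative only, not the kernel's exact helper): VMX keeps two 4 KiB bitmaps, A for ports 0x0000-0x7fff and B for 0x8000-0xffff, and an access is intercepted if any byte it touches has its bit set.
      
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Illustrative sketch: decide whether an I/O access of 'len' bytes at
       * 'port' must cause a VM-exit according to the two VMX I/O bitmaps. */
      static bool io_access_intercepted(const uint8_t *bitmap_a,
                                        const uint8_t *bitmap_b,
                                        uint16_t port, unsigned int len)
      {
          for (unsigned int i = 0; i < len; i++) {
              uint16_t p = (uint16_t)(port + i);
              const uint8_t *bitmap = (p < 0x8000) ? bitmap_a : bitmap_b;
              uint16_t off = p & 0x7fff;
      
              if (bitmap[off / 8] & (1u << (off % 8)))
                  return true;    /* any intercepted byte intercepts the access */
          }
          return false;
      }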
    • KVM: nVMX: Don't emulate instructions in guest mode · c2868767
      Committed by Paolo Bonzini
      commit 07721feee46b4b248402133228235318199b05ec upstream.
      
      [ Fixes: CVE-2020-2732 ]
      
      vmx_check_intercept is not yet fully implemented. To avoid emulating
      instructions disallowed by the L1 hypervisor, refuse to emulate
      instructions by default.
      
      Cc: stable@vger.kernel.org
      [Made commit, added commit msg - Oliver]
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      c2868767
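      The shape of the change above, as a hedged sketch: the signature and return constant are assumptions about this kernel's KVM internals, and the real upstream function still handles a few specific intercepts before giving up.
      
      /* Sketch: with nested VMX, refuse emulation by default so we never
       * emulate an instruction the L1 hypervisor asked to intercept. */
      static int vmx_check_intercept(struct kvm_vcpu *vcpu,
                                     struct x86_instruction_info *info,
                                     enum x86_intercept_stage stage)
      {
          /* TODO: consult L1's intercept controls per instruction type. */
          return X86EMUL_UNHANDLEABLE;
      }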
    • kvm: x86: add host poll control msrs · 1e53713f
      Committed by Marcelo Tosatti
      commit 2d5ba19bdfef4dd06add144eb04287ee98409f75 upstream
      
      Add an MSR which allows the guest to disable
      host polling (specifically, the cpuidle-haltpoll driver,
      when performing polling in the guest, disables
      host-side polling).
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
      1e53713f
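      A hedged guest-side illustration of the interface added above. The MSR index and bit meaning follow the upstream description; treat both as assumptions if your headers differ.
      
      #include <asm/msr.h>
      
      #ifndef MSR_KVM_POLL_CONTROL
      #define MSR_KVM_POLL_CONTROL 0x4b564d05   /* assumed KVM paravirt MSR index */
      #endif
      
      /* Clearing bit 0 tells the host that the guest polls in idle itself,
       * so host-side halt-polling can be switched off. */
      static void guest_disable_host_polling(void)
      {
          wrmsrl(MSR_KVM_POLL_CONTROL, 0);
      }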
    • KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injection · ff9b66b5
      Committed by Marc Zyngier
      commit ef2e78ddadbb939ce79553b10dee0131d65d8f3e upstream.
      
      Just like we do for WFE trapping, it can be useful to turn off
      WFI trapping when the physical CPU is not oversubscribed (that
      is, the vcpu is the only runnable process on this CPU) *and*
      we're using direct injection of interrupts.
      
      The conditions are reevaluated on each vcpu_load(), ensuring that
      we don't switch to this mode on a busy system.
      
      On a GICv4 system, this has the effect of reducing the generation
      of doorbell interrupts to zero when the right conditions are
      met, which is a huge improvement over the current situation
      (where the doorbells are screaming if the CPU ever hits a
      blocking WFI).
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      ff9b66b5
    • mm: introduce MADV_PAGEOUT · 23757dcc
      Committed by Minchan Kim
      commit 1a4e58cce84ee88129d5d49c064bd2852b481357 upstream
      
      When a process expects no accesses to a certain memory range for a long
      time, it can hint to the kernel that the pages can be reclaimed instantly
      while the data should be preserved for future use.  This can reduce
      working-set eviction and so ends up increasing performance.
      
      This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall.
      MADV_PAGEOUT can be used by a process to mark a memory range as not
      expected to be used for a long time, so that the kernel reclaims *any LRU*
      pages instantly.  The hint can help the kernel decide which pages to
      evict proactively.
      
      A note: it doesn't apply the SWAP_CLUSTER_MAX LRU page isolation limit
      intentionally because it's automatically bounded by PMD size.  If the
      PMD size (e.g., 256) causes trouble, we could fix it later by limiting
      it to SWAP_CLUSTER_MAX [1].
      
      - man-page material
      
      MADV_PAGEOUT (since Linux x.x)
      
      Do not expect access in the near future, so pages in the specified
      regions can be reclaimed instantly regardless of memory pressure.
      Thus, access in the range after a successful operation may cause a
      major page fault but, unlike MADV_DONTNEED, never loses the up-to-date
      contents. Pages belonging to a shared mapping are only processed
      if a write access is allowed for the calling process.
      
      MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
      VM_PFNMAP pages.
      
      [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/
      
      [minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
        Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
      [akpm@linux-foundation.org: resolve conflicts with hmm.git]
      Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: kbuild test robot <lkp@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      23757dcc
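      A small userspace illustration of the hint described in the entry above. The MADV_PAGEOUT value is defined here only in case the installed headers predate it; treat the numeric value as an assumption.
      
      #include <stdio.h>
      #include <sys/mman.h>
      
      #ifndef MADV_PAGEOUT
      #define MADV_PAGEOUT 21   /* assumed value if <sys/mman.h> lacks it */
      #endif
      
      int main(void)
      {
          size_t len = 64 << 20;   /* 64 MiB we no longer expect to touch soon */
          char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (buf == MAP_FAILED)
              return 1;
      
          buf[0] = 1;   /* populate at least one page */
      
          /* Ask the kernel to reclaim the range now; contents are preserved
           * and are faulted back in (major fault) on the next access. */
          if (madvise(buf, len, MADV_PAGEOUT) != 0)
              perror("madvise(MADV_PAGEOUT)");
      
          munmap(buf, len);
          return 0;
      }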
    • mm: introduce MADV_COLD · 1af766e8
      Committed by Minchan Kim
      commit 9c276cc65a58faf98be8e56962745ec99ab87636 upstream
      
      Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.
      
      - Background
      
      The Android terminology used for forking a new process and starting an app
      from scratch is a cold start, while resuming an existing app is a hot
      start.  While we continually try to improve the performance of cold
      starts, hot starts will always be significantly less power hungry as well
      as faster so we are trying to make hot start more likely than cold start.
      
      To increase hot start, Android userspace manages the order that apps
      should be killed in a process called ActivityManagerService.
      ActivityManagerService tracks every Android app or service that the user
      could be interacting with at any time and translates that into a ranked
      list for lmkd (the low memory killer daemon).  They are likely to be killed by
      lmkd if the system has to reclaim memory.  In that sense they are similar
      to entries in any other cache.  Those apps are kept alive for
      opportunistic performance improvements but those performance improvements
      will vary based on the memory requirements of individual workloads.
      
      - Problem
      
      Naturally, cached apps were dominant consumers of memory on the system.
      However, they were not significant consumers of swap even though they are
      good candidates for swap.  Under investigation, swapping out only begins
      once the low zone watermark is hit and kswapd wakes up, but the overall
      allocation rate in the system might trip lmkd thresholds and cause a
      cached process to be killed (we measured the performance of swapping out
      vs. zapping the memory by killing a process; unsurprisingly, zapping is
      10x faster even though we use zram, which is much faster than real
      storage), so a kill from lmkd will often satisfy the high zone watermark,
      resulting in very few pages actually being moved to swap.
      
      - Approach
      
      The approach we chose was to use a new interface to allow userspace to
      proactively reclaim entire processes by leveraging platform information.
      This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
      that are known to be cold from userspace and to avoid races with lmkd by
      reclaiming apps as soon as they entered the cached state.  Additionally,
      it gives the platform many more chances to use its information to
      optimize memory efficiency.
      
      To achieve the goal, the patchset introduces two new options for madvise.
      One is MADV_COLD, which will deactivate activated pages, and the other is
      MADV_PAGEOUT, which will reclaim private pages instantly.  These new
      options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
      ways to gain some free memory space.  MADV_PAGEOUT is similar to
      MADV_DONTNEED in that it hints the kernel that the memory region is not
      currently needed and should be reclaimed immediately; MADV_COLD is similar
      to MADV_FREE in that it hints the kernel that the memory region is not
      currently needed and should be reclaimed when memory pressure rises.
      
      This patch (of 5):
      
      When a process expects no accesses to a certain memory range, it can
      give a hint to the kernel that the pages can be reclaimed when memory
      pressure happens while the data should be preserved for future use.
      This can reduce working-set eviction and so ends up increasing performance.
      
      This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
      MADV_COLD can be used by a process to mark a memory range as not expected
      to be used in the near future.  The hint can help the kernel decide which
      pages to evict early during memory pressure.
      
      It works for every LRU page like MADV_[DONTNEED|FREE]. IOW, it moves
      
      	active file page -> inactive file LRU
      	active anon page -> inactive anon LRU
      
      Unlike MADV_FREE, it doesn't move active anonymous pages to the head of
      the inactive file LRU because MADV_COLD has slightly different semantics.
      MADV_FREE means it's okay to discard the page under memory pressure
      because its content is *garbage*, so freeing such pages has almost zero
      overhead: no swap-out is needed and a later access causes only a minor
      fault.  Thus, it makes sense to put those freeable pages on the inactive
      file LRU to compete with other used-once pages.  That also makes sense
      from an implementation point of view, because the memory is no longer
      swap-backed until it is re-dirtied, and as a bonus such pages can still
      be reclaimed on a swapless system.  MADV_COLD, however, doesn't mean
      garbage, so reclaiming those pages eventually requires swap-out/in,
      which is a bigger cost.  Since VM LRU aging is designed around a cost
      model, anonymous cold pages are better positioned on the inactive anon
      LRU list, not the file LRU.  Furthermore, that helps avoid unnecessary
      scanning if the system doesn't have a swap device.  Let's start with the
      simpler way without adding complexity at this moment.  Keep in mind the
      caveat, though, that workloads with a lot of page cache are likely to see
      MADV_COLD effectively ignored on anonymous memory because we rarely age
      the anonymous LRU lists.
      
      * man-page material
      
      MADV_COLD (since Linux x.x)
      
      Pages in the specified regions will be treated as less-recently-accessed
      compared to pages in the system with similar access frequencies.  In
      contrast to MADV_FREE, the contents of the region are preserved regardless
      of subsequent writes to pages.
      
      MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
      pages.
      
      [akpm@linux-foundation.org: resolve conflicts with hmm.git]
      Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: kbuild test robot <lkp@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      1af766e8
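      By way of contrast with the MADV_PAGEOUT example earlier, a hedged sketch of MADV_COLD usage for the entry above: the range is only deactivated, so nothing is written to swap until memory pressure actually arises. The MADV_COLD value is assumed in case the headers predate it.
      
      #include <stddef.h>
      #include <sys/mman.h>
      
      #ifndef MADV_COLD
      #define MADV_COLD 20   /* assumed value if <sys/mman.h> lacks it */
      #endif
      
      /* Mark an already-mapped range as cold: move its pages toward the
       * inactive LRU lists so they are reclaimed first under pressure. */
      static int mark_range_cold(void *addr, size_t len)
      {
          return madvise(addr, len, MADV_COLD);
      }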
    • alinux: arm64: use __kernel_text_address to replace kthread_return_to_user · 64259ab4
      Committed by Zou Cao
      We don't need kthread_return_to_user to tell the unwinder that this is a
      kernel thread; we can use __kernel_text_address(), which is the usual
      approach on other architectures such as x86 and powerpc.
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      64259ab4
    • arm64: reliable stacktraces · 46ad7da7
      Committed by Torsten Duwe
      cherry-picked from: https://patchwork.kernel.org/patch/10657429/
      
      Enhance the stack unwinder so that it reports whether it had to stop
      normally or due to an error condition; unwind_frame() will report
      continue/error/normal ending and walk_stackframe() will pass that
      info. __save_stack_trace() is used to check the validity of a stack;
      save_stack_trace_tsk_reliable() can now trivially be implemented.
      Modify arch/arm64/kernel/time.c as the only external caller so far
      to recognise the new semantics.
      
      I had to introduce a marker symbol kthread_return_to_user to tell
      the normal origin of a kernel thread.
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      46ad7da7
    • alinux: arm64: add livepatch support · 7d9b185c
      Committed by Zou Cao
      Now that we support FTRACE_WITH_REGS with -fpatchable-function-entry,
      enable livepatch support, which depends on FTRACE_WITH_REGS.
      
      Use task flag bit 6 to track the patch transition state for the consistency
      model. Add it to the work mask so it gets cleared on all kernel exits to
      userland.
      
      Tell livepatch regs->pc + 2*AARCH64_INSN_SIZE is the place to change the
      return address.
      
      This code differs significantly from the referenced patch because we use
      the new GCC feature.
      
      References:
      https://patchwork.kernel.org/patch/10657431/
      
      Based-on-code-from: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      7d9b185c
    • add cpuidle-haltpoll driver · 6d2ef95f
      Committed by Marcelo Tosatti
      commit fa86ee90eb1111267de67cb4272b5ce711f18cbb upstream
      
      Add a cpuidle driver that calls the architecture default_idle routine.
      
      To be used in conjunction with the haltpoll governor.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
      6d2ef95f
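      A hedged sketch of the core idea of the driver above: the only "deep" state simply falls back to the architecture's default idle routine, while the haltpoll governor decides when to poll instead. Registration details and the poll state are omitted, and the availability of default_idle() on the target architecture is an assumption.
      
      #include <linux/cpuidle.h>
      #include <linux/irqflags.h>
      #include <linux/sched/idle.h>
      
      /* Sketch of the state's enter callback: clear the polling flag and, if no
       * wakeup is already pending, halt via the arch default idle routine. */
      static int default_enter_idle(struct cpuidle_device *dev,
                                    struct cpuidle_driver *drv, int index)
      {
          if (current_clr_polling_and_test()) {
              local_irq_enable();
              return index;
          }
          default_idle();    /* assumed arch-provided default idle entry point */
          return index;
      }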
    • x86/amd_nb: Make hygon_nb_misc_ids static · 4f36cca7
      Committed by Pu Wen
      commit 025e32048f39e24d8ddf9369d679644ea2bdcce6 upstream.
      
      Fix the following sparse warning:
      
        arch/x86/kernel/amd_nb.c:74:28: warning:
          symbol 'hygon_nb_misc_ids' was not declared. Should it be static?
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Brian Woods <Brian.Woods@amd.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190614155441.22076-1-yuehaibing@huawei.com
      Signed-off-by: Pu Wen <puwen@hygon.cn>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      4f36cca7
    • mm/memory_hotplug: make remove_memory() take the device_hotplug_lock · d2097173
      Committed by David Hildenbrand
      commit d15e59260f62bd5e0f625cf5f5240f6ffac78ab6 upstream
      
      Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.
      
      Reading through the code and studying how mem_hotplug_lock is to be used,
      I noticed that there are two places where we can end up calling
      device_online()/device_offline() - online_pages()/offline_pages() without
      the mem_hotplug_lock.  And there are other places where we call
      device_online()/device_offline() without the device_hotplug_lock.
      
      While e.g.
      	echo "online" > /sys/devices/system/memory/memory9/state
      is fine, e.g.
      	echo 1 > /sys/devices/system/memory/memory9/online
      will not take the mem_hotplug_lock. It does, however, take the device_lock()
      and device_hotplug_lock.
      
      E.g.  via memory_probe_store(), we can end up calling
      add_memory()->online_pages() without the device_hotplug_lock.  So we can
      have concurrent callers in online_pages().  We e.g.  touch in
      online_pages() basically unprotected zone->present_pages then.
      
      Looks like there is a longer history to that (see Patch #2 for details),
      and fixing it to work the way it was intended is not really possible.  We
      would e.g.  have to take the mem_hotplug_lock in device/base/core.c, which
      sounds wrong.
      
      Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
      More details can be found in patch 3 and patch 6.
      
      I propose the general rules (documentation added in patch 6):
      
      1. add_memory/add_memory_resource() must only be called with
         device_hotplug_lock.
      2. remove_memory() must only be called with device_hotplug_lock. This is
         already documented and holds for all callers.
      3. device_online()/device_offline() must only be called with
         device_hotplug_lock. This is already documented and true for now in core
         code. Other callers (related to memory hotplug) have to be fixed up.
      4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
         online_pages/offline_pages.
      
      To me, this looks way cleaner than what we have right now (and easier to
      verify).  And looking at the documentation of remove_memory, using
      lock_device_hotplug also for add_memory() feels natural.
      
      This patch (of 6):
      
      remove_memory() is exported right now but requires the
      device_hotplug_lock, which is not exported.  So let's provide a variant
      that takes the lock and only export that one.
      
      The lock is already held in
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      	arch/powerpc/platforms/powernv/memtrace.c
      
      Apart from that, there are no other users in the tree.
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: yinhe <yinhe@linux.alibaba.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      d2097173
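      The locking rule established in the entry above, as a hedged sketch: the exported entry point simply wraps the real work with the device hotplug lock. The name of the lock-already-held variant is an assumption for illustration.
      
      #include <linux/device.h>
      #include <linux/memory_hotplug.h>
      #include <linux/types.h>
      
      /* Sketch: the exported remove_memory() takes device_hotplug_lock itself,
       * so external callers cannot get the locking wrong. */
      void remove_memory(int nid, u64 start, u64 size)
      {
          lock_device_hotplug();
          _remove_memory(nid, start, size);   /* assumed name of the unlocked variant */
          unlock_device_hotplug();
      }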
    • mm: use mm_zero_struct_page from SPARC on all 64b architectures · e23b0cb5
      Committed by Alexander Duyck
      commit 5470dea49f5382257c242ac617d908267727f1a8 upstream.
      
      Patch series "Deferred page init improvements", v7.
      
      This patchset is essentially a refactor of the page initialization logic
      that is meant to provide for better code reuse while providing a
      significant improvement in deferred page initialization performance.
      
      In my testing on an x86_64 system with 384GB of RAM I have seen the
      following.  In the case of regular memory initialization the deferred init
      time was decreased from 3.75s to 1.38s on average.  This amounts to a 172%
      improvement for the deferred memory initialization performance.
      
      I have called out the improvement observed with each patch.
      
      This patch (of 4):
      
      Use the same approach that was already in use on Sparc on all the
      architectures that support a 64b long.
      
      This is mostly motivated by the fact that 7 to 10 store/move instructions
      are likely always going to be faster than having to call into a function
      that is not specialized for handling page init.
      
      An added advantage to doing it this way is that the compiler can get away
      with combining writes in the __init_single_page call.  As a result the
      memset call will be reduced to only about 4 write operations, or at least
      that is what I am seeing with GCC 6.2 as the flags, LRU pointers, and
      count/mapcount seem to be cancelling out at least 4 of the 8 assignments
      on my system.
      
      One change I had to make to the function was to reduce the minimum page
      size to 56 to support some powerpc64 configurations.
      
      This change should introduce no change on SPARC since it already had this
      code.  In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
      initializing 384GB of RAM per node.  Pavel Tatashin tested on a system
      with Broadcom's Stingray CPU and 48GB of RAM and found that
      __init_single_page() takes 19.30ns / 64-byte struct page before this patch
      and with this patch it takes 17.33ns / 64-byte struct page.  Mike Rapoport
      ran a similar test on a OpenPower (S812LC 8348-21C) with Power8 processor
      and 128GB of RAM.  His results per 64-byte struct page were 4.68ns before,
      and 4.59ns after this patch.
      
      Link: http://lkml.kernel.org/r/20190405221213.12227.9392.stgit@localhost.localdomain
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      e23b0cb5
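      The gist of the change above, as a hedged sketch: replace the memset() of a struct page with a handful of unrolled 64-bit stores, so the compiler can combine them with the field writes done in __init_single_page(). The exact set of supported struct page sizes in the upstream helper is an assumption here.
      
      #include <linux/mm_types.h>
      
      /* Hedged sketch: zero a struct page with word stores instead of memset(). */
      static inline void __mm_zero_struct_page(struct page *page)
      {
              unsigned long *_pp = (void *)page;
      
              /* struct page is assumed to be 56, 64, 72 or 80 bytes here */
              switch (sizeof(struct page)) {
              case 80:
                      _pp[9] = 0;     /* fall through */
              case 72:
                      _pp[8] = 0;     /* fall through */
              case 64:
                      _pp[7] = 0;     /* fall through */
              case 56:
                      _pp[6] = 0;
                      _pp[5] = 0;
                      _pp[4] = 0;
                      _pp[3] = 0;
                      _pp[2] = 0;
                      _pp[1] = 0;
                      _pp[0] = 0;
              }
      }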
    • mm/memblock.c: skip kmemleak for kasan_init() · 9f093569
      Committed by Qian Cai
      commit fed84c78527009d4f799a3ed9a566502fa026d82 upstream.
      
      Kmemleak does not play well with KASAN (tested on both HPE Apollo 70 and
      Huawei TaiShan 2280 aarch64 servers).
      
      After calling start_kernel()->setup_arch()->kasan_init(), the kmemleak
      early log buffer went from something like 280 to 260000 entries, which
      caused kmemleak to be disabled and crash-dump memory reservation to fail.
      The multitude of kmemleak_alloc() calls comes from nested loops while
      KASAN is setting up full memory mappings, so let early kmemleak
      allocations skip the memblock_alloc_internal() calls that come from
      kasan_init(), given that those early KASAN memory mappings should not
      reference other memory.  Hence, no kmemleak false positives.
      
      kasan_init
        kasan_map_populate [1]
          kasan_pgd_populate [2]
            kasan_pud_populate [3]
              kasan_pmd_populate [4]
                kasan_pte_populate [5]
                  kasan_alloc_zeroed_page
                    memblock_alloc_try_nid
                      memblock_alloc_internal
                        kmemleak_alloc
      
      [1] for_each_memblock(memory, reg)
      [2] while (pgdp++, addr = next, addr != end)
      [3] while (pudp++, addr = next, addr != end && pud_none(READ_ONCE(*pudp)))
      [4] while (pmdp++, addr = next, addr != end && pmd_none(READ_ONCE(*pmdp)))
      [5] while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)))
      
      Link: http://lkml.kernel.org/r/1543442925-17794-1-git-send-email-cai@gmx.us
      Signed-off-by: Qian Cai <cai@gmx.us>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      9f093569
    • arm64: mm: add missing PTE_SPECIAL in pte_mkdevmap on arm64 · f72a099b
      Committed by Jia He
      commit 30e235389faadb9e3d918887b1f126155d7d761d upstream.
      
      Without this patch, the MAP_SYNC test case will cause a print_bad_pte
      warning on arm64 as follows:
      
      [   25.542693] BUG: Bad page map in process mapdax333 pte:2e8000448800f53 pmd:41ff5f003
      [   25.546360] page:ffff7e0010220000 refcount:1 mapcount:-1 mapping:ffff8003e29c7440 index:0x0
      [   25.550281] ext4_dax_aops
      [   25.550282] name:"__aaabbbcccddd__"
      [   25.551553] flags: 0x3ffff0000001002(referenced|reserved)
      [   25.555802] raw: 03ffff0000001002 ffff8003dfffa908 0000000000000000 ffff8003e29c7440
      [   25.559446] raw: 0000000000000000 0000000000000000 00000001fffffffe 0000000000000000
      [   25.563075] page dumped because: bad pte
      [   25.564938] addr:0000ffffbe05b000 vm_flags:208000fb anon_vma:0000000000000000 mapping:ffff8003e29c7440 index:0
      [   25.574272] file:__aaabbbcccddd__ fault:ext4_dax_fault mmmmap:ext4_file_mmap readpage:0x0
      [   25.578799] CPU: 1 PID: 1180 Comm: mapdax333 Not tainted 5.2.0+ #21
      [   25.581702] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [   25.585624] Call trace:
      [   25.587008]  dump_backtrace+0x0/0x178
      [   25.588799]  show_stack+0x24/0x30
      [   25.590328]  dump_stack+0xa8/0xcc
      [   25.591901]  print_bad_pte+0x18c/0x218
      [   25.593628]  unmap_page_range+0x778/0xc00
      [   25.595506]  unmap_single_vma+0x94/0xe8
      [   25.597304]  unmap_vmas+0x90/0x108
      [   25.598901]  unmap_region+0xc0/0x128
      [   25.600566]  __do_munmap+0x284/0x3f0
      [   25.602245]  __vm_munmap+0x78/0xe0
      [   25.603820]  __arm64_sys_munmap+0x34/0x48
      [   25.605709]  el0_svc_common.constprop.0+0x78/0x168
      [   25.607956]  el0_svc_handler+0x34/0x90
      [   25.609698]  el0_svc+0x8/0xc
      [...]
      
      The root cause is in _vm_normal_page(): without the PTE_SPECIAL bit,
      the return value will be incorrectly set to pfn_to_page(pfn) instead
      of NULL. Besides, this patch also rewrites pmd_mkdevmap() to avoid
      setting PTE_SPECIAL for PMDs.
      
      The MAP_SYNC test case is as follows (provided by Yibo Cai):
      $#include <stdio.h>
      $#include <string.h>
      $#include <unistd.h>
      $#include <sys/file.h>
      $#include <sys/mman.h>
      
      $#ifndef MAP_SYNC
      $#define MAP_SYNC 0x80000
      $#endif
      
      /* mount -o dax /dev/pmem0 /mnt */
      $#define F "/mnt/__aaabbbcccddd__"
      
      int main(void)
      {
          int fd;
          char buf[4096];
          void *addr;
      
          if ((fd = open(F, O_CREAT|O_TRUNC|O_RDWR, 0644)) < 0) {
              perror("open1");
              return 1;
          }
      
          if (write(fd, buf, 4096) != 4096) {
              perror("lseek");
              return 1;
          }
      
          addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_SYNC, fd, 0);
          if (addr == MAP_FAILED) {
              perror("mmap");
              printf("did you mount with '-o dax'?\n");
              return 1;
          }
      
          memset(addr, 0x55, 4096);
      
          if (munmap(addr, 4096) == -1) {
              perror("munmap");
              return 1;
          }
      
          close(fd);
      
          return 0;
      }
      
      Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
      Reported-by: Yibo Cai <Yibo.Cai@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Robin Murphy <Robin.Murphy@arm.com>
      Signed-off-by: Jia He <justin.he@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      f72a099b
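      The essence of the fix above, as a hedged sketch of the arm64 helper that lives in arch/arm64/include/asm/pgtable.h; the bit names follow arm64 conventions, and the exact upstream definition is assumed from the description.
      
      /* Sketch: pte_mkdevmap() must set PTE_SPECIAL along with PTE_DEVMAP so
       * that vm_normal_page() treats devmap mappings as special and does not
       * hand back a struct page for them. */
      static inline pte_t pte_mkdevmap(pte_t pte)
      {
              return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
      }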
    • arm64: mm: implement pte_devmap support · 1820ca63
      Committed by Robin Murphy
      commit 73b20c84d42de14673a987816dd4d132c7b1f801 upstream.
      
      In order for things like get_user_pages() to work on ZONE_DEVICE memory,
      we need a software PTE bit to identify device-backed PFNs.  Hook this up
      along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.
      
      [robin.murphy@arm.com: build fixes]
        Link: http://lkml.kernel.org/r/13026c4e64abc17133bbfa07d7731ec6691c0bcd.1559050949.git.robin.murphy@arm.com
      Link: http://lkml.kernel.org/r/817d92886fc3b33bcbf6e105ee83a74babb3a5aa.1558547956.git.robin.murphy@arm.com
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      1820ca63
    • mm: introduce ARCH_HAS_PTE_DEVMAP · d7066a91
      Committed by Robin Murphy
      commit 175967318c3018d01931ac950c82adab5deb47ca upstream.
      
      ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
      with the long-out-of-date comment can lead to the impression that an
      architecture may just enable it (since __add_pages() now "comprehends
      device memory" for itself) and expect things to work.
      
      In practice, however, ZONE_DEVICE users have little chance of
      functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
      that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
      dependency so the real situation is clearer.
      
      Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Acked-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      d7066a91
    • arm64: ftrace: fix ifdeffery · a990f965
      Committed by Mark Rutland
      commit 70927d02d409b5a79c3ed040ace5017da8284ede upstream.
      
      When I tweaked the ftrace entry assembly in commit:
      
        3b23e4991fb66f6d ("arm64: implement ftrace with regs")
      
      ... my ifdeffery tweaks left ftrace_graph_caller undefined for
      CONFIG_DYNAMIC_FTRACE && CONFIG_FUNCTION_GRAPH_TRACER when ftrace is
      based on mcount.
      
      The kbuild test robot reported that this issue is detected at link time:
      
      | arch/arm64/kernel/entry-ftrace.o: In function `skip_ftrace_call':
      | arch/arm64/kernel/entry-ftrace.S:238: undefined reference to `ftrace_graph_caller'
      | arch/arm64/kernel/entry-ftrace.S:238:(.text+0x3c): relocation truncated to fit: R_AARCH64_CONDBR19 against undefined symbol
      | `ftrace_graph_caller'
      | arch/arm64/kernel/entry-ftrace.S:243: undefined reference to `ftrace_graph_caller'
      | arch/arm64/kernel/entry-ftrace.S:243:(.text+0x54): relocation truncated to fit: R_AARCH64_CONDBR19 against undefined symbol
      | `ftrace_graph_caller'
      
      This patch fixes the ifdeffery so that the mcount version of
      ftrace_graph_caller doesn't depend on CONFIG_DYNAMIC_FTRACE. At the same
      time, a redundant #else is removed from the ifdeffery for the
      patchable-function-entry version of ftrace_graph_caller.
      
      Fixes: 3b23e4991fb66f6d ("arm64: implement ftrace with regs")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Torsten Duwe <duwe@lst.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      a990f965
    • alinux: arm64: fixed _mcount undefined reference error · 53b89c33
      Committed by Zou Cao
      Fix the following warning:
      arm64ksyms.c:(___ksymtab+_mcount+0x0): undefined reference to `_mcount'
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      53b89c33
    • arm64: ftrace: always pass instrumented pc in x0 · 54760b8d
      Committed by Mark Rutland
      commit 7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680 upstream.
      
      The core ftrace hooks take the instrumented PC in x0, but for some
      reason arm64's prepare_ftrace_return() takes this in x1.
      
      For consistency, let's flip the argument order and always pass the
      instrumented PC in x0.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      54760b8d
    • arm64: ftrace: remove return_regs macros · a7cd9c60
      Committed by Mark Rutland
      commit 49e258e05e8e56d53af20be481b311c43d7c286b upstream.
      
      The save_return_regs and restore_return_regs macros are only used by
      return_to_handler, and having them defined out-of-line only serves to
      obscure the logic.
      
      Before we complicate, let's clean this up and fold the logic directly
      into return_to_handler, saving a few lines of macro boilerplate in the
      process. At the same time, a missing trailing space is added to the
      comments, fixing a code style violation.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      a7cd9c60
    • arm64: implement ftrace with regs · 1f77f2fc
      Committed by Torsten Duwe
      commit 3b23e4991fb66f6d152f9055ede271a726ef9f21 upstream
      
      This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
      function's arguments (and some other registers) to be captured into a
      struct pt_regs, allowing these to be inspected and/or modified. This is
      a building block for live-patching, where a function's arguments may be
      forwarded to another function. This is also necessary to enable ftrace
      and in-kernel pointer authentication at the same time, as it allows the
      LR value to be captured and adjusted prior to signing.
      
      Using GCC's -fpatchable-function-entry=N option, we can have the
      compiler insert a configurable number of NOPs between the function entry
      point and the usual prologue. This also ensures functions are AAPCS
      compliant (e.g. disabling inter-procedural register allocation).
      
      For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
      following:
      
      | unsigned long bar(void);
      |
      | unsigned long foo(void)
      | {
      |         return bar() + 1;
      | }
      
      ... to:
      
      | <foo>:
      |         nop
      |         nop
      |         stp     x29, x30, [sp, #-16]!
      |         mov     x29, sp
      |         bl      0 <bar>
      |         add     x0, x0, #0x1
      |         ldp     x29, x30, [sp], #16
      |         ret
      
      This patch builds the kernel with -fpatchable-function-entry=2,
      prefixing each function with two NOPs. To trace a function, we replace
      these NOPs with a sequence that saves the LR into a GPR, then calls an
      ftrace entry assembly function which saves this and other relevant
      registers:
      
      | mov	x9, x30
      | bl	<ftrace-entry>
      
      Since patchable functions are AAPCS compliant (and the kernel does not
      use x18 as a platform register), x9-x18 can be safely clobbered in the
      patched sequence and the ftrace entry code.
      
      There are now two ftrace entry functions, ftrace_regs_entry (which saves
      all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
      allocated for each within modules.
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      [Mark: rework asm, comments, PLTs, initialization, commit message]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Julien Thierry <jthierry@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      1f77f2fc
    • arm64: asm-offsets: add S_FP · ecb06c7b
      Committed by Mark Rutland
      commit 1f377e043b3b8ef68caffe47bdad794f4e2cb030 upstream
      
      So that assembly code can more easily manipulate the FP (x29) within a
      pt_regs, add an S_FP asm-offsets definition.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      ecb06c7b
    • arm64: insn: add encoder for MOV (register) · 88892d31
      Committed by Mark Rutland
      commit e3bf8a67f759b498e09999804c3837688e03b304 upstream
      
      For FTRACE_WITH_REGS, we're going to want to generate a MOV (register)
      instruction as part of the callsite initialization. As MOV (register) is
      an alias for ORR (shifted register), we can generate this with
      aarch64_insn_gen_logical_shifted_reg(), but it's somewhat verbose and
      difficult to read in-context.
      
      Add an aarch64_insn_gen_move_reg() wrapper for this case so that we can
      write callers in a more straightforward way.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      88892d31
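      Roughly what the new wrapper in the entry above looks like, as a hedged sketch intended for arch/arm64/kernel/insn.c; the argument order of the underlying encoder is an assumption from context.
      
      /* Sketch: MOV (register) is an alias for ORR Xd, XZR, Xm, so the wrapper
       * simply fills in the zero register and the ORR opcode. */
      u32 aarch64_insn_gen_move_reg(enum aarch64_insn_register dst,
                                    enum aarch64_insn_register src,
                                    enum aarch64_insn_variant variant)
      {
              return aarch64_insn_gen_logical_shifted_reg(dst, AARCH64_INSN_REG_ZR,
                                                          src, 0, variant,
                                                          AARCH64_INSN_LOGIC_ORR);
      }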
    • arm64: module/ftrace: intialize PLT at load time · 19f2b4ae
      Committed by Mark Rutland
      commit f1a54ae9af0da4d76239256ed640a93ab3aadac0 upstream
      
      Currently we lazily-initialize a module's ftrace PLT at runtime when we
      install the first ftrace call. To do so we have to apply a number of
      sanity checks, transiently mark the module text as RW, and perform an
      IPI as part of handling Neoverse-N1 erratum #1542419.
      
      We only expect the ftrace trampoline to point at ftrace_caller() (AKA
      FTRACE_ADDR), so let's simplify all of this by initializing the PLT at
      module load time, before the module loader marks the module RO and
      performs the initial I-cache maintenance for the module.
      
      Thus we can rely on the module having been correctly initialized, and can
      simplify the runtime work necessary to install an ftrace call in a
      module. This will also allow for the removal of module_disable_ro().
      
      Tested by forcing ftrace_make_call() to use the module PLT, and then
      loading up a module after setting up ftrace with:
      
      | echo ":mod:<module-name>" > set_ftrace_filter;
      | echo function > current_tracer;
      | modprobe <module-name>
      
      Since FTRACE_ADDR is only defined when CONFIG_DYNAMIC_FTRACE is
      selected, we wrap its use along with most of module_init_ftrace_plt()
      with ifdeffery rather than using IS_ENABLED().
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      19f2b4ae
    • arm64: module: rework special section handling · d4199e8c
      Committed by Mark Rutland
      commit bd8b21d3dd661658addc1cd4cc869bab11d28596 upstream
      
      When we load a module, we have to perform some special work for a couple
      of named sections. To do this, we iterate over all of the module's
      sections, and perform work for each section we recognize.
      
      To make it easier to handle the unexpected absence of a section, and to
      make the section-specific logic easier to read, let's factor the section
      search into a helper. Something similar is already done in the core module
      loader and other architectures (and ideally we'd unify these in future).
      
      If we expect a module to have an ftrace trampoline section, but it
      doesn't have one, we'll now reject loading the module. When
      ARM64_MODULE_PLTS is selected, any correctly built module should have
      one (and this is assumed by arm64's ftrace PLT code) and the absence of
      such a section implies something has gone wrong at build time.
      
      Subsequent patches will make use of the new helper.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      d4199e8c
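      A hedged sketch of the kind of helper the entry above describes, mirroring the pattern used by the core module loader; the exact name and placement in arch/arm64/kernel/module.c are assumptions.
      
      #include <linux/elf.h>
      #include <linux/string.h>
      
      /* Sketch: look a named section up once, instead of open-coding the loop
       * at every special-section user. */
      static const Elf_Shdr *find_section(const Elf_Ehdr *hdr,
                                          const Elf_Shdr *sechdrs,
                                          const char *name)
      {
              const Elf_Shdr *s, *se;
              const char *secstrs = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
      
              for (s = sechdrs, se = sechdrs + hdr->e_shnum; s < se; s++)
                      if (strcmp(name, secstrs + s->sh_name) == 0)
                              return s;
      
              return NULL;
      }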
    • arm64: Makefile: Replace -pg with CC_FLAGS_FTRACE · 007a7aa5
      Committed by Torsten Duwe
      commit edf072d36dbfdf74465b66988f30084b6c996fbf upstream.
      
      In preparation for arm64 supporting ftrace built on other compiler
      options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE)
      flags, whatever these may be, rather than assuming '-pg'.
      
      There should be no functional change as a result of this patch.
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      007a7aa5
    • arm64: ftrace: use GLOBAL() · 9709ac64
      Committed by Mark Rutland
      commit e4fe196642678565766815d99ab98a3a32d72dd4 upstream.
      
      The global exports of ftrace_call and ftrace_graph_call are somewhat
      painful to read. Let's use the generic GLOBAL() macro to ameliorate
      matters.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      9709ac64
    • KVM: arm64: Add support for creating PUD hugepages at stage 2 · 497b3882
      Committed by Punit Agrawal
      commit b8e0ba7c8bea994011aff3b4c35256b180fab874 upstream.
      
      KVM only supports PMD hugepages at stage 2. Now that the various page
      handling routines are updated, extend the stage 2 fault handling to
      map in PUD hugepages.
      
      Addition of PUD hugepage support enables additional page sizes (e.g.,
      1G with 4K granule) which can be useful on cores that support mapping
      larger block sizes in the TLB entries.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ]
      Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      497b3882
    • KVM: arm64: Update age handlers to support PUD hugepages · 445074ed
      Committed by Punit Agrawal
      commit 35a63966194dd994f44150f07398c62f8dca011e upstream.
      
      In preparation for creating larger hugepages at Stage 2, add support
      to the age handling notifiers for PUD hugepages when encountered.
      
      Provide trivial helpers for arm32 to allow sharing code.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ]
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      445074ed
    • KVM: arm64: Support handling access faults for PUD hugepages · 1f37c38f
      Committed by Punit Agrawal
      commit eb3f0624ea082def887acc79e97934e27d0188b7 upstream.
      
      In preparation for creating larger hugepages at Stage 2, extend the
      access fault handling at Stage 2 to support PUD hugepages when
      encountered.
      
      Provide trivial helpers for arm32 to allow sharing of code.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replaced BUG() => WARN_ON(1) in PUD helpers ]
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      1f37c38f
    • KVM: arm64: Support PUD hugepage in stage2_is_exec() · 29c25546
      Committed by Punit Agrawal
      commit 86d1c55ea605025f78d026e7fc3a2bb4c3fc2d6a upstream.
      
      In preparation for creating PUD hugepages at stage 2, add support for
      detecting execute permissions on PUD page table entries. Faults due to
      lack of execute permissions on page table entries are used to perform
      i-cache invalidation on first execute.
      
      Provide trivial implementations of arm32 helpers to allow sharing of
      code.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ]
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      29c25546
    • KVM: arm64: Support dirty page tracking for PUD hugepages · f89dc7dc
      Committed by Punit Agrawal
      commit 4ea5af53114091e23a8fc279f25637e6c4e892c6 upstream.
      
      In preparation for creating PUD hugepages at stage 2, add support for
      write protecting PUD hugepages when they are encountered. Write
      protecting guest tables is used to track dirty pages when migrating
      VMs.
      
      Also, provide trivial implementations of required kvm_s2pud_* helpers
      to allow sharing of code with arm32.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replaced BUG() => WARN_ON() in arm32 pud helpers ]
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      f89dc7dc
    • KVM: arm/arm64: Introduce helpers to manipulate page table entries · 8351986b
      Committed by Punit Agrawal
      commit f8df73388ee25b5e5f1d26249202e7126ca8139d upstream.
      
      Introduce helpers to abstract architectural handling of the conversion
      of pfn to page table entries and marking a PMD page table entry as a
      block entry.
      
      The helpers are introduced in preparation for supporting PUD hugepages
      at stage 2 - which are supported on arm64 but do not exist on arm.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Acked-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      8351986b
    • KVM: arm/arm64: Log PSTATE for unhandled sysregs · 6088bea0
      Committed by Mark Rutland
      commit d1878af3a5a6ac00bc8a3edfecf80539ee9c546e upstream.
      
      When KVM traps an unhandled sysreg/coproc access from a guest, it logs
      the guest PC. To aid debugging, it would be helpful to know which
      exception level the trap came from, along with other PSTATE/CPSR bits,
      so let's log the PSTATE/CPSR too.
      Acked-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Zou Cao <zoucao@linux.alibaba.com>
      6088bea0
    • arm64: KVM: Consistently advance singlestep when emulating instructions · 02107bfd
      Committed by Mark Rutland
      commit bd7d95cafb499e24903b7d21f9eeb2c5208160c2 upstream.
      
      When we emulate a guest instruction, we don't advance the hardware
      singlestep state machine, and thus the guest will receive a software
      step exception after a next instruction which is not emulated by the
      host.
      
      We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
      whether userspace requested a single step, and fake a debug exception
      from within the kernel. Other times, we advance the HW singlestep state
      and rely on the HW to generate the exception for us. Thus, the observed
      step behaviour differs for host and guest.
      
      Let's make this simpler and consistent by always advancing the HW
      singlestep state machine when we skip an instruction. Thus we can rely
      on the hardware to generate the singlestep exception for us, and never
      need to explicitly check for an active-pending step, nor do we need to
      fake a debug exception from the guest.
      
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Zou Cao <zoucao@linux.alibaba.com>
      02107bfd
  2. 17 January 2020: 4 commits