1. 02 Sep, 2020 (10 commits)
  2. 29 Jun, 2020 (5 commits)
  3. 28 May, 2020 (1 commit)
    • A
      open: introduce openat2(2) syscall · 5b9369e5
      Authored by Aleksa Sarai
      to #26323588
      
      commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179 upstream.
      
      /* Background. */
      For a very long time, extending openat(2) with new features has been
      incredibly frustrating. This stems from the fact that openat(2) is
      possibly the most famous counter-example to the mantra "don't silently
      accept garbage from userspace" -- it doesn't check whether unknown flags
      are present[1].
      
      This means that (generally) the addition of new flags to openat(2) has
      been fraught with backwards-compatibility issues (O_TMPFILE has to be
      defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
      kernels gave errors, since it's insecure to silently ignore the
      flag[2]). All new security-related flags therefore have a tough road to
      being added to openat(2).
      
      Userspace also has a hard time figuring out whether a particular flag is
      supported on a particular kernel. While it is now possible with
      contemporary kernels (thanks to [3]), older kernels will expose unknown
      flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
      openat(2) time matches modern syscall designs and is far more
      fool-proof.
      
      In addition, the newly-added path resolution restriction LOOKUP flags
      (which we would like to expose to user-space) don't feel related to the
      pre-existing O_* flag set -- they affect all components of path lookup.
      We'd therefore like to add a new flag argument.
      
      Adding a new syscall allows us to finally fix the flag-ignoring problem,
      and we can make it extensible enough so that we will hopefully never
      need an openat3(2).
      
      /* Syscall Prototype. */
        /*
         * open_how is an extensible structure (similar in interface to
         * clone3(2) or sched_setattr(2)). The size parameter must be set to
         * sizeof(struct open_how), to allow for future extensions. All future
         * extensions will be appended to open_how, with their zero value
         * acting as a no-op default.
         */
        struct open_how { /* ... */ };
      
        int openat2(int dfd, const char *pathname,
                    struct open_how *how, size_t size);
      
      /* Description. */
      The initial version of 'struct open_how' contains the following fields:
      
        flags
          Used to specify openat(2)-style flags. However, any unknown flag
          bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
          will result in -EINVAL. In addition, this field is 64-bits wide to
          allow for more O_ flags than currently permitted with openat(2).
      
        mode
          The file mode for O_CREAT or O_TMPFILE.
      
          Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
      
        resolve
          Restrict path resolution (in contrast to O_* flags, these affect
          all path components). The current set of flags is as follows (at
          the moment, each RESOLVE_ flag is implemented by passing the
          corresponding LOOKUP_ flag).
      
          RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
          RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
          RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
          RESOLVE_BENEATH       => LOOKUP_BENEATH
          RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT
      
      open_how does not contain an embedded size field, because it is of
      little benefit (userspace can figure out the kernel open_how size at
      runtime fairly easily without it). It also only contains u64s (even
      though ->mode arguably should be a u16) to avoid having padding fields
      which are never used in the future.
      
      Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
      is no longer permitted for openat(2). As far as I can tell, this has
      always been a bug and appears to not be used by userspace (and I've not
      seen any problems on my machines by disallowing it). If it turns out
      this breaks something, we can special-case it and only permit it for
      openat(2) but not openat2(2).
      
      After input from Florian Weimer, the new open_how and flag definitions
      are inside a separate header from uapi/linux/fcntl.h, to avoid problems
      that glibc has with importing that header.
      
      /* Testing. */
      In a follow-up patch there are over 200 selftests which ensure that this
      syscall has the correct semantics and will correctly handle several
      attack scenarios.
      
      In addition, I've written a userspace library[4] which provides
      convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
      because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
      must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
      syscalls). During the development of this patch, I've run numerous
      verification tests using libpathrs (showing that the API is reasonably
      usable by userspace).
      
      /* Future Work. */
      Additional RESOLVE_ flags have been suggested during the review period.
      These can be easily implemented separately (such as blocking auto-mount
      during resolution).
      
      Furthermore, there are some other proposed changes to the openat(2)
      interface (the most obvious example is magic-link hardening[5]) which
      would be a good opportunity to add a way for userspace to restrict how
      O_PATH file descriptors can be re-opened.
      
      Another possible avenue of future work would be some kind of
      CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
      which openat2(2) flags and fields are supported by the current kernel
      (to avoid userspace having to go through several guesses to figure it
      out).
      
      [1]: https://lwn.net/Articles/588444/
      [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
      [3]: commit 629e014b ("fs: completely ignore unknown open flags")
      [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
      [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
      [6]: https://youtu.be/ggD-eb3yPVs
      Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      5b9369e5
  4. 28 Apr, 2020 (3 commits)
  5. 22 Apr, 2020 (1 commit)
  6. 18 Mar, 2020 (20 commits)
    • M
      KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injection · ff9b66b5
      Authored by Marc Zyngier
      commit ef2e78ddadbb939ce79553b10dee0131d65d8f3e upstream.
      
      Just like we do for WFE trapping, it can be useful to turn off
      WFI trapping when the physical CPU is not oversubscribed (that
      is, the vcpu is the only runnable process on this CPU) *and*
      we're using direct injection of interrupts.
      
      The conditions are reevaluated on each vcpu_load(), ensuring that
      we don't switch to this mode on a busy system.
      
      On a GICv4 system, this has the effect of reducing the generation
      of doorbell interrupts to zero when the right conditions are
      met, which is a huge improvement over the current situation
      (where the doorbells are screaming if the CPU ever hits a
      blocking WFI).
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      ff9b66b5
    • Z
      alinux: arm64: use __kernel_text_address to replace kthread_return_to_user · 64259ab4
      Authored by Zou Cao
      We don't need kthread_return_to_user to tell the unwinder that this
      is a kernel thread; we can use __kernel_text_address instead, which
      is the usual approach on other architectures such as x86 and ppc.
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      64259ab4
    • T
      arm64: reliable stacktraces · 46ad7da7
      Authored by Torsten Duwe
      cherry-picked from: https://patchwork.kernel.org/patch/10657429/
      
      Enhance the stack unwinder so that it reports whether it had to stop
      normally or due to an error condition; unwind_frame() will report
      continue/error/normal ending and walk_stackframe() will pass that
      info. __save_stack_trace() is used to check the validity of a stack;
      save_stack_trace_tsk_reliable() can now trivially be implemented.
      Modify arch/arm64/kernel/time.c as the only external caller so far
      to recognise the new semantics.
      
      I had to introduce a marker symbol kthread_return_to_user to tell
      the normal origin of a kernel thread.
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      46ad7da7
    • Z
      alinux: arm64: add livepatch support · 7d9b185c
      Authored by Zou Cao
      Now that we support FTRACE_WITH_REGS with -fpatchable-function-entry,
      enable livepatch support, which depends on FTRACE_WITH_REGS.
      
      Use task flag bit 6 to track the patch transition state for the
      consistency model. Add it to the work mask so it gets cleared on all
      kernel exits to userland.
      
      Tell livepatch that regs->pc + 2*AARCH64_INSN_SIZE is the place to
      change the return address.
      
      This code differs significantly from the referenced patch because we
      use the new GCC feature.
      
      References:
      https://patchwork.kernel.org/patch/10657431/
      
      Based-on-code-from: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      7d9b185c
    • Q
      mm/memblock.c: skip kmemleak for kasan_init() · 9f093569
      Authored by Qian Cai
      commit fed84c78527009d4f799a3ed9a566502fa026d82 upstream.
      
      Kmemleak does not play well with KASAN (tested on both HPE Apollo 70 and
      Huawei TaiShan 2280 aarch64 servers).
      
      After calling start_kernel()->setup_arch()->kasan_init(), the
      kmemleak early log buffer went from something like 280 entries to
      260000, which caused kmemleak to be disabled and the crash dump
      memory reservation to fail.  The multitude of kmemleak_alloc() calls
      comes from nested loops while KASAN is setting up full memory
      mappings, so let early kmemleak allocations skip those
      memblock_alloc_internal() calls that came from kasan_init(), given
      that those early KASAN memory mappings should not reference other
      memory.  Hence, no kmemleak false positives.
      
      kasan_init
        kasan_map_populate [1]
          kasan_pgd_populate [2]
            kasan_pud_populate [3]
              kasan_pmd_populate [4]
                kasan_pte_populate [5]
                  kasan_alloc_zeroed_page
                    memblock_alloc_try_nid
                      memblock_alloc_internal
                        kmemleak_alloc
      
      [1] for_each_memblock(memory, reg)
      [2] while (pgdp++, addr = next, addr != end)
      [3] while (pudp++, addr = next, addr != end && pud_none(READ_ONCE(*pudp)))
      [4] while (pmdp++, addr = next, addr != end && pmd_none(READ_ONCE(*pmdp)))
      [5] while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)))
      
      Link: http://lkml.kernel.org/r/1543442925-17794-1-git-send-email-cai@gmx.us
      Signed-off-by: Qian Cai <cai@gmx.us>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      9f093569
    • J
      arm64: mm: add missing PTE_SPECIAL in pte_mkdevmap on arm64 · f72a099b
      Authored by Jia He
      commit 30e235389faadb9e3d918887b1f126155d7d761d upstream.
      
      Without this patch, the MAP_SYNC test case will cause a print_bad_pte
      warning on arm64 as follows:
      
      [   25.542693] BUG: Bad page map in process mapdax333 pte:2e8000448800f53 pmd:41ff5f003
      [   25.546360] page:ffff7e0010220000 refcount:1 mapcount:-1 mapping:ffff8003e29c7440 index:0x0
      [   25.550281] ext4_dax_aops
      [   25.550282] name:"__aaabbbcccddd__"
      [   25.551553] flags: 0x3ffff0000001002(referenced|reserved)
      [   25.555802] raw: 03ffff0000001002 ffff8003dfffa908 0000000000000000 ffff8003e29c7440
      [   25.559446] raw: 0000000000000000 0000000000000000 00000001fffffffe 0000000000000000
      [   25.563075] page dumped because: bad pte
      [   25.564938] addr:0000ffffbe05b000 vm_flags:208000fb anon_vma:0000000000000000 mapping:ffff8003e29c7440 index:0
      [   25.574272] file:__aaabbbcccddd__ fault:ext4_dax_fault mmap:ext4_file_mmap readpage:0x0
      [   25.578799] CPU: 1 PID: 1180 Comm: mapdax333 Not tainted 5.2.0+ #21
      [   25.581702] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [   25.585624] Call trace:
      [   25.587008]  dump_backtrace+0x0/0x178
      [   25.588799]  show_stack+0x24/0x30
      [   25.590328]  dump_stack+0xa8/0xcc
      [   25.591901]  print_bad_pte+0x18c/0x218
      [   25.593628]  unmap_page_range+0x778/0xc00
      [   25.595506]  unmap_single_vma+0x94/0xe8
      [   25.597304]  unmap_vmas+0x90/0x108
      [   25.598901]  unmap_region+0xc0/0x128
      [   25.600566]  __do_munmap+0x284/0x3f0
      [   25.602245]  __vm_munmap+0x78/0xe0
      [   25.603820]  __arm64_sys_munmap+0x34/0x48
      [   25.605709]  el0_svc_common.constprop.0+0x78/0x168
      [   25.607956]  el0_svc_handler+0x34/0x90
      [   25.609698]  el0_svc+0x8/0xc
      [...]
      
      The root cause is in _vm_normal_page: without the PTE_SPECIAL bit,
      the return value will be incorrectly set to pfn_to_page(pfn) instead
      of NULL. In addition, this patch also rewrites pmd_mkdevmap to avoid
      setting PTE_SPECIAL for the pmd.
      
      The MAP_SYNC test case is as follows (provided by Yibo Cai):
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <fcntl.h>
      #include <sys/file.h>
      #include <sys/mman.h>
      
      #ifndef MAP_SYNC
      #define MAP_SYNC 0x80000
      #endif
      
      /* mount -o dax /dev/pmem0 /mnt */
      #define F "/mnt/__aaabbbcccddd__"
      
      int main(void)
      {
          int fd;
          char buf[4096];
          void *addr;
      
          if ((fd = open(F, O_CREAT|O_TRUNC|O_RDWR, 0644)) < 0) {
              perror("open");
              return 1;
          }
      
          if (write(fd, buf, 4096) != 4096) {
              perror("write");
              return 1;
          }
      
          addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_SYNC, fd, 0);
          if (addr == MAP_FAILED) {
              perror("mmap");
              printf("did you mount with '-o dax'?\n");
              return 1;
          }
      
          memset(addr, 0x55, 4096);
      
          if (munmap(addr, 4096) == -1) {
              perror("munmap");
              return 1;
          }
      
          close(fd);
      
          return 0;
      }
      
      Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
      Reported-by: Yibo Cai <Yibo.Cai@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Robin Murphy <Robin.Murphy@arm.com>
      Signed-off-by: Jia He <justin.he@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      f72a099b
    • R
      arm64: mm: implement pte_devmap support · 1820ca63
      Authored by Robin Murphy
      commit 73b20c84d42de14673a987816dd4d132c7b1f801 upstream.
      
      In order for things like get_user_pages() to work on ZONE_DEVICE memory,
      we need a software PTE bit to identify device-backed PFNs.  Hook this up
      along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.
      
      [robin.murphy@arm.com: build fixes]
        Link: http://lkml.kernel.org/r/13026c4e64abc17133bbfa07d7731ec6691c0bcd.1559050949.git.robin.murphy@arm.com
      Link: http://lkml.kernel.org/r/817d92886fc3b33bcbf6e105ee83a74babb3a5aa.1558547956.git.robin.murphy@arm.com
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      1820ca63
    • M
      arm64: ftrace: fix ifdeffery · a990f965
      Authored by Mark Rutland
      commit 70927d02d409b5a79c3ed040ace5017da8284ede upstream.
      
      When I tweaked the ftrace entry assembly in commit:
      
        3b23e4991fb66f6d ("arm64: implement ftrace with regs")
      
      ... my ifdeffery tweaks left ftrace_graph_caller undefined for
      CONFIG_DYNAMIC_FTRACE && CONFIG_FUNCTION_GRAPH_TRACER when ftrace is
      based on mcount.
      
      The kbuild test robot reported that this issue is detected at link time:
      
      | arch/arm64/kernel/entry-ftrace.o: In function `skip_ftrace_call':
      | arch/arm64/kernel/entry-ftrace.S:238: undefined reference to `ftrace_graph_caller'
      | arch/arm64/kernel/entry-ftrace.S:238:(.text+0x3c): relocation truncated to fit: R_AARCH64_CONDBR19 against undefined symbol
      | `ftrace_graph_caller'
      | arch/arm64/kernel/entry-ftrace.S:243: undefined reference to `ftrace_graph_caller'
      | arch/arm64/kernel/entry-ftrace.S:243:(.text+0x54): relocation truncated to fit: R_AARCH64_CONDBR19 against undefined symbol
      | `ftrace_graph_caller'
      
      This patch fixes the ifdeffery so that the mcount version of
      ftrace_graph_caller doesn't depend on CONFIG_DYNAMIC_FTRACE. At the same
      time, a redundant #else is removed from the ifdeffery for the
      patchable-function-entry version of ftrace_graph_caller.
      
      Fixes: 3b23e4991fb66f6d ("arm64: implement ftrace with regs")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Torsten Duwe <duwe@lst.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      a990f965
    • Z
      alinux: arm64: fixed _mcount undefined reference error · 53b89c33
      Authored by Zou Cao
      Fixed the following warning:
      arm64ksyms.c:(___ksymtab+_mcount+0x0): undefined reference to `_mcount'
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      53b89c33
    • M
      arm64: ftrace: always pass instrumented pc in x0 · 54760b8d
      Authored by Mark Rutland
      commit 7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680 upstream.
      
      The core ftrace hooks take the instrumented PC in x0, but for some
      reason arm64's prepare_ftrace_return() takes this in x1.
      
      For consistency, let's flip the argument order and always pass the
      instrumented PC in x0.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      54760b8d
    • M
      arm64: ftrace: remove return_regs macros · a7cd9c60
      Authored by Mark Rutland
      commit 49e258e05e8e56d53af20be481b311c43d7c286b upstream.
      
      The save_return_regs and restore_return_regs macros are only used by
      return_to_handler, and having them defined out-of-line only serves to
      obscure the logic.
      
      Before we complicate, let's clean this up and fold the logic directly
      into return_to_handler, saving a few lines of macro boilerplate in the
      process. At the same time, a missing trailing space is added to the
      comments, fixing a code style violation.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      a7cd9c60
    • T
      arm64: implement ftrace with regs · 1f77f2fc
      Authored by Torsten Duwe
      commit 3b23e4991fb66f6d152f9055ede271a726ef9f21 upstream
      
      This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
      function's arguments (and some other registers) to be captured into a
      struct pt_regs, allowing these to be inspected and/or modified. This is
      a building block for live-patching, where a function's arguments may be
      forwarded to another function. This is also necessary to enable ftrace
      and in-kernel pointer authentication at the same time, as it allows the
      LR value to be captured and adjusted prior to signing.
      
      Using GCC's -fpatchable-function-entry=N option, we can have the
      compiler insert a configurable number of NOPs between the function entry
      point and the usual prologue. This also ensures functions are AAPCS
      compliant (e.g. disabling inter-procedural register allocation).
      
      For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
      following:
      
      | unsigned long bar(void);
      |
      | unsigned long foo(void)
      | {
      |         return bar() + 1;
      | }
      
      ... to:
      
      | <foo>:
      |         nop
      |         nop
      |         stp     x29, x30, [sp, #-16]!
      |         mov     x29, sp
      |         bl      0 <bar>
      |         add     x0, x0, #0x1
      |         ldp     x29, x30, [sp], #16
      |         ret
      
      This patch builds the kernel with -fpatchable-function-entry=2,
      prefixing each function with two NOPs. To trace a function, we replace
      these NOPs with a sequence that saves the LR into a GPR, then calls an
      ftrace entry assembly function which saves this and other relevant
      registers:
      
      | mov	x9, x30
      | bl	<ftrace-entry>
      
      Since patchable functions are AAPCS compliant (and the kernel does not
      use x18 as a platform register), x9-x18 can be safely clobbered in the
      patched sequence and the ftrace entry code.
      
      There are now two ftrace entry functions, ftrace_regs_entry (which saves
      all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
      allocated for each within modules.
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      [Mark: rework asm, comments, PLTs, initialization, commit message]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Julien Thierry <jthierry@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      1f77f2fc
    • M
      arm64: asm-offsets: add S_FP · ecb06c7b
      Authored by Mark Rutland
      commit 1f377e043b3b8ef68caffe47bdad794f4e2cb030 upstream
      
      So that assembly code can more easily manipulate the FP (x29) within a
      pt_regs, add an S_FP asm-offsets definition.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      ecb06c7b
    • M
      arm64: insn: add encoder for MOV (register) · 88892d31
      Authored by Mark Rutland
      commit e3bf8a67f759b498e09999804c3837688e03b304 upstream
      
      For FTRACE_WITH_REGS, we're going to want to generate a MOV (register)
      instruction as part of the callsite initialization. As MOV (register)
      is an alias for ORR (shifted register), we can generate this with
      aarch64_insn_gen_logical_shifted_reg(), but it's somewhat verbose and
      difficult to read in-context.
      
      Add an aarch64_insn_gen_move_reg() wrapper for this case so that we
      can write callers in a more straightforward way.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      88892d31
    • M
      arm64: module/ftrace: intialize PLT at load time · 19f2b4ae
      Authored by Mark Rutland
      commit f1a54ae9af0da4d76239256ed640a93ab3aadac0 upstream
      
      Currently we lazily-initialize a module's ftrace PLT at runtime when we
      install the first ftrace call. To do so we have to apply a number of
      sanity checks, transiently mark the module text as RW, and perform an
      IPI as part of handling Neoverse-N1 erratum #1542419.
      
      We only expect the ftrace trampoline to point at ftrace_caller() (AKA
      FTRACE_ADDR), so let's simplify all of this by initializing the PLT at
      module load time, before the module loader marks the module RO and
      performs the initial I-cache maintenance for the module.
      
      Thus we can rely on the module having been correctly initialized, and
      can simplify the runtime work necessary to install an ftrace call in a
      module. This will also allow for the removal of module_disable_ro().
      
      Tested by forcing ftrace_make_call() to use the module PLT, and then
      loading up a module after setting up ftrace with:
      
      | echo ":mod:<module-name>" > set_ftrace_filter;
      | echo function > current_tracer;
      | modprobe <module-name>
      
      Since FTRACE_ADDR is only defined when CONFIG_DYNAMIC_FTRACE is
      selected, we wrap its use along with most of module_init_ftrace_plt()
      with ifdeffery rather than using IS_ENABLED().
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      19f2b4ae
    • M
      arm64: module: rework special section handling · d4199e8c
      Authored by Mark Rutland
      commit bd8b21d3dd661658addc1cd4cc869bab11d28596 upstream
      
      When we load a module, we have to perform some special work for a couple
      of named sections. To do this, we iterate over all of the module's
      sections, and perform work for each section we recognize.
      
      To make it easier to handle the unexpected absence of a section, and
      to make the section-specific logic easier to read, let's factor the
      section search into a helper. Similar is already done in the core
      module loader, and other architectures (and ideally we'd unify these
      in future).
      
      If we expect a module to have an ftrace trampoline section, but it
      doesn't have one, we'll now reject loading the module. When
      ARM64_MODULE_PLTS is selected, any correctly built module should have
      one (and this is assumed by arm64's ftrace PLT code) and the absence of
      such a section implies something has gone wrong at build time.
      
      Subsequent patches will make use of the new helper.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Tested-by: Torsten Duwe <duwe@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      d4199e8c
    • T
      arm64: Makefile: Replace -pg with CC_FLAGS_FTRACE · 007a7aa5
      Authored by Torsten Duwe
      commit edf072d36dbfdf74465b66988f30084b6c996fbf upstream.
      
      In preparation for arm64 supporting ftrace built on other compiler
      options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE)
      flags, whatever these may be, rather than assuming '-pg'.
      
      There should be no functional change as a result of this patch.
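The resulting kbuild pattern looks like this (a sketch; the object file name is illustrative):

```make
# Before: assume the ftrace instrumentation flag is literally -pg.
CFLAGS_REMOVE_insn.o = -pg

# After: strip whatever flags ftrace was built with, which may be
# -fpatchable-function-entry=2 rather than -pg.
CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
```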
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      007a7aa5
    • M
      arm64: ftrace: use GLOBAL() · 9709ac64
      Authored by Mark Rutland
      commit e4fe196642678565766815d99ab98a3a32d72dd4 upstream.
      
      The global exports of ftrace_call and ftrace_graph_call are somewhat
      painful to read. Let's use the generic GLOBAL() macro to ameliorate
      matters.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
      Acked-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
      9709ac64
    • P
      KVM: arm64: Add support for creating PUD hugepages at stage 2 · 497b3882
      Authored by Punit Agrawal
      commit b8e0ba7c8bea994011aff3b4c35256b180fab874 upstream.
      
      KVM only supports PMD hugepages at stage 2. Now that the various page
      handling routines are updated, extend the stage 2 fault handling to
      map in PUD hugepages.
      
      Addition of PUD hugepage support enables additional page sizes (e.g.,
      1G with 4K granule) which can be useful on cores that support mapping
      larger block sizes in the TLB entries.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ]
      Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      497b3882
    • P
      KVM: arm64: Update age handlers to support PUD hugepages · 445074ed
      Authored by Punit Agrawal
      commit 35a63966194dd994f44150f07398c62f8dca011e upstream.
      
      In preparation for creating larger hugepages at Stage 2, add support
      to the age handling notifiers for PUD hugepages when encountered.
      
      Provide trivial helpers for arm32 to allow sharing code.
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ]
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
      Acked-by: Zou Cao <zoucao@linux.alibaba.com>
      445074ed