1. 06 May 2021 (1 commit)
    • userfaultfd: add minor fault registration mode · 7677f7fd
      Axel Rasmussen committed
      Patch series "userfaultfd: add minor fault handling", v9.
      
      Overview
      ========
      
      This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
      When enabled (via the UFFDIO_API ioctl), this feature means that any
      hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
      get events for "minor" faults.  By "minor" fault, I mean the following
      situation:
      
      Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
      memory).  One of the mappings is registered with userfaultfd (in minor
      mode), and the other is not.  Via the non-UFFD mapping, the underlying
      pages have already been allocated & filled with some contents.  The UFFD
      mapping has not yet been faulted in; when it is touched for the first
      time, this results in what I'm calling a "minor" fault.  As a concrete
      example, when working with hugetlbfs, we have huge_pte_none(), but
      find_lock_page() finds an existing page.
      
      We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.  The idea
      is, userspace resolves the fault by either a) doing nothing if the
      contents are already correct, or b) updating the underlying contents using
      the second, non-UFFD mapping (via memcpy/memset or similar, or something
      fancier like RDMA, or etc...).  In either case, userspace issues
      UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
      correct, carry on setting up the mapping".
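      
      As a rough userspace sketch (not part of this patch), resolving such a fault
      could look like the following, where `uffd`, `fault_addr` and `huge_page_size`
      are hypothetical values taken from the registration and the fault event:
      
      	/* contents were already fixed up (or verified) via the non-UFFD mapping */
      	struct uffdio_continue cont = {
      		.range = {
      			.start = fault_addr & ~(huge_page_size - 1),
      			.len   = huge_page_size,
      		},
      		.mode = 0,
      	};
      	if (ioctl(uffd, UFFDIO_CONTINUE, &cont) == -1)
      		perror("UFFDIO_CONTINUE");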
      
      Use Case
      ========
      
      Consider the use case of VM live migration (e.g. under QEMU/KVM):
      
      1. While a VM is still running, we copy the contents of its memory to a
         target machine. The pages are populated on the target by writing to the
         non-UFFD mapping, using the setup described above. The VM is still running
         (and therefore its memory is likely changing), so this may be repeated
         several times, until we decide the target is "up to date enough".
      
      2. We pause the VM on the source, and start executing on the target machine.
         During this gap, the VM's user(s) will *see* a pause, so it is desirable to
         minimize this window.
      
      3. Between the last time any page was copied from the source to the target, and
         when the VM was paused, the contents of that page may have changed - and
         therefore the copy we have on the target machine is out of date. Although we
         can keep track of which pages are out of date, for VMs with large amounts of
         memory, it is "slow" to transfer this information to the target machine. We
         want to resume execution before such a transfer would complete.
      
      4. So, the guest begins executing on the target machine. The first time it
         touches its memory (via the UFFD-registered mapping), userspace wants to
         intercept this fault. Userspace checks whether or not the page is up to date,
         and if not, copies the updated page from the source machine, via the non-UFFD
         mapping. Finally, whether a copy was performed or not, userspace issues a
         UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
         are correct, carry on setting up the mapping".
      
      We don't have to do all of the final updates on-demand. The userfaultfd manager
      can, in the background, also copy over updated pages once it receives the map of
      which pages are up-to-date or not.
      
      Interaction with Existing APIs
      ==============================
      
      Because this is a feature, a registered VMA could potentially receive both
      missing and minor faults.  I spent some time thinking through how the
      existing API interacts with the new feature:
      
      UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
      allocate a new page.  If UFFDIO_CONTINUE is used on a non-minor fault:
      
      - For non-shared memory or shmem, -EINVAL is returned.
      - For hugetlb, -EFAULT is returned.
      
      UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
      Without modifications, the existing codepath assumes a new page needs to
      be allocated.  This is okay, since userspace must have a second
      non-UFFD-registered mapping anyway, thus there isn't much reason to want
      to use these in any case (just memcpy or memset or similar).
      
      - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
      - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
        in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
      - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
        -ENOENT in that case (regardless of the kind of fault).
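      
      For illustration, a fault-handling thread could dispatch on the fault kind
      roughly as follows; struct uffd_msg and UFFD_PAGEFAULT_FLAG_MINOR come from
      the uapi header as extended by this series, while the resolve_* helpers are
      purely hypothetical:
      
      	struct uffd_msg msg;
      
      	if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
      	    msg.event != UFFD_EVENT_PAGEFAULT)
      		return;
      
      	if (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR)
      		resolve_minor(msg.arg.pagefault.address);    /* UFFDIO_CONTINUE */
      	else
      		resolve_missing(msg.arg.pagefault.address);  /* UFFDIO_COPY / UFFDIO_ZEROPAGE */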
      
      Future Work
      ===========
      
      This series only supports hugetlbfs.  I have a second series in flight to
      support shmem as well, extending the functionality.  This series is more
      mature than the shmem support at this point, and the functionality works
      fully on hugetlbfs, so this series can be merged first and then shmem
      support will follow.
      
      This patch (of 6):
      
      This feature allows userspace to intercept "minor" faults.  By "minor"
      faults, I mean the following situation:
      
      Let there exist two mappings (i.e., VMAs) to the same page(s).  One of the
      mappings is registered with userfaultfd (in minor mode), and the other is
      not.  Via the non-UFFD mapping, the underlying pages have already been
      allocated & filled with some contents.  The UFFD mapping has not yet been
      faulted in; when it is touched for the first time, this results in what
      I'm calling a "minor" fault.  As a concrete example, when working with
      hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
      page.
      
      This commit adds the new registration mode, and sets the relevant flag on
      the VMAs being registered.  In the hugetlb fault path, if we find that we
      have huge_pte_none(), but find_lock_page() does indeed find an existing
      page, then we have a "minor" fault, and if the VMA has the userfaultfd
      registration flag, we call into userfaultfd to handle it.
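      
      A hedged sketch of minor-mode registration (not from the patch): `addr` and
      `len` describe a hypothetical hugetlbfs mapping, and handle_error() is a
      stand-in for real error handling:
      
      	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
      
      	struct uffdio_api api = {
      		.api      = UFFD_API,
      		.features = UFFD_FEATURE_MINOR_HUGETLBFS,
      	};
      	struct uffdio_register reg = {
      		.range = { .start = (unsigned long)addr, .len = len },
      		.mode  = UFFDIO_REGISTER_MODE_MINOR,
      	};
      
      	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) == -1 ||
      	    ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
      		handle_error();   /* e.g. -EINVAL on 32-bit without high VMA flags */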
      
      This is implemented as a new registration mode, instead of an API feature.
      This is because the alternative implementation has significant drawbacks
      [1].
      
However, doing it this way requires that we allocate a VM_* flag for the new
      registration mode.  On 32-bit systems, there are no unused bits, so this
      feature is only supported on architectures with
      CONFIG_ARCH_USES_HIGH_VMA_FLAGS.  When attempting to register a VMA in
      MINOR mode on 32-bit architectures, we return -EINVAL.
      
      [1] https://lore.kernel.org/patchwork/patch/1380226/
      
      [peterx@redhat.com: fix minor fault page leak]
        Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
      
      Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutn" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7677f7fd
  2. 01 May 2021 (1 commit)
  3. 20 Apr 2021 (1 commit)
    • x86/platform/uv: Fix !KEXEC build failure · c2209ea5
      Ingo Molnar committed
      When KEXEC is disabled, the UV build fails:
      
        arch/x86/platform/uv/uv_nmi.c:875:14: error: ‘uv_nmi_kexec_failed’ undeclared (first use in this function)
      
      Since uv_nmi_kexec_failed is only defined in the KEXEC_CORE #ifdef branch,
      this code cannot ever have been build tested:
      
      	if (main)
      		pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n");
      	atomic_set(&uv_nmi_kexec_failed, 1);
      
      Nor is this use possible in uv_handle_nmi():
      
                      atomic_set(&uv_nmi_kexec_failed, 0);
      
      These bugs were introduced in this commit:
      
          d0a9964e: ("x86/platform/uv: Implement simple dump failover if kdump fails")
      
      Which added the uv_nmi_kexec_failed assignments to !KEXEC code, while making the
      definition KEXEC-only - apparently without testing the !KEXEC case.
      
      Instead of complicating the #ifdef maze, simplify the code by requiring X86_UV
      to depend on KEXEC_CORE. This pattern is present in other architectures as well.
      
      ( We'll remove the untested, 7-year-old !KEXEC complications from the file in a
        separate commit. )
      
      Fixes: d0a9964e: ("x86/platform/uv: Implement simple dump failover if kdump fails")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: linux-kernel@vger.kernel.org
      c2209ea5
  4. 19 Apr 2021 (1 commit)
  5. 08 Apr 2021 (1 commit)
  6. 20 Mar 2021 (1 commit)
  7. 11 Mar 2021 (1 commit)
  8. 09 Mar 2021 (1 commit)
  9. 08 Mar 2021 (1 commit)
    • x86/stackprotector/32: Make the canary into a regular percpu variable · 3fb0fdb3
      Andy Lutomirski committed
      On 32-bit kernels, the stackprotector canary is quite nasty -- it is
      stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
      percpu storage.  It's even nastier because it means that whether %gs
      contains userspace state or kernel state while running kernel code
      depends on whether stackprotector is enabled (this is
      CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
      that segment selectors work.  Supporting both variants is a
      maintenance and testing mess.
      
      Merely rearranging so that percpu and the stack canary
      share the same segment would be messy as the 32-bit percpu address
      layout isn't currently compatible with putting a variable at a fixed
      offset.
      
      Fortunately, GCC 8.1 added options that allow the stack canary to be
      accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
      percpu variable.  This lets us get rid of all of the code to manage the
      stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
      
      (That name is special.  We could use any symbol we want for the
       %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
       name other than __stack_chk_guard.)
      
      Forcibly disable stackprotector on older compilers that don't support
      the new options and turn the stack canary into a percpu variable. The
      "lazy GS" approach is now used for all 32-bit configurations.
      
      This also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
      it loads the GS selector and updates the user GSBASE accordingly. (This
      is unchanged.) On 32-bit kernels, it loads the GS selector and updates
      GSBASE, which is now always the user base. This means that the overall
      effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.
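      
      A hedged sketch of the resulting arrangement (my reading of the approach, not
      quoted from the patch): the canary becomes a plain percpu variable, and the
      compiler is pointed at it with the GCC 8.1+ options mentioned above:
      
      	/* built with something like:
      	 *   -mstack-protector-guard-reg=fs
      	 *   -mstack-protector-guard-symbol=__stack_chk_guard
      	 * (exact flag spelling is an assumption, not taken from this patch) */
      	DEFINE_PER_CPU(unsigned long, __stack_chk_guard);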
      
       [ bp: Massage commit message. ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
      3fb0fdb3
  10. 27 Feb 2021 (1 commit)
  11. 24 Feb 2021 (2 commits)
  12. 17 Feb 2021 (1 commit)
  13. 16 Feb 2021 (1 commit)
  14. 11 Feb 2021 (2 commits)
  15. 09 Feb 2021 (1 commit)
  16. 08 Feb 2021 (1 commit)
  17. 06 Feb 2021 (1 commit)
  18. 29 Jan 2021 (1 commit)
  19. 06 Jan 2021 (3 commits)
    • Kconfig: regularize selection of CONFIG_BINFMT_ELF · 41026c34
      Al Viro committed
      With mips converted to use fs/compat_binfmt_elf.c, there's no
      need to keep selects of that thing all over arch/* - we can simply
      turn it into def_bool y if COMPAT && BINFMT_ELF (in fs/Kconfig.binfmt)
      and get rid of all the selects.
      
      Several architectures got those selects wrong (e.g. you could
      end up with sparc64 sans BINFMT_ELF, with select violating
      dependencies, etc.)
      
      Randy Dunlap has spotted some of those; IMO this is simpler than
      his fix, but it depends upon the stuff that would need to be
      backported, so we might end up using his variant for -stable.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      41026c34
    • x32: make X32, !IA32_EMULATION setups able to execute x32 binaries · 85f2ada7
      Al Viro committed
      It's really trivial - the only wrinkle is making sure that the
      compiler knows that the ia32-related side of COMPAT_ARCH_DLINFO
      is dead code on such configs (we don't get there without having
      passed compat_elf_check_arch(), and on such configs that'll fail
      for an ia32 binary).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      85f2ada7
    • [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly · 7facdc42
      Al Viro committed
      To get rid of hardcoded size/offset in those macros we need to have
      a definition of i386 variant of struct elf_prstatus.  However, we can't
      do that in asm/compat.h - the types needed for that are not there and
      adding an include of asm/user32.h into asm/compat.h would cause a lot
      of mess.
      
      That could be conveniently done in elfcore-compat.h, but currently there
      is nowhere to put arch-dependent parts of it - no asm/elfcore-compat.h.
      So we introduce a new file (asm/elfcore-compat.h, present on architectures
      that have CONFIG_ARCH_HAS_ELFCORE_COMPAT set, currently only on x86),
      have it pulled by linux/elfcore-compat.h and move the definitions there.
      
      As a side benefit, we don't need to worry about accidental inclusion of
      that file into binfmt_elf.c itself, so we don't need the dance with
      COMPAT_PRSTATUS_SIZE, etc. - only fs/compat_binfmt_elf.c will see
      that header.
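      
      Roughly, the include chain then looks like this (a sketch of the arrangement
      described above, not a quote of the patch):
      
      	/* include/linux/elfcore-compat.h */
      	#ifdef CONFIG_ARCH_HAS_ELFCORE_COMPAT
      	#include <asm/elfcore-compat.h>   /* arch-dependent parts, x86-only for now */
      	#endif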
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7facdc42
  20. 28 Dec 2020 (1 commit)
  21. 16 Dec 2020 (2 commits)
    • arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC · 5d6ad668
      Mike Rapoport committed
      The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
      never fail.  With this assumption it wouldn't be safe to allow general
      usage of this function.
      
      Moreover, some architectures that implement __kernel_map_pages() have this
      function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
      pages when page allocation debugging is disabled at runtime.
      
      As all the users of __kernel_map_pages() were converted to use
      debug_pagealloc_map_pages() it is safe to make it available only when
      DEBUG_PAGEALLOC is set.
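      
      For reference, the wrapper the callers were converted to is guarded roughly
      like this (a sketch based on include/linux/mm.h, not part of this patch):
      
      	static inline void debug_pagealloc_map_pages(struct page *page, int numpages)
      	{
      		if (debug_pagealloc_enabled_static())
      			__kernel_map_pages(page, numpages, 1);
      	}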
      
      Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d6ad668
    • x86: mremap speedup - Enable HAVE_MOVE_PUD · be37c98d
      Kalesh Singh committed
      HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
      source and destination addresses are PUD-aligned.
      
      With HAVE_MOVE_PUD enabled, it can be inferred that there is
      approximately a 13x improvement in performance on x86 (see the data
      below).
      
      ------- Test Results ---------
      
      The following results were obtained using a 5.4 kernel, by remapping
      a PUD-aligned, 1GB sized region to a PUD-aligned destination.
      The results from 10 iterations of the test are given below:
      
      Total mremap times for 1GB data on x86. All times are in nanoseconds.
      
        Control        HAVE_MOVE_PUD
      
        180394         15089
        235728         14056
        238931         25741
        187330         13838
        241742         14187
        177925         14778
        182758         14728
        160872         14418
        205813         15107
        245722         13998
      
        205721.5       15594    <-- Mean time in nanoseconds
      
      A 1GB mremap completion time drops from ~205 microseconds
      to ~15 microseconds on x86. (~13x speed up).
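      
      The measured operation amounts to a userspace sequence along these lines (a
      sketch; SRC and DST are hypothetical PUD-aligned, i.e. 1GB-aligned, addresses
      and are not taken from the commit):
      
      	#define _GNU_SOURCE
      	#include <string.h>
      	#include <sys/mman.h>
      
      	#define GiB (1UL << 30)
      
      	void *src = mmap((void *)SRC, GiB, PROT_READ | PROT_WRITE,
      			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
      	memset(src, 1, GiB);                        /* populate the source */
      	void *dst = mremap(src, GiB, GiB,           /* this is the timed call */
      			   MREMAP_MAYMOVE | MREMAP_FIXED, (void *)DST);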
      
      Link: https://lkml.kernel.org/r/20201014005320.2233162-6-kaleshsingh@google.com
      Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: Hassan Naveed <hnaveed@wavecomp.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Minchan Kim <minchan@google.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be37c98d
  22. 01 Dec 2020 (1 commit)
  23. 24 Nov 2020 (1 commit)
  24. 19 Nov 2020 (1 commit)
  25. 17 Nov 2020 (1 commit)
  26. 14 Nov 2020 (1 commit)
    • ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default · 02a474ca
      Steven Rostedt (VMware) committed
      Currently, the only way to get access to the registers of a function via an
      ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this
      saves all regs as if a breakpoint were to trigger (for use with kprobes), it
      is expensive.
      
      The regs are already saved on the stack for the default ftrace callbacks, as
      that is required - otherwise a function being traced would get the wrong
      arguments and possibly crash. And on x86, the arguments are already stored
      where they would be in a pt_regs structure, so that the same code can be used
      for both the regs and non-regs versions of a callback; given that, it makes
      sense to always pass that information to all functions.
      
      If an architecture does this (as x86_64 now does), it is to set
      HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code know that it
      can have access to arguments without having to set the flags.
      
      This also includes having the stack pointer being saved, which could be used
      for accessing arguments on the stack, as well as having the function graph
      tracer not require its own trampoline!
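      
      A hedged sketch of a callback under the new scheme follows; struct ftrace_regs
      and ftrace_get_regs() are from the series this patch belongs to, and the
      callback body is illustrative only:
      
      	static void my_trace_func(unsigned long ip, unsigned long parent_ip,
      				  struct ftrace_ops *op, struct ftrace_regs *fregs)
      	{
      		/* ftrace_get_regs() returns the full pt_regs only when
      		 * FTRACE_OPS_FL_SAVE_REGS was requested; otherwise just the
      		 * argument/stack-pointer subset described above is populated. */
      		struct pt_regs *regs = ftrace_get_regs(fregs);
      
      		if (regs)
      			inspect_args(regs);   /* hypothetical helper */
      	}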
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      02a474ca
  27. 07 Nov 2020 (1 commit)
  28. 31 Oct 2020 (1 commit)
    • timekeeping: default GENERIC_CLOCKEVENTS to enabled · 0774a6ed
      Arnd Bergmann committed
      Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to
      require each one to select that symbol manually.
      
      Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as
      a simplification. It should be possible to select both
      GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now
      and decide at runtime between the two.
      
      For the clockevents arch-support.txt file, this means that additional
      architectures are marked as TODO when they have at least one machine
      that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when
      at least one machine has been converted. This means that both m68k and
      arm (for riscpc) revert to TODO.
      
      At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS
      rather than leaving it off when not needed. I built an m68k
      defconfig kernel (using gcc-10.1.0) and found that this would add
      around 5.5KB in kernel image size:
      
         text	   data	    bss	    dec	    hex	filename
      3861936	1092236	 196656	5150828	 4e986c	obj-m68k/vmlinux-no-clockevent
      3866201	1093832	 196184	5156217	 4ead79	obj-m68k/vmlinux-clockevent
      
      On Arm (MACH_RPC), that difference appears to be twice as large,
      around 11KB on top of a 6MB vmlinux.
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      0774a6ed
  29. 09 Oct 2020 (1 commit)
  30. 06 Oct 2020 (1 commit)
    • x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() · ec6347bb
      Dan Williams committed
      In reaction to a proposal to introduce a memcpy_mcsafe_fast()
      implementation, Linus points out that memcpy_mcsafe() is poorly named
      relative to communicating the scope of the interface - specifically, what
      addresses are valid to pass as source and destination, and what faults /
      exceptions are handled.
      
      Of particular concern is that even though x86 might be able to handle
      the semantics of copy_mc_to_user() with its common copy_user_generic()
      implementation other archs likely need / want an explicit path for this
      case:
      
        On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
        >
        > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
        > >
        > > However now I see that copy_user_generic() works for the wrong reason.
        > > It works because the exception on the source address due to poison
        > > looks no different than a write fault on the user address to the
        > > caller, it's still just a short copy. So it makes copy_to_user() work
        > > for the wrong reason relative to the name.
        >
        > Right.
        >
        > And it won't work that way on other architectures. On x86, we have a
        > generic function that can take faults on either side, and we use it
        > for both cases (and for the "in_user" case too), but that's an
        > artifact of the architecture oddity.
        >
        > In fact, it's probably wrong even on x86 - because it can hide bugs -
        > but writing those things is painful enough that everybody prefers
        > having just one function.
      
      Replace a single top-level memcpy_mcsafe() with either
      copy_mc_to_user(), or copy_mc_to_kernel().
      
      Introduce an x86 copy_mc_fragile() name as the rename for the
      low-level x86 implementation formerly named memcpy_mcsafe(). It is used
      as the slow / careful backend that is supplanted by a fast
      copy_mc_generic() in a follow-on patch.
      
      One side-effect of this reorganization is that separating copy_mc_64.S
      to its own file means that perf no longer needs to track dependencies
      for its memcpy_64.S benchmarks.
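      
      Hedged usage sketch of the renamed interfaces (the bytes-not-copied return
      convention follows the existing copy_to_user() family; the surrounding names
      are illustrative):
      
      	/* kernel-to-kernel copy that survives machine-check poison on the source */
      	unsigned long rem = copy_mc_to_kernel(dst, src, len);
      	if (rem)
      		return handle_short_copy(len - rem);   /* hypothetical */
      
      	/* kernel-to-user variant */
      	if (copy_mc_to_user(ubuf, src, len))
      		return -EFAULT;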
      
       [ bp: Massage a bit. ]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <stable@vger.kernel.org>
      Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
      Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
      ec6347bb
  31. 16 Sep 2020 (2 commits)
  32. 09 Sep 2020 (2 commits)
  33. 08 Sep 2020 (1 commit)