提交 · 5557c766abad25acc8091ccb9641b96e3b3da06f · openeuler / Kernel

15 5月, 2019 15 次提交

mm: use mm_zero_struct_page from SPARC on all 64b architectures · 5470dea4

由 Alexander Duyck 提交于 5月 13, 2019

Patch series "Deferred page init improvements", v7.

This patchset is essentially a refactor of the page initialization logic
that is meant to provide for better code reuse while providing a
significant improvement in deferred page initialization performance.

In my testing on an x86_64 system with 384GB of RAM I have seen the
following.  In the case of regular memory initialization the deferred init
time was decreased from 3.75s to 1.38s on average.  This amounts to a 172%
improvement for the deferred memory initialization performance.

I have called out the improvement observed with each patch.

This patch (of 4):

Use the same approach that was already in use on Sparc on all the
architectures that support a 64b long.

This is mostly motivated by the fact that 7 to 10 store/move instructions
are likely always going to be faster than having to call into a function
that is not specialized for handling page init.

An added advantage to doing it this way is that the compiler can get away
with combining writes in the __init_single_page call.  As a result the
memset call will be reduced to only about 4 write operations, or at least
that is what I am seeing with GCC 6.2 as the flags, LRU pointers, and
count/mapcount seem to be cancelling out at least 4 of the 8 assignments
on my system.

One change I had to make to the function was to reduce the minimum page
size to 56 to support some powerpc64 configurations.

This change should introduce no change on SPARC since it already had this
code.  In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
initializing 384GB of RAM per node.  Pavel Tatashin tested on a system
with Broadcom's Stingray CPU and 48GB of RAM and found that
__init_single_page() takes 19.30ns / 64-byte struct page before this patch
and with this patch it takes 17.33ns / 64-byte struct page.  Mike Rapoport
ran a similar test on a OpenPower (S812LC 8348-21C) with Power8 processor
and 128GB or RAM.  His results per 64-byte struct page were 4.68ns before,
and 4.59ns after this patch.

Link: http://lkml.kernel.org/r/20190405221213.12227.9392.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <yi.z.zhang@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5470dea4

hugetlb: allow to free gigantic pages regardless of the configuration · 4eb0716e

由 Alexandre Ghiti 提交于 5月 13, 2019

On systems without CONTIG_ALLOC activated but that support gigantic pages,
boottime reserved gigantic pages can not be freed at all.  This patch
simply enables the possibility to hand back those pages to memory
allocator.

Link: http://lkml.kernel.org/r/20190327063626.18421-5-alex@ghiti.frSigned-off-by: NAlexandre Ghiti <alex@ghiti.fr>
Acked-by: David S. Miller <davem@davemloft.net> [sparc]
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4eb0716e

mm: simplify MEMORY_ISOLATION && COMPACTION || CMA into CONTIG_ALLOC · 8df995f6

由 Alexandre Ghiti 提交于 5月 13, 2019

This condition allows to define alloc_contig_range, so simplify it into a
more accurate naming.

Link: http://lkml.kernel.org/r/20190327063626.18421-4-alex@ghiti.frSigned-off-by: NAlexandre Ghiti <alex@ghiti.fr>
Suggested-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8df995f6

sparc: advertise gigantic page support · b53f4569

由 Alexandre Ghiti 提交于 5月 13, 2019

sparc actually supports gigantic pages and selecting
ARCH_HAS_GIGANTIC_PAGE allows it to allocate and free gigantic pages at
runtime.

sparc allows configuration such as huge pages of 16GB, pages of 8KB and
MAX_ORDER = 13 (default): HPAGE_SHIFT (34) - PAGE_SHIFT (13) = 21 >=
MAX_ORDER (13)

Link: http://lkml.kernel.org/r/20190327063626.18421-3-alex@ghiti.frSigned-off-by: NAlexandre Ghiti <alex@ghiti.fr>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b53f4569

sh: advertise gigantic page support · a861bbce

由 Alexandre Ghiti 提交于 5月 13, 2019

Patch series "Fix free/allocation of runtime gigantic pages", v8.

This series fixes sh and sparc that did not advertise their gigantic page
support and then were not able to allocate and free those pages at
runtime.  It renames MEMORY_ISOLATION && COMPACTION || CMA condition into
the more accurate CONTIG_ALLOC, since it allows the definition of
alloc_contig_range function.

Finally, it then fixes the wrong definition of ARCH_HAS_GIGANTIC_PAGE
config that, without MEMORY_ISOLATION && COMPACTION || CMA defined, did
not allow architectures to free boottime allocated gigantic pages although
unrelated.

This patch (of 4):

sh actually supports gigantic pages and selecting ARCH_HAS_GIGANTIC_PAGE
allows it to allocate and free gigantic pages at runtime.

At least sdk7786_defconfig exposes such a configuration with huge pages of
64MB, pages of 4KB and MAX_ORDER = 11: HPAGE_SHIFT (26) - PAGE_SHIFT (12)
= 14 >= MAX_ORDER (11)

Link: http://lkml.kernel.org/r/20190327063626.18421-2-alex@ghiti.frSigned-off-by: NAlexandre Ghiti <alex@ghiti.fr>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a861bbce

riscv: switch over to generic free_initmem() · 0d7b4a60

由 Mike Rapoport 提交于 5月 13, 2019

The riscv version of free_initmem() differs from the generic one only in
that it sets the freed memory to zero.

Make ricsv use the generic version and poison the freed memory.

Link: http://lkml.kernel.org/r/1550515285-17446-5-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NPalmer Dabbelt <palmer@sifive.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0d7b4a60

init: free_initmem: poison freed init memory · f4039999

由 Mike Rapoport 提交于 5月 13, 2019

Various architectures including x86 poison the freed init memory. Do the
same in the generic free_initmem implementation and switch sparc32
architecture that is identical to the generic code over to it now.

Link: http://lkml.kernel.org/r/1550515285-17446-4-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f4039999

hexagon: switch over to generic free_initmem() · 522c9919

由 Mike Rapoport 提交于 5月 13, 2019

hexagon implementation of free_initmem() is currently empty and marked
with comment

 * Todo:  free pages between __init_begin and __init_end; possibly
 * some devtree related stuff as well.

Switch it to the generic implementation.

Link: http://lkml.kernel.org/r/1550515285-17446-3-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

522c9919

init: provide a generic free_initmem implementation · 997aef68

由 Mike Rapoport 提交于 5月 13, 2019

Patch series "provide a generic free_initmem implementation", v2.

Many architectures implement free_initmem() in exactly the same or very
similar way: they wrap the call to free_initmem_default() with sometimes
different 'poison' parameter.

These patches switch those architectures to use a generic implementation
that does free_initmem_default(POISON_FREE_INITMEM).

This was inspired by Christoph's patches for free_initrd_mem [1] and I
shamelessly copied changelog entries from his patches :)

[1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/

This patch (of 2):

For most architectures free_initmem just a wrapper for the same
free_initmem_default(-1) call.  Provide that as a generic implementation
marked __weak.

Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

997aef68

initramfs: poison freed initrd memory · f94f7434

由 Christoph Hellwig 提交于 5月 13, 2019

Various architectures including x86 poison the freed initrd memory.  Do
the same in the generic free_initrd_mem implementation and switch a few
more architectures that are identical to the generic code over to it now.

Link: http://lkml.kernel.org/r/20190213174621.29297-9-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NMike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Steven Price <steven.price@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f94f7434

initramfs: provide a generic free_initrd_mem implementation · 4afd58e1

由 Christoph Hellwig 提交于 5月 13, 2019

For most architectures free_initrd_mem just expands to the same
free_reserved_area call.  Provide that as a generic implementation marked
__weak.

Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: NMike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Steven Price <steven.price@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4afd58e1

initramfs: move the legacy keepinitrd parameter to core code · d8ae8a37

由 Christoph Hellwig 提交于 5月 13, 2019

No need to handle the freeing disable in arch code when we already have a
core hook (and a different name for the option) for it.

Link: http://lkml.kernel.org/r/20190213174621.29297-7-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Acked-by: NMike Rapoport <rppt@linux.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Steven Price <steven.price@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8ae8a37

mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b

由 Ira Weiny 提交于 5月 13, 2019

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73b0140b

mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM · 932f4a63

由 Ira Weiny 提交于 5月 13, 2019

Pach series "Add FOLL_LONGTERM to GUP fast and use it".

HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/3/19/939

"longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and
can't move.  I've thought of a couple of alternative names but I think we
have to settle on if we are going to use FL_LAYOUT or something else to
solve the "longterm" problem.  Then I think we can change the flag to a
better name.

Secondly, it depends on how often you are registering memory.  I have
spoken with some RDMA users who consider MR in the performance path...
For the overall application performance.  I don't have the numbers as the
tests for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an aside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

This patch (of 7):

This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast().  Some callers who would like to do a longterm (user
controlled pin) of pages with the fast variant of GUP for performance
purposes.

Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.

This patch does not change any functionality.  In the short term
"longterm" or user controlled pins are unsafe for Filesystems and FS DAX
in particular has been blocked.  However, callers of get_user_pages_fast()
were not "protected".

FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.

NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages.  This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.

As a side affect some of the interfaces are cleaned up but this is not the
primary purpose of the series.

In review[1] it was asked:

<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a
misnomer.  This is really flagging a pin which is going to be given to
hardware and can't move.  I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem.  Then I think we can
change the flag to a better name.

Second, It depends on how often you are registering memory.  I have spoken
with some RDMA users who consider MR in the performance path...  For the
overall application performance.  I don't have the numbers as the tests
for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an asside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

</quote>

[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

932f4a63

arch/sh/boards/mach-dreamcast/irq.c: Remove duplicate header · e602b26c

由 Sabyasachi Gupta 提交于 5月 13, 2019

Remove linux/irq.h which is included more than once.

Link: http://lkml.kernel.org/r/5c8682ef.1c69fb81.5a1ea.2e7f@mx.google.comSigned-off-by: NSabyasachi Gupta <sabyasachi.linux@gmail.com>
Acked-by: NSouptick Joarder <jrdr.linux@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e602b26c

10 5月, 2019 2 次提交

sparc64: simplify reduce_memory() function · f4d9a23d

由 Mike Rapoport 提交于 2月 12, 2019

The reduce_memory() function clampls the available memory to a limit
defined by the "mem=" command line parameter. It takes into account the
amount of already reserved memory and excludes it from the limit
calculations.

Rather than traverse memblocks and remove them by hand, use
memblock_reserved_size() to account the reserved memory and
memblock_enforce_memory_limit() to clamp the available memory.
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4d9a23d

sparc: use struct_size() in kzalloc() · bc0025b6

由 Gustavo A. R. Silva 提交于 1月 08, 2019

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc0025b6

09 5月, 2019 12 次提交

powerpc/64s: Use early_mmu_has_feature() in set_kuap() · 8150a153

由 Michael Ellerman 提交于 5月 08, 2019

When implementing the KUAP support on Radix we fixed one case where
mmu_has_feature() was being called too early in boot via
__put_user_size().

However since then some new code in linux-next has created a new path
via which we can end up calling mmu_has_feature() too early.

On P9 this leads to crashes early in boot if we have both PPC_KUAP and
CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG enabled. Our early boot code
calls printk() which calls probe_kernel_read(), that does a
__copy_from_user_inatomic() which calls into set_kuap() and that uses
mmu_has_feature().

At that point in boot we haven't patched MMU features yet so the debug
code in mmu_has_feature() complains, and calls printk(). At that point
we recurse, eg:

  ...
  dump_stack+0xdc
  probe_kernel_read+0x1a4
  check_pointer+0x58
  ...
  printk+0x40
  dump_stack_print_info+0xbc
  dump_stack+0x8
  probe_kernel_read+0x1a4
  probe_kernel_read+0x19c
  check_pointer+0x58
  ...
  printk+0x40
  cpufeatures_process_feature+0xc8
  scan_cpufeatures_subnodes+0x380
  of_scan_flat_dt_subnodes+0xb4
  dt_cpu_ftrs_scan_callback+0x158
  of_scan_flat_dt+0xf0
  dt_cpu_ftrs_scan+0x3c
  early_init_devtree+0x360
  early_setup+0x9c

And so on for infinity, symptom is a dead system.

Even more fun is what happens when using the hash MMU (ie. p8 or p9
with Radix disabled), and when we don't have
CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG enabled. With the debug disabled
we don't check if static keys have been initialised, we just rely on
the jump label. But the jump label defaults to true so we just whack
the AMR even though Radix is not enabled.

Clearing the AMR is fine, but after we've done the user copy we write
(0b11 << 62) into AMR. When using hash that makes all pages with key
zero no longer readable or writable. All kernel pages implicitly have
key zero, and so all of a sudden the kernel can't read or write any of
its memory. Again dead system.

In the medium term we have several options for fixing this.
probe_kernel_read() doesn't need to touch AMR at all, it's not doing a
user access after all, but it uses __copy_from_user_inatomic() just
because it's easy, we could fix that.

It would also be safe to default to not writing to the AMR during
early boot, until we've detected features. But it's not clear that
flipping all the MMU features to static_key_false won't introduce
other bugs.

But for now just switch to early_mmu_has_feature() in set_kuap(), that
avoids all the problems with jump labels. It adds the overhead of a
global lookup and test, but that's probably trivial compared to the
writes to the AMR anyway.

Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NRussell Currey <ruscur@russell.cc>

8150a153

sparc/iommu: merge iommu_get_one and __sbus_iommu_map_page · 376b1371

由 Christoph Hellwig 提交于 4月 16, 2019

There is only one caller of iommu_get_one left, so merge it into
that one to clean things up a bit.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

376b1371

sparc/iommu: use __sbus_iommu_map_page to implement the map_sg path · edb1f072

由 Christoph Hellwig 提交于 4月 16, 2019

This means we handle > PAGE_SIZE offsets fine, and grow the size check
so far only performed in the map_page path.  We lose the optimization
to not double flush a page if it apears in multiple consecutive SG list
entries.  But at least for block I/O those don't happen anymore since
we properly merge in higher layers anyway.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edb1f072

sparc/iommu: fix __sbus_iommu_map_page for highmem pages · 7e996890

由 Christoph Hellwig 提交于 4月 16, 2019

__sbus_iommu_map_page currently assumes all pages are mapped into the
kernel direct mapping.  Switch to using physical address instead of
virtual ones for all the normal mapping operations, and only use
the virtual addresses for cache flushing when not operating on
a highmem page.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e996890

sparc/iommu: move per-page flushing into __sbus_iommu_map_page · 8668b38c

由 Christoph Hellwig 提交于 4月 16, 2019

This prepares for reusing __sbus_iommu_map_page in the map_sg path.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8668b38c

sparc/iommu: pass a physical address to iommu_get_one · b8205942

由 Christoph Hellwig 提交于 4月 16, 2019

No need for the page structure, just the paddr / pfn.  This is
going to simplify fixes to the callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8205942

sparc/iommu: create a common helper for map_sg · ff5cbec0

由 Christoph Hellwig 提交于 4月 16, 2019

Share the code for the global and per-page flush map_sg loops using a
simple bool parameter to disable the per-page flush for the former
variant.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff5cbec0

sparc/iommu: merge iommu_release_one and sbus_iommu_unmap_page · f25b23bc

由 Christoph Hellwig 提交于 4月 16, 2019

There is only one caller of iommu_release_one left, so merge it into
that one to clean things up a bit.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f25b23bc

sparc/iommu: use sbus_iommu_unmap_page in sbus_iommu_unmap_sg · a7fce1f7

由 Christoph Hellwig 提交于 4月 16, 2019

Use the page-level helper instead of duplicating the logic, while also
fixing the incorrect handling of larger than page sized offsets in
the sg variant.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7fce1f7

sparc/iommu: use !PageHighMem to check if a page has a kernel mapping · 031abf0b

由 Christoph Hellwig 提交于 4月 16, 2019

This deobsfucates the check a bit, and prepares for future changes.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

031abf0b

sparc: vdso: add FORCE to the build rule of %.so · 269fe565

由 Masahiro Yamada 提交于 4月 03, 2019

$(call if_changed,...) must have FORCE as a prerequisite.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

269fe565

arch:sparc:kernel/uprobes.c : Remove duplicate header · a95a9e5f

由 Jagadeesh Pagadala 提交于 3月 28, 2019

Remove duplicate header which is included twice.
Signed-off-by: NJagadeesh Pagadala <jagdsh.linux@gmail.com>
Reviewed-by: NMukesh Ojha <mojha@codeaurora.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a95a9e5f

08 5月, 2019 11 次提交

um: irq: don't set the chip for all irqs · 1987b1b8

由 Bartosz Golaszewski 提交于 4月 11, 2019

Setting a chip for an interrupt marks it as allocated. Since UM doesn't
support dynamic interrupt numbers (yet), it means we cannot simply
increase NR_IRQS and then use the free irqs between LAST_IRQ and NR_IRQS
with gpio-mockup or iio testing drivers as irq_alloc_descs() will fail
after not being able to neither find an unallocated range of interrupts
nor expand the range.

Only call irq_set_chip_and_handler() for irqs until LAST_IRQ.
Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

1987b1b8

um: define set_pte_at() as a static inline function, not a macro · ea70d791

由 Bartosz Golaszewski 提交于 4月 11, 2019

When defined as macro, the mm argument is unused and subsequently the
variable passed as mm is considered unused by the compiler. This fixes
a build warning.
Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

ea70d791

um: remove uses of variable length arrays · 0d4e5ac7

由 Bartosz Golaszewski 提交于 4月 11, 2019

While the affected code is run in user-mode, the build still warns
about it. Convert all uses of VLA to dynamic allocations.
Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

0d4e5ac7

um: remove unused variable · 4b6b4c90

由 Bartosz Golaszewski 提交于 4月 11, 2019

The buf variable is unused. Remove it.
Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

4b6b4c90

uml: fix a boot splat wrt use of cpu_all_mask · 689a5860

由 Maciej Żenczykowski 提交于 4月 10, 2019

Memory: 509108K/542612K available (3835K kernel code, 919K rwdata, 1028K rodata, 129K init, 211K bss, 33504K reserved, 0K cma-reserved)
NR_IRQS: 15
clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/time/clockevents.c:458 clockevents_register_device+0x72/0x140
posix-timer cpumask == cpu_all_mask, using cpu_possible_mask instead
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-00048-ged79cc87 #4
Stack:
 604ebda0 603c5370 604ebe20 6046fd17
 00000000 6006fcbb 604ebdb0 603c53b5
 604ebe10 6003bfc4 604ebdd0 9000001ca
Call Trace:
 [<6006fcbb>] ? printk+0x0/0x94
 [<60083160>] ? clockevents_register_device+0x72/0x140
 [<6001f16e>] show_stack+0x13b/0x155
 [<603c5370>] ? dump_stack_print_info+0xe2/0xeb
 [<6006fcbb>] ? printk+0x0/0x94
 [<603c53b5>] dump_stack+0x2a/0x2c
 [<6003bfc4>] __warn+0x10e/0x13e
 [<60070320>] ? vprintk_func+0xc8/0xcf
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<6003c08b>] warn_slowpath_fmt+0x97/0x99
 [<600311a1>] ? set_signals+0x0/0x3f
 [<6003bff4>] ? warn_slowpath_fmt+0x0/0x99
 [<600842cb>] ? tick_oneshot_mode_active+0x44/0x4f
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<6007d2d5>] ? __clocksource_select+0x20/0x1b1
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<60083160>] clockevents_register_device+0x72/0x140
 [<60031192>] ? get_signals+0x0/0xf
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<60002eec>] um_timer_setup+0xc8/0xca
 [<60001b59>] start_kernel+0x47f/0x57e
 [<600035bc>] start_kernel_proc+0x49/0x4d
 [<6006c483>] ? kmsg_dump_register+0x82/0x8a
 [<6001de62>] new_thread_handler+0x81/0xb2
 [<60003571>] ? kmsg_dumper_stdout_init+0x1a/0x1c
 [<60020c75>] uml_finishsetup+0x54/0x59

random: get_random_bytes called from init_oops_id+0x27/0x34 with crng_init=0
---[ end trace 00173d0117a88acb ]---
Calibrating delay loop... 6941.90 BogoMIPS (lpj=34709504)
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NRichard Weinberger <richard@nod.at>

689a5860

um: Do not unlock mutex that is not hold. · 9ca55299

由 Daniel Walter 提交于 4月 02, 2019

Return error instead of trying to unlock a mutex that is not hold.
Signed-off-by: NDaniel Walter <dwalter@google.com>
Reviewed-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

9ca55299

arch: um: drivers: Kconfig: pedantic formatting · 75f24f78

由 Enrico Weigelt, metux IT consult 提交于 3月 07, 2019

Formatting of Kconfig files doesn't look so pretty, so just
take damp cloth and clean it up. Just indention changes.
Signed-off-by: NEnrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: NRichard Weinberger <richard@nod.at>

75f24f78

arch: um: Kconfig: pedantic indention cleanups · 37606596

由 Enrico Weigelt, metux IT consult 提交于 3月 06, 2019

Formatting of Kconfig files doesn't look so pretty, so just
take damp cloth and clean it up.
Signed-off-by: NEnrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: NRichard Weinberger <richard@nod.at>

37606596

um: Revert to using stack for pt_regs in signal handling · 5c2ffce1

由 Anton Ivanov 提交于 1月 04, 2019

Reverts commit b6024b21 and
adjusts default stack sizing to cope with larger size of
floating point save registers on the newer Intel CPUs.

b6024b21 replaced storing the
register state on the stack with kmalloc-ed storage. That has
a number of issues and a panic if that fails.
    1. kmalloc/ATOMIC can fail. There was a latent hard crash
in all interrupt and fault handling as a result.
    2. kmalloc in the interrupt path introduces a considerable
performance penalty for networking ~ 14% on iperf.

This commit restores uml to a stable state until a better
solution is found.
Signed-off-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

5c2ffce1

xtensa: implement initialize_cacheattr for MPU cores · a5944195

由 Max Filippov 提交于 5月 06, 2019

Use CONFIG_MEMMAP_CACHEATTR to initialize MPU as described in the Xtensa
LSP RM document. Coalesce adjacent regions with the same cacheattr.
Update Kconfig help text.
Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>

a5944195

xtensa: add exclusive atomics support · f7c34874

由 Max Filippov 提交于 12月 20, 2018

Implement atomic primitives using exclusive access opcodes available in
the recent xtensa cores.
Since l32ex/s32ex don't have any memory ordering guarantees don't define
__smp_mb__before_atomic/__smp_mb__after_atomic to make them use memw.
Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>

f7c34874

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功