1. 16 Nov, 2017 1 commit
    • sparc64: optimize struct page zeroing · 78c94366
      Committed by Pavel Tatashin
      Add an optimized mm_zero_struct_page(), so struct pages are zeroed
      without calling memset().  We do eight to ten regular stores based on
      the size of struct page.  The compiler optimizes out the conditions of
      the switch() statement.
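
The store-based zeroing described above can be sketched in user space as follows. This is a minimal illustration, not the kernel macro: `struct fake_page` and `zero_fake_page()` are hypothetical names, and the 64-byte layout is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: eight 64-bit words (64 bytes). */
struct fake_page {
	uint64_t words[8];
};

/* The store sequence below only handles 64-, 72-, and 80-byte layouts. */
static_assert(sizeof(struct fake_page) % 8 == 0, "size must be a multiple of 8");
static_assert(sizeof(struct fake_page) >= 64, "size too small");
static_assert(sizeof(struct fake_page) <= 80, "size too large");

/*
 * Zero one object with plain 64-bit stores instead of memset().
 * sizeof() is a compile-time constant, so the compiler can resolve the
 * switch and keep only the stores that apply to this layout.
 */
static inline void zero_fake_page(struct fake_page *page)
{
	uint64_t *w = (uint64_t *)page;

	switch (sizeof(struct fake_page)) {
	case 80:
		w[9] = 0;
		/* fall through */
	case 72:
		w[8] = 0;
		/* fall through */
	default:
		w[7] = 0;
		w[6] = 0;
		w[5] = 0;
		w[4] = 0;
		w[3] = 0;
		w[2] = 0;
		w[1] = 0;
		w[0] = 0;
	}
}
```

With the 64-byte layout assumed here, only the eight stores under `default` survive; a 72- or 80-byte layout would add one or two more, which matches the "eight to ten" stores mentioned in the message.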
      
      SPARC-M6 with 15T of memory, single thread performance:
      
                                     BASE            FIX  OPTIMIZED_FIX
              bootmem_init   28.440467985s   2.305674818s   2.305161615s
      free_area_init_nodes  202.845901673s 225.343084508s 172.556506560s
                            --------------------------------------------
      Total                 231.286369658s 227.648759326s 174.861668175s
      
      BASE:  current linux
      FIX:   This patch series without "optimized struct page zeroing"
      OPTIMIZED_FIX: This patch series including the current patch.
      
      bootmem_init() is where memory for struct pages is zeroed during
      allocation.  Note, about two seconds in this function is a fixed time:
      it does not increase as memory is increased.
      
      Link: http://lkml.kernel.org/r/20171013173214.27300-11-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier should be applied
      to a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If the file was under a */uapi/* path, it was "GPL-2.0 WITH
         Linux-syscall-note"; otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to
      have copy/paste license identifier errors, and have been fixed to reflect
      the correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
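
As a rough illustration of what such a tagging script has to decide, the sketch below picks the default identifier from the path and the comment style from the file type. It is not the script Thomas and Greg used: the helper names are hypothetical, and the comment styles (`//` for .c sources, `/* */` for headers) follow the convention documented in the kernel's license rules.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if path ends with the given suffix. */
static int has_suffix(const char *path, const char *suffix)
{
	size_t plen = strlen(path), slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/* Default identifier for a file with no license text (hypothetical helper). */
static const char *spdx_default_id(const char *path)
{
	if (strstr(path, "/uapi/"))
		return "GPL-2.0 WITH Linux-syscall-note";
	return "GPL-2.0";
}

/*
 * Format the SPDX line for path into buf: C sources get a "//" comment,
 * everything else handled here (headers) gets a block comment.
 * Returns what snprintf() returns.
 */
static int spdx_tag_line(const char *path, const char *id, char *buf, size_t len)
{
	assert(path && id && buf);

	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
}
```

A real script would also need cases for Makefiles, assembly, and scripts, which use other comment syntaxes; those are left out here.
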
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 16 Aug, 2017 2 commits
  4. 11 Aug, 2017 1 commit
  5. 10 Aug, 2017 2 commits
  6. 07 Apr, 2017 1 commit
    • sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write() · 9ae34dbd
      Committed by Tom Hromatka
      This commit moves sparc64's prototype of pmd_write() outside
      of the CONFIG_TRANSPARENT_HUGEPAGE ifdef.
      
      In 2013, commit a7b9403f ("sparc64: Encode huge PMDs using PTE
      encoding.") exposed a path where pmd_write() could be called without
      CONFIG_TRANSPARENT_HUGEPAGE defined.  This can result in the panic below.
      
      The diff is awkward to read, but the changes are straightforward.
      pmd_write() was moved outside of #ifdef CONFIG_TRANSPARENT_HUGEPAGE.
      Also, __HAVE_ARCH_PMD_WRITE was defined.
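
The arrangement described above can be sketched in user space like this. The write bit value and the `pmd_t` layout are hypothetical, and the generic fallback is modeled with assert() on the assumption that the kernel's fallback triggers a BUG, as the trace below suggests.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t val; } pmd_t;

#define FAKE_PAGE_WRITE 0x2ULL	/* hypothetical write-permission bit */

/*
 * Architecture side: defined unconditionally, i.e. not wrapped in
 * #ifdef CONFIG_TRANSPARENT_HUGEPAGE, and announced via the marker macro.
 */
#define __HAVE_ARCH_PMD_WRITE
static inline int pmd_write(pmd_t pmd)
{
	return (pmd.val & FAKE_PAGE_WRITE) != 0;
}

/*
 * Generic side: a trap-on-use fallback that is compiled only when the
 * architecture did not provide its own pmd_write().
 */
#ifndef __HAVE_ARCH_PMD_WRITE
static inline int pmd_write(pmd_t pmd)
{
	(void)pmd;
	assert(!"generic pmd_write() must not be called");
	return 0;
}
#endif
```

If the architecture definition sat inside the THP #ifdef, a non-THP build would silently pick up the trapping fallback, which is the failure mode this commit removes.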
      
      kernel BUG at include/asm-generic/pgtable.h:576!
                    \|/ ____ \|/
                    "@'/ .. \`@"
                    /_| \__/ |_\
                       \__U_/
      oracle_8114_cdb(8114): Kernel bad sw trap 5 [#1]
      CPU: 120 PID: 8114 Comm: oracle_8114_cdb Not tainted
      4.1.12-61.7.1.el6uek.rc1.sparc64 #1
      task: fff8400700a24d60 ti: fff8400700bc4000 task.ti: fff8400700bc4000
      TSTATE: 0000004411e01607 TPC: 00000000004609f8 TNPC: 00000000004609fc Y:
      00000005    Not tainted
      TPC: <gup_huge_pmd+0x198/0x1e0>
      g0: 000000000001c000 g1: 0000000000ef3954 g2: 0000000000000000 g3: 0000000000000001
      g4: fff8400700a24d60 g5: fff8001fa5c10000 g6: fff8400700bc4000 g7: 0000000000000720
      o0: 0000000000bc5058 o1: 0000000000000240 o2: 0000000000006000 o3: 0000000000001c00
      o4: 0000000000000000 o5: 0000048000080000 sp: fff8400700bc6ab1 ret_pc: 00000000004609f0
      RPC: <gup_huge_pmd+0x190/0x1e0>
      l0: fff8400700bc74fc l1: 0000000000020000 l2: 0000000000002000 l3: 0000000000000000
      l4: fff8001f93250950 l5: 000000000113f800 l6: 0000000000000004 l7: 0000000000000000
      i0: fff8400700ca46a0 i1: bd0000085e800453 i2: 000000026a0c4000 i3: 000000026a0c6000
      i4: 0000000000000001 i5: fff800070c958de8 i6: fff8400700bc6b61 i7: 0000000000460dd0
      I7: <gup_pud_range+0x170/0x1a0>
      Call Trace:
       [0000000000460dd0] gup_pud_range+0x170/0x1a0
       [0000000000460e84] get_user_pages_fast+0x84/0x120
       [00000000006f5a18] iov_iter_get_pages+0x98/0x240
       [00000000005fa744] do_direct_IO+0xf64/0x1e00
       [00000000005fbbc0] __blockdev_direct_IO+0x360/0x15a0
       [00000000101f74fc] ext4_ind_direct_IO+0xdc/0x400 [ext4]
       [00000000101af690] ext4_ext_direct_IO+0x1d0/0x2c0 [ext4]
       [00000000101af86c] ext4_direct_IO+0xec/0x220 [ext4]
       [0000000000553bd4] generic_file_read_iter+0x114/0x140
       [00000000005bdc2c] __vfs_read+0xac/0x100
       [00000000005bf254] vfs_read+0x54/0x100
       [00000000005bf368] SyS_pread64+0x68/0x80
      Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 10 Mar, 2017 1 commit
  8. 02 Mar, 2017 1 commit
  9. 24 Feb, 2017 1 commit
    • sparc64: Multi-page size support · c7d9f77d
      Committed by Nitin Gupta
      Add support for using multiple hugepage sizes simultaneously
      on mainline. Currently, support for 256M has been added which
      can be used along with 8M pages.
      
      Page tables are set like this (e.g. for 256M page):
          VA + (8M * x) -> PA + (8M * x) (sz bit = 256M) where x in [0, 31]
      
      and TSB is set similarly:
          VA + (4M * x) -> PA + (4M * x) (sz bit = 256M) where x in [0, 63]
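
The two expansions above can be worked through with a small sketch. `struct mapping` and `expand_hugepage()` are hypothetical names used only to show the arithmetic; they do not model real PMD or TSB entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

/* One translation slot: where it maps and which page size it is tagged with. */
struct mapping {
	uint64_t va;
	uint64_t pa;
	uint64_t size_tag;
};

/*
 * Expand one huge page into consecutive slots of `step` bytes, each
 * tagged with the full huge page size. Returns the number of slots.
 */
static size_t expand_hugepage(uint64_t va, uint64_t pa, uint64_t huge_size,
			      uint64_t step, struct mapping *out, size_t max)
{
	size_t n = (size_t)(huge_size / step);

	assert(huge_size % step == 0 && n <= max);
	for (size_t x = 0; x < n; x++) {
		out[x].va = va + step * x;
		out[x].pa = pa + step * x;
		out[x].size_tag = huge_size;
	}
	return n;
}
```

For a 256M page, a step of 8M yields 32 slots (x in [0, 31]) and a step of 4M yields 64 slots (x in [0, 63]), matching the ranges given above.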
      
      - Testing
      
      Tested on Sonoma (which supports 256M pages) by running stream
      benchmark instances in parallel: one instance uses 8M pages and
      another uses 256M pages, consuming 48G each.
      
      Boot params used:
      
      default_hugepagesz=256M hugepagesz=256M hugepages=300 hugepagesz=8M
      hugepages=10000
      Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 12 Dec, 2016 1 commit
  11. 30 Jul, 2016 1 commit
  12. 21 May, 2016 1 commit
  13. 20 May, 2016 1 commit
    • arch: fix has_transparent_hugepage() · fd8cfd30
      Committed by Hugh Dickins
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
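
The override idiom mentioned above (`#define has_transparent_hugepage has_transparent_hugepage`) can be sketched as follows. The runtime flag `fake_cpu_has_huge_pages` is a hypothetical stand-in for whatever CPU feature check an overriding architecture would perform.

```c
/* --- architecture header (an arch that must decide at runtime) --- */
static int fake_cpu_has_huge_pages = 1;	/* hypothetical feature flag */

static inline int has_transparent_hugepage(void)
{
	return fake_cpu_has_huge_pages;
}
/* Defining the name as itself tells the generic header an override exists. */
#define has_transparent_hugepage has_transparent_hugepage

/* --- generic header: default used only when no override was defined --- */
#ifndef has_transparent_hugepage
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define has_transparent_hugepage() 1
#else
#define has_transparent_hugepage() 0
#endif
#endif
```

Because a macro is not re-expanded inside its own expansion, a call to has_transparent_hugepage() still reaches the inline function, while the #ifndef test in the generic header sees the name as defined and skips the default.
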
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>  [arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>  [arch/s390]
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 21 Mar, 2016 1 commit
  15. 16 Jan, 2016 2 commits
    • arch/sparc/include/asm/pgtable_64.h: add pmd_[dirty|mkclean] for THP · 79cedb8f
      Committed by Minchan Kim
      MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
      of a THP page have been overwritten since the MADV_FREE syscall was
      called on it.

      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
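
A minimal sketch of how the two helpers fit together, assuming a single hypothetical dirty bit (the real sparc64 encodings differ between the 4U and 4V layouts and are not reproduced here):

```c
#include <stdint.h>

typedef struct { uint64_t val; } pmd_t;

#define FAKE_PMD_DIRTY 0x4ULL	/* hypothetical dirty bit */

/* Has the huge page been written since the bit was last cleared? */
static inline int pmd_dirty(pmd_t pmd)
{
	return (pmd.val & FAKE_PMD_DIRTY) != 0;
}

/* Clear the dirty bit, as done when the range is marked with MADV_FREE. */
static inline pmd_t pmd_mkclean(pmd_t pmd)
{
	pmd.val &= ~FAKE_PMD_DIRTY;
	return pmd;
}

/* A page cleaned at MADV_FREE time may be discarded only if still clean. */
static inline int fake_can_discard(pmd_t pmd)
{
	return !pmd_dirty(pmd);
}
```
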
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc, thp: remove infrastructure for handling splitting PMDs · 99f1bc01
      Committed by Kirill A. Shutemov
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 25 Jun, 2015 1 commit
  17. 01 Jun, 2015 1 commit
    • sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE · 494e5b6f
      Committed by Khalid Aziz
      Bit 9 of the TTE is CV (Cacheable in V-cache) on sparc v9 processors, while
      the same bit is MCDE (Memory Corruption Detection Enable) on the M7
      processor. The kernel sets the TTE.cv bit on all pages for the sun4v
      architecture, which works well for sparc v9 but unintentionally enables
      memory corruption detection on the M7. This patch adds code to determine
      whether the kernel is running on an M7 processor and, if so, avoids
      erroneously enabling memory corruption detection in the TTE.
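
The decision described above amounts to treating bit 9 differently depending on the CPU. A minimal sketch, in which the function name and the `cpu_is_m7` parameter are hypothetical and only the bit position comes from the message:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 9 of the TTE: CV on sparc v9, MCDE on M7 (per the commit message). */
#define FAKE_TTE_BIT9 (1ULL << 9)

/*
 * Return the TTE flags to use for a cacheable page. On M7 the bit is left
 * clear so that memory corruption detection is not switched on by accident.
 */
static inline uint64_t fake_tte_cacheable_flags(uint64_t base, bool cpu_is_m7)
{
	if (cpu_is_m7)
		return base & ~FAKE_TTE_BIT9;
	return base | FAKE_TTE_BIT9;
}
```
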
      Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 12 Feb, 2015 1 commit
  19. 11 Feb, 2015 1 commit
  20. 11 Dec, 2014 1 commit
  21. 06 Oct, 2014 5 commits
    • sparc64: Kill unnecessary tables and increase MAX_BANKS. · d195b71b
      Committed by David S. Miller
      swapper_low_pmd_dir and swapper_pud_dir are actually completely
      useless and unnecessary.
      
      We just need swapper_pg_dir[].  Naturally the other page table chunks
      will be allocated on an as-needed basis.  Since the kernel actually
      accesses these tables in the PAGE_OFFSET view, there is not even a TLB
      locality advantage of placing them in the kernel image.
      
      Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
      naturally page aligned.
      
      Increase MAX_BANKS to 1024 in order to handle heavily fragmented
      virtual guests.
      
      Even with this MAX_BANKS increase, the kernel is 20K+ smaller.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Adjust vmalloc region size based upon available virtual address bits. · bb4e6e85
      Committed by David S. Miller
      In order to accommodate embedded per-cpu allocation with large numbers
      of cpus and numa nodes, we have to use as much virtual address space
      as possible for the vmalloc region.  Otherwise we can get things like:
      
      PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
      
      So, once we select a value for PAGE_OFFSET, derive the size of the
      vmalloc region based upon that.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. · 7c0fa0f2
      Committed by David S. Miller
      Make sure, at compile time, that the kernel can properly support
      whatever MAX_PHYS_ADDRESS_BITS is defined to.
      
      On M7 chips, use a max_phys_bits value of 49.
      
      Based upon a patch by Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Fix physical memory management regressions with large max_phys_bits. · 0dd5b7b0
      Committed by David S. Miller
      If max_phys_bits needs to be > 43 (e.g. for T4 chips), things like
      DEBUG_PAGEALLOC stop working because the 3-level page tables can
      only cover up to 43 bits.
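
The 43-bit limit can be reproduced with a short calculation. The sketch assumes 8K base pages (a page shift of 13) and 8-byte table entries, so each table level fills one page and resolves 10 bits; these constants are assumptions of the example, not quoted from the patch.

```c
/* Assumed for this sketch: 8K pages and 8-byte page table entries. */
#define FAKE_PAGE_SHIFT     13
#define FAKE_BITS_PER_LEVEL (FAKE_PAGE_SHIFT - 3)	/* 1024 entries per table */

/* Address bits covered by a page table with the given number of levels. */
static inline unsigned int fake_pgtable_covered_bits(unsigned int levels)
{
	return FAKE_PAGE_SHIFT + levels * FAKE_BITS_PER_LEVEL;
}
```

Under these assumptions three levels cover 13 + 3 * 10 = 43 bits, and a fourth level raises that to 53 bits.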
      
      Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
      47, several statically allocated tables became enormous.
      
      Compounding this is that we will need to support up to 49 bits of
      physical addressing for M7 chips.
      
      The two tables in question are sparc64_valid_addr_bitmap and
      kpte_linear_bitmap.
      
      The first holds a bitmap, with 1 bit for each 4MB chunk of physical
      memory, indicating whether that chunk actually exists in the machine
      and is valid.
      
      The second table is a set of 2-bit values which tell how large of a
      mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
      chunk of ram in the system.
      
      These tables are huge and take up an enormous amount of the BSS
      section of the sparc64 kernel image.  Specifically, the
      sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
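
The two sizes quoted above follow from the 47-bit MAX_PHYS_ADDRESS_BITS mentioned earlier in the message. A small sketch of the arithmetic, with `fake_table_bytes()` as a hypothetical helper:

```c
#include <stdint.h>

/*
 * Size in bytes of a table holding `bits_per_chunk` bits for every
 * 2^chunk_shift-byte chunk of a 2^phys_bits-byte physical address space.
 */
static inline uint64_t fake_table_bytes(unsigned int phys_bits,
					unsigned int chunk_shift,
					unsigned int bits_per_chunk)
{
	uint64_t chunks = 1ULL << (phys_bits - chunk_shift);

	return chunks * bits_per_chunk / 8;
}
```

With 47 physical address bits, one bit per 4MB chunk (shift 22) gives 2^25 bits, or 4MB, and two bits per 256MB chunk (shift 28) gives 2^20 bits, or 128K. At 49 bits both tables would be four times larger.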
      
      So let's solve the space wastage and the DEBUG_PAGEALLOC problem
      at the same time, by using the kernel page tables (as designed) to
      manage this information.
      
      We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
      and we do this by encoding huge PMDs and PUDs.
      
      On a T4-2 with 256GB of ram the kernel page table takes up 16K with
      DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
      memory is dynamically allocated at run time rather than coded
      statically into the kernel image.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Switch to 4-level page tables. · ac55c768
      Committed by David S. Miller
      This has become necessary with chips that support more than 43-bits
      of physical addressing.
      
      Based almost entirely upon a patch by Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
  22. 19 May, 2014 1 commit
  23. 09 May, 2014 1 commit
    • sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus. · b18eb2d7
      Committed by David S. Miller
      Access to the TSB hash tables during TLB misses requires that there be
      an atomic 128-bit quad load available so that we fetch a matching TAG
      and DATA field at the same time.
      
      On cpus prior to UltraSPARC-III only virtual address based quad loads
      are available.  UltraSPARC-III and later provide physical address
      based variants which are easier to use.
      
      When we only have virtual address based quad loads available this
      means that we have to lock the TSB into the TLB at a fixed virtual
      address on each cpu when it runs that process.  We can't just access
      the PAGE_OFFSET based aliased mapping of these TSBs because we cannot
      take a recursive TLB miss inside of the TLB miss handler without
      risking running out of hardware trap levels (some trap combinations
      can be deep, such as those generated by register window spill and fill
      traps).
      
      Without huge pages it's working perfectly fine, but when the huge TSB
      got added another chunk of fixed virtual address space was not
      allocated for this second TSB mapping.
      
      So we were mapping both the 8K and 4MB TSBs to the same exact virtual
      address, causing multiple TLB matches which gives undefined behavior.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 04 May, 2014 8 commits
  25. 19 Dec, 2013 1 commit
    • mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Committed by Rik van Riel
      There are a few subtle races, between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time, however compaction or the NUMA migration
      code may come in, and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA memory may
      still be accessible if there is a TLB flush pending for the mm.
      
      This should fix both NUMA migration and compaction.
      
      [mgorman@suse.de: fix build]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 14 Nov, 2013 1 commit
    • sparc64: Encode huge PMDs using PTE encoding. · a7b9403f
      Committed by David S. Miller
      Now that we have 64-bits for PMDs we can stop using special encodings
      for the huge PMD values, and just put real PTEs in there.
      
      We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and
      huge ones.  It is the same for both 4U and 4V PTE layouts.
      
      We also use _PAGE_SPECIAL to indicate the splitting state, since a
      huge PMD cannot also be special.
      
      All of the PMD --> PTE translation code disappears, and most of the
      huge PMD bit modifications and tests just degenerate into the PTE
      operations.  In particular USER_PGTABLE_CHECK_PMD_HUGE becomes
      trivial.
      
      As a side effect, normal PMDs don't shift the physical address around.
      This also speeds up the page table walks in the TLB miss paths since
      they don't have to do the shifts any more.
      
      Another non-trivial aspect is that pte_modify() has to be changed
      to preserve the _PAGE_PMD_HUGE bits as well as the page size field
      of the pte.
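
The pte_modify() requirement can be illustrated with a mask-based sketch. The bit positions below are invented for the example and do not match the real 4U or 4V layouts; the point is only that the huge marker and the size field sit in the preserved mask alongside the physical address.

```c
#include <stdint.h>

/* Hypothetical layout: bits 13-55 address, bit 56 huge marker, bits 61-62 size. */
#define FAKE_PADDR_MASK    (((1ULL << 56) - 1) & ~((1ULL << 13) - 1))
#define FAKE_PAGE_PMD_HUGE (1ULL << 56)
#define FAKE_PAGE_SZ_MASK  (3ULL << 61)
#define FAKE_PRESERVE_MASK \
	(FAKE_PADDR_MASK | FAKE_PAGE_PMD_HUGE | FAKE_PAGE_SZ_MASK)

/*
 * Apply new protection bits while keeping the address, the huge marker,
 * and the page size field from the old entry.
 */
static inline uint64_t fake_pte_modify(uint64_t pte, uint64_t newprot)
{
	return (pte & FAKE_PRESERVE_MASK) | (newprot & ~FAKE_PRESERVE_MASK);
}
```

Without the extra preserved bits, changing protections on a huge PMD would turn it back into something that looks like a plain PMD of the base page size.
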
      Signed-off-by: David S. Miller <davem@davemloft.net>