1. 07 6月, 2017 4 次提交
  2. 18 5月, 2017 1 次提交
  3. 19 4月, 2017 1 次提交
  4. 07 4月, 2017 2 次提交
  5. 28 3月, 2017 3 次提交
  6. 02 3月, 2017 3 次提交
  7. 25 2月, 2017 1 次提交
  8. 24 2月, 2017 5 次提交
  9. 23 2月, 2017 1 次提交
  10. 26 1月, 2017 1 次提交
  11. 25 12月, 2016 1 次提交
  12. 15 11月, 2016 1 次提交
  13. 11 11月, 2016 1 次提交
    • T
      sparc64: Fix find_node warning if numa node cannot be found · 74a5ed5c
      Thomas Tai 提交于
      When booting up LDOM, find_node() warns that a physical address
      doesn't match a NUMA node.
      
      WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
      find_node+0xf4/0x120 find_node: A physical address doesn't
      match a NUMA node rule. Some physical memory will be
      owned by node 0.Modules linked in:
      
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
      Call Trace:
       [0000000000468ba0] __warn+0xc0/0xe0
       [0000000000468c74] warn_slowpath_fmt+0x34/0x60
       [00000000004592f4] find_node+0xf4/0x120
       [0000000000dd0774] add_node_ranges+0x38/0xe4
       [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
       [0000000000dd0e9c] bootmem_init+0xb8/0x160
       [0000000000dd174c] paging_init+0x808/0x8fc
       [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
       [0000000000dc68a0] start_kernel+0x48/0x424
       [0000000000dcb374] start_early_boot+0x27c/0x28c
       [0000000000a32c08] tlb_fixup_done+0x4c/0x64
       [0000000000027f08] 0x27f08
      
      It is because linux use an internal structure node_masks[] to
      keep the best memory latency node only. However, LDOM mdesc can
      contain single latency-group with multiple memory latency nodes.
      
      If the address doesn't match the best latency node within
      node_masks[], it should check for an alternative via mdesc.
      The warning message should only be printed if the address
      doesn't match any node_masks[] nor within mdesc. To minimize
      the impact of searching mdesc every time, the last matched
      mask and index is stored in a variable.
      Signed-off-by: NThomas Tai <thomas.tai@oracle.com>
      Reviewed-by: NChris Hyser <chris.hyser@oracle.com>
      Reviewed-by: NLiam Merwick <liam.merwick@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74a5ed5c
  14. 28 10月, 2016 1 次提交
    • D
      sparc64: Handle extremely large kernel TLB range flushes more gracefully. · a74ad5e6
      David S. Miller 提交于
      When the vmalloc area gets fragmented, and because the firmware
      mapping area sits between where modules live and the vmalloc area, we
      can sometimes receive requests for enormous kernel TLB range flushes.
      
      When this happens the cpu just spins flushing billions of pages and
      this triggers the NMI watchdog and other problems.
      
      We took care of this on the TSB side by doing a linear scan of the
      table once we pass a certain threshold.
      
      Do something similar for the TLB flush, however we are limited by
      the TLB flush facilities provided by the different chip variants.
      
      First of all we use an (mostly arbitrary) cut-off of 256K which is
      about 32 pages.  This can be tuned in the future.
      
      The huge range code path for each chip works as follows:
      
      1) On spitfire we flush all non-locked TLB entries using diagnostic
         acceses.
      
      2) On cheetah we use the "flush all" TLB flush.
      
      3) On sun4v/hypervisor we do a TLB context flush on context 0, which
         unlike previous chips does not remove "permanent" or locked
         entries.
      
      We could probably do something better on spitfire, such as limiting
      the flush to kernel TLB entries or even doing range comparisons.
      However that probably isn't worth it since those chips are old and
      the TLB only had 64 entries.
      Reported-by: NJames Clarke <jrtc27@jrtc27.com>
      Tested-by: NJames Clarke <jrtc27@jrtc27.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a74ad5e6
  15. 27 10月, 2016 2 次提交
  16. 26 10月, 2016 2 次提交
  17. 19 10月, 2016 1 次提交
  18. 06 10月, 2016 1 次提交
  19. 28 9月, 2016 3 次提交
    • A
      sparc64: Fix irq stack bootmem allocation. · ebb99a4c
      Atish Patra 提交于
      Currently, irq stack bootmem is allocated for all possible cpus
      before nr_cpus value changes the list of possible cpus. As a result,
      there is unnecessary wastage of bootmemory.
      
      Move the irq stack bootmem allocation so that it happens after
      possible cpu list is modified based on nr_cpus value.
      Signed-off-by: NAtish Patra <atish.patra@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Reviewed-by: NVijay Kumar <vijay.ac.kumar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebb99a4c
    • M
      sparc64 mm: Fix more TSB sizing issues · 1e953d84
      Mike Kravetz 提交于
      Commit af1b1a9b ("sparc64 mm: Fix base TSB sizing when hugetlb
      pages are used") addressed the difference between hugetlb and THP
      pages when computing TSB sizes.  The following additional issues
      were also discovered while working with the code.
      
      In order to save memory, THP makes use of a huge zero page.  This huge
      zero page does not count against a task's RSS, but it does consume TSB
      entries.  This is similar to hugetlb pages.  Therefore, count huge
      zero page entries in hugetlb_pte_count.
      
      Accounting of THP pages is done in the routine set_pmd_at().
      Unfortunately, this does not catch the case where a THP page is split.
      To handle this case, decrement the count in pmdp_invalidate().
      pmdp_invalidate is only called when splitting a THP.  However, 'sanity
      checks' are added in case it is ever called for other purposes.
      
      A more general issue exists with HPAGE_SIZE accounting.
      hugetlb_pte_count tracks the number of HPAGE_SIZE (8M) pages.  This
      value is used to size the TSB for HPAGE_SIZE pages.  However,
      each HPAGE_SIZE page consists of two REAL_HPAGE_SIZE (4M) pages.
      The TSB contains an entry for each REAL_HPAGE_SIZE page.  Therefore,
      the number of REAL_HPAGE_SIZE pages should be used to size the huge
      page TSB.  A new compile time constant REAL_HPAGE_PER_HPAGE is used
      to multiply hugetlb_pte_count before sizing the TSB.
      
      Changes from V1
      - Fixed build issue if hugetlb or THP not configured
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e953d84
    • P
      sparc64: fix section mismatch in find_numa_latencies_for_group · bdf2f59e
      Paul Gortmaker 提交于
      To fix:
      
        WARNING: vmlinux.o(.text.unlikely+0x580): Section mismatch in
        reference from the function find_numa_latencies_for_group() to the
        function .init.text:find_mlgroup()
      
        The function find_numa_latencies_for_group() references the
        function __init find_mlgroup().  This is often because
        find_numa_latencies_for_group lacks a __init annotation or the
        annotation of find_mlgroup is wrong.
      
      It turns out find_numa_latencies_for_group is only called from:
          static int __init numa_parse_mdesc(void)
      and hence we can tag find_numa_latencies_for_group with __init.
      
      In doing so we see that find_best_numa_node_for_mlgroup is only
      called from within __init and hence can also be marked with __init.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
      Cc: Chris Hyser <chris.hyser@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdf2f59e
  20. 30 7月, 2016 1 次提交
  21. 29 7月, 2016 1 次提交
    • M
      sparc64 mm: Fix base TSB sizing when hugetlb pages are used · af1b1a9b
      Mike Kravetz 提交于
      do_sparc64_fault() calculates both the base and huge page RSS sizes and
      uses this information in calls to tsb_grow().  The calculation for base
      page TSB size is not correct if the task uses hugetlb pages.  hugetlb
      pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
      does not include hugetlb pages.  However, the number of pages based on
      huge_pte_count (which does include hugetlb pages) is subtracted from
      this value.  This will result in an artificially small and often negative
      RSS calculation.  The base TSB size is then often set to max_tsb_size
      as the passed RSS is unsigned, so a negative value looks really big.
      
      THP pages are also accounted for in huge_pte_count, and THP pages are
      accounted for in RSS so the calculation in do_sparc64_fault() is correct
      if a task only uses THP pages.
      
      A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
      and THP pages can be used.  Instead of a single counter, use two:  one
      for hugetlb and one for THP.
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af1b1a9b
  22. 27 7月, 2016 1 次提交
  23. 25 6月, 2016 1 次提交
    • M
      tree wide: get rid of __GFP_REPEAT for order-0 allocations part I · 32d6bd90
      Michal Hocko 提交于
      This is the third version of the patchset previously sent [1].  I have
      basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
      rid of superfluous gfp flags" which went through dm tree.  I am sending
      it now because it is tree wide and chances for conflicts are reduced
      considerably when we want to target rc2.  I plan to send the next step
      and rename the flag and move to a better semantic later during this
      release cycle so we will have a new semantic ready for 4.8 merge window
      hopefully.
      
      Motivation:
      
      While working on something unrelated I've checked the current usage of
      __GFP_REPEAT in the tree.  It seems that a majority of the usage is and
      always has been bogus because __GFP_REPEAT has always been about costly
      high order allocations while we are using it for order-0 or very small
      orders very often.  It seems that a big pile of them is just a
      copy&paste when a code has been adopted from one arch to another.
      
      I think it makes some sense to get rid of them because they are just
      making the semantic more unclear.  Please note that GFP_REPEAT is
      documented as
      
      * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
      
      * _might_ fail.  This depends upon the particular VM implementation.
        while !costly requests have basically nofail semantic.  So one could
        reasonably expect that order-0 request with __GFP_REPEAT will not loop
        for ever.  This is not implemented right now though.
      
      I would like to move on with __GFP_REPEAT and define a better semantic
      for it.
      
        $ git grep __GFP_REPEAT origin/master | wc -l
        111
        $ git grep __GFP_REPEAT | wc -l
        36
      
      So we are down to the third after this patch series.  The remaining
      places really seem to be relying on __GFP_REPEAT due to large allocation
      requests.  This still needs some double checking which I will do later
      after all the simple ones are sorted out.
      
      I am touching a lot of arch specific code here and I hope I got it right
      but as a matter of fact I even didn't compile test for some archs as I
      do not have cross compiler for them.  Patches should be quite trivial to
      review for stupid compile mistakes though.  The tricky parts are usually
      hidden by macro definitions and thats where I would appreciate help from
      arch maintainers.
      
      [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
      
      This patch (of 19):
      
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.  Yet we
      have the full kernel tree with its usage for apparently order-0
      allocations.  This is really confusing because __GFP_REPEAT is
      explicitly documented to allow allocation failures which is a weaker
      semantic than the current order-0 has (basically nofail).
      
      Let's simply drop __GFP_REPEAT from those places.  This would allow to
      identify place which really need allocator to retry harder and formulate
      a more specific semantic for what the flag is supposed to do actually.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32d6bd90
  24. 26 5月, 2016 1 次提交