1. 07 6月, 2017 5 次提交
    • P
      sparc64: new context wrap · a0582f26
      Pavel Tatashin 提交于
      The current wrap implementation has a race issue: it is called outside of
      the ctx_alloc_lock, and also does not wait for all CPUs to complete the
      wrap.  This means that a thread can get a new context with a new version
      and another thread might still be running with the same context. The
      problem is especially severe on CPUs with shared TLBs, like sun4v. I used
      the following test to very quickly reproduce the problem:
      - start over 8K processes (must be more than context IDs)
      - write and read values at a  memory location in every process.
      
      Very quickly memory corruptions start happening, and what we read back
      does not equal what we wrote.
      
      Several approaches were explored before settling on this one:
      
      Approach 1:
      Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
      every process to complete the wrap. (Note: every CPU must WAIT before
      leaving smp_new_mmu_context_version_client() until every one arrives).
      
      This approach ends up with deadlocks, as some threads own locks which other
      threads are waiting for, and they never receive softint until these threads
      exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
      deadlock happens.
      
      Approach 2:
      Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
      into C code, and issue new versions to every CPU.
      This approach adds some overhead to runtime: in switch_mm() we must add
      some checks to make sure that versions have not changed due to wrap while
      we were loading the new secondary context. (could be protected by PSTATE_IE
      but that degrades performance as on M7 and older CPUs as it takes 50 cycles
      for each access). Also, we still need a global per-cpu array of MMs to know
      where we need to load new contexts, otherwise we can change context to a
      thread that is going way (if we received mondo between switch_mm() and
      switch_to() time). Finally, there are some issues with window registers in
      rtrap() when context IDs are changed during CPU mondo time.
      
      The approach in this patch is the simplest and has almost no impact on
      runtime.  We use the array with mm's where last secondary contexts were
      loaded onto CPUs and bump their versions to the new generation without
      changing context IDs. If a new process comes in to get a context ID, it
      will go through get_new_mmu_context() because of version mismatch. But the
      running processes do not need to be interrupted. And wrap is quicker as we
      do not need to xcall and wait for everyone to receive and complete wrap.
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0582f26
    • P
      sparc64: add per-cpu mm of secondary contexts · 7a5b4bbf
      Pavel Tatashin 提交于
      The new wrap is going to use information from this array to figure out
      mm's that currently have valid secondary contexts setup.
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a5b4bbf
    • P
      sparc64: redefine first version · c4415235
      Pavel Tatashin 提交于
      CTX_FIRST_VERSION defines the first context version, but also it defines
      first context. This patch redefines it to only include the first context
      version.
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4415235
    • P
      sparc64: reset mm cpumask after wrap · 58897485
      Pavel Tatashin 提交于
      After a wrap (getting a new context version) a process must get a new
      context id, which means that we would need to flush the context id from
      the TLB before running for the first time with this ID on every CPU. But,
      we use mm_cpumask to determine if this process has been running on this CPU
      before, and this mask is not reset after a wrap. So, there are two possible
      fixes for this issue:
      
      1. Clear mm cpumask whenever mm gets a new context id
      2. Unconditionally flush context every time process is running on a CPU
      
      This patch implements the first solution
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58897485
    • L
      sparc/mm/hugepages: Fix setup_hugepagesz for invalid values. · f322980b
      Liam R. Howlett 提交于
      hugetlb_bad_size needs to be called on invalid values.  Also change the
      pr_warn to a pr_err to better align with other platforms.
      Signed-off-by: NLiam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f322980b
  2. 28 3月, 2017 2 次提交
  3. 24 2月, 2017 4 次提交
  4. 25 12月, 2016 1 次提交
  5. 15 11月, 2016 1 次提交
  6. 11 11月, 2016 1 次提交
    • T
      sparc64: Fix find_node warning if numa node cannot be found · 74a5ed5c
      Thomas Tai 提交于
      When booting up LDOM, find_node() warns that a physical address
      doesn't match a NUMA node.
      
      WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
      find_node+0xf4/0x120 find_node: A physical address doesn't
      match a NUMA node rule. Some physical memory will be
      owned by node 0.Modules linked in:
      
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
      Call Trace:
       [0000000000468ba0] __warn+0xc0/0xe0
       [0000000000468c74] warn_slowpath_fmt+0x34/0x60
       [00000000004592f4] find_node+0xf4/0x120
       [0000000000dd0774] add_node_ranges+0x38/0xe4
       [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
       [0000000000dd0e9c] bootmem_init+0xb8/0x160
       [0000000000dd174c] paging_init+0x808/0x8fc
       [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
       [0000000000dc68a0] start_kernel+0x48/0x424
       [0000000000dcb374] start_early_boot+0x27c/0x28c
       [0000000000a32c08] tlb_fixup_done+0x4c/0x64
       [0000000000027f08] 0x27f08
      
      It is because linux use an internal structure node_masks[] to
      keep the best memory latency node only. However, LDOM mdesc can
      contain single latency-group with multiple memory latency nodes.
      
      If the address doesn't match the best latency node within
      node_masks[], it should check for an alternative via mdesc.
      The warning message should only be printed if the address
      doesn't match any node_masks[] nor within mdesc. To minimize
      the impact of searching mdesc every time, the last matched
      mask and index is stored in a variable.
      Signed-off-by: NThomas Tai <thomas.tai@oracle.com>
      Reviewed-by: NChris Hyser <chris.hyser@oracle.com>
      Reviewed-by: NLiam Merwick <liam.merwick@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74a5ed5c
  7. 06 10月, 2016 1 次提交
  8. 28 9月, 2016 2 次提交
    • A
      sparc64: Fix irq stack bootmem allocation. · ebb99a4c
      Atish Patra 提交于
      Currently, irq stack bootmem is allocated for all possible cpus
      before nr_cpus value changes the list of possible cpus. As a result,
      there is unnecessary wastage of bootmemory.
      
      Move the irq stack bootmem allocation so that it happens after
      possible cpu list is modified based on nr_cpus value.
      Signed-off-by: NAtish Patra <atish.patra@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Reviewed-by: NVijay Kumar <vijay.ac.kumar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebb99a4c
    • P
      sparc64: fix section mismatch in find_numa_latencies_for_group · bdf2f59e
      Paul Gortmaker 提交于
      To fix:
      
        WARNING: vmlinux.o(.text.unlikely+0x580): Section mismatch in
        reference from the function find_numa_latencies_for_group() to the
        function .init.text:find_mlgroup()
      
        The function find_numa_latencies_for_group() references the
        function __init find_mlgroup().  This is often because
        find_numa_latencies_for_group lacks a __init annotation or the
        annotation of find_mlgroup is wrong.
      
      It turns out find_numa_latencies_for_group is only called from:
          static int __init numa_parse_mdesc(void)
      and hence we can tag find_numa_latencies_for_group with __init.
      
      In doing so we see that find_best_numa_node_for_mlgroup is only
      called from within __init and hence can also be marked with __init.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
      Cc: Chris Hyser <chris.hyser@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdf2f59e
  9. 30 7月, 2016 1 次提交
  10. 29 7月, 2016 1 次提交
    • M
      sparc64 mm: Fix base TSB sizing when hugetlb pages are used · af1b1a9b
      Mike Kravetz 提交于
      do_sparc64_fault() calculates both the base and huge page RSS sizes and
      uses this information in calls to tsb_grow().  The calculation for base
      page TSB size is not correct if the task uses hugetlb pages.  hugetlb
      pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
      does not include hugetlb pages.  However, the number of pages based on
      huge_pte_count (which does include hugetlb pages) is subtracted from
      this value.  This will result in an artificially small and often negative
      RSS calculation.  The base TSB size is then often set to max_tsb_size
      as the passed RSS is unsigned, so a negative value looks really big.
      
      THP pages are also accounted for in huge_pte_count, and THP pages are
      accounted for in RSS so the calculation in do_sparc64_fault() is correct
      if a task only uses THP pages.
      
      A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
      and THP pages can be used.  Instead of a single counter, use two:  one
      for hugetlb and one for THP.
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af1b1a9b
  11. 25 6月, 2016 1 次提交
    • M
      tree wide: get rid of __GFP_REPEAT for order-0 allocations part I · 32d6bd90
      Michal Hocko 提交于
      This is the third version of the patchset previously sent [1].  I have
      basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
      rid of superfluous gfp flags" which went through dm tree.  I am sending
      it now because it is tree wide and chances for conflicts are reduced
      considerably when we want to target rc2.  I plan to send the next step
      and rename the flag and move to a better semantic later during this
      release cycle so we will have a new semantic ready for 4.8 merge window
      hopefully.
      
      Motivation:
      
      While working on something unrelated I've checked the current usage of
      __GFP_REPEAT in the tree.  It seems that a majority of the usage is and
      always has been bogus because __GFP_REPEAT has always been about costly
      high order allocations while we are using it for order-0 or very small
      orders very often.  It seems that a big pile of them is just a
      copy&paste when a code has been adopted from one arch to another.
      
      I think it makes some sense to get rid of them because they are just
      making the semantic more unclear.  Please note that GFP_REPEAT is
      documented as
      
      * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
      
      * _might_ fail.  This depends upon the particular VM implementation.
        while !costly requests have basically nofail semantic.  So one could
        reasonably expect that order-0 request with __GFP_REPEAT will not loop
        for ever.  This is not implemented right now though.
      
      I would like to move on with __GFP_REPEAT and define a better semantic
      for it.
      
        $ git grep __GFP_REPEAT origin/master | wc -l
        111
        $ git grep __GFP_REPEAT | wc -l
        36
      
      So we are down to the third after this patch series.  The remaining
      places really seem to be relying on __GFP_REPEAT due to large allocation
      requests.  This still needs some double checking which I will do later
      after all the simple ones are sorted out.
      
      I am touching a lot of arch specific code here and I hope I got it right
      but as a matter of fact I even didn't compile test for some archs as I
      do not have cross compiler for them.  Patches should be quite trivial to
      review for stupid compile mistakes though.  The tricky parts are usually
      hidden by macro definitions and thats where I would appreciate help from
      arch maintainers.
      
      [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
      
      This patch (of 19):
      
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.  Yet we
      have the full kernel tree with its usage for apparently order-0
      allocations.  This is really confusing because __GFP_REPEAT is
      explicitly documented to allow allocation failures which is a weaker
      semantic than the current order-0 has (basically nofail).
      
      Let's simply drop __GFP_REPEAT from those places.  This would allow to
      identify place which really need allocator to retry harder and formulate
      a more specific semantic for what the flag is supposed to do actually.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32d6bd90
  12. 26 5月, 2016 1 次提交
  13. 21 5月, 2016 1 次提交
  14. 22 4月, 2016 1 次提交
  15. 30 1月, 2016 1 次提交
    • T
      arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM · 35d98e93
      Toshi Kani 提交于
      Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
      "System RAM", "Kernel code", "Kernel data", and "Kernel bss".
      
      Note that:
      
       - IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
         IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
         as (IORESOURCE_MEM|IORESOURCE_SYSRAM).
      
       - Some archs do not set 'flags' for children nodes, such as
         "Kernel code".  This patch does not change 'flags' in this
         case.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35d98e93
  16. 15 1月, 2016 1 次提交
  17. 05 11月, 2015 1 次提交
    • N
      sparc64: Fix numa distance values · 52708d69
      Nitin Gupta 提交于
      Orabug: 21896119
      
      Use machine descriptor (MD) to get node latency
      values instead of just using default values.
      
      Testing:
      On an T5-8 system with:
       - total nodes = 8
       - self latencies = 0x26d18
       - latency to other nodes = 0x3a598
         => latency ratio = ~1.5
      
      output of numactl --hardware
      
       - before fix:
      
      node distances:
      node   0   1   2   3   4   5   6   7
        0:  10  20  20  20  20  20  20  20
        1:  20  10  20  20  20  20  20  20
        2:  20  20  10  20  20  20  20  20
        3:  20  20  20  10  20  20  20  20
        4:  20  20  20  20  10  20  20  20
        5:  20  20  20  20  20  10  20  20
        6:  20  20  20  20  20  20  10  20
        7:  20  20  20  20  20  20  20  10
      
       - after fix:
      
      node distances:
      node   0   1   2   3   4   5   6   7
        0:  10  15  15  15  15  15  15  15
        1:  15  10  15  15  15  15  15  15
        2:  15  15  10  15  15  15  15  15
        3:  15  15  15  10  15  15  15  15
        4:  15  15  15  15  10  15  15  15
        5:  15  15  15  15  15  10  15  15
        6:  15  15  15  15  15  15  10  15
        7:  15  15  15  15  15  15  15  10
      Signed-off-by: NNitin Gupta <nitin.m.gupta@oracle.com>
      Reviewed-by: NChris Hyser <chris.hyser@oracle.com>
      Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52708d69
  18. 25 6月, 2015 1 次提交
    • T
      mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute · fc6daaf9
      Tony Luck 提交于
      Some high end Intel Xeon systems report uncorrectable memory errors as a
      recoverable machine check.  Linux has included code for some time to
      process these and just signal the affected processes (or even recover
      completely if the error was in a read only page that can be replaced by
      reading from disk).
      
      But we have no recovery path for errors encountered during kernel code
      execution.  Except for some very specific cases were are unlikely to ever
      be able to recover.
      
      Enter memory mirroring. Actually 3rd generation of memory mirroing.
      
      Gen1: All memory is mirrored
      	Pro: No s/w enabling - h/w just gets good data from other side of the
      	     mirror
      	Con: Halves effective memory capacity available to OS/applications
      
      Gen2: Partial memory mirror - just mirror memory begind some memory controllers
      	Pro: Keep more of the capacity
      	Con: Nightmare to enable. Have to choose between allocating from
      	     mirrored memory for safety vs. NUMA local memory for performance
      
      Gen3: Address range partial memory mirror - some mirror on each memory
            controller
      	Pro: Can tune the amount of mirror and keep NUMA performance
      	Con: I have to write memory management code to implement
      
      The current plan is just to use mirrored memory for kernel allocations.
      This has been broken into two phases:
      
      1) This patch series - find the mirrored memory, use it for boot time
         allocations
      
      2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
         unused mirrored memory from mm/memblock.c and only give it out to
         select kernel allocations (this is still being scoped because
         page_alloc.c is scary).
      
      This patch (of 3):
      
      Add extra "flags" to memblock to allow selection of memory based on
      attribute.  No functional changes
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Xiexiuqi <xiexiuqi@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc6daaf9
  19. 01 6月, 2015 1 次提交
    • K
      sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE · 494e5b6f
      Khalid Aziz 提交于
      sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE
      
      Bit 9 of TTE is CV (Cacheable in V-cache) on sparc v9 processor while
      the same bit 9 is MCDE (Memory Corruption Detection Enable) on M7
      processor. This creates a conflicting usage of the same bit. Kernel
      sets TTE.cv bit on all pages for sun4v architecture which works well
      for sparc v9 but enables memory corruption detection on M7 processor
      which is not the intent. This patch adds code to determine if kernel
      is running on M7 processor and takes steps to not enable memory
      corruption detection in TTE erroneously.
      Signed-off-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      494e5b6f
  20. 19 5月, 2015 1 次提交
    • D
      mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler · 70ffdb93
      David Hildenbrand 提交于
      Introduce faulthandler_disabled() and use it to check for irq context and
      disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
      
      Please note that we keep the in_atomic() checks in place - to detect
      whether in irq context (in which case preemption is always properly
      disabled).
      
      In contrast, preempt_disable() should never be used to disable pagefaults.
      With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
      counter, and therefore the result of in_atomic() differs.
      We validate that condition by using might_fault() checks when calling
      might_sleep().
      
      Therefore, add a comment to faulthandler_disabled(), describing why this
      is needed.
      
      faulthandler_disabled() and pagefault_disable() are defined in
      linux/uaccess.h, so let's properly add that include to all relevant files.
      
      This patch is based on a patch from Thomas Gleixner.
      Reviewed-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: airlied@linux.ie
      Cc: akpm@linux-foundation.org
      Cc: benh@kernel.crashing.org
      Cc: bigeasy@linutronix.de
      Cc: borntraeger@de.ibm.com
      Cc: daniel.vetter@intel.com
      Cc: heiko.carstens@de.ibm.com
      Cc: herbert@gondor.apana.org.au
      Cc: hocko@suse.cz
      Cc: hughd@google.com
      Cc: mst@redhat.com
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: schwidefsky@de.ibm.com
      Cc: yang.shi@windriver.com
      Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      70ffdb93
  21. 19 3月, 2015 1 次提交
  22. 14 12月, 2014 1 次提交
    • J
      mm/debug-pagealloc: make debug-pagealloc boottime configurable · 031bc574
      Joonsoo Kim 提交于
      Now, we have prepared to avoid using debug-pagealloc in boottime.  So
      introduce new kernel-parameter to disable debug-pagealloc in boottime, and
      makes related functions to be disabled in this case.
      
      Only non-intuitive part is change of guard page functions.  Because guard
      page is effective only if debug-pagealloc is enabled, turning off
      according to debug-pagealloc is reasonable thing to do.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Jungsoo Son <jungsoo.son@lge.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      031bc574
  23. 06 10月, 2014 8 次提交
  24. 05 10月, 2014 1 次提交
    • D
      sparc64: Fix reversed start/end in flush_tlb_kernel_range() · 473ad7f4
      David S. Miller 提交于
      When we have to split up a flush request into multiple pieces
      (in order to avoid the firmware range) we don't specify the
      arguments in the right order for the second piece.
      
      Fix the order, or else we get hangs as the code tries to
      flush "a lot" of entries and we get lockups like this:
      
      [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
      [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
      [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
      [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
      [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
      [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
      [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
      [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
      [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
      [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
      [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
      [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
      [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
      [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
      [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
      [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
      [ 4423.234193] Call Trace:
      [ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
      [ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
      [ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
      [ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
      [ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
      [ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
      [ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
      [ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
      [ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
      [ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
      [ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
      [ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
      [ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c
      
      Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      473ad7f4