1. 22 Feb 2016 (1 commit)
  2. 17 Feb 2016 (2 commits)
  3. 11 Feb 2016 (4 commits)
    • ARM: 8502/1: mm: mark section-aligned portion of rodata NX · 64ac2e74
      Committed by Kees Cook
      When rodata is large enough that it crosses a section boundary after the
      kernel text, mark the rest NX. This is as close to full NX of rodata as
      we can get without splitting page tables or doing section alignment via
      CONFIG_DEBUG_ALIGN_RODATA.
      
      When the config is:
      
       CONFIG_DEBUG_RODATA=y
       # CONFIG_DEBUG_ALIGN_RODATA is not set
      
      Before:
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80100000           1M     RW NX SHD
      0x80100000-0x80a00000           9M     ro x  SHD
      0x80a00000-0xa0000000         502M     RW NX SHD
      
      After:
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80100000           1M     RW NX SHD
      0x80100000-0x80700000           6M     ro x  SHD
      0x80700000-0x80a00000           3M     ro NX SHD
      0x80a00000-0xa0000000         502M     RW NX SHD
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8518/1: Use correct symbols for XIP_KERNEL · 02afa9a8
      Committed by Chris Brandt
      For an XIP build, _etext does not represent the end of the
      binary image that needs to stay mapped into the MODULES_VADDR area.
      Years ago, data came before text in the memory map. However,
      now that the order is text/init/data, an XIP_KERNEL needs to map
      up to the data location in order to keep from cutting off
      parts of the kernel that are needed.
      We only map up to the beginning of data because data has already been
      copied, so there's no reason to keep it around anymore.
      A new symbol is created to make it clear what we are referring to.
      
      This fixes the bug where you might lose the end of your kernel area
      after page table setup is complete.
      Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc · 14d3ae2e
      Committed by Doug Anderson
      If we know that TLB efficiency will not be an issue when memory is
      accessed then it's not terribly important to allocate big chunks of
      memory.  The whole point of allocating the big chunks was that it would
      make TLB usage efficient.
      
      As Marek Szyprowski indicated:
          Please note that mapping memory with larger pages significantly
          improves performance, especially when IOMMU has a little TLB
          cache. This can be easily observed when multimedia devices do
          processing of RGB data with 90/270 degree rotation
      Image rotation is distinctly an operation that needs to bounce around
      through memory, so it makes sense that TLB efficiency is important
      there.
      
      Video decoding, on the other hand, is a fairly sequential operation.
      During video decoding it's not expected that we'll be jumping all over
      memory.  Decoding video is also pretty heavy and the TLB misses aren't a
      huge deal.  Presumably most HW video acceleration users of dma-mapping
      will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.
      
      Allocating big chunks of memory is quite expensive, especially if we're
      doing it repeatedly and memory is full.  In one (out of tree) usage model
      it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
      each one trying to allocate 4 MB of memory.  This is called whenever the
      system encounters a new video, which could easily happen while the
      memory system is stressed out.  In fact, on certain social media
      websites that auto-play video and have infinite scrolling, it's quite
      common to see not just one of these 16x4MB allocations but 2 or 3 right
      after another.  Asking the system to do even a small amount of extra
      work to give us big chunks in this case is just not a good use of time.
      
      Allocating big chunks of memory is also expensive indirectly.  Even if
      we ask the system not to do ANY extra work to allocate _our_ memory,
      we're still potentially eating up all big chunks in the system.
      Presumably there are other users in the system that aren't quite as
      flexible and that actually need these big chunks.  By eating all the big
      chunks we're causing extra work for the rest of the system.  We also may
      start making other memory allocations fail.  While the system may be
      robust to such failures (as is the case with dwc2 USB trying to allocate
      buffers for Ethernet data and with WiFi trying to allocate buffers for
      WiFi data), it is yet another big performance hit.
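
      As an illustration of how a driver would pass this hint, here is a
      minimal sketch against the struct dma_attrs API of this kernel era
      ('dev' and 'size' are placeholders, error handling trimmed):

          #include <linux/dma-mapping.h>
          #include <linux/dma-attrs.h>

          dma_addr_t dma_handle;
          void *cpu_addr;
          DEFINE_DMA_ATTRS(attrs);

          /* Tell dma-mapping that IOMMU TLB efficiency doesn't matter here,
           * so it shouldn't work hard to find big contiguous chunks. */
          dma_set_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, &attrs);

          cpu_addr = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL, &attrs);
          /* ... use the buffer ... */
          dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs);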
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8505/1: dma-mapping: Optimize allocation · 33298ef6
      Committed by Doug Anderson
      __iommu_alloc_buffer() is expected to be called to allocate pretty
      sizeable buffers.  Upon simple tests of video I saw it trying to
      allocate 4,194,304 bytes.  The function tries to allocate large chunks
      in order to optimize IOMMU TLB usage.
      
      The current function is very, very slow.
      
      One problem is the way it keeps trying and trying to allocate big
      chunks.  Imagine a very fragmented memory that has 4M free but no
      contiguous pages at all.  Further imagine allocating 4M (1024 pages).
      We'll do the following memory allocations:
      - For page 1:
        - Try to allocate order 10 (no retry)
        - Try to allocate order 9 (no retry)
        - ...
        - Try to allocate order 0 (with retry, but not needed)
      - For page 2:
        - Try to allocate order 9 (no retry)
        - Try to allocate order 8 (no retry)
        - ...
        - Try to allocate order 0 (with retry, but not needed)
      - ...
      - ...
      
      Total number of alloc() calls for this case is:
        sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
        => 9228
      
      The above is obviously the worst case, but given how slow alloc can be we
      really want to try to avoid even somewhat bad cases.  I timed the old
      code with a device under memory pressure and it wasn't hard to see it
      take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
      was done on kernel 3.14, so possibly mainline would behave
      differently).
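
      For reference, that worst-case call count can be reproduced with a
      small stand-alone program (pure arithmetic, not kernel code):

          #include <stdio.h>

          int main(void)
          {
                  /* With i pages still to allocate, the old code tries orders
                   * floor(log2(i)) down to 0, i.e. floor(log2(i)) + 1 calls. */
                  unsigned long calls = 0;

                  for (int i = 1; i <= 1024; i++) {
                          int order = 0;

                          while ((1 << (order + 1)) <= i)
                                  order++;        /* order = floor(log2(i)) */
                          calls += order + 1;
                  }
                  printf("%lu\n", calls);         /* prints 9228 */
                  return 0;
          }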
      
      A second problem is that allocating big chunks under memory pressure
      is just not a great idea unless we really need them.  We can make do
      pretty well with smaller chunks, so it's
      probably wise to leave bigger chunks for other users once memory
      pressure is on.
      
      Let's adjust the allocation like this:
      
      1. If a big chunk fails, stop trying so hard and bump down to lower
         order allocations.
      2. Don't try useless orders.  The whole point of big chunks is to
         optimize the TLB and it can really only make use of 2M, 1M, 64K and
         4K sizes.
      
      We'll still tend to eat up a bunch of big chunks, but that might be the
      right answer for some users.  A future patch could possibly add a new
      DMA_ATTR that would let the caller decide that TLB optimization isn't
      important and that we should use smaller chunks.  Presumably this would
      be a sane strategy for some callers.
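
      The resulting strategy looks roughly like the sketch below (illustrative
      only; the real __iommu_alloc_buffer() differs in bookkeeping, and 'gfp'
      and 'count' are assumed to be set up by the caller):

          /* Only orders that help the IOMMU TLB with 4K pages:
           * 2M, 1M, 64K, 4K. */
          static const unsigned int orders[] = { 9, 8, 4, 0 };

          while (count) {
                  struct page *page = NULL;
                  unsigned int i, order;

                  for (i = 0; i < ARRAY_SIZE(orders); i++) {
                          order = orders[i];
                          if ((1U << order) > count)
                                  continue;       /* larger than what's left */
                          /* Big orders: one cheap attempt, no retries; on
                           * failure fall through to the next lower order. */
                          page = alloc_pages(order ? gfp | __GFP_NORETRY : gfp,
                                             order);
                          if (page)
                                  break;          /* got a chunk at this order */
                  }
                  if (!page)
                          return NULL;            /* even order 0 failed */

                  /* ...record the pages... */
                  count -= 1U << order;
          }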
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Tomasz Figa <tfiga@chromium.org>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  4. 08 Feb 2016 (2 commits)
    • ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA · 25362dc4
      Committed by Kees Cook
      The use of CONFIG_DEBUG_RODATA is generally seen as an essential part of
      kernel self-protection:
      http://www.openwall.com/lists/kernel-hardening/2015/11/30/13
      Additionally, its name has grown to mean things beyond just rodata. To
      get ARM closer to this, we ought to rearrange the names of the configs
      that control how the kernel protects its memory. What was called
      CONFIG_ARM_KERNMEM_PERMS is really doing the work that other architectures
      call CONFIG_DEBUG_RODATA.
      
      This redefines CONFIG_DEBUG_RODATA to actually do the bulk of the
      ROing (and NXing). In the place of the old CONFIG_DEBUG_RODATA, use
      CONFIG_DEBUG_ALIGN_RODATA, since that's what the option does: adds
      section alignment for making rodata explicitly NX, as arm does not split
      the page tables like arm64 does without _ALIGN_RODATA.
      
      Also adds human readable names to the sections so I could more easily
      debug my typos, and makes CONFIG_DEBUG_RODATA default "y" for CPU_V7.
      
      Results in /sys/kernel/debug/kernel_page_tables for each config state:
      
       # CONFIG_DEBUG_RODATA is not set
       # CONFIG_DEBUG_ALIGN_RODATA is not set
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80900000           9M     RW x  SHD
      0x80900000-0xa0000000         503M     RW NX SHD
      
       CONFIG_DEBUG_RODATA=y
       CONFIG_DEBUG_ALIGN_RODATA=y
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80100000           1M     RW NX SHD
      0x80100000-0x80700000           6M     ro x  SHD
      0x80700000-0x80a00000           3M     ro NX SHD
      0x80a00000-0xa0000000         502M     RW NX SHD
      
       CONFIG_DEBUG_RODATA=y
       # CONFIG_DEBUG_ALIGN_RODATA is not set
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80100000           1M     RW NX SHD
      0x80100000-0x80a00000           9M     ro x  SHD
      0x80a00000-0xa0000000         502M     RW NX SHD
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: make virt_to_idmap() return unsigned long · 28410293
      Committed by Russell King
      Make virt_to_idmap() return an unsigned long rather than phys_addr_t.
      
      Returning phys_addr_t here makes no sense, because the definition of
      virt_to_idmap() is that it shall return a physical address which maps
      identically with the virtual address.  Since virtual addresses are
      limited to 32-bit, identity mapped physical addresses are as well.
      
      Almost all users already had an implicit narrowing cast to unsigned long,
      so let's make this official and part of this interface.
      Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  5. 23 Jan 2016 (1 commit)
  6. 16 Jan 2016 (2 commits)
  7. 15 Jan 2016 (1 commit)
    • arm: mm: support ARCH_MMAP_RND_BITS · e0c25d95
      Committed by Daniel Cashman
      arm: arch_mmap_rnd() uses a hard-coded value of 8 to generate the random
      offset for the mmap base address.  This value represents a compromise
      between increased ASLR effectiveness and avoiding address-space
      fragmentation.  Replace it with a Kconfig option, which is sensibly
      bounded, so that platform developers may choose where to place this
      compromise.  Keep 8 as the minimum acceptable value.
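
      As a rough illustration of the trade-off being tuned (stand-alone
      arithmetic, not the kernel implementation; 4K pages assumed):

          #include <stdio.h>

          int main(void)
          {
                  const unsigned int page_shift = 12;     /* 4K pages */
                  const unsigned int bits[] = { 8, 12, 16 };

                  /* N bits of randomness shifted by PAGE_SHIFT give the span
                   * over which the mmap base can move. */
                  for (unsigned int i = 0; i < 3; i++) {
                          unsigned long long span =
                                  (1ULL << bits[i]) << page_shift;
                          printf("%2u bits -> base varies over %llu MB\n",
                                 bits[i], span >> 20);
                  }
                  return 0;
          }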
      
      [arnd@arndb.de: ARM: avoid ARCH_MMAP_RND_BITS for NOMMU]
      Signed-off-by: Daniel Cashman <dcashman@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 04 Jan 2016 (1 commit)
  9. 22 Dec 2015 (1 commit)
    • ARM: 8482/1: l2x0: make it possible to disable outer sync from DT · 36f46d6d
      Committed by Linus Walleij
      According to commit 2503a5ec
      "ARM: 6201/1: RealView: Do not use outer_sync() on ARM11MPCore
      boards with L220" Some PB11MPCore RealView core tiles have broken
      outer_sync.
      
      We got rid of the custom barriers from the machine by disabling
      outer sync, but that was just for the boardfile case. We have
      to be able to do the same in the device tree case.
      
      Since __l2c_init() clones and copies the L2C vtable, we pass an
      argument to this function to optionally disable the outer sync
      operation before initializing the cache.
      
      After this we can set up the cache correctly on the RealView
      PB11MPCore. This was tested on a PB11MPCore known to have the
      issue. Before this change, spurious crashes would occur if we tried to
      set up the cache properly; after it, the board boots rock solid.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: devicetree@vger.kernel.org
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  10. 18 Dec 2015 (1 commit)
  11. 17 Dec 2015 (1 commit)
  12. 16 Dec 2015 (1 commit)
    • Revert "scatterlist: use sg_phys()" · 3e6110fd
      Committed by Dan Williams
      commit db0fa0cb "scatterlist: use sg_phys()" did replacements of
      the form:
      
          phys_addr_t phys = page_to_phys(sg_page(s));
          phys_addr_t phys = sg_phys(s) & PAGE_MASK;
      
      However, this breaks platforms where sizeof(phys_addr_t) >
      sizeof(unsigned long).  Revert for 4.3 and 4.4 to make room for a
      combined helper in 4.5.
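
      The truncation can be demonstrated with a stand-alone model of the types
      involved (illustration only; PAGE_MASK here stands in for the 32-bit
      unsigned long definition used on 32-bit ARM with LPAE):

          #include <stdint.h>
          #include <stdio.h>

          typedef uint64_t phys_addr_t;            /* 64-bit physical addresses */
          #define PAGE_MASK ((uint32_t)~4095u)     /* 32-bit unsigned long mask */

          int main(void)
          {
                  phys_addr_t phys = 0x100002345ULL;   /* lives above 4GB */

                  /* The 32-bit mask zero-extends to 0x00000000fffff000 and
                   * wipes the upper half of the 64-bit address. */
                  printf("%#llx\n",
                         (unsigned long long)(phys & PAGE_MASK)); /* 0x2000 */
                  return 0;
          }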
      
      Cc: <stable@vger.kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: db0fa0cb ("scatterlist: use sg_phys()")
      Suggested-by: Joerg Roedel <joro@8bytes.org>
      Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  13. 15 Dec 2015 (2 commits)
    • ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted · fa0708b3
      Committed by Anson Huang
      In the cpu_v7_do_suspend routine, r11 is used but is NOT
      saved/restored. Different compilers may use the ARM general-purpose
      registers differently, so this can cause issues when calling
      cpu_v7_do_suspend.
      
      We hit a kernel fault when using GCC 4.8.3: r11 contains a
      valid value before calling into cpu_v7_do_suspend, but when we return
      from this routine, r11 is corrupted, leading to a kernel fault.
      Saving/restoring the registers that get clobbered is a must in
      assembly code.
      Signed-off-by: Anson Huang <Anson.Huang@freescale.com>
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Cc: <stable@vger.kernel.org> # v3.3+
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: no longer force unbuffered DMA for realview · 38541bf4
      Committed by Arnd Bergmann
      Commit 42c4dafe ("ARM: 6202/1: Do not ARM_DMA_MEM_BUFFERABLE
      on RealView boards with L210/L220") changed the generic setting for
      ARM_DMA_MEM_BUFFERABLE to be disabled on any Realview kernel that includes
      support for any of the ARM11 variations. Doing this was required to
      allow doing DMA without a lockup in the l2x0 cache controller on the
      Realview platform.
      
      Unfortunately, in a kernel that also contains support for any ARMv7
      based machine, the same change makes it impossible to do DMA on ARMv7,
      which gets in the way of enabling multiplatform support on Realview.
      
      As confirmed by Catalin Marinas and Linus Walleij, the current
      code for Realview that we have in the kernel does not actually
      perform any DMA, and this is unlikely to change in the future.
      Therefore we can revert 42c4dafe without introducing regressions,
      but we must never start using DMA on this platform in the future.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
  14. 14 Dec 2015 (6 commits)
  15. 05 Dec 2015 (1 commit)
  16. 03 Dec 2015 (2 commits)
    • ARM: 8462/1: cache-uniphier: use common API to find the next level cache · 89e69fbf
      Committed by Masahiro Yamada
      The function uniphier_cache_get_next_level_node() does the same thing
      as of_find_next_cache_node().  Drop the former and stick to the common
      API.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8465/1: mm: keep reserved ASIDs in sync with mm after multiple rollovers · 40ee068e
      Committed by Will Deacon
      Under some unusual context-switching patterns, it is possible to end up
      with multiple threads from the same mm running concurrently with
      different ASIDs:
      
      1. CPU x schedules task t with mm p containing ASID a and generation g
         This task doesn't block and the CPU doesn't context switch.
         So:
           * per_cpu(active_asid, x) = {g,a}
           * p->context.id = {g,a}
      
      2. Some other CPU generates an ASID rollover. The global generation is
         now (g + 1). CPU x is still running t, with no context switch and
         so per_cpu(reserved_asid, x) = {g,a}
      
      3. CPU y schedules task t', which shares mm p with t. The generation
         mismatches, so we take the slowpath and hit the reserved ASID from
         CPU x. p is then updated so that p->context.id = {g + 1,a}
      
      4. CPU y schedules some other task u, which has an mm != p.
      
      5. Some other CPU generates *another* ASID rollover. The global
         generation is now (g + 2). CPU x is still running t, with no context
         switch and so per_cpu(reserved_asid, x) = {g,a}.
      
      6. CPU y once again schedules task t', but now *fails* to hit the
         reserved ASID from CPU x because of the generation mismatch. This
         results in a new ASID being allocated, despite the fact that t is
         still running on CPU x with the same mm.
      
      Consequently, TLBIs (e.g. as a result of CoW) will not be synchronised
      between the two threads.
      
      This patch fixes the problem by updating all of the matching reserved
      ASIDs when we hit on the slowpath (i.e. in step 3 above). This keeps
      the reserved ASIDs in-sync with the mm and avoids the problem.
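
      Conceptually, the slowpath change amounts to something like the sketch
      below (a simplified illustration, not the exact arch/arm/mm/context.c
      code; the names used are assumptions):

          static bool check_update_reserved_asid(u64 asid, u64 newasid)
          {
                  int cpu;
                  bool hit = false;

                  /* If this mm's ASID was reserved at rollover time, keep
                   * using it, and bump every stale per-cpu copy to the new
                   * generation so a later rollover (step 6 above) still
                   * recognises it. */
                  for_each_possible_cpu(cpu) {
                          if (per_cpu(reserved_asids, cpu) == asid) {
                                  hit = true;
                                  per_cpu(reserved_asids, cpu) = newasid;
                          }
                  }
                  return hit;
          }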
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Tony Thompson <anthony.thompson@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  17. 02 Dec 2015 (2 commits)
    • ARM: mohawk: allow building with MMU disabled · 0f67b876
      Committed by Arnd Bergmann
      It is in principle possible to build an MMP kernel for
      the mohawk CPU with the MMU code disabled, except for one
      simple build error:
      
      proc-mohawk.S:345: Error: invalid operands (*UND* and *UND* sections) for `|'
      proc-mohawk.S:345: Error: invalid operands (*ABS* and *UND* sections) for `|'
      proc-mohawk.S:345: Error: invalid operands (*UND* and *UND* sections) for `|'
      proc-mohawk.S:345: Error: invalid operands (*UND* and *UND* sections) for `|'
      proc-mohawk.S:345: Error: undefined symbol L_PTE_USER used as an immediate value
      
      This patch changes the proc-mohawk code to do the same as the
      other CPUs and not try to actually do anything for the
      cpu_mohawk_set_pte_ext function, which won't be used anyway.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • ARM: make xscale iwmmxt code multiplatform aware · d33c43ac
      Committed by Arnd Bergmann
      In a multiplatform configuration, we may end up building a kernel for
      both Marvell PJ1 and an ARMv4 CPU implementation. In that case, the
      xscale-cp0 code is built with gcc -march=armv4{,t}, which results in a
      build error from the coprocessor instructions.
      
      Since we know this code will only have to run on an actual xscale
      processor, we can simply build the entire file for ARMv5TE.
      
      Related to this, we need to handle the iWMMXT initialization sequence
      differently during boot, to ensure we don't try to touch xscale
      specific registers on other CPUs from the xscale_cp0_init initcall.
      cpu_is_xscale() used to be hardcoded to '1' in any configuration that
      enables any XScale-compatible core, but this breaks once we can have a
      combined kernel with MMP1 and something else.
      
      In this patch, I replace the existing cpu_is_xscale() macro with a new
      cpu_is_xscale_family() macro that evaluates true for xscale, xsc3 and
      mohawk, which makes the behavior more deterministic.
      
      The two existing users of cpu_is_xscale() are modified accordingly,
      but slightly change behavior for kernels that enable CPU_MOHAWK without
      also enabling CPU_XSCALE or CPU_XSC3. Previously, these would leave
      PMD_BIT4 in the page tables untouched; now they clear it, as we've always
      done for kernels that enable both MOHAWK and the support for the older
      CPU types.
      
      Since the previous behavior was inconsistent, I assume it was
      unintentional.
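
      The initcall guard described above presumably ends up along these lines
      (a sketch, not necessarily the exact xscale-cp0.c change):

          static int __init xscale_cp0_init(void)
          {
                  /* Only poke XScale CP0 on CPUs that actually have it. */
                  if (!cpu_is_xscale_family())
                          return 0;

                  /* ... original iWMMXt/DSP coprocessor setup ... */
                  return 0;
          }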
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  18. 17 Nov 2015 (2 commits)
  19. 10 Nov 2015 (1 commit)
  20. 07 Nov 2015 (1 commit)
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Committed by Mel Gorman
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min", which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
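
      For the "am I allowed to block?" checks mentioned above, the conversion
      is typically of this shape (a minimal before/after sketch):

          /* Before: testing __GFP_WAIT directly to detect atomic context */
          if (!(gfp_flags & __GFP_WAIT))
                  goto no_block;

          /* After: use the helper, which tests __GFP_DIRECT_RECLAIM */
          if (!gfpflags_allow_blocking(gfp_flags))
                  goto no_block;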
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 06 Nov 2015 (1 commit)
    • uaccess: reimplement probe_kernel_address() using probe_kernel_read() · 0ab32b6f
      Committed by Andrew Morton
      probe_kernel_address() is basically the same as the (later added)
      probe_kernel_read().
      
      The return value on EFAULT is a bit different: probe_kernel_address()
      returns number-of-bytes-not-copied whereas probe_kernel_read() returns
      -EFAULT.  All callers have been checked, none cared.
      
      probe_kernel_read() can be overridden by the architecture whereas
      probe_kernel_address() cannot.  parisc, blackfin and um do this, to insert
      additional checking.  Hence this patch possibly fixes obscure bugs,
      although there are only two probe_kernel_address() callsites outside
      arch/.
      
      My first attempt involved removing probe_kernel_address() entirely and
      converting all callsites to use probe_kernel_read() directly, but that got
      tiresome.
      
      This patch shrinks mm/slab_common.o by 218 bytes.  For a single
      probe_kernel_address() callsite.
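
      After this change the relationship between the two is roughly (a sketch;
      the real macro in <linux/uaccess.h> may differ in detail):

          /* probe_kernel_address(addr, retval) becomes a thin wrapper that
           * reads sizeof(retval) bytes and returns 0 or -EFAULT. */
          #define probe_kernel_address(addr, retval) \
                  probe_kernel_read(&(retval), (addr), sizeof(retval))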
      
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 27 Oct 2015 (1 commit)
  23. 20 Oct 2015 (1 commit)
  24. 03 Oct 2015 (2 commits)