1. 15 Apr, 2016 (1 commit)
  2. 05 Mar, 2016 (2 commits)
    • ARM: 8546/1: dma-mapping: refactor to fix coherent+cma+gfp=0 · b4268676
      Committed by Rabin Vincent
      Given a device which uses arm_coherent_dma_ops and on which
      dev_get_cma_area(dev) returns non-NULL, the following usage of the DMA
      API with gfp=0 results in memory corruption and a memory leak.
      
       p = dma_alloc_coherent(dev, sz, &dma, 0);
       if (p)
       	dma_free_coherent(dev, sz, p, dma);
      
      The memory leak is because the alloc allocates using
      __alloc_simple_buffer() but the free attempts
      dma_release_from_contiguous(), which does not free anything since the
      page is not in the CMA area.
      
      The memory corruption is because the free calls __dma_remap() on a page
      which is backed only by first level page tables.  The
      apply_to_page_range() + __dma_update_pte() loop ends up interpreting the
      section mapping as the address of a second level page table and writing
      the new PTE to memory which is not used by page tables.
      
      We don't have access to the GFP flags used for allocation in the free
      function.  Fix this by adding allocator backends and using this
      information in the free function so that we always use the correct
      release routine.
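
      A minimal sketch of the backend idea (the names here are illustrative,
      not necessarily the exact symbols introduced by the patch): the alloc
      path records which backend produced the buffer, so the free path can
      dispatch to the matching release routine instead of guessing from the
      unavailable GFP flags.

          struct arm_dma_backend {
                  struct page *(*alloc)(struct device *dev, size_t size,
                                        gfp_t gfp, void **cpu_addr);
                  void (*free)(struct device *dev, size_t size,
                               void *cpu_addr, struct page *page);
          };

          struct arm_dma_buf {
                  struct list_head list;                  /* see next commit */
                  void *virt;
                  struct page *page;
                  const struct arm_dma_backend *backend;  /* set at alloc time */
          };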
      
      Fixes: 21caf3a7 ("ARM: 8398/1: arm DMA: Fix allocation from CMA for coherent DMA")
      Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8547/1: dma-mapping: store buffer information · 19e6e5e5
      Committed by Rabin Vincent
      Keep a list of allocated DMA buffers so that we can store metadata in
      alloc() which we later need in free().
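
      A hedged sketch of such bookkeeping (a global list plus a spinlock,
      filled in alloc() and searched/removed in free(); the helper name and
      the arm_dma_buf structure reuse the illustrative names from the
      previous commit's sketch):

          static LIST_HEAD(arm_dma_bufs);
          static DEFINE_SPINLOCK(arm_dma_bufs_lock);

          static struct arm_dma_buf *arm_dma_buf_find(void *virt)
          {
                  struct arm_dma_buf *buf, *found = NULL;
                  unsigned long flags;

                  spin_lock_irqsave(&arm_dma_bufs_lock, flags);
                  list_for_each_entry(buf, &arm_dma_bufs, list) {
                          if (buf->virt == virt) {
                                  list_del(&buf->list);
                                  found = buf;
                                  break;
                          }
                  }
                  spin_unlock_irqrestore(&arm_dma_bufs_lock, flags);
                  return found;   /* free() then uses found->backend etc. */
          }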
      Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  3. 11 Feb, 2016 (2 commits)
    • ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc · 14d3ae2e
      Committed by Doug Anderson
      If we know that TLB efficiency will not be an issue when memory is
      accessed then it's not terribly important to allocate big chunks of
      memory.  The whole point of allocating the big chunks was that it would
      make TLB usage efficient.
      
      As Marek Szyprowski indicated:
          Please note that mapping memory with larger pages significantly
          improves performance, especially when IOMMU has a little TLB
          cache. This can be easily observed when multimedia devices do
          processing of RGB data with 90/270 degree rotation
      Image rotation is distinctly an operation that needs to bounce around
      through memory, so it makes sense that TLB efficiency is important
      there.
      
      Video decoding, on the other hand, is a fairly sequential operation.
      During video decoding it's not expected that we'll be jumping all over
      memory.  Decoding video is also pretty heavy and the TLB misses aren't a
      huge deal.  Presumably most HW video acceleration users of dma-mapping
      will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.
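
      For illustration, a driver whose device walks its buffer sequentially
      might opt in roughly like this (the modern bitmask-style attrs API is
      shown; the patch itself predates that interface change):

          static int example_video_alloc(struct device *dev)
          {
                  dma_addr_t dma;
                  void *buf;

                  /* Hint: sequential access, TLB efficiency not critical. */
                  buf = dma_alloc_attrs(dev, SZ_4M, &dma, GFP_KERNEL,
                                        DMA_ATTR_ALLOC_SINGLE_PAGES);
                  if (!buf)
                          return -ENOMEM;

                  /* ... hand buf/dma to the decoder ... */

                  dma_free_attrs(dev, SZ_4M, buf, dma,
                                 DMA_ATTR_ALLOC_SINGLE_PAGES);
                  return 0;
          }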
      
      Allocating big chunks of memory is quite expensive, especially if we're
      doing it repeatedly and memory is full.  In one (out of tree) usage model
      it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
      each one trying to allocate 4 MB of memory.  This is called whenever the
      system encounters a new video, which could easily happen while the
      memory system is stressed out.  In fact, on certain social media
      websites that auto-play video and have infinite scrolling, it's quite
      common to see not just one of these 16x4MB allocations but 2 or 3 right
      after another.  Asking the system even to do a small amount of extra
      work to give us big chunks in this case is just not a good use of time.
      
      Allocating big chunks of memory is also expensive indirectly.  Even if
      we ask the system not to do ANY extra work to allocate _our_ memory,
      we're still potentially eating up all big chunks in the system.
      Presumably there are other users in the system that aren't quite as
      flexible and that actually need these big chunks.  By eating all the big
      chunks we're causing extra work for the rest of the system.  We also may
      start making other memory allocations fail.  While the system may be
      robust to such failures (as is the case with dwc2 USB trying to allocate
      buffers for Ethernet data and with WiFi trying to allocate buffers for
      WiFi data), it is yet another big performance hit.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8505/1: dma-mapping: Optimize allocation · 33298ef6
      Committed by Doug Anderson
      The __iommu_alloc_buffer() is expected to be called to allocate pretty
      sizeable buffers.  Upon simple tests of video I saw it trying to
      allocate 4,194,304 bytes.  The function tries to allocate large chunks
      in order to optimize IOMMU TLB usage.
      
      The current function is very, very slow.
      
      One problem is the way it keeps trying and trying to allocate big
      chunks.  Imagine a very fragmented memory that has 4M free but no
      contiguous pages at all.  Further imagine allocating 4M (1024 pages).
      We'll do the following memory allocations:
      - For page 1:
        - Try to allocate order 10 (no retry)
        - Try to allocate order 9 (no retry)
        - ...
        - Try to allocate order 0 (with retry, but not needed)
      - For page 2:
        - Try to allocate order 9 (no retry)
        - Try to allocate order 8 (no retry)
        - ...
        - Try to allocate order 0 (with retry, but not needed)
      - ...
      - ...
      
      The total number of alloc() calls for this case is:
        sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
        => 9228
      
      The above is obviously the worst case, but given how slow alloc can be we
      really want to avoid even somewhat bad cases.  I timed the old
      code with a device under memory pressure and it wasn't hard to see it
      take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
      was done on kernel 3.14, so possibly mainline would behave
      differently).
      
      A second problem is that allocating big chunks under memory pressure is
      just not a great idea unless we really need them.  We can make do pretty
      well with smaller chunks, so it's probably wise to leave the bigger
      chunks for other users once memory pressure is on.
      
      Let's adjust the allocation like this:
      
      1. If a big chunk fails, stop trying so hard and bump down to lower
         order allocations.
      2. Don't try useless orders.  The whole point of big chunks is to
         optimize the TLB and it can really only make use of 2M, 1M, 64K and
         4K sizes.
      
      We'll still tend to eat up a bunch of big chunks, but that might be the
      right answer for some users.  A future patch could possibly add a new
      DMA_ATTR that would let the caller decide that TLB optimization isn't
      important and that we should use smaller chunks.  Presumably this would
      be a sane strategy for some callers.
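
      A rough sketch of the adjusted strategy (illustrative, not the exact
      mainline loop): only try orders that actually help the IOMMU TLB, never
      retry a failed high-order attempt, and simply fall down to the next
      useful order instead.

          static const unsigned int useful_orders[] = { 9, 8, 4, 0 }; /* 2M/1M/64K/4K */

          static struct page *alloc_one_chunk(gfp_t gfp, unsigned int start_idx)
          {
                  unsigned int i;

                  for (i = start_idx; i < ARRAY_SIZE(useful_orders); i++) {
                          unsigned int order = useful_orders[i];
                          gfp_t mask = gfp | __GFP_NOWARN;
                          struct page *page;

                          if (order)
                                  mask |= __GFP_NORETRY;  /* don't try too hard */

                          page = alloc_pages(mask, order);
                          if (page)
                                  return page;
                  }
                  return NULL;
          }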
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Tomasz Figa <tfiga@chromium.org>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  4. 23 Jan, 2016 (1 commit)
  5. 16 Dec, 2015 (1 commit)
    • Revert "scatterlist: use sg_phys()" · 3e6110fd
      Committed by Dan Williams
      commit db0fa0cb "scatterlist: use sg_phys()" did replacements of
      the form:
      
          phys_addr_t phys = page_to_phys(sg_page(s));
          phys_addr_t phys = sg_phys(s) & PAGE_MASK;
      
      However, this breaks platforms where sizeof(phys_addr_t) >
      sizeof(unsigned long).  Revert for 4.3 and 4.4 to make room for a
      combined helper in 4.5.
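
      The failure mode is easiest to see on 32-bit ARM with LPAE, where
      phys_addr_t is 64-bit but PAGE_MASK is built from a 32-bit unsigned
      long (a hedged illustration, not text from the original commit):

          phys_addr_t phys = 0x100003abcULL;   /* an address just above 4 GiB */

          /* old form (page_to_phys(sg_page(s))): page-aligned, upper bits intact */
          phys_addr_t old = phys & ~(phys_addr_t)(PAGE_SIZE - 1);  /* 0x100003000 */

          /* new form (sg_phys(s) & PAGE_MASK): PAGE_MASK is 0xfffff000UL here,
           * zero-extended for the AND, so bits above bit 31 are silently lost */
          phys_addr_t bad = phys & PAGE_MASK;                      /* 0x000003000 */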
      
      Cc: <stable@vger.kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: db0fa0cb ("scatterlist: use sg_phys()")
      Suggested-by: Joerg Roedel <joro@8bytes.org>
      Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  6. 07 Nov, 2015 (1 commit)
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd · d0164adc
      Committed by Mel Gorman
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites:
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible (see the sketch after
        this list).  This is because checking for __GFP_WAIT, as was done
        historically, can now trigger false positives.  Some exceptions like
        dm-crypt.c exist where the code intent is clearer if
        __GFP_DIRECT_RECLAIM is used instead of the helper due to flag
        manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
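
      An illustrative before/after for the third point above (try_atomic_pool()
      is a hypothetical fallback, used only to show the shape of the check):

          static void *example_alloc(gfp_t gfp_mask, size_t size)
          {
                  /* old, now misleading: if (!(gfp_mask & __GFP_WAIT)) ... */

                  /* new: ask whether this context may block at all */
                  if (!gfpflags_allow_blocking(gfp_mask))
                          return try_atomic_pool(size);   /* hypothetical */

                  return kmalloc(size, gfp_mask);
          }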
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 Oct, 2015 (2 commits)
  8. 17 Sep, 2015 (1 commit)
    • ARM: 8437/1: dma-mapping: fix build warning with new DMA_ERROR_CODE definition · 90cde558
      Committed by Andre Przywara
      Commit 96231b26: ("ARM: 8419/1: dma-mapping: harmonize definition
      of DMA_ERROR_CODE") changed the definition of DMA_ERROR_CODE to use
      dma_addr_t, which makes the compiler barf on assigning this to an
      "int" variable on ARM with LPAE enabled:
      *************
      In file included from /src/linux/include/linux/dma-mapping.h:86:0,
                       from /src/linux/arch/arm/mm/dma-mapping.c:21:
      /src/linux/arch/arm/mm/dma-mapping.c: In function '__iommu_create_mapping':
      /src/linux/arch/arm/include/asm/dma-mapping.h:16:24: warning:
      overflow in implicit constant conversion [-Woverflow]
       #define DMA_ERROR_CODE (~(dma_addr_t)0x0)
                              ^
      /src/linux/arch/arm/mm/dma-mapping.c:1252:15: note: in expansion of
      macro 'DMA_ERROR_CODE'
        int i, ret = DMA_ERROR_CODE;
                     ^
      *************
      
      Remove the actually unneeded initialization of "ret" in
      __iommu_create_mapping() and move the variable declaration inside the
      for-loop to make the scope of this variable more clear.
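
      For illustration (not the verbatim diff), the warning and the shape of
      the fix:

          /* With LPAE, dma_addr_t is 64 bits wide, so the all-ones error
           * cookie cannot fit in an int:                                  */
          int ret = DMA_ERROR_CODE;    /* ~(dma_addr_t)0 -> -Woverflow     */

          /* 'ret' never needed that value in the first place, so the fix
           * drops the initializer and declares the loop-scoped variable
           * where it is actually used.                                    */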
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  9. 11 Sep, 2015 (1 commit)
    • dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} · 6894258e
      Committed by Christoph Hellwig
      Since 2009 we have a nice asm-generic header implementing lots of DMA API
      functions for architectures using struct dma_map_ops, but unfortunately
      it's still missing a lot of APIs that all architectures still have to
      duplicate.
      
      This series consolidates the remaining functions, although we still need
      arch opt outs for two of them as a few architectures have very
      non-standard implementations.
      
      This patch (of 5):
      
      The coherent DMA allocator works the same over all architectures supporting
      dma_map operations.
      
      This patch consolidates them and converges the minor differences:
      
       - the debug_dma helpers are now called from all architectures, including
         those that were previously missing them
       - dma_alloc_from_coherent and dma_release_from_coherent are now always
         called from the generic alloc/free routines instead of the ops;
         dma-mapping-common.h always includes dma-coherent.h to get the
         definitions for them, or the stubs if the architecture doesn't
         support this feature
       - checks for ->alloc / ->free presence are removed.  There is only one
         instance of dma_map_ops without them (mic_dma_ops) and that one
         is x86 only anyway.
      
      Besides that, only x86 needs special treatment to replace a default device
      if none is passed and to tweak the gfp_flags.  An optional arch hook is
      provided for that.
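
      The consolidated allocator ends up with roughly this shape (a sketch of
      the idea, not the verbatim header; dma_attrs was still a struct at the
      time):

          static inline void *dma_alloc_attrs(struct device *dev, size_t size,
                                              dma_addr_t *dma_handle, gfp_t gfp,
                                              struct dma_attrs *attrs)
          {
                  struct dma_map_ops *ops = get_dma_ops(dev);
                  void *cpu_addr;

                  BUG_ON(!ops);

                  /* per-device coherent pools are handled generically */
                  if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr))
                          return cpu_addr;

                  /* optional arch hook, e.g. x86 default-device/gfp tweaks */
                  if (!arch_dma_alloc_attrs(&dev, &gfp))
                          return NULL;

                  cpu_addr = ops->alloc(dev, size, dma_handle, gfp, attrs);
                  debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
                  return cpu_addr;
          }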
      
      [linux@roeck-us.net: fix build]
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 17 Aug, 2015 (1 commit)
    • scatterlist: use sg_phys() · db0fa0cb
      Committed by Dan Williams
      Coccinelle cleanup to replace open coded sg to physical address
      translations.  This is in preparation for introducing scatterlists that
      reference __pfn_t.
      
      // sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
      // usage: make coccicheck COCCI=sg_phys.cocci MODE=patch
      
      virtual patch
      
      @@
      struct scatterlist *sg;
      @@
      
      - page_to_phys(sg_page(sg)) + sg->offset
      + sg_phys(sg)
      
      @@
      struct scatterlist *sg;
      @@
      
      - page_to_phys(sg_page(sg))
      + sg_phys(sg) & PAGE_MASK
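
      For reference, sg_phys() boils down to roughly the following (its exact
      return type has varied over time), which is what makes the first
      replacement a direct equivalence:

          static inline dma_addr_t sg_phys(struct scatterlist *sg)
          {
                  return page_to_phys(sg_page(sg)) + sg->offset;
          }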
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  11. 04 Aug, 2015 (1 commit)
  12. 02 Aug, 2015 (1 commit)
    • ARM: reduce visibility of dmac_* functions · 1234e3fd
      Committed by Russell King
      The dmac_* functions are private to the ARM DMA API implementation, and
      should not be used by drivers.  In order to discourage their use, remove
      their prototypes and macros from asm/*.h.
      
      We have to leave dmac_flush_range() behind as Exynos and MSM IOMMU code
      use these; once these sites are fixed, this can be moved also.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  13. 17 Jul, 2015 (1 commit)
  14. 06 Jun, 2015 (1 commit)
    • ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap · 55af8a91
      Committed by Mike Looijmans
      When dma-coherent transfers are enabled, the mmap call must
      not change the pg_prot flags in the vma struct.
      
      Split arm_dma_mmap into common and specific parts, and add an
      "arm_coherent_dma_mmap" implementation that does not alter the page
      protection flags.
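
      A sketch of the resulting split (illustrative; attrs arguments and
      error handling omitted):

          static int __arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
                                    void *cpu_addr, dma_addr_t dma_addr, size_t size)
          {
                  /* the common remap_pfn_range() work lives here */
                  return remap_pfn_range(vma, vma->vm_start,
                                         dma_to_pfn(dev, dma_addr) + vma->vm_pgoff,
                                         size, vma->vm_page_prot);
          }

          int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
                           void *cpu_addr, dma_addr_t dma_addr, size_t size)
          {
                  /* non-coherent: downgrade the mapping to uncached/writecombine */
                  vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
                  return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size);
          }

          static int arm_coherent_dma_mmap(struct device *dev,
                                           struct vm_area_struct *vma, void *cpu_addr,
                                           dma_addr_t dma_addr, size_t size)
          {
                  /* coherent: leave vm_page_prot alone so the map stays cacheable */
                  return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size);
          }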
      
      Tested on a topic-miami board (Zynq) using the ACP port
      to transfer data between FPGA and CPU using the Dyplo
      framework. Without this patch, byte-wise access to mmapped
      coherent DMA memory was about 20x slower because of the
      memory being marked as non-cacheable, and transfer speeds
      would not exceed 240MB/s.
      
      After this patch, the mapped memory is cacheable and the
      transfer speed is again 600MB/s (limited by the FPGA) when
      the data is in the L2 cache, while data integrity is being
      maintained.
      
      The patch has no effect on non-coherent DMA.
      Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  15. 04 May, 2015 (1 commit)
  16. 02 Apr, 2015 (1 commit)
  17. 18 Mar, 2015 (1 commit)
  18. 13 Mar, 2015 (1 commit)
  19. 11 Mar, 2015 (1 commit)
    • ARM: dma-api: fix off-by-one error in __dma_supported() · 8bf1268f
      Committed by Russell King
      When validating the mask against the amount of memory we have available
      (so that we can trap 32-bit DMA addresses on systems with more than 32
      bits of memory), we had not taken into account that max_pfn is one more
      than the highest PFN present in the system.
      
      There are several references in the code which bear this out:
      
      mm/page_owner.c:
      	for (; pfn < max_pfn; pfn++) {
      	}
      
      arch/x86/kernel/setup.c:
      	high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1)
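
      Illustratively, since max_pfn is "highest PFN + 1", the last page of
      RAM is PFN max_pfn - 1, so the mask check needs a comparison along
      these lines (a sketch, not the exact kernel code):

          /* does the device's mask reach the last page of RAM? */
          if (dma_to_pfn(dev, mask) < max_pfn - 1)
                  return 0;      /* no: mask cannot cover all of memory */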
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  20. 23 Feb, 2015 (1 commit)
  21. 20 Feb, 2015 (1 commit)
  22. 30 Jan, 2015 (1 commit)
  23. 29 Jan, 2015 (1 commit)
  24. 02 Dec, 2014 (1 commit)
  25. 30 Oct, 2014 (1 commit)
  26. 10 Oct, 2014 (2 commits)
  27. 07 Aug, 2014 (1 commit)
  28. 18 Jul, 2014 (1 commit)
  29. 22 May, 2014 (2 commits)
  30. 20 May, 2014 (1 commit)
  31. 07 May, 2014 (1 commit)
  32. 23 Apr, 2014 (1 commit)
  33. 28 Feb, 2014 (2 commits)
    • arm: dma-mapping: remove order parameter from arm_iommu_create_mapping() · 68efd7d2
      Committed by Marek Szyprowski
      The 'order' parameter for the IOMMU-aware dma-mapping implementation was
      introduced mainly as a hack to reduce the size of the bitmap used for
      tracking IO virtual address space.  Since it is now possible to dynamically
      resize the bitmap, this hack is not needed and can be removed without any
      impact on the client devices.  This way the parameters for
      arm_iommu_create_mapping() become much easier to understand.  The 'size'
      parameter now means the maximum supported IO address space size.
      
      The code will allocate (resize) the bitmap in chunks, ensuring that a single
      chunk is not larger than a single memory page, to avoid unreliable
      allocations larger than PAGE_SIZE in atomic context.
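
      As a worked example of the chunk sizing (illustrative numbers, 4 KiB
      pages assumed):

          /* one bitmap chunk is at most one page: 4096 * 8 = 32768 bits;
           * with one bit per 4 KiB IOVA page, each chunk tracks 128 MiB of
           * IO virtual address space, and larger mappings use more chunks */
          size_t bits_per_chunk = PAGE_SIZE * BITS_PER_BYTE;   /* 32768    */
          size_t iova_per_chunk = bits_per_chunk * PAGE_SIZE;  /* 128 MiB  */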
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
    • arm: dma-mapping: Add support to extend DMA IOMMU mappings · 4d852ef8
      Committed by Andreas Herrmann
      Instead of using just one bitmap to keep track of IO virtual addresses
      (handed out for IOMMU use), introduce an array of bitmaps.  This allows
      us to extend existing mappings when running out of iova space in the
      initial mapping etc.
      
      If there is not enough space in the mapping to service an IO virtual
      address allocation request, __alloc_iova() tries to extend the mapping
      -- by allocating another bitmap -- and makes another allocation
      attempt using the freshly allocated bitmap.
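
      A sketch of that extend-and-retry flow (the helper names here are
      hypothetical, not the exact functions from the patch):

          static dma_addr_t example_alloc_iova(struct dma_iommu_mapping *mapping,
                                               size_t size)
          {
                  dma_addr_t iova;

                  iova = alloc_from_existing_bitmaps(mapping, size);  /* hypothetical */
                  if (iova != DMA_ERROR_CODE)
                          return iova;

                  /* out of space: add one more bitmap ("extension")... */
                  if (extend_iommu_mapping(mapping))                  /* hypothetical */
                          return DMA_ERROR_CODE;

                  /* ...and retry in the freshly allocated bitmap */
                  return alloc_from_existing_bitmaps(mapping, size);
          }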
      
      This allows arm iommu drivers to start with a decent initial size when a
      dma_iommu_mapping is created and still avoid running out of IO
      virtual addresses for the mapping.
      Signed-off-by: Andreas Herrmann <andreas.herrmann@calxeda.com>
      [mszyprow: removed extensions parameter to arm_iommu_create_mapping()
       function, which will be modified in the next patch anyway, also some
       debug messages about extending bitmap]
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
  34. 19 Feb, 2014 (1 commit)
    • ARM: 7979/1: mm: Remove hugetlb warning from Coherent DMA allocator · 6ea41c80
      Committed by Steven Capper
      The Coherent DMA allocator allocates pages of high order then splits
      them up into smaller pages.
      
      This splitting logic would run into problems if the allocator were
      given compound pages.  Thus the Coherent DMA allocator was originally
      incompatible with compound pages and, by extension, huge pages.  A
      compile-time #error was put in place whenever huge pages were
      enabled.
      
      Compatibility with compound pages has since been introduced by the
      following commit (which merely excludes GFP_COMP pages from being
      requested by the coherent DMA allocator):
        ea2e7057 ARM: 7172/1: dma: Drop GFP_COMP for DMA memory allocations
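
      That work-around boils down to roughly one line in the allocation path
      (a sketch of the referenced commit, not a verbatim quote):

          gfp &= ~(__GFP_COMP);   /* split_page() cannot handle compound pages */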
      
      When huge page support was introduced to ARM, the compile #error in
      dma-mapping.c was replaced by a #warning when it should have been
      removed instead.
      
      This patch removes the compile #warning in dma-mapping.c when huge
      pages are enabled.
      Signed-off-by: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>