1. 13 January 2017, 1 commit
  2. 10 January 2017, 2 commits
    • dmaengine: omap-dma: Fix the port_window support · 527a2759
      Committed by Peter Ujfalusi
      We do not yet have users of port_window. The following errors were found
      when converting the tusb6010_omap.c musb driver:
      
      - The peripheral side must have SRC_/DST_PACKED disabled.
      - When configuring the burst for the peripheral side, the memory-side
        configuration was overwritten: d->csdp = ... must be d->csdp |= ...
        (see the sketch below).
      - The EI and FI were configured for the wrong sides of the transfers.
      
      With these changes and the converted tusb6010_omap.c I was able to verify
      that things work as expected.
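      
      A minimal standalone sketch of the csdp assignment bug noted above; the
      CSDP_* values mirror the driver's defines, but the program itself is
      purely illustrative, not the driver's actual code:
      
          #include <stdint.h>
          #include <stdio.h>
          
          /* Burst bits for the two sides of a transfer, as defined by the
           * omap-dma driver; reproduced here only for illustration. */
          #define CSDP_SRC_BURST_64   (0x03 << 7)
          #define CSDP_DST_BURST_64   (0x03 << 14)
          
          int main(void)
          {
                  uint32_t csdp = CSDP_SRC_BURST_64;  /* one side configured */
          
                  /* Bug: plain assignment would clobber the bits above:
                   *         csdp = CSDP_DST_BURST_64;
                   * Fix: OR the second side in, preserving earlier bits. */
                  csdp |= CSDP_DST_BURST_64;
          
                  printf("csdp = 0x%08x\n", csdp);    /* both fields set */
                  return 0;
          }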
      
      Fixes: 201ac486 ("dmaengine: omap-dma: Support for slave devices with data port window")
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
    • dmaengine: ioat: ioat_alloc_chan_resources should not perform sleeping allocations. · 21d25f6a
      Committed by Krister Johansen
      On a kernel with DEBUG_LOCKS, ioat_free_chan_resources triggers an
      in_interrupt() warning.  With PROVE_LOCKING, it reports detecting a
      SOFTIRQ-safe to SOFTIRQ-unsafe lock ordering in the same code path.
      
      This is because dma_generic_alloc_coherent() checks if the GFP flags
      permit blocking.  It allocates from different subsystems if blocking is
      permitted.  The free path knows how to return the memory to the correct
      allocator.  If GFP_KERNEL is specified then the alloc and free end up
      going through cma_alloc(), which uses mutexes.
      
      Given that ioat_free_chan_resources() can be called in interrupt
      context, ioat_alloc_chan_resources() must specify GFP_NOWAIT so that the
      allocations do not block and instead use an allocator that uses
      spinlocks.
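      
      A minimal sketch of the allocation constraint described above, assuming
      the standard dma_alloc_coherent() API; the helper name is hypothetical,
      not the driver's actual function:
      
          #include <linux/dma-mapping.h>
          
          /* Hypothetical helper showing the GFP choice, not ioat's code. */
          static void *ring_alloc(struct device *dev, size_t size,
                                  dma_addr_t *phys)
          {
                  /*
                   * GFP_KERNEL would permit blocking and can route the
                   * request through cma_alloc(), which takes a mutex.
                   * GFP_NOWAIT keeps the allocation on a spinlock-based
                   * path, so the matching dma_free_coherent() remains
                   * safe to call from interrupt context.
                   */
                  return dma_alloc_coherent(dev, size, phys, GFP_NOWAIT);
          }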
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
      Acked-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
  3. 03 January 2017, 3 commits
  4. 02 January 2017, 8 commits
  5. 31 December 2016, 2 commits
  6. 30 December 2016, 2 commits
    • mm/filemap: fix parameters to test_bit() · 98473f9f
      Committed by Olof Johansson
      mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
      mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
        return test_bit(PG_waiters);
               ^~~~~~~~
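      
      The fix supplies the missing address argument: test_bit() takes both a
      bit number and the word to test. A sketch of the corrected fallback,
      with names taken from the surrounding commits (the exact variable name
      in the tree may differ):
      
          static inline bool clear_bit_unlock_is_negative_byte(long nr,
                                          volatile unsigned long *mem)
          {
                  clear_bit_unlock(nr, mem);
                  /* was: test_bit(PG_waiters); -- address argument missing */
                  return test_bit(PG_waiters, mem);
          }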
      
      Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Committed by Linus Torvalds
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And on architectures
      with LL/SC (which are all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hiccup x86 has, for
      example, but some other architectures may not even care.
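      
      A hedged sketch of the x86 flavor described above: a "lock andb" clears
      the lock bit, and the resulting sign flag reports bit #7 (PG_waiters) of
      the same byte. The asm constraints follow the usual GCC flag-output
      pattern and may differ in detail from the tree:
      
          static __always_inline bool
          clear_bit_unlock_is_negative_byte(long nr,
                                            volatile unsigned long *addr)
          {
                  bool negative;
          
                  /* Atomically AND the low byte with ~(1 << nr); "=@ccs"
                   * captures the sign flag, i.e. bit #7 of the result. */
                  asm volatile("lock; andb %2,%1"
                               : "=@ccs" (negative),
                                 "+m" (*(volatile char *)addr)
                               : "ir" ((char) ~(1 << nr))
                               : "memory");
                  return negative;
          }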
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, the unlock_page() costs alone were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, roughly a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: Nick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 28 December 2016, 10 commits
  8. 27 December 2016, 12 commits