1. 03 1月, 2017 18 次提交
  2. 02 1月, 2017 2 次提交
    • L
      Linux 4.10-rc2 · 0c744ea4
      Linus Torvalds 提交于
      0c744ea4
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 4759d386
      Linus Torvalds 提交于
      Pull DAX updates from Dan Williams:
       "The completion of Jan's DAX work for 4.10.
      
        As I mentioned in the libnvdimm-for-4.10 pull request, these are some
        final fixes for the DAX dirty-cacheline-tracking invalidation work
        that was merged through the -mm, ext4, and xfs trees in -rc1. These
        patches were prepared prior to the merge window, but we waited for
        4.10-rc1 to have a stable merge base after all the prerequisites were
        merged.
      
        Quoting Jan on the overall changes in these patches:
      
           "So I'd like all these 6 patches to go for rc2. The first three
            patches fix invalidation of exceptional DAX entries (a bug which
            is there for a long time) - without these patches data loss can
            occur on power failure even though user called fsync(2). The other
            three patches change locking of DAX faults so that ->iomap_begin()
            is called in a more relaxed locking context and we are safe to
            start a transaction there for ext4"
      
        These have received a build success notification from the kbuild
        robot, and pass the latest libnvdimm unit tests. There have not been
        any -next releases since -rc1, so they have not appeared there"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        ext4: Simplify DAX fault path
        dax: Call ->iomap_begin without entry lock during dax fault
        dax: Finish fault completely when loading holes
        dax: Avoid page invalidation races and unnecessary radix tree traversals
        mm: Invalidate DAX radix tree entries only if appropriate
        ext2: Return BH_New buffers for zeroed blocks
      4759d386
  3. 31 12月, 2016 2 次提交
  4. 30 12月, 2016 2 次提交
    • O
      mm/filemap: fix parameters to test_bit() · 98473f9f
      Olof Johansson 提交于
       mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
        mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
          return test_bit(PG_waiters);
               ^~~~~~~~
      
      Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      Brown-paper-bag-by: NLinus Torvalds <dummy@duh.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98473f9f
    • L
      mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Linus Torvalds 提交于
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hickup x86 has, for
      example, but some other architectures may not even care.
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, just the unlock_page() costs were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so just a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: NNick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b91e1302
  5. 28 12月, 2016 10 次提交
  6. 27 12月, 2016 6 次提交
    • L
      crypto: testmgr - Use heap buffer for acomp test input · 02608e02
      Laura Abbott 提交于
      Christopher Covington reported a crash on aarch64 on recent Fedora
      kernels:
      
      kernel BUG at ./include/linux/scatterlist.h:140!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc8 #162
      Hardware name: linux,dummy-virt (DT)
      task: ffff80007c650080 task.stack: ffff800008910000
      PC is at sg_init_one+0xa0/0xb8
      LR is at sg_init_one+0x24/0xb8
      ...
      [<ffff000008398db8>] sg_init_one+0xa0/0xb8
      [<ffff000008350a44>] test_acomp+0x10c/0x438
      [<ffff000008350e20>] alg_test_comp+0xb0/0x118
      [<ffff00000834f28c>] alg_test+0x17c/0x2f0
      [<ffff00000834c6a4>] cryptomgr_test+0x44/0x50
      [<ffff0000080dac70>] kthread+0xf8/0x128
      [<ffff000008082ec0>] ret_from_fork+0x10/0x50
      
      The test vectors used for input are part of the kernel image. These
      inputs are passed as a buffer to sg_init_one which eventually blows up
      with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
      false for the kernel image since virt_to_page will not return the
      correct page. Fix this by copying the input vectors to heap buffer
      before setting up the scatterlist.
      Reported-by: NChristopher Covington <cov@codeaurora.org>
      Fixes: d7db7a88 ("crypto: acomp - update testmgr with support for acomp")
      Signed-off-by: NLaura Abbott <labbott@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      02608e02
    • J
      ext4: Simplify DAX fault path · 1db17542
      Jan Kara 提交于
      Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we
      can use transaction starting in ext4_iomap_begin() and thus simplify
      ext4_dax_fault(). It also provides us proper retries in case of ENOSPC.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1db17542
    • J
      dax: Call ->iomap_begin without entry lock during dax fault · 9f141d6e
      Jan Kara 提交于
      Currently ->iomap_begin() handler is called with entry lock held. If the
      filesystem held any locks between ->iomap_begin() and ->iomap_end()
      (such as ext4 which will want to hold transaction open), this would cause
      lock inversion with the iomap_apply() from standard IO path which first
      calls ->iomap_begin() and only then calls ->actor() callback which grabs
      entry locks for DAX (if it faults when copying from/to user provided
      buffers).
      
      Fix the problem by nesting grabbing of entry lock inside ->iomap_begin()
      - ->iomap_end() pair.
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9f141d6e
    • J
      dax: Finish fault completely when loading holes · f449b936
      Jan Kara 提交于
      The only case when we do not finish the page fault completely is when we
      are loading hole pages into a radix tree. Avoid this special case and
      finish the fault in that case as well inside the DAX fault handler. It
      will allow us for easier iomap handling.
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      f449b936
    • J
      dax: Avoid page invalidation races and unnecessary radix tree traversals · e3fce68c
      Jan Kara 提交于
      Currently dax_iomap_rw() takes care of invalidating page tables and
      evicting hole pages from the radix tree when write(2) to the file
      happens. This invalidation is only necessary when there is some block
      allocation resulting from write(2). Furthermore in current place the
      invalidation is racy wrt page fault instantiating a hole page just after
      we have invalidated it.
      
      So perform the page invalidation inside dax_iomap_actor() where we can
      do it only when really necessary and after blocks have been allocated so
      nobody will be instantiating new hole pages anymore.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e3fce68c
    • J
      mm: Invalidate DAX radix tree entries only if appropriate · c6dcf52c
      Jan Kara 提交于
      Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
      just delete all exceptional radix tree entries they find. For DAX this
      is not desirable as we track cache dirtiness in these entries and when
      they are evicted, we may not flush caches although it is necessary. This
      can for example manifest when we write to the same block both via mmap
      and via write(2) (to different offsets) and fsync(2) then does not
      properly flush CPU caches when modification via write(2) was the last
      one.
      
      Create appropriate DAX functions to handle invalidation of DAX entries
      for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
      wire them up into the corresponding mm functions.
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c6dcf52c