Commits · 0fe49f70a08d7d25acee3b066a88c654fea26121 · openeuler / Kernel

17 Jul, 2019 1 commit

dax: Fix missed wakeup with PMD faults · 23c84eb7

Committed by Matthew Wilcox (Oracle) on Jul 03, 2019

RocksDB can hang indefinitely when using a DAX file.  This is due to
a bug in the XArray conversion when handling a PMD fault and finding a
PTE entry.  We use the wrong index in the hash and end up waiting on
the wrong waitqueue.

There's actually no need to wait; if we find a PTE entry while looking
for a PMD entry, we can return immediately as we know we should fall
back to a PTE fault (which may not conflict with the lock held).

We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found.
This value can never be found in an XArray while holding its lock, so
it does not create an ambiguity.
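The effect of keying a wait queue on the wrong index can be illustrated with a small user-space sketch. It is not the kernel's code: `wait_key()` and `PMD_COLOUR` are hypothetical names, and the sketch assumes 4 KiB pages with 512 page indices per PMD. A sleeper that keys on a PMD-sized range and a waker that keys on a PTE-sized entry at the same index end up with different keys, so the wakeup never reaches the sleeper.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages and 2 MiB PMDs, so one PMD spans 512 indices. */
#define PMD_COLOUR 511UL

/*
 * Hypothetical helper: derive the index used to pick a wait queue.
 * PMD-sized entries are keyed on the first index of their range so
 * that every index inside the range maps to the same queue.
 */
static unsigned long wait_key(unsigned long index, bool pmd_sized)
{
	return pmd_sized ? (index & ~PMD_COLOUR) : index;
}
```

For index 1000, a PMD-sized key is 512 while a PTE-sized key is 1000; both sides must agree on the size of the entry for the keys to match.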

Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com
Fixes: b15cd800 ("dax: Convert page fault handlers to XArray")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Robert Barror <robert.barror@intel.com>
Reported-by: Seema Pandit <seema.pandit@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

17 Jun, 2019 1 commit

locking/lockdep: Rename lockdep_assert_held_exclusive() -> lockdep_assert_held_write() · 9ffbe8ac

Committed by Nikolay Borisov on May 31, 2019

All callers of lockdep_assert_held_exclusive() use it to verify the
correct locking state of either a semaphore (ldisc_sem in tty,
mmap_sem for perf events, i_rwsem of inode for dax) or rwlock by
apparmor. Thus it makes sense to rename _exclusive to _write since
that's the semantics callers care about. Additionally there is already
lockdep_assert_held_read(), which this new naming is more consistent with.

No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

07 Jun, 2019 1 commit

dax: Fix xarray entry association for mixed mappings · 1571c029

Committed by Jan Kara on Jun 06, 2019

When inserting an entry into the xarray, we store the mapping and index in
the corresponding struct pages for memory error handling. When one process
was mapping a file at PMD granularity while another process was mapping it
at PTE granularity, we could wrongly deassociate the PMD range and then
reassociate only the PTE range, leaving the rest of the struct pages in the
PMD range without mapping information, which could later cause missed
notifications about memory errors. Fix the problem by calling the
association / deassociation code if and only if we are really going to
update the xarray (deassociating and associating zero or empty entries is
just a no-op, so there's no reason to complicate the code by trying to
avoid the calls for these cases).

Cc: <stable@vger.kernel.org>
Fixes: d2c997c0 ("fs, dax: use page->mapping to warn if truncate...")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

05 Jun, 2019 1 commit

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 · 2025cf9e

Committed by Thomas Gleixner on May 29, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 263 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

15 May, 2019 2 commits

mm: page_mkclean vs MADV_DONTNEED race · 024eee0e

Committed by Aneesh Kumar K.V on May 13, 2019

MADV_DONTNEED is handled with mmap_sem taken in read mode.  We call
page_mkclean without holding mmap_sem.

MADV_DONTNEED implies that pages in the region are unmapped and subsequent
access to the pages in that range is handled as a new page fault.  This
implies that if we don't have parallel access to the region when
MADV_DONTNEED is run we expect that range to be unallocated.

With respect to page_mkclean(), we need to make sure that we don't break
the MADV_DONTNEED semantics.  MADV_DONTNEED checks for pmd_none without
holding pmd_lock.  This implies we skip the pmd if we temporarily mark the
pmd none.  Avoid doing that while marking the page clean.

Keep the sequence the same for dax too, even though we don't support
MADV_DONTNEED for dax mappings.

The bug was noticed by code review and I didn't observe any failures in
test runs.  This is similar to

commit 58ceeb6b
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Apr 13 14:56:26 2017 -0700

    thp: fix MADV_DONTNEED vs. MADV_FREE race

commit ced10803
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Apr 13 14:56:20 2017 -0700

    thp: fix MADV_DONTNEED vs. numa balancing race

Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses · fce86ff5

Committed by Dan Williams on May 13, 2019

Starting with c6f3c5ee ("mm/huge_memory.c: fix modifying of page
protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls
pmdp_set_access_flags().  That helper enforces a pmd aligned @address
argument via VM_BUG_ON() assertion.

Update the implementation to take a 'struct vm_fault' argument directly
and apply the address alignment fixup internally to fix crash signatures
like:

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G           OE     4.19.35 #1
    [..]
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
     vmf_insert_pfn_pmd+0x198/0x350
     dax_iomap_fault+0xe82/0x1190
     ext4_dax_huge_fault+0x103/0x1f0
     ? __switch_to_asm+0x40/0x70
     __handle_mm_fault+0x3f6/0x1370
     ? __switch_to_asm+0x34/0x70
     ? __switch_to_asm+0x40/0x70
     handle_mm_fault+0xda/0x200
     __do_page_fault+0x249/0x4f0
     do_page_fault+0x32/0x110
     ? page_fault+0x8/0x30
     page_fault+0x1e/0x30

Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: c6f3c5ee ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Piotr Balcer <piotr.balcer@intel.com>
Tested-by: Yan Ma <yan.ma@intel.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Mar, 2019 1 commit

fs/dax: Deposit pagetable even when installing zero page · 11cf9d86

Committed by Aneesh Kumar K.V on Mar 09, 2019

Architectures like ppc64 use the deposited page table to store hardware
page table slot information. Make sure we deposit a page table when
using zero page at the pmd level for hash.

Without this we hit

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000082a74
Oops: Kernel access of bad area, sig: 11 [#1]
....

NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
Call Trace:
 hash_page_mm+0x43c/0x740
 do_hash_page+0x2c/0x3c
 copy_from_iter_flushcache+0xa4/0x4a0
 pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
 dax_copy_from_iter+0x40/0x70
 dax_iomap_actor+0x134/0x360
 iomap_apply+0xfc/0x1b0
 dax_iomap_rw+0xac/0x130
 ext4_file_write_iter+0x254/0x460 [ext4]
 __vfs_write+0x120/0x1e0
 vfs_write+0xd8/0x220
 SyS_write+0x6c/0x110
 system_call+0x3c/0x130

Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

02 Mar, 2019 1 commit

dax: Flush partial PMDs correctly · e4b3448b

Committed by Matthew Wilcox on Mar 01, 2019

The radix tree would rewind the index in an iterator to the lowest index
of a multi-slot entry.  The XArray iterators instead leave the index
unchanged, but I overlooked that when converting DAX from the radix tree
to the XArray.  Adjust the index that we use for flushing to the start
of the PMD range.
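The index adjustment can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct flush_range` and `flush_range_for()` are hypothetical names, and an entry of a given order is assumed to cover `1 << order` naturally aligned page indices (order 9 for a PMD with 4 KiB pages).

```c
#include <assert.h>

/* Hypothetical description of the page range that has to be flushed. */
struct flush_range {
	unsigned long start;	/* first page index covered by the entry */
	unsigned long nr_pages;	/* number of pages covered by the entry */
};

/*
 * Given any index that falls inside an entry of the given order,
 * round the index down to the start of the entry's range.
 */
static struct flush_range flush_range_for(unsigned long index,
					   unsigned int order)
{
	struct flush_range r;

	r.nr_pages = 1UL << order;
	r.start = index & ~(r.nr_pages - 1);
	return r;
}
```

With an iterator that leaves the index at 1000 inside an order-9 entry, the flush has to start at index 512 and cover 512 pages; starting at 1000 would skip the first part of the range.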

Fixes: c1901cd3 ("page cache: Convert find_get_entries_tag to XArray")
Cc: <stable@vger.kernel.org>
Reported-by: Piotr Balcer <piotr.balcer@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

13 Feb, 2019 2 commits

fs/dax: NIT fix comment regarding start/end vs range · 0cefc36b

Committed by Ira Weiny on Jan 17, 2019

Fixes: ac46d4f3 ("mm/mmu_notifier: use structure for invalidate_range_start/end calls v2")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fs/dax: Convert to use vmf_error() · c9aed74e

Committed by Souptick Joarder on Jan 05, 2019

This code is converted to use vmf_error().

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

29 Dec, 2018 1 commit

mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 · ac46d4f3

Committed by Jérôme Glisse on Dec 28, 2018

To avoid having to change many call sites every time we want to add a
parameter, use a structure to group all parameters for the mmu_notifier
invalidate_range_start/end calls.  No functional changes with this patch.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
From: Jérôme Glisse <jglisse@redhat.com>
Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3

fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n

Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22 Dec, 2018 1 commit

dax: Use non-exclusive wait in wait_entry_unlocked() · d8a70641

Committed by Dan Williams on Dec 21, 2018

get_unlocked_entry() uses an exclusive wait because it is guaranteed to
eventually obtain the lock and follow on with an unlock+wakeup cycle.
The wait_entry_unlocked() path does not have the same guarantee. Rather
than open-code an extra wakeup, just switch to a non-exclusive wait.
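The difference between the two wait modes can be shown with a toy model of a wait queue. This is a user-space sketch with hypothetical names (`struct waiter`, `wake_up_sim()`), not kernel code; it assumes the usual wake-up rule that every non-exclusive waiter is woken and the walk stops once the requested number of exclusive waiters has been woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
	bool exclusive;	/* queued as an exclusive waiter */
	bool woken;	/* set by wake_up_sim() */
};

/*
 * Walk the queue in order and wake waiters. Non-exclusive waiters are
 * always woken; the walk stops after nr_exclusive exclusive waiters
 * have been woken. Returns the number of waiters woken.
 */
static size_t wake_up_sim(struct waiter *q, size_t n, int nr_exclusive)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		q[i].woken = true;
		woken++;
		if (q[i].exclusive && --nr_exclusive == 0)
			break;
	}
	return woken;
}
```

In this model, one wake-up of three exclusive waiters wakes only the first one. If that waiter does not go on to take the lock and issue its own unlock and wake-up, the other two stay asleep. Three non-exclusive waiters are all woken by the same call, which is why a path without that guarantee is safer with a non-exclusive wait.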

Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

05 Dec, 2018 1 commit

dax: Fix unlock mismatch with updated API · 27359fd6

Committed by Matthew Wilcox on Nov 30, 2018

Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the XArray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current XArray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry it is possible that the pfn
used to perform the lookup is misaligned with respect to the value
retrieved from the XArray.

Change the API to return the unlock cookie from dax_lock_page() and pass
it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
assuming that the page was PMD-aligned if the entry was a PMD entry with
signatures like:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

29 Nov, 2018 2 commits

dax: Don't access a freed inode · 55e56f06

Committed by Matthew Wilcox on Nov 27, 2018

After we drop the i_pages lock, the inode can be freed at any time.
The get_unlocked_entry() code has no choice but to reacquire the lock,
so it can't be used here.  Create a new wait_entry_unlocked() which takes
care not to acquire the lock or dereference the address_space in any way.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dax: Check page->mapping isn't NULL · c93db7bb

Committed by Matthew Wilcox on Nov 27, 2018

If we race with inode destroy, it's possible for page->mapping to be
NULL before we even enter this routine, as well as after having slept
waiting for the dax entry to become unlocked.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: <stable@vger.kernel.org>
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

19 Nov, 2018 1 commit

dax: Avoid losing wakeup in dax_lock_mapping_entry · 25bbe21b

Committed by Matthew Wilcox on Nov 16, 2018

After calling get_unlocked_entry(), you have to call
put_unlocked_entry() to avoid subsequent waiters losing wakeups.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>

18 Nov, 2018 2 commits

dax: Fix huge page faults · 0e40de03

Committed by Matthew Wilcox on Nov 16, 2018

Using xas_load() with a PMD-sized xa_state would work if either a
PMD-sized entry was present or a PTE sized entry was present in the
first 64 entries (of the 512 PTEs in a PMD on x86).  If there was no
PTE in the first 64 entries, grab_mapping_entry() would believe there
were no entries present, allocate a PMD-sized entry and overwrite the
PTE in the page cache.

Use xas_find_conflict() instead which turns out to simplify
both get_unlocked_entry() and grab_mapping_entry().  Also remove a
WARN_ON_ONCE from grab_mapping_entry() as it will have already triggered
in get_unlocked_entry().
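The failure mode described above can be pictured with a deliberately simplified model. This is not how xas_load() or xas_find_conflict() are implemented; it only assumes, as the message states, that the old lookup could see the first 64 of the 512 PTE slots in a PMD range. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_PMD	512	/* from the message: 512 PTEs per PMD on x86 */
#define FIRST_CHUNK	64	/* from the message: only the first 64 were seen */

/* Toy stand-in for the old lookup: inspects only the first chunk. */
static bool first_chunk_sees_entry(const bool present[PTES_PER_PMD])
{
	for (size_t i = 0; i < FIRST_CHUNK; i++)
		if (present[i])
			return true;
	return false;
}

/* Toy stand-in for a conflict search: inspects the whole PMD range. */
static bool range_has_conflict(const bool present[PTES_PER_PMD])
{
	for (size_t i = 0; i < PTES_PER_PMD; i++)
		if (present[i])
			return true;
	return false;
}
```

A PTE entry at offset 100 is invisible to the first check but found by the second, which is the case in which a PMD-sized entry would have been allocated over an existing PTE.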

Fixes: cfc93c6c ("dax: Convert dax_insert_pfn_mkwrite to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Fix dax_unlock_mapping_entry for PMD pages · fda490d3

Committed by Matthew Wilcox on Nov 16, 2018

Device DAX PMD pages do not set the PageHead bit for compound pages.
Fix for now by retrieving the PMD bit from the entry, but eventually we
will be passed the page size by the caller.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>

17 Nov, 2018 3 commits

dax: Reinstate RCU protection of inode · c5bbd451

Committed by Matthew Wilcox on Nov 16, 2018

For the device-dax case, it is possible that the inode can go away
underneath us. The rcu_read_lock() was there to prevent it from
being freed, and not (as I thought) to protect the tree. Bring back
the rcu_read_lock() protection. Also add a little kernel-doc; while
this function is not exported to modules, it is used from outside dax.c.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Make sure the unlocking entry isn't locked · 7ae2ea7d

Committed by Matthew Wilcox on Nov 09, 2018

I wrote the semantics in the commit message, but didn't document it in
the source code. Use a BUG_ON instead (if any code does do this, it's
really buggy; we can't recover and it's worth taking the machine down).

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Remove optimisation from dax_lock_mapping_entry · 6d7cd8c1

Committed by Matthew Wilcox on Nov 06, 2018

Skipping some of the revalidation after we sleep can lead to returning
a mapping which has already been freed. Just drop this optimisation.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>

21 Oct, 2018 8 commits

dax: Convert page fault handlers to XArray · b15cd800

Committed by Matthew Wilcox on Mar 29, 2018

This is the last part of DAX to be converted to the XArray so
remove all the old helper functions.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Convert dax_lock_mapping_entry to XArray · 9f32d221

Committed by Matthew Wilcox on Jun 12, 2018

Instead of always retrying when we slept, only retry if the page has
moved.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Convert dax writeback to XArray · 9fc747f6

Committed by Matthew Wilcox on Mar 28, 2018

Use XArray iteration instead of a pagevec.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Convert __dax_invalidate_entry to XArray · 07f2d89c

Committed by Matthew Wilcox on Mar 28, 2018

Avoids walking the radix tree multiple times looking for tags.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Convert dax_layout_busy_page to XArray · 084a8990

Committed by Matthew Wilcox on May 17, 2018

Instead of using a pagevec, just use the XArray iterators.  Add a
conditional rescheduling point which probably should have been there in
the original.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Convert dax_insert_pfn_mkwrite to XArray · cfc93c6c

Committed by Matthew Wilcox on Mar 28, 2018

Add some XArray-based helper functions to replace the radix tree based
metaphors currently in use. The biggest change is that converted code
doesn't see its own lock bit; get_unlocked_entry() always returns an
entry with the lock bit clear. So we don't have to mess around loading
the current entry and clearing the lock bit; we can just store the
unlocked entry that we already have.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Hash on XArray instead of mapping · ec4907ff

Committed by Matthew Wilcox on Mar 28, 2018

Since the XArray is embedded in the struct address_space, its address
contains exactly as much entropy as the address of the mapping.  This
patch is purely preparatory for later patches which will simplify the
wait/wake interfaces.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

dax: Rename some functions · a77d19f4

Committed by Matthew Wilcox on Mar 27, 2018

Remove mentions of 'radix' and 'radix tree'.  Simplify some names by
dropping the word 'mapping'.

Signed-off-by: Matthew Wilcox <willy@infradead.org>

09 Oct, 2018 1 commit

filesystem-dax: Fix dax_layout_busy_page() livelock · d7782145

Committed by Dan Williams on Oct 06, 2018

In the presence of multi-order entries the typical
pagevec_lookup_entries() pattern may loop forever:

	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
				min(end - index, (pgoff_t)PAGEVEC_SIZE),
				indices)) {
		...
		for (i = 0; i < pagevec_count(&pvec); i++) {
			index = indices[i];
			...
		}
		index++; /* BUG */
	}

The loop updates 'index' for each index found and then increments to the
next possible page to continue the lookup. However, if the last entry in
the pagevec is multi-order then the next possible page index is more
than 1 page away. Fix this locally for the filesystem-dax case by
checking for dax-multi-order entries. Going forward new users of
multi-order entries need to be similarly careful, or we need a generic
way to report the page increment in the radix iterator.
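The lack of forward progress can be reproduced with a small user-space simulation. It is a sketch under simplifying assumptions, not the kernel loop: the lookup returns one entry at a time instead of a pagevec batch, the table is sorted by start index, and all names (`struct entry`, `next_entry()`, `walk()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy entry covering (1 << order) consecutive page indices from start. */
struct entry {
	unsigned long start;
	unsigned int order;
};

/* Return the first entry whose range ends after index, or NULL. */
static const struct entry *next_entry(const struct entry *tab, size_t n,
				      unsigned long index)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].start + (1UL << tab[i].order) > index)
			return &tab[i];
	return NULL;
}

/*
 * Walk all entries below end. The lookup reports the start index of
 * the entry it found. With skip_whole_entry false the walk advances by
 * one page, as in the quoted loop; with true it advances past the
 * entry. Returns the number of lookups, or -1 once max_steps is
 * exceeded, which indicates that the walk stopped making progress.
 */
static int walk(const struct entry *tab, size_t n, unsigned long end,
		bool skip_whole_entry, int max_steps)
{
	unsigned long index = 0;
	int steps = 0;

	while (index < end) {
		const struct entry *e = next_entry(tab, n, index);

		if (!e || e->start >= end)
			break;
		if (++steps > max_steps)
			return -1;
		index = e->start;
		index += skip_whole_entry ? (1UL << e->order) : 1;
	}
	return steps;
}
```

With an order-9 entry at index 0 followed by a single page at index 512, advancing by one page keeps finding the same order-9 entry and resetting the index to 0, so the walk never terminates; advancing by the size of the entry finishes after two lookups.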

Fixes: 5fac7408 ("mm, fs, dax: handle layout changes to pinned dax...")
Cc: <stable@vger.kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

30 Sep, 2018 1 commit

xarray: Replace exceptional entries · 3159f943

Committed by Matthew Wilcox on Nov 03, 2017

Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries.  This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry).  It is also a change in emphasis; exceptional entries are
intimidating and different.  As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
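The value-entry encoding can be sketched in user space as follows. The helper names are hypothetical; the sketch assumes the scheme in which the integer is shifted left by one bit and the low bit is set as a tag, which is what leaves BITS_PER_LONG - 1 bits for the value, and it assumes that real pointers stored alongside are at least 2-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode an integer as a tagged entry: shift left and set the low bit. */
static void *mk_value(unsigned long v)
{
	return (void *)(uintptr_t)(((uintptr_t)v << 1) | 1);
}

/* Recover the integer from a tagged entry. */
static unsigned long to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* A set low bit marks a value entry; aligned pointers have it clear. */
static bool is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}
```

Any value that fits in one bit less than an unsigned long survives the round trip, and an ordinary aligned pointer is never mistaken for a value entry.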
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>

28 Sep, 2018 1 commit

dax: Fix deadlock in dax_lock_mapping_entry() · f52afc93

Committed by Jan Kara on Sep 27, 2018

When dax_lock_mapping_entry() has to sleep to obtain entry lock, it will
fail to unlock mapping->i_pages spinlock and thus immediately deadlock
against itself when retrying to grab the entry lock again. Fix the
problem by unlocking mapping->i_pages before retrying.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Reported-by: Barret Rhoden <brho@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

12 Sep, 2018 1 commit

filesystem-dax: Fix use of zero page · b90ca5cc

Committed by Matthew Wilcox on Sep 11, 2018

Use my_zero_pfn instead of ZERO_PAGE(), and pass the vaddr to it instead
of zero so it works on MIPS and s390, which reference the vaddr to select
a zero page.

Cc: <stable@vger.kernel.org>
Fixes: 91d25ba8 ("dax: use common 4k zero page for dax mmap reads")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

31 Jul, 2018 1 commit

filesystem-dax: Do not request kaddr and pfn when not required · 86ed913b

Committed by Huaisheng Ye on Jul 30, 2018

Some functions within fs/dax don't need to get the local pointer kaddr
or the variable pfn from direct_access. Pass NULL instead of passing in
a useless pointer or variable that the caller then just throws away.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

30 Jul, 2018 1 commit

dax: dax_layout_busy_page() warn on !exceptional · cdbf8897

Committed by Ross Zwisler on Jul 29, 2018

Inodes using DAX should only ever have exceptional entries in their page
caches.  Make this clear by warning if the iteration in
dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
comment for the pagevec_release() call which only deals with struct page
pointers.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

24 Jul, 2018 1 commit

filesystem-dax: Introduce dax_lock_mapping_entry() · c2a7d2a1

Committed by Dan Williams on Jul 13, 2018

In preparation for implementing support for memory poison (media error)
handling via dax mappings, implement a lock_page() equivalent. Poison
error handling requires rmap and needs guarantees that the page->mapping
association is maintained / valid (inode not freed) for the duration of
the lookup.

In the device-dax case it is sufficient to simply hold a dev_pagemap
reference. In the filesystem-dax case we need to use the entry lock.

Export the entry lock via dax_lock_mapping_entry() that uses
rcu_read_lock() to protect against the inode being freed, and
revalidates the page->mapping association under xa_lock().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

21 Jul, 2018 1 commit

filesystem-dax: Set page->index · 73449daf

Committed by Dan Williams on Jul 13, 2018

In support of enabling memory_failure() handling for filesystem-dax
mappings, set ->index to the pgoff of the page. The rmap implementation
requires ->index to bound the search through the vma interval tree. The
index is set and cleared at dax_associate_entry() and
dax_disassociate_entry() time respectively.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

08 Jun, 2018 1 commit

fs/dax.c: use new return type vm_fault_t · ab77dab4

Committed by Souptick Joarder on Jun 07, 2018

Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

commit 1c8f4220 ("mm: change return type to vm_fault_t")

There was an existing bug inside dax_load_hole() if vm_insert_mixed had
failed to allocate a page table, we'd return VM_FAULT_NOPAGE instead of
VM_FAULT_OOM.  With new vmf_insert_mixed() this issue is addressed.
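The kind of errno-to-fault-code translation discussed here can be sketched as below. This is a user-space illustration, not the kernel helper: the enum values and `fault_from_errno()` are hypothetical stand-ins, and the mapping shown (success or -EBUSY means the page table entry is in place, -ENOMEM means out of memory, anything else is a bus error) is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for fault codes; not the kernel's definitions. */
enum sketch_fault {
	SKETCH_FAULT_OOM    = 0x001,
	SKETCH_FAULT_SIGBUS = 0x002,
	SKETCH_FAULT_NOPAGE = 0x100
};

/*
 * Translate the 0 / negative-errno result of an insert operation into
 * a fault code, keeping allocation failure distinct from success.
 */
static enum sketch_fault fault_from_errno(int err)
{
	if (err == -ENOMEM)
		return SKETCH_FAULT_OOM;
	if (err < 0 && err != -EBUSY)
		return SKETCH_FAULT_SIGBUS;
	return SKETCH_FAULT_NOPAGE;
}
```

Doing the translation in one helper avoids the situation described above, in which a failed page table allocation was reported to the fault path as if the mapping had been installed.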

vm_insert_mixed_mkwrite() is inefficient: when it returns an error value,
the driver has to convert it to the vm_fault_t type.  The new
vmf_insert_mixed_mkwrite() addresses this limitation.

Link: http://lkml.kernel.org/r/20180510181121.GA15239@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03 Jun, 2018 1 commit

dax: dax_insert_mapping_entry always succeeds · cc4a90ac

Committed by Matthew Wilcox on Jun 02, 2018

It does not return an error, so we don't need to check the return value
for IS_ERR(). Indeed, it is a bug to do so; with a sufficiently large
PFN, a legitimate DAX entry may be mistaken for an error return.
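Why a large PFN can look like an error pointer is easy to show in user space. The sketch assumes the common convention that the top 4095 values of the address space are treated as encoded errnos; the entry layout (`SKETCH_ENTRY_SHIFT`, `SKETCH_ENTRY_FLAG`) is purely illustrative and not the exact DAX entry format.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ERRNO	4095UL
#define SKETCH_ENTRY_SHIFT	6	/* assumption: flag bits below the PFN */
#define SKETCH_ENTRY_FLAG	2UL	/* assumption: marks a non-pointer entry */

/* Error-pointer test: true for the top SKETCH_MAX_ERRNO values. */
static bool looks_like_err(unsigned long v)
{
	return v >= ~0UL - SKETCH_MAX_ERRNO + 1;
}

/* Pack a PFN into an entry word, leaving the low bits for flags. */
static unsigned long encode_entry(unsigned long pfn)
{
	return (pfn << SKETCH_ENTRY_SHIFT) | SKETCH_ENTRY_FLAG;
}
```

An everyday PFN encodes to a small word that passes the check, but a PFN with all of its high bits set encodes to a word in the error range, so an error test on such an entry gives a false positive.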
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

23 May, 2018 1 commit

dax: Report bytes remaining in dax_iomap_actor() · a77d4786

Committed by Dan Williams on Mar 16, 2018

In preparation for protecting the dax read(2) path from media errors
with copy_to_iter_mcsafe() (via dax_copy_to_iter()), convert the
implementation to report the bytes successfully transferred.

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

openeuler / Kernel synced successfully more than 1 year ago
