- 15 May, 2019: 1 commit
-
-
By Dan Williams
Starting with commit c6f3c5ee ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()"), vmf_insert_pfn_pmd() internally calls pmdp_set_access_flags(). That helper enforces a pmd-aligned @address argument via a VM_BUG_ON() assertion.

Update the implementation to take a 'struct vm_fault' argument directly and apply the address alignment fixup internally, to fix crash signatures like:

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1
    [..]
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
     vmf_insert_pfn_pmd+0x198/0x350
     dax_iomap_fault+0xe82/0x1190
     ext4_dax_huge_fault+0x103/0x1f0
     ? __switch_to_asm+0x40/0x70
     __handle_mm_fault+0x3f6/0x1370
     ? __switch_to_asm+0x34/0x70
     ? __switch_to_asm+0x40/0x70
     handle_mm_fault+0xda/0x200
     __do_page_fault+0x249/0x4f0
     do_page_fault+0x32/0x110
     ? page_fault+0x8/0x30
     page_fault+0x1e/0x30

Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: c6f3c5ee ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Piotr Balcer <piotr.balcer@intel.com>
Tested-by: Yan Ma <yan.ma@intel.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
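For illustration, a condensed sketch of the resulting calling convention (the mainline version also allocates a deposit page table and sanity-checks the pfn; treat this as a sketch, not the full implementation):

    vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
    {
            /*
             * The fault address can be anywhere inside the PMD; align it
             * here so pmdp_set_access_flags() never sees a misaligned
             * address and trips its VM_BUG_ON().
             */
            unsigned long addr = vmf->address & PMD_MASK;

            return insert_pfn_pmd(vmf->vma, addr, vmf->pmd, pfn,
                                  vmf->vma->vm_page_prot, write);
    }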
-
- 14 March, 2019: 1 commit
-
-
By Aneesh Kumar K.V
Architectures like ppc64 use the deposited page table to store hardware page table slot information. Make sure we deposit a page table when using the zero page at the pmd level for hash. Without this we hit:

    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xc000000000082a74
    Oops: Kernel access of bad area, sig: 11 [#1]
    ....
    NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
    LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
    Call Trace:
     hash_page_mm+0x43c/0x740
     do_hash_page+0x2c/0x3c
     copy_from_iter_flushcache+0xa4/0x4a0
     pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
     dax_copy_from_iter+0x40/0x70
     dax_iomap_actor+0x134/0x360
     iomap_apply+0xfc/0x1b0
     dax_iomap_rw+0xac/0x130
     ext4_file_write_iter+0x254/0x460 [ext4]
     __vfs_write+0x120/0x1e0
     vfs_write+0xd8/0x220
     SyS_write+0x6c/0x110
     system_call+0x3c/0x130

Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
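Sketched from the fix in dax_pmd_load_hole() (simplified, with error handling trimmed):

    pgtable_t pgtable = NULL;

    if (arch_needs_pgtable_deposit()) {
            pgtable = pte_alloc_one(vma->vm_mm);
            if (!pgtable)
                    return VM_FAULT_OOM;
    }
    ...
    ptl = pmd_lock(vma->vm_mm, vmf->pmd);
    if (pgtable) {
            /* give the architecture a page table to fill with slot info */
            pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
            mm_inc_nr_ptes(vma->vm_mm);
    }
    set_pmd_at(vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
    spin_unlock(ptl);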
-
- 02 March, 2019: 1 commit
-
-
By Matthew Wilcox
The radix tree would rewind the index in an iterator to the lowest index of a multi-slot entry. The XArray iterators instead leave the index unchanged, but I overlooked that when converting DAX from the radix tree to the XArray. Adjust the index that we use for flushing to the start of the PMD range.

Fixes: c1901cd3 ("page cache: Convert find_get_entries_tag to XArray")
Cc: <stable@vger.kernel.org>
Reported-by: Piotr Balcer <piotr.balcer@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
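The alignment, sketched from the fix in dax_writeback_one() (names per fs/dax.c of that era):

    /* a PMD entry spans 1 << dax_entry_order(entry) pages; round the
     * iterator's index down to the entry's first page before flushing */
    unsigned long count = 1UL << dax_entry_order(entry);
    pgoff_t index = xas->xa_index & ~(count - 1);

    dax_entry_mkclean(mapping, index, pfn);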
-
- 13 February, 2019: 2 commits
-
-
By Ira Weiny
Fixes: ac46d4f3 ("mm/mmu_notifier: use structure for invalidate_range_start/end calls v2")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
By Souptick Joarder
This code is converted to use vmf_error().

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
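For reference, vmf_error() is defined in <linux/mm.h> roughly as:

    static inline vm_fault_t vmf_error(int err)
    {
            if (err == -ENOMEM)
                    return VM_FAULT_OOM;
            return VM_FAULT_SIGBUS;  /* every other errno becomes SIGBUS */
    }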
-
- 29 December, 2018: 1 commit
-
-
By Jérôme Glisse
To avoid having to change many call sites every time we want to add a parameter, use a structure to group all parameters for the mmu_notifier invalidate_range_start/end calls. No functional changes with this patch.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>

From: Jérôme Glisse <jglisse@redhat.com>
Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3

fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n

Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
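The resulting call pattern, sketched (mmu_notifier_range_init() took these four arguments in this series; it grew more later):

    struct mmu_notifier_range range;

    /* one struct instead of a growing list of loose arguments */
    mmu_notifier_range_init(&range, vma->vm_mm, start, end);
    mmu_notifier_invalidate_range_start(&range);
    /* ... modify the page tables ... */
    mmu_notifier_invalidate_range_end(&range);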
-
- 22 December, 2018: 1 commit
-
-
By Dan Williams
get_unlocked_entry() uses an exclusive wait because it is guaranteed to eventually obtain the lock and follow on with an unlock+wakeup cycle. The wait_entry_unlocked() path does not have the same guarantee. Rather than open-code an extra wakeup, just switch to a non-exclusive wait.

Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
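The distinction, sketched with the kernel waitqueue API:

    /* exclusive: wakes a single waiter, so the woken task must guarantee
     * a follow-on unlock+wakeup; get_unlocked_entry() can promise that */
    prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);

    /* non-exclusive: wakes all waiters; safe for wait_entry_unlocked(),
     * which may return without ever locking the entry or waking anyone */
    prepare_to_wait(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);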
-
- 05 December, 2018: 1 commit
-
-
By Matthew Wilcox
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to store a replacement entry in the XArray at the given xas-index with the DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked value of the entry relative to the current XArray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry() case the implementation needs to recall the original entry. In the case where the original entry is a 'pmd' entry, it is possible that the pfn used to do the lookup is misaligned relative to the value retrieved from the XArray.

Change the API to return the unlock cookie from dax_lock_page() and pass it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was assuming that the page was PMD-aligned if the entry was a PMD entry, with signatures like:

    WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
    RIP: 0010:dax_insert_entry+0x2b2/0x2d0
    [..]
    Call Trace:
     dax_iomap_pte_fault.isra.41+0x791/0xde0
     ext4_dax_huge_fault+0x16f/0x1f0
     ? up_read+0x1c/0xa0
     __do_fault+0x1f/0x160
     __handle_mm_fault+0x1033/0x1490
     handle_mm_fault+0x18b/0x3d0

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
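The resulting interface, sketched (the cookie is the XArray entry value, opaque to callers; 0 means the lock was not acquired):

    dax_entry_t cookie = dax_lock_page(page);

    if (cookie) {
            /* page->mapping and page->index are stable here */
            ...
            dax_unlock_page(page, cookie);
    }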
-
- 29 November, 2018: 2 commits
-
-
By Matthew Wilcox
After we drop the i_pages lock, the inode can be freed at any time. The get_unlocked_entry() code has no choice but to reacquire the lock, so it can't be used here. Create a new wait_entry_unlocked() which takes care not to acquire the lock or dereference the address_space in any way.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
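The helper's shape, condensed from mainline (shown with the non-exclusive wait it gained in the later fix above):

    static void wait_entry_unlocked(struct xa_state *xas, void *entry)
    {
            struct wait_exceptional_entry_queue ewait;
            wait_queue_head_t *wq;

            init_wait(&ewait.wait);
            ewait.wait.func = wake_exceptional_entry_func;

            wq = dax_entry_waitqueue(xas, entry, &ewait.key);
            prepare_to_wait(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
            xas_unlock_irq(xas);  /* drop i_pages before sleeping ... */
            schedule();           /* ... and never touch the mapping again */
            finish_wait(wq, &ewait.wait);
    }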
-
By Matthew Wilcox
If we race with inode destroy, it's possible for page->mapping to be NULL before we even enter this routine, as well as after having slept waiting for the dax entry to become unlocked.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: <stable@vger.kernel.org>
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
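The check, sketched from the fixed dax_lock_page() loop:

    for (;;) {
            struct address_space *mapping = READ_ONCE(page->mapping);

            /* raced with inode teardown: nothing left to lock */
            if (!mapping || !dax_mapping(mapping))
                    break;
            /* ... take xa_lock, re-check page->mapping, sleep if locked ... */
    }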
-
- 19 November, 2018: 1 commit
-
-
By Matthew Wilcox
After calling get_unlocked_entry(), you have to call put_unlocked_entry() to avoid subsequent waiters losing wakeups.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
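The pairing, sketched with the fs/dax.c helpers of that era:

    xas_lock_irq(&xas);
    entry = get_unlocked_entry(&xas);   /* may sleep until unlocked */
    /* ... inspect entry, then decide not to lock it after all ... */
    put_unlocked_entry(&xas, entry);    /* wakes the next waiter */
    xas_unlock_irq(&xas);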
-
- 18 November, 2018: 2 commits
-
-
By Matthew Wilcox
Using xas_load() with a PMD-sized xa_state would work if either a PMD-sized entry was present or a PTE-sized entry was present in the first 64 entries (of the 512 PTEs in a PMD on x86). If there was no PTE in the first 64 entries, grab_mapping_entry() would believe there were no entries present, allocate a PMD-sized entry and overwrite the PTE in the page cache.

Use xas_find_conflict() instead, which turns out to simplify both get_unlocked_entry() and grab_mapping_entry(). Also remove a WARN_ON_ONCE from grab_mapping_entry() as it will have already triggered in get_unlocked_entry().

Fixes: cfc93c6c ("dax: Convert dax_insert_pfn_mkwrite to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
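The difference, sketched (both functions are part of the XArray API):

    /* xas_load(): returns only what is visible at xas->xa_index, so a
     * PMD-order lookup can miss a PTE entry sitting later in the range */
    entry = xas_load(&xas);

    /* xas_find_conflict(): returns any entry overlapping the full range
     * implied by the xa_state's order, so a lone PTE is always found */
    entry = xas_find_conflict(&xas);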
-
By Matthew Wilcox
Device DAX PMD pages do not set the PageHead bit for compound pages. Fix for now by retrieving the PMD bit from the entry, but eventually we will be passed the page size by the caller.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
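Sketched against the fs/dax.c entry helpers (a hypothetical call site):

    /* PageHead() can't be trusted for device-DAX pages; size the
     * operation from the XArray entry instead */
    size_t size = dax_is_pmd_entry(entry) ? PMD_SIZE : PAGE_SIZE;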
-
- 17 November, 2018: 3 commits
-
-
By Matthew Wilcox
For the device-dax case, it is possible that the inode can go away underneath us. The rcu_read_lock() was there to prevent it from being freed, and not (as I thought) to protect the tree. Bring back the rcu_read_lock() protection. Also add a little kernel-doc; while this function is not exported to modules, it is used from outside dax.c.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
I wrote the semantics in the commit message, but didn't document them in the source code. Use a BUG_ON instead (if any code does do this, it's really buggy; we can't recover and it's worth taking the machine down).

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
Skipping some of the revalidation after we sleep can lead to returning a mapping which has already been freed. Just drop this optimisation.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
- 21 October, 2018: 8 commits
-
-
By Matthew Wilcox
This is the last part of DAX to be converted to the XArray, so remove all the old helper functions.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
Instead of always retrying when we slept, only retry if the page has moved.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
Use XArray iteration instead of a pagevec.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
Avoids walking the radix tree multiple times looking for tags.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
Instead of using a pagevec, just use the XArray iterators. Add a conditional rescheduling point which probably should have been there in the original.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
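The iteration with its rescheduling point, condensed from the converted dax_writeback_mapping_range():

    unsigned int scanned = 0;
    void *entry;

    xas_lock_irq(&xas);
    xas_for_each_marked(&xas, entry, end_index, PAGECACHE_TAG_TOWRITE) {
            /* ... write back this entry ... */
            if (++scanned % XA_CHECK_SCHED)  /* 4096 in fs/dax.c */
                    continue;

            xas_pause(&xas);   /* remember the position, allow resume */
            xas_unlock_irq(&xas);
            cond_resched();
            xas_lock_irq(&xas);
    }
    xas_unlock_irq(&xas);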
-
By Matthew Wilcox
Add some XArray-based helper functions to replace the radix tree based metaphors currently in use. The biggest change is that converted code doesn't see its own lock bit; get_unlocked_entry() always returns an entry with the lock bit clear. So we don't have to mess around loading the current entry and clearing the lock bit; we can just store the unlocked entry that we already have.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
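Two of the helpers, condensed from fs/dax.c (assertions trimmed):

    static void *dax_lock_entry(struct xa_state *xas, void *entry)
    {
            unsigned long v = xa_to_value(entry);

            /* lock by storing the same value with DAX_LOCKED set */
            return xas_store(xas, xa_mk_value(v | DAX_LOCKED));
    }

    static void dax_unlock_entry(struct xa_state *xas, void *entry)
    {
            /* the caller already holds the unlocked value: just store it
             * back, no load-and-clear-the-bit dance required */
            xas_reset(xas);
            xas_lock_irq(xas);
            xas_store(xas, entry);
            xas_unlock_irq(xas);
            dax_wake_entry(xas, entry, false);
    }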
-
By Matthew Wilcox
Since the XArray is embedded in the struct address_space, its address contains exactly as much entropy as the address of the mapping. This patch is purely preparatory for later patches which will simplify the wait/wake interfaces.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
By Matthew Wilcox
Remove mentions of 'radix' and 'radix tree'. Simplify some names by dropping the word 'mapping'.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
-
- 09 October, 2018: 1 commit
-
-
By Dan Williams
In the presence of multi-order entries the typical pagevec_lookup_entries() pattern may loop forever:

    while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
                            min(end - index, (pgoff_t)PAGEVEC_SIZE),
                            indices)) {
            ...
            for (i = 0; i < pagevec_count(&pvec); i++) {
                    index = indices[i];
                    ...
            }
            index++; /* BUG */
    }

The loop updates 'index' for each index found and then increments to the next possible page to continue the lookup. However, if the last entry in the pagevec is multi-order then the next possible page index is more than 1 page away. Fix this locally for the filesystem-dax case by checking for dax-multi-order entries. Going forward, new users of multi-order entries need to be similarly careful, or we need a generic way to report the page increment in the radix iterator.

Fixes: 5fac7408 ("mm, fs, dax: handle layout changes to pinned dax...")
Cc: <stable@vger.kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
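The local fix, sketched (dax_radix_order() reports the entry's order; names follow the mainline patch but this is a condensation, not the exact hunk):

    pgoff_t nr_pages = 1;

    for (i = 0; i < pagevec_count(&pvec); i++) {
            ...
            index = indices[i];
            /* a multi-order entry at the end of the pagevec spans more
             * than one index; remember how far to jump */
            if (i + 1 >= pagevec_count(&pvec))
                    nr_pages = 1UL << dax_radix_order(entry);
            ...
    }
    index += nr_pages;  /* instead of index++ */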
-
- 30 September, 2018: 1 commit
-
-
By Matthew Wilcox
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
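The encoding, sketched with the <linux/xarray.h> helpers:

    /* the integer is shifted left one bit and tagged with bit 0 set,
     * leaving BITS_PER_LONG - 1 usable bits per value entry */
    void *entry = xa_mk_value(0x1234);

    if (xa_is_value(entry))
            pr_info("stored value: %lu\n", xa_to_value(entry));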
-
- 28 September, 2018: 1 commit
-
-
By Jan Kara
When dax_lock_mapping_entry() has to sleep to obtain the entry lock, it will fail to unlock the mapping->i_pages spinlock and thus immediately deadlock against itself when retrying to grab the entry lock again. Fix the problem by unlocking mapping->i_pages before retrying.

Fixes: c2a7d2a1 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Reported-by: Barret Rhoden <brho@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
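The shape of the bug, sketched (the lookup helpers here are hypothetical stand-ins for the radix-tree era fs/dax.c code):

    for (;;) {
            xa_lock_irq(&mapping->i_pages);
            entry = lookup_mapping_entry(mapping, index);  /* hypothetical */
            if (entry_is_locked(entry)) {                  /* hypothetical */
                    /* the fix: drop i_pages before sleeping and retrying,
                     * otherwise the retry deadlocks on the lock we hold */
                    xa_unlock_irq(&mapping->i_pages);
                    continue;
            }
            /* ... lock the entry, drop i_pages, done ... */
            break;
    }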
-
- 12 September, 2018: 1 commit
-
-
By Matthew Wilcox
Use my_zero_pfn() instead of ZERO_PAGE(), and pass the vaddr to it instead of zero so it works on MIPS and s390, which reference the vaddr to select a zero page.

Cc: <stable@vger.kernel.org>
Fixes: 91d25ba8 ("dax: use common 4k zero page for dax mmap reads")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
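The change, sketched (this matches the shape of the one-liner in dax_load_hole()):

    /* before: always the global zero page; wrong for architectures that
     * pick a zero page by address to avoid cache aliasing */
    pfn_t pfn = page_to_pfn_t(ZERO_PAGE(0));

    /* after: let the architecture choose based on the faulting vaddr */
    pfn = pfn_to_pfn_t(my_zero_pfn(vmf->address));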
-
- 31 July, 2018: 1 commit
-
-
By Huaisheng Ye
Some functions within fs/dax don't need the local pointer kaddr or the variable pfn from direct_access. Pass NULL instead of having to pass in a useless pointer or variable that the caller then just throws away.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
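For example (dax_direct_access() accepts NULL out-parameters after this change):

    /* only the number of accessible pages is wanted here; skip both
     * out-parameters instead of declaring throwaway locals */
    long avail = dax_direct_access(dax_dev, pgoff, nr_pages, NULL, NULL);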
-
- 30 July, 2018: 1 commit
-
-
By Ross Zwisler
Inodes using DAX should only ever have exceptional entries in their page caches. Make this clear by warning if the iteration in dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a comment for the pagevec_release() call, which only deals with struct page pointers.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
-
- 24 July, 2018: 1 commit
-
-
By Dan Williams
In preparation for implementing support for memory poison (media error) handling via dax mappings, implement a lock_page() equivalent. Poison error handling requires rmap and needs guarantees that the page->mapping association is maintained / valid (inode not freed) for the duration of the lookup.

In the device-dax case it is sufficient to simply hold a dev_pagemap reference. In the filesystem-dax case we need to use the entry lock. Export the entry lock via dax_lock_mapping_entry() that uses rcu_read_lock() to protect against the inode being freed, and revalidates the page->mapping association under xa_lock().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
-
- 21 July, 2018: 1 commit
-
-
By Dan Williams
In support of enabling memory_failure() handling for filesystem-dax mappings, set ->index to the pgoff of the page. The rmap implementation requires ->index to bound the search through the vma interval tree. The index is set and cleared at dax_associate_entry() and dax_disassociate_entry() time, respectively.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
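The association, condensed from the dax_associate_entry() this series adds:

    static void dax_associate_entry(void *entry, struct address_space *mapping,
                    struct vm_area_struct *vma, unsigned long address)
    {
            unsigned long size = dax_entry_size(entry), pfn, index;
            int i = 0;

            index = linear_page_index(vma, address & ~(size - 1));
            for_each_mapped_pfn(entry, pfn) {
                    struct page *page = pfn_to_page(pfn);

                    page->mapping = mapping;
                    page->index = index + i++;  /* bounds the rmap search */
            }
    }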
-
- 08 June, 2018: 1 commit
-
-
By Souptick Joarder
Use the new return type vm_fault_t for the fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See commit 1c8f4220 ("mm: change return type to vm_fault_t").

There was an existing bug inside dax_load_hole(): if vm_insert_mixed() had failed to allocate a page table, we'd return VM_FAULT_NOPAGE instead of VM_FAULT_OOM. With the new vmf_insert_mixed() this issue is addressed.

vm_insert_mixed_mkwrite() is inefficient when it returns an error value: the driver has to convert it to a vm_fault_t type. With the new vmf_insert_mixed_mkwrite() this limitation is addressed.

Link: http://lkml.kernel.org/r/20180510181121.GA15239@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
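The conversion pattern, sketched on a generic fault-handler call site (not the exact fs/dax.c hunk):

    /* before: an errno from vm_insert_mixed() had to be hand-converted,
     * and the -ENOMEM case was easy to map to the wrong fault code */
    int err = vm_insert_mixed(vma, vaddr, pfn);
    return err ? VM_FAULT_SIGBUS : VM_FAULT_NOPAGE;

    /* after: the helper returns VM_FAULT_OOM, VM_FAULT_SIGBUS or
     * VM_FAULT_NOPAGE itself */
    return vmf_insert_mixed(vma, vaddr, pfn);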
-
- 03 June, 2018: 1 commit
-
-
By Matthew Wilcox
It does not return an error, so we don't need to check the return value for IS_ERR(). Indeed, it is a bug to do so; with a sufficiently large PFN, a legitimate DAX entry may be mistaken for an error return.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
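Why IS_ERR() misfires here: it only tests for the top 4095 values of the address space, a range a large pfn-encoded entry can legitimately reach. From <linux/err.h>:

    #define MAX_ERRNO       4095
    #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)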
-
- 23 May, 2018: 2 commits
-
-
By Dan Williams
In preparation for protecting the dax read(2) path from media errors with copy_to_iter_mcsafe() (via dax_copy_to_iter()), convert the implementation to report the bytes successfully transferred.

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
By Dan Williams
Similar to the ->copy_from_iter() operation, a platform may want to deploy an architecture- or device-specific routine for handling reads from a dax_device like /dev/pmemX. On x86 this routine will point to a machine check safe version of copy_to_iter(). For now, add the plumbing to device-mapper and the dax core.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
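The plumbing, sketched as the operations table looked after this change:

    struct dax_operations {
            long (*direct_access)(struct dax_device *, pgoff_t, long,
                            void **, pfn_t *);
            size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *,
                            size_t, struct iov_iter *);
            /* new: read-side mirror of copy_from_iter, so x86 can plug in
             * a machine-check-safe copy_to_iter() */
            size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *,
                            size_t, struct iov_iter *);
    };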
-
- 22 May, 2018: 1 commit
-
-
By Dan Williams
Background:

get_user_pages() in the filesystem pins file-backed memory pages for access by devices performing dma. However, it only pins the memory pages, not the page-to-file offset association. If a file is truncated the pages are mapped out of the file and dma may continue indefinitely into a page that is owned by a device driver. This breaks coherency of the file vs dma, but the assumption is that if userspace wants the file-space truncated it does not matter what data is inbound from the device; it is not relevant anymore. The only expectation is that dma can safely continue while the filesystem reallocates the block(s).

Problem:

This expectation that dma can safely continue while the filesystem changes the block map is broken by dax. With dax the target dma page *is* the filesystem block. The model of leaving the page pinned for dma, but truncating the file block out of the file, means that the filesystem is free to reallocate a block under active dma to another file, and now the expected data-incoherency situation has turned into active data-corruption.

Solution:

Defer all filesystem operations (fallocate(), truncate()) on a dax mode file while any page/block in the file is under active dma. This solution assumes that dma is transient. Cases where dma operations are known to not be transient, like RDMA, have been explicitly disabled via commits like 5f1d43de ("IB/core: disable memory registration of filesystem-dax vmas").

The dax_layout_busy_page() routine is called by filesystems with a lock held against mm faults (i_mmap_lock) to find pinned / busy dax pages. The process of looking up a busy page invalidates all mappings to trigger any subsequent get_user_pages() to block on i_mmap_lock. The filesystem continues to call dax_layout_busy_page() until it finally returns no more active pages. This approach assumes that the page pinning is transient; if that assumption is violated the system would have likely hung from the uncompleted I/O.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
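The caller-side drain loop, condensed from the xfs user of this helper (xfs_break_dax_layouts(); locking elided):

    struct page *page;

    /* with faults held off, loop until no dax page in the file is pinned */
    while ((page = dax_layout_busy_page(inode->i_mapping))) {
            /* a pinned dax page has _refcount > 1; wait for the pin to drop */
            ___wait_var_event(&page->_refcount,
                            atomic_read(&page->_refcount) == 1,
                            TASK_INTERRUPTIBLE, 0, 0, schedule());
    }
    /* now safe to truncate / reallocate the blocks */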
-
- 17 April, 2018: 1 commit
-
-
By Mike Rapoport
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 12 April, 2018: 1 commit
-
-
By Matthew Wilcox
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
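Before and after, sketched on a typical call site:

    /* before: a separate spinlock beside the tree */
    spin_lock_irq(&mapping->tree_lock);
    radix_tree_insert(&mapping->page_tree, index, entry);
    spin_unlock_irq(&mapping->tree_lock);

    /* after: the lock is embedded in the root and the field is i_pages */
    xa_lock_irq(&mapping->i_pages);
    radix_tree_insert(&mapping->i_pages, index, entry);
    xa_unlock_irq(&mapping->i_pages);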
-
- 03 April, 2018: 1 commit
-
-
By Dan Williams
Catch cases where extent unmap operations encounter pages that are pinned / busy. Typically this is pinned pages that are under active dma. This warning is a canary for potential data corruption, as truncated blocks could be allocated to a new file while the device is still performing i/o.

Here is an example of a collision that this implementation catches:

    WARNING: CPU: 2 PID: 1286 at fs/dax.c:343 dax_disassociate_entry+0x55/0x80
    [..]
    Call Trace:
     __dax_invalidate_mapping_entry+0x6c/0xf0
     dax_delete_mapping_entry+0xf/0x20
     truncate_exceptional_pvec_entries.part.12+0x1af/0x200
     truncate_inode_pages_range+0x268/0x970
     ? tlb_gather_mmu+0x10/0x20
     ? up_write+0x1c/0x40
     ? unmap_mapping_range+0x73/0x140
     xfs_free_file_space+0x1b6/0x5b0 [xfs]
     ? xfs_file_fallocate+0x7f/0x320 [xfs]
     ? down_write_nested+0x40/0x70
     ? xfs_ilock+0x21d/0x2f0 [xfs]
     xfs_file_fallocate+0x162/0x320 [xfs]
     ? rcu_read_lock_sched_held+0x3f/0x70
     ? rcu_sync_lockdep_assert+0x2a/0x50
     ? __sb_start_write+0xd0/0x1b0
     ? vfs_fallocate+0x20c/0x270
     vfs_fallocate+0x154/0x270
     SyS_fallocate+0x43/0x80
     entry_SYSCALL_64_fastpath+0x1f/0x96

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-