1. 08 April 2014, 1 commit
    • mm: introduce vm_ops->map_pages() · 8c6e50b0
      Kirill A. Shutemov committed
      Here's a new version of the fault-around patchset.  It took a while to tune
      it and collect performance data.
      
      First patch adds new callback ->map_pages to vm_operations_struct.
      
      ->map_pages() is called when the VM asks to map easily accessible pages.
      The filesystem should find and map pages associated with offsets from
      "pgoff" till "max_pgoff".  ->map_pages() is called with the page table
      locked and must not block.  If it's not possible to reach a page without
      blocking, the filesystem should skip it.  The filesystem should use
      do_set_pte() to set up the page table entry.  A pointer to the entry
      associated with offset "pgoff" is passed in the "pte" field of the
      vm_fault structure.  Pointers to entries for other offsets should be
      calculated relative to "pte".
      
      Currently the VM uses ->map_pages only on the read page fault path.  We try
      to map FAULT_AROUND_PAGES at a time.  FAULT_AROUND_PAGES is 16 for now.
      Performance data for different FAULT_AROUND_ORDER values is below.
      
      TODO:
       - implement ->map_pages() for shmem/tmpfs;
       - modify get_user_pages() to be able to use ->map_pages() and implement
         mmap(MAP_POPULATE|MAP_NONBLOCK) on top.
      
      =========================================================================
      Tested on 4-socket machine (120 threads) with 128GiB of RAM.
      
      A few real-world workloads.  The sweet spot for FAULT_AROUND_ORDER here is
      somewhere between 3 and 5.  Let's say 4 :)
      
      Linux build (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
      	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
      Linux rebuild (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
      	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
      Git test suite (make -j60 test)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
      	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413
      
      Two synthetic tests: access every word in a file in sequential/random order.
      It doesn't improve much after FAULT_AROUND_ORDER == 4.
      
      Sequential access 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
      	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
       8 threads
      	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
      	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
       32 threads
      	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
      	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
       60 threads
      	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
      	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
       120 threads
      	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
      	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
      Random access 1GiB file
       1 thread
      	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
      	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
       8 threads
      	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
      	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
       32 threads
      	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
      	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
       60 threads
      	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
      	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
       120 threads
      	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
      	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594
      
      Touch only one page in page table in 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
      	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
       8 threads
      	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
      	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
       32 threads
      	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
      	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
       60 threads
      	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
      	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
       120 threads
      	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
      	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125
      
      This patch (of 2):
      
      Introduce a new vm_ops callback ->map_pages() and use it for mapping easily
      accessible pages around the fault address.
      
      On a read page fault, if the filesystem provides ->map_pages(), we try to map
      up to FAULT_AROUND_PAGES pages around the page fault address in the hope of
      reducing the number of minor page faults.
      
      We call ->map_pages first and use ->fault() as a fallback if the page at the
      offset is not ready to be mapped (cold page cache or something).
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ning Qu <quning@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c6e50b0
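
      A hedged sketch of how a filesystem might wire this up, assuming the
      signature introduced here, void (*map_pages)(struct vm_area_struct *,
      struct vm_fault *), the vmf->pte / vmf->max_pgoff fields described above,
      and the six-argument do_set_pte() this series exposes.  The
      myfs_find_ready_page() helper is purely illustrative (it stands in for a
      non-blocking page-cache lookup); reference-counting and locking details
      are omitted.

      #include <linux/mm.h>
      #include <linux/fs.h>

      /* illustrative: return an uptodate page for pgoff without blocking,
       * or NULL if that is not possible right now */
      struct page *myfs_find_ready_page(struct file *file, pgoff_t pgoff);

      static void myfs_map_pages(struct vm_area_struct *vma, struct vm_fault *vmf)
      {
              unsigned long addr = (unsigned long)vmf->virtual_address;
              pte_t *pte = vmf->pte;          /* entry for vmf->pgoff */
              pgoff_t pgoff = vmf->pgoff;
              struct page *page;

              for (; pgoff <= vmf->max_pgoff; pgoff++, pte++, addr += PAGE_SIZE) {
                      if (!pte_none(*pte))
                              continue;       /* already populated */
                      page = myfs_find_ready_page(vma->vm_file, pgoff);
                      if (!page)
                              continue;       /* would block: leave it to ->fault() */
                      do_set_pte(vma, addr, page, pte, false, false);
              }
      }

      static const struct vm_operations_struct myfs_file_vm_ops = {
              .fault          = filemap_fault,
              .map_pages      = myfs_map_pages,
      };
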
  2. 02 April 2014, 1 commit
  3. 01 April 2014, 1 commit
  4. 19 December 2013, 1 commit
  5. 03 July 2013, 1 commit
  6. 29 June 2013, 5 commits
    • locks: give the blocked_hash its own spinlock · 7b2296af
      Jeff Layton committed
      There's no reason we have to protect the blocked_hash and file_lock_list
      with the same spinlock. With the tests I have, breaking it in two gives
      a barely measurable performance benefit, but it seems reasonable to make
      this locking as granular as possible.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7b2296af
    • locks: add a new "lm_owner_key" lock operation · 3999e493
      Jeff Layton committed
      Currently, the hashing that the locking code uses to add these values
      to the blocked_hash is simply calculated using the fl_owner field. That's
      valid in most cases except for server-side lockd, which validates the
      owner of a lock based on both fl_owner and fl_pid.
      
      In the case where you have a small number of NFS clients doing a lot
      of locking between different processes, you could end up with all
      the blocked requests sitting in a very small number of hash buckets.
      
      Add a new lm_owner_key operation to the lock_manager_operations that
      will generate an unsigned long to use as the key in the hashtable.
      That function is only implemented for server-side lockd, and simply
      XORs the fl_owner and fl_pid.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3999e493
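
      A minimal sketch of the idea, assuming the new lock_manager_operations
      field is an unsigned long (*lm_owner_key)(struct file_lock *) hook as
      described above; the my_* names are illustrative, not lockd's actual
      symbols.

      #include <linux/fs.h>

      /* fold both identifiers lockd uses for lock ownership into one hash key,
       * so blocked requests from many client processes spread across buckets */
      static unsigned long my_lm_owner_key(struct file_lock *fl)
      {
              return (unsigned long)fl->fl_owner ^ (unsigned long)fl->fl_pid;
      }

      static const struct lock_manager_operations my_lock_ops = {
              .lm_owner_key   = my_lm_owner_key,
              /* .lm_notify, .lm_grant, ... unchanged */
      };
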
    • locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Jeff Layton committed
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1c8c601a
    • Don't pass inode to ->d_hash() and ->d_compare() · da53be12
      Linus Torvalds committed
      Instances either don't look at it at all (the majority of cases) or
      only want it to find the superblock (which can be had as dentry->d_sb).
      A few cases that want more are actually safe with dentry->d_inode -
      the only precaution needed is a check that it hasn't been replaced with
      NULL by rmdir() or by an overwriting rename(), in which case it should
      simply be treated as a cache miss.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      da53be12
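
      A hedged sketch of a case-insensitive instance using the post-change
      prototypes (no inode arguments), and assuming the unsalted name-hash
      helpers of that era (init_name_hash() taking no argument); the myfs_*
      names are illustrative.

      #include <linux/dcache.h>
      #include <linux/ctype.h>

      static int myfs_ci_d_hash(const struct dentry *dentry, struct qstr *q)
      {
              unsigned long hash = init_name_hash();
              unsigned int i;

              for (i = 0; i < q->len; i++)
                      hash = partial_name_hash(tolower(q->name[i]), hash);
              q->hash = end_name_hash(hash);
              return 0;
      }

      static int myfs_ci_d_compare(const struct dentry *parent,
                                   const struct dentry *dentry,
                                   unsigned int len, const char *str,
                                   const struct qstr *name)
      {
              unsigned int i;

              /* the superblock, if needed, is reachable as dentry->d_sb */
              if (len != name->len)
                      return 1;
              for (i = 0; i < len; i++)
                      if (tolower(str[i]) != tolower(name->name[i]))
                              return 1;
              return 0;
      }

      static const struct dentry_operations myfs_ci_dentry_ops = {
              .d_hash    = myfs_ci_d_hash,
              .d_compare = myfs_ci_d_compare,
      };
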
    • [readdir] ->readdir() is gone · 2233f31a
      Al Viro committed
      everything's converted to ->iterate()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2233f31a
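
      A hedged sketch of what a converted directory looks like, built on the
      dir_context/dir_emit helpers this conversion is based on; the entry name
      and inode number are of course illustrative.

      #include <linux/fs.h>

      static int myfs_iterate(struct file *file, struct dir_context *ctx)
      {
              /* "." and ".." first; position is tracked in ctx->pos */
              if (!dir_emit_dots(file, ctx))
                      return 0;
              if (ctx->pos == 2) {
                      if (!dir_emit(ctx, "hello.txt", 9, 1234 /* ino */, DT_REG))
                              return 0;
                      ctx->pos++;
              }
              return 0;
      }

      static const struct file_operations myfs_dir_operations = {
              .read    = generic_read_dir,
              .iterate = myfs_iterate,
      };
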
  7. 22 May 2013, 1 commit
    • mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner committed
      Currently there is no way to truncate a partial page where the end
      truncation point is not at the end of the page. This is because it was not
      needed and the existing functionality was enough for the filesystem truncate
      operation to work properly. However, more filesystems now support the punch
      hole feature, and they can benefit from mm supporting truncation of a page
      just up to a certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating a partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept the range to be invalidated and updates all of its
      instances.
      
      We also change block_invalidatepage() in the same way and actually
      make use of the new length argument, implementing range invalidation.
      
      Actual file system implementations will follow, except for the file systems
      where the changes are really simple and should not change the behaviour
      in any way.  An implementation of truncate_page_range() which will be able
      to accept page-unaligned ranges will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
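
      A hedged sketch of an instance after the prototype change, assuming it
      now takes (page, offset, length) and that block_invalidatepage() was
      updated the same way, as the commit says; myfs_* is illustrative.

      #include <linux/buffer_head.h>
      #include <linux/mm.h>

      static void myfs_invalidatepage(struct page *page, unsigned int offset,
                                      unsigned int length)
      {
              /* punch-hole style: only [offset, offset + length) of the page is
               * being invalidated, not necessarily everything up to PAGE_SIZE */
              block_invalidatepage(page, offset, length);
      }

      static const struct address_space_operations myfs_aops = {
              /* .readpage, .writepage, ... */
              .invalidatepage = myfs_invalidatepage,
      };
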
  8. 26 February 2013, 1 commit
    • vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op · ecf3d1f1
      Jeff Layton committed
      The following set of operations on an NFS client and server will cause the
      error shown below:
      
          server# mkdir a
          client# cd a
          server# mv a a.bak
          client# sleep 30  # (or whatever the dir attrcache timeout is)
          client# stat .
          stat: cannot stat `.': Stale NFS file handle
      
      Obviously, we should not be getting an ESTALE error back there since the
      inode still exists on the server. The problem is that the lookup code
      will call d_revalidate on the dentry that "." refers to, because NFS has
      FS_REVAL_DOT set.
      
      nfs_lookup_revalidate will see that the parent directory has changed and
      will try to reverify the dentry by redoing a LOOKUP. That of course
      fails, so the lookup code returns ESTALE.
      
      The problem here is that d_revalidate is really a bad fit for this case.
      What we really want to know at this point is whether the inode is still
      good or not, but we don't really care what name it goes by or whether
      the dcache is still valid.
      
      Add a new d_op->d_weak_revalidate operation and have complete_walk call
      that instead of d_revalidate. The intent there is to allow for a
      "weaker" d_revalidate that just checks to see whether the inode is still
      good. This also gives us an opportunity to kill off the FS_REVAL_DOT
      special casing.
      
      [AV: changed method name, added note in porting, fixed confusion re
      having it possibly called from RCU mode (it won't be)]
      
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ecf3d1f1
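
      A hedged sketch of the new hook, assuming it takes the same
      (dentry, lookup flags) arguments as d_revalidate and, per the note above,
      is never called in RCU-walk mode; myfs_inode_still_good() is an
      illustrative stand-in for whatever attribute check the filesystem already
      has.

      #include <linux/dcache.h>

      /* illustrative: e.g. a GETATTR on the inode; 0 = stale, 1 = still fine */
      int myfs_inode_still_good(struct inode *inode);

      static int myfs_d_weak_revalidate(struct dentry *dentry, unsigned int flags)
      {
              if (!dentry->d_inode)
                      return 0;
              /* only ask "is the inode still good?"; do not re-verify the name
               * the way ->d_revalidate does */
              return myfs_inode_still_good(dentry->d_inode);
      }

      static const struct dentry_operations myfs_dops = {
              /* .d_revalidate = ...  (the full check stays for normal lookups) */
              .d_weak_revalidate = myfs_d_weak_revalidate,
      };
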
  9. 21 December 2012, 1 commit
  10. 04 August 2012, 1 commit
  11. 02 August 2012, 1 commit
  12. 01 August 2012, 1 commit
    • mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages · 62c230bc
      Mel Gorman committed
      Currently swapfiles are managed entirely by the core VM by using ->bmap to
      allocate space and write to the blocks directly.  This effectively ensures
      that the underlying blocks are allocated and avoids the need for the swap
      subsystem to locate what physical blocks store offsets within a file.
      
      If the swap subsystem is to use the filesystem information to locate the
      blocks, it is critical that information such as block groups, block
      bitmaps and the block descriptor table that map the swap file be
      resident in memory.  This patch adds address_space_operations that the VM
      can call when activating or deactivating swap backed by a file.
      
        int swap_activate(struct file *);
        int swap_deactivate(struct file *);
      
      The ->swap_activate() method is used to communicate to the file that the
      VM relies on it, and the address_space should take adequate measures such
      as reserving space in the underlying device, reserving memory for mempools
      and pinning information such as the block descriptor table in memory.  The
      ->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
      returned success.
      
      After a successful swapfile ->swap_activate, the swapfile is marked
      SWP_FILE and swapper_space.a_ops will proxy to
      sis->swap_file->f_mapping->a_ops, using ->direct_IO to write swapcache
      pages and ->readpage to read them.
      
      It is perfectly possible for direct_IO to be used to read the swap pages,
      but it is an unnecessary complication.  Similarly, it is possible that
      ->writepage could be used instead of direct_IO to write the pages, but
      filesystem developers have stated that calling writepage from the VM is
      undesirable for a variety of reasons, and using direct_IO opens up the
      possibility of writing back batches of swap pages in the future.
      
      [a.p.zijlstra@chello.nl: Original patch]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62c230bc
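
      A hedged sketch of the wiring, using the prototypes exactly as quoted in
      the commit message above (the signatures that finally landed may carry
      extra arguments); the bodies only indicate intent.

      #include <linux/fs.h>

      static int myfs_swap_activate(struct file *file)
      {
              /* pin the block mapping / reserve space up front, so writing swap
               * pages later never has to allocate or read fs metadata */
              return 0;
      }

      static int myfs_swap_deactivate(struct file *file)
      {
              /* undo whatever swap_activate reserved; called from sys_swapoff()
               * if activation succeeded */
              return 0;
      }

      static const struct address_space_operations myfs_swap_aops = {
              /* .direct_IO and .readpage are what the swap code will proxy to */
              .swap_activate   = myfs_swap_activate,
              .swap_deactivate = myfs_swap_deactivate,
      };
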
  13. 31 July 2012, 1 commit
  14. 14 July 2012, 7 commits
    • don't pass nameidata to ->create() · ebfc3b49
      Al Viro committed
      boolean "does it have to be exclusive?" flag is passed instead;
      Local filesystem should just ignore it - the object is guaranteed
      not to be there yet.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ebfc3b49
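
      A hedged sketch of an instance after the change, assuming the resulting
      prototype is int (*create)(struct inode *, struct dentry *, umode_t,
      bool excl); myfs_new_inode() is an illustrative helper.

      #include <linux/fs.h>
      #include <linux/err.h>

      /* illustrative: allocate and initialise an on-disk inode */
      struct inode *myfs_new_inode(struct inode *dir, umode_t mode);

      static int myfs_create(struct inode *dir, struct dentry *dentry,
                             umode_t mode, bool excl)
      {
              struct inode *inode;

              /* "excl" replaces the old nameidata; a local filesystem can ignore
               * it, since the VFS guarantees the name does not exist yet */
              inode = myfs_new_inode(dir, mode);
              if (IS_ERR(inode))
                      return PTR_ERR(inode);
              d_instantiate(dentry, inode);
              return 0;
      }
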
    • stop passing nameidata to ->lookup() · 00cd8dd3
      Al Viro committed
      Just the flags; only NFS cares even about that, but there are
      legitimate uses for such an argument.  And getting rid of that
      completely would require splitting ->lookup() into (at least) a couple
      of methods, so let's leave that alone for now...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      00cd8dd3
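
      A hedged sketch of the resulting method, assuming the prototype becomes
      struct dentry *(*lookup)(struct inode *, struct dentry *, unsigned int
      flags); myfs_find_inode() is an illustrative helper.

      #include <linux/fs.h>
      #include <linux/dcache.h>

      /* illustrative: resolve the name in dir to an inode, or NULL if absent */
      struct inode *myfs_find_inode(struct inode *dir, const struct qstr *name);

      static struct dentry *myfs_lookup(struct inode *dir, struct dentry *dentry,
                                        unsigned int flags)
      {
              /* only the LOOKUP_* flags come in now, no nameidata */
              struct inode *inode = myfs_find_inode(dir, &dentry->d_name);

              return d_splice_alias(inode, dentry);   /* NULL => negative dentry */
      }
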
    • stop passing nameidata * to ->d_revalidate() · 0b728e19
      Al Viro committed
      Just the lookup flags.  Die, bastard, die...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      0b728e19
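
      A hedged sketch of the shape after the change, assuming
      int (*d_revalidate)(struct dentry *, unsigned int flags); the body just
      shows the usual LOOKUP_RCU bail-out.

      #include <linux/dcache.h>
      #include <linux/namei.h>
      #include <linux/errno.h>

      static int myfs_d_revalidate(struct dentry *dentry, unsigned int flags)
      {
              if (flags & LOOKUP_RCU)
                      return -ECHILD;         /* can't sleep during RCU-walk */
              /* ... re-check the dentry against the backing store here ... */
              return 1;                       /* still valid */
      }
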
    • kill struct opendata · 30d90494
      Al Viro committed
      Just pass struct file *.  Methods are happier that way...
      There's no need to return struct file * from finish_open() now,
      so let it return int.  Next: saner prototypes for parts in
      namei.c
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      30d90494
    • make ->atomic_open() return int · d9585277
      Al Viro committed
      Change of calling conventions:
      old		new
      NULL		1
      file		0
      ERR_PTR(-ve)	-ve
      
      Caller *knows* that struct file *; no need to return it.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d9585277
    • ->atomic_open() prototype change - pass int * instead of bool * · 47237687
      Al Viro committed
      ... and let finish_open() report having opened the file via that sucker.
      Next step: don't modify od->filp at all.
      
      [AV: FILE_CREATE was already used by cifs; Miklos' fix folded]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      47237687
    • vfs: add i_op->atomic_open() · d18e9008
      Miklos Szeredi committed
      Add a new inode operation which is called on the last component of an open.
      Using this, the filesystem can look up, possibly create, and open the file in
      one atomic operation.  If it cannot do so (e.g. the file type turned out to
      be wrong), it may signal this by returning NULL instead of an open struct
      file pointer.
      
      i_op->atomic_open() is only called if the last component is negative or needs
      lookup.  Handling cached positive dentries here doesn't add much value: these
      can be opened using f_op->open().  If the cached file turns out to be invalid,
      the open can be retried, this time using ->atomic_open() with a fresh dentry.
      
      For now leave the old way of using open intents in lookup and revalidate in
      place.  This will be removed once all the users are converted.
      
      David Howells noticed that if ->atomic_open() opens the file but does not create
      it, handle_truncate() will be called on it even if it is not a regular file.
      Fix this by checking the file type in this case too.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d18e9008
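
      A hedged sketch of an ->atomic_open() instance in its settled form, i.e.
      reflecting the prototype changes listed in the entries above (struct file
      passed in, int returned, creation reported through *opened); the myfs_*
      helpers are illustrative, and the finish_open()/finish_no_open() details
      follow my reading of that kernel generation, so treat them as assumptions.

      #include <linux/fs.h>
      #include <linux/err.h>

      /* illustrative helpers */
      struct inode *myfs_lookup_inode(struct inode *dir, struct dentry *dentry);
      struct inode *myfs_make_inode(struct inode *dir, struct dentry *dentry,
                                    umode_t mode);
      int myfs_file_open(struct inode *inode, struct file *file);

      static int myfs_atomic_open(struct inode *dir, struct dentry *dentry,
                                  struct file *file, unsigned open_flags,
                                  umode_t create_mode, int *opened)
      {
              struct inode *inode = myfs_lookup_inode(dir, dentry);

              if (!inode && (open_flags & O_CREAT)) {
                      inode = myfs_make_inode(dir, dentry, create_mode);
                      if (IS_ERR(inode))
                              return PTR_ERR(inode);
                      *opened |= FILE_CREATED;
              }
              if (!inode)
                      return finish_no_open(file, NULL);  /* plain negative lookup */
              d_instantiate(dentry, inode);
              /* open it as part of the same operation */
              return finish_open(file, dentry, myfs_file_open, opened);
      }
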
  15. 02 June 2012, 1 commit
    • fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik committed
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time, but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version and a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
      
      I've gone through all of the users of file_update_time and made them check for
      errors, with the exception of the fault code since it's complicated and I wasn't
      quite sure what to do there; also, Jan is going to be pushing the file time
      updates into page_mkwrite for those who have it, so that should satisfy btrfs and
      make it not a big deal to check the file_update_time() return code in the
      generic fault path.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      c3b2da31
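
      A hedged sketch of an ->update_time() instance, assuming an
      (inode, timespec, flags) prototype and the S_ATIME/S_MTIME/S_CTIME/
      S_VERSION flag values the existing time-update code already uses;
      myfs_reserve_metadata_space() is illustrative.

      #include <linux/fs.h>

      /* illustrative: make sure the inode update cannot later fail for space */
      int myfs_reserve_metadata_space(struct inode *inode);

      static int myfs_update_time(struct inode *inode, struct timespec *time,
                                  int flags)
      {
              int ret = myfs_reserve_metadata_space(inode);

              if (ret)
                      return ret;     /* the point of the hook: it may fail */

              if (flags & S_ATIME)
                      inode->i_atime = *time;
              if (flags & S_MTIME)
                      inode->i_mtime = *time;
              if (flags & S_CTIME)
                      inode->i_ctime = *time;
              if (flags & S_VERSION)
                      inode_inc_iversion(inode);
              mark_inode_dirty_sync(inode);
              return 0;
      }
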
  16. 30 May 2012, 1 commit
  17. 07 January 2012, 1 commit
  18. 04 January 2012, 3 commits
  19. 02 November 2011, 1 commit
    • vfs: add d_prune dentry operation · f0023bc6
      Sage Weil committed
      This adds a d_prune dentry operation that is called by the VFS prior to
      pruning (i.e. unhashing and killing) a hashed dentry from the dcache.
      Wrap dentry_lru_del() and use the new _prune() helper in the cases where we
      are about to unhash and kill the dentry.
      
      This will be used by Ceph to maintain a flag indicating whether the
      complete contents of a directory are contained in the dcache, allowing it
      to satisfy lookups and readdir without additional server communication.
      
      Renumber a few DCACHE_* #defines to group DCACHE_OP_PRUNE with the other
      DCACHE_OP_ bits.
      Signed-off-by: Sage Weil <sage@newdream.net>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      f0023bc6
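
      A hedged sketch of the kind of use described (the Ceph "directory fully
      cached" hint), assuming the hook is void (*d_prune)(struct dentry *);
      the flag-clearing helper is illustrative.

      #include <linux/dcache.h>

      /* illustrative: clear a per-directory "all children are in the dcache"
       * flag kept in the fs's inode info */
      void myfs_clear_dir_complete(struct inode *dir);

      static void myfs_d_prune(struct dentry *dentry)
      {
              /* called just before the VFS unhashes and kills this dentry, so
               * the parent can no longer claim its full contents are cached */
              myfs_clear_dir_complete(dentry->d_parent->d_inode);
      }

      static const struct dentry_operations myfs_dir_dentry_ops = {
              .d_prune = myfs_d_prune,
      };
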
  20. 26 July 2011, 1 commit
  21. 21 July 2011, 2 commits
    • fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik committed
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push the taking of i_mutex
      and the calling of filemap_write_and_wait() down into the ->fsync() handlers.
      Some file systems can drop taking the i_mutex altogether, it seems, like ext3
      and ocfs2.  For correctness' sake I just pushed everything down in all cases to
      make sure that we keep the current behavior the same for everybody, and then
      each individual fs maintainer can make up their mind about what to do from
      there.  Thanks,
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      02c24a82
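
      A hedged sketch of a handler after the push-down, assuming the
      range-based prototype of that cycle, int (*fsync)(struct file *, loff_t,
      loff_t, int datasync); myfs_sync_metadata() is illustrative.

      #include <linux/fs.h>

      /* illustrative: force this inode's metadata / journal to disk */
      int myfs_sync_metadata(struct inode *inode, int datasync);

      static int myfs_fsync(struct file *file, loff_t start, loff_t end,
                            int datasync)
      {
              struct inode *inode = file->f_mapping->host;
              int ret;

              /* formerly done by the VFS before calling ->fsync() */
              ret = filemap_write_and_wait_range(file->f_mapping, start, end);
              if (ret)
                      return ret;

              mutex_lock(&inode->i_mutex);    /* also pushed down; drop if not needed */
              ret = myfs_sync_metadata(inode, datasync);
              mutex_unlock(&inode->i_mutex);
              return ret;
      }
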
    • locks: rename lock-manager ops · 8fb47a4f
      J. Bruce Fields committed
      Both the filesystem and the lock manager can associate operations with a
      lock.  Confusingly, one of them (fl_release_private) actually has the
      same name in both operation structures.
      
      It would save some confusion to give the lock-manager ops different
      names.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      8fb47a4f
  22. 20 July 2011, 1 commit
  23. 27 May 2011, 1 commit
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig committed
      Tell the filesystem if we just updated the timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally whether it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove the incorrect comments that ->dirty_inode can't block.  That
      was changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      aa385729
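
      A hedged sketch, assuming the new prototype is
      void (*dirty_inode)(struct inode *, int flags) with the existing
      I_DIRTY_SYNC value; the transaction call is illustrative.

      #include <linux/fs.h>

      /* illustrative: log the inode in the fs's own transaction machinery */
      void myfs_log_inode(struct inode *inode, bool timestamps_only);

      static void myfs_dirty_inode(struct inode *inode, int flags)
      {
              /* I_DIRTY_SYNC alone means "only timestamps changed": remember
               * that fdatasync() need not force out a transaction for this */
              bool timestamps_only = (flags == I_DIRTY_SYNC);

              /* blocking here is fine; the old "must not block" comment was stale */
              myfs_log_inode(inode, timestamps_only);
      }

      static const struct super_operations myfs_super_ops = {
              .dirty_inode = myfs_dirty_inode,
              /* ... */
      };
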
  24. 25 March 2011, 1 commit
    • fs: remove inode_lock from iput_final and prune_icache · f283c86a
      Dave Chinner committed
      Now that inode state changes are protected by the inode->i_lock and
      the inode LRU manipulations by the inode_lru_lock, we can remove the
      inode_lock from prune_icache and the initial part of iput_final().
      
      Instead of using the inode_lock to protect the inode during
      iput_final, use the inode->i_lock.  This protects the inode
      against new references being taken while we change the inode state
      to I_FREEING, as well as preventing prune_icache from grabbing the
      inode while we are manipulating it.  Hence we no longer need the
      inode_lock in iput_final prior to setting I_FREEING on the inode.
      
      For prune_icache, we no longer need the inode_lock to protect the
      LRU list, and the inodes themselves are protected against freeing
      races by the inode->i_lock. Hence we can lift the inode_lock from
      prune_icache as well.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f283c86a
  25. 17 March 2011, 1 commit
  26. 17 January 2011, 1 commit
    • fallocate should be a file operation · 2fe17c10
      Christoph Hellwig committed
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forces a commit.  Both of these are suboptimal - in the case of
      O_SYNC I/O we really want our allocation on disk, especially for the
      !KEEP_SIZE case where we actually grow the file with user-visible zeroes.
      On the other hand, always committing the transaction is a bad idea for
      fast-path uses of fallocate like for example in recent Samba versions.
      Given that block allocation is a data plane operation anyway, change it
      from an inode operation to a file operation so that we have the file
      structure available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and removing the already unneeded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2fe17c10
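
      A hedged sketch of why the struct file matters here, assuming the
      file_operations prototype long (*fallocate)(struct file *, int mode,
      loff_t offset, loff_t len); the allocation and commit helpers are
      illustrative.

      #include <linux/fs.h>
      #include <linux/falloc.h>

      /* illustrative helpers */
      int myfs_alloc_blocks(struct inode *inode, loff_t offset, loff_t len,
                            int mode);
      int myfs_commit_transaction(struct inode *inode);

      static long myfs_fallocate(struct file *file, int mode, loff_t offset,
                                 loff_t len)
      {
              struct inode *inode = file->f_mapping->host;
              int ret;

              if (mode & ~FALLOC_FL_KEEP_SIZE)
                      return -EOPNOTSUPP;

              ret = myfs_alloc_blocks(inode, offset, len, mode);
              if (ret)
                      return ret;

              /* what the old inode-op version could not see: O_SYNC on the file */
              if (file->f_flags & O_SYNC)
                      ret = myfs_commit_transaction(inode);
              return ret;
      }
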
  27. 16 January 2011, 1 commit
    • Allow d_manage() to be used in RCU-walk mode · ab90911f
      David Howells committed
      Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
      as when it is in Ref-walk mode.  This permits __follow_mount_rcu() to call
      d_manage() directly.  d_manage() needs a parameter to indicate that it is in
      RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
      -ECHILD instead).
      
      autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
      accesses it, and otherwise to request dropping back to Ref-walk mode.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ab90911f
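
      A hedged sketch of the pattern described, assuming d_manage() gains a
      flag telling it whether the walk is currently in RCU-walk mode (the exact
      parameter layout varies between kernel versions); the daemon check is
      illustrative.

      #include <linux/dcache.h>
      #include <linux/errno.h>
      #include <linux/types.h>

      /* illustrative: is the current task the fs's own control daemon? */
      bool myfs_is_daemon(const struct dentry *dentry);

      static int myfs_d_manage(struct dentry *dentry, bool rcu_walk)
      {
              if (rcu_walk) {
                      /* not allowed to sleep: either carry on in RCU mode (for
                       * the daemon) or ask the walker to drop to Ref-walk */
                      return myfs_is_daemon(dentry) ? 0 : -ECHILD;
              }
              /* Ref-walk mode: blocking waits (e.g. for expiry) are allowed */
              return 0;
      }

      static const struct dentry_operations myfs_managed_dentry_ops = {
              .d_manage = myfs_d_manage,
      };
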