Commits · b686d1f79acb65c6a34473c15fcfa2ee54aed8e2 · openeuler / raspberrypi-kernel

25 Aug, 2012 · 5 commits

xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents · b686d1f7

Committed by Jeff Liu on Aug 21, 2012

Refine xfs_seek_hole() to search the page cache for holes in unwritten extents.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: xfs_seek_data() refinement with unwritten extents check up from page cache · 52f1acc8

Committed by Jeff Liu on Aug 21, 2012

Refine xfs_seek_data() to check the page cache for data in unwritten extents.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
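
From user space, the seek-data and seek-hole paths touched by these commits are reached through lseek() with SEEK_DATA and SEEK_HOLE. The following is a minimal sketch of a caller, not code from the patches; the helper names are made up for illustration, and the EINVAL fallback is an assumption about how a portable caller might behave on filesystems without support.

```c
#define _GNU_SOURCE            /* asks glibc to expose SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3            /* Linux values, used if the headers omit them */
#define SEEK_HOLE 4
#endif

/*
 * Offset of the first data byte at or after `start`.
 * On EINVAL (no SEEK_DATA support) the whole file is treated as data.
 * Returns -1 on other errors, including ENXIO when no data follows `start`.
 */
static off_t next_data(int fd, off_t start)
{
    off_t pos = lseek(fd, start, SEEK_DATA);

    if (pos == (off_t)-1 && errno == EINVAL)
        return start;
    return pos;
}

/*
 * Offset of the first hole at or after `start`. Every file ends in an
 * implicit hole, so the fallback is simply the file size.
 */
static off_t next_hole(int fd, off_t start)
{
    off_t pos = lseek(fd, start, SEEK_HOLE);

    if (pos == (off_t)-1 && errno == EINVAL)
        return lseek(fd, 0, SEEK_END);
    return pos;
}

/* Print the first data extent of the file as a half-open byte range. */
static void print_first_extent(int fd)
{
    off_t data = next_data(fd, 0);

    if (data == (off_t)-1) {
        puts("no data");
        return;
    }
    printf("data: [%lld, %lld)\n", (long long)data, (long long)next_hole(fd, data));
}
```

A sparse-aware copy tool would alternate next_data() and next_hole() to skip unallocated ranges. Data that is still only in the page cache over an unwritten extent is the case these XFS commits are concerned with.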

xfs: Introduce a helper routine to probe data or hole offset from page cache · d126d43f

Committed by Jeff Liu on Aug 21, 2012

Introduce helpers to probe the data or hole offset from the page cache.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole() · 834ab122

Committed by Jeff Liu on Aug 21, 2012

The type is already indicated by the function naming explicitly, so this argument
can be omitted from those calls.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: fix race while discarding buffers [V4] · e599b325

Committed by Carlos Maiolino on Aug 10, 2012

While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
buffers from the lru list), xfs_buf_stale() can race with it and remove buffers
from the dispose list before xfs_buftarg_shrink() does.

This happens because xfs_buftarg_shrink() handles the dispose list without
locking, and the test condition in xfs_buf_stale() checks for the buffer being in
*any* list:

if (!list_empty(&bp->b_lru))

If the buffer happens to be on the dispose list, the buffer counter of the
lru list (btp->bt_lru_nr) is decremented twice (once in xfs_buftarg_shrink()
and once in xfs_buf_stale()), corrupting the accounting of the lru list.

This may cause xfs_buftarg_shrink() to return a wrong value to the memory
shrinker shrink_slab(). The accounting error may also cause an underflowed
value to be returned: since the counter is lower than the actual number of
items in the lru list, a decrement may happen when the counter is 0, causing
an underflow on the counter.

The fix uses a new flag field (and a new buffer flag) to serialize buffer
handling during the shrink process. The new flag field has been designed to use
btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.

dchinner, sandeen, aquini and aris also deserve credits for this.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
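
The double decrement and its cure can be modelled in user space. This is a simplified sketch under stated assumptions, not the kernel patch: the struct layouts, the `disposing` flag name, and the spinlock built on atomic_flag are all hypothetical stand-ins for btp->bt_lru_lock, btp->bt_lru_nr and the new buffer flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects named in the commit. */
struct buftarg {
    atomic_flag lru_lock;   /* models btp->bt_lru_lock */
    unsigned    lru_nr;     /* models btp->bt_lru_nr */
};

struct buf {
    bool on_list;           /* linked on *some* list (lru or dispose) */
    bool disposing;         /* hypothetical flag: the shrinker owns the buffer */
};

static void lru_lock(struct buftarg *t)
{
    while (atomic_flag_test_and_set(&t->lru_lock)) {
        /* spin until the lock is released */
    }
}

static void lru_unlock(struct buftarg *t)
{
    atomic_flag_clear(&t->lru_lock);
}

/* Shrinker side: move a buffer from the lru to a private dispose list. */
static void shrink_isolate(struct buftarg *t, struct buf *b)
{
    lru_lock(t);
    if (b->on_list && !b->disposing) {
        b->disposing = true;   /* still on a list, but no longer on the lru */
        t->lru_nr--;
    }
    lru_unlock(t);
}

/* Stale side: drop the buffer from the lru unless the shrinker owns it. */
static void buf_stale(struct buftarg *t, struct buf *b)
{
    lru_lock(t);
    if (b->on_list && !b->disposing) {   /* the racy test was only: on_list */
        b->on_list = false;
        t->lru_nr--;
    }
    lru_unlock(t);
}
```

Because both paths test and set the flag under the same lock, a buffer that has already been isolated is counted out of the lru exactly once.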

17 Aug, 2012 · 4 commits

xfs: check for possible overflow in xfs_ioc_trim · 643bfc06

Committed by Tomas Racek on Aug 14, 2012

If range.start or range.minlen is larger than the filesystem size, return
an invalid-value error. This fixes a possible overflow in the BTOBB macro when
the passed value is close to ULLONG_MAX.
Signed-off-by: Tomas Racek <tracek@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
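
The overflow is easy to reproduce outside the kernel. The sketch below assumes 512-byte basic blocks and a round-up conversion of the same shape as BTOBB; the function names are illustrative, and rejecting a start offset exactly at the end of the filesystem is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BB_SHIFT 9                       /* 512-byte basic blocks */
#define BB_SIZE  (1ULL << BB_SHIFT)

/* Round a byte count up to basic blocks; wraps if bytes is near UINT64_MAX. */
static uint64_t bytes_to_bb(uint64_t bytes)
{
    return (bytes + BB_SIZE - 1) >> BB_SHIFT;
}

/*
 * Validate a trim request against the filesystem size (all values in bytes)
 * before any rounding takes place. Returns 0 or -EINVAL.
 */
static int check_trim_range(uint64_t start, uint64_t minlen, uint64_t fs_bytes)
{
    /* Assumption: a start at or past the end covers nothing, so reject it. */
    if (start >= fs_bytes || minlen > fs_bytes)
        return -EINVAL;
    return 0;
}
```

With the check in front, bytes_to_bb() only ever sees values bounded by the filesystem size, so the addition inside it cannot wrap.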

xfs: unlock the AGI buffer when looping in xfs_dialloc · c4982110

Committed by Christoph Hellwig on Aug 07, 2012

Also update some comments in the area to make the code easier to read.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: kill struct declarations in xfs_mount.h · 1ed845df

Committed by Alex Elder on Aug 01, 2012

I noticed that "struct xfs_mount_args" was still declared in
"fs/xfs/xfs_mount.h".  That struct doesn't even exist any more (and
is obviously not referenced elsewhere in that header file).  While
in there, delete four other unneeded struct declarations in that
file.

Doing so highlights that "fs/xfs/xfs_trace.h" was relying indirectly
on "xfs_mount.h" to be #included in order to declare "struct
xfs_bmbt_irec", so add that declaration to resolve that issue.
Signed-off-by: Alex Elder <elder@inktank.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: fix uninitialised variable in xfs_rtbuf_get() · a76cccbe

Committed by Dave Chinner on Jul 31, 2012

Results in this assert failure in generic/090:

XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363
.....
Call Trace:
 [<ffffffff814680db>] xfs_bmapi_read+0x6b/0x370
 [<ffffffff814b64b2>] xfs_rtbuf_get+0x42/0x130
 [<ffffffff814b6f09>] xfs_rtget_summary+0x89/0x120
 [<ffffffff814b7bfe>] xfs_rtallocate_extent_size+0xce/0x340
 [<ffffffff814b89f0>] xfs_rtallocate_extent+0x240/0x290
 [<ffffffff81462c1a>] xfs_bmap_rtalloc+0x1ba/0x340
 [<ffffffff81463a65>] xfs_bmap_alloc+0x35/0x40
 [<ffffffff8146f111>] xfs_bmapi_allocate+0xf1/0x350
 [<ffffffff8146f9de>] xfs_bmapi_write+0x66e/0xa60
 [<ffffffff8144538a>] xfs_iomap_write_direct+0x22a/0x3f0
 [<ffffffff8143707b>] __xfs_get_blocks+0x38b/0x5d0
 [<ffffffff814372d4>] xfs_get_blocks_direct+0x14/0x20
 [<ffffffff811b0081>] do_blockdev_direct_IO+0xf71/0x1eb0
 [<ffffffff811b1015>] __blockdev_direct_IO+0x55/0x60
 [<ffffffff814355ca>] xfs_vm_direct_IO+0x11a/0x1e0
 [<ffffffff8112d617>] generic_file_direct_write+0xd7/0x1b0
 [<ffffffff8143e16c>] xfs_file_dio_aio_write+0x13c/0x320
 [<ffffffff8143e6f2>] xfs_file_aio_write+0x1c2/0x1d0
 [<ffffffff81174a07>] do_sync_write+0xa7/0xe0
 [<ffffffff81175288>] vfs_write+0xa8/0x160
 [<ffffffff81175702>] sys_pwrite64+0x92/0xb0
 [<ffffffff81b68f69>] system_call_fastpath+0x16/0x1b
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

03 Aug, 2012 · 1 commit

ceph: simplify+fix atomic_open · 5ef50c3b

Committed by Sage Weil on Jul 31, 2012

The initial ->atomic_open op was carried over from the old intent code,
which was incomplete and didn't really work.  Replace it with a fresh
method.  In particular:

 * always attempt to do an atomic open+lookup, both for the create case
   and for lookups of existing files.
 * fix symlink handling by returning 1 to the VFS so that we can follow
   the link to its destination. This fixes a longstanding ceph bug (#2392).
Signed-off-by: Sage Weil <sage@inktank.com>

02 Aug, 2012 · 1 commit

locks: remove unused lm_release_private · 068535f1

Committed by J. Bruce Fields on Aug 01, 2012

In commit 3b6e2723 ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01 Aug, 2012 · 9 commits

nfs: prevent page allocator recursions with swap over NFS. · 192e501b

Committed by Mel Gorman on Jul 31, 2012

GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
just not of any filesystem data.

The problem is that previously NOFS was correct because that avoids
recursion into the NFS code.  With swap-over-NFS, it is no longer correct
as swap IO can lead to this recursion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
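
The relationship between the two allocation contexts can be pictured as a set of permission bits. The sketch below is an illustration only: the bit names and values are invented for this example and are not the kernel's GFP definitions, and may_recurse_into_nfs() is a hypothetical predicate that restates the argument of the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative capability bits; these are not the kernel's actual values. */
enum {
    MAY_WAIT = 1 << 0,   /* the allocation may sleep */
    MAY_IO   = 1 << 1,   /* reclaim may start block I/O, including swap */
    MAY_FS   = 1 << 2,   /* reclaim may call back into filesystem code */
};

enum {
    SKETCH_NOIO   = MAY_WAIT,
    SKETCH_NOFS   = MAY_WAIT | MAY_IO,
    SKETCH_KERNEL = MAY_WAIT | MAY_IO | MAY_FS,
};

/*
 * Can reclaim triggered by this allocation re-enter the NFS client?
 * Filesystem callbacks always can. Once swap lives on NFS, plain I/O can too.
 */
static bool may_recurse_into_nfs(int mask, bool swap_on_nfs)
{
    if (mask & MAY_FS)
        return true;
    return swap_on_nfs && (mask & MAY_IO);
}
```

Under this model the NOFS-style mask is safe only while swap is not on NFS, which is why the stricter NOIO-style mask becomes necessary.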

nfs: enable swap on NFS · a564b8f0

Committed by Mel Gorman on Jul 31, 2012

Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
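
The note about restoring PF_MEMALLOC "in all cases" describes a save, set, restore pattern. Here is a minimal user-space sketch of that pattern, assuming a made-up task structure and an illustrative flag bit; it is not the sunrpc code itself.

```c
#include <assert.h>

#define SKETCH_PF_MEMALLOC 0x0800u   /* illustrative bit standing in for PF_MEMALLOC */

struct task {
    unsigned flags;
};

/* Hypothetical reconnect step; returns 0 or a negative error. */
typedef int (*connect_fn)(void *arg);

/* Example callback that always fails, to exercise the error path. */
static int failing_connect(void *arg)
{
    (void)arg;
    return -1;
}

/*
 * Run `connect` with the memalloc bit set, then put the bit back to
 * whatever it was on entry, on success and on failure alike.
 */
static int connect_under_memalloc(struct task *t, connect_fn connect, void *arg)
{
    unsigned saved = t->flags & SKETCH_PF_MEMALLOC;
    int ret;

    t->flags |= SKETCH_PF_MEMALLOC;
    ret = connect(arg);
    t->flags = (t->flags & ~SKETCH_PF_MEMALLOC) | saved;
    return ret;
}
```

Restoring only the saved bit, rather than clearing it unconditionally, keeps a caller that already ran with the flag set from losing it.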

nfs: disable data cache revalidation for swapfiles · 29418aa4

Committed by Mel Gorman on Jul 31, 2012

The VM does not like PG_private set on PG_swapcache pages.  As suggested
by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
data cache revalidation on swap files, as it does not make sense to have
other clients change the file while it is being used as swap.  This avoids
setting PG_private on swap pages, since there ought to be no further races
with invalidate_inode_pages2() to deal with.

Since we cannot set PG_private we cannot use page->private which is
already used by PG_swapcache pages to store the nfs_page.  Thus augment
the new nfs_page_find_request logic.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nfs: teach the NFS client how to treat PG_swapcache pages · d56b4ddf

Committed by Mel Gorman on Jul 31, 2012

Replace all relevant occurrences of page->index and page->mapping in the
NFS client with the new page_file_index() and page_file_mapping()
functions.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmscan: remove obsolete shrink_control comment · 8e125cd8

Committed by Minchan Kim on Jul 31, 2012

09f363c7 ("vmscan: fix shrinker callback bug in fs/super.c") fixed a
shrinker callback which was returning -1 when nr_to_scan is zero, which
caused excessive slab scanning.  But 635697c6 ("vmscan: fix initial
shrinker size handling") fixed the problem again, so we can freely return
-1 even when nr_to_scan is zero.  So let's revert 09f363c7 because the
comment added in 09f363c7 made an unnecessary rule.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages · 24669e58

Committed by Aneesh Kumar K.V on Jul 31, 2012

Use a mmu_gather instead of a temporary linked list for accumulating pages
when we unmap a hugepage range.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads · 3965c9ae

Committed by Wanpeng Li on Jul 31, 2012

Since per-BDI flusher threads were introduced in 2.6, the pdflush
mechanism is not used any more.  But the old interface exported through
/proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

For backward compatibility, print a warning via printk and return 2 to notify
users that the interface has been removed.
Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nfs: explicitly reject LOCK_MAND flock() requests · ad0fcd4e

Committed by Jeff Layton on Jul 23, 2012

We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly
return -EINVAL if someone requests it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
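
The check described here amounts to a single bit test on the flock() command. The function below is a sketch of that test with a hypothetical name, not the NFS client code; LOCK_MAND is given its Linux value in case the libc headers do not provide it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/file.h>

#ifndef LOCK_MAND
#define LOCK_MAND 32   /* Linux value, defined here if the libc headers omit it */
#endif

/*
 * Screen an flock() command: anything that asks for a mandatory lock is
 * refused with -EINVAL, everything else passes through.
 */
static int check_flock_cmd(int cmd)
{
    if (cmd & LOCK_MAND)
        return -EINVAL;
    return 0;
}
```

Rejecting the request up front tells the caller that the semantics are unavailable instead of silently granting an ordinary advisory lock.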

nfs: increase number of permitted callback connections. · b042414f

Committed by NeilBrown on Jul 31, 2012

By default a sunrpc service is limited to (N+3)*20 connections
where N is the number of threads.  This is 80 when N==1.
If this number is exceeded a warning is printed suggesting that
the number of threads be increased.  However with services which
run a single thread, this is impossible.

For such services there is a ->sv_maxconn setting that can be
used to forcibly increase the limit, and silence the message.
This is used by lockd.

The nfs client uses a sunrpc service to handle callbacks and
it too is single-threaded, so to avoid the useless messages,
and to allow a reasonable number of concurrent connections,
we need to set ->sv_maxconn.  1024 seems like a good number.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
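
The arithmetic in this message fits in one small function. The sketch below uses a hypothetical name, conn_limit(), and treats a zero maxconn as "not set", which is an assumption about how the override is expressed.

```c
#include <assert.h>

/*
 * Connection limit as described above: an explicit maximum wins when it is
 * set (non-zero); otherwise the limit scales with the thread count.
 */
static unsigned conn_limit(unsigned nthreads, unsigned maxconn)
{
    if (maxconn != 0)
        return maxconn;
    return (nthreads + 3) * 20;
}
```

For a single-threaded service the default works out to (1 + 3) * 20 = 80, and conn_limit(1, 1024) yields the 1024 proposed for the callback service.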

31 Jul, 2012 · 20 commits

fs: Remove old freezing mechanism · d9c95bdd

Committed by Jan Kara on Jun 12, 2012

Now that all users are converted, we can remove functions, variables, and
constants defined by the old freezing mechanism.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ext2: Implement freezing · 1e8b212f

Committed by Jan Kara on Jun 12, 2012

The only missing piece to make freezing work reliably with ext2 is to
stop iput() of unlinked inode from deleting the inode on frozen filesystem.
So add a necessary protection to ext2_evict_inode().

We also provide appropriate ->freeze_fs and ->unfreeze_fs functions.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

btrfs: Convert to new freezing mechanism · b2b5ef5c

Committed by Jan Kara on Jun 12, 2012

We convert btrfs_file_aio_write() to use new freeze check.  We also add proper
freeze protection to btrfs_page_mkwrite(). We also add freeze protection to
the transaction mechanism to avoid starting transactions on frozen filesystem.
At minimum this is necessary to stop iput() of an unlinked file from changing a frozen
filesystem during truncation.

Checks in cleaner_kthread() and transaction_kthread() can be safely removed
since btrfs_freeze() will lock the mutexes and thus block the threads (and they
shouldn't have anything to do anyway).

CC: linux-btrfs@vger.kernel.org
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nilfs2: Convert to new freezing mechanism · 2c22b337

Committed by Jan Kara on Jun 12, 2012

We change nilfs_page_mkwrite() to provide proper freeze protection for
writeable page faults (we must wait for frozen filesystem even if the
page is fully mapped).

We remove all vfs_check_frozen() checks since they are now handled by
the generic code.

CC: linux-nilfs@vger.kernel.org
CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ntfs: Convert to new freezing mechanism · fbf8fb76

Committed by Jan Kara on Jun 12, 2012

Move check in ntfs_file_aio_write_nolock() to ntfs_file_aio_write() and
use new freeze protection.

CC: linux-ntfs-dev@lists.sourceforge.net
CC: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fuse: Convert to new freezing mechanism · 58ef6a75

Committed by Jan Kara on Jun 12, 2012

Convert check in fuse_file_aio_write() to using new freeze protection.

CC: fuse-devel@lists.sourceforge.net
CC: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

gfs2: Convert to new freezing mechanism · 39263d5e

Committed by Jan Kara on Jun 12, 2012

We update gfs2_page_mkwrite() to use new freeze protection and the transaction
code to use freeze protection while the transaction is running. That is needed
to stop iput() of unlinked file from modifying the filesystem. The rest is
handled by the generic code.

CC: cluster-devel@redhat.com
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ocfs2: Convert to new freezing mechanism · fef6925c

Committed by Jan Kara on Jun 12, 2012

Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze
protection. We also protect several ioctl entry points which were missing the
protection. Finally, we add freeze protection to the journaling mechanism so
that iput() of unlinked inode cannot modify a frozen filesystem.

CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xfs: Convert to new freezing code · d9457dc0

Committed by Jan Kara on Jun 12, 2012

Generic code now blocks all writers from standard write paths. So we add
blocking of all writers coming from ioctl (we get a protection of ioctl against
racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
non-racy freeze protection. We also keep freeze protection on transaction
start to block internal filesystem writes such as removal of preallocated
blocks.

CC: Ben Myers <bpm@sgi.com>
CC: Alex Elder <elder@kernel.org>
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ext4: Convert to new freezing mechanism · 8e8ad8a5

Committed by Jan Kara on Jun 12, 2012

We remove most of frozen checks since upper layer takes care of blocking all
writes. We have to handle protection in ext4_page_mkwrite() in a special way
because we cannot use generic block_page_mkwrite(). Also we add a freeze
protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify
a frozen filesystem (we cannot easily instrument ext4_journal_start() /
ext4_journal_stop() with freeze protection because we are missing the
superblock pointer in ext4_journal_stop() in nojournal mode).

CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: Protect write paths by sb_start_write - sb_end_write · 14da9200

Committed by Jan Kara on Jun 12, 2012

There are several entry points which dirty pages in a filesystem.  mmap
(handled by block_page_mkwrite()), buffered write (handled by
__generic_file_aio_write()), splice write (generic_file_splice_write),
truncate, and fallocate (these can dirty last partial page - handled inside
each filesystem separately). Protect these places with sb_start_write() and
sb_end_write().

->page_mkwrite() calls are particularly complex since they are called with
mmap_sem held and thus we cannot use standard sb_start_write() due to lock
ordering constraints. We solve the problem by using a special freeze protection
sb_start_pagefault() which ranks below mmap_sem.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: Skip atime update on frozen filesystem · 5d37e9e6

Committed by Jan Kara on Jun 12, 2012

It is unexpected to block reading of frozen filesystem because of atime update.
Also handling blocking on frozen filesystem because of atime update would make
locking more complex than it already is. So just skip atime update when
filesystem is frozen like we skip it when filesystem is remounted read-only.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
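
The rule stated here reduces to a two-condition predicate. The sketch below uses an invented sb_state structure and function name to show it; it is a model of the decision, not the VFS implementation.

```c
#include <assert.h>
#include <stdbool.h>

struct sb_state {
    bool read_only;   /* filesystem is mounted or remounted read-only */
    bool frozen;      /* filesystem is currently frozen */
};

/* An atime update is a write; skip it instead of blocking the reader. */
static bool should_update_atime(const struct sb_state *sb)
{
    return !sb->read_only && !sb->frozen;
}
```

Treating a frozen filesystem like a read-only one keeps reads non-blocking at the cost of a slightly stale access time.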

fs: Add freezing handling to mnt_want_write() / mnt_drop_write() · eb04c282

Committed by Jan Kara on Jun 12, 2012

Most of the places where we want freeze protection coincide with the places where
we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternatives) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection,
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: Improve filesystem freezing handling · 5accdf82

Committed by Jan Kara on Jun 12, 2012

vfs_check_frozen() tests are racy since the filesystem can be frozen just after
the test is performed. Thus in write paths we can end up marking some pages or
inodes dirty even though the file system is already frozen. This creates
problems with flusher thread hanging on frozen filesystem.

Another problem is that exclusion between ->page_mkwrite() and filesystem
freezing has been handled by setting page dirty and then verifying s_frozen.
This guaranteed that either the freezing code sees the faulted page, writes it,
and writeprotects it again or we see s_frozen set and bail out of page fault.
This works to protect from page being marked writeable while filesystem
freezing is running but has an unpleasant artefact of leaving dirty (although
unmodified and writeprotected) pages on frozen filesystem resulting in similar
problems with flusher thread as the first problem.

This patch aims at providing exclusion between write paths and filesystem
freezing. We implement a writer-freeze read-write semaphore in the superblock.
Actually, there are three such semaphores because of lock ranking reasons - one
for page fault handlers (->page_mkwrite), one for all other writers, and one for
internal filesystem purposes (used e.g. to track running transactions).  Write
paths which should block freezing (e.g. directory operations, ->aio_write(),
->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem
takes the writer side.

However, we don't really want to bounce cachelines of the semaphores between
CPUs for each write happening. So we implement the reader side of the semaphore
as a per-cpu counter and the writer side is implemented using s_writers.frozen
superblock field.

[AV: microoptimize sb_start_write(); we want it fast in normal case]

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
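
The scheme of a counter on the reader side and a `frozen` field on the writer side can be modelled in a few lines. This is a single-threaded sketch with invented names: plain long counters stand in for the per-CPU counters, the try-style functions return false where the real code would sleep, and memory barriers are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze levels, ordered the way the commit ranks the three "semaphores". */
enum freeze_level {
    UNFROZEN = 0,
    FREEZE_WRITE,       /* ordinary writers */
    FREEZE_PAGEFAULT,   /* ->page_mkwrite handlers */
    FREEZE_FS,          /* internal filesystem work, e.g. transactions */
    FREEZE_LEVELS
};

struct sb_writers {
    int  frozen;                  /* highest level currently blocked */
    long active[FREEZE_LEVELS];   /* stand-in for the per-CPU reader counters */
};

/* Reader side: announce the writer first, then check for a freeze. */
static bool sb_try_start(struct sb_writers *w, enum freeze_level level)
{
    w->active[level]++;
    if (w->frozen >= (int)level) {
        w->active[level]--;       /* back out; the real code would wait here */
        return false;
    }
    return true;
}

static void sb_end(struct sb_writers *w, enum freeze_level level)
{
    assert(w->active[level] > 0);
    w->active[level]--;
}

/* Writer side: block new entries at `level`, then report whether it drained. */
static bool try_freeze_level(struct sb_writers *w, enum freeze_level level)
{
    w->frozen = level;
    return w->active[level] == 0;
}
```

A freeze walks the levels in order, moving to the next one only once the current level has drained, so page faults and internal transactions can still complete while ordinary writers are already blocked.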

ceph: define snap counts as u32 everywhere · aa711ee3

Committed by Alex Elder on Jul 13, 2012

There are two structures in which a count of snapshots is
maintained:

    struct ceph_snap_context {
        ...
        u32 num_snaps;
        ...
    }
and
    struct ceph_snap_realm {
        ...
        u32 num_prior_parent_snaps;   /* had prior to parent_since */
        ...
        u32 num_snaps;
        ...
    }

These fields never take on negative values (e.g., to hold special
meaning), and so are really inherently unsigned.  Furthermore they
take their value from over-the-wire or on-disk formatted 32-bit
values.

So change their definition to have type u32, and change some spots
elsewhere in the code to account for this change.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ceph: fix potential double free · 21ec6ffa

Committed by Alan Cox on Jul 20, 2012

We re-run the loop but we don't reset the attrs pointer to NULL.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
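
The hazard is a retry path that frees a buffer and then reaches a common cleanup that frees it again. The sketch below reproduces the shape of such a loop in user space, with the reset in place; the function and variable names are illustrative and the control flow is a simplification, not the ceph code.

```c
#include <assert.h>
#include <stdlib.h>

static int free_calls;   /* counts releases so the sketch can be checked */

static void release(void *p)
{
    if (p) {
        free(p);
        free_calls++;
    }
}

static int rebuild(void)
{
    char *attrs = NULL;
    int pass = 0;

again:
    if (pass == 0) {
        /* Only the first pass needs a scratch buffer in this sketch. */
        attrs = malloc(16);
        if (!attrs)
            return -1;
    }

    if (pass++ == 0) {
        /* Pretend we lost a race: drop the buffer and start over. */
        release(attrs);
        attrs = NULL;   /* the fix: without this, cleanup frees it again */
        goto again;
    }

    release(attrs);     /* common cleanup, a no-op when attrs is NULL */
    return 0;
}
```

With the reset, the second pass reaches the cleanup holding NULL, so the buffer is released exactly once.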

ceph: close old con before reopening on mds reconnect · a53aab64

Committed by Sage Weil on Jul 30, 2012

When we detect a mds session reset, close the old ceph_connection before
reopening it.  This ensures we clean up the old socket properly and keep
the ceph_connection state correct.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

c/r: fcntl: add F_GETOWNER_UIDS option · 1d151c33

Committed by Cyrill Gorcunov on Jul 30, 2012

When we restore file descriptors we would like them to look exactly as
they were at dumping time.

With the help of fcntl this is almost possible; the missing piece is the file
owner's UIDs.

To be able to read their values, F_GETOWNER_UIDS is introduced.

This option is valid only if CONFIG_CHECKPOINT_RESTORE is turned on; otherwise
fcntl returns -EINVAL.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
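
A checkpoint tool would call the new command roughly as follows. This is a hedged sketch: get_owner_uids() is a hypothetical wrapper, the two-element uid_t array is assumed to receive the owner's real and effective UID, and the command number is defined locally in case the libc headers do not carry it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

#ifndef F_GETOWNER_UIDS
#define F_GETOWNER_UIDS 17   /* value from the Linux UAPI headers */
#endif

/*
 * Read the UIDs recorded for the file owner (as set via F_SETOWN).
 * Returns 0 on success or a negative errno, e.g. -EINVAL on kernels
 * built without CONFIG_CHECKPOINT_RESTORE.
 */
static int get_owner_uids(int fd, uid_t uids[2])
{
    if (fcntl(fd, F_GETOWNER_UIDS, uids) == -1)
        return -errno;
    return 0;
}
```

Callers should treat -EINVAL as "not supported on this kernel" rather than as a hard failure.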

fs: cachefiles: add support for large files in filesystem caching · 98c350cd

Committed by Justin Lecher on Jul 30, 2012

Support the caching of large files.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182
Signed-off-by: Justin Lecher <jlec@gentoo.org>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Tested-by: Suresh Jayaraman <sjayaraman@suse.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: do not allow negative offsets on /proc/<pid>/environ · bc452b4b

Committed by Djalal Harouni on Jul 30, 2012

__mem_open() which is called by both /proc/<pid>/environ and
/proc/<pid>/mem ->open() handlers will allow the use of negative offsets.
/proc/<pid>/mem has negative offsets but not /proc/<pid>/environ.

Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open()
to allow negative offsets only on /proc/<pid>/mem.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
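
The effect of the flag can be shown with a small offset check. The sketch below is a model with an invented function name; the boolean parameter stands in for a per-file FMODE_UNSIGNED_OFFSET setting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a seek target. A file that opts into unsigned offsets may use the
 * full 64-bit range, so values that look negative as int64_t are accepted.
 * Every other file rejects them with -EINVAL.
 */
static int check_offset(int64_t offset, bool unsigned_offsets)
{
    if (offset < 0 && !unsigned_offsets)
        return -EINVAL;
    return 0;
}
```

In these terms, the commit makes /proc/<pid>/mem the only one of the two files that passes true for the second argument.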