- 14 November 2012, 1 commit
-
-
By David Teigland
When unmounting, gfs2 does a full dlm_unlock operation on every cached lock. This can create a very large amount of work and can take a long time to complete. However, the vast majority of these dlm unlock operations are unnecessary, because after all the unlocks are done, gfs2 leaves the dlm lockspace, which automatically clears the locks of the leaving node without unlocking each one individually. So, gfs2 can skip explicit dlm unlocks and use dlm_release_lockspace to remove the locks implicitly. The one exception is when the lock's lvb is being used. In this case, dlm_unlock is called because it may update the lvb of the resource. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
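A rough kernel-style sketch of that decision (the cached_lock structure and both helper functions are invented for illustration; dlm_unlock(), dlm_release_lockspace() and DLM_LKF_VALBLK are the real DLM API):

```c
#include <linux/dlm.h>

/* Illustrative per-lock state; in gfs2 this role is played by the glock. */
struct cached_lock {
	struct dlm_lksb lksb;	/* lksb.sb_lvbptr != NULL means the LVB is in use */
	void *astarg;
};

static void drop_cached_lock(dlm_lockspace_t *ls, struct cached_lock *cl)
{
	if (cl->lksb.sb_lvbptr) {
		/* The LVB may carry locally updated data, so a real
		 * dlm_unlock() with DLM_LKF_VALBLK is still needed to
		 * write it back to the resource. */
		dlm_unlock(ls, cl->lksb.sb_lkid, DLM_LKF_VALBLK,
			   &cl->lksb, cl->astarg);
		return;
	}
	/* All other locks: no per-lock unlock at all. */
}

static void leave_lockspace(dlm_lockspace_t *ls)
{
	/* Leaving the lockspace clears this node's remaining locks
	 * implicitly, replacing one dlm_unlock() per cached lock. */
	dlm_release_lockspace(ls, 2);
}
```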
-
- 13 November 2012, 4 commits
-
-
By Steven Whitehouse
For filesystems with only a single resource group, we need to be careful that the allocation loop will not end up with a NULL resource group. This fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function was being used instead of gfs2_rgrpd_get_first(). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Bob Peterson
Since we now have a dirty_inode that takes care of manipulating the inode buffer and writing from the inode to the buffer, we can eliminate some unnecessary buffer manipulations in gfs2_unlink_inode that are now redundant. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Bob Peterson
This patch changes the gfs2_dir_add function so that it uses the dirty_inode function (via mark_inode_dirty) rather than manually updating the dinode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Steven Whitehouse
This patch fixes an issue relating to not having enough revokes available when truncating journaled data files. In order to ensure that we do not run out, the truncation is broken into separate pieces if it is large enough. Tested using fsx on a journaled data file. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 07 November 2012, 13 commits
-
-
By Steven Whitehouse
Just like ext3, this works on the root directory and any directory with the +T flag set. Also, just like ext3, any subdirectory created in one of the just-mentioned cases will be allocated to a random resource group (the GFS2 equivalent of a block group). If you are creating a set of directories, each of which will contain a job running on a different node, then by setting +T on the parent directory before creating the subdirectories, each will end up in a different resource group, and thus resource group contention between nodes will be kept to a minimum. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Steven Whitehouse
Rather than using the parent directory's allocation context, this patch allocates the new inode earlier in the process and then uses it to contain all the information required. As a result, we can now use the new inode's own allocation context to allocate it, rather than having to use the parent directory's context. This gives us a lot more flexibility in where the inode is placed on disk. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Steven Whitehouse
This patch uses information gathered by the recent glock statistics patch in order to derive a boolean verdict on the congestion status of a resource group. This is then used when making decisions on which resource group to choose during block allocation. The aim is to avoid resource groups which are heavily contended by other nodes, while still ensuring locality of access wherever possible. Once a reservation has been made in a particular resource group, we continue to use that resource group until a new reservation is required. This should help to ensure that we do not change resource groups too often. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
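One plausible shape of such a verdict, purely as illustration (the real patch derives its decision from the glock statistics in a more elaborate way; the function and both parameters here are assumptions):

```c
#include <linux/types.h>

/*
 * Treat a resource group as congested when DLM round trips for its
 * glock take much longer than the filesystem-wide average, which
 * suggests heavy contention from other nodes.
 */
static bool rgrp_congested(u64 rgrp_srtt, u64 fs_avg_srtt)
{
	return rgrp_srtt > 2 * fs_avg_srtt;
}
```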
-
By Bob Peterson
[Editorial: This is a nit, but has been a minor irritation for a long time:] This patch renames the glops structure member go_xmote_th to go_sync. The functionality is unchanged; it's just for readability. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Bob Peterson
This patch is a rewrite of the function gfs2_rbm_from_block. Rather than looping to find the right bitmap, the code now does a few simple math calculations. I compared the performance of both algorithms side by side and the new algorithm is noticeably faster. Sample instrumentation output from a "fast" machine, milliseconds spent on 5 million calls: Orig: 166, New: 113; a second run: Orig: 189, New: 114. In addition, I ran postmark (on a somewhat slower CPU) before and after the new algorithm was put in place, and postmark showed a decent improvement.
Before the new algorithm: Time: 645 seconds total, 584 seconds of transactions (171 per second). Files: 150087 created (232 per second); creation alone: 100000 files (2083 per second); mixed with transactions: 50087 files (85 per second); 49995 read (85 per second); 49991 appended (85 per second); 150087 deleted (232 per second); deletion alone: 100174 files (7705 per second); mixed with transactions: 49913 files (85 per second). Data: 273.42 megabytes read (434.08 kilobytes per second); 852.13 megabytes written (1.32 megabytes per second).
With the new algorithm: Time: 599 seconds total, 530 seconds of transactions (188 per second). Files: 150087 created (250 per second); creation alone: 100000 files (1886 per second); mixed with transactions: 50087 files (94 per second); 49995 read (94 per second); 49991 appended (94 per second); 150087 deleted (250 per second); deletion alone: 100174 files (6260 per second); mixed with transactions: 49913 files (94 per second). Data: 273.42 megabytes read (467.42 kilobytes per second); 852.13 megabytes written (1.42 megabytes per second).
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
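The speedup comes from replacing a per-bitmap loop with division and remainder. A runnable sketch of the arithmetic (the capacities below are made up; the real code derives them from the on-disk bitmap sizes, where the first bitmap is smaller because it shares its block with the rgrp header):

```c
#include <stdint.h>
#include <stdio.h>

#define FIRST_BITMAP_BLOCKS 960u  /* assumed: bitmap 0 is smaller (rgrp header) */
#define BLOCKS_PER_BITMAP 1024u   /* assumed: capacity of bitmaps 1..n */

/* Locate the bitmap index and offset for a block relative to its rgrp,
 * using arithmetic instead of walking each bitmap in turn. */
static void rbm_from_block(uint64_t rel_block, unsigned *bi, uint32_t *offset)
{
	if (rel_block < FIRST_BITMAP_BLOCKS) {
		*bi = 0;
		*offset = (uint32_t)rel_block;
		return;
	}
	rel_block -= FIRST_BITMAP_BLOCKS;
	*bi = 1 + (unsigned)(rel_block / BLOCKS_PER_BITMAP);
	*offset = (uint32_t)(rel_block % BLOCKS_PER_BITMAP);
}

int main(void)
{
	unsigned bi;
	uint32_t off;

	rbm_from_block(5000, &bi, &off);
	printf("block 5000 -> bitmap %u, offset %u\n", bi, off);
	return 0;
}
```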
-
By Steven Whitehouse
Two of the bug traps here could really be warnings. The others are converted from BUG() to GLOCK_BUG_ON(), since we'll most likely need to know the glock state in order to debug any issues which arise. As a result of this, __dump_glock has to be renamed and is no longer static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Benjamin Marzinski
In gfs2_trans_add_bh(), gfs2 was testing whether there was a bd attached to the buffer without holding the gfs2_log_lock. It was then assuming it would stay attached for the rest of the function. However, with neither the log lock held nor the buffer locked, __gfs2_ail_flush() could detach bd at any time. This patch moves the locking before the test. If there isn't a bd already attached, gfs2 can safely allocate one and attach it before locking. There is no way that the newly allocated bd could be on the ail list, and thus no way for __gfs2_ail_flush() to detach it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
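A user-space analogue of the fixed ordering (pthreads stand in for gfs2_log_lock; the struct and helper names are invented):

```c
#include <pthread.h>
#include <stdlib.h>

/* b->bd can be detached concurrently (the __gfs2_ail_flush() role),
 * so it must only be tested while the lock is held. */
struct buf {
	pthread_mutex_t lock;	/* stands in for gfs2_log_lock */
	void *bd;
};

static void trans_add_buf(struct buf *b)
{
	/* Allocate before locking: a brand-new bd cannot already be on
	 * the ail list, so nothing can detach it out from under us. */
	void *newbd = malloc(64);

	pthread_mutex_lock(&b->lock);
	if (!b->bd && newbd) {	/* the test now happens under the lock */
		b->bd = newbd;
		newbd = NULL;
	}
	/* ... use b->bd while still holding the lock ... */
	pthread_mutex_unlock(&b->lock);

	free(newbd);		/* NULL, or unused because a bd was attached */
}
```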
-
By Benjamin Marzinski
file_accessed() was being called by gfs2_mmap() with a shared glock. If it needed to update the atime, it was crashing because it dirtied the inode in gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode() checked if the caller was already holding a glock, but it didn't make sure that the glock was in the exclusive state. Now, instead of calling file_accessed() while holding the shared lock in gfs2_mmap(), file_accessed() is called after grabbing and releasing the glock to update the inode. If file_accessed() needs to update the atime, it will grab an exclusive lock in gfs2_dirty_inode(). gfs2_dirty_inode() now also checks to make sure that if the calling process has already locked the glock, it has an exclusive lock. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Lukas Czerner
The current implementation in gfs2 treats the FITRIM arguments as if they were in file system block units, which is wrong. The FITRIM arguments (fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are actually in bytes. Moreover, checks for the start argument being beyond the end of the file system, the len argument being smaller than one file system block, and the minlen argument being bigger than the biggest resource group were missing. This commit converts the FITRIM arguments to file system blocks and also adds the appropriate checks mentioned above. All the problems were recognised by xfstests 251 and 260. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
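A runnable sketch of the byte-to-block conversion and the three added checks (the struct and helper are illustrative stand-ins for the kernel code):

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct fstrim_range_like { uint64_t start, len, minlen; };	/* all in bytes */

static int fitrim_check(const struct fstrim_range_like *r,
			unsigned bsize_shift,	/* log2(block size) */
			uint64_t fs_blocks,	/* file system size in blocks */
			uint64_t max_rgrp_blocks)
{
	uint64_t start = r->start >> bsize_shift;	/* bytes -> blocks */
	uint64_t minlen = r->minlen >> bsize_shift;

	if (start >= fs_blocks)
		return -EINVAL;		/* start beyond end of file system */
	if (r->len < (1ull << bsize_shift))
		return -EINVAL;		/* range smaller than one block */
	if (minlen > max_rgrp_blocks)
		return -EINVAL;		/* larger than the biggest rgrp */
	return 0;
}

int main(void)
{
	struct fstrim_range_like r = { .start = 0, .len = 1u << 20, .minlen = 4096 };

	/* 4 KiB blocks, a 1M-block file system, 64K-block rgrps */
	printf("%d\n", fitrim_check(&r, 12, 1 << 20, 1 << 16));
	return 0;
}
```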
-
By Lukas Czerner
When the fstrim_range argument is not provided by the user in the FITRIM ioctl, we should just return EFAULT and not promote bad behaviour by filling in the structure in the kernel. Let the user deal with it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Andrew Price
Cleans up two cases where variables were assigned values but then never used again. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Andrew Price
Despite the return value from kmem_cache_zalloc() being checked, the error wasn't being returned until after a possible null pointer dereference. This patch returns the error immediately, allowing the removal of the error variable. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
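The shape of the fix, as a hedged sketch (the structure, cache, and field names are placeholders; kmem_cache_zalloc() and GFP_NOFS are the real kernel API):

```c
#include <linux/slab.h>
#include <linux/fs.h>

/* item and item_cachep are placeholders for the real per-inode object. */
struct item { struct inode *inode; };
static struct kmem_cache *item_cachep;

static int init_item(struct inode *inode)
{
	struct item *ip = kmem_cache_zalloc(item_cachep, GFP_NOFS);

	if (!ip)
		return -ENOMEM;	/* bail out before anything dereferences ip */

	ip->inode = inode;	/* this line used to run even when ip was NULL */
	inode->i_private = ip;
	return 0;
}
```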
-
By Andrew Price
Check the return value of gfs2_rs_alloc(ip) and avoid a possible null pointer dereference. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 03 November 2012, 1 commit
-
-
By Weston Andros Adamson
Return an errno - not an NFS4ERR_ value. This worked because NFS4ERR_ACCESS == EACCES. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 02 November 2012, 1 commit
-
-
By Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 01 November 2012, 8 commits
-
-
By Weston Andros Adamson
Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue context. This avoids a deadlock where rpc_shutdown_client loops forever in a workqueue kworker context, trying to kill all RPC tasks associated with the client, while one or more of these tasks have already been assigned to the same kworker (and will never run rpc_exit_task). This approach is needed because RPC tasks that have already been assigned to a kworker by queue_work cannot be canceled, as explained in the comment for workqueue.c:insert_wq_barrier. Signed-off-by: Weston Andros Adamson <dros@netapp.com> [Trond: add module_get/put.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
By Ben Hutchings
Since commit c7f404b4 ('vfs: new superblock methods to override /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu> Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: <stable@vger.kernel.org> # v2.6.39+ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
By Scott Mayhew
In a very busy NFSv3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner, causing the MNT procedure to time out. The problem is that the mount system call returns EIO, which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags on the rpc_call_sync() call in nfs_mount(), which causes ETIMEDOUT to be returned on timed-out connections. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
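The substance of the change, as a hedged sketch (rpc_call_sync() and the two task flags are the real sunrpc API; the wrapper name and the elided message setup are not):

```c
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>

/* With RPC_TASK_SOFT | RPC_TASK_TIMEOUT, a MNT call that never gets a
 * reply fails with ETIMEDOUT, which the mount path treats as retryable,
 * instead of hanging and eventually surfacing as EIO. */
static int mnt_rpc_call(struct rpc_clnt *clnt, const struct rpc_message *msg)
{
	return rpc_call_sync(clnt, msg, RPC_TASK_SOFT | RPC_TASK_TIMEOUT);
}
```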
-
By Yanchuan Nian
The new layout pointer in pnfs_find_alloc_layout() may be NULL because of an out-of-memory condition. We must check for this, otherwise pnfs_free_layout_hdr() will go wrong because it cannot deal with a NULL pointer. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
By NeilBrown
The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather than a timeout (absolute). This confused me when I wrote commit c5b29f88 "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
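A runnable toy that shows why the distinction matters (all numbers are invented; the kernel clock in question counts seconds since boot):

```c
#include <stdio.h>

static unsigned long now = 500000;	/* pretend uptime, seconds since boot */

/* expiry-style: the parsed value is already an absolute time */
static unsigned long as_expiry(unsigned long v) { return v; }

/* ttl-style: the parsed value is relative, so anchor it to "now" */
static unsigned long as_ttl(unsigned long v) { return now + v; }

int main(void)
{
	unsigned long ttl = 300;	/* a DNS answer valid for five minutes */

	/* Misread as an expiry time, a small TTL is always in the past. */
	printf("as expiry: %s\n", as_expiry(ttl) > now ? "usable" : "already expired");
	printf("as ttl:    %s\n", as_ttl(ttl) > now ? "usable" : "already expired");
	return 0;
}
```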
-
By Trond Myklebust
If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
By Trond Myklebust
If we do not release the sequence id in cases where we fail to get a session slot, then we can deadlock if we hit a recovery scenario. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
By Bryan Schumaker
Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: the client opens a file over NFS v4.1 and writes to it; the server then expires the client; the client tries to lock the file. The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
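A hedged sketch of the resulting error path (nfs4_schedule_session_recovery() and nfs4_wait_clnt_recover() exist in the NFS client, though the exact signatures and this wrapper are approximations):

```c
/* After NFS4ERR_BADSESSION: schedule recovery, then block until it
 * completes before letting the caller retry the lock, instead of
 * retrying straight into the same dead session. */
static int handle_badsession(struct nfs_client *clp,
			     struct nfs4_session *session)
{
	nfs4_schedule_session_recovery(session);
	return nfs4_wait_clnt_recover(clp);	/* 0 => safe to retry */
}
```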
-
- 31 October 2012, 1 commit
-
-
By Al Viro
Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE case changed incorrectly after 3.6. The culprit is commit f33ff992 ("take rlimit check to callers of expand_files()"), which, when it moved the "return -EMFILE" out to the caller, didn't notice that dup3() had special code to turn the EMFILE return into EBADF. The replace_fd() helper that got added later then inherited the bug too. Reported-by: Jack Lin <linliangjie@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ Noted more bugs, wrote proper changelog, fixed up typos - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
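A runnable illustration of the contract being restored (grow_fdtable() is an invented stand-in for the kernel's expand_files()):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

/* Stand-in for expand_files(): refuses descriptors past RLIMIT_NOFILE. */
static int grow_fdtable(unsigned int newfd)
{
	struct rlimit rl;

	getrlimit(RLIMIT_NOFILE, &rl);
	return newfd >= rl.rlim_cur ? -EMFILE : 0;
}

static int dup3_like(int oldfd, unsigned int newfd)
{
	int err = grow_fdtable(newfd);

	(void)oldfd;		/* unused in this sketch */
	if (err == -EMFILE)
		err = -EBADF;	/* dup3()'s historical (and POSIX) error */
	if (err)
		return err;
	/* ... the actual descriptor duplication would happen here ... */
	return (int)newfd;
}

int main(void)
{
	printf("%d\n", dup3_like(0, 1u << 20));	/* expect -EBADF (-9) */
	return 0;
}
```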
-
- 29 October 2012, 3 commits
-
-
By David Zafman
A call to d_find_alias() needs a corresponding dput(). This fixes http://tracker.newdream.net/issues/3271 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
By Eric Sandeen
Commit 119c0d44 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption; this was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption, reported as the now infamous "Apparent serious progressive ext4 data corruption bug". [ Changed by tytso to only call ext4_journal_get_write_access() when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
By Mikulas Patocka
The functions generic_file_splice_read and generic_file_splice_write access the pagecache directly. For block devices these functions must be locked so that the block size is not changed while they are in progress. This patch is an additional fix for commit b87570f5 ("Fix a crash when block device is read and block size is changed at the same time"), which locked aio_read, aio_write and mmap against block size changes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 October 2012, 1 commit
-
-
By Linus Torvalds
In commit 800179c9 ("This adds symlink and hardlink restrictions to the Linux VFS"), the new link protections were enabled by default, in the hope that no actual application would care, despite it being technically against legacy UNIX (and documented POSIX) behavior. However, it does turn out to break some applications. It's rare, and it's unfortunate, but it's unacceptable to break existing systems, so we'll have to default to legacy behavior. In particular, it has broken the way AFD distributes files, see http://www.dwd.de/AFD/ along with some legacy scripts. Distributions can end up setting this at initrd time or in system scripts: if you have security problems due to link attacks during your early boot sequence, you have bigger problems than some kernel sysctl setting. Do:
echo 1 > /proc/sys/fs/protected_symlinks
echo 1 > /proc/sys/fs/protected_hardlinks
to re-enable the link protections. Alternatively, we may at some point introduce a kernel config option that sets these kinds of "more secure but not traditional" behavioural options automatically. Reported-by: Nick Bowler <nbowler@elliptictech.com> Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # v3.6 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 26 October 2012, 7 commits
-
-
By Kees Cook
The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check while converting ioctl arguments. This could lead to leaking kernel stack contents into userspace. Patch extracted from existing fix in grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Oleg Nesterov
flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
By Josef Bacik
We BUG if we fail to commit the transaction when creating a snapshot, which is just obnoxious. Remove the BUG_ON(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
By Liu Bo
After cloning the root's node, we forgot to decrement the src's ref, which can lead to a memory leak. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
By Josef Bacik
On a really full file system I was getting ENOSPC back from btrfs_update_inode when trying to update the parent inode when creating a snapshot. Just use the fallback method so we can update the inode and not have to worry about having a delayed ref. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
By Alex Lyakas
This patch also requires a change in the user-space part of "receive". We need to use "lchown" instead of "chown". We will do this in the following patch. Signed-off-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
-
By Miao Xie
Steps to reproduce:
# mkfs.btrfs -m raid1 <disk1> <disk2>
# btrfstune -S 1 <disk1>
# mount <disk1> <mnt>
# btrfs device add <disk3> <disk4> <mnt>
# mount -o remount,rw <mnt>
# dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1
A deadlock happened. It is because of nested chunk allocation. When we wrote the data into the filesystem, we would allocate the data chunk because there was no data chunk in the filesystem. At the end of the data chunk allocation, we should insert the metadata of the data chunk into the extent tree, but there was no raid1 chunk, so we tried to lock the chunk allocation mutex to allocate the new chunk, but we already held the mutex, and the deadlock happened. By rights, we would have allocated the raid1 chunk when we added the second device, because the profile of the seed filesystem is raid1 and we had two devices. But we didn't do that in fact. It is because the last step of the first device insertion didn't commit the transaction. So when we added the second device, we didn't cow the tree, and just inserted the relative metadata into the leaves which were generated by the first device insertion, and its profile was dup. So, I fix this problem by committing the transaction at the end of the first device insertion. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
-