Commits · badc76dd0dc6d55a86c79e952f19d3af24708058 · openeuler / raspberrypi-kernel

24 Jan, 2015 3 commits

NFSv4: Convert nfs_alloc_seqid() to return an ERR_PTR() if allocation fails · badc76dd

Authored by Trond Myklebust on Jan 23, 2015

When we relax the sequencing on the NFSv4.1 OPEN/CLOSE code, we will want
to use the value NULL to indicate that no sequencing is needed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
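
The reason for returning an ERR_PTR() is that NULL is about to become a valid "no sequencing needed" answer, so allocation failure needs a separate encoding. A minimal userspace sketch of that convention follows. The helpers imitate the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(); struct seqid and alloc_seqid() are hypothetical stand-ins, not the NFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitation: encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses; NULL is not one. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct seqid {			/* hypothetical stand-in */
	int sequence;
};

/* Returns a valid object, or ERR_PTR(-ENOMEM); never NULL on failure. */
static struct seqid *alloc_seqid(int simulate_failure)
{
	struct seqid *s = simulate_failure ? NULL : calloc(1, sizeof(*s));

	if (s == NULL)
		return ERR_PTR(-ENOMEM);
	return s;
}
```

A caller would test the result with IS_ERR() rather than comparing against NULL, which leaves NULL free to carry its own meaning.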

NFSv4: More CLOSE/OPEN races · f95549cf

Authored by Trond Myklebust on Jan 23, 2015

If an OPEN RPC call races with a CLOSE or OPEN_DOWNGRADE so that it
updates the nfs_state structure before the CLOSE/OPEN_DOWNGRADE has
a chance to do so, then we know that the state->flags need to be
recalculated from scratch.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4: Fix an atomicity problem in CLOSE · 566fcec6

Authored by Trond Myklebust on Jan 23, 2015

If we are to remove the serialisation of OPEN/CLOSE, then we need to
ensure that the stateid sent as part of a CLOSE operation does not
change after we test the state in nfs4_close_prepare.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

22 Jan, 2015 3 commits

NFS: Fix use of nfs_attr_use_mounted_on_fileid() · 2ef47eb1

Authored by Anna Schumaker on Dec 09, 2014

This function call was being optimized out during nfs_fhget(), leading
to situations where we have a valid fileid but still want to use the
mounted_on_fileid.  For example, imagine we have our server configured
like this:

server % df
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.1G  6.5G  1.9G  78% /
/dev/vdb1       487M  2.3M  456M   1% /exports
/dev/vdc1       487M  2.3M  456M   1% /exports/vol1
/dev/vdd1       487M  2.3M  456M   1% /exports/vol2

If our client mounts /exports and tries to do a "chown -R" across the
entire mountpoint, we will get a nasty message warning us about a circular
directory structure.  Running chown with strace tells me that each directory
has the same device and inode number:

newfstatat(AT_FDCWD, "/nfs/", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
newfstatat(4, "vol1", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
newfstatat(4, "vol2", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0

With this patch the mounted_on_fileid values are used for st_ino, so the
directory loop warning isn't reported.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4.1: Fix an Oops in nfs41_walk_client_list · 3175e1dc

Authored by Trond Myklebust on Jan 21, 2015

If we start state recovery on a client that failed to initialise correctly,
then we are very likely to Oops.
Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Link: http://lkml.kernel.org/r/130621862.279655.1421851650684.JavaMail.zimbra@desy.de
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

nfs: fix dio deadlock when O_DIRECT flag is flipped · ee8a1a8b

Authored by Peng Tao on Jan 20, 2015

We only support swap file calling nfs_direct_IO. However, application
might be able to get to nfs_direct_IO if it toggles O_DIRECT flag
during IO and it can deadlock because we grab inode->i_mutex in
nfs_file_direct_write(). So return 0 for such case. Then the generic
layer will fall back to buffer IO.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

13 Jan, 2015 1 commit

locks: fix NULL-deref in generic_delete_lease · 52d304eb

Authored by NeilBrown on Jan 13, 2015

commit 0efaa7e8
  locks: generic_delete_lease doesn't need a file_lock at all

moves the call to fl->fl_lmops->lm_change() to a place in the
code where fl might be a non-lease lock.
When that happens, fl_lmops is NULL and an Oops ensues.

So add an extra test to restore correct functioning.
Reported-by: Linda Walsh <suse@tlinx.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=912569
Cc: stable@vger.kernel.org (v3.18)
Fixes: 0efaa7e8
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

10 Jan, 2015 1 commit

kernfs: Fix kernfs_name_compare · 72392ed0

Authored by Rasmus Villemoes on Dec 05, 2014

Returning a difference from a comparison function is usually wrong
(see acbbe6fb "kcmp: fix standard comparison bug" for the long
story). Here there is the additional twist that if the void pointers
ns and kn->ns happen to differ by a multiple of 2^32,
kernfs_name_compare returns 0, falsely reporting a match to the
caller.

Technically 'hash - kn->hash' is ok since the hashes are restricted to
31 bits, but it's better to avoid that subtlety.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
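
The truncation problem described above can be reproduced outside the kernel. The sketch below is illustrative only: it models the two pointers as 64-bit integers, and the function names are hypothetical. It contrasts returning a difference with deriving the sign from explicit comparisons.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy pattern: the 64-bit difference is cut down to 32 bits before it
 * is returned as an int, so a difference of exactly 2^32 becomes 0.
 */
static int cmp_by_difference(uint64_t a, uint64_t b)
{
	return (int)(uint32_t)(a - b);
}

/* Safer pattern: derive -1, 0 or 1 from explicit comparisons. */
static int cmp_explicit(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```

With two values that differ by 2^32, cmp_by_difference() reports a match while cmp_explicit() still orders them correctly.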

09 Jan, 2015 5 commits

sched, fanotify: Deal with nested sleeps · 536ebe9c

Authored by Peter Zijlstra on Dec 16, 2014

As per e23738a7 ("sched, inotify: Deal with nested sleeps").

fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_struct::state changes to the actual sleep part.
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

vfs: renumber FMODE_NONOTIFY and add to uniqueness check · 75069f2b

Authored by David Drysdale on Jan 08, 2015

Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc.  The
clashing O_PATH value was added in commit 5229645b ("vfs: add
nonconflicting values for O_PATH") but this can't be changed as it is
user-visible.

FMODE_NONOTIFY is only used internally in the kernel, but it is in the
same numbering space as the other O_* flags, as indicated by the comment
at the top of include/uapi/asm-generic/fcntl.h (and its use in
fs/notify/fanotify/fanotify_user.c).  So renumber it to avoid the clash.

All of this has happened before (commit 12ed2e36: "fanotify:
FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
happen again -- so update the uniqueness check in fcntl_init() to
include __FMODE_NONOTIFY.
Signed-off-by: David Drysdale <drysdale@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
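
A uniqueness check of this kind can be built on bit counting: if no two flags share a bit, the number of bits set in the OR of all flags equals the sum of the bits set in each flag. The sketch below is a runtime illustration of that idea under that assumption, not the code in fcntl_init(); the function names and the flag values used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the bits set in v by clearing the lowest set bit each round. */
static unsigned count_bits(unsigned v)
{
	unsigned n = 0;

	for (; v != 0; v &= v - 1)
		n++;
	return n;
}

/* Returns 1 when no bit is used by more than one flag, 0 otherwise. */
static int flags_are_disjoint(const unsigned *flags, size_t n)
{
	unsigned all = 0;
	unsigned total = 0;

	for (size_t i = 0; i < n; i++) {
		all |= flags[i];
		total += count_bits(flags[i]);
	}
	/* A clash makes the OR lose at least one bit relative to the sum. */
	return count_bits(all) == total;
}
```

Adding a new internal flag to the checked list means a future clash shows up as a failed check instead of as silent misbehaviour on one architecture.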

ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file · 53dc20b9

Authored by Xue jiufei on Jan 08, 2015

In ocfs2_link(), the parent directory inode passed to function
ocfs2_lookup_ino_from_name() is wrong.  Parameter dir is the parent of
new_dentry, not old_dentry.  We should get old_dir from old_dentry and
look up old_dentry in old_dir in case another node removes the old dentry.

With this change, hard linking works again when paths are relative with
at least one subdirectory.  This is how the problem was reproducible:

  # mkdir a
  # mkdir b
  # touch a/test
  # ln a/test b/test
  ln: failed to create hard link `b/test' => `a/test': No such file or  directory

However when creating links in the same dir, it worked well.

Now the link gets created.

Fixes: 0e048316 ("ocfs2: check existence of old dentry in ocfs2_link()")
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Szabo Aron - UBIT <aron@ubit.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Tested-by: Aron Szabo <aron@ubit.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: remove bogus check in dlm_process_recovery_data · eb4f73b4

Authored by Joseph Qi on Jan 08, 2015

In dlm_process_recovery_data, ret is set to -ENOMEM only when
dlm_new_lock fails.  And in this case, newlock is definitely NULL.  So
testing newlock is meaningless; remove the test.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ceph: use %zu for len in ceph_fill_inline_data() · 0668ff52

Authored by Ilya Dryomov on Dec 19, 2014

len is size_t, should be printed with %zu.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
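
For reference, %zu is the standard C99 conversion for size_t, so the same rule applies in userspace. The sketch below is a small illustration; format_len() and the message text are hypothetical, not the ceph code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a size_t with %zu. Using %d or %lu here would be wrong on
 * platforms where size_t does not have the same width as int or long.
 */
static int format_len(char *buf, size_t bufsize, size_t len)
{
	return snprintf(buf, bufsize, "inline data len %zu", len);
}
```

Matching the conversion to the argument type keeps the output correct on both 32-bit and 64-bit builds and avoids format warnings from the compiler.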

08 Jan, 2015 1 commit

nfsd: fix fi_delegees leak when fi_had_conflict returns true · 94ae1db2

Authored by Jeff Layton on Dec 13, 2014

Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.

Change the code to take the reference after the check and only if there
is no conflict.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
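
The leak comes purely from the order of the two steps. The sketch below models it with a plain counter; struct deleg_file and both functions are hypothetical stand-ins for the nfsd code, and the error value is only an example.

```c
#include <assert.h>
#include <errno.h>

struct deleg_file {		/* hypothetical stand-in */
	int delegees;		/* references held on behalf of delegations */
	int had_conflict;	/* non-zero if a conflict was recorded */
};

/* Leaky ordering: the reference is taken before the conflict check. */
static int set_delegation_leaky(struct deleg_file *fp)
{
	fp->delegees++;
	if (fp->had_conflict)
		return -EAGAIN;	/* early return never drops the reference */
	return 0;
}

/* Fixed ordering: check first, take the reference only on success. */
static int set_delegation_fixed(struct deleg_file *fp)
{
	if (fp->had_conflict)
		return -EAGAIN;
	fp->delegees++;
	return 0;
}
```

In the fixed variant the error path has nothing to undo, which is simpler than adding a matching release to every early return.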

06 Jan, 2015 8 commits

fuse: add memory barrier to INIT · 9759bd51

Authored by Miklos Szeredi on Jan 06, 2015

Theoretically we need to order setting of various fields in fc with
fc->initialized.

No known bug reports related to this yet.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
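
The ordering requirement is the usual publish pattern: write the fields, then set the flag, and on the reader side check the flag before reading the fields. The sketch below shows a userspace analogue using C11 release/acquire atomics instead of kernel memory barriers, which is a swapped-in technique; struct conn and both functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

struct conn {			/* hypothetical stand-in for the connection */
	unsigned minor;		/* filled in while handling INIT */
	atomic_int initialized;	/* set last, once the fields are valid */
};

/* Writer: store the fields first, then publish with release ordering. */
static void conn_finish_init(struct conn *fc, unsigned minor)
{
	fc->minor = minor;
	atomic_store_explicit(&fc->initialized, 1, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store, so once the
 * flag is observed as set, the earlier write to minor is visible too.
 * Returns 1 and fills *minor if initialised, 0 otherwise.
 */
static int conn_read_minor(struct conn *fc, unsigned *minor)
{
	if (!atomic_load_explicit(&fc->initialized, memory_order_acquire))
		return 0;
	*minor = fc->minor;
	return 1;
}
```

Without the pairing, a reader on another CPU could in principle see the flag set while still reading a stale field value.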

fuse: fix LOOKUP vs INIT compat handling · 21f62174

Authored by Miklos Szeredi on Jan 06, 2015

Analysis from Marc:

 "Commit 7078187a ("fuse: introduce fuse_simple_request() helper")
  from the above pull request triggers some EIO errors for me in some tests
  that rely on fuse

  Looking at the code changes and a bit of debugging info I think there's a
  general problem here that fuse_get_req checks and possibly waits for
  fc->initialized, and this was always called first.  But this commit
  changes the ordering and in many places fc->minor is now possibly used
  before fuse_get_req, and we can't be sure that fc has been initialized.
  In my case fuse_lookup_init sets req->out.args[0].size to the wrong size
  because fc->minor at that point is still 0, leading to the EIO error."

Fix by moving the compat adjustments into fuse_simple_request() to after
fuse_get_req().

This is also more readable than the original, since now compatibility is
handled in a single function instead of cluttering each operation.
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 7078187a ("fuse: introduce fuse_simple_request() helper")

NFSv4: Remove incorrect check in can_open_delegated() · 4e379d36

Authored by Trond Myklebust on Dec 19, 2014

Remove an incorrect check for NFS_DELEGATION_NEED_RECLAIM in
can_open_delegated(). We are allowed to cache opens even in
a situation where we're doing reboot recovery.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Ignore transport protocol when detecting server trunking · 7a01edf0

Authored by Chuck Lever on Jan 03, 2015

Detect server trunking across transport protocols. Otherwise, an
RDMA mount and a TCP mount of the same server will end up with
separate nfs_clients using the same clientid4.
Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4/v4.1: Verify the client owner id during trunking detection · 55b9df93

Authored by Trond Myklebust on Jan 03, 2015

While we normally expect the NFSv4 client to always send the same client
owner to all servers, there are a couple of situations where that is not
the case:
 1) In NFSv4.0, switching between use of '-omigration' and not will cause
    the kernel to switch between using the non-uniform and uniform client
    strings.
 2) In NFSv4.1, or NFSv4.0 when using uniform client strings, if the
    uniquifier string is suddenly changed.

This patch will catch those situations by checking the client owner id
in the trunking detection code, and will do the right thing if it notices
that the strings differ.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client · ceb3a16c

Authored by Trond Myklebust on Jan 03, 2015

Ensure that we cache the NFSv4/v4.1 client owner_id so that we can
verify it when we're doing trunking detection.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4.1: Fix client id trunking on Linux · 1fc0703a

Authored by Trond Myklebust on Jan 02, 2015

Currently, our trunking code will check for session trunking, but will
fail to detect client id trunking. This is a problem, because it means
that the client will fail to recognise that the two connections represent
shared state, even if they do not permit a shared session.
By removing the check for the server minor id, and only checking the
major id, we will end up doing the right thing in both cases: we close
down the new nfs_client and fall back to using the existing one.

Fixes: 05f4c350 ("NFS: Discover NFSv4 server trunking when mounting")
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 3.7.x
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
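
The difference between the two kinds of trunking comes down to which parts of the server owner are compared. The sketch below is a simplified illustration, assuming a server owner that consists of a major id string and a numeric minor id; the struct layout and function names are hypothetical, and the real detection code checks more than this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct server_owner {		/* hypothetical, simplified */
	char major_id[32];
	uint64_t minor_id;
};

/* Client id trunking: matching major ids mean the state is shared. */
static int same_client_scope(const struct server_owner *a,
			     const struct server_owner *b)
{
	return strcmp(a->major_id, b->major_id) == 0;
}

/* Session trunking is stricter: the minor id has to match as well. */
static int same_session_scope(const struct server_owner *a,
			      const struct server_owner *b)
{
	return same_client_scope(a, b) && a->minor_id == b->minor_id;
}
```

Checking only the major id therefore covers both cases, because every pair that passes the session test also passes the client test.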

LOCKD: Fix a race when initialising nlmsvc_timeout · 06bed7d1

Authored by Trond Myklebust on Jan 02, 2015

This commit fixes a race whereby nlmclnt_init() first starts the lockd
daemon, and then calls nlm_bind_host() with the expectation that
nlmsvc_timeout has already been initialised. Unfortunately, there is
no synchronisation between lockd() and lockd_up() to guarantee that this
is the case.

Fix is to move the initialisation of nlmsvc_timeout into lockd_create_svc.

Fixes: 9a1b6bf8 ("LOCKD: Don't call utsname()->nodename...")
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org # 3.10.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

03 Jan, 2015 7 commits

ext4: remove spurious KERN_INFO from ext4_warning call · 363307e6

Authored by Jakub Wilk on Jan 02, 2015

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Revert "ext4: fix suboptimal seek_{data,hole} extents traversial" · ad7fefb1

Authored by Theodore Ts'o on Jan 02, 2015

This reverts commit 14516bb7.

This was causing regression test failures with generic/285 with an ext3
filesystem using CONFIG_EXT4_USE_FOR_EXT23.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Btrfs: don't delay inode ref updates during log replay · 6f896054

Authored by Chris Mason on Dec 31, 2014

Commit 1d52c78a (Btrfs: try not to ENOSPC on log replay) added a
check to skip delayed inode updates during log replay because it
confuses the enospc code.  But the delayed processing will end up
ignoring delayed refs from log replay because the inode itself wasn't
put through the delayed code.

This can end up triggering a warning at commit time:

WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()

Which is repeated for each commit because we never process the delayed
inode ref update.

The fix used here is to change btrfs_delayed_delete_inode_ref to return
an error if we're currently in log replay.  The caller will do the ref
deletion immediately and everything will work properly.
Signed-off-by: Chris Mason <clm@fb.com>
cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78a

Btrfs: correctly get tree level in tree_backref_for_extent · a1317f45

Authored by Filipe Manana on Dec 15, 2014

If we are using skinny metadata, the block's tree level is in the offset
of the key and not in a btrfs_tree_block_info structure following the
extent item (it doesn't exist). Therefore fix it.

Besides returning the correct level in the tree, this also prevents reading
past the leaf's end in the case where the extent item is the last item in
the leaf (eb) and it has only 1 inline reference - this is because
sizeof(struct btrfs_tree_block_info) is greater than
sizeof(struct btrfs_extent_inline_ref).

Got it while running a scrub which produced the following warning:

BTRFS: checksum error at logical 42123264 on dev /dev/sde, sector 15840: metadata node (level 24) in tree 5
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: call inode_dec_link_count() on mkdir error path · c7cfb8a5

Authored by Wang Shilong on Dec 24, 2014

In btrfs_mkdir(), if creating the directory fails, we should
clean up the existing items and set the inode's link count properly
so that the inode can be cleaned up.
Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: abort transaction if we don't find the block group · df95e7f0

Authored by Josef Bacik on Dec 12, 2014

We shouldn't BUG_ON() if there is corruption.  I hit this while testing my block
group patch and the abort worked properly.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs, scrub: uninitialized variable in scrub_extent_for_parity() · 6b6d24b3

Authored by Dan Carpenter on Dec 12, 2014

The only way that "ret" is set is when we call scrub_pages_for_parity()
so the skip to "if (ret) " test doesn't make sense and causes a static
checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

27 Dec, 2014 1 commit

ext4: prevent online resize with backup superblock · 011fa994

Authored by Theodore Ts'o on Dec 26, 2014

Prevent BUG or corrupted file systems after the following:

mkfs.ext4 /dev/vdc 100M
mount -t ext4 -o sb=40961 /dev/vdc /vdc
resize2fs /dev/vdc

We previously prevented online resizing using the old resize ioctl.
Move the code to ext4_resize_begin(), so the check applies for all of
the resize ioctl's.
Reported-by: Maxim Malkov <malkov@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

23 Dec, 2014 1 commit

cifs: make new inode cache when file type is different · 9e6d722f

Authored by Nakajima Akira on Dec 19, 2014

Even when the file type differs, the old inode cache is used if the
file has the same name and the same inode number.
As a result, you cannot cd into the directory or cat the symbolic link.
So with this patch, an error is returned if the file type is different.

Reproducible sample:
1. Create file 'a' on the cifs client.
2. Repeat rm and mkdir 'a' 4 times on the server; a directory 'a' with the same inode number is then created.
   (After 4 repetitions, the same inode number is recycled.)
   (When server is under RHEL 6.6, 1 time is O.K.  The same inode number is always recycled.)
3. Run ls -li on the client; you then cannot cd into the directory or remove it.

Symbolic links have the same problem.

Bug link:
https://bugzilla.kernel.org/show_bug.cgi?id=90011
Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Steve French <steve.french@primarydata.com>

22 Dec, 2014 2 commits

udf: Reduce repeated dereferences · 3ee3039c

Authored by Jan Kara on Dec 18, 2014

Replace repeated dereferences like dir->i_sb by storing superblock
pointer in a variable and using that.
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Check component length before reading it · e237ec37

Authored by Jan Kara on Dec 19, 2014

Check that length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore component length for
component types that do not use it. Otherwise we read memory after end
of buffer for corrupted udf image.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
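
The general shape of such a check is to validate every length field against the space remaining in the buffer before using it. The sketch below walks a buffer of simplified components, assuming a 4-byte header whose second byte is the identifier length. The layout constant and count_components() are hypothetical, and the per-type handling mentioned in the commit message is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMP_HDR_LEN 4	/* simplified header: type, length, 2 more bytes */

/*
 * Walk the components in buf. Returns the number of well-formed
 * components, or -1 if a header or identifier would extend past
 * the end of the buffer.
 */
static int count_components(const uint8_t *buf, size_t buflen)
{
	size_t pos = 0;
	int count = 0;

	while (pos < buflen) {
		size_t ident_len;

		if (buflen - pos < COMP_HDR_LEN)
			return -1;	/* truncated header */
		ident_len = buf[pos + 1];
		pos += COMP_HDR_LEN;
		if (ident_len > buflen - pos)
			return -1;	/* length field points past the end */
		pos += ident_len;
		count++;
	}
	return count;
}
```

Both comparisons are written against the remaining space (buflen - pos), so they cannot overflow the way pos + ident_len could.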

19 Dec, 2014 7 commits

udf: Check path length when reading symlink · 0e5cc9a4

Authored by Jan Kara on Dec 18, 2014

Symlink reading code does not check whether the resulting path fits into
the page provided by the generic code. This isn't as easy as just
checking the symlink size because of various encoding conversions we
perform on path. So we have to check whether there is still enough space
in the buffer on the fly.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Verify symlink size before loading it · a1d47b26

Authored by Jan Kara on Dec 19, 2014

UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Verify i_size when loading inode · e159332b

Authored by Jan Kara on Dec 19, 2014

Verify that inode size is sane when loading inode with data stored in
ICB. Otherwise we may get confused later when working with the inode and
inode size is too big.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>

isofs: Fix unchecked printing of ER records · 4e202462

Authored by Jan Kara on Dec 18, 2014

We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.
Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

ocfs2: fix journal commit deadlock · 136f49b9

Authored by Junxiao Bi on Dec 18, 2014

For buffered writes, the page lock is taken in write_begin and released in
write_end.  In ocfs2_write_end_nolock(), before it unlocks the page in
ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), which asks
for the read lock of journal->j_trans_barrier.  Holding the page lock while
asking for journal->j_trans_barrier breaks the locking order.

This causes a deadlock with the journal commit threads: ocfs2cmt
gets the write lock of journal->j_trans_barrier first, then wakes up
kjournald2 to do the commit work, and finally waits until it is done.  To
commit the journal, kjournald2 needs to flush data first, for which it
needs the cache page lock.

Since some ocfs2 cluster locks are held by the writing process, this
deadlock may hang the whole cluster.

Unlocking the pages before ocfs2_run_deallocs() fixes the locking order.
Also move the unlock before ocfs2_commit_trans() so that the page lock is
released before j_trans_barrier, preserving the unlocking order.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker · 1e589581

Authored by Joseph Qi on Dec 18, 2014

Commit ac4fef4d ("ocfs2/dlm: do not purge lockres that is queued for
assert master") may have the following possible race case:

  dlm_dispatch_assert_master       dlm_wq
  ========================================================================
  queue_work(dlm->quedlm_worker,
      &dlm->dispatched_work);
                                 dispatch work,
                                 dlm_lockres_drop_inflight_worker
                                 *BUG_ON(res->inflight_assert_workers == 0)*
  dlm_lockres_grab_inflight_worker
  inflight_assert_workers++

So ensure inflight_assert_workers to be increased first.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Xue jiufei <xuejiufei@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
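
The race in the diagram can be modelled deterministically by assuming the worst case, where the queued work runs before the dispatcher gets to its next statement. The sketch below is single-threaded and purely illustrative; struct lockres and all functions are hypothetical stand-ins, and the underflowed flag marks the spot where the kernel would hit BUG_ON().

```c
#include <assert.h>

struct lockres {			/* hypothetical stand-in */
	int inflight_assert_workers;
	int underflowed;		/* set where BUG_ON() would fire */
};

/* Work item body: drops the count taken on behalf of this work item. */
static void worker_drop(struct lockres *res)
{
	if (res->inflight_assert_workers == 0) {
		res->underflowed = 1;
		return;
	}
	res->inflight_assert_workers--;
}

/* Worst-case scheduling: the work runs as soon as it is queued. */
static void queue_work_runs_immediately(struct lockres *res)
{
	worker_drop(res);
}

/* Racy order: queue first, grab the in-flight count afterwards. */
static void dispatch_racy(struct lockres *res)
{
	queue_work_runs_immediately(res);
	res->inflight_assert_workers++;
}

/* Fixed order: grab the in-flight count before queueing the work. */
static void dispatch_fixed(struct lockres *res)
{
	res->inflight_assert_workers++;
	queue_work_runs_immediately(res);
}
```

In the racy order the counter also ends up stuck at 1 afterwards, so the problem is not limited to the assertion itself.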

ocfs2: reflink: fix slow unlink for refcounted file · f62f12b3

Authored by Junxiao Bi on Dec 18, 2014

When running the multiple-node reflink stress test from the ocfs2 test
suite on a 4-node cluster, every unlink() of a refcounted file takes about
700s.

The slow unlink is caused by contention on the refcount tree lock, since
all nodes are unlinking files that use the same refcount tree.  When the
file being unlinked has many extents (over 1600 in our test), most of the
extents have the refcounted flag set.  ocfs2_commit_truncate() then
executes the following call trace for every extent.  This means it needs
to take and release the refcount tree lock about 1600 times.  And when
several nodes do this at the same time, performance is very low.

  ocfs2_remove_btree_range()
  --  ocfs2_lock_refcount_tree()
  ----  ocfs2_refcount_lock()
  ------  __ocfs2_cluster_lock()

ocfs2_refcount_lock() is costly; moving it to ocfs2_commit_truncate() so
that the lock/unlock is done only once improves performance considerably.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Wengang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
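
The effect of hoisting the lock out of the per-extent path can be seen by simply counting acquisitions. The sketch below is a toy model: struct refcount_tree and all functions are hypothetical, and the lock is a plain flag rather than a cluster lock.

```c
#include <assert.h>

struct refcount_tree {		/* hypothetical stand-in */
	long lock_acquisitions;	/* how many times the lock was taken */
	int locked;
};

static void tree_lock(struct refcount_tree *t)
{
	assert(!t->locked);
	t->locked = 1;
	t->lock_acquisitions++;
}

static void tree_unlock(struct refcount_tree *t)
{
	assert(t->locked);
	t->locked = 0;
}

/* Per-extent work; it only requires that the tree lock is held. */
static void remove_extent(struct refcount_tree *t)
{
	assert(t->locked);
}

/* Before: the lock is taken and dropped once for every extent. */
static void truncate_lock_per_extent(struct refcount_tree *t, int nr_extents)
{
	for (int i = 0; i < nr_extents; i++) {
		tree_lock(t);
		remove_extent(t);
		tree_unlock(t);
	}
}

/* After: the caller takes the lock once around the whole loop. */
static void truncate_lock_once(struct refcount_tree *t, int nr_extents)
{
	tree_lock(t);
	for (int i = 0; i < nr_extents; i++)
		remove_extent(t);
	tree_unlock(t);
}
```

For a file with 1600 extents the first variant takes the lock 1600 times and the second takes it once, which matters most when each acquisition is a round trip in the cluster.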