Commits · ab866549b3da3eef88e51696bcb24e79f1cc3745 · openeuler / raspberrypi-kernel

05 Apr, 2014 12 commits

ceph: drop extra open file reference in ceph_atomic_open() · ab866549

Authored by Yan, Zheng on Apr 01, 2014

ceph_atomic_open() calls ceph_open() after receiving the MDS reply.
ceph_open() grabs an extra open file reference. (The open request
already holds an open file reference.)
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ab866549

ceph: preallocate buffer for readdir reply · 54008399

Authored by Yan, Zheng on Mar 29, 2014

Preallocate a buffer for the readdir reply. Limit the number of entries
in the readdir reply according to the buffer size.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

54008399

ceph: don't include ceph.{file,dir}.layout vxattr in listxattr() · cc48c3e8

Authored by Yan, Zheng on Mar 24, 2014

This avoids 'cp -a' modifying layout of new files/directories.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

cc48c3e8

ceph: check buffer size in ceph_vxattrcb_layout() · 1e5c6649

Authored by Yan, Zheng on Mar 24, 2014

If the buffer size is zero, return the size of the layout vxattr. If the
buffer size is not zero, check that it is large enough for the layout vxattr.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

1e5c6649
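
The size convention described in this commit message is the usual one for xattr getters. A minimal user-space sketch, assuming a hypothetical `layout_vxattr_get()` helper and made-up layout values (this is not the kernel's `ceph_vxattrcb_layout()`):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a layout vxattr getter; the values are made up. */
static ssize_t layout_vxattr_get(char *val, size_t size)
{
    char buf[128];
    int len = snprintf(buf, sizeof(buf),
                       "stripe_unit=%u stripe_count=%u object_size=%u",
                       4194304u, 1u, 4194304u);

    if (size == 0)
        return len;          /* caller is only asking for the required size */
    if ((size_t)len > size)
        return -ERANGE;      /* caller's buffer is too small */
    memcpy(val, buf, (size_t)len);
    return len;              /* number of bytes written (no trailing NUL) */
}
```

A caller would typically probe with a zero size first, allocate that many bytes, then call again with the real buffer.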

ceph: fix null pointer dereference in discard_cap_releases() · 00bd8edb

Authored by Yan, Zheng on Mar 24, 2014

send_mds_reconnect() may call discard_cap_releases() after all
release messages have been dropped by cleanup_cap_releases().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

00bd8edb

ceph: Remove get/set acl on symlinks · 5f75ce57

Authored by Fabian Frederick on Mar 21, 2014

Remove unsupported symlink operations.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

5f75ce57

ceph: set mds_wanted when MDS reply changes a cap to auth cap · d9ffc4f7

Authored by Yan, Zheng on Mar 18, 2014

When adjusting the caps a client wants, the MDS does not record caps
that are not allowed. A non-auth MDS does not record WR caps. So when
an MDS reply changes a non-auth cap to an auth cap, the client needs to
set the cap's mds_wanted according to the reply.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

d9ffc4f7

ceph: use fl->fl_file as owner identifier of flock and posix lock · eb13e832

Authored by Yan, Zheng on Mar 09, 2014

flock and posix lock should use fl->fl_file instead of process ID
as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
is usually equal to fl->fl_file, but it can also be a customized
value.) The process ID of the lock holder is only needed for the
F_GETLK fcntl(2).

The fix is to rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', and to rename the 'pid_namespace'
fields to 'pid'. Assign fl->fl_file to the 'owner' field of lock
messages. We also set the most significant bit of the 'owner' field.
The MDS can use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking for
conflicting locks.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

eb13e832
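
The owner tagging described in this commit message can be illustrated with a small user-space sketch. It assumes a 64-bit owner field; the macro and function names are hypothetical and not taken from the kernel or the MDS:

```c
#include <stdint.h>

/* Hypothetical marker: the most significant bit of a 64-bit owner field. */
#define LOCK_OWNER_NEW_CLIENT (UINT64_C(1) << 63)

/* Derive an owner value from a file pointer and tag it as "new client". */
static uint64_t make_lock_owner(const void *file)
{
    return (uint64_t)(uintptr_t)file | LOCK_OWNER_NEW_CLIENT;
}

/* What a server could check to tell new-style owners from old pid-style ones. */
static int owner_is_from_new_client(uint64_t owner)
{
    return (owner & LOCK_OWNER_NEW_CLIENT) != 0;
}
```

An old client that still sends a process ID in this field would leave the top bit clear, which is what makes the two cases distinguishable.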

ceph: forbid mandatory file lock · eb70c0ce

Authored by Yan, Zheng on Mar 04, 2014

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

eb70c0ce

ceph: use fl->fl_type to decide flock operation · 0e8e95d6

Authored by Yan, Zheng on Mar 04, 2014

VFS does not directly pass flock's operation code to the filesystem's
flock callback. It translates the operation code into the form in
which posix lock parameters are presented.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

0e8e95d6
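
The translation this commit message refers to can be sketched in user space. `flock_op_to_type()` imitates what the VFS does before calling the filesystem, and `type_to_cmd()` shows a callback deciding on the lock type alone; both functions and the `lock_cmd` enum are hypothetical names, not kernel or Ceph identifiers:

```c
#include <fcntl.h>
#include <sys/file.h>

/* Hypothetical command codes a filesystem might send to its server. */
enum lock_cmd { CMD_SHARED, CMD_EXCL, CMD_UNLOCK };

/* Translate a flock(2) operation into a posix-lock style type. */
static int flock_op_to_type(int op)
{
    switch (op & ~LOCK_NB) {      /* LOCK_NB only affects blocking */
    case LOCK_SH: return F_RDLCK;
    case LOCK_EX: return F_WRLCK;
    case LOCK_UN: return F_UNLCK;
    default:      return -1;      /* unknown operation */
    }
}

/* The callback sees only the translated type, so it must switch on that. */
static enum lock_cmd type_to_cmd(int fl_type)
{
    if (fl_type == F_RDLCK)
        return CMD_SHARED;
    if (fl_type == F_WRLCK)
        return CMD_EXCL;
    return CMD_UNLOCK;
}
```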

ceph: update i_max_size even if inode version does not change · 8c93cd61

Authored by Yan, Zheng on Mar 08, 2014

Handle the following sequence of events:
 - client releases an inode with i_max_size > 0. The release message
   is queued. (It is not sent to the auth MDS.)
 - a 'lookup' request reply from a non-auth MDS returns the same inode.
 - client opens the inode in write mode. The version of the inode trace
   in the 'open' request reply is equal to the cached inode's version.
 - client requests a new max size. The MDS ignores the request because
   it does not affect the client's write range.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

8c93cd61

ceph: make sure write caps are registered with auth MDS · a2550604

Authored by Yan, Zheng on Mar 08, 2014

Only the auth MDS can issue write caps to clients, so don't consider
write caps registered with a non-auth MDS as valid.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

a2550604

03 Apr, 2014 15 commits

ceph: print inode number for LOOKUPINO request · c137a32a

Authored by Yan, Zheng on Mar 01, 2014

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

c137a32a

ceph: add get_name() NFS export callback · 19913b4e

Authored by Yan, Zheng on Mar 06, 2014

Use the newly introduced LOOKUPNAME MDS request to connect a child
inode to its parent directory.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

19913b4e

ceph: fix ceph_fh_to_parent() · 8996f4f2

Authored by Yan, Zheng on Mar 01, 2014

ceph_fh_to_parent() returns the dentry that corresponds to the 'ino' field
of struct ceph_nfs_confh. This is wrong; it should return the dentry that
corresponds to the 'parent_ino' field.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

8996f4f2

ceph: add get_parent() NFS export callback · 9017c2ec

Authored by Yan, Zheng on Mar 01, 2014

The callback uses the LOOKUPPARENT MDS request to find the parent.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

9017c2ec

ceph: simplify ceph_fh_to_dentry() · 4f32b42d

Authored by Yan, Zheng on Mar 01, 2014

The MDS handles LOOKUPHASH and LOOKUPINO requests in the same way,
so __cfh_to_dentry() is redundant.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

4f32b42d

ceph: fscache: Wait for completion of object initialization · f1fc4fee

Authored by Yunchuan Wen on Dec 26, 2013

The object store limit needs to be updated after writing, and this
can only be done once the corresponding object has been initialized.
Object initialization is currently done asynchronously, which
introduces a race: if a file is opened and then immediately written,
the initialization may not have completed, and the code will hit the
ASSERT in fscache_submit_exclusive_op(), causing a kernel bug.
Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>

f1fc4fee

ceph: fscache: Update object store limit after file writing · 32d3e148

Authored by Yunchuan Wen on Dec 26, 2013

Synchronize object->store_limit[_l] with the new inode->i_size after file writing.
Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>

32d3e148

ceph: fscache: add an interface to synchronize object store limit · 020c4bdd

Authored by Yunchuan Wen on Dec 26, 2013

Add an interface to explicitly synchronize object->store_limit[_l]
with inode->i_size.
Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>

020c4bdd

ceph: do not set r_old_dentry_dir on link() · 4b58c9b1

Authored by Sage Weil on Feb 05, 2013

This is racy--we do not know whether d_parent has changed out from
underneath us because i_mutex is not held on the source inode's directory.

Also, taking this reference is useless.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

4b58c9b1

ceph: do not assume r_old_dentry[_dir] always set together · 844d87c3

Authored by Sage Weil on Feb 05, 2013

Do not assume that r_old_dentry being set implies that r_old_dentry_dir
is also set.  Separate out the ref cleanup and make the debugfs dump
behave when it is NULL.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

844d87c3

ceph: do not chain inode updates to parent fsync · 752c8bdc

Authored by Sage Weil on Feb 05, 2013

The fsync(dirfd) only covers namespace operations, not inode updates.
We do not need to cover setattr variants or O_TRUNC.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

752c8bdc

ceph: avoid useless ceph_get_dentry_parent_inode() in ceph_rename() · 180061a5

Authored by Sage Weil on Feb 05, 2013

This is just old_dir; no reason to abuse the dcache pointers.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

180061a5

ceph: let MDS adjust readdir 'frag' · 15289dc8

Authored by Yan, Zheng on Mar 03, 2014

If the readdir 'frag' is adjusted, the readdir 'offset' should be reset.
Otherwise some dentries may be lost when readdir and directory
fragmentation happen at the same time.

Another way to fix this issue is to let the MDS adjust the readdir 'frag'.
The code that handles the MDS reply resets the readdir 'offset' if
the 'frag' in the readdir reply differs from the requested one.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

15289dc8

ceph: fix reset_readdir() · dcd3cc05

Authored by Yan, Zheng on Feb 28, 2014

When changing the readdir position, fi->next_offset should be set to 0
if the new position is not in the first dirfrag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@linaro.org>

dcd3cc05

ceph: fix ceph_dir_llseek() · f0494206

Authored by Yan, Zheng on Feb 27, 2014

Comparing offset with inode->i_sb->s_maxbytes doesn't make sense for
a directory. For a fragmented directory, offset (frag_t, off) can be
larger than inode->i_sb->s_maxbytes.

At the very beginning of ceph_dir_llseek(), the local variable old_offset
is initialized to the parameter offset. This doesn't make sense either.
old_offset should be ceph_make_fpos(fi->frag, fi->next_offset).
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@linaro.org>

f0494206
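
To see why a directory position built from (frag, off) has little to do with a byte limit, here is a small sketch. The 32/32-bit split and all three helper names are assumptions made for illustration; they are not quoted from the kernel's `ceph_make_fpos()`:

```c
#include <stdint.h>

/* Assumed encoding: fragment id in the high 32 bits, offset in the low 32. */
static uint64_t make_fpos(uint32_t frag, uint32_t off)
{
    return ((uint64_t)frag << 32) | (uint64_t)off;
}

/* Recover the fragment id from a packed position. */
static uint32_t fpos_frag(uint64_t pos)
{
    return (uint32_t)(pos >> 32);
}

/* Recover the offset within the fragment from a packed position. */
static uint32_t fpos_off(uint64_t pos)
{
    return (uint32_t)(pos & UINT64_C(0xffffffff));
}
```

Under this encoding any non-zero fragment id pushes the position past 4 GiB, so comparing it against a maximum file size in bytes says nothing about whether the position is valid.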

31 Mar, 2014 5 commits

ext4: atomically set inode->i_flags in ext4_set_inode_flags() · 00a1a053

Authored by Theodore Ts'o on Mar 30, 2014

Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

00a1a053
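
The compare-and-exchange approach from this commit message can be sketched in user space with C11 atomics standing in for the kernel's cmpxchg(). The flag values and the function name are hypothetical:

```c
#include <stdatomic.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_IMMUTABLE 0x1u
#define DEMO_APPEND    0x2u

/*
 * Replace the bits selected by mask with new_bits in one atomic step.
 * Readers see either the old value or the new one, never an
 * intermediate state in which the masked bits are all cleared.
 */
static void set_masked_flags(_Atomic unsigned int *flags,
                             unsigned int mask, unsigned int new_bits)
{
    unsigned int old = atomic_load(flags);
    unsigned int updated;

    do {
        updated = (old & ~mask) | (new_bits & mask);
        /* On failure, old is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(flags, &old, updated));
}
```

The clear-then-set alternative would need two separate stores, and a reader running between them could observe the flags momentarily cleared.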

switch mnt_hash to hlist · 38129a13

Authored by Al Viro on Mar 20, 2014

Fixes an RCU bug: walking through an hlist is safe in the face of element
moves, since it's self-terminating.  Cyclic lists are not: if we end up
jumping to another hash chain, we'll loop infinitely without ever hitting
the original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

38129a13
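
The "self-terminating" property mentioned in this commit message can be shown with a plain NULL-terminated chain. This is a simplified user-space sketch with hypothetical names, not the kernel's hlist API:

```c
#include <stddef.h>

/* Hypothetical node type for a NULL-terminated hash chain. */
struct chain_node {
    struct chain_node *next;
    int key;
};

/* Count the nodes reachable from start; the walk ends at the first NULL. */
static size_t chain_walk(const struct chain_node *start)
{
    size_t visited = 0;

    for (const struct chain_node *n = start; n != NULL; n = n->next)
        visited++;
    return visited;
}
```

If a node the walker is standing on gets relinked into a different chain, the walker simply follows that chain and still stops at NULL. A walker on a circular list that only stops when it sees its original head again would never stop in that situation.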

don't bother with propagate_mnt() unless the target is shared · 0b1b901b

Authored by Al Viro on Mar 21, 2014

If the dest_mnt is not shared, propagate_mnt() does nothing -
there are no mounts to propagate to and thus no copies to create.
We might as well not bother calling it in that case.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0b1b901b

keep shadowed vfsmounts together · 1d6a32ac

Authored by Al Viro on Mar 20, 2014

Preparation for switching mnt_hash to hlist.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1d6a32ac

resizable namespace.c hashes · 0818bf27

Authored by Al Viro on Feb 28, 2014

* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0818bf27

29 Mar, 2014 1 commit

ocfs2: check if cluster name exists before deref · d9060742

Authored by Sasha Levin on Mar 28, 2014

Commit c74a3bdd ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: strlcpy (lib/string.c:388 lib/string.c:151)
  CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
  RIP:  strlcpy (lib/string.c:388 lib/string.c:151)
  Call Trace:
   ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
   ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
   user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
   dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
   vfs_mkdir (fs/namei.c:3467)
   SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
   tracesys (arch/x86/kernel/entry_64.S:749)

akpm: this patch probably disables the feature.  A temporary thing to
avoid trivial oopses.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d9060742
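
The kind of guard the commit title describes (check that the name exists before copying it) might look like the following user-space sketch. The function name is hypothetical, and `snprintf()` stands in for the kernel's `strlcpy()`:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy an optional cluster name into dst, truncating if needed.
 * A NULL name is treated as "not provided" and leaves dst untouched
 * instead of being dereferenced.
 */
static void set_cluster_name(char *dst, size_t dst_size, const char *name)
{
    if (name != NULL)
        snprintf(dst, dst_size, "%s", name);
}
```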

28 Mar, 2014 1 commit

vfs: Allocate anon_inode_inode in anon_inode_init() · 75c5a52d

Authored by Jan Kara on Mar 26, 2014

Currently we allocate anon_inode_inode in anon_inodefs_mount(). This is
somewhat fragile: if that function ever gets called again, it will
overwrite the anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().
Signed-off-by: Jan Kara <jack@suse.cz>
[ Further simplified on suggestion from Dave Jones ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

75c5a52d

26 Mar, 2014 2 commits

fs: remove now stale label in anon_inode_init() · fce7fc79

Authored by Linus Torvalds on Mar 25, 2014

The previous commit removed the register_filesystem() call and the
associated error handling, but left the label for the error path that no
longer exists.  Remove that too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fce7fc79

fs: Avoid userspace mounting anon_inodefs filesystem · d6f2589a

Authored by Jan Kara on Mar 25, 2014

The anon_inodefs filesystem is a kernel-internal filesystem that
userspace shouldn't mess with. Remove its registration so userspace
cannot even try to mount it (which would fail anyway because the
filesystem is MS_NOUSER).

This fixes an oops triggered by trinity when it tried mounting
anon_inodefs, which overwrote the anon_inode_inode pointer while another
CPU was in anon_inode_getfile() between ihold() and d_instantiate(),
thus effectively creating a dentry pointing to an inode without holding
a reference to it.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d6f2589a

23 Mar, 2014 4 commits

rcuwalk: recheck mount_lock after mountpoint crossing attempts · b37199e6

Authored by Al Viro on Mar 20, 2014

We can get false negative from __lookup_mnt() if an unrelated vfsmount
gets moved.  In that case legitimize_mnt() is guaranteed to fail,
and we will fall back to non-RCU walk... unless we end up running
into a hard error on a filesystem object we wouldn't have reached
if not for that false negative.  IOW, delaying that check until
the end of pathname resolution is wrong - we should recheck right
after we attempt to cross the mountpoint.  We don't need to recheck
unless we see d_mountpoint() being true - in that case even if
we have just raced with mount/umount, we can simply go on as if
we'd come at the moment when the sucker wasn't a mountpoint; if we
run into a hard error as the result, it was a legitimate outcome.
__lookup_mnt() returning NULL is different in that respect, since
it might've happened due to operation on completely unrelated
mountpoint.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b37199e6

make prepend_name() work correctly when called with negative *buflen · e825196d

Authored by Al Viro on Mar 23, 2014

In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative
value. So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of the subtraction result, of course).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e825196d
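
The "subtract first, then check the sign" pattern this commit message describes can be sketched as follows. This is a user-space illustration with a hypothetical function name; it is modeled on the description above rather than copied from the kernel:

```c
#include <errno.h>
#include <string.h>

/*
 * Prepend namelen bytes of str in front of *buffer, which points just
 * past the free space of a buffer that is being filled backwards.
 * *buflen is decremented before the check, so a value that is already
 * negative on entry stays negative and is reported as an error.
 */
static int prepend_sketch(char **buffer, int *buflen,
                          const char *str, int namelen)
{
    *buflen -= namelen;
    if (*buflen < 0)
        return -ENAMETOOLONG;   /* not enough room; buffer is untouched */
    *buffer -= namelen;
    memcpy(*buffer, str, (size_t)namelen);
    return 0;
}
```

Because the check is on the sign of the result, no comparison between a negative remaining length and an unsigned name length ever takes place.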

vfs: Don't let __fdget_pos() get FMODE_PATH files · 99aea681

Authored by Eric Biggers on Mar 16, 2014

Commit bd2a31d5 ("get rid of fget_light()") introduced the
__fdget_pos() function, which returns the resulting file pointer and
fdput flags combined in an 'unsigned long'.  However, it also changed the
behavior to return files with FMODE_PATH set, which shouldn't happen
because read(), write(), lseek(), etc. aren't allowed on such files.
This commit restores the old behavior.

This regression actually had no effect on read() and write() since
FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
to fail with ESPIPE rather than EBADF.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99aea681

vfs: atomic f_pos access in llseek() · d7a15f8d

Authored by Eric Biggers on Mar 16, 2014

Commit 9c225f26 ("vfs: atomic f_pos accesses as per POSIX") changed
several system calls to use fdget_pos() instead of fdget(), but missed
sys_llseek().  Fix it.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d7a15f8d