Commits · 99128addc964d4429d1bb9be5fa9e03ce85b1e68 · openanolis / cloud-kernel

04 Mar, 2014 · 2 commits

ext3: Update PF_MEMALLOC handling in ext3_write_inode() · 99128add

Committed by Jan Kara on Feb 28, 2014

The special handling of PF_MEMALLOC callers in ext3_write_inode()
shouldn't be necessary as there shouldn't be any. Warn about it. Also
update comment before the function as it seems somewhat outdated.
Signed-off-by: Jan Kara <jack@suse.cz>

ext2/3: use prandom_u32() instead of get_random_bytes() · e878167a

Committed by ZhangZhen on Feb 26, 2014

Many of the uses of get_random_bytes() do not actually need
cryptographically secure random numbers.  Replace those uses with a
call to prandom_u32(), which is faster and which doesn't consume
entropy from the /dev/random driver.

Commit dd1f723b made this change for
ext4, and I did the same for ext2/3.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>

03 Mar, 2014 · 9 commits

ext3: remove an unneeded check in ext3_new_blocks() · 4ddb987a

Committed by Dan Carpenter on Feb 25, 2014

We know "fatal" is zero here.  The code can be simplified a bit by
assigning directly.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

ext3: remove unneeded check in ext3_ordered_writepage() · f8cb556f

Committed by Dan Carpenter on Feb 21, 2014

We already know "ret" is zero so there is no need to do:

		if (!ret)
			ret = err;

We can just assign ret directly instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

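The equivalence described above can be checked with a small userspace sketch. The helpers `combine_guarded()` and `combine_direct()` are hypothetical names, not code from fs/ext3/inode.c; they only show that the guarded and the direct assignment give the same result once `ret` is known to be zero.

```c
/*
 * Original shape: keep an earlier error in ret, otherwise take err.
 * Hypothetical helper for illustration only.
 */
static int combine_guarded(int ret, int err)
{
	if (!ret)
		ret = err;
	return ret;
}

/*
 * Simplified shape: correct only when the caller already knows ret == 0,
 * which is the situation described in the commit message.
 */
static int combine_direct(int ret, int err)
{
	(void)ret;	/* known to be zero at this point */
	return err;
}
```

The guard only matters when `ret` is non-zero (for example `combine_guarded(-1, -5)` keeps `-1`), and that case cannot occur at this point in the function.
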
fs: Mark function as static in ext3/xattr_security.c · 7d6c2113

Committed by Rashika Kheria on Feb 09, 2014

Mark function as static in ext3/xattr_security.c because it is not used
outside this file.

This eliminates the following warning in ext3/xattr_security.c:
fs/ext3/xattr_security.c:46:5: warning: no previous prototype for ‘ext3_initxattrs’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jan Kara <jack@suse.cz>

fs: Mark function as static in ext3/dir.c · 8ccb154c

Committed by Rashika Kheria on Feb 09, 2014

Mark function as static in ext3/dir.c because it is not used outside
this file.

This also eliminates the following warning in ext3/dir.c:
fs/ext3/dir.c:278:8: warning: no previous prototype for ‘ext3_dir_llseek’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jan Kara <jack@suse.cz>

fs: Mark function as static in ext2/xattr_security.c · 17cd48e4

Committed by Rashika Kheria on Feb 09, 2014

Mark function as static in ext2/xattr_security.c because it is not
used outside this file.

This also eliminates the following warning in ext2/xattr_security.c:
fs/ext2/xattr_security.c:45:5: warning: no previous prototype for ‘ext2_initxattrs’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jan Kara <jack@suse.cz>

ext3: Add __init macro to init_inodecache · 1da8b822

Committed by Fabian Frederick on Feb 01, 2014

init_inodecache is only called by __init init_ext3_fs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>

ext2: Add __init macro to init_inodecache · 0903353a

Committed by Fabian Frederick on Feb 01, 2014

init_inodecache is only called by __init init_ext2_fs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Add __init macro to init_inodecache · 53ea18de

Committed by Fabian Frederick on Feb 01, 2014

init_inodecache is only called by __init init_udf_fs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>

fs: udf: parse_options: blocksize check · 8c6915ae

Committed by Fabian Frederick on Jan 29, 2014

Both affs and isofs check for blocksize integrity during
parse_options. Do the same thing for udf.

Valid values: 512, 1024, 2048 or 4096 bytes.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>

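A minimal sketch of such a check, assuming the accepted set is exactly the four sizes listed above. `udf_blocksize_is_valid()` is a hypothetical name, not the helper used in fs/udf/super.c; it expresses the rule as "a power of two between 512 and 4096".

```c
#include <stdbool.h>

/*
 * Hypothetical helper: accept only 512, 1024, 2048 or 4096 bytes,
 * i.e. a power of two within [512, 4096].
 */
static bool udf_blocksize_is_valid(unsigned int bs)
{
	if (bs < 512 || bs > 4096)
		return false;
	/* A power of two has exactly one bit set. */
	return (bs & (bs - 1)) == 0;
}
```

Listing the four values explicitly would work just as well; the power-of-two test simply avoids repeating them.
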
25 Feb, 2014 · 4 commits

sysfs: fix namespace refcnt leak · fed95bab

Committed by Li Zefan on Feb 25, 2014

As mount() and kill_sb() are not a one-to-one match, we shouldn't get
the ns refcnt unconditionally in sysfs_mount(); instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
 ---
 fs/kernfs/mount.c      | 8 +++++++-
 fs/sysfs/mount.c       | 5 +++--
 include/linux/kernfs.h | 9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

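The leak can be illustrated with a toy reference counter. This is a hypothetical model (`struct ns_model`, `model_mount()` and `model_kill_sb()` are illustrative names, not kernfs APIs): several mounts may share one superblock, but kill_sb() runs only once for it, so the namespace reference is taken only when a new superblock is created.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the namespace reference count. */
struct ns_model {
	int refcnt;
};

/*
 * Hypothetical mount step: several mounts may end up sharing one
 * superblock, so only the mount that creates it takes a reference.
 */
static void model_mount(struct ns_model *ns, bool new_sb)
{
	if (new_sb)
		ns->refcnt++;
}

/* kill_sb() runs once per superblock and drops that single reference. */
static void model_kill_sb(struct ns_model *ns)
{
	ns->refcnt--;
}
```

Starting from a count of 1, one mount that creates the superblock, two mounts that reuse it, and one kill_sb() bring the count back to 1. If `model_mount()` incremented unconditionally, the same sequence would leave it at 3, which is the leak described above.
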
fsnotify: Allocate overflow events with proper type · ff57cd58

Committed by Jan Kara on Feb 21, 2014

Commit 7053aee2 "fsnotify: do not share events between notification
groups" used an overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type-specific parts of the event structure, gets
confused by the random data it sees there, and crashes.

Fix the problem by allocating the overflow event with the type
corresponding to the group type so code cannot get confused.
Signed-off-by: Jan Kara <jack@suse.cz>

fanotify: Handle overflow in case of permission events · 482ef06c

Committed by Jan Kara on Feb 21, 2014

If the event queue overflows while we are handling a permission event, we
will never get a response from userspace, so we must avoid waiting for it.
Change fsnotify_add_notify_event() to return whether an overflow has
happened so that we can detect it in fanotify_handle_event() and act
accordingly.
Signed-off-by: Jan Kara <jack@suse.cz>

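The control flow can be sketched in userspace as follows. Everything here is a hypothetical model (the queue limit, `struct model_group` and the helper names are assumptions, not the fanotify code): the add function reports whether the event was dropped because of an overflow, and the permission path waits for a userspace response only if the event was really queued.

```c
#include <stdbool.h>

#define MODEL_QUEUE_MAX 2	/* hypothetical queue limit */

struct model_group {
	int queued;		/* events currently in the queue */
	bool overflow_queued;	/* has an overflow event been reported? */
};

/*
 * Hypothetical analogue of an add function that reports overflow:
 * returns true if the event was NOT queued because the queue is full.
 */
static bool model_add_event(struct model_group *g)
{
	if (g->queued >= MODEL_QUEUE_MAX) {
		g->overflow_queued = true;	/* report overflow instead */
		return true;
	}
	g->queued++;
	return false;
}

/*
 * Permission event path: wait for a userspace response only if the
 * event really reached the queue; otherwise nobody will ever answer.
 */
static bool model_should_wait_for_response(struct model_group *g)
{
	return !model_add_event(g);
}
```

With a limit of 2, the first two permission events are queued and waited on; the third overflows, so the caller must not block on it.
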
fsnotify: Fix detection whether overflow event is queued · 2513190a

Committed by Jan Kara on Feb 21, 2014

Currently we didn't initialize an event's list head when we removed it from
the event list. Thus detecting whether the overflow event is already
queued didn't work. Fix it by always initializing the list head when
deleting an event from a list.
Signed-off-by: Jan Kara <jack@suse.cz>

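Why reinitializing matters can be shown with a minimal circular doubly linked list. This is a userspace sketch, not include/linux/list.h: `node_init()`, `node_is_unlinked()`, `add_tail_model()`, `del_model()` and `del_init_model()` are hypothetical stand-ins for the kernel's INIT_LIST_HEAD(), list_empty(), list_add_tail(), list_del() and list_del_init(). (The kernel's list_del() additionally poisons the removed entry's pointers, so an emptiness check on such an entry still reports it as linked.)

```c
#include <stdbool.h>

struct node {
	struct node *next, *prev;
};

/* An initialized (or reinitialized) entry points to itself. */
static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* Equivalent of list_empty() on an entry: true if it is not linked. */
static bool node_is_unlinked(const struct node *n)
{
	return n->next == n;
}

static void add_tail_model(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink only: the entry keeps stale pointers into its old list. */
static void del_model(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Unlink and reinitialize, so node_is_unlinked() is reliable afterwards. */
static void del_init_model(struct node *n)
{
	del_model(n);
	node_init(n);
}
```

After `del_model()` the entry still looks queued even though the list is empty; after `del_init_model()` the "already queued?" check gives the right answer.
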
22 Feb, 2014 · 1 commit

Revert "writeback: do not sync data dirtied after sync start" · 0dc83bd3

Committed by Jan Kara on Feb 21, 2014

This reverts commit c4a391b5. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial so revert is a safer choice for now.

CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: Jan Kara <jack@suse.cz>

21 Feb, 2014 · 2 commits

quota: Fix race between dqput() and dquot_scan_active() · 1362f4ea

Committed by Jan Kara on Feb 20, 2014

Currently the last dqput() can race with dquot_scan_active(), causing it to
call the callback for an already deactivated dquot. The race is as follows:

CPU1					CPU2
  dqput()
    spin_lock(&dq_list_lock);
    if (atomic_read(&dquot->dq_count) > 1) {
     - not taken
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
      spin_unlock(&dq_list_lock);
      ->release_dquot(dquot);
        if (atomic_read(&dquot->dq_count) > 1)
         - not taken
					  dquot_scan_active()
					    spin_lock(&dq_list_lock);
					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
					     - not taken
					    atomic_inc(&dquot->dq_count);
					    spin_unlock(&dq_list_lock);
        - proceeds to release dquot
					    ret = fn(dquot, priv);
					     - called for inactive dquot

Fix the problem by making sure any ->release_dquot() in progress has
finished by the time we call the callback, and that new calls to it
notice the reference dquot_scan_active() has taken and bail out.

CC: stable@vger.kernel.org # >= 2.6.29
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Fix data corruption on file type conversion · 09ebb17a

Committed by Jan Kara on Feb 18, 2014

UDF has two types of files - files with data stored in inode (ICB in
UDF terminology) and files with data stored in external data blocks. We
convert file from in-inode format to external format in
udf_file_aio_write() when we find out data won't fit into inode any
longer. However the following race between two O_APPEND writes can happen:

CPU1					CPU2
udf_file_aio_write()			udf_file_aio_write()
  down_write(&iinfo->i_data_sem);
  checks that i_size + count1 fits within inode
    => no need to convert
  up_write(&iinfo->i_data_sem);
					  down_write(&iinfo->i_data_sem);
					  checks that i_size + count2 fits
					    within inode => no need to convert
					  up_write(&iinfo->i_data_sem);
  generic_file_aio_write()
    - extends file by count1 bytes
					  generic_file_aio_write()
					    - extends file by count2 bytes

Clearly if count1 + count2 doesn't fit into the inode, we overwrite
kernel buffers beyond inode, possibly corrupting the filesystem as well.

Fix the problem by acquiring i_mutex before checking whether write fits
into the inode and using __generic_file_aio_write() afterwards which
puts check and write into one critical section.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>

19 Feb, 2014 · 4 commits

NFS fix error return in nfs4_select_rw_stateid · 146d70ca

Committed by Andy Adamson on Feb 18, 2014

Do not return an error when nfs4_copy_delegation_stateid succeeds.
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com
Fixes: ef1820f9 (NFSv4: Don't try to recover NFSv4 locks when...)
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

xfs: limit superblock corruption errors to actual corruption · 5ef11eb0

Committed by Eric Sandeen on Feb 19, 2014

Today, if

xfs_sb_read_verify
  xfs_sb_verify
    xfs_mount_validate_sb

detects superblock corruption, it'll be extremely noisy, dumping
2 stacks, 2 hexdumps, etc.

This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb
as well as in xfs_sb_read_verify.

Also, *any* errors in xfs_mount_validate_sb which are not corruption
per se; things like too-big-blocksize, bad version, bad magic, v1 dirs,
rw-incompat etc - things which do not return EFSCORRUPTED - will
still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify
sees any error at all.  And it suggests to the user that they
should run xfs_repair, even if the root cause of the mount failure
is a simple incompatibility.

I'll submit that the probably-not-corrupted errors don't warrant
this much noise, so this patch removes the warning for anything
other than EFSCORRUPTED returns, and replaces the lower-level
XFS_CORRUPTION_ERROR with an xfs_notice().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: skip verification on initial "guess" superblock read · daba5427

Committed by Eric Sandeen on Feb 19, 2014

When xfs_readsb() does the very first read of the superblock,
it makes a guess at the length of the buffer, based on the
sector size of the underlying storage.  This may or may
not match the filesystem sector size in sb_sectsize, so
we can't, e.g., do a CRC check on it; it might be too short.

In fact, mounting a filesystem with sb_sectsize larger
than the device sector size will cause a mount failure
if CRCs are enabled, because we are checksumming a length
which exceeds the buffer passed to it.

So always read twice; the first time we read with NULL
buffer ops to skip verification; then set the proper
read length, hook up the proper verifier, and give it
another go.

Once we are sure that we've got the right buffer length,
we can also use bp->b_length in the xfs_sb_read_verify,
rather than the less-trusted on-disk sectorsize for
secondary superblocks.  Before this we ran the risk of
passing junk to the crc32c routines, which didn't always
handle extreme values.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb · 7a01e707

Committed by Eric Sandeen on Feb 19, 2014

My earlier commit 10e6e65d deserves a layer or two of brown paper
bags.  The logic in that commit means that a CRC failure on the
primary superblock will *never* result in an error return.

Hopefully this fixes it, so that we always return the error
if it's a primary superblock, otherwise only if the filesystem
has CRCs enabled.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

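The corrected rule can be written as a small decision function. This is a hypothetical sketch of the logic stated in the commit message, not the code in xfs_sb_read_verify(): `crc_ok`, `is_primary` and `fs_has_crc` are assumed inputs standing for the checksum result, whether the buffer is the primary superblock, and whether the filesystem has the CRC feature.

```c
#include <stdbool.h>

/*
 * Hypothetical decision helper for the rule stated in the commit message:
 * a CRC mismatch is always an error on the primary superblock, and on a
 * secondary superblock only if the filesystem has CRCs enabled.
 */
static bool sb_crc_failure_is_error(bool crc_ok, bool is_primary,
				    bool fs_has_crc)
{
	if (crc_ok)
		return false;
	return is_primary || fs_has_crc;
}
```

The bug described above amounts to this function returning false for every primary-superblock mismatch.
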
18 Feb, 2014 · 13 commits

inotify: Fix reporting of cookies for inotify events · 45a22f4c

Committed by Jan Kara on Feb 17, 2014

My rework of handling of notification events (namely commit 7053aee2
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.

Fix the problem by passing the cookie value properly.

Fixes: 7053aee2
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

jbd2: fix use after free in jbd2_journal_start_reserved() · 92e3b405

Committed by Dan Carpenter on Feb 17, 2014

If start_this_handle() fails then it leads to a use after free of
"handle".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree · 7026f192

Committed by David Howells on Feb 17, 2014

When FS-Cache allocates an object, the following sequence of events can
occur:

 -->fscache_alloc_object()
    -->cachefiles_alloc_object() [via cache->ops->alloc_object]
    <--[returns new object]
    -->fscache_attach_object()
    <--[failed]
    -->cachefiles_put_object() [via cache->ops->put_object]
       -->fscache_object_destroy()
          -->fscache_objlist_remove()
             -->rb_erase() to remove the object from fscache_object_list.

resulting in a crash in the rbtree code.

The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().

So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree.  We do, however, unconditionally try to remove the
object from the tree.

Thanks to NeilBrown for finding this and suggesting this solution.
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: (a customer of) NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

reiserfs: fix utterly brain-damaged indentation. · 416e2abd

Committed by Dave Jones on Feb 17, 2014

This has been this way for years, and every time I stumble across it I
lose my lunch.  After coming across it for the nth time in the Coverity
results, I had to overcome the bystander effect and do something about
it.

This ignores the 79 column limit in favor of making it look like C
instead of gibberish.

The correct thing to do here would be to lose some of the indentation by
breaking this function up into several smaller ones.  I might do that at
some point if I have the stomach to look at this again.

(Also some of those overlong ternary operations would likely be more
readable as regular if's)
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ceph: fix __dcache_readdir() · 4d5f5df6

Committed by Yan, Zheng on Feb 13, 2014

If a directory is fragmented, readdir() reads its dirfrags one by one.
After all dirfrags are read, the corresponding dentries are sorted in
(frag_t, off) order in the dcache. If all dentries of a directory are
cached, __dcache_readdir() can use the cached dentries to satisfy the
readdir syscall. But when checking whether a given dentry is after the
position of readdir, __dcache_readdir() compares the numerical values of
frag_t directly. This is wrong; it should use ceph_frag_compare().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

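The ordering problem can be demonstrated with a sketch modeled on ceph_frag_compare(). It assumes the frag_t layout from include/linux/ceph/ceph_frag.h (number of bits in the top 8 bits, value in the low 24 bits); `frag_value()`, `frag_bits()` and `frag_compare()` are local stand-ins, not the kernel functions themselves. Because the bit count sits in the high byte, a plain integer comparison orders frags by depth before value.

```c
#include <stdint.h>

/* Assumed frag_t layout: top 8 bits = number of bits, low 24 bits = value. */
static uint32_t frag_value(uint32_t f)
{
	return f & 0xffffffu;
}

static uint32_t frag_bits(uint32_t f)
{
	return f >> 24;
}

/* Order by value first, then by number of bits. */
static int frag_compare(uint32_t a, uint32_t b)
{
	uint32_t va = frag_value(a), vb = frag_value(b);

	if (va != vb)
		return va < vb ? -1 : 1;
	va = frag_bits(a);
	vb = frag_bits(b);
	if (va != vb)
		return va < vb ? -1 : 1;
	return 0;
}
```

For example, the frag with 1 bit and value 0x800000 is numerically smaller than the frag with 2 bits and value 0x400000, yet it comes later in value order, so a raw `<` comparison puts the dentries in the wrong sequence.
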
ceph: add acl, noacl options for cephfs mount · 45195e42

Committed by Sage Weil on Feb 16, 2014

Make the 'acl' option dependent on having ACL support compiled in.  Make
the 'noacl' option work even without it so that one can always ask it to
be off and not error out on mount when it is not supported.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Signed-off-by: Sage Weil <sage@inktank.com>

ceph: make ceph_forget_all_cached_acls() static inline · c969d9bf

Committed by Guangliang Zhao on Feb 16, 2014

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sage Weil <sage@inktank.com>

ceph: add missing init_acl() for mkdir() and atomic_open() · b20a95a0

Committed by Yan, Zheng on Feb 11, 2014

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ceph: fix ceph_set_acl() · 7a92d647

Committed by Yan, Zheng on Feb 11, 2014

If the acl is equivalent to the file mode permission bits, ceph_set_acl()
needs to remove any existing acl xattr. Use __ceph_setxattr() to
handle both the setting and the removing cases; it doesn't return
-ENODATA when there is no acl xattr.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ceph: fix ceph_removexattr() · 524186ac

Committed by Yan, Zheng on Feb 11, 2014

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ceph: remove xattr when null value is given to setxattr() · bcdfeb2e

Committed by Yan, Zheng on Feb 11, 2014

For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ceph: properly handle XATTR_CREATE and XATTR_REPLACE · fbc0b970

Committed by Yan, Zheng on Feb 11, 2014

Return -EEXIST if XATTR_CREATE is set and the xattr already exists.
Return -ENODATA if XATTR_REPLACE is set but the xattr does not exist.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

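A minimal sketch of the two checks, assuming the standard Linux flag values XATTR_CREATE = 0x1 and XATTR_REPLACE = 0x2 (defined locally only if missing, so the sketch stands alone). `xattr_flags_check()` is a hypothetical helper, not a ceph function, and `exists` stands for whether the xattr is already present.

```c
#include <errno.h>
#include <stdbool.h>

/* Standard Linux xattr flag values, defined here only if not already set. */
#ifndef XATTR_CREATE
#define XATTR_CREATE	0x1
#endif
#ifndef XATTR_REPLACE
#define XATTR_REPLACE	0x2
#endif

/* Hypothetical pre-check applied before storing or replacing an xattr. */
static int xattr_flags_check(int flags, bool exists)
{
	if ((flags & XATTR_CREATE) && exists)
		return -EEXIST;		/* create-only, but it is already there */
	if ((flags & XATTR_REPLACE) && !exists)
		return -ENODATA;	/* replace-only, but nothing to replace */
	return 0;
}
```

With neither flag set, the call succeeds in both cases and the xattr is simply created or overwritten.
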
NFSv4: Use the correct net namespace in nfs4_update_server · 292f503c

Committed by Trond Myklebust on Feb 16, 2014

We need to use the same net namespace that was used to resolve
the hostname and sockaddr arguments.

Fixes: 32e62b7c (NFS: Add nfs4_update_server)
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

17 Feb, 2014 · 1 commit

ext4: don't leave i_crtime.tv_sec uninitialized · 19ea8060

Committed by Theodore Ts'o on Feb 16, 2014

If the i_crtime field is not present in the inode, don't leave the
field uninitialized.

Fixes: ef7f3835 ("ext4: Add nanosecond timestamps")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

16 Feb, 2014 · 4 commits

ext4: fix online resize with a non-standard blocks per group setting · 3d2660d0

Committed by Theodore Ts'o on Feb 15, 2014

The set_flexbg_block_bitmap() function assumed that the number of
blocks in a blockgroup was sb->blocksize * 8, which is normally true,
but not always!  Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
bitmap corruption after:

mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jon Bernard <jbernard@tuxion.com>
Cc: stable@vger.kernel.org

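The arithmetic behind the bug, as a sketch assuming 4 KiB blocks (so sb->blocksize * 8 = 32768) and the `-g 3072` setting from the reproducer above. `block_to_group()` and `struct group_pos` are hypothetical, not ext4 code; the helper maps a block number to its group and offset using whichever blocks-per-group value it is given, and assumes a first data block of 0, as is usual for block sizes above 1 KiB.

```c
#include <stdint.h>

/* Hypothetical result type: which group a block is in, and where. */
struct group_pos {
	uint32_t group;
	uint32_t offset;
};

/*
 * Map a block number to (group, offset) for a given blocks-per-group value.
 * first_data_block is assumed to be 0 for block sizes above 1 KiB.
 */
static struct group_pos block_to_group(uint64_t block,
				       uint64_t first_data_block,
				       uint32_t blocks_per_group)
{
	struct group_pos p;
	uint64_t rel = block - first_data_block;

	p.group = (uint32_t)(rel / blocks_per_group);
	p.offset = (uint32_t)(rel % blocks_per_group);
	return p;
}
```

With the real value of 3072, block 5000 lands in group 1 at offset 1928; with the assumed 32768 it would appear to be in group 0 at offset 5000, so the bit would be set in the wrong group's bitmap.
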
ext4: fix online resize with very large inode tables · b93c9535

Committed by Theodore Ts'o on Feb 15, 2014

If a file system has a large number of inodes per block group, all of
the metadata blocks in a flex_bg may be larger than what can fit in a
single block group.  Unfortunately, ext4_alloc_group_tables() in
resize.c was never tested to see if it would handle this case
correctly, and there were a large number of bugs which caused the
following sequence to result in a BUG_ON:

kernel bug at fs/ext4/resize.c:409!
   ...
call trace:
 [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
 [<ffffffff811b9df2>] ? final_putname+0x22/0x50
 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0
 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
rip  [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180


This can be reproduced with the following command sequence:

   mke2fs -t ext4 -i 4096 /dev/vdd 1G
   mount -t ext4 /dev/vdd /vdd
   resize2fs /dev/vdd 8G

To fix this, we need to make sure the right thing happens when a block
group's inode table straddles two block groups, which means the
following bugs had to be fixed:

1) Not clearing the BLOCK_UNINIT flag in the second block group in
   ext4_alloc_group_tables --- this was the proximate cause of the BUG_ON.

2) Incorrectly determining how many block groups contained contiguous
   free blocks in ext4_alloc_group_tables().

3) Incorrectly setting the start of the next block range to be marked
   in use after a discontinuity in setup_new_flex_group_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

Btrfs: use right clone root offset for compressed extents · 93de4ba8

Committed by Filipe David Borba Manana on Feb 15, 2014

For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting a wrong file range from the clone ioctl,
which results in different file content from the one in the original send
root.

Issue reproducible with the following excerpt from the test I made for
xfstests:

  _scratch_mkfs
  _scratch_mount "-o compress-force=lzo"

  $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1

  $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
  $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
  $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo

  # will be used for incremental send to be able to issue clone operations
  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2

  $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
  $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
      -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
  $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
      -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2

  $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
  $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
  $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
      -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap

  _scratch_unmount
  _scratch_mkfs
  _scratch_mount

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
  $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
  $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
  $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix null pointer dereference at btrfs_sysfs_add_one+0x105 · f085381e

Committed by Anand Jain on Jan 15, 2014

bdev is NULL when a disk has disappeared and the filesystem is mounted
with the degraded option

stack trace
---------
btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
open_ctree+0x15f3/0x1fe0 [btrfs]
btrfs_mount+0x5db/0x790 [btrfs]
? alloc_pages_current+0xa4/0x160
mount_fs+0x34/0x1b0
vfs_kern_mount+0x62/0xf0
do_mount+0x22e/0xa80
? __get_free_pages+0x9/0x40
? copy_mount_options+0x31/0x170
SyS_mount+0x7e/0xc0
system_call_fastpath+0x16/0x1b
---------

reproducer:
-------
mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
(detach a disk)
devmgt detach /dev/sdc [1]
mount -o degraded /dev/sdd /btrfs
-------

[1] github.com/anajain/devmgt.git
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Tested-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

openanolis / cloud-kernel synced successfully nearly 2 years ago
