提交 · df3c1e9a05ff25aca9f54a6c08b77003e2e32bf1 · openeuler / Kernel

09 3月, 2014 1 次提交

jbd2: don't unplug after writing revoke records · df3c1e9a

由 Theodore Ts'o 提交于 3月 08, 2014

During commit process, keep the block device plugged after we are done
writing the revoke records, until we are finished writing the rest of
the commit records in the journal. This will allow most of the
journal blocks to be written in a single I/O operation, instead of
separating the the revoke blocks from the rest of the journal blocks.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

df3c1e9a

04 3月, 2014 1 次提交

ext4: Speedup WB_SYNC_ALL pass called from sync(2) · 10542c22

由 Jan Kara 提交于 3月 04, 2014

When doing filesystem wide sync, there's no need to force transaction
commit (or synchronously write inode buffer) separately for each inode
because ext4_sync_fs() takes care of forcing commit at the end (VFS
takes care of flushing buffer cache, respectively). Most of the time
this slowness doesn't manifest because previous WB_SYNC_NONE writeback
doesn't leave much to write but when there are processes aggressively
creating new files and several filesystems to sync, the sync slowness
can be noticeable. In the following test script sync(1) takes around 6
minutes when there are two ext4 filesystems mounted on a standard SATA
drive. After this patch sync takes a couple of seconds so we have about
two orders of magnitude improvement.

      function run_writers
      {
        for (( i = 0; i < 10; i++ )); do
          mkdir $1/dir$i
          for (( j = 0; j < 40000; j++ )); do
            dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
          done &
        done
      }

      for dir in "$@"; do
        run_writers $dir
      done

      sleep 40
      time sync
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

10542c22

24 2月, 2014 1 次提交

ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate · 9eb79482

由 Namjae Jeon 提交于 2月 23, 2014

This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.
 
The semantics of this flag are following:
1) It collapses the range lying between offset and length by removing any data
   blocks which are present in this range and than updates all the logical
   offsets of extents beyond "offset + len" to nullify the hole created by
   removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collaspe range does not work beyond i_size.
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
Tested-by: NDongsu Park <dongsu.park@profitbricks.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9eb79482

22 2月, 2014 1 次提交

ext4: translate fallocate mode bits to strings · a633f5a3

由 Lukas Czerner 提交于 2月 22, 2014

Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a633f5a3

21 2月, 2014 5 次提交

ext4: merge uninitialized extents · a9b82415

由 Darrick J. Wong 提交于 2月 20, 2014

Allow for merging uninitialized extents.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a9b82415

ext4: avoid exposure of stale data in ext4_punch_hole() · e251f9bc

由 Maxim Patlasov 提交于 2月 20, 2014

While handling punch-hole fallocate, it's useless to truncate page cache
before removing the range from extent tree (or block map in indirect case)
because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
read) immediately after truncating page cache, but before updating extent
tree (or block map). In that case the user will see stale data even after
fallocate is completed.

Until the problem of data corruption resulting from pages backed by
already freed blocks is fully resolved, the simple thing we can do now
is to add another truncation of pagecache after punch hole is done.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

e251f9bc

ext4: silence warnings in extent status tree debugging code · ce140cdd

由 Eric Whitney 提交于 2月 20, 2014

Adjust the conversion specifications in a few optionally compiled debug
messages to match the return type of ext4_es_status().  Also, make a
couple of minor grammatical message edits while we're at it.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

ce140cdd

ext4: remove unused ac_ex_scanned · dc9ddd98

由 Eric Sandeen 提交于 2月 20, 2014

When looking at a bug report with:

> kernel: EXT4-fs: 0 scanned, 0 found

I thought wow, 0 scanned, that's odd?  But it's not odd; it's printing
a variable that is initialized to 0 and never touched again.

It's never been used since the original merge, so I don't really even
know what the original intent was, either.

If anyone knows how to hook it up, speak now via patch, otherwise just
yank it so it's not making a confusing situation more confusing in
kernel logs.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

dc9ddd98

ext4: avoid possible overflow in ext4_map_blocks() · e861b5e9

由 Theodore Ts'o 提交于 2月 20, 2014

The ext4_map_blocks() function returns the number of blocks which
satisfying the caller's request. This number of blocks requested by
the caller is specified by an unsigned integer, but the return value
of ext4_map_blocks() is a signed integer (to accomodate error codes
per the kernel's standard error signalling convention).

Historically, overflows could never happen since mballoc() will refuse
to allocate more than 2048 blocks at a time (which is something we
should fix), and if the blocks were already allocated, the fact that
there would be some number of intervening metadata blocks pretty much
guaranteed that there could never be a contiguous region of data
blocks that was greater than 2**31 blocks.

However, this is now possible if there is a file system which is a bit
bigger than 8TB, and is created using the new mke2fs hugeblock
feature, which can create a perfectly contiguous file. In that case,
if a userspace program attempted to call fallocate() on this already
fully allocated file, it's possible that ext4_map_blocks() could
return a number large enough that it would overflow a signed integer,
resulting in a ext4 thinking that the ext4_map_blocks() call had
failed with some strange error code.

Since ext4_map_blocks() is always free to return a smaller number of
blocks than what was requested by the caller, fix this by capping the
number of blocks that ext4_map_blocks() will ever try to map to 2**31
- 1. In practice this should never get hit, except by someone
deliberately trying to provke the above-described bug.

Thanks to the PaX team for asking whethre this could possibly happen
in some off-line discussions about using some static code checking
technology they are developing to find bugs in kernel code.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e861b5e9

20 2月, 2014 4 次提交

ext4: make sure ex.fe_logical is initialized · ab0c00fc

由 Theodore Ts'o 提交于 2月 20, 2014

The lowest levels of mballoc set all of the fields of struct
ext4_free_extent except for fe_logical, since they are just trying to
find the requested free set of blocks, and the logical block hasn't
been set yet.  This makes some static code checkers sad.  Set it to
various different debug values, which would be useful when
debugging mballoc if these values were to ever show up due to the
parts of mballoc triyng to use ac->ac_b_ex.fe_logical before it is
properly upper layers of mballoc failing to properly set, usually by
ext4_mb_use_best_found().

Addresses-Coverity-Id: #139697
Addresses-Coverity-Id: #139698
Addresses-Coverity-Id: #139699
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

ab0c00fc

ext4: don't calculate total xattr header size unless needed · 7b1b2c1b

由 Theodore Ts'o 提交于 2月 19, 2014

The function ext4_expand_extra_isize_ea() doesn't need the size of all
of the extended attribute headers.  So if we don't calculate it when
it is unneeded, it we can skip some undeeded memory references, and as
a bonus, we eliminate some kvetching by static code analysis tools.

Addresses-Coverity-Id: #741291
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7b1b2c1b

ext4: add ext4_es_store_pblock_status() · 9a6633b1

由 Theodore Ts'o 提交于 2月 19, 2014

Avoid false positives by static code analysis tools such as sparse and
coverity caused by the fact that we set the physical block, and then
the status in the extent_status structure.  It is also more efficient
to set both of these values at once.

Addresses-Coverity-Id: #989077
Addresses-Coverity-Id: #989078
Addresses-Coverity-Id: #1080722
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>

9a6633b1

ext4: fix error return from ext4_ext_handle_uninitialized_extents() · ce37c429

由 Eric Whitney 提交于 2月 19, 2014

Commit 37794732 breaks the return of error codes from
ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks().  A
portion of the patch assigns that function's signed integer return
value to an unsigned int.  Consequently, negatively valued error codes
are lost and can be treated as a bogus allocated block count.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

ce37c429

18 2月, 2014 6 次提交

ext4: address a benign compiler warning · 024949ec

由 Patrick Palka 提交于 2月 17, 2014

When !defined(CONFIG_EXT4_DEBUG), mb_debug() should be defined as a
no_printk() statement instead of an empty statement in order to suppress
the following compiler warning:

fs/ext4/mballoc.c: In function ‘ext4_mb_cleanup_pa’:
fs/ext4/mballoc.c:2659:47: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
   mb_debug(1, "mballoc: %u PAs left\n", count);
Signed-off-by: NPatrick Palka <patrick@parcs.ath.cx>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

024949ec

jbd2: mark file-local functions as static · 7747e6d0

由 Rashika Kheria 提交于 2月 17, 2014

Mark functions as static in jbd2/journal.c because they are not used
outside this file.

This eliminates the following warning in jbd2/journal.c:
fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes]
fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes]
fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes]
Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>

7747e6d0

ext4: remove an unneeded check in mext_page_mkuptodate() · df3a98b0

由 Dan Carpenter 提交于 2月 17, 2014

"err" is zero here, there is no need to check again.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

df3a98b0

ext4: clean up error handling in swap_inode_boot_loader() · d8558a29

由 Theodore Ts'o 提交于 2月 17, 2014

Tighten up the code to make the code easier to read and maintain.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d8558a29

ext4: Add __init marking to init_inodecache · e67bc2b3

由 Fabian Frederick 提交于 2月 17, 2014

init_inodecache is only called by __init init_ext4_fs.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e67bc2b3

jbd2: fix use after free in jbd2_journal_start_reserved() · 92e3b405

由 Dan Carpenter 提交于 2月 17, 2014

If start_this_handle() fails then it leads to a use after free of
"handle".
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

92e3b405

17 2月, 2014 1 次提交

ext4: don't leave i_crtime.tv_sec uninitialized · 19ea8060

由 Theodore Ts'o 提交于 2月 16, 2014

If the i_crtime field is not present in the inode, don't leave the
field uninitialized.

Fixes: ef7f3835 ("ext4: Add nanosecond timestamps")
Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Tested-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

19ea8060

16 2月, 2014 2 次提交

ext4: fix online resize with a non-standard blocks per group setting · 3d2660d0

由 Theodore Ts'o 提交于 2月 15, 2014

The set_flexbg_block_bitmap() function assumed that the number of
blocks in a blockgroup was sb->blocksize * 8, which is normally true,
but not always!  Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
bitmap corruption after:

mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reported-by: NJon Bernard <jbernard@tuxion.com>
Cc: stable@vger.kernel.org

3d2660d0

ext4: fix online resize with very large inode tables · b93c9535

由 Theodore Ts'o 提交于 2月 15, 2014

If a file system has a large number of inodes per block group, all of
the metadata blocks in a flex_bg may be larger than what can fit in a
single block group.  Unfortunately, ext4_alloc_group_tables() in
resize.c was never tested to see if it would handle this case
correctly, and there were a large number of bugs which caused the
following sequence to result in a BUG_ON:

kernel bug at fs/ext4/resize.c:409!
   ...
call trace:
 [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
 [<ffffffff811b9df2>] ? final_putname+0x22/0x50
 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0
 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
rip  [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180


This can be reproduced with the following command sequence:

   mke2fs -t ext4 -i 4096 /dev/vdd 1G
   mount -t ext4 /dev/vdd /vdd
   resize2fs /dev/vdd 8G

To fix this, we need to make sure the right thing happens when a block
group's inode table straddles two block groups, which means the
following bugs had to be fixed:

1) Not clearing the BLOCK_UNINIT flag in the second block group in
   ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.

2) Incorrectly determining how many block groups contained contiguous
   free blocks in ext4_alloc_group_tables().

3) Incorrectly setting the start of the next block range to be marked
   in use after a discontinuity in setup_new_flex_group_blocks().
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

b93c9535

13 2月, 2014 2 次提交

ext4: don't try to modify s_flags if the the file system is read-only · 23301410

由 Theodore Ts'o 提交于 2月 12, 2014

If an ext4 file system is created by some tool other than mke2fs
(perhaps by someone who has a pathalogical fear of the GPL) that
doesn't set one or the other of the EXT2_FLAGS_{UN}SIGNED_HASH flags,
and that file system is then mounted read-only, don't try to modify
the s_flags field.  Otherwise, if dm_verity is in use, the superblock
will change, causing an dm_verity failure.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

23301410

ext4: fix error paths in swap_inode_boot_loader() · 30d29b11

由 Zheng Liu 提交于 2月 12, 2014

In swap_inode_boot_loader() we forgot to release ->i_mutex and resume
unlocked dio for inode and inode_bl if there is an error starting the
journal handle.  This commit fixes this issue.
Reported-by: NAhmed Tamrawi <ahmedtamrawi@gmail.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org  # v3.10+

30d29b11

12 2月, 2014 1 次提交

ext4: fix xfstest generic/299 block validity failures · 15cc1767

由 Eric Whitney 提交于 2月 12, 2014

Commit a115f749 (ext4: remove wait for unwritten extent conversion from
ext4_truncate) exposed a bug in ext4_ext_handle_uninitialized_extents().
It can be triggered by xfstest generic/299 when run on a test file
system created without a journal. This test continuously fallocates and
truncates files to which random dio/aio writes are simultaneously
performed by a separate process. The test completes successfully, but
if the test filesystem is mounted with the block_validity option, a
warning message stating that a logical block has been mapped to an
illegal physical block is posted in the kernel log.

The bug occurs when an extent is being converted to the written state
by ext4_end_io_dio() and ext4_ext_handle_uninitialized_extents()
discovers a mapping for an existing uninitialized extent. Although it
sets EXT4_MAP_MAPPED in map->m_flags, it fails to set map->m_pblk to
the discovered physical block number. Because map->m_pblk is not
otherwise initialized or set by this function or its callers, its
uninitialized value is returned to ext4_map_blocks(), where it is
stored as a bogus mapping in the extent status tree.

Since map->m_pblk can accidentally contain illegal values that are
larger than the physical size of the file system, calls to
check_block_validity() in ext4_map_blocks() that are enabled if the
block_validity mount option is used can fail, resulting in the logged
warning message.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.11+

15cc1767

10 2月, 2014 8 次提交

L

Linux 3.14-rc2 · b28a960c
由 Linus Torvalds 提交于 2月 09, 2014

b28a960c

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · cd63204c

由 Linus Torvalds 提交于 2月 09, 2014

Pull SELinux fixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  SELinux:  Fix kernel BUG on empty security contexts.
  selinux: add SOCK_DIAG_BY_FAMILY to the list of netlink message types

cd63204c

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f94aa7c7

由 Linus Torvalds 提交于 2月 09, 2014

Pull vfs fixes from Al Viro:
 "A couple of fixes, both -stable fodder.  The O_SYNC bug is fairly
  old..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a kmap leak in virtio_console
  fix O_SYNC|O_APPEND syncing the wrong range on write()

f94aa7c7

J

Merge branch 'stable-3.14' of git://git.infradead.org/users/pcmoore/selinux into for-linus · f743166d
由 James Morris 提交于 2月 10, 2014

f743166d

fix a kmap leak in virtio_console · c9efe511

由 Al Viro 提交于 2月 02, 2014

While we are at it, don't do kmap() under kmap_atomic(), *especially*
for a page we'd allocated with GFP_KERNEL. It's spelled "page_address",
and had that been more than that, we'd have a real trouble - kmap_high()
can block, and doing that while holding kmap_atomic() is a Bad Idea(tm).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c9efe511

fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d

由 Al Viro 提交于 2月 09, 2014

It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d311d79d

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 9c1db779

由 Linus Torvalds 提交于 2月 09, 2014

Pull btrfs fixes from Chris Mason:
 "This is a small collection of fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix data corruption when reading/updating compressed extents
  Btrfs: don't loop forever if we can't run because of the tree mod log
  btrfs: reserve no transaction units in btrfs_ioctl_set_features
  btrfs: commit transaction after setting label and features
  Btrfs: fix assert screwup for the pending move stuff

9c1db779

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6f2a1c1e

由 Linus Torvalds 提交于 2月 09, 2014

Pull perf fixes from Ingo Molnar:
 "Tooling fixes, mostly related to the KASLR fallout, but also other
  fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf buildid-cache: Check relocation when checking for existing kcore
  perf tools: Adjust kallsyms for relocated kernel
  perf tests: No need to set up ref_reloc_sym
  perf symbols: Prevent the use of kcore if the kernel has moved
  perf record: Get ref_reloc_sym from kernel map
  perf machine: Set up ref_reloc_sym in machine__create_kernel_maps()
  perf machine: Add machine__get_kallsyms_filename()
  perf tools: Add kallsyms__get_function_start()
  perf symbols: Fix symbol annotation for relocated kernel
  perf tools: Fix include for non x86 architectures
  perf tools: Fix AAAAARGH64 memory barriers
  perf tools: Demangle kernel and kernel module symbols too
  perf/doc: Remove mention of non-existent set_perf_event_pending() from design.txt

6f2a1c1e

09 2月, 2014 7 次提交

Btrfs: fix data corruption when reading/updating compressed extents · a2aa75e1

由 Filipe David Borba Manana 提交于 2月 08, 2014

When using a mix of compressed file extents and prealloc extents, it
is possible to fill a page of a file with random, garbage data from
some unrelated previous use of the page, instead of a sequence of zeroes.

A simple sequence of steps to get into such case, taken from the test
case I made for xfstests, is:

   _scratch_mkfs
   _scratch_mount "-o compress-force=lzo"
   $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

This results in the following file items in the fs tree:

   item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
       inode generation 6 transid 6 size 542872 block group 0 mode 100600
   item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
       inode ref index 2 namelen 6 name: foobar
   item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
       extent data disk byte 0 nr 0 gen 6
       extent data offset 0 nr 24576 ram 266240
       extent compression 0
   item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
       prealloc data disk byte 12849152 nr 241664 gen 6
       prealloc data offset 0 nr 241664
   item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
       extent data disk byte 12845056 nr 4096 gen 6
       extent data offset 0 nr 20480 ram 20480
       extent compression 2
   item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
       prealloc data disk byte 13090816 nr 405504 gen 6
       prealloc data offset 0 nr 258048

The on disk extent at offset 266240 (which corresponds to 1 single disk block),
contains 5 compressed chunks of file data. Each of the first 4 compress 4096
bytes of file data, while the last one only compresses 3024 bytes of file data.
Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
1072 bytes) should always return zeroes (our next extent is a prealloc one).

The solution here is the compression code path to zero the remaining (untouched)
bytes of the last page it uncompressed data into, as the information about how
much space the file data consumes in the last page is not known in the upper layer
fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
the remainder of the page but only if it corresponds to the last page of the inode
and if the inode's size is not a multiple of the page size.

This would cause not only returning random data on reads, but also permanently
storing random data when updating parts of the region that should be zeroed.
For the example above, it means updating a single byte in the region [285648 ; 286720[
would store that byte correctly but also store random data on disk.

A test case for xfstests follows soon.
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

a2aa75e1

Btrfs: don't loop forever if we can't run because of the tree mod log · 27a377db

由 Josef Bacik 提交于 2月 07, 2014

A user reported a 100% cpu hang with my new delayed ref code. Turns out I
forgot to increase the count check when we can't run a delayed ref because of
the tree mod log. If we can't run any delayed refs during this there is no
point in continuing to look, and we need to break out. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

27a377db

btrfs: reserve no transaction units in btrfs_ioctl_set_features · 8051aa1a

由 David Sterba 提交于 2月 07, 2014

Added in patch "btrfs: add ioctls to query/change feature bits online"
modifications to superblock don't need to reserve metadata blocks when
starting a transaction.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

8051aa1a

btrfs: commit transaction after setting label and features · d0270aca

由 Jeff Mahoney 提交于 2月 07, 2014

The set_fslabel ioctl uses btrfs_end_transaction, which means it's
possible that the change will be lost if the system crashes, same for
the newly set features. Let's use btrfs_commit_transaction instead.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

d0270aca

Btrfs: fix assert screwup for the pending move stuff · 6cc98d90

由 Josef Bacik 提交于 2月 05, 2014

Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't
reproduce. Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set,
which meant that a key part of Filipe's original patch was not being built in.
This appears to be a mess up with merging Filipe's patch as it does not exist in
his original patch. Fix this by changing how we make sure del_waiting_dir_move
asserts that it did not error and take the function out of the ifdef check.
This makes btrfs/030 pass with the assert on or off. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NFilipe Manana <fdmanana@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

6cc98d90

Merge tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 49447903

由 Linus Torvalds 提交于 2月 08, 2014

Pull pinctrl fixes from Linus Walleij:
 "First round of pin control fixes for v3.14:

   - Protect pinctrl_list_add() with the proper mutex.  This was
     identified by RedHat.  Caused nasty locking warnings was rootcased
     by Stanislaw Gruszka.

   - Avoid adding dangerous debugfs files when either half of the
     subsystem is unused: pinmux or pinconf.

   - Various fixes to various drivers: locking, hardware particulars, DT
     parsing, error codes"

* tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: tegra: return correct error type
  pinctrl: do not init debugfs entries for unimplemented functionalities
  pinctrl: protect pinctrl_list add
  pinctrl: sirf: correct the pin index of ac97_pins group
  pinctrl: imx27: fix offset calculation in imx_read_2bit
  pinctrl: vt8500: Change devicetree data parsing
  pinctrl: imx27: fix wrong offset to ICONFB
  pinctrl: at91: use locked variant of irq_set_handler

49447903

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c132adef

由 Linus Torvalds 提交于 2月 08, 2014

Pull irq fix from Thomas Gleixner:
 "Add a missing Kconfig dependency"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Generic irq chip requires IRQ_DOMAIN

c132adef

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功