1. 21 Nov 2014, 8 commits
    • Btrfs: make inode.c:compress_file_range() return void · c44f649e
      Filipe Manana authored
      Its return value is useless: its single caller ignores it and can't do
      anything with it anyway, since the caller is a workqueue task and not the
      task calling filemap_fdatawrite_range() (writepages) or
      filemap_fdatawait_range(). Failures are communicated to those functions at
      the start and end of writeback, by tagging the respective pages with an
      error and setting the AS_EIO flag in the inode's mapping.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      c44f649e
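      A minimal sketch of the error-reporting pattern the commit above relies on
      (a hypothetical helper, not the actual btrfs code): once the work runs in a
      workqueue, errors can only reach filemap_fdatawait_range() through the pages
      and the mapping flags, never through a return value.

      	#include <linux/mm.h>
      	#include <linux/pagemap.h>

      	/* Hypothetical helper: report an async write error through the page
      	 * and the mapping, since no return value can reach the original writer. */
      	static void report_async_write_error(struct address_space *mapping,
      					     struct page *page, int err)
      	{
      		SetPageError(page);              /* mark the page itself */
      		mapping_set_error(mapping, err); /* sets AS_EIO (or AS_ENOSPC) */
      		end_page_writeback(page);        /* lets fdatawait waiters proceed */
      	}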
    • Btrfs: fix incorrect compression ratio detection · 4bcbb332
      Shilong Wang authored
      Steps to reproduce:
       # mkfs.btrfs -f /dev/sdb
       # mount -t btrfs /dev/sdb /mnt -o compress=lzo
       # dd if=/dev/zero of=/mnt/data bs=$((33*4096)) count=1
      
      After the previous steps, the inode will be detected as having a bad
      compression ratio, and the NOCOMPRESS flag will be set for that inode.
      
      The reason is that compression has a maximum range limit per pass (128K),
      so a 132K write gets split into two writes (128K + 4K). This bug is a
      leftover from commit 68bb462d ("Btrfs: don't compress for a small write").
      
      Fix the problem by checking before each compression pass: if it is a small
      write (<= blocksize), bail out and fall through to the no-compression path
      directly.
      Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      4bcbb332
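      A sketch of the check the fix adds, with names loosely following
      compress_file_range(); the exact condition is an assumption for
      illustration, not a copy of the patch.

      	#include <linux/types.h>

      	/* Hypothetical predicate: a range no larger than one block that is not
      	 * the start of a small (inline-able) file should simply skip compression,
      	 * without marking the whole inode NOCOMPRESS. */
      	static bool skip_compression_for_small_range(u64 start, u64 end,
      						     u64 isize, u64 blocksize)
      	{
      		return (end - start + 1) <= blocksize &&
      		       (start > 0 || end + 1 < isize);
      	}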
    • Btrfs: don't ignore compressed bio write errors · 7bdcefc1
      Filipe Manana authored
      Our compressed bio write end callback was essentially ignoring the error
      parameter. When a write error happens, it must pass a value of 0 to the
      inode's writepage_end_io_hook callback, call SetPageError on the respective
      pages, and set AS_EIO in the inode's mapping flags, so that a call to
      filemap_fdatawait_range() / filemap_fdatawait() can find out that errors
      happened (we surely don't want silent failures on fsync for example).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      7bdcefc1
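      A hedged sketch of the failure handling described above (illustrative, not
      the driver's actual end-io callback): on error, every page of the range and
      the inode's mapping get tagged before writeback is ended.

      	#include <linux/fs.h>
      	#include <linux/pagemap.h>

      	/* Illustrative: end writeback on a range of pages, propagating an
      	 * error to the pages and the mapping so fsync/fdatawait can see it. */
      	static void end_range_writeback(struct inode *inode, struct page **pages,
      					unsigned long nr_pages, int err)
      	{
      		unsigned long i;

      		if (err)
      			mapping_set_error(inode->i_mapping, err);  /* AS_EIO */
      		for (i = 0; i < nr_pages; i++) {
      			if (err)
      				SetPageError(pages[i]);
      			end_page_writeback(pages[i]);
      		}
      	}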
    • Btrfs: make inode.c:submit_compressed_extents() return void · dec8f175
      Filipe Manana authored
      Its return value is completely ignored by its single caller and it's
      useless anyway, since errors are indicated through SetPageError and
      the bit AS_EIO set in the flags of the inode's mapping. The caller
      can't do anything with the value, as it's invoked from a workqueue
      task and not by the task calling filemap_fdatawrite_range (which calls
      the writepages address space callback, which in turn calls the inode's
      fill_delalloc callback).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      dec8f175
    • Btrfs: process all async extents on compressed write failure · 3d7a820f
      Filipe Manana authored
      If we had an error when processing one of the async extents from our list,
      we were not processing the remaining async extents, meaning we would leak
      those async_extent structs, never release the pages with the compressed
      data and never unlock and clear the dirty flag from the inode's pages (those
      that correspond to the uncompressed content).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      3d7a820f
    • Btrfs: don't leak pages and memory on compressed write error · 40ae837b
      Filipe Manana authored
      In inode.c:submit_compressed_extents(), if we fail before calling
      btrfs_submit_compressed_write(), or when that function fails, we
      were freeing the async_extent structure without releasing its pages
      and freeing the pages array.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      40ae837b
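      A sketch of the cleanup this fix requires; the cut-down struct and helper
      name are assumptions for illustration only. The pages referenced by an
      async extent must be released and the pages array freed before the struct
      itself.

      	#include <linux/mm.h>
      	#include <linux/slab.h>

      	/* Hypothetical cut-down async_extent and its error-path cleanup. */
      	struct async_extent_sketch {
      		struct page **pages;		/* compressed pages */
      		unsigned long nr_pages;
      	};

      	static void free_async_extent_sketch(struct async_extent_sketch *ae)
      	{
      		unsigned long i;

      		for (i = 0; i < ae->nr_pages; i++)
      			put_page(ae->pages[i]);	/* drop the compressed page */
      		kfree(ae->pages);		/* the pages array itself */
      		kfree(ae);
      	}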
    • Btrfs: fix hang on compressed write error · fce2a4e6
      Filipe Manana authored
      In inode.c:submit_compressed_extents(), before calling btrfs_submit_compressed_write()
      we start writeback for all pages, clear their dirty flag, unlock them, etc, but if
      btrfs_submit_compressed_write() fails (at the moment it can only fail with -ENOMEM),
      we never end the writeback on the pages, so any filemap_fdatawait_range() call will
      hang forever. We were also not calling the writepage end io hook, which means the
      corresponding ordered extent will never complete and all its waiters will block
      forever, such as a full fsync (via btrfs_wait_ordered_range()).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      fce2a4e6
    • Btrfs: set page and mapping error on compressed write failure · 704de49d
      Filipe Manana authored
      If we fail in submit_compressed_extents() before calling btrfs_submit_compressed_write(),
      we start and end the writeback for the pages (clear their dirty flag, unlock them, etc)
      but we don't tag the pages, nor the inode's mapping, with an error. This makes it
      impossible for a caller of filemap_fdatawait_range() (fsync, or a transaction
      commit, for example) to know that there was an error.
      
      Note that the return value of submit_compressed_extents() is useless, as that function
      is executed by a workqueue task and not directly by the fill_delalloc callback. This
      means the writepage/s callbacks of the inode's address space operations don't get that
      return value.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      704de49d
  2. 14 Nov 2014, 2 commits
  3. 13 Nov 2014, 11 commits
  4. 07 Nov 2014, 6 commits
    • xfs: track bulkstat progress by agino · 00275899
      Dave Chinner authored
      The bulkstat main loop progress is tracked by the "lastino"
      variable, which is a full 64 bit inode. However, the loop actually
      works on agno/agino pairs, and so there's a significant disconnect
      between the rest of the loop and the main cursor. Convert this to
      use the agino, and pass the agino into the chunk formatting function
      and convert it too.
      
      This gets rid of the inconsistency in the loop processing, and
      finally makes it simple for us to skip inodes at any point in the
      loop simply by incrementing the agino cursor.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      00275899
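      A small sketch of the cursor arithmetic the commit moves to; the macros are
      the existing XFS inode-number helpers, while the function itself is purely
      illustrative. Skipping an inode becomes a plain agino increment.

      	/* Illustrative: advance the bulkstat cursor past 'ino' via agno/agino. */
      	static xfs_ino_t bulkstat_advance_cursor(struct xfs_mount *mp, xfs_ino_t ino)
      	{
      		xfs_agnumber_t	agno = XFS_INO_TO_AGNO(mp, ino);
      		xfs_agino_t	agino = XFS_INO_TO_AGINO(mp, ino);

      		return XFS_AGINO_TO_INO(mp, agno, agino + 1);
      	}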
    • xfs: bulkstat error handling is broken · febe3cbe
      Dave Chinner authored
      The error propagation is a horror - xfs_bulkstat() returns
      a rval variable which is only set if there are formatter errors. Any
      sort of btree walk error or corruption will cause the bulkstat walk
      to terminate but will not pass an error back to userspace. Worse
      is the fact that formatter errors will also be ignored if any inodes
      were correctly formatted into the user buffer.
      
      Hence bulkstat can fail badly yet still report success to userspace.
      This causes significant issues with xfsdump not dumping everything
      in the filesystem yet reporting success. It's not until a restore
      fails that there is any indication that the dump was bad and that
      bulkstat failed. With this patch, bulkstat errors now cause xfsdump
      to fail instead of silently missing files in the dump.
      
      This now causes bulkstat to fail when the lastino cookie does not
      fall inside an existing inode chunk. The pre-3.17 code tolerated
      that error by allowing the code to move to the next inode chunk
      as the agino target is guaranteed to fall into the next btree
      record.
      
      With the fixes up to this point in the series, xfsdump now passes on
      the troublesome filesystem image that exposes all these bugs.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      febe3cbe
    • xfs: bulkstat main loop logic is a mess · 6e57c542
      Dave Chinner authored
      There are a bunch of variables that are more widely scoped than they
      need to be, obfuscated user buffer checks and tortured "next inode"
      tracking. This all needs cleaning up to expose the real issues that
      need fixing.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      6e57c542
    • xfs: bulkstat chunk-formatter has issues · 2b831ac6
      Dave Chinner authored
      The loop construct has issues:
      	- clustidx is completely unused, so remove it.
      	- the loop tries to be smart by terminating when the
      	  "freecount" tells it that all inodes are free. Just drop
      	  it as in most cases we have to scan all inodes in the
      	  chunk anyway.
      	- move the "user buffer left" condition check to the only
      	  point where we consume space in the user buffer.
      	- move the initialisation of agino out of the loop, leaving
      	  just a simple loop control logic using the clusteridx.
      
      Also, double handling of the user buffer variables leads to problems
      tracking the current state - use the cursor variables directly
      rather than keeping local copies and then having to update the
      cursor before returning.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      2b831ac6
    • xfs: bulkstat chunk formatting cursor is broken · bf4a5af2
      Dave Chinner authored
      The xfs_bulkstat_agichunk formatting cursor takes buffer values from
      the main loop and passes them via the structure to the chunk
      formatter, and then writes the changed values back into the main loop's
      local variables. Unfortunately, this complex dance is full of corner
      cases that aren't handled correctly.
      
      The biggest problem is that it is double handling the information in
      both the main loop and the chunk formatting function, leading to
      inconsistent updates and endless loops where progress is not made.
      
      To fix this, push the struct xfs_bulkstat_agichunk outwards to be
      the primary holder of user buffer information. This removes the
      double handling in the main loop.
      
      Also, pass the last inode processed by the chunk formatter as a
      separate parameter, as it is purely an output variable and is not
      related to the user buffer consumption cursor.
      
      Finally, the chunk formatting code is not shared by anyone, so make
      it local to xfs_itable.c.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      bf4a5af2
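      For reference, the user-buffer cursor ends up looking roughly like this;
      the field comments are interpretation, not authoritative documentation.

      	struct xfs_bulkstat_agichunk {
      		char __user	**ac_ubuffer;	/* pointer into the user buffer */
      		int		ac_ubleft;	/* bytes left in the user buffer */
      		int		ac_ubelem;	/* entries copied to the user buffer */
      	};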
    • xfs: bulkstat btree walk doesn't terminate · afa947cb
      Dave Chinner authored
      The bulkstat code has several different ways of detecting the end of
      an AG when doing a walk. They are not consistently detected, and the
      code that checks for the end of AG conditions is not consistently
      coded. Hence there are conditions where the walk code can get stuck in
      an endless loop making no progress and not triggering any
      termination conditions.
      
      Convert all the "tmp/i" status return codes from btree operations
      to a common name (stat) and apply end-of-ag detection to these
      operations consistently.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      afa947cb
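      A hedged sketch of the consistent end-of-AG handling described above: each
      btree operation reports a 'stat' flag, and stat == 0 means there is nothing
      left in this AG, so the walk must stop rather than spin. The wrapper below
      is illustrative, not the patched code.

      	/* Illustrative: step the inobt cursor; returns true while there are
      	 * more records in this AG, false at end of AG or on error. */
      	static bool inobt_walk_next(struct xfs_btree_cur *cur, int *error)
      	{
      		int stat = 0;

      		*error = xfs_btree_increment(cur, 0, &stat);
      		return *error == 0 && stat != 0;
      	}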
  5. 06 Nov 2014, 1 commit
  6. 05 Nov 2014, 4 commits
  7. 04 Nov 2014, 1 commit
  8. 01 Nov 2014, 1 commit
  9. 31 Oct 2014, 3 commits
    • Return short read or 0 at end of a raw device, not EIO · b2de525f
      David Jeffery authored
      Changes to the basic direct I/O code have broken the raw driver when reading
      to the end of a raw device.  Instead of returning a short read for a read that
      extends partially beyond the device's end or 0 when at the end of the device,
      these reads now return EIO.
      
      The raw driver needs the same end of device handling as was added for normal
      block devices.  Using blkdev_read_iter, which has the needed size checks,
      prevents the EIO conditions at the end of the device.
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b2de525f
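      A sketch of the shape of the fix; the field list is abridged and should be
      read as an assumption about the raw driver's file_operations rather than a
      copy of it. Routing reads through blkdev_read_iter picks up the block
      layer's end-of-device size checks.

      	static const struct file_operations raw_fops = {
      		.read_iter	= blkdev_read_iter,	/* end-of-device checks */
      		.write_iter	= blkdev_write_iter,
      		/* .open, .release, .unlocked_ioctl, ... omitted here */
      	};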
    • isofs: don't bother with ->d_op for normal case · b0afd8e5
      Al Viro authored
      we only need it for joliet and case-insensitive mounts
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b0afd8e5
    • fs: allow open(dir, O_TMPFILE|..., 0) with mode 0 · 69a91c23
      Eric Rannaud authored
      The man page for open(2) indicates that when O_CREAT is specified, the
      'mode' argument applies only to future accesses to the file:
      
      	Note that this mode applies only to future accesses of the newly
      	created file; the open() call that creates a read-only file
      	may well return a read/write file descriptor.
      
      The man page for open(2) implies that 'mode' is treated identically by
      O_CREAT and O_TMPFILE.
      
      O_TMPFILE, however, behaves differently:
      
      	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
      	assert(fd == -1);
      	assert(errno == EACCES);
      
      	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
      	assert(fd > 0);
      
      For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
      
      	if (*opened & FILE_CREATED) {
      		/* Don't check for write permission, don't truncate */
      		open_flag &= ~O_TRUNC;
      		will_truncate = false;
      		acc_mode = MAY_OPEN;
      		path_to_nameidata(path, nd);
      		goto finish_open_created;
      	}
      
      But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
      may_open().
      
      This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
      inode is created, may_open() is called with acc_mode = MAY_OPEN, in
      do_tmpfile().
      
      A different, but related glibc bug revealed the discrepancy:
      https://sourceware.org/bugzilla/show_bug.cgi?id=17523
      
      glibc lazily loads the 'mode' argument of open() and openat() using
      va_arg() only if O_CREAT is present in 'flags' (to support both the
      two-argument and the three-argument forms of open(); same idea for
      openat()). However, glibc ignores the 'mode' argument if O_TMPFILE is
      in 'flags'.
      
      On x86_64, for open(), it magically works anyway, as 'mode' is in
      RDX when entering open(), and is still in RDX on SYSCALL, which is where
      the kernel looks for the 3rd argument of a syscall.
      
      But openat() is not quite so lucky: 'mode' is in RCX when entering the
      glibc wrapper for openat(), while the kernel looks for the 4th argument
      of a syscall in R10. Indeed, the syscall calling convention differs from
      the regular calling convention in this respect on x86_64. So the kernel
      sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
      fails with EACCES.
      Signed-off-by: Eric Rannaud <e@nanocritical.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69a91c23
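      A small user-space check of the behaviour described above (assumes a
      writable /tmp on a filesystem with O_TMPFILE support; on kernels without
      the fix the open is expected to fail with EACCES):

      	#define _GNU_SOURCE		/* for O_TMPFILE */
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <unistd.h>

      	int main(void)
      	{
      		/* mode 0: with the fix the open itself succeeds; the mode still
      		 * governs any later access to the file if it gets linked in. */
      		int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);

      		if (fd < 0) {
      			perror("open(O_TMPFILE, mode 0)");
      			return 1;
      		}
      		printf("got unnamed temporary file, fd=%d\n", fd);
      		close(fd);
      		return 0;
      	}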
  10. 30 Oct 2014, 3 commits
    • ext4: make ext4_ext_convert_to_initialized() return proper number of blocks · ae9e9c6a
      Jan Kara authored
      ext4_ext_convert_to_initialized() can return more blocks than are
      actually allocated from map->m_lblk in the case where the initial part
      of the on-disk extent is zeroed out. Luckily this doesn't have serious
      consequences because the caller currently uses the return value only
      to unmap metadata buffers. Still, this is a data corruption/exposure
      problem waiting to happen, so fix it.
      
      Coverity-id: 1226848
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      ae9e9c6a
    • ext4: bail early when clearing inode journal flag fails · 4f879ca6
      Jan Kara authored
      When clearing the inode journal flag, we call jbd2_journal_flush() to
      force all the journalled data to its final location. Currently we
      ignore a failure of this call and continue clearing the inode journal
      flag. This isn't a big problem because when jbd2_journal_flush() fails
      the journal is likely aborted anyway, but it can still lead to somewhat
      confusing results, so bail out early instead.
      
      Coverity-id: 989044
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      4f879ca6
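      A hedged sketch of the bail-out; the unwind shown is an assumption about
      what needs undoing at that point of ext4_change_inode_journal_flag(), not
      the exact patch.

      	err = jbd2_journal_flush(journal);
      	if (err < 0) {
      		jbd2_journal_unlock_updates(journal);
      		/* ... undo whatever else was set up before the flush ... */
      		return err;
      	}
      	ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);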
    • ext4: bail out from make_indexed_dir() on first error · 6050d47a
      Jan Kara authored
      When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
      fails, there's really something wrong with the fs and there's no point
      in continuing further. Just return an error from make_indexed_dir() in
      that case. Also initialize the frames array so that if we return early
      due to an error, dx_release() doesn't try to dereference uninitialized
      memory (which could also happen due to an error in do_split()).
      
      Coverity-id: 741300
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      6050d47a
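      A sketch of the two changes; labels and exact call sites are illustrative
      rather than a copy of the patch. The frames array is zeroed so dx_release()
      is always safe, and the first dirty-node failure jumps to a common exit.

      	struct dx_frame frames[2], *frame;

      	memset(frames, 0, sizeof(frames));	/* safe for dx_release() on early errors */
      	/* ... build the dx root and leaf blocks ... */
      	retval = ext4_handle_dirty_dx_node(handle, dir, frame->bh);
      	if (retval)
      		goto out_frames;		/* don't keep going on a failing fs */
      	/* ... */
      out_frames:
      	dx_release(frames);
      	return retval;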