1. 25 6月, 2014 4 次提交
    • D
      xfs: global error sign conversion · 2451337d
      Dave Chinner 提交于
      Convert all the errors the core XFs code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2451337d
    • D
      libxfs: move source files · 30f712c9
      Dave Chinner 提交于
      Move all the source files that are shared with userspace into
      libxfs/. This is done as one big chunk simpy to get it done
      quickly
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      30f712c9
    • D
      libxfs: move header files · 84be0ffc
      Dave Chinner 提交于
      Move all the header files that are shared with userspace into
      libxfs. This is done as one big chunk simpy to get it done quickly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      84be0ffc
    • D
      xfs: create libxfs infrastructure · 69116a13
      Dave Chinner 提交于
      To minimise the differences between kernel and userspace code,
      split the kernel code into the same structure as the userspace code.
      That is, the gneric core functionality of XFS is moved to a libxfs/
      directory and treat it as a layering barrier in the XFS code.
      
      This patch introduces the libxfs directory, the build infrastructure
      and an initial source and header file to build. The libxfs directory
      will contain the header files that are needed to build libxfs - most
      of userspace does not care about the location of these header files
      as they are accessed indirectly. Hence keeping them inside libxfs
      makes it easy to track the changes and script the sync process as
      the directory structure will be identical.
      
      To allow this changeover to occur in the kernel code, there are some
      temporary infrastructure in the makefiles to grab the header
      filesystem from both locations. Once all the files are moved,
      modifications will be made in the source code that will make the
      need for these include directives go away.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      69116a13
  2. 22 6月, 2014 2 次提交
  3. 12 6月, 2014 1 次提交
    • A
      ->splice_write() via ->write_iter() · 8d020765
      Al Viro 提交于
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases coverted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8d020765
  4. 11 6月, 2014 1 次提交
    • A
      fs,userns: Change inode_capable to capable_wrt_inode_uidgid · 23adbe12
      Andy Lutomirski 提交于
      The kernel has no concept of capabilities with respect to inodes; inodes
      exist independently of namespaces.  For example, inode_capable(inode,
      CAP_LINUX_IMMUTABLE) would be nonsense.
      
      This patch changes inode_capable to check for uid and gid mappings and
      renames it to capable_wrt_inode_uidgid, which should make it more
      obvious what it does.
      
      Fixes CVE-2014-4014.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23adbe12
  5. 10 6月, 2014 1 次提交
  6. 06 6月, 2014 23 次提交
  7. 20 5月, 2014 8 次提交
    • R
      xfs: fix compile error when libxfs header used in C++ code · 376c2f3a
      Roger Willcocks 提交于
      xfs_ialloc.h:102: error: expected ',' or '...' before 'delete'
      
      Simple parameter rename, no changes to behaviour.
      Signed-off-by: NRoger Willcocks <roger@filmlight.ltd.uk>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      376c2f3a
    • J
      xfs: fix infinite loop at xfs_vm_writepage on 32bit system · 8695d27e
      Jie Liu 提交于
      Write to a file with an offset greater than 16TB on 32-bit system and
      then trigger page write-back via sync(1) will cause task hang.
      
      # block_size=4096
      # offset=$(((2**32 - 1) * $block_size))
      # xfs_io -f -c "pwrite $offset $block_size" /storage/test_file
      # sync
      
      INFO: task sync:2590 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      sync            D c1064a28     0  2590   2097 0x00000000
      .....
      Call Trace:
      [<c1064a28>] ? ttwu_do_wakeup+0x18/0x130
      [<c1066d0e>] ? try_to_wake_up+0x1ce/0x220
      [<c1066dbf>] ? wake_up_process+0x1f/0x40
      [<c104fc2e>] ? wake_up_worker+0x1e/0x30
      [<c15b6083>] schedule+0x23/0x60
      [<c15b3c2d>] schedule_timeout+0x18d/0x1f0
      [<c12a143e>] ? do_raw_spin_unlock+0x4e/0x90
      [<c10515f1>] ? __queue_delayed_work+0x91/0x150
      [<c12a12ef>] ? do_raw_spin_lock+0x3f/0x100
      [<c12a143e>] ? do_raw_spin_unlock+0x4e/0x90
      [<c15b5b5d>] wait_for_completion+0x7d/0xc0
      [<c1066d60>] ? try_to_wake_up+0x220/0x220
      [<c116a4d2>] sync_inodes_sb+0x92/0x180
      [<c116fb05>] sync_inodes_one_sb+0x15/0x20
      [<c114a8f8>] iterate_supers+0xb8/0xc0
      [<c116faf0>] ? fdatawrite_one_bdev+0x20/0x20
      [<c116fc21>] sys_sync+0x31/0x80
      [<c15be18d>] sysenter_do_call+0x12/0x28
      
      This issue can be triggered via xfstests/generic/308.
      
      The reason is that the end_index is unsigned long with maximum value
      '2^32-1=4294967295' on 32-bit platform, and the given offset cause it
      wrapped to 0, so that the following codes will repeat again and again
      until the task schedule time out:
      
      end_index = offset >> PAGE_CACHE_SHIFT;
      last_index = (offset - 1) >> PAGE_CACHE_SHIFT;
      if (page->index >= end_index) {
      	unsigned offset_into_page = offset & (PAGE_CACHE_SIZE - 1);
              /*
               * Just skip the page if it is fully outside i_size, e.g. due
               * to a truncate operation that is in progress.
               */
              if (page->index >= end_index + 1 || offset_into_page == 0) {
      	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      		unlock_page(page);
      		return 0;
      	}
      
      In order to check if a page is fully outsids i_size or not, we can fix
      the code logic as below:
      	if (page->index > end_index ||
      	    (page->index == end_index && offset_into_page == 0))
      
      Secondly, there still has another similar issue when calculating the
      end offset for mapping the filesystem blocks to the file blocks for
      delalloc.  With the same tests to above, run unmount(8) will cause
      kernel panic if CONFIG_XFS_DEBUG is enabled:
      
      XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || \
      	ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 964
      
      kernel BUG at fs/xfs/xfs_message.c:108!
      invalid opcode: 0000 [#1] SMP
      task: edddc100 ti: ec6ee000 task.ti: ec6ee000
      EIP: 0060:[<f83d87cb>] EFLAGS: 00010296 CPU: 1
      EIP is at assfail+0x2b/0x30 [xfs]
      ..............
      Call Trace:
      [<f83d9cd4>] xfs_fs_destroy_inode+0x74/0x120 [xfs]
      [<c115ddf1>] destroy_inode+0x31/0x50
      [<c115deff>] evict+0xef/0x170
      [<c115dfb2>] dispose_list+0x32/0x40
      [<c115ea3a>] evict_inodes+0xca/0xe0
      [<c1149706>] generic_shutdown_super+0x46/0xd0
      [<c11497b9>] kill_block_super+0x29/0x70
      [<c1149a14>] deactivate_locked_super+0x44/0x70
      [<c114a427>] deactivate_super+0x47/0x60
      [<c1161c3d>] mntput_no_expire+0xcd/0x120
      [<c1162ae8>] SyS_umount+0xa8/0x370
      [<c1162dce>] SyS_oldumount+0x1e/0x20
      [<c15be18d>] sysenter_do_call+0x12/0x28
      
      That because the end_offset is evaluated to 0 which is the same reason
      to above, hence the mapping and covertion for dealloc file blocks to
      file system blocks did not happened.
      
      This patch just fixed both issues.
      Reported-by: NMichael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      8695d27e
    • D
      xfs: remove redundant checks from xfs_da_read_buf · 7c166350
      Dave Chinner 提交于
      All of the verification checks of magic numbers are now done by
      verifiers, so ther eis no need to check them again once the buffer
      has been successfully read. If the magic number is bad, it won't
      even get to that code to verify it so it really serves no purpose at
      all anymore. Remove it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      7c166350
    • D
      xfs: log vector rounding leaks log space · 110dc24a
      Dave Chinner 提交于
      The addition of direct formatting of log items into the CIL
      linear buffer added alignment restrictions that the start of each
      vector needed to be 64 bit aligned. Hence padding was added in
      xlog_finish_iovec() to round up the vector length to ensure the next
      vector started with the correct alignment.
      
      This adds a small number of bytes to the size of
      the linear buffer that is otherwise unused. The issue is that we
      then use the linear buffer size to determine the log space used by
      the log item, and this includes the unused space. Hence when we
      account for space used by the log item, it's more than is actually
      written into the iclogs, and hence we slowly leak this space.
      
      This results on log hangs when reserving space, with threads getting
      stuck with these stack traces:
      
      Call Trace:
      [<ffffffff81d15989>] schedule+0x29/0x70
      [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
      [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
      [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
      [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
      .....
      
      The 4 bytes is significant. Brain Foster did all the hard work in
      tracking down a reproducable leak to inode chunk allocation (it went
      away with the ikeep mount option). His rough numbers were that
      creating 50,000 inodes leaked 11 log blocks. This turns out to be
      roughly 800 inode chunks or 1600 inode cluster buffers. That
      works out at roughly 4 bytes per cluster buffer logged, and at that
      I started looking for a 4 byte leak in the buffer logging code.
      
      What I found was that a struct xfs_buf_log_format structure for an
      inode cluster buffer is 28 bytes in length. This gets rounded up to
      32 bytes, but the vector length remains 28 bytes. Hence the CIL
      ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
      for that vector rather than 28 bytes which are written into the log.
      
      The fix for this problem is to separately track the bytes used by
      the log vectors in the item and use that instead of the buffer
      length when accounting for the log space that will be used by the
      formatted log item.
      
      Again, thanks to Brian Foster for doing all the hard work and long
      hours to isolate this leak and make finding the bug relatively
      simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      110dc24a
    • N
      xfs: remove XFS_TRANS_RESERVE in collapse range · ce576f1c
      Namjae Jeon 提交于
      There is no need to dip into reserve pool. Reserve pool is used for much
      more important things. And xfs_trans_reserve will never return ENOSPC
      because punch hole is already done. If we get ENOSPC, collapse range
      will be simply failed.
      
      Cc: Brian Foster <bfoster@redhat.com>
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce576f1c
    • D
      xfs: remove shared supberlock feature checking · ab3e57b5
      Dave Chinner 提交于
      We reject any filesystem that is mounted with this feature bit set,
      so we don't need to check for it anywhere else. Remove the function
      for checking if the feature bit is set and any code that uses it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ab3e57b5
    • D
      xfs: don't need dirv2 checks anymore · 5d074a4f
      Dave Chinner 提交于
      If the the V2 directory feature bit is not set in the superblock
      feature mask the filesystem will fail the good version check.
      Hence we don't need any other version checking on the dir2 feature
      bit in the code as the filesystem will not mount without it set.
      Remove the checking code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5d074a4f
    • D
      xfs: turn NLINK feature on by default · 263997a6
      Dave Chinner 提交于
      mkfs has turned on the XFS_SB_VERSION_NLINKBIT feature bit by
      default since November 2007. It's about time we simply made the
      kernel code turn it on by default and so always convert v1 inodes to
      v2 inodes when reading them in from disk or allocating them. This
      This removes needless version checks and modification when bumping
      link counts on inodes, and will take code out of a few common code
      paths.
      
         text    data     bss     dec     hex filename
       783251  100867     616  884734   d7ffe fs/xfs/xfs.o.orig
       782664  100867     616  884147   d7db3 fs/xfs/xfs.o.patched
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      263997a6