1. 01 Aug, 2011 1 commit
  2. 26 Jul, 2011 4 commits
  3. 23 Jul, 2011 1 commit
    • ext3: Fix data corruption in inodes with journalled data · b22570d9
      Jan Kara authored
      When journalling data for an inode (either because it is a symlink or
      because the filesystem is mounted in data=journal mode), ext3_evict_inode()
      can discard unwritten data by calling truncate_inode_pages(). This is
      because we don't mark the buffer / page dirty when journalling data but only
      add the buffer to the running transaction, and thus mm does not know there
      is still unwritten data.
      
      Fix the problem by carefully tracking the transaction containing the inode's
      data, committing this transaction, and writing uncheckpointed buffers when
      the inode should be reaped.
      Signed-off-by: Jan Kara <jack@suse.cz>
  4. 21 Jul, 2011 5 commits
  5. 20 Jul, 2011 3 commits
  6. 25 Jun, 2011 6 commits
    • ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs() · 2c2ea945
      Lukas Czerner authored
      We should return -EINVAL when the FITRIM parameters are not sane, but
      currently we are exiting silently if start is beyond the end of the
      file system. This commit fixes this so we return -EINVAL as other file
      systems do.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      CC: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
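As a rough illustration of the check this commit describes, the user-space sketch below rejects a trim request whose start lies at or beyond the end of the file system. The name `trim_check_start` and both parameters are hypothetical and work in units of blocks; the real ext3_trim_fs() takes a superblock and an fstrim_range and is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): validate the first block
 * of a trim request against the total number of blocks in the file system.
 * Returns 0 if the start is usable, -EINVAL otherwise.
 */
int trim_check_start(uint64_t start_block, uint64_t fs_blocks)
{
    assert(fs_blocks > 0);          /* an empty file system makes no sense here */

    if (start_block >= fs_blocks)
        return -EINVAL;             /* report the bad range instead of exiting silently */

    return 0;
}
```

A caller would run this before walking any block groups, so a nonsensical range is reported to user space rather than treated as a successful no-op.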
    • ext3/ioctl.c: silence sparse warnings about different address spaces · 81fe8c62
      H Hartley Sweeten authored
      The 'from' argument for copy_from_user and the 'to' argument for
      copy_to_user should both be tagged as __user address space.
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Improve truncate error handling · ee3e77f1
      Jan Kara authored
      The new truncate calling convention allows us to handle errors from
      ext3_block_truncate_page(). So reorganize the code so that
      ext3_block_truncate_page() is called before we change inode size.
      
      This also removes unnecessary block zeroing from error recovery after failed
      buffered writes (zeroing isn't needed because we could never have written
      non-zero data to disk). We have to be careful and keep zeroing in direct IO
      write error recovery because there we might have already overwritten the end
      of the last file block.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix oops in ext3_try_to_allocate_with_rsv() · ad95c5e9
      Jan Kara authored
      Block allocation is called from two places: ext3_get_blocks_handle() and
      ext3_xattr_block_set(). These two callers are not necessarily synchronized
      because xattr code holds only xattr_sem and i_mutex, and
      ext3_get_blocks_handle() may hold only truncate_mutex when called from
      writepage() path. Block reservation code does not expect two concurrent
      allocations to happen to the same inode and thus assertions can be triggered
      or reservation structure corruption can occur.
      
      Fix the problem by taking truncate_mutex in xattr code to serialize
      allocations.
      
      CC: Sage Weil <sage@newdream.net>
      CC: stable@kernel.org
      Reported-by: Fyodor Ustinov <ufm@ufm.su>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Convert ext3 to new truncate calling convention · 40680f2f
      Jan Kara authored
      Mostly trivial conversion. While changing the code, we also fix a bug where
      IS_IMMUTABLE and IS_APPEND files could not be truncated during failed writes.
      In fact the test is not needed at all because both IS_IMMUTABLE and IS_APPEND
      are tested in upper layers in do_sys_[f]truncate(), may_write(), etc.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Add fixed tracepoints · 785c4bcc
      Lukas Czerner authored
      This commit adds fixed tracepoints to the ext3 code. It is based on the ext4
      tracepoints; however, due to the differences between the two file systems,
      some tracepoints are missing (those for delalloc and for the multi-block
      allocator) and some are ext3-specific (those for reservation
      windows).
      
      Here is a list:
      
      ext3_free_inode
      ext3_request_inode
      ext3_allocate_inode
      ext3_evict_inode
      ext3_drop_inode
      ext3_mark_inode_dirty
      ext3_write_begin
      ext3_ordered_write_end
      ext3_writeback_write_end
      ext3_journalled_write_end
      ext3_ordered_writepage
      ext3_writeback_writepage
      ext3_journalled_writepage
      ext3_readpage
      ext3_releasepage
      ext3_invalidatepage
      ext3_discard_blocks
      ext3_request_blocks
      ext3_allocate_blocks
      ext3_free_blocks
      ext3_sync_file_enter
      ext3_sync_file_exit
      ext3_sync_fs
      ext3_rsv_window_add
      ext3_discard_reservation
      ext3_alloc_new_reservation
      ext3_reserved
      ext3_forget
      ext3_read_block_bitmap
      ext3_direct_IO_enter
      ext3_direct_IO_exit
      ext3_unlink_enter
      ext3_unlink_exit
      ext3_truncate_enter
      ext3_truncate_exit
      ext3_get_blocks_enter
      ext3_get_blocks_exit
      ext3_load_inode
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
  7. 27 May, 2011 2 commits
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig authored
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      has been changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
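To make the idea of passing dirty flags to ->dirty_inode more concrete, here is a minimal user-space sketch of how a filesystem might use them to decide whether fdatasync needs to push out a transaction. Everything in it is an assumption for illustration: `struct fake_inode`, `fake_dirty_inode` and the two bookkeeping fields are invented, and the flag values are local stand-ins rather than definitions taken from a kernel header. The commit itself only changes the prototype and adds no such user.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel's dirty flags (values are assumptions). */
#define I_DIRTY_SYNC     (1 << 0)   /* only timestamps and similar changed */
#define I_DIRTY_DATASYNC (1 << 1)   /* something fdatasync cares about changed */

/* Hypothetical per-inode bookkeeping a filesystem might keep. */
struct fake_inode {
    bool needs_sync_commit;      /* fsync must push out a transaction */
    bool needs_datasync_commit;  /* fdatasync must push out a transaction */
};

/* Hypothetical ->dirty_inode implementation that looks at the flags. */
void fake_dirty_inode(struct fake_inode *inode, int flags)
{
    assert(inode != NULL);

    /* Any dirtying is relevant to a full fsync. */
    inode->needs_sync_commit = true;

    /* A pure timestamp update does not force work for fdatasync. */
    if (flags & I_DIRTY_DATASYNC)
        inode->needs_datasync_commit = true;
}
```

With this split, an fdatasync path could skip the commit whenever only `needs_sync_commit` is set, which is the kind of internal tracking the commit message refers to.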
    • ext3: add cleancache support · d71bc6db
      Dan Magenheimer authored
      This fifth patch of eight in this cleancache series "opts-in"
      cleancache for ext3.  Filesystems must explicitly enable
      cleancache by calling cleancache_init_fs anytime an instance
      of the filesystem is mounted. For ext3, all other cleancache
      hooks are in the VFS layer including the matching cleancache_flush_fs
      hook which must be called on unmount.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v6-v8: no changes]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
  8. 26 May, 2011 3 commits
  9. 17 May, 2011 1 commit
    • ext3: Fix fs corruption when make_indexed_dir() fails · 86c4f6d8
      Jan Kara authored
      When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated
      a block for the index tree root, we did not properly mark all changed buffers
      dirty. This led to only some of these buffers being written out and thus
      effectively corrupting the directory.
      
      Fix the issue by marking all changed data dirty even in the failure case.
      
      CC: stable@kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
  10. 30 Apr, 2011 1 commit
    • ext3: Fix lock inversion in ext3_symlink() · ae54870a
      Jan Kara authored
      ext3_symlink() cannot call __page_symlink() with a transaction open.
      __page_symlink() calls ext3_write_begin(), which gets the page lock, which
      ranks above transaction start (thus lock ordering is violated). Also,
      ext3_write_begin() waits for a transaction commit when we run out of space,
      which never happens if we hold a transaction open.
      
      Fix the problem by stopping the transaction before calling __page_symlink()
      (we have to be careful and put the inode on the orphan list so that it gets
      deleted in case of a crash) and starting another one after __page_symlink()
      returns, for the addition of the symlink into a directory.
      Signed-off-by: Jan Kara <jack@suse.cz>
  11. 31 Mar, 2011 1 commit
  12. 24 Mar, 2011 2 commits
  13. 15 Mar, 2011 2 commits
  14. 10 Mar, 2011 1 commit
  15. 08 Mar, 2011 1 commit
  16. 04 Mar, 2011 1 commit
    • ext3: Fix an overflow in ext3_trim_fs. · 425fa410
      Tao Ma authored
      On a bs=4096 volume, suppose FITRIM is called with
      fstrim_range(start = 102400, len = 134144000, minlen = 10240) and the
      following code runs:
      if (len >= EXT3_BLOCKS_PER_GROUP(sb))
              len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
      else
              last_block = first_block + len;
      
      If len < EXT3_BLOCKS_PER_GROUP while first_block + len >
      EXT3_BLOCKS_PER_GROUP, last_block is set to a value that runs past the
      end of the group, i.e. one that exceeds EXT3_BLOCKS_PER_GROUP.
      
      This patch fixes it and adjusts len and last_block accordingly.
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
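The adjustment of len and last_block can be sketched in user space as follows. This is one possible way to express the clamping under stated assumptions, not the text of the actual patch: `struct group_span` and `clamp_to_group` are hypothetical names, `last_block` is treated as an exclusive end within the group, and `blocks_per_group` stands in for EXT3_BLOCKS_PER_GROUP(sb).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: what to trim in this group and what is left over. */
struct group_span {
    uint64_t last_block;  /* exclusive end of the trim range inside this group */
    uint64_t remaining;   /* blocks still to be trimmed in later groups */
};

/*
 * Clamp a trim request that begins at first_block (relative to the group)
 * and spans len blocks so that it never runs past the end of the group.
 */
struct group_span clamp_to_group(uint64_t first_block, uint64_t len,
                                 uint64_t blocks_per_group)
{
    struct group_span s;

    assert(first_block < blocks_per_group);

    /* By default the range runs to the end of the group. */
    s.last_block = blocks_per_group;

    /* Only a request that ends inside this group gets a shorter range.
     * Comparing against the space left avoids computing first_block + len
     * when that sum could be too large. */
    if (len < blocks_per_group - first_block)
        s.last_block = first_block + len;

    /* Carry over exactly the part that was not covered by this group. */
    s.remaining = len - (s.last_block - first_block);
    return s;
}
```

For a group of 32768 blocks, a request with first_block = 30000 and len = 5000 is the problematic case from the commit message: len is smaller than the group, yet first_block + len is not. The sketch stops at block 32768 and carries 2232 blocks into the next group.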
  17. 01 Mar, 2011 1 commit
    • ext3: skip orphan cleanup on rocompat fs · ce654b37
      Amir Goldstein authored
      Orphan cleanup is currently executed even if the file system has some
      number of unknown ROCOMPAT features. The cleanup deletes inodes and frees
      blocks, which could be very bad for some RO_COMPAT features.
      
      This patch skips the orphan cleanup if the file system contains read-only
      compatible features not known by this ext3 implementation, which would
      prevent the fs from being mounted (or remounted) readwrite.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
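The rule described in this commit can be illustrated with a small user-space sketch. It is a simplification under stated assumptions: `struct fake_sb`, `orphan_cleanup` and the `KNOWN_RO_COMPAT` mask value are all hypothetical, and the "cleanup" merely zeroes a counter instead of deleting inodes or freeing blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask of RO_COMPAT feature bits this implementation understands. */
#define KNOWN_RO_COMPAT 0x0007u

/* Hypothetical, heavily reduced superblock. */
struct fake_sb {
    uint32_t ro_compat;  /* RO_COMPAT feature bits found on disk */
    unsigned orphans;    /* number of entries on the orphan list */
};

/*
 * Process the orphan list unless the file system carries RO_COMPAT
 * features we do not know. Returns the number of orphans processed.
 */
unsigned orphan_cleanup(struct fake_sb *sb)
{
    assert(sb != NULL);

    /* Unknown read-only compatible features: leave the fs untouched. */
    if (sb->ro_compat & ~KNOWN_RO_COMPAT)
        return 0;

    unsigned processed = sb->orphans;
    sb->orphans = 0;     /* stand-in for deleting inodes and freeing blocks */
    return processed;
}
```

The point of the check is that a file system with unknown RO_COMPAT bits may only be read, so nothing that modifies it, including orphan processing, should run.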
  18. 24 Feb, 2011 2 commits
    • ext3: speed up group trim with the right free block count. · bbac751d
      Tao Ma authored
      When we trim free blocks in an ext3 group, we should calculate the
      free blocks properly and check whether enough free blocks are
      left for us to trim. The current solution only subtracts free
      extents that are large enough to trim, which is wrong.
      
      Let us see a small example:
      a group has 1.5M free, in chunks of 300k, 300k, 300k, 300k, 300k,
      and minblocks is 1M. With the current solution, we have to iterate
      the whole group since these 300k chunks will never be subtracted
      from 1.5M. But actually we should exit after we find the first 2
      free chunks: once we subtract the first 600K, even though they
      cannot be trimmed, the remaining 3 chunks only sum up to 900K.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
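The example in this commit message can be replayed with a short user-space sketch of the early-exit idea. It assumes a pre-computed array of free extent sizes, which the real code does not have (it walks the block bitmap); `scan_free_extents` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk the free extents of one group and "trim" those of at least
 * minblocks. Every extent seen, trimmed or not, is subtracted from
 * free_blocks, so the scan can stop as soon as the rest of the group
 * cannot contain a large enough extent. Returns the number of extents
 * visited and stores the total trimmed size in *trimmed.
 */
size_t scan_free_extents(const uint64_t *extents, size_t n,
                         uint64_t free_blocks, uint64_t minblocks,
                         uint64_t *trimmed)
{
    size_t visited = 0;

    *trimmed = 0;
    while (visited < n && free_blocks >= minblocks) {
        uint64_t extent = extents[visited++];

        assert(extent <= free_blocks);  /* extents must add up to free_blocks */

        if (extent >= minblocks)
            *trimmed += extent;         /* large enough to discard */

        free_blocks -= extent;          /* count it even if it was too small */
    }
    return visited;
}
```

With five extents of 300 (in units of K), 1500 free in total and minblocks of 1000, the loop visits only two extents: after 600 has been subtracted, the remaining 900 can no longer hold an extent of 1000.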
    • ext3: Adjust trim start with first_data_block. · 4b44dd30
      Tao Ma authored
      As agreed in the e-mail thread [1], first_data_block should be added
      to the trim start. This patch does so and removes the check for
      start < first_data_block.
      
      [1] http://www.spinics.net/lists/linux-ext4/msg22737.html
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  19. 02 Feb, 2011 1 commit
    • fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris authored
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different than creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that and then userspace jumps through SELinux hoops
      to get things set up correctly.  This patch does not implement new
      behavior, that is obviously contained in a separate SELinux patch, but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
      Signed-off-by: Eric Paris <eparis@redhat.com>
  20. 13 Jan, 2011 1 commit
    • quota: Fix deadlock during path resolution · f00c9e44
      Jan Kara authored
      As Al Viro pointed out, path resolution during Q_QUOTAON calls to quotactl
      is prone to deadlocks. We hold the s_umount semaphore for reading during the
      path resolution, and resolution itself may need to acquire the semaphore
      for writing when e.g. an autofs mountpoint is passed.
      
      Solve the problem by performing the resolution before we get hold of the
      superblock (and thus the s_umount semaphore). The whole thing is complicated
      by the fact that some filesystems (OCFS2) ignore the path argument. So to
      distinguish between filesystems that want the path and those that do not, we
      introduce a new .quota_on_meta callback which does not get the path. OCFS2
      then uses this callback instead of the old .quota_on.
      
      CC: Al Viro <viro@ZenIV.linux.org.uk>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Ted Ts'o <tytso@mit.edu>
      CC: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>