  1. 23 Aug 2011: 2 commits
  2. 23 Jul 2011: 1 commit
    • ext3: Fix data corruption in inodes with journalled data · b22570d9
      Committed by Jan Kara
      When journalling data for an inode (either because it is a symlink or
      because the filesystem is mounted in data=journal mode), ext3_evict_inode()
      can discard unwritten data by calling truncate_inode_pages(). This is
      because we don't mark the buffer / page dirty when journalling data; we only
      add the buffer to the running transaction, and thus the mm does not know there
      is still unwritten data.
      
      Fix the problem by carefully tracking the transaction containing the inode's
      data, committing this transaction, and writing out uncheckpointed buffers when
      the inode should be reaped.
      Signed-off-by: Jan Kara <jack@suse.cz>
  3. 21 Jul 2011: 2 commits
  4. 25 Jun 2011: 3 commits
    • ext3: Improve truncate error handling · ee3e77f1
      Committed by Jan Kara
      The new truncate calling convention allows us to handle errors from
      ext3_block_truncate_page(). So reorganize the code so that
      ext3_block_truncate_page() is called before we change the inode size.
      
      This also removes unnecessary block zeroing from error recovery after failed
      buffered writes (zeroing isn't needed because we could never have written
      non-zero data to disk). We have to be careful and keep zeroing in direct IO
      write error recovery, because there we might have already overwritten the end
      of the last file block.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Convert ext3 to new truncate calling convention · 40680f2f
      Committed by Jan Kara
      Mostly a trivial conversion. While changing the code, we also fix a bug where
      IS_IMMUTABLE and IS_APPEND files could not be truncated during failed writes.
      In fact the test is not needed at all, because both IS_IMMUTABLE and IS_APPEND
      are tested in upper layers in do_sys_[f]truncate(), may_write(), etc.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Add fixed tracepoints · 785c4bcc
      Committed by Lukas Czerner
      This commit adds fixed tracepoints to the ext3 code. It is based on the ext4
      tracepoints; however, due to the differences between the two file systems,
      some tracepoints are missing (those for delalloc and for the multi-block
      allocator) and there are some ext3-specific ones as well (for reservation
      windows).
      
      Here is a list:
      
      ext3_free_inode
      ext3_request_inode
      ext3_allocate_inode
      ext3_evict_inode
      ext3_drop_inode
      ext3_mark_inode_dirty
      ext3_write_begin
      ext3_ordered_write_end
      ext3_writeback_write_end
      ext3_journalled_write_end
      ext3_ordered_writepage
      ext3_writeback_writepage
      ext3_journalled_writepage
      ext3_readpage
      ext3_releasepage
      ext3_invalidatepage
      ext3_discard_blocks
      ext3_request_blocks
      ext3_allocate_blocks
      ext3_free_blocks
      ext3_sync_file_enter
      ext3_sync_file_exit
      ext3_sync_fs
      ext3_rsv_window_add
      ext3_discard_reservation
      ext3_alloc_new_reservation
      ext3_reserved
      ext3_forget
      ext3_read_block_bitmap
      ext3_direct_IO_enter
      ext3_direct_IO_exit
      ext3_unlink_enter
      ext3_unlink_exit
      ext3_truncate_enter
      ext3_truncate_exit
      ext3_get_blocks_enter
      ext3_get_blocks_exit
      ext3_load_inode
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
  5. 27 May 2011: 1 commit
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Committed by Christoph Hellwig
      Tell the filesystem whether we just updated a timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally whether it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 31 Mar 2011: 1 commit
  7. 24 Mar 2011: 1 commit
  8. 10 Mar 2011: 1 commit
  9. 11 Jan 2011: 1 commit
  10. 28 Oct 2010: 2 commits
  11. 26 Oct 2010: 1 commit
  12. 10 Aug 2010: 4 commits
    • convert ext3 to ->evict_inode() · ac14a95b
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove inode_setattr · 1025774c
      Committed by Christoph Hellwig
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
      
      In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
      which allowed trimming down the opencoded variant.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce __block_write_begin · 6e1db88d
      Committed by Christoph Hellwig
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be called either from block_write_begin or filesystem
      code that already has a page allocated.  Remove the handling of already
      allocated pages from block_write_begin after switching all callers that
      do it to __block_write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sort out blockdev_direct_IO variants · eafdc7d1
      Committed by Christoph Hellwig
      Move the call to vmtruncate to get rid of excess blocks to the callers
      in preparation for the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it, as just opencoding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. 06 Aug 2010: 1 commit
    • ext3: Fix dirtying of journalled buffers in data=journal mode · 5f11e6a4
      Committed by Jan Kara
      In data=journal mode, we still use block_write_begin() to prepare a page for
      writing. This function can occasionally mark a buffer dirty, which violates
      journalling assumptions: when a buffer is part of a transaction, it should not
      be dirty, and a buffer can already be part of a forget list of some transaction
      when block_write_begin() gets called. This violation of journalling assumptions
      then results in "JBD: Spotted dirty metadata buffer..." warnings.
      
      In fact, temporarily dirtying the buffer while the page is still locked does
      not really cause problems for the journalling, because we won't write the
      buffer until the page gets unlocked. So we just have to make sure to clear
      the dirty bits before unlocking the page.
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
  14. 21 Jul 2010: 2 commits
    • ext3: Avoid filesystem corruption after a crash under heavy delete load · f25f6242
      Committed by Jan Kara
      It can happen that ext3_free_branches calls ext3_forget() for an indirect block
      in an earlier transaction than the one in which we clear the pointer to this
      indirect block. Thus if we crash before the transaction clearing the block
      pointer is committed, we will see an indirect block pointing to already freed
      blocks and complain during orphan list cleanup.
      
      The fix is simple: Make sure ext3_forget() is called in the transaction
      doing block pointer clearing.
      
      This is a backport of an ext4 fix by Amir G. <amir73il@users.sourceforge.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: remove vestiges of nobh support · 4c4d3901
      Committed by Christoph Hellwig
      The nobh option was only supported for writeback mode, but given that all
      write paths (except mmapped writes) actually create buffer heads, it
      effectively was a no-op already.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  15. 22 May 2010: 1 commit
  16. 30 Mar 2010: 1 commit
    • ext3: fix broken handling of EXT3_STATE_NEW · de329820
      Committed by Linus Torvalds
      In commit 9df93939 ("ext3: Use bitops to read/modify
      EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
      use bitops for its state handling.  However, unlike the same ext4
      change, it didn't actually change the name of the field when it changed
      the semantics of it.
      
      As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
      initialized the field to EXT3_STATE_NEW.  And that does not work
      _at_all_ when we're now working with individually named bits rather than
      values that get masked.  So the code tried to mark the state to be new,
      but in actual fact set the field to EXT3_STATE_JDATA.  Which makes no
      sense at all, and screws up all the code that checks whether the inode
      was newly allocated.
      
      In particular, it made the xattr code unhappy, and caused various random
      behavior, like apparently
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=577911
      
      So fix the initialization, and rename the field to match ext4 so that we
      don't have this happen again.
      
      Cc: James Morris <jmorris@namei.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Daniel J Walsh <dwalsh@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 06 Mar 2010: 1 commit
  18. 05 Mar 2010: 7 commits
    • dquot: cleanup dquot initialize routine · 871a2931
      Committed by Christoph Hellwig
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Committed by Christoph Hellwig
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.   For most metadata operations
      this is a straightforward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those, which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup dquot transfer routine · b43fa828
      Committed by Christoph Hellwig
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup space allocation / freeing routines · 5dd4056d
      Committed by Christoph Hellwig
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs their own (which none currently does)
      it can just call into its own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around it to move as much code as possible
      into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: add writepage sanity checks · 49792c80
      Committed by Dmitry Monakhov
      - There is a theoretical possibility of performing writepage on an
         RO superblock. Add an explicit check for that case.
      - The page must be locked before writepage.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Truncate allocated blocks if direct IO write fails to update i_size · 7eb4969e
      Committed by Jan Kara
      We have to truncate blocks allocated to the file during direct IO when we
      fail to update i_size properly.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Use bitops to read/modify EXT3_I(inode)->i_state · 9df93939
      Committed by Jan Kara
      At several places we modify EXT3_I(inode)->i_state without holding i_mutex
      (ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode,
      ...). These modifications are racy and we can lose updates to i_state. So
      convert handling of i_state to use bitops which are atomic.
      Signed-off-by: Jan Kara <jack@suse.cz>
  19. 23 Dec 2009: 1 commit
  20. 10 Dec 2009: 1 commit
    • ext3: Fix data / filesystem corruption when write fails to copy data · 68eb3db0
      Committed by Jan Kara
      When ext3_write_begin fails after allocating some blocks or
      generic_perform_write fails to copy data to write, we truncate blocks already
      instantiated beyond i_size. Although these blocks were never inside i_size, we
      have to truncate pagecache of these blocks so that corresponding buffers get
      unmapped. Otherwise subsequent __block_prepare_write (called because we are
      retrying the write) will find the buffers mapped, not call ->get_block, and
      thus the page will be backed by already freed blocks leading to filesystem and
      data corruption.
      Reported-by: James Y Knight <foom@fuhm.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
  21. 04 Dec 2009: 1 commit
  22. 11 Nov 2009: 2 commits
  23. 16 Sep 2009: 2 commits
    • ext3: Add locking to ext3_do_update_inode · 4f003fd3
      Committed by Chris Mason
      I've been struggling with this off and on while I've been testing the
      data=guarded work.  The symptom is corrupted orphan lists and inodes
      with the wrong i_size stored on disk.  I was convinced the
      data=guarded code was just missing a call to ext3_mark_inode_dirty, but
      tracing showed the i_disksize I was sending to ext3_mark_inode_dirty
      wasn't actually making it to the drive.
      
      ext3_mark_inode_dirty can be called without locks held (atime updates
      and a few others), so the data=guarded code uses locks while updating
      the in-memory inode, and then calls ext3_mark_inode_dirty
      without any locks held.
      
      But, ext3_mark_inode_dirty has no internal locking to make sure that
      only one CPU is updating the buffer head at a time.  Generally this
      works out ok because everyone that changes the inode then calls
      ext3_mark_inode_dirty themselves.  Even though it races, eventually
      someone updates the buffer heads and things move on.
      
      But there is still a risk of the wrong values getting in, and the
      data=guarded code seems to hit the race very often.
      
      Since everyone that changes the inode also logs it, it should be
      possible to fix this with some memory barriers.  I'll leave that as an
      exercise to the reader and lock the buffer head instead.
      
      It is probably a good idea to have a different patch series for lockless
      bit flipping on the ext3 i_state field.  ext3_do_update_inode clears
      EXT3_STATE_NEW with a plain &= without any locks held.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() · 00171d3c
      Committed by Jan Kara
      During truncate we are sometimes forced to start a new transaction, as the
      number of blocks to be journaled is both quite large and hard to predict. So
      far we restarted the transaction while holding truncate_mutex, and that violates
      lock ordering because truncate_mutex ranks below transaction start (and it
      can lead to a real deadlock with ext3_get_blocks() allocating new blocks
      from ext3_writepage()).
      
      Luckily, the problem is easy to fix: we just drop the truncate_mutex before
      restarting the transaction and reacquire it afterwards. This is safe because
      by the time ext3_truncate() is called, all the page cache for the truncated
      part of the file has been dropped, so writepage() cannot come and allocate new
      blocks in the part of the file we are truncating. All other writers are held
      off because we hold i_mutex.
      Signed-off-by: Jan Kara <jack@suse.cz>