  1. 11 Jan, 2011 1 commit
  2. 28 Oct, 2010 2 commits
  3. 26 Oct, 2010 1 commit
  4. 10 Aug, 2010 4 commits
    • convert ext3 to ->evict_inode() · ac14a95b
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove inode_setattr · 1025774c
      Committed by Christoph Hellwig
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
      
      In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
      which allowed the opencoded variant to be trimmed down.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce __block_write_begin · 6e1db88d
      Committed by Christoph Hellwig
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be either called from block_write_begin or filesystem
      code that already has a page allocated.  Remove the handling of already
      allocated pages from block_write_begin after switching all callers that
      do it to __block_write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sort out blockdev_direct_IO variants · eafdc7d1
      Committed by Christoph Hellwig
      Move the call to vmtruncate to get rid of excessive blocks to the callers
      in preparation for the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it, as just opencoding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 06 Aug, 2010 1 commit
    • ext3: Fix dirtying of journalled buffers in data=journal mode · 5f11e6a4
      Committed by Jan Kara
      In data=journal mode, we still use block_write_begin() to prepare page for
      writing. This function can occasionally mark buffer dirty which violates
      journalling assumptions - when a buffer is part of a transaction, it should not
      be dirty, and a buffer can already be part of a forget list of some transaction
      when block_write_begin() gets called. This violation of journalling assumptions
      then results in "JBD: Spotted dirty metadata buffer..." warnings.
      
      In fact, temporarily dirtying the buffer while the page is still locked does not
      really cause problems for the journalling because we won't write the buffer
      until the page gets unlocked. So we just have to make sure to clear dirty bits
      before unlocking the page.
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
  6. 21 Jul, 2010 2 commits
    • ext3: Avoid filesystem corruption after a crash under heavy delete load · f25f6242
      Committed by Jan Kara
      It can happen that ext3_free_branches calls ext3_forget() for an indirect block
      in an earlier transaction than the transaction in which we clear the pointer to
      this indirect block. Thus if we crash before the transaction clearing the block
      pointer is committed, we will see an indirect block pointing to already freed
      blocks and complain during orphan list cleanup.
      
      The fix is simple: Make sure ext3_forget() is called in the transaction
      doing block pointer clearing.
      
      This is a backport of an ext4 fix by Amir G. <amir73il@users.sourceforge.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: remove vestiges of nobh support · 4c4d3901
      Committed by Christoph Hellwig
      The nobh option was only supported for writeback mode, but given that all
      write paths (except mmapped writes) actually create buffer heads, it
      effectively was a no-op already.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  7. 22 May, 2010 1 commit
  8. 30 Mar, 2010 1 commit
    • ext3: fix broken handling of EXT3_STATE_NEW · de329820
      Committed by Linus Torvalds
      In commit 9df93939 ("ext3: Use bitops to read/modify
      EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
      use bitops for its state handling.  However, unlike the same ext4
      change, it didn't actually change the name of the field when it changed
      the semantics of it.
      
      As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
      initialized the field to EXT3_STATE_NEW.  And that does not work
      _at_all_ when we're now working with individually named bits rather than
      values that get masked.  So the code tried to mark the state to be new,
      but in actual fact set the field to EXT3_STATE_JDATA.  Which makes no
      sense at all, and screws up all the code that checks whether the inode
      was newly allocated.
      
      In particular, it made the xattr code unhappy, and caused various random
      behavior, like apparently
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=577911
      
      So fix the initialization, and rename the field to match ext4 so that we
      don't have this happen again.
      
      Cc: James Morris <jmorris@namei.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Daniel J Walsh <dwalsh@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 06 Mar, 2010 1 commit
  10. 05 Mar, 2010 7 commits
    • dquot: cleanup dquot initialize routine · 871a2931
      Committed by Christoph Hellwig
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Committed by Christoph Hellwig
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.   For most metadata operations
      this is a straightforward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup dquot transfer routine · b43fa828
      Committed by Christoph Hellwig
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup space allocation / freeing routines · 5dd4056d
      Committed by Christoph Hellwig
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs its own (which none currently does)
      it can just call into its own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around them to move as much code as
      possible into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: add writepage sanity checks · 49792c80
      Committed by Dmitry Monakhov
      - There is a theoretical possibility of performing writepage on a
         read-only superblock. Add an explicit check for that case.
      - The page must be locked before writepage.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Truncate allocated blocks if direct IO write fails to update i_size · 7eb4969e
      Committed by Jan Kara
      We have to truncate blocks allocated to a file during direct IO when we
      fail to update i_size properly.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Use bitops to read/modify EXT3_I(inode)->i_state · 9df93939
      Committed by Jan Kara
      At several places we modify EXT3_I(inode)->i_state without holding i_mutex
      (ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode,
      ...). These modifications are racy and we can lose updates to i_state. So
      convert handling of i_state to use bitops which are atomic.
      Signed-off-by: Jan Kara <jack@suse.cz>
  11. 23 Dec, 2009 1 commit
  12. 10 Dec, 2009 1 commit
    • ext3: Fix data / filesystem corruption when write fails to copy data · 68eb3db0
      Committed by Jan Kara
      When ext3_write_begin fails after allocating some blocks or
      generic_perform_write fails to copy data to write, we truncate blocks already
      instantiated beyond i_size. Although these blocks were never inside i_size, we
      have to truncate pagecache of these blocks so that corresponding buffers get
      unmapped. Otherwise subsequent __block_prepare_write (called because we are
      retrying the write) will find the buffers mapped, not call ->get_block, and
      thus the page will be backed by already freed blocks leading to filesystem and
      data corruption.
      Reported-by: James Y Knight <foom@fuhm.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
  13. 04 Dec, 2009 1 commit
  14. 11 Nov, 2009 2 commits
  15. 16 Sep, 2009 3 commits
    • ext3: Add locking to ext3_do_update_inode · 4f003fd3
      Committed by Chris Mason
      I've been struggling with this off and on while I've been testing the
      data=guarded work.  The symptom is corrupted orphan lists and inodes
      with the wrong i_size stored on disk.  I was convinced the
      data=guarded code was just missing a call to ext3_mark_inode_dirty, but
      tracing showed the i_disksize I was sending to ext3_mark_inode_dirty
      wasn't actually making it to the drive.
      
      ext3_mark_inode_dirty can be called without locks held (atime updates
      and a few others), so the data=guarded code uses locks while updating
      the in-memory inode, and then calls ext3_mark_inode_dirty
      without any locks held.
      
      But, ext3_mark_inode_dirty has no internal locking to make sure that
      only one CPU is updating the buffer head at a time.  Generally this
      works out ok because everyone that changes the inode then calls
      ext3_mark_inode_dirty themselves.  Even though it races, eventually
      someone updates the buffer heads and things move on.
      
      But there is still a risk of the wrong values getting in, and the
      data=guarded code seems to hit the race very often.
      
      Since everyone that changes the inode also logs it, it should be
      possible to fix this with some memory barriers.  I'll leave that as an
      exercise to the reader and lock the buffer head instead.
      
      It is probably a good idea to have a different patch series for lockless
      bit flipping on the ext3 i_state field.  ext3_do_update_inode &= clears
      EXT3_STATE_NEW without any locks held.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() · 00171d3c
      Committed by Jan Kara
      During truncate we are sometimes forced to start a new transaction as the
      amount of blocks to be journaled is both quite large and hard to predict. So
      far we restarted a transaction while holding truncate_mutex and that violates
      lock ordering because truncate_mutex ranks below transaction start (and it
      can lead to a real deadlock with ext3_get_blocks() allocating new blocks
      from ext3_writepage()).
      
      Luckily, the problem is easy to fix: We just drop the truncate_mutex before
      restarting the transaction and acquire it afterwards. We are safe to do this as
      by the time ext3_truncate() is called, all the page cache for the truncated
      part of the file is dropped and so writepage() cannot come and allocate new
      blocks in the part of the file we are truncating. The remaining writers are
      stopped by us holding i_mutex.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Committed by Andi Kleen
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  16. 16 Jul, 2009 2 commits
    • ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() · 43237b54
      Committed by Jan Kara
      Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
      be a relic from some old days and setting disksize in this function does not
      make much sense. Currently it was set only by ext3_getblk().  Since the
      parameter has some effect only if create == 1, it is easy to check that the
      three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
      ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix truncation of symlinks after failed write · 9eaaa2d5
      Committed by Jan Kara
      The contents of long symlinks are written via standard write methods. So when the
      write fails, we add the inode to the orphan list. But symlinks don't have a
      .truncate method defined, so nobody properly removes them from the orphan list
      (both on disk and in memory).
      
      Fix this by calling ext3_truncate() directly instead of calling vmtruncate()
      (which is saner anyway since we don't need anything vmtruncate() does apart
      from calling .truncate in these paths).  We also add the inode to the orphan list
      only if ext3_can_truncate() is true (currently, it can be false for symlinks when
      there are no blocks allocated) - otherwise orphan list processing will complain
      and ext3_truncate() will not remove the inode from the on-disk orphan list.
      Signed-off-by: Jan Kara <jack@suse.cz>
  17. 24 Jun, 2009 1 commit
  18. 19 Jun, 2009 2 commits
  19. 12 Jun, 2009 1 commit
  20. 09 Apr, 2009 1 commit
  21. 03 Apr, 2009 2 commits
    • ext3: Add replace-on-truncate heuristics for data=writeback mode · f7ab34ea
      Committed by Theodore Ts'o
      In data=writeback mode, start an asynchronous flush when closing a
      file which had been previously truncated down to zero.  This lowers
      the probability of data loss in the case of applications that attempt
      to replace a file using truncate.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext3: avoid false EIO errors · 695f6ae0
      Committed by Jan Kara
      Sometimes block_write_begin() can map buffers in a page but later we
      fail to copy data into those buffers (because the source page has been
      paged out in the meantime).  We then end up with !uptodate mapped
      buffers.  To add a bit more to the confusion, block_write_end() does
      not commit any data (and thus does not mark any buffers as uptodate) if
      we didn't succeed with copying all the data.
      
      Commit f4fc66a8 (ext3: convert to new
      aops) missed these cases and thus we were inserting non-uptodate
      buffers to transaction's list which confuses JBD code and it reports IO
      errors, aborts a transaction and generally makes users afraid about
      their data ;-P.
      
      This patch fixes the problem by reorganizing ext3_..._write_end() code
      to first call block_write_end() to mark buffers with valid data
      uptodate and after that we file only uptodate buffers to transaction's
      lists.
      
      We also fix a problem where we could leave blocks allocated beyond i_size
      (i_disksize in fact) because of failed write. We now add inode to orphan
      list when write fails (to be safe in case we crash) and then truncate blocks
      beyond i_size in a separate transaction.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 27 Mar, 2009 1 commit
  23. 26 Mar, 2009 1 commit