1. 27 Apr, 2009 6 commits
  2. 25 Apr, 2009 6 commits
  3. 22 Apr, 2009 1 commit
    • C
      Btrfs: fix btrfs fallocate oops and deadlock · 546888da
      Chris Mason committed
      Btrfs fallocate was incorrectly starting a transaction with a lock held
      on the extent_io tree for the file, which could deadlock.  Strictly
      speaking it was using join_transaction which would be safe, but it is better
      to move the transaction outside of the lock.
      
      When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
      being called on an unlocked buffer.  This was triggering an assertion and
      oops because the lock is supposed to be held.
      
      The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
      been run.  btrfs_del_item takes care of dirtying things, so the solution is
      to skip the btrfs_mark_buffer_dirty call in this case.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 21 Apr, 2009 4 commits
    • C
      Btrfs: use the right node in reada_for_balance · 8c594ea8
      Chris Mason committed
      reada_for_balance was using the wrong index into the path node array,
      so it wasn't reading the right blocks.  We never directly used the
      results of the read done by this function because the btree search is
      started over at the end.
      
      This fixes reada_for_balance to reada in the correct node and to
      avoid searching past the last slot in the node.  It also makes sure to
      hold the parent lock while we are finding the nodes to read.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: fix oops on page->mapping->host during writepage · 11c8349b
      Chris Mason committed
      The extent_io writepage call updates the writepage index in the inode
      as it makes progress.  But, it was doing the update after unlocking the page,
      which isn't legal because page->mapping can't be trusted once the page
      is unlocked.
      
      This led to an oops, especially common with compression turned on.  The
      fix here is to update the writeback index before unlocking the page.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: add a priority queue to the async thread helpers · d313d7a3
      Chris Mason committed
      Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
      higher priority.  But, the checksumming helper threads prevent it
      from being fully effective.
      
      There are two problems.  First, a big queue of pending checksumming
      will delay the synchronous IO behind other lower priority writes.  Second,
      the checksumming uses an ordered async work queue.  The ordering makes sure
      that IOs are sent to the block layer in the same order they are sent
      to the checksumming threads.  Usually this gives us less seeky IO.
      
      But, when we start mixing IO priorities, the lower priority IO can delay
      the higher priority IO.
      
      This patch solves both problems by adding a high priority list to the async
      helper threads, and a new btrfs_set_work_high_prio(), which is used
      to put a new async work item onto the higher priority list.
      
      The ordering is still done on high priority IO, but all of the high
      priority bios are ordered separately from the low priority bios.  This
      ordering is purely an IO optimization, it is not involved in data
      or metadata integrity.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
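The two-list scheme described in this commit message can be sketched roughly as follows. This is a minimal user-space illustration, not the btrfs code: `work_item`, `work_list`, `worker_queue`, `queue_work`, `next_work`, and `set_work_high_prio` are hypothetical names chosen for the example. It shows only that high priority items are dequeued first while each list keeps its own FIFO order.

```c
#include <stddef.h>

/* Hypothetical work item; not the btrfs structure. */
struct work_item {
    int id;
    int high_prio;              /* set by set_work_high_prio() */
    struct work_item *next;
};

/* Simple FIFO list with head and tail pointers. */
struct work_list {
    struct work_item *head;
    struct work_item *tail;
};

/* One queue per worker: a high priority list and a normal list. */
struct worker_queue {
    struct work_list prio;
    struct work_list pending;
};

static void list_push(struct work_list *l, struct work_item *w)
{
    w->next = NULL;
    if (l->tail)
        l->tail->next = w;
    else
        l->head = w;
    l->tail = w;
}

static struct work_item *list_pop(struct work_list *l)
{
    struct work_item *w = l->head;

    if (!w)
        return NULL;
    l->head = w->next;
    if (!l->head)
        l->tail = NULL;
    w->next = NULL;
    return w;
}

/* Mark an item so that queue_work() places it on the high priority list. */
void set_work_high_prio(struct work_item *w)
{
    w->high_prio = 1;
}

void queue_work(struct worker_queue *q, struct work_item *w)
{
    list_push(w->high_prio ? &q->prio : &q->pending, w);
}

/* High priority items always go first; each list stays in FIFO order. */
struct work_item *next_work(struct worker_queue *q)
{
    struct work_item *w = list_pop(&q->prio);

    return w ? w : list_pop(&q->pending);
}
```

Because ordering is kept per list, a long backlog of normal items cannot delay an item that was marked high priority before being queued.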
    • C
      Btrfs: use WRITE_SYNC for synchronous writes · ffbd517d
      Chris Mason committed
      Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for
      writes we plan on waiting on in the near future.  This patch
      mirrors recent changes in other filesystems and the generic code to
      use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for
      other latency critical writes.
      
      Btrfs uses async worker threads for checksumming before the write is done,
      and then again to actually submit the bios.  The bio submission code just
      runs a per-device list of bios that need to be sent down the pipe.
      
      This list is split into low priority and high priority lists so the
      WRITE_SYNC IO happens first.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 14 Apr, 2009 3 commits
    • J
      ext2: fix data corruption for racing writes · 316cb4ef
      Jan Kara committed
      If two writers allocating blocks to file race with each other (e.g.
      because writepages races with ordinary write or two writepages race with
      each other), ext2_getblock() can be called on the same inode in parallel.
      Before we are going to allocate new blocks, we have to recheck the block
      chain we have obtained so far without holding truncate_mutex.  Otherwise
      we could overwrite the indirect block pointer set by the other writer
      leading to data loss.
      
      The below test program by Ying is able to reproduce the data loss with ext2
      on BRD in a few minutes if the machine is under memory pressure:
      
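      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      long kMemSize  = 50 << 20;
      int kPageSize = 4096;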
      
      int main(int argc, char **argv) {
      	int status;
      	int count = 0;
      	int i;
      	char *fname = "/mnt/test.mmap";
      	char *mem;
      	unlink(fname);
      	int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
      	status = ftruncate(fd, kMemSize);
      	mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      	// Fill the memory with 1s.
      	memset(mem, 1, kMemSize);
      	sleep(2);
      	for (i = 0; i < kMemSize; i++) {
      		int byte_good = mem[i] != 0;
      		if (!byte_good && ((i % kPageSize) == 0)) {
      			//printf("%d ", i / kPageSize);
      			count++;
      		}
      	}
      	munmap(mem, kMemSize);
      	close(fd);
      	unlink(fname);
      
      	if (count > 0) {
      		printf("Running %d bad page\n", count);
      		return 1;
      	}
      	return 0;
      }
      
      Cc: Ying Han <yinghan@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
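The "recheck under the lock before allocating" idea from this commit message can be sketched as below. This is a simplified user-space illustration, not the ext2 code: `toy_inode`, `get_indirect`, and the trivial `next_free` allocator are hypothetical, and a pthread mutex stands in for truncate_mutex. The point is only that a value read without the lock must be verified again once the lock is held, so a pointer set by the other writer is reused rather than overwritten.

```c
#include <pthread.h>

/* Hypothetical stand-in for an inode with a single indirect block pointer. */
struct toy_inode {
    pthread_mutex_t truncate_mutex;
    unsigned long indirect;     /* 0 means "not allocated yet" */
    unsigned long next_free;    /* trivial block allocator for the example */
};

/*
 * Return the indirect block number, allocating it if necessary.
 * `seen` is the value the caller read earlier WITHOUT holding the lock.
 */
unsigned long get_indirect(struct toy_inode *inode, unsigned long seen)
{
    unsigned long blk;

    if (seen != 0)
        return seen;            /* already mapped, nothing to allocate */

    pthread_mutex_lock(&inode->truncate_mutex);
    /* Recheck: another writer may have allocated since the lockless read. */
    if (inode->indirect == 0)
        inode->indirect = inode->next_free++;
    blk = inode->indirect;
    pthread_mutex_unlock(&inode->truncate_mutex);

    return blk;
}
```

Without the recheck, two writers that both observed 0 would each allocate a block, and the second store would replace the first writer's pointer.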
    • J
      jbd: update locking comments · 32433879
      Jan Kara committed
      Update information about locking in JBD revoke code.
      
      Reported-by: Lin Tan <tammy000@gmail.com>.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      hfs: fix memory leak when unmounting · eb2e5f45
      Dave Anderson committed
      When an HFS filesystem is unmounted, it leaks a 2-page bitmap.  Also,
      under extreme memory pressure, it's possible that hfs_releasepage() may
      use a tree pointer that has not been initialized, and if so, the release
      request should just be rejected.
      
      [akpm@linux-foundation.org: free_pages(0) is legal, remove obvious comment]
      Signed-off-by: Dave Anderson <anderson@redhat.com>
      Tested-by: Eugene Teo <eugeneteo@kernel.sg>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 13 Apr, 2009 8 commits
  7. 10 Apr, 2009 1 commit
  8. 09 Apr, 2009 6 commits
  9. 08 Apr, 2009 2 commits
  10. 07 Apr, 2009 3 commits
    • M
      splice: fix deadlock in splicing to file · 7bfac9ec
      Miklos Szeredi committed
      There's a possible deadlock in generic_file_splice_write(),
      splice_from_pipe() and ocfs2_file_splice_write():
      
       - task A calls generic_file_splice_write()
       - this calls inode_double_lock(), which locks i_mutex on both
         pipe->inode and target inode
       - ordering depends on inode pointers, can happen that pipe->inode is
         locked first
       - __splice_from_pipe() needs more data, calls pipe_wait()
       - this releases lock on pipe->inode, goes to interruptible sleep
       - task B calls generic_file_splice_write(), similarly to the first
       - this locks pipe->inode, then tries to lock inode, but that is
         already held by task A
       - task A is interrupted, it tries to lock pipe->inode, but fails, as
         it is already held by task B
       - ABBA deadlock
      
      Fix this by explicitly ordering locks: the outer lock must be on
      target inode and the inner lock (which is later unlocked and relocked)
      must be on pipe->inode.  This is OK, pipe inodes and target inodes
      form two nonoverlapping sets, generic_file_splice_write() and friends
      are not called with a target which is a pipe.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
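The explicit lock ordering described in this commit message can be sketched as follows. This is a user-space illustration only, not the kernel code: `toy_inode`, `splice_lock`, `pipe_wait_sketch`, `splice_unlock`, and the `lock_order` bookkeeping are hypothetical, and pthread mutexes stand in for i_mutex. The order is fixed by role (target outside, pipe inside) rather than by pointer value, so dropping and retaking the pipe lock cannot invert it.

```c
#include <pthread.h>

/* Hypothetical inode stand-in; lock_order records when it was last locked. */
struct toy_inode {
    pthread_mutex_t i_mutex;
    int lock_order;
};

static int lock_counter;        /* illustration only: global acquisition counter */

/* Outer lock is always the target inode, inner lock is always the pipe inode. */
void splice_lock(struct toy_inode *target, struct toy_inode *pipe_inode)
{
    pthread_mutex_lock(&target->i_mutex);
    target->lock_order = ++lock_counter;
    pthread_mutex_lock(&pipe_inode->i_mutex);
    pipe_inode->lock_order = ++lock_counter;
}

/*
 * Waiting for more data drops and retakes only the inner (pipe) lock.
 * The target lock stays held, so no other task can slip in between
 * and take the two locks in the opposite order.
 */
void pipe_wait_sketch(struct toy_inode *pipe_inode)
{
    pthread_mutex_unlock(&pipe_inode->i_mutex);
    /* ... a real implementation would sleep here until data arrives ... */
    pthread_mutex_lock(&pipe_inode->i_mutex);
    pipe_inode->lock_order = ++lock_counter;
}

/* Release in reverse order of acquisition. */
void splice_unlock(struct toy_inode *target, struct toy_inode *pipe_inode)
{
    pthread_mutex_unlock(&pipe_inode->i_mutex);
    pthread_mutex_unlock(&target->i_mutex);
}
```

With ordering by pointer value, the pipe inode could end up as the outer lock, and releasing it inside the wait would reopen the ABBA window shown in the list above.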
    • R
      nilfs2: support nanosecond timestamp · 61239230
      Ryusuke Konishi committed
      After a review of user's feedback for finding out other compatibility
      issues, I found nilfs improperly initializes timestamps in inode;
      CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs
      didn't have nanosecond timestamps on disk.  A few users gave us the report
      that the tar program sometimes failed to expand symbolic links on nilfs,
      and it turned out to be the cause.
      
      Instead of making that replacement, I've decided to support
      nanosecond timestamps on this occasion.  Fortunately, a needless 64-bit
      field was in the nilfs_inode struct, and I found it's available for this
      purpose without impact for the users.
      
      So, this will do the enhancement and resolve the tar problem.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
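The mismatch this commit message describes (an in-memory timestamp with nanoseconds, but only seconds stored on disk) can be illustrated with a small round-trip sketch. The layouts below are assumptions made for the example: `disk_inode_sec`, `disk_inode_nsec`, and their field names are hypothetical and are not the actual nilfs_inode definition.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical on-disk layouts; field names are illustrative only. */
struct disk_inode_sec {
    uint64_t mtime;             /* whole seconds only */
};

struct disk_inode_nsec {
    uint64_t mtime;             /* whole seconds */
    uint32_t mtime_nsec;        /* nanosecond remainder */
};

/* Store a timestamp in the seconds-only layout and read it back. */
struct timespec roundtrip_sec(struct timespec ts)
{
    struct disk_inode_sec d = { (uint64_t)ts.tv_sec };
    struct timespec out;

    out.tv_sec = (time_t)d.mtime;
    out.tv_nsec = 0;            /* the nanosecond part was never stored */
    return out;
}

/* Store a timestamp in the layout with a nanosecond field and read it back. */
struct timespec roundtrip_nsec(struct timespec ts)
{
    struct disk_inode_nsec d = { (uint64_t)ts.tv_sec, (uint32_t)ts.tv_nsec };
    struct timespec out;

    out.tv_sec = (time_t)d.mtime;
    out.tv_nsec = (long)d.mtime_nsec;
    return out;
}

int same_time(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}
```

In the seconds-only case the value seen after the inode is reloaded from disk differs from the one that was in memory. A tool that compares timestamps before and after could plausibly be confused by that, which is consistent with the tar report mentioned above, though the exact failure path is not spelled out in the message.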
    • R
      nilfs2: introduce secondary super block · e339ad31
      Ryusuke Konishi committed
      The former versions didn't have extra super blocks.  This improves the
      weak point by introducing another super block in an unused region at the
      tail of the partition.
      
      This doesn't break disk format compatibility; older versions just ignore
      the secondary super block, and new versions just recover it if it doesn't
      exist.  A partition created by an old mkfs may not have an unused region,
      but in that case, the secondary super block will not be added.
      
      This doesn't make more redundant copies of the super block; that is future
      work.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>