1. 04 Mar, 2010 1 commit
  2. 02 Mar, 2010 1 commit
  3. 03 Mar, 2010 2 commits
  4. 05 Mar, 2010 1 commit
    • J
      ext4: use ext4_get_block_write in buffer write · 744692dc
      Jiaying Zhang committed
      Allocate uninitialized extent before ext4 buffer write and
      convert the extent to initialized after io completes.
      The purpose is to make sure an extent can only be marked
      initialized after it has been written with new data so
      we can safely drop the i_mutex lock in ext4 DIO read without
      exposing stale data. This helps to improve multi-thread DIO
      read performance on high-speed disks.
      
      Skip the nobh and data=journal mount cases to make things simple for now.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      744692dc
  5. 03 Mar, 2010 1 commit
  6. 02 Mar, 2010 7 commits
  7. 25 Feb, 2010 1 commit
  8. 24 Feb, 2010 1 commit
  9. 17 Feb, 2010 1 commit
    • C
      ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode · 73b50c1c
      Curt Wohlgemuth committed
      Calls to ext4_handle_dirty_metadata should only pass in an inode
      pointer for inode-specific metadata, and not for shared metadata
      blocks such as inode table blocks, block group descriptors, the
      superblock, etc.
      
      The BUG_ON can get tripped when updating a special device (such as a
      block device) that is opened (so that i_mapping is set in
      fs/block_dev.c) and the file system is mounted in no journal mode.
      
      Addresses-Google-Bug: #2404870
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      73b50c1c
  10. 05 Mar, 2010 1 commit
  11. 16 Feb, 2010 2 commits
  12. 25 Jan, 2010 2 commits
  13. 01 Jan, 2010 1 commit
  14. 23 Jan, 2010 1 commit
    • T
      ext4: Add block validity check when truncating indirect block mapped inodes · 1f2acb60
      Theodore Ts'o committed
      Add checks to ext4_free_branches() to make sure a block number found
      in an indirect block is valid before trying to free it.  If a bad
      block number is found, stop freeing the indirect block immediately,
      since the file system is corrupt and we will need to run fsck anyway.
      This also avoids spamming the logs, and specifically avoids
      driver-level "attempt to access beyond end of device" errors that
      obscure what is really going on.
      
      If you get *really*, *really*, *really* unlucky, without this patch, a
      supposed indirect block containing garbage might contain a reference
      to a primary block group descriptor, in which case
      ext4_free_branches() could end up zeroing out a block group
      descriptor block.  Suppose one of the block bitmaps for a block
      group described by that bg descriptor block is then not in memory
      and is read in by ext4_read_block_bitmap().  This function calls
      ext4_valid_block_bitmap(), which assumes that bg_inode_table() was
      validated at mount time and hasn't been modified since.  Since this
      assumption is no longer valid, it's possible for the value
      (ext4_inode_table(sb, desc) - group_first_block) to go negative, which
      will cause ext4_find_next_zero_bit() to trigger a kernel GPF.
      
      Addresses-Google-Bug: #2220436
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      1f2acb60
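The check described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct fs_geometry`, `block_is_valid()` and `free_branch()` are hypothetical names invented for illustration and are not the kernel's ext4 functions or structures. It shows the idea of validating each block number read from an indirect block and stopping at the first bad one instead of continuing to free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for file system geometry; not an ext4 structure. */
struct fs_geometry {
    uint32_t first_data_block;  /* lowest block number that may hold data */
    uint32_t blocks_count;      /* total number of blocks on the device */
};

/* A block number is plausible only if it lies inside the data area. */
static int block_is_valid(const struct fs_geometry *fs, uint32_t blk)
{
    return blk >= fs->first_data_block && blk < fs->blocks_count;
}

/*
 * Walk the entries of one indirect block and "free" each referenced block.
 * Zero entries are holes and are skipped.  At the first invalid entry the
 * walk stops and -1 is returned; on success 0 is returned.  *freed counts
 * the blocks released before the walk ended.
 */
static int free_branch(const struct fs_geometry *fs,
                       const uint32_t *entries, size_t n, size_t *freed)
{
    assert(fs != NULL && entries != NULL && freed != NULL);
    *freed = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t blk = entries[i];

        if (blk == 0)
            continue;               /* hole: nothing to free */
        if (!block_is_valid(fs, blk))
            return -1;              /* corrupt entry: stop immediately */
        (*freed)++;                 /* real code would release blk here */
    }
    return 0;
}
```

With a geometry of blocks 10 to 999, an indirect block holding `{20, 5000, 30}` frees only block 20 and then reports an error, so the garbage entry is never passed on to lower layers.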
  15. 16 Feb, 2010 1 commit
    • E
      ext4: Fix optional-arg mount options · 15121c18
      Eric Sandeen committed
      We have 2 mount options, "barrier" and "auto_da_alloc" which may or
      may not take a 1/0 argument.  This causes the ext4 superblock mount
      code to subtract uninitialized pointers and pass the result to
      kmalloc, which results in very noisy failures.
      
      Per Ted's suggestion, initialize the args struct so that
      we know whether match_token() found an argument for the
      option, and skip match_int() if not.
      
      Also, return error (0) from parse_options if we thought
      we found an argument, but match_int() fails.
      Reported-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      15121c18
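The pattern this commit describes can be sketched in user space. The names below (`struct arg_span`, `match_optional()`, `parse_flag_option()`) are hypothetical and only loosely mirror the kernel's `substring_t` and `match_token()`; defaulting a bare option to "enabled" is also an assumption made for the example. The point is that the argument span is initialized first, so the parser can tell whether an argument was found before trying to convert it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of an argument span: [from, to). */
struct arg_span {
    const char *from;
    const char *to;
};

/*
 * Match "name" or "name=<value>".  Returns 1 on a match.  *arg is filled in
 * only when a value is present; otherwise it is left untouched, which is
 * why the caller must initialize it.
 */
static int match_optional(const char *opt, const char *name,
                          struct arg_span *arg)
{
    size_t len = strlen(name);

    if (strncmp(opt, name, len) != 0)
        return 0;
    if (opt[len] == '\0')
        return 1;                       /* bare option, no argument */
    if (opt[len] != '=')
        return 0;                       /* a different, longer option */
    arg->from = opt + len + 1;
    arg->to = opt + strlen(opt);
    return 1;
}

/*
 * Parse an option that takes an optional 1/0 argument.
 * Returns 1 on success and stores the flag in *value; returns 0 on error.
 */
static int parse_flag_option(const char *opt, const char *name, int *value)
{
    struct arg_span arg = { NULL, NULL };   /* the initialization is the fix */
    char *end;
    long v;

    assert(opt != NULL && name != NULL && value != NULL);
    if (!match_optional(opt, name, &arg))
        return 0;
    if (arg.from == NULL) {             /* no argument was found */
        *value = 1;                     /* assumption: bare option enables */
        return 1;
    }
    if (arg.from == arg.to)
        return 0;                       /* "name=" with an empty value */
    v = strtol(arg.from, &end, 10);
    if (end != arg.to)
        return 0;                       /* argument present but not an int */
    *value = (v != 0);
    return 1;
}
```

Here `parse_flag_option("barrier", "barrier", &v)` succeeds without touching an argument, while `"barrier=abc"` is rejected rather than silently accepted.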
  16. 05 Feb, 2010 1 commit
  17. 15 Jan, 2010 1 commit
  18. 25 Jan, 2010 1 commit
  19. 23 Jan, 2010 1 commit
  20. 01 Jan, 2010 2 commits
    • T
      ext4: Calculate metadata requirements more accurately · 9d0be502
      Theodore Ts'o committed
      In the past, ext4_calc_metadata_amount(), and its sub-functions
      ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
      badly over-estimated the number of metadata blocks that might be
      required for delayed allocation blocks.  This didn't matter as much
      when functions which managed the reserved metadata blocks were more
      aggressive about dropping reserved metadata blocks as delayed
      allocation blocks were written, but unfortunately they were too
      aggressive.  This was fixed in commit 0637c6f4, but as a result the
      over-estimation by ext4_calc_metadata_amount() would lead to reserving
      2-3 times the number of pending delayed allocation blocks as
      potentially required metadata blocks.  So if there is 1 megabyte of
      blocks which have not yet been allocated, up to 3 megabytes of
      space would get reserved out of the user's quota and from the file
      system free space pool until all of the inode's data blocks have been
      allocated.
      
      This commit addresses this problem by much more accurately estimating
      the number of metadata blocks that will be required.  It will still
      somewhat over-estimate the number of blocks needed, since it must make
      a worst case estimate not knowing which physical blocks will be
      needed, but it is much more accurate than before.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      9d0be502
    • T
      ext4: Fix accounting of reserved metadata blocks · ee5f4d9c
      Theodore Ts'o committed
      Commit 0637c6f4 had a typo which caused the reserved metadata blocks to
      not be released correctly.   Fix this.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      ee5f4d9c
  21. 31 Dec, 2009 1 commit
    • T
      ext4: Patch up how we claim metadata blocks for quota purposes · 0637c6f4
      Theodore Ts'o committed
      As reported in Kernel Bugzilla #14936, commit d21cd8f1 triggered a BUG
      in the function ext4_da_update_reserve_space() found in
      fs/ext4/inode.c.  The root cause of this BUG() was the fact
      that ext4_calc_metadata_amount() can severely over-estimate how many
      metadata blocks will be needed, especially when using direct
      block-mapped files.
      
      In addition, it can also badly *under* estimate how much space is
      needed, since ext4_calc_metadata_amount() assumes that the blocks are
      contiguous, and this is not always true.  If the application is
      writing blocks to a sparse file, the number of metadata blocks
      necessary can be severely underestimated by the functions
      ext4_da_reserve_space(), ext4_da_update_reserve_space() and
      ext4_da_release_space().  This was the cause of the dq_claim_space
      reports found on kerneloops.org.
      
      Unfortunately, doing this right means that we need to massively
      over-estimate the amount of free space needed.  So in some cases we
      may need to force the inode to be written to disk asynchronously in
      order to avoid spurious quota failures.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=14936
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      0637c6f4
  22. 30 Dec, 2009 1 commit
  23. 26 Dec, 2009 1 commit
  24. 23 Dec, 2009 7 commits
    • E
      ext4: flush delalloc blocks when space is low · c8afb446
      Eric Sandeen committed
      Creating many small files in rapid succession on a small
      filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
      
      for i in `seq 1 22500`; do
          echo -n > $SCRATCH_MNT/$i
          echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
      done
      
      leads to ENOSPC even though after a sync, 40% of the fs is free
      again.
      
      This is because we reserve worst-case metadata for delalloc writes,
      and when data is allocated that worst-case reservation is not
      usually needed.
      
      When freespace is low, kicking off an async writeback will start
      converting that worst-case space usage into something more realistic,
      almost always freeing up space to continue.
      
      This resolves the testcase for me, and survives all 4 generic
      ENOSPC tests in xfstests.
      
      We'll still need a hard synchronous sync to squeeze out the last bit,
      but this fixes things up to a large degree.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      c8afb446
    • J
      ext4: Eliminate potential double free on error path · d3533d72
      Julia Lawall committed
      b_entry_name and buffer are initially NULL, are initialized within a loop
      to the result of calling kmalloc, and are freed at the bottom of this loop.
      The loop contains gotos to cleanup, which also frees b_entry_name and
      buffer.  Some of these gotos are before the reinitializations of
      b_entry_name and buffer.  To maintain the invariant that b_entry_name and
      buffer are NULL at the top of the loop, and thus acceptable arguments to
      kfree, these variables are now set to NULL after the kfrees.
      
      This seems to be the simplest solution.  A more complicated solution
      would be to introduce more labels in the error handling code at the end of
      the function.
      
      A simplified version of the semantic match that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier E;
      expression E1;
      iterator I;
      statement S;
      @@
      
      *kfree(E);
      ... when != E = E1
          when != I(E,...) S
          when != &E
      *kfree(E);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      d3533d72
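The invariant described in this commit can be shown with a user-space analogue, using `malloc`/`free` in place of `kmalloc`/`kfree`. The function `process_entries()` and its `fail_at` parameter are hypothetical, invented only to trigger the early `goto`; this is a sketch of the pattern, not the ext4 code being patched.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Run `rounds` iterations, each allocating two scratch buffers.  If
 * fail_at is a valid round index, that round jumps to cleanup before it
 * allocates anything, mimicking an early error path.  Returns the number
 * of rounds completed.
 */
static int process_entries(int rounds, int fail_at)
{
    char *name = NULL;
    char *buffer = NULL;
    int done = 0;

    assert(rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        if (i == fail_at)
            goto cleanup;       /* taken before the buffers are reallocated */

        name = malloc(32);
        buffer = malloc(64);
        if (name == NULL || buffer == NULL)
            goto cleanup;

        done++;

        /* Free at the bottom of the loop and reset to NULL, so that the
         * pointers are always safe to free again at the cleanup label. */
        free(name);
        name = NULL;
        free(buffer);
        buffer = NULL;
    }

cleanup:
    free(name);                 /* free(NULL) is a no-op */
    free(buffer);
    return done;
}
```

Without the two `= NULL` assignments, a failure in the second round would reach `cleanup` with pointers that were already freed at the end of the first round, and they would be freed a second time.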
    • A
      ext4: fix unsigned long long printk warning in super.c · a6b43e38
      Andrew Morton committed
      sparc64 allmodconfig:
      
      fs/ext4/super.c: In function `lifetime_write_kbytes_show':
      fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4)
      fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4)
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a6b43e38
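A warning of this kind typically appears when a `%llu` conversion is handed an `unsigned long`, which is not the same type as `unsigned long long` even on platforms where both are 64 bits wide. The sketch below shows one common remedy, an explicit cast, using `snprintf` as a user-space stand-in; `format_kbytes()` is a hypothetical helper, and the commit message does not state that this is exactly how the kernel code was changed.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a kilobyte counter portably.  The cast makes the argument type
 * match the %llu conversion on every architecture.  Returns what snprintf
 * returns: the number of characters that the full output needs, excluding
 * the terminating NUL.
 */
static int format_kbytes(char *buf, size_t len, unsigned long kbytes)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%llu\n", (unsigned long long)kbytes);
}
```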
    • D
      ext4: fix sleep inside spinlock issue with quota and dealloc (#14739) · 39bc680a
      Dmitry Monakhov committed
      Unlock i_block_reservation_lock before vfs_dq_reserve_block().
      This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      39bc680a
    • D
      ext4: Fix potential quota deadlock · d21cd8f1
      Dmitry Monakhov committed
      We have to delay vfs_dq_claim_space() until allocation context destruction.
      Currently we have the following call trace:
      ext4_mb_new_blocks()
        /* task is already holding ac->alloc_semp */
       ->ext4_mb_mark_diskspace_used
          ->vfs_dq_claim_space()  /*  acquire dqptr_sem here. Possible deadlock */
       ->ext4_mb_release_context() /* drop ac->alloc_semp here */
      
      Let's move quota claiming to ext4_da_update_reserve_space()
      
       =======================================================
       [ INFO: possible circular locking dependency detected ]
       2.6.32-rc7 #18
       -------------------------------------------------------
       write-truncate-/3465 is trying to acquire lock:
        (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
      
       but task is already holding lock:
        (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #3 (&meta_group_info[i]->alloc_sem){++++..}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
              [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
              [<c029c9d3>] ext4_free_blocks+0x73/0x130
              [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
              [<c02a8087>] ext4_truncate+0x187/0x5e0
              [<c01e0f7b>] vmtruncate+0x6b/0x70
              [<c022ec02>] inode_setattr+0x62/0x190
              [<c02a2d7a>] ext4_setattr+0x25a/0x370
              [<c022ee81>] notify_change+0x151/0x340
              [<c021349d>] do_truncate+0x6d/0xa0
              [<c0221034>] may_open+0x1d4/0x200
              [<c022412b>] do_filp_open+0x1eb/0x910
              [<c021244d>] do_sys_open+0x6d/0x140
              [<c021258e>] sys_open+0x2e/0x40
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #2 (&ei->i_data_sem){++++..}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c02a5787>] ext4_get_blocks+0x47/0x450
              [<c02a74c1>] ext4_getblk+0x61/0x1d0
              [<c02a7a7f>] ext4_bread+0x1f/0xa0
              [<c02bcddc>] ext4_quota_write+0x12c/0x310
              [<c0262d23>] qtree_write_dquot+0x93/0x120
              [<c0261708>] v2_write_dquot+0x28/0x30
              [<c025d3fb>] dquot_commit+0xab/0xf0
              [<c02be977>] ext4_write_dquot+0x77/0x90
              [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
              [<c025e321>] dquot_alloc_inode+0x101/0x180
              [<c029fec2>] ext4_new_inode+0x602/0xf00
              [<c02ad789>] ext4_create+0x89/0x150
              [<c0221ff2>] vfs_create+0xa2/0xc0
              [<c02246e7>] do_filp_open+0x7a7/0x910
              [<c021244d>] do_sys_open+0x6d/0x140
              [<c021258e>] sys_open+0x2e/0x40
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0526505>] mutex_lock_nested+0x65/0x2d0
              [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
              [<c02610af>] vfs_quota_on_path+0x5f/0x70
              [<c02bc812>] ext4_quota_on+0x112/0x190
              [<c026345a>] sys_quotactl+0x44a/0x8a0
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #0 (&s->s_dquot.dqptr_sem){++++..}:
              [<c017d361>] __lock_acquire+0x1091/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c025e73b>] dquot_claim_space+0x3b/0x1b0
              [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
              [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
              [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
              [<c02a5966>] ext4_get_blocks+0x226/0x450
              [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
              [<c02a6ed6>] ext4_da_writepages+0x506/0x790
              [<c01de272>] do_writepages+0x22/0x50
              [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
              [<c01d7b9b>] filemap_flush+0x2b/0x30
              [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
              [<c029e595>] ext4_release_file+0x75/0xb0
              [<c0216b59>] __fput+0xf9/0x210
              [<c0216c97>] fput+0x27/0x30
              [<c02122dc>] filp_close+0x4c/0x80
              [<c014510e>] put_files_struct+0x6e/0xd0
              [<c01451b7>] exit_files+0x47/0x60
              [<c0146a24>] do_exit+0x144/0x710
              [<c0147028>] do_group_exit+0x38/0xa0
              [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
              [<c0102849>] do_notify_resume+0xb9/0x890
              [<c01032d2>] work_notifysig+0x13/0x21
      
       other info that might help us debug this:
      
       3 locks held by write-truncate-/3465:
        #0:  (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
        #1:  (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
        #2:  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
      
       stack backtrace:
       Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
       Call Trace:
        [<c0524cb3>] ? printk+0x1d/0x22
        [<c017ac9a>] print_circular_bug+0xca/0xd0
        [<c017d361>] __lock_acquire+0x1091/0x1260
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
        [<c0527191>] down_read+0x51/0x90
        [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
        [<c025e73b>] dquot_claim_space+0x3b/0x1b0
        [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
        [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
        [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
        [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c052712c>] ? down_write+0x8c/0xa0
        [<c02a5966>] ext4_get_blocks+0x226/0x450
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c017908b>] ? trace_hardirqs_off+0xb/0x10
        [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
        [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
        [<c01d6860>] ? find_get_pages_tag+0x0/0x180
        [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
        [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
        [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
        [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
        [<c02a6ed6>] ext4_da_writepages+0x506/0x790
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c02a69d0>] ? ext4_da_writepages+0x0/0x790
        [<c01de272>] do_writepages+0x22/0x50
        [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
        [<c01d7b9b>] filemap_flush+0x2b/0x30
        [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
        [<c029e595>] ext4_release_file+0x75/0xb0
        [<c0216b59>] __fput+0xf9/0x210
        [<c0216c97>] fput+0x27/0x30
        [<c02122dc>] filp_close+0x4c/0x80
        [<c014510e>] put_files_struct+0x6e/0xd0
        [<c01451b7>] exit_files+0x47/0x60
        [<c0146a24>] do_exit+0x144/0x710
        [<c017b163>] ? lock_release_holdtime+0x33/0x210
        [<c0528137>] ? _spin_unlock_irq+0x27/0x30
        [<c0147028>] do_group_exit+0x38/0xa0
        [<c017babb>] ? trace_hardirqs_on+0xb/0x10
        [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
        [<c0102849>] do_notify_resume+0xb9/0x890
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c017b163>] ? lock_release_holdtime+0x33/0x210
        [<c0165b50>] ? autoremove_wake_function+0x0/0x50
        [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
        [<c017babb>] ? trace_hardirqs_on+0xb/0x10
        [<c0300ba4>] ? security_file_permission+0x14/0x20
        [<c0215761>] ? vfs_write+0x131/0x190
        [<c0214f50>] ? do_sync_write+0x0/0x120
        [<c0103115>] ? sysenter_do_call+0x27/0x32
        [<c01032d2>] work_notifysig+0x13/0x21
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      d21cd8f1
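The reordering proposed in this commit can be modelled without real locks. In the sketch below a plain flag stands in for holding `alloc_sem`, and a counter records any quota claim made while that flag is set. All names (`struct alloc_ctx`, `claim_quota()`, `allocate_blocks()`, `update_reserve_space()`) are hypothetical simplifications, not the kernel functions; the model only demonstrates the ordering, with the amount to claim recorded under the lock and the claim itself issued after the lock is dropped.

```c
#include <assert.h>

/* Hypothetical allocation context; the flag models holding alloc_sem. */
struct alloc_ctx {
    int alloc_sem_held;         /* 1 while the allocation lock is held */
    unsigned long allocated;    /* blocks allocated but not yet claimed */
};

/* Bookkeeping for the model. */
static unsigned long quota_claimed;
static int claims_under_lock;

/* Stand-in for a quota claim that takes a second lock internally. */
static void claim_quota(const struct alloc_ctx *ac, unsigned long blocks)
{
    if (ac->alloc_sem_held)
        claims_under_lock++;    /* the nesting that lockdep reported */
    quota_claimed += blocks;
}

/* Allocate under the lock, but only record how much must be claimed. */
static void allocate_blocks(struct alloc_ctx *ac, unsigned long blocks)
{
    ac->alloc_sem_held = 1;
    ac->allocated += blocks;    /* no quota call while the lock is held */
    ac->alloc_sem_held = 0;
}

/* Claim the recorded amount once the allocation lock has been dropped. */
static void update_reserve_space(struct alloc_ctx *ac)
{
    assert(!ac->alloc_sem_held);
    claim_quota(ac, ac->allocated);
    ac->allocated = 0;
}
```

Calling `allocate_blocks()` followed by `update_reserve_space()` claims the full amount while `claims_under_lock` stays at zero, which is the property the lock-ordering fix is after.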
    • D
      ext4: Convert to generic reserved quota's space management. · a9e7f447
      Dmitry Monakhov committed
      This patch also fixes a write vs. chown race condition.
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      a9e7f447
    • T
      ext4, jbd2: Add barriers for file systems with external journals · cc3e1bea
      Theodore Ts'o committed
      This is a bit complicated because we are trying to optimize when we
      send barriers to the fs data disk.  We could just throw in an extra
      barrier to the data disk whenever we send a barrier to the journal
      disk, but that's not always strictly necessary.
      
      We only need to send a barrier during a commit when there are data
      blocks which must be written out due to an inode written in
      ordered mode, or if fsync() depends on the commit to force data blocks
      to disk.  Finally, before we drop transactions from the beginning of
      the journal during a checkpoint operation, we need to guarantee that
      any blocks that were flushed out to the data disk are firmly on the
      rust platter before we drop the transaction from the journal.
      
      Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      cc3e1bea
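The commit-time condition described above can be written as a small predicate. This is a sketch only: `struct commit_info` and `need_data_barrier()` are hypothetical names, not jbd2 structures or functions, and treating the single-device case as needing no extra barrier is an assumption of the example. The checkpoint case mentioned in the message is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one journal commit; not a jbd2 structure. */
struct commit_info {
    bool external_journal;   /* journal is on a different device than data */
    bool ordered_data;       /* data blocks written for ordered-mode inodes */
    bool fsync_needs_data;   /* an fsync() relies on this commit for data */
};

/* Decide whether the data disk needs its own barrier during this commit. */
static bool need_data_barrier(const struct commit_info *c)
{
    assert(c != NULL);
    if (!c->external_journal)
        return false;   /* assumption: the journal barrier covers the device */
    return c->ordered_data || c->fsync_needs_data;
}
```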