1. 06 Jan, 2009 2 commits
    • A
      ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V committed
      We need to make sure we update the block bitmap and clear
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      Now on the block allocation side we do
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      i.e., on allocation we update the bitmap, then take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is that a
      parallel ext4_read_block_bitmap() zeroes out the bitmap in between
      the above mb_set_bits() and spin_lock(sb_bgl_lock(..)).
      
      The race results in the following user-visible errors:
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
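The ordering problem described in commit e8134b27 above can be modelled in plain C. What follows is a minimal single-threaded sketch, not kernel code: `struct group`, `reader_init()`, `alloc_buggy()` and `alloc_fixed()` are hypothetical names, `BG_BLOCK_UNINIT` is a stand-in for EXT4_BG_BLOCK_UNINIT, and the "parallel" reader is simulated by calling it at the point where the race window would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BG_BLOCK_UNINIT 0x1u   /* stand-in for EXT4_BG_BLOCK_UNINIT */

/* Hypothetical, simplified block group: a flag word and a tiny bitmap. */
struct group {
    uint16_t flags;
    uint8_t  bitmap[8];
};

static void grp_set_bit(struct group *g, unsigned bit)
{
    assert(bit < 8 * sizeof(g->bitmap));
    g->bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

static bool grp_test_bit(const struct group *g, unsigned bit)
{
    assert(bit < 8 * sizeof(g->bitmap));
    return (g->bitmap[bit / 8] >> (bit % 8)) & 1u;
}

/* Reader side: while the group is still flagged UNINIT, the bitmap is
 * (re)initialized, which wipes any bits set in it so far. */
static void reader_init(struct group *g)
{
    if (g->flags & BG_BLOCK_UNINIT)
        memset(g->bitmap, 0, sizeof(g->bitmap));
}

/* Buggy ordering: set the bit, leave a window, then clear the flag.
 * Returns whether the allocated bit survived. */
static bool alloc_buggy(struct group *g, unsigned bit, bool reader_races)
{
    grp_set_bit(g, bit);
    if (reader_races)
        reader_init(g);                 /* reader runs inside the window */
    g->flags &= (uint16_t)~BG_BLOCK_UNINIT;
    return grp_test_bit(g, bit);
}

/* Fixed ordering: the bitmap update and the flag clear form one critical
 * section, so the reader can only run before or after both. */
static bool alloc_fixed(struct group *g, unsigned bit, bool reader_races)
{
    /* the group lock would be taken here */
    grp_set_bit(g, bit);
    g->flags &= (uint16_t)~BG_BLOCK_UNINIT;
    /* the group lock would be dropped here */
    if (reader_races)
        reader_init(g);                 /* reader now sees UNINIT cleared */
    return grp_test_bit(g, bit);
}
```

With a racing reader, `alloc_buggy()` loses the freshly set bit while `alloc_fixed()` keeps it, which mirrors the lost allocation behind the double-free reports quoted in the commit message.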
    • A
      ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V committed
      The mballoc code likes to call ext4_error while it is holding locked
      block groups.  This can cause a scheduling-in-atomic-context BUG.  We
      can't just unlock the block group and relock it after/if ext4_error
      returns since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 24 Nov, 2008 1 commit
    • A
      ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V committed
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for a filesystem blocksize
      smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k page size,
      etc.).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
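The block-size limit mentioned in commit b7be019e above follows from simple arithmetic. The sketch below rests on two assumptions that are not stated in the commit text: lockdep allows 8 lock subclasses (the kernel's MAX_LOCKDEP_SUBCLASSES), and each block group occupies two blocks of a buddy-cache page, so one page covers blocks_per_page / 2 groups. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed lockdep limit on lock subclasses (MAX_LOCKDEP_SUBCLASSES). */
enum { MAX_SUBCLASSES = 8 };

/* Number of block groups whose alloc_sem is taken in the loop, under the
 * assumption that every group needs two blocks of the page. */
static unsigned groups_per_page(unsigned page_size, unsigned block_size)
{
    assert(block_size > 0 && page_size >= block_size);
    unsigned blocks_per_page = page_size / block_size;
    unsigned groups = blocks_per_page / 2;
    return groups > 0 ? groups : 1;
}

/* The loop index is used as the subclass, so indices 0..groups-1 must
 * all be valid subclass numbers. */
static bool subclasses_fit(unsigned page_size, unsigned block_size)
{
    return groups_per_page(page_size, block_size) <= MAX_SUBCLASSES;
}
```

Under these assumptions a 1k block size with 4k pages needs only 2 subclasses, while 1k blocks with 32k pages or 2k blocks with 64k pages need 16 and no longer fit.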
  3. 06 Jan, 2009 1 commit
  4. 07 Nov, 2008 1 commit
  5. 26 Nov, 2008 1 commit
  6. 06 Jan, 2009 2 commits
  7. 23 Nov, 2008 1 commit
  8. 05 Nov, 2008 2 commits
  9. 06 Jan, 2009 1 commit
  10. 05 Nov, 2008 1 commit
  11. 04 Jan, 2009 1 commit
  12. 17 Dec, 2008 1 commit
  13. 26 Nov, 2008 1 commit
    • J
      jbd2: improve jbd2 fsync batching · e07f7183
      Josef Bacik committed
      This patch removes the static sleep time in favor of a more
      self-optimizing approach, where we measure the average amount of time
      it takes to commit a transaction to disk and the amount of time a
      transaction has been running.  If somebody does a sync write or an
      fsync(), traditionally we would sleep for 1 jiffy, which, depending on
      the value of HZ, could be a significant amount of time compared to how
      long it takes to commit a transaction to the underlying storage.  With
      this patch, instead of sleeping for a jiffy, we check whether the
      amount of time this transaction has been running is less than the
      average commit time, and if it is, we sleep for the delta using
      schedule_hrtimeout() to get a higher-precision sleep time.  This
      greatly benefits high-end storage, where you could otherwise end up
      sleeping for longer than it takes to commit the transaction and
      therefore sitting idle.  Keeping the sleep time to a minimum ensures
      that the transaction gets committed and that you are always doing
      something.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
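The sleep decision described in commit e07f7183 above can be sketched in a few lines of userspace C. This is only an illustration of the idea: the function names are hypothetical, times are plain nanosecond counters, and the 3:1 weighting used to maintain the average is an assumption of this sketch, not something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a newly measured commit time into the running average.  The 3:1
 * weighting is an illustrative choice for this sketch. */
static uint64_t update_avg_commit_ns(uint64_t avg_ns, uint64_t sample_ns)
{
    /* Guard against overflow in the weighted sum below. */
    assert(avg_ns <= UINT64_MAX / 4 && sample_ns <= UINT64_MAX / 4);
    return (sample_ns + 3 * avg_ns) / 4;
}

/* How long to wait for more work to join the transaction: only the part
 * of the average commit time that the transaction has not yet been
 * running.  A transaction older than the average does not wait at all. */
static uint64_t batch_sleep_ns(uint64_t txn_age_ns, uint64_t avg_commit_ns)
{
    if (txn_age_ns >= avg_commit_ns)
        return 0;
    return avg_commit_ns - txn_age_ns;
}
```

For example, with an average commit time of 1000 ns, a transaction that has been running for 200 ns would wait another 800 ns, and one that has been running for 1500 ns would not wait.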
  14. 06 Jan, 2009 3 commits
    • A
      ext4: Don't overwrite allocation_context ac_status · 032115fc
      Aneesh Kumar K.V committed
      We can call ext4_mb_check_limits even after successfully allocating
      the requested blocks.  In that case, make sure we don't overwrite
      ac_status if it already has the status AC_STATUS_FOUND.  This fixes
      the lockdep warning:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28-rc6-autokern1 #1
      ---------------------------------------------
      fsstress/11948 is trying to acquire lock:
       (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
      .....
      
      stack backtrace:
      .....
       [<c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
      .....
      
      but task is already holding lock:
       (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
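The guard described in commit 032115fc above amounts to an early return. The following is a rough userspace sketch under stated assumptions: `struct alloc_ctx`, its fields and the scan-budget rule are hypothetical simplifications, and the enum values are stand-ins for the allocator's real status codes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the allocator's status values. */
enum ac_status { AC_STATUS_CONTINUE = 1, AC_STATUS_FOUND, AC_STATUS_BREAK };

/* Hypothetical, pared-down allocation context. */
struct alloc_ctx {
    enum ac_status status;
    unsigned found;          /* extents examined so far */
    unsigned max_to_scan;    /* scan budget */
};

/* Stop scanning once the budget is exhausted, but never downgrade a
 * context that has already found its blocks. */
static void check_limits(struct alloc_ctx *ac)
{
    assert(ac != NULL);
    if (ac->status == AC_STATUS_FOUND)
        return;
    if (ac->found > ac->max_to_scan)
        ac->status = AC_STATUS_BREAK;
}
```

Without the early return, a context that had already succeeded could be flipped to another status, and the caller would carry on as if no allocation had been made.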
    • T
      ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() · fde4d95a
      Theodore Ts'o committed
      This removes annoying blank syslog entries emitted by ext4_error() or
      ext4_warning(), since these functions add their own newline.
      Signed-off-by: Nick Warne <nick@ukfsn.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      jbd2: Add barrier not supported test to journal_wait_on_commit_record · fd98496f
      Theodore Ts'o committed
      Xen doesn't report that barriers are not supported until buffer I/O is
      reported as completed, instead of when the buffer I/O is submitted.
      Add a check and a fallback codepath to journal_wait_on_commit_record()
      to detect this case, so that attempts to mount ext4 filesystems on
      LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
      journal on device XXX"; "Remounting filesystem read-only" error.
      
      Thanks to Andreas Sundstrom for reporting this issue.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  15. 07 Jan, 2009 1 commit
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar committed
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We ran
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates a 2G file and does reads
      and writes on it, with a buffer size of 256K, using O_DIRECT for all file
      opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  16. 17 Dec, 2008 1 commit
  17. 27 Nov, 2008 1 commit
  18. 26 Nov, 2008 3 commits
  19. 06 Jan, 2009 2 commits
  20. 05 Nov, 2008 1 commit
    • T
      ext4: tone down ext4_da_writepages warnings · 2a21e37e
      Theodore Ts'o committed
      If the filesystem has errors, ext4_da_writepages() will return a *lot*
      of errors, including lots and lots of stack dumps.  While it's true
      that we are dropping user data on the floor, which is unfortunate, the
      stack dumps aren't helpful, and they tend to obscure the true original
      root cause of the problem.  So in the case where the filesystem has
      aborted, return an EROFS right away.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  21. 13 Dec, 2008 1 commit
    • T
      ext4: remove do_blk_alloc() · 97df5d15
      Theodore Ts'o committed
      The convenience function do_blk_alloc() is a static function with only
      one caller, so fold it into ext4_new_meta_blocks() to simplify the
      code and to make it easier to understand.
      
      To save more stack space, if count is a null pointer in
      ext4_new_meta_blocks(), assume that the caller wanted a single block
      (and if there is an error, no blocks were allocated).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
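The calling convention described in commit 97df5d15 above (a null `count` means "one block", and an error means nothing was allocated) can be illustrated with a toy allocator. Everything here is hypothetical: `struct toy_fs` and `toy_new_meta_blocks()` are invented for this sketch, blocks are simply handed out sequentially, and zeroing `*count` on failure is a choice of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy allocator state: blocks are handed out sequentially. */
struct toy_fs {
    unsigned long next_free;   /* first unallocated block number */
    unsigned long total;       /* one past the last usable block */
};

/* Allocate *count blocks, or a single block when count is NULL.
 * Returns the first block number and sets *errp to 0 on success.
 * On failure returns 0, sets *errp, and allocates nothing. */
static unsigned long toy_new_meta_blocks(struct toy_fs *fs,
                                         unsigned long *count, int *errp)
{
    assert(fs != NULL && errp != NULL && fs->next_free <= fs->total);
    unsigned long want = count ? *count : 1;

    if (want == 0 || want > fs->total - fs->next_free) {
        *errp = -ENOSPC;
        if (count)
            *count = 0;        /* report that no blocks were allocated */
        return 0;
    }

    unsigned long first = fs->next_free;
    fs->next_free += want;
    *errp = 0;
    return first;
}
```

A caller that needs exactly one block can pass NULL and avoid keeping a counter variable on its own stack, which is the saving the commit message refers to.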
  22. 08 Dec, 2008 1 commit
    • T
      ext4: remove ext4_new_meta_block() · cfe82c85
      Theodore Ts'o committed
      There were only two callers of the function ext4_new_meta_block(),
      which is just a very simple wrapper function around
      ext4_new_meta_blocks().  Change those two callers to call
      ext4_new_meta_blocks() directly, to save code and stack space usage.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  23. 02 Jan, 2009 1 commit
  24. 07 Jan, 2009 1 commit
  25. 07 Dec, 2008 1 commit
  26. 29 Oct, 2008 2 commits
  27. 30 Oct, 2008 1 commit
  28. 05 Jan, 2009 4 commits
    • L
      Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current · fe0bdec6
      Linus Torvalds committed
      * 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
        audit: validate comparison operations, store them in sane form
        clean up audit_rule_{add,del} a bit
        make sure that filterkey of task,always rules is reported
        audit rules ordering, part 2
        fixing audit rule ordering mess, part 1
        audit_update_lsm_rules() misses the audit_inode_hash[] ones
        sanitize audit_log_capset()
        sanitize audit_fd_pair()
        sanitize audit_mq_open()
        sanitize AUDIT_MQ_SENDRECV
        sanitize audit_mq_notify()
        sanitize audit_mq_getsetattr()
        sanitize audit_ipc_set_perm()
        sanitize audit_ipc_obj()
        sanitize audit_socketcall
        don't reallocate buffer in every audit_sockaddr()
    • A
      rtc: add alarm/update irq interfaces · 099e6576
      Alessandro Zummo committed
      Add standard interfaces for enabling alarm/update irqs.  Drivers are no
      longer required to implement equivalent ioctl code, as rtc-dev will
      provide it.
      
      UIE emulation should now be handled correctly and will work even for those
      RTC drivers that cannot be configured to do both UIE and AIE.
      Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin committed
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (e.g.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
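The idea behind AOP_FLAG_NOFS in commit 54566b2c above is that a single flag decides whether the page-cache allocation may re-enter the filesystem. The sketch below models that in userspace C. All constants carry an `SK_` prefix because their bit values are invented for illustration and do not match the kernel's, and `write_begin_gfp()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real constants differ. */
#define SK_GFP_FS        0x1u   /* allocation may re-enter the filesystem */
#define SK_GFP_IO        0x2u   /* allocation may start I/O */
#define SK_GFP_WAIT      0x4u   /* allocation may sleep */
#define SK_AOP_FLAG_NOFS 0x1u   /* caller must not recurse into the fs */

/* Derive the allocation mask for the page-cache page from the
 * write_begin flags: NOFS strips the "may enter fs" bit and leaves
 * everything else untouched. */
static unsigned write_begin_gfp(unsigned base_gfp, unsigned aop_flags)
{
    /* Only the three modelled bits are valid in this sketch. */
    assert((base_gfp & ~(SK_GFP_FS | SK_GFP_IO | SK_GFP_WAIT)) == 0);
    if (aop_flags & SK_AOP_FLAG_NOFS)
        return base_gfp & ~SK_GFP_FS;
    return base_gfp;
}
```

Because the caller can only remove the filesystem bit, it cannot turn the allocation into something like an atomic one, which matches the commit's point that a full gfp_t argument offered freedom that no caller could actually use.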
    • B
      viafb: fix crashes due to 4k stack overflow · e687d691
      Bruno Prémont committed
      The function viafb_cursor() uses 2 stack variables of CURSOR_SIZE bits;
      CURSOR_SIZE is defined as (8 * 1024).  Using 2 x 1k of stack is too
      much for a 4k stack (though it works with 8k stacks).  Make those two
      variables kzalloc'ed to preserve stack space.
      
      Also merge the whole lot of local structs in viafb_ioctl into a union so
      the stack usage gets minimized here as well (the structs are only accessed
      in their individual IOCTL case).  This second part is only compile-tested, as
      I know of no userspace app using the IOCTLs.
      Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
      Cc: <JosephChan@via.com.tw>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
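The stack-saving change in commit e687d691 above can be sketched in userspace C. This is an illustration only: `cursor_update()` is a hypothetical routine, `calloc()` stands in for kzalloc(), and the buffer contents are arbitrary. CURSOR_SIZE is taken from the commit text as 8 * 1024 bits, i.e. 1 KiB per buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define CURSOR_SIZE  (8 * 1024)          /* in bits, as in the commit text */
#define CURSOR_BYTES (CURSOR_SIZE / 8)   /* 1 KiB per buffer */

/* Hypothetical cursor routine: both scratch buffers live on the heap
 * (calloc() standing in for kzalloc()), so the stack frame holds only
 * two pointers instead of 2 KiB of arrays. */
static int cursor_update(uint8_t fill, uint8_t *first_byte_out)
{
    assert(first_byte_out != NULL);
    uint8_t *fg = calloc(CURSOR_BYTES, 1);
    uint8_t *bg = calloc(CURSOR_BYTES, 1);
    int ret = 0;

    if (!fg || !bg) {
        ret = -ENOMEM;
        goto out;
    }

    for (size_t i = 0; i < CURSOR_BYTES; i++) {
        fg[i] = fill;
        bg[i] = (uint8_t)~fill;
    }
    *first_byte_out = (uint8_t)(fg[0] ^ bg[0]);

out:
    free(fg);                            /* free(NULL) is a no-op */
    free(bg);
    return ret;
}
```

The single exit path frees both buffers whether or not the allocations succeeded, which is the usual price of moving large locals off the stack.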