1. 04 Jan, 2012 5 commits
  2. 14 Dec, 2011 5 commits
    • ext4: handle EOF correctly in ext4_bio_write_page() · 5a0dc736
      Yongqiang Yang committed
      We need to zero out the part of a page that lies beyond EOF before setting
      it uptodate; otherwise, an mmap read or a write will see non-zero data
      beyond EOF.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
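The zeroing step described in this commit can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the kernel code: `SKETCH_PAGE_SIZE` and `sketch_zero_beyond_eof()` are hypothetical names, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/*
 * Zero the part of an in-memory page that lies beyond end-of-file,
 * so that stale bytes are never exposed once the page is marked valid.
 *
 * page:       buffer of SKETCH_PAGE_SIZE bytes
 * page_index: index of this page within the file
 * file_size:  file size in bytes
 *
 * Returns the number of bytes that were zeroed.
 */
size_t sketch_zero_beyond_eof(unsigned char *page,
                              unsigned long long page_index,
                              unsigned long long file_size)
{
    unsigned long long page_start = page_index * SKETCH_PAGE_SIZE;
    size_t valid;

    assert(page != NULL);

    if (file_size <= page_start)
        valid = 0;                  /* the whole page is beyond EOF */
    else if (file_size - page_start >= SKETCH_PAGE_SIZE)
        return 0;                   /* the page lies entirely inside the file */
    else
        valid = (size_t)(file_size - page_start);

    memset(page + valid, 0, SKETCH_PAGE_SIZE - valid);
    return SKETCH_PAGE_SIZE - valid;
}
```

For a file of 4196 bytes, the second page (index 1) holds 100 valid bytes, so the remaining 3996 bytes of that page are cleared.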
    • ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized · 5b5ffa49
      Yongqiang Yang committed
      If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater
      than ee_block + ee_len.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() · 093e6e36
      Yongqiang Yang committed
      If a page has been read into memory and never been written, it has no
      buffers, but we should handle the page in truncate or punch hole.
      
      The VFS write path already handles holes correctly, so this
      patch removes the hole-handling code from the write operations.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize · 13a79a47
      Yongqiang Yang committed
      If there is an unwritten but clean buffer in a page and there is a
      dirty buffer after the buffer, then mpage_submit_io does not write the
      dirty buffer out.  As a result, da_writepages loops forever.
      
      This patch fixes the problem by checking the dirty flag.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: avoid hangs in ext4_da_should_update_i_disksize() · ea51d132
      Andrea Arcangeli committed
      If the pte mapping in generic_perform_write() is unmapped between
      iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
      "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
      it with delayed allocations enabled. This skips the i_disksize
      enlargement logic if copied is zero and no new data was appended to
      the inode.
      
       gdb> bt
       #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
       08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
       ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
       #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
       os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
       #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
       xffff88001e26be40) at mm/filemap.c:2600
       #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
       zed out>, pos=<value optimized out>) at mm/filemap.c:2632
       #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
       t fs/ext4/file.c:136
       #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
       ppos=0xffff88001e26bf48) at fs/read_write.c:406
       #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
       000, pos=0xffff88001e26bf48) at fs/read_write.c:435
       #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
       4000) at fs/read_write.c:487
       #10 <signal handler called>
       #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
       #12 0x0000000000000000 in ?? ()
       gdb> print offset
       $22 = 0xffffffffffffffff
       gdb> print idx
       $23 = 0xffffffff
       gdb> print inode->i_blkbits
       $24 = 0xc
       gdb> up
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       2512                    if (ext4_da_should_update_i_disksize(page, end)) {
       gdb> print start
       $25 = 0x0
       gdb> print end
       $26 = 0xffffffffffffffff
       gdb> print pos
       $27 = 0x108000
       gdb> print new_i_size
       $28 = 0x108000
       gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
       $29 = 0xd9000
       gdb> down
       2467            for (i = 0; i < idx; i++)
       gdb> print i
       $30 = 0xd44acbee
      
      This is 100% reproducible with some autonuma development code tuned in
      a very aggressive manner (not normal way even for knumad) which does
      "exotic" changes to the ptes. It wouldn't normally trigger but I don't
      see why it can't happen normally if the page is added to swap cache in
      between the two faults leading to "copied" being zero (which then
      hangs in ext4). So it should be fixed. Especially possible with lumpy
      reclaim (albeit disabled if compaction is enabled) as that would
      ignore the young bits in the ptes.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
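The values printed in the gdb session above (pos=0x108000, copied=0x0, i_blkbits=0xc, end=0xffffffffffffffff, idx=0xffffffff) can be reproduced with a small arithmetic sketch. It assumes that the end offset is computed as `start + copied - 1` in 64-bit unsigned arithmetic and that the block index is a 32-bit unsigned value; both are inferred from the printed values, not taken from the source. All function names are hypothetical, and the guard is only one plausible reading of "skip the logic if copied is zero".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the index of the last block touched inside the page, the way
 * the gdb output suggests it is derived. With copied == 0 and a
 * page-aligned pos, "end" wraps to all ones and the truncated index
 * becomes 0xffffffff, so a loop "for (i = 0; i < idx; i++)" runs for
 * roughly four billion iterations.
 */
uint32_t sketch_last_block_index(uint64_t pos, uint64_t copied,
                                 unsigned page_shift, unsigned blkbits)
{
    uint64_t start, end;

    assert(page_shift < 64 && blkbits < 64);

    start = pos & (((uint64_t)1 << page_shift) - 1); /* offset within page */
    end = start + copied - 1;                        /* wraps if both are 0 */
    return (uint32_t)(end >> blkbits);               /* truncated to 32 bits */
}

/*
 * Hypothetical guard: only consider enlarging the on-disk size when
 * something was actually copied and the new size exceeds the old one.
 */
int sketch_should_update_disksize(uint64_t copied, uint64_t new_i_size,
                                  uint64_t i_disksize)
{
    return copied != 0 && new_i_size > i_disksize;
}
```

With the values from the trace, `sketch_last_block_index(0x108000, 0, 12, 12)` yields 0xffffffff, while a full-page copy of 0x1000 bytes yields index 0.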
  3. 13 Dec, 2011 2 commits
    • ext4: display the correct mount option in /proc/mounts for [no]init_itable · fc6cb1cd
      Theodore Ts'o committed
      /proc/mounts was showing the mount option [no]init_inode_table when
      the correct mount option that will be accepted by parse_options() is
      [no]init_itable.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: Fix crash due to getting bogus eh_depth value on big-endian systems · b4611abf
      Paul Mackerras committed
      Commit 1939dd84 ("ext4: cleanup ext4_ext_grow_indepth code") added a
      reference to ext4_extent_header.eh_depth, but forgot to pass the value
      read through le16_to_cpu.  The result is a crash on big-endian
      machines, such as this crash on a POWER7 server:
      
      attempt to access beyond end of device
      sda8: rw=0, want=776392648163376, limit=168558560
      Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb
      Faulting instruction address: 0xc0000000001f5f38
      cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0]
          pc: c0000000001f5f38: .__brelse+0x18/0x60
          lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80
          sp: c000001bd1aaef70
         msr: 9000000000009032
         dar: 6b6b6b6b6b6b6bcb
       dsisr: 40000000
        current = 0xc000001bd15b8010
        paca    = 0xc00000000ffe4600
          pid   = 19911, comm = flush-8:0
      enter ? for help
      [c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80
      [c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0
      [c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0
      [c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710
      [c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310
      [c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490
      [c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430
      [c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670
      [c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90
      [c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530
      [c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300
      [c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140
      [c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0
      [c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300
      [c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340
      [c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0
      [c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70
      
      This is due to getting ext_depth(inode) == 0x101 and therefore running
      off the end of the path array in ext4_ext_drop_refs into following
      unallocated structures.
      
      This fixes it by adding the necessary le16_to_cpu.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
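The byte-order problem described above can be modelled in portable user-space C. The sketch below is an illustration only: every `sketch_*` name is hypothetical, and the "buggy" path assumes that the depth was incremented from a raw, unconverted load on a big-endian host, which is one plausible way to arrive at the 0x101 value quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 16-bit little-endian on-disk field, independent of host order. */
uint16_t sketch_le16_decode(const unsigned char b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Encode a CPU value as 16-bit little-endian on-disk bytes. */
void sketch_le16_encode(uint16_t v, unsigned char b[2])
{
    b[0] = (unsigned char)(v & 0xff);
    b[1] = (unsigned char)(v >> 8);
}

/* Model what a big-endian CPU sees when it loads the field unconverted. */
uint16_t sketch_raw_load_be(const unsigned char b[2])
{
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Assumed buggy increment: raw load, add one, store as little-endian. */
void sketch_grow_depth_buggy_be(unsigned char depth[2])
{
    assert(depth != 0);
    sketch_le16_encode((uint16_t)(sketch_raw_load_be(depth) + 1), depth);
}

/* Corrected increment: convert to CPU order before doing arithmetic. */
void sketch_grow_depth_fixed(unsigned char depth[2])
{
    assert(depth != 0);
    sketch_le16_encode((uint16_t)(sketch_le16_decode(depth) + 1), depth);
}
```

Starting from an on-disk depth of 1 (bytes 01 00), the unconverted path stores bytes 01 01, which decode to 0x101, while the converted path stores the expected depth of 2.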
  4. 12 Dec, 2011 1 commit
  5. 25 Nov, 2011 1 commit
    • ext4: fix racy use-after-free in ext4_end_io_dio() · 4c81f045
      Tejun Heo committed
      ext4_end_io_dio() queues io_end->work and then clears iocb->private;
      however, io_end->work calls aio_complete(), which frees the iocb
      object.  If that slab object gets reallocated, then ext4_end_io_dio()
      can end up clearing someone else's iocb->private.  This use-after-free
      can cause a leak of an ext4_io_end_t structure.
      
      Detected and tested with slab poisoning.
      
      [ Note: Can also reproduce using 12 fio's against 12 file systems with the
        following configuration file:
      
        [global]
        direct=1
        ioengine=libaio
        iodepth=1
        bs=4k
        ba=4k
        size=128m
      
        [create]
        filename=${TESTDIR}
        rw=write
      
        -- tytso ]
      
      Google-Bug-Id: 5354697
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Kent Overstreet <koverstreet@google.com>
      Tested-by: Kent Overstreet <koverstreet@google.com>
      Cc: stable@kernel.org
  6. 22 Nov, 2011 1 commit
  7. 08 Nov, 2011 1 commit
  8. 07 Nov, 2011 2 commits
  9. 02 Nov, 2011 4 commits
  10. 01 Nov, 2011 9 commits
  11. 31 Oct, 2011 4 commits
    • ext4: optimize locking for end_io extent conversion · b82e384c
      Theodore Ts'o committed
      Now that we are doing the locking correctly, we need to grab the
      i_completed_io_lock twice per end_io.  We can clean this up by
      removing the structure from the i_completed_io_list and using this as
      the locking mechanism to prevent ext4_flush_completed_IO() from racing
      against ext4_end_io_work(), instead of clearing
      EXT4_IO_END_UNWRITTEN in io->flag.
      
      In addition, if ext4_convert_unwritten_extents() returns an error,
      we no longer keep the end_io structure on the linked list.  Keeping it
      there doesn't help, because it tends to lock up the file system and
      wedge the system.  That's one way to call attention to the problem, but
      it doesn't help the overall robustness of the system.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove unnecessary call to waitqueue_active() · 4e298021
      Theodore Ts'o committed
      The usage of waitqueue_active() is not necessary, and introduces (I
      believe) a hard-to-hit race.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Use correct locking for ext4_end_io_nolock() · d73d5046
      Tao Ma committed
      We must hold i_completed_io_lock when manipulating anything on the
      i_completed_io_list linked list.  This includes io->lock, which we
      were checking in ext4_end_io_nolock().
      
      So move this check to ext4_end_io_work().  This also has the bonus of
      avoiding extra work if it is already done without needing to take the
      mutex.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • writeback: Add a 'reason' to wb_writeback_work · 0e175a18
      Curt Wohlgemuth committed
      This creates a new 'reason' field in a wb_writeback_work
      structure, which unambiguously identifies who initiates
      writeback activity.  A 'wb_reason' enumeration has been
      added to writeback.h, to enumerate the possible reasons.
      
      The 'writeback_work_class' tracepoint event class and the
      'writeback_queue_io' tracepoint are updated to include the
      symbolic 'reason' in all trace events.
      
      And the 'writeback_inodes_sbXXX' family of routines has had
      a wb_stats parameter added to them, so callers can specify
      why writeback is being started.
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
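The idea of tagging each writeback request with a symbolic reason can be sketched as below. This is a hedged illustration, not the kernel's definition: the enumerator names, the struct layout, and all `sketch_*` helpers are hypothetical, and only a small subset of conceivable reasons is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of reasons; the real enumeration lives in writeback.h. */
enum sketch_wb_reason {
    SKETCH_WB_REASON_BACKGROUND,
    SKETCH_WB_REASON_SYNC,
    SKETCH_WB_REASON_PERIODIC,
    SKETCH_WB_REASON_MAX
};

/* Simplified stand-in for a writeback work item carrying its reason. */
struct sketch_writeback_work {
    long nr_pages;
    enum sketch_wb_reason reason;
};

/* Map a reason to the symbolic string a trace event could print. */
const char *sketch_wb_reason_name(enum sketch_wb_reason reason)
{
    switch (reason) {
    case SKETCH_WB_REASON_BACKGROUND: return "background";
    case SKETCH_WB_REASON_SYNC:       return "sync";
    case SKETCH_WB_REASON_PERIODIC:   return "periodic";
    default:                          return "unknown";
    }
}

/* Reverse lookup; returns SKETCH_WB_REASON_MAX if the name is not known. */
enum sketch_wb_reason sketch_wb_reason_from_name(const char *name)
{
    int r;

    assert(name != NULL);
    for (r = 0; r < SKETCH_WB_REASON_MAX; r++)
        if (strcmp(name, sketch_wb_reason_name((enum sketch_wb_reason)r)) == 0)
            return (enum sketch_wb_reason)r;
    return SKETCH_WB_REASON_MAX;
}

/* Callers state why writeback is being started when they build the work item. */
struct sketch_writeback_work sketch_make_work(long nr_pages,
                                              enum sketch_wb_reason reason)
{
    struct sketch_writeback_work work;

    work.nr_pages = nr_pages;
    work.reason = reason;
    return work;
}
```

Carrying the reason in the work item, rather than inferring it later, lets every trace point that sees the item report who initiated the writeback.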
  12. 29 Oct, 2011 5 commits