1. 18 Dec, 2011 3 commits
  2. 17 Dec, 2011 1 commit
  3. 16 Dec, 2011 8 commits
    • Btrfs: unplug every once and a while · d85c8a6f
      Committed by Chris Mason
      The btrfs io submission threads can build up massive plug lists.  This
      patch unplugs periodically so we don't hand over huge dumps of IO at
      once.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      d85c8a6f
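The idea of flushing a growing batch at intervals can be sketched in user-space C as follows. This is only an illustration: `BATCH_LIMIT`, `struct submit_queue`, `submit_one()` and `flush_queue()` are hypothetical names, and the commit message does not say what threshold the real patch uses.

```c
#include <stddef.h>

/* Hypothetical batch limit; the real threshold is not stated in the commit. */
#define BATCH_LIMIT 16

struct submit_queue {
    size_t pending;   /* requests queued but not yet handed over */
    size_t flushes;   /* how many times the queue has been flushed */
};

/* Hand everything queued so far to the lower layer (simulated here). */
static void flush_queue(struct submit_queue *q)
{
    if (q->pending == 0)
        return;
    q->pending = 0;
    q->flushes++;
}

/* Queue one request and flush as soon as the batch limit is reached. */
static void submit_one(struct submit_queue *q)
{
    q->pending++;
    if (q->pending >= BATCH_LIMIT)
        flush_queue(q);
}
```

A caller would invoke `submit_one()` per request and call `flush_queue()` once more at the end to drain any remainder.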
    • Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code · e755d9ab
      Committed by Chris Mason
      btrfs_update_inode is sometimes called with a null reservation.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      e755d9ab
    • Btrfs: only set cache_generation if we setup the block group · e65cbb94
      Committed by Josef Bacik
      A user reported a problem booting into a new kernel with the old format inodes.
      He was panicking in cow_file_range while writing out the inode cache.  This is
      because if the block group is not cached we'll just skip writing out the cache,
      however if it gets dirtied again in the same transaction and it finished caching
      we'd go ahead and write it out, but since we set cache_generation to the transid
      we think we've already truncated it and will just carry on, running into
      cow_file_range and blowing up.  We need to make sure we only set
      cache_generation if we've done the truncate.  The user tested this patch and
      verified that the panic no longer occurred.  Thanks,
      Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      e65cbb94
    • Btrfs: don't panic if orphan item already exists · ee4d89f0
      Committed by Josef Bacik
      I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
      a loop.  This is because we will add an orphan item, do the truncate, the
      truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
      left with an orphan item still in the fs.  Then we come back later to do another
      truncate and it blows up because we already have an orphan item.  This is ok so
      just fix the BUG_ON() to only BUG() if ret is not EEXIST.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      ee4d89f0
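A minimal user-space sketch of treating "already exists" as a benign result is shown below. The names `struct inode_state`, `insert_orphan_item()` and `orphan_add()` are hypothetical stand-ins, not the btrfs functions, and where the kernel would BUG() on other errors this sketch simply returns them.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the on-disk orphan state of one inode. */
struct inode_state {
    bool has_orphan_item;
};

/* Simulated insert: fails with -EEXIST if the item is already present. */
static int insert_orphan_item(struct inode_state *st)
{
    if (st->has_orphan_item)
        return -EEXIST;
    st->has_orphan_item = true;
    return 0;
}

/*
 * Add an orphan item, tolerating one left behind by an earlier failed
 * truncate. Any other error is passed back to the caller.
 */
static int orphan_add(struct inode_state *st)
{
    int ret = insert_orphan_item(st);

    if (ret == -EEXIST)
        return 0;   /* leftover item is fine, carry on */
    return ret;
}
```

Calling `orphan_add()` twice on the same state succeeds both times, which mirrors the repeated-truncate scenario described in the commit message.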
    • Btrfs: fix leaked space in truncate · 7041ee97
      Committed by Josef Bacik
      We were occasionally leaking space when running xfstest 269.  This is because if
      we failed to start the transaction in the truncate loop we'd just goto out, but
      we need to break so that the inode is removed from the orphan list and the space
      is properly freed.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      7041ee97
    • Btrfs: fix how we do delalloc reservations and how we free reservations on error · 660d3f6c
      Committed by Josef Bacik
      Running xfstests 269 with some tracing my scripts kept spitting out errors about
      releasing bytes that we didn't actually have reserved.  This took me down a huge
      rabbit hole and it turns out the way we deal with reserved_extents is wrong,
      we need to only be setting it if the reservation succeeds, otherwise the free()
      method will come in and unreserve space that isn't actually reserved yet, which
      can lead to other warnings and such.  The math was all working out right in the
      end, but it caused all sorts of other issues in addition to making my scripts
      yell and scream and generally make it impossible for me to track down the
      original issue I was looking for.  The other problem is with our error handling
      in the reservation code.  There are two cases that we need to deal with:
      
      1) We raced with free.  In this case free won't free anything because csum_bytes
      is modified before we drop the lock in our reservation path, so free rightly
      doesn't release any space because the reservation code may be depending on that
      reservation.  However if we fail, we need the reservation side to do the free at
      that point since that space is no longer in use.  So as it stands the code was
      doing this fine and it worked out, except in case #2
      
      2) We don't race with free.  Nobody comes in and changes anything, and our
      reservation fails.  In this case we didn't reserve anything anyway and we just
      need to clean up csum_bytes but not free anything.  So we keep track of
      csum_bytes before we drop the lock and if it hasn't changed we know we can just
      decrement csum_bytes and carry on.
      
      Because of the case where we can race with free()'s since we have to drop our
      spin_lock to do the reservation, I'm going to serialize all reservations with
      the i_mutex.  We already get this for free in the heavy use paths, truncate and
      file write all hold the i_mutex, just needed to add it to page_mkwrite and
      various ioctl/balance things.  With this patch my space leak scripts no longer
      scream bloody murder.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      660d3f6c
    • Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6
      Committed by Josef Bacik
      Now that we're properly keeping track of delayed inode space we've been getting
      a lot of warnings out of btrfs_dirty_inode() when running xfstest 83.  This is
      because a bunch of people call mark_inode_dirty, which is void so we can't
      return ENOSPC.  This needs to be fixed in a few areas:
      
      1) file_update_time - this updates the mtime and such when writing to a file,
      which will call mark_inode_dirty.  So copy file_update_time into btrfs so we can
      call btrfs_dirty_inode directly and return an error if we get one appropriately.
      
      2) fix symlinks to use btrfs_setattr for ->setattr.  For some reason we weren't
      setting ->setattr for symlinks, even though we should have been.  This catches
      one of the cases where we were getting errors in mark_inode_dirty.
      
      3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
      instead of mark_inode_dirty.  This lets us return errors properly for truncate
      and chown/anything related to setattr.
      
      4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
      print an error if we have one.  The only remaining user we can't control for
      this is touch_atime(), but we don't really want to keep people from walking
      down the tree if we don't have space to save the atime update, so just complain
      but don't worry about it.
      
      With this patch xfstests 83 complains a handful of times instead of hundreds of
      times.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      22c44fe6
    • Btrfs: fix num_workers_starting bug and other bugs in async thread · 0dc3b84a
      Committed by Josef Bacik
      Al pointed out we have some random problems with the way we account for
      num_workers_starting in the async thread stuff.  First of all we need to make
      sure to decrement num_workers_starting if we fail to start the worker, so make
      __btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
      doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
      failed to create a worker.  Also check_pending_worker_creates needs to call
      __btrfs_start_work in its work function since it already increments
      num_workers_starting.
      
      People only start one worker at a time, so get rid of the num_workers argument
      everywhere, and make btrfs_queue_worker a void since it will always succeed.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      0dc3b84a
  4. 15 Dec, 2011 7 commits
    • BTRFS: Establish i_ops before calling d_instantiate · ad19db71
      Committed by Casey Schaufler
      The Smack LSM hook for security_d_instantiate checks
      the inode's i_op->getxattr value to determine if the
      containing filesystem supports extended attributes.
      The BTRFS filesystem sets the inode's i_op value only
      after it has instantiated the inode. This results in
      Smack incorrectly giving new BTRFS inodes attributes
      from the filesystem defaults on the assumption that
      values can't be stored on the filesystem. This patch
      moves the assignment of inode operation vectors ahead
      of the calls to d_instantiate, letting Smack know that
      the filesystem supports extended attributes. There
      should be no impact on the performance or behavior of
      BTRFS.
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      ad19db71
    • Btrfs: add a cond_resched() into the worker loop · 8f3b65a3
      Committed by Chris Mason
      If we have a constant stream of end_io completions or crc work,
      we can hit softlockup messages from the async helper threads.  This
      adds a cond_resched() into the loop to avoid them.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      8f3b65a3
    • Btrfs: fix ctime update of on-disk inode · 306424cc
      Committed by Li Zefan
      To reproduce the bug:
      
          # touch /mnt/tmp
          # stat /mnt/tmp | grep Change
          Change: 2011-12-09 09:32:23.412105981 +0800
          # chattr +i /mnt/tmp
          # stat /mnt/tmp | grep Change
          Change: 2011-12-09 09:32:43.198105295 +0800
          # umount /mnt
          # mount /dev/loop1 /mnt
          # stat /mnt/tmp | grep Change
          Change: 2011-12-09 09:32:23.412105981 +0800
      
      We should update ctime of in-memory inode before calling
      btrfs_update_inode().
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      306424cc
    • btrfs: keep orphans for subvolume deletion · f8e9e0b0
      Committed by Arne Jansen
      Since we have the free space caches, btrfs_orphan_cleanup also runs for
      the tree_root. Unfortunately this also cleans up the orphans used to mark
      subvol deletions in progress.
      
      Currently if a subvol deletion gets interrupted twice by umount/mount, the
      deletion will not be continued and the space is permanently lost, though it
      would be possible to write a tool to recover those lost subvol deletions.
      This patch checks if the orphan belongs to a subvol (dead root) and skips
      the deletion.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      f8e9e0b0
    • Btrfs: fix inaccurate available space on raid0 profile · 39fb26c3
      Committed by Miao Xie
      When we use raid0 as the data profile, the df command may show a very
      inaccurate value for the available space, which may be much less than the
      real one, and this can confuse users.  Fix it by changing the calculation
      of the available space so that it more closely resembles a fake chunk
      allocation.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      39fb26c3
    • Btrfs: fix wrong disk space information of the files · 3642320e
      Committed by Miao Xie
      Btrfsck reports errors after test case 83 of xfstests is run.  The error
      number is 400, which means the used disk space of the file is wrong.
      
      The cause of this bug:
      File truncation may fail when the file system is short of space, leaving
      behind some file extents whose offsets are beyond the end of the file.
      When we later expand those files, we drop those file extents and put in
      dummy file extents, and then we should update the i-node.  But btrfs
      forgets to do it.
      
      This patch adds the forgotten i-node update.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      3642320e
    • Btrfs: fix wrong i_size when truncating a file to a larger size · f4a2f4c5
      Committed by Miao Xie
      Btrfsck reports error 100 after test case 83 of xfstests is run, which
      means the i_size of the file is wrong.
      
      The cause of this bug:
      Btrfs increased the i_size of the file at the beginning, but then failed to
      expand the file, and also failed to restore the old i_size because there was
      not enough space in the file system, so we were left with a wrong i_size.
      
      This patch fixes the bug by updating the i_size only after the file
      expansion succeeds and we have enough space to update the i-node.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      f4a2f4c5
  5. 14 Dec, 2011 11 commits
    • fs/ncpfs: fix error paths and goto statements in ncp_fill_super() · 759c361e
      Committed by Djalal Harouni
      The label 'out_bdi' should be followed by bdi_destroy() instead of
      fput() which should be after the 'out_fput' label.
      
      If bdi_setup_and_register() fails then jump to the 'out_fput' label
      instead of the 'out_bdi' one.
      
      If fget(data.info_fd) fails then jump to the previously fixed 'out_bdi'
      label to call bdi_destroy() otherwise the bdi object will not be
      destroyed.
      
      Compile tested only.
      Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      759c361e
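The general pattern behind this kind of fix, where each error label releases exactly the resources acquired before the failing step, can be sketched as below. The resources A and B and the function `setup()` are hypothetical; they do not model the actual file and bdi objects in ncp_fill_super().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pair of resources acquired in order: A first, then B. */
struct res_state {
    bool a_held;
    bool b_held;
};

/*
 * Acquire A then B. On failure, jump to the label that releases only
 * what has been acquired so far; labels unwind in reverse order.
 */
static int setup(struct res_state *s, bool fail_b, bool fail_late)
{
    int err;

    s->a_held = true;            /* acquire A */

    if (fail_b) {
        err = -ENOMEM;
        goto out_a;              /* B was never acquired */
    }
    s->b_held = true;            /* acquire B */

    if (fail_late) {
        err = -EINVAL;
        goto out_b;              /* both must be released */
    }
    return 0;

out_b:
    s->b_held = false;           /* release B */
out_a:
    s->a_held = false;           /* release A */
    return err;
}
```

Jumping to the wrong label in such a function either leaks a resource or releases one that was never acquired, which is the class of bug the commit message describes.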
    • ext4: handle EOF correctly in ext4_bio_write_page() · 5a0dc736
      Committed by Yongqiang Yang
      We need to zero out the part of a page that lies beyond EOF before setting
      uptodate; otherwise, mapread or write will see non-zero data beyond EOF.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      5a0dc736
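Zeroing the tail of a page that extends past end-of-file can be illustrated with a small user-space helper. This is a sketch under assumptions: `zero_past_eof()` and its parameters are hypothetical names, and it operates on a plain byte buffer rather than on kernel page structures.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero the bytes of one in-memory page that lie beyond end-of-file.
 * page_offset is the file offset of the first byte of the page.
 */
static void zero_past_eof(unsigned char *page, size_t page_size,
                          size_t page_offset, size_t file_size)
{
    size_t valid;

    if (file_size >= page_offset + page_size)
        return;                  /* page lies entirely before EOF */

    /* Number of bytes in this page that actually belong to the file. */
    valid = file_size > page_offset ? file_size - page_offset : 0;
    memset(page + valid, 0, page_size - valid);
}
```

For a 19-byte file and an 8-byte page starting at offset 16, the first three bytes are kept and the remaining five are cleared.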
    • ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized · 5b5ffa49
      Committed by Yongqiang Yang
      If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater
      than ee_block + ee_len.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      5b5ffa49
    • ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() · 093e6e36
      Committed by Yongqiang Yang
      If a page has been read into memory and never been written, it has no
      buffers, but we should still handle the page in truncate or punch hole.
      
      The VFS code for write operations already handles holes correctly, so this
      patch removes the hole-handling code from the write operations.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      093e6e36
    • ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize · 13a79a47
      Committed by Yongqiang Yang
      If there is an unwritten but clean buffer in a page and there is a
      dirty buffer after the buffer, then mpage_submit_io does not write the
      dirty buffer out.  As a result, da_writepages loops forever.
      
      This patch fixes the problem by checking the dirty flag.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      13a79a47
    • ext4: avoid hangs in ext4_da_should_update_i_disksize() · ea51d132
      Committed by Andrea Arcangeli
      If the pte mapping in generic_perform_write() is unmapped between
      iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
      "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
      it with delayed allocations enabled. This skips the i_disksize
      enlargement logic if copied is zero and no new data was appended to
      the inode.
      
       gdb> bt
       #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
       08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
       ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
       #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
       os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
       #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
       xffff88001e26be40) at mm/filemap.c:2600
       #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
       zed out>, pos=<value optimized out>) at mm/filemap.c:2632
       #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
       t fs/ext4/file.c:136
       #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
       ppos=0xffff88001e26bf48) at fs/read_write.c:406
       #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
       000, pos=0xffff88001e26bf48) at fs/read_write.c:435
       #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
       4000) at fs/read_write.c:487
       #10 <signal handler called>
       #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
       #12 0x0000000000000000 in ?? ()
       gdb> print offset
       $22 = 0xffffffffffffffff
       gdb> print idx
       $23 = 0xffffffff
       gdb> print inode->i_blkbits
       $24 = 0xc
       gdb> up
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       2512                    if (ext4_da_should_update_i_disksize(page, end)) {
       gdb> print start
       $25 = 0x0
       gdb> print end
       $26 = 0xffffffffffffffff
       gdb> print pos
       $27 = 0x108000
       gdb> print new_i_size
       $28 = 0x108000
       gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
       $29 = 0xd9000
       gdb> down
       2467            for (i = 0; i < idx; i++)
       gdb> print i
       $30 = 0xd44acbee
      
      This is 100% reproducible with some autonuma development code tuned in
      a very aggressive manner (not normal way even for knumad) which does
      "exotic" changes to the ptes. It wouldn't normally trigger but I don't
      see why it can't happen normally if the page is added to swap cache in
      between the two faults leading to "copied" being zero (which then
      hangs in ext4). So it should be fixed. Especially possible with lumpy
      reclaim (albeit disabled if compaction is enabled) as that would
      ignore the young bits in the ptes.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      ea51d132
    • ceph: add missing spin_unlock at ceph_mdsc_build_path() · 9d5a09e6
      Committed by Yehuda Sadeh
      One of the paths was missing a spin_unlock.
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      9d5a09e6
    • configfs: register_filesystem() called too early · 7c6455e3
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7c6455e3
    • fuse: register_filesystem() called too early · 988f0325
      Committed by Al Viro
      same story as with ubifs
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      988f0325
    • ubifs: too early register_filesystem() · 5cc361e3
      Committed by Al Viro
      doing that before you are ready to handle mount() is a Bad Idea(tm)...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5cc361e3
    • ceph: fix SEEK_CUR, SEEK_SET regression · 6a82c47a
      Committed by Sage Weil
      Commit 06222e49 got the if wrong so that
      it always evaluates as true.  This is semantically harmless, but makes
      SEEK_CUR and SEEK_SET needlessly query the server.
      
      Rewrite the if to explicitly enumerate the cases where we DO need a valid
      i_size, to make this code less fragile.
      Reported-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: Sage Weil <sage@newdream.net>
      6a82c47a
  6. 13 Dec, 2011 5 commits
    • fuse: llseek fix race · 73104b6e
      Committed by Miklos Szeredi
      Fix race between lseek(fd, 0, SEEK_CUR) and read/write.  This was fixed in
      generic code by commit 5b6f1eb9 (vfs: lseek(fd, 0, SEEK_CUR) race condition).
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      73104b6e
    • fuse: fix llseek bug · b48c6af2
      Committed by Roel Kluin
      The test in fuse_file_llseek() "not SEEK_CUR or not SEEK_SET" always evaluates
      to true.
      
      This was introduced in 3.1 by commit 06222e49 (fs: handle SEEK_HOLE/SEEK_DATA
      properly in all fs's that define their own llseek) and changed the behavior of
      SEEK_CUR and SEEK_SET to always retrieve the file attributes.  This is a
      performance regression.
      
      Fix the test so that it makes sense.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@vger.kernel.org
      CC: Josef Bacik <josef@redhat.com>
      CC: Al Viro <viro@zeniv.linux.org.uk>
      b48c6af2
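Why a "not SEEK_CUR or not SEEK_SET" test is always true can be shown with two small predicates. The function names are hypothetical, and the corrected form below is one plausible reading of the intent; the exact condition used in the real patch is not quoted in the commit message.

```c
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Buggy form: no value can equal both constants at once, so at least
 * one of the two inequalities always holds and the result is always true.
 */
static bool needs_attr_refresh_buggy(int whence)
{
    return whence != SEEK_CUR || whence != SEEK_SET;
}

/*
 * One possible corrected form: refresh attributes only when whence is
 * neither SEEK_CUR nor SEEK_SET.
 */
static bool needs_attr_refresh_fixed(int whence)
{
    return whence != SEEK_CUR && whence != SEEK_SET;
}
```

With the buggy predicate every seek triggers an attribute refresh, which matches the performance regression described above; with the corrected one only the remaining whence values such as SEEK_END do.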
    • fuse: fix fuse_retrieve · 48706d0a
      Committed by Miklos Szeredi
      Fix two bugs in fuse_retrieve():
      
       - retrieving more than one page would yield repeated instances of the
         first page
      
       - if more than FUSE_MAX_PAGES_PER_REQ pages were requested then the
         request page array would overflow
      
      fuse_retrieve() was added in 2.6.36 and these bugs had been there since the
      beginning.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@vger.kernel.org
      48706d0a
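Both bug classes listed above (not advancing the page index, and writing past a fixed-size array) can be avoided with a loop like the following sketch. `MAX_PAGES` is a hypothetical stand-in for FUSE_MAX_PAGES_PER_REQ, and `collect_pages()` records page indices instead of real page pointers.

```c
#include <stddef.h>

/* Hypothetical stand-in for FUSE_MAX_PAGES_PER_REQ. */
#define MAX_PAGES 32

/*
 * Fill out[] with consecutive page indices starting at first_index.
 * The count is clamped to the array capacity, and the index advances
 * on every iteration so each slot refers to a different page.
 * Returns the number of entries written.
 */
static size_t collect_pages(size_t first_index, size_t requested,
                            size_t out[MAX_PAGES])
{
    size_t n = requested < MAX_PAGES ? requested : MAX_PAGES;

    for (size_t i = 0; i < n; i++)
        out[i] = first_index + i;
    return n;
}
```

A request for 100 pages starting at index 5 fills only 32 slots, holding indices 5 through 36.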
    • ext4: display the correct mount option in /proc/mounts for [no]init_itable · fc6cb1cd
      Committed by Theodore Ts'o
      /proc/mounts was showing the mount option [no]init_inode_table when
      the correct mount option that will be accepted by parse_options() is
      [no]init_itable.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      fc6cb1cd
    • ext4: Fix crash due to getting bogus eh_depth value on big-endian systems · b4611abf
      Committed by Paul Mackerras
      Commit 1939dd84 ("ext4: cleanup ext4_ext_grow_indepth code") added a
      reference to ext4_extent_header.eh_depth, but forgot to pass the value
      read through le16_to_cpu.  The result is a crash on big-endian
      machines, such as this crash on a POWER7 server:
      
      attempt to access beyond end of device
      sda8: rw=0, want=776392648163376, limit=168558560
      Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb
      Faulting instruction address: 0xc0000000001f5f38
      cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0]
          pc: c0000000001f5f38: .__brelse+0x18/0x60
          lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80
          sp: c000001bd1aaef70
         msr: 9000000000009032
         dar: 6b6b6b6b6b6b6bcb
       dsisr: 40000000
        current = 0xc000001bd15b8010
        paca    = 0xc00000000ffe4600
          pid   = 19911, comm = flush-8:0
      enter ? for help
      [c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80
      [c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0
      [c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0
      [c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710
      [c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310
      [c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490
      [c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430
      [c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670
      [c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90
      [c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530
      [c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300
      [c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140
      [c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0
      [c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300
      [c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340
      [c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0
      [c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70
      
      This is due to getting ext_depth(inode) == 0x101 and therefore running
      off the end of the path array in ext4_ext_drop_refs into following
      unallocated structures.
      
      This fixes it by adding the necessary le16_to_cpu.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b4611abf
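The effect of skipping the little-endian conversion can be illustrated portably by decoding the same two bytes both ways. The helpers `read_le16()` and `read_be16()` are hypothetical user-space functions, not the kernel's le16_to_cpu; `read_be16()` stands in for what a raw load yields on a big-endian host.

```c
#include <stdint.h>

/* Decode a little-endian 16-bit field from raw bytes; correct on any host. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* What a big-endian host sees if it loads the same bytes without conversion. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For an on-disk depth of 1, stored as the bytes 0x01 0x00, the converted read gives 1 while the unconverted big-endian read gives 0x0100, a value far outside any plausible tree depth.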
  7. 12 Dec, 2011 1 commit
  8. 10 Dec, 2011 1 commit
    • Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror · 5dbc8fca
      Committed by Chris Mason
      btrfs_end_bio checks the number of errors on a bio against the max
      number of errors allowed before sending any EIOs up to the higher
      levels.
      
      If we got enough copies of the bio done for a given raid level, it is
      supposed to clear the bio error flag and return success.
      
      We have pointers to the original bio sent down by the higher layers and
      pointers to any cloned bios we made for raid purposes.  If the original
      bio happens to be the one that got an io error, but not the last one to
      finish, it might not have the BIO_UPTODATE bit set.
      
      Then, when the last bio does finish, we'll call bio_end_io on the
      original bio.  It won't have the uptodate bit set and we'll end up
      sending EIO to the higher layers.
      
      We already had a check for this, it just was conditional on getting the
      IO error on the very last bio.  Make the check unconditional so we eat
      the EIOs properly.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      5dbc8fca
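The rule described above, that the overall result depends on the error count and not on which copy finished last, can be sketched as follows. `struct multi_write` and `mirror_write_done()` are hypothetical names; the sketch ignores bio flags entirely and only models the counting.

```c
#include <errno.h>

/* Hypothetical bookkeeping for one logical write fanned out to mirrors. */
struct multi_write {
    int pending;      /* mirror writes still in flight */
    int error_count;  /* mirror writes that failed so far */
    int max_errors;   /* failures the raid level can absorb */
};

/*
 * Record the completion of one mirror write.
 * Returns 1 while other copies are still pending, 0 if the logical
 * write succeeded overall, or -EIO if too many copies failed.
 * The decision is made unconditionally on the final completion,
 * regardless of which copy carried the error.
 */
static int mirror_write_done(struct multi_write *m, int err)
{
    if (err)
        m->error_count++;
    if (--m->pending > 0)
        return 1;
    return m->error_count > m->max_errors ? -EIO : 0;
}
```

With two mirrors and `max_errors` of 1, a failure on the first copy followed by success on the second still reports overall success.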
  9. 09 Dec, 2011 3 commits
    • procfs: do not overflow get_{idle,iowait}_time for nohz · 2a95ea6c
      Committed by Michal Hocko
      Since commit a25cac51 ("proc: Consider NO_HZ when printing idle and
      iowait times") we are reporting idle/io_wait time also while a CPU is
      tickless.  We rely on get_{idle,iowait}_time functions to retrieve
      proper data.
      
      These functions, however, use usecs_to_cputime to translate a time in
      microseconds to cputime64_t.  This is just an alias for usecs_to_jiffies,
      which reduces the data type from u64 to unsigned int and also checks
      whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
      and returns MAX_JIFFY_OFFSET in that case.
      
      When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
      it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
      >3000s! until we overflow unsigned int.  Just for reference
      CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
      CONFIG_HZ_1000 ~2s.
      
      This results in a bug when people saw [h]top going mad reporting 100%
      CPU usage even though there was basically no CPU load.  The reason was
      simply that /proc/stat stopped reporting idle/io_wait changes (and
      reported MAX_JIFFY_OFFSET) and so the only change happening was for user
      system time.
      
      Let's use nsecs_to_jiffies64 instead, which doesn't reduce the precision
      to a 32b type and is much more appropriate for cumulative time values
      (unlike usecs_to_jiffies, which is intended for timeout calculations).
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a95ea6c
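The loss of the upper 32 bits when a cumulative 64-bit microsecond value is pushed through a 32-bit conversion can be demonstrated in user space. This is a simplified sketch: `TICKS_PER_SEC` and both function names are hypothetical, and the MAX_JIFFY_OFFSET clamping described above is deliberately not modelled.

```c
#include <stdint.h>

/* Hypothetical tick rate, chosen to echo the CONFIG_HZ_300 case. */
#define TICKS_PER_SEC UINT64_C(300)
#define USEC_PER_SEC  UINT64_C(1000000)

/* Narrow conversion: the 64-bit input is truncated to 32 bits first. */
static uint64_t usecs_to_ticks_narrow(uint64_t usecs)
{
    uint32_t narrowed = (uint32_t)usecs;   /* upper 32 bits are lost */

    return ((uint64_t)narrowed * TICKS_PER_SEC) / USEC_PER_SEC;
}

/* Wide conversion: every step stays in 64-bit arithmetic. */
static uint64_t usecs_to_ticks_wide(uint64_t usecs)
{
    return (usecs / USEC_PER_SEC) * TICKS_PER_SEC +
           ((usecs % USEC_PER_SEC) * TICKS_PER_SEC) / USEC_PER_SEC;
}
```

Both functions agree for small inputs, but for 5000 seconds expressed in microseconds (a value above 2^32) the narrow version returns far fewer ticks than the wide one, so a cumulative counter built on it stops tracking real time.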
    • fs/proc/meminfo.c: fix compilation error · b53fc7c2
      Committed by Claudio Scordino
      Fix the error message "directives may not be used inside a macro argument"
      which appears when the kernel is compiled for the cris architecture.
      Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b53fc7c2
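This class of error arises when a preprocessor directive such as #ifdef appears between the parentheses of a call that happens to be a function-like macro. One common way to avoid it, sketched below, is to hoist the conditional so that each branch contains a complete call. `HAVE_EXTRA_FIELD` and `format_report()` are hypothetical, and this is not claimed to be the change made in meminfo.c.

```c
#include <stdio.h>

/* Hypothetical configuration switch. */
#define HAVE_EXTRA_FIELD 1

/*
 * Format a short report. The #ifdef wraps whole statements instead of
 * sitting inside the argument list, so it stays valid even if the
 * formatting call is implemented as a macro on some platform.
 */
static int format_report(char *buf, size_t len, long total, long extra)
{
#ifdef HAVE_EXTRA_FIELD
    return snprintf(buf, len, "Total: %ld\nExtra: %ld\n", total, extra);
#else
    (void)extra;   /* unused when the extra field is compiled out */
    return snprintf(buf, len, "Total: %ld\n", total);
#endif
}
```

The cost is some duplication of the format string, which is usually acceptable for a short call.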
    • procfs: fix a vfsmount longterm reference leak · 905ad269
      Committed by Al Viro
      kern_mount() doesn't pair with plain mntput()...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      905ad269