1. 04 3月, 2014 2 次提交
  2. 03 3月, 2014 9 次提交
  3. 25 2月, 2014 4 次提交
    • L
      sysfs: fix namespace refcnt leak · fed95bab
      Li Zefan 提交于
      As mount() and kill_sb() is not a one-to-one match, we shoudn't get
      ns refcnt unconditionally in sysfs_mount(), and instead we should
      get the refcnt only when kernfs_mount() allocated a new superblock.
      
      v2:
      - Changed the name of the new argument, suggested by Tejun.
      - Made the argument optional, suggested by Tejun.
      
      v3:
      - Make the new argument as second-to-last arg, suggested by Tejun.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NTejun Heo <tj@kernel.org>
       ---
       fs/kernfs/mount.c      | 8 +++++++-
       fs/sysfs/mount.c       | 5 +++--
       include/linux/kernfs.h | 9 +++++----
       3 files changed, 15 insertions(+), 7 deletions(-)
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fed95bab
    • J
      fsnotify: Allocate overflow events with proper type · ff57cd58
      Jan Kara 提交于
      Commit 7053aee2 "fsnotify: do not share events between notification
      groups" used overflow event statically allocated in a group with the
      size of the generic notification event. This causes problems because
      some code looks at type specific parts of event structure and gets
      confused by a random data it sees there and causes crashes.
      
      Fix the problem by allocating overflow event with type corresponding to
      the group type so code cannot get confused.
      Signed-off-by: NJan Kara <jack@suse.cz>
      ff57cd58
    • J
      fanotify: Handle overflow in case of permission events · 482ef06c
      Jan Kara 提交于
      If the event queue overflows when we are handling permission event, we
      will never get response from userspace. So we must avoid waiting for it.
      Change fsnotify_add_notify_event() to return whether overflow has
      happened so that we can detect it in fanotify_handle_event() and act
      accordingly.
      Signed-off-by: NJan Kara <jack@suse.cz>
      482ef06c
    • J
      fsnotify: Fix detection whether overflow event is queued · 2513190a
      Jan Kara 提交于
      Currently we didn't initialize event's list head when we removed it from
      the event list. Thus a detection whether overflow event is already
      queued wasn't working. Fix it by always initializing the list head when
      deleting event from a list.
      Signed-off-by: NJan Kara <jack@suse.cz>
      2513190a
  4. 22 2月, 2014 1 次提交
    • J
      Revert "writeback: do not sync data dirtied after sync start" · 0dc83bd3
      Jan Kara 提交于
      This reverts commit c4a391b5. Dave
      Chinner <david@fromorbit.com> has reported the commit may cause some
      inodes to be left out from sync(2). This is because we can call
      redirty_tail() for some inode (which sets i_dirtied_when to current time)
      after sync(2) has started or similarly requeue_inode() can set
      i_dirtied_when to current time if writeback had to skip some pages. The
      real problem is in the functions clobbering i_dirtied_when but fixing
      that isn't trivial so revert is a safer choice for now.
      
      CC: stable@vger.kernel.org # >= 3.13
      Signed-off-by: NJan Kara <jack@suse.cz>
      0dc83bd3
  5. 21 2月, 2014 2 次提交
    • J
      quota: Fix race between dqput() and dquot_scan_active() · 1362f4ea
      Jan Kara 提交于
      Currently last dqput() can race with dquot_scan_active() causing it to
      call callback for an already deactivated dquot. The race is as follows:
      
      CPU1					CPU2
        dqput()
          spin_lock(&dq_list_lock);
          if (atomic_read(&dquot->dq_count) > 1) {
           - not taken
          if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
            spin_unlock(&dq_list_lock);
            ->release_dquot(dquot);
              if (atomic_read(&dquot->dq_count) > 1)
               - not taken
      					  dquot_scan_active()
      					    spin_lock(&dq_list_lock);
      					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
      					     - not taken
      					    atomic_inc(&dquot->dq_count);
      					    spin_unlock(&dq_list_lock);
              - proceeds to release dquot
      					    ret = fn(dquot, priv);
      					     - called for inactive dquot
      
      Fix the problem by making sure possible ->release_dquot() is finished by
      the time we call the callback and new calls to it will notice reference
      dquot_scan_active() has taken and bail out.
      
      CC: stable@vger.kernel.org # >= 2.6.29
      Signed-off-by: NJan Kara <jack@suse.cz>
      1362f4ea
    • J
      udf: Fix data corruption on file type conversion · 09ebb17a
      Jan Kara 提交于
      UDF has two types of files - files with data stored in inode (ICB in
      UDF terminology) and files with data stored in external data blocks. We
      convert file from in-inode format to external format in
      udf_file_aio_write() when we find out data won't fit into inode any
      longer. However the following race between two O_APPEND writes can happen:
      
      CPU1					CPU2
      udf_file_aio_write()			udf_file_aio_write()
        down_write(&iinfo->i_data_sem);
        checks that i_size + count1 fits within inode
          => no need to convert
        up_write(&iinfo->i_data_sem);
      					  down_write(&iinfo->i_data_sem);
      					  checks that i_size + count2 fits
      					    within inode => no need to convert
      					  up_write(&iinfo->i_data_sem);
        generic_file_aio_write()
          - extends file by count1 bytes
      					  generic_file_aio_write()
      					    - extends file by count2 bytes
      
      Clearly if count1 + count2 doesn't fit into the inode, we overwrite
      kernel buffers beyond inode, possibly corrupting the filesystem as well.
      
      Fix the problem by acquiring i_mutex before checking whether write fits
      into the inode and using __generic_file_aio_write() afterwards which
      puts check and write into one critical section.
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NJan Kara <jack@suse.cz>
      09ebb17a
  6. 19 2月, 2014 4 次提交
  7. 18 2月, 2014 13 次提交
  8. 17 2月, 2014 1 次提交
  9. 16 2月, 2014 4 次提交
    • T
      ext4: fix online resize with a non-standard blocks per group setting · 3d2660d0
      Theodore Ts'o 提交于
      The set_flexbg_block_bitmap() function assumed that the number of
      blocks in a blockgroup was sb->blocksize * 8, which is normally true,
      but not always!  Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
      bitmap corruption after:
      
      mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
      mount -t ext4 /dev/vdd /vdd
      resize2fs /dev/vdd 8G
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NJon Bernard <jbernard@tuxion.com>
      Cc: stable@vger.kernel.org
      3d2660d0
    • T
      ext4: fix online resize with very large inode tables · b93c9535
      Theodore Ts'o 提交于
      If a file system has a large number of inodes per block group, all of
      the metadata blocks in a flex_bg may be larger than what can fit in a
      single block group.  Unfortunately, ext4_alloc_group_tables() in
      resize.c was never tested to see if it would handle this case
      correctly, and there were a large number of bugs which caused the
      following sequence to result in a BUG_ON:
      
      kernel bug at fs/ext4/resize.c:409!
         ...
      call trace:
       [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
       [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
       [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
       [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
       [<ffffffff811b9df2>] ? final_putname+0x22/0x50
       [<ffffffff811c1371>] sys_ioctl+0x81/0xa0
       [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
      code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
      rip  [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180
      
      
      This can be reproduced with the following command sequence:
      
         mke2fs -t ext4 -i 4096 /dev/vdd 1G
         mount -t ext4 /dev/vdd /vdd
         resize2fs /dev/vdd 8G
      
      To fix this, we need to make sure the right thing happens when a block
      group's inode table straddles two block groups, which means the
      following bugs had to be fixed:
      
      1) Not clearing the BLOCK_UNINIT flag in the second block group in
         ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.
      
      2) Incorrectly determining how many block groups contained contiguous
         free blocks in ext4_alloc_group_tables().
      
      3) Incorrectly setting the start of the next block range to be marked
         in use after a discontinuity in setup_new_flex_group_blocks().
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      b93c9535
    • F
      Btrfs: use right clone root offset for compressed extents · 93de4ba8
      Filipe David Borba Manana 提交于
      For non compressed extents, iterate_extent_inodes() gives us offsets
      that take into account the data offset from the file extent items, while
      for compressed extents it doesn't. Therefore we have to adjust them before
      placing them in a send clone instruction. Not doing this adjustment leads to
      the receiving end requesting for a wrong a file range to the clone ioctl,
      which results in different file content from the one in the original send
      root.
      
      Issue reproducible with the following excerpt from the test I made for
      xfstests:
      
        _scratch_mkfs
        _scratch_mount "-o compress-force=lzo"
      
        $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
        $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo
      
        $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
      
        $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
        $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
        $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
        $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo
      
        # will be used for incremental send to be able to issue clone operations
        $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap
      
        $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
      
        $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
        $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
            -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
        $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
            -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
      
        $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
        $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
        $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
            -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
      
        _scratch_unmount
        _scratch_mkfs
        _scratch_mount
      
        $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
        $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
      
        $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
        $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full
      
        $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
        $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      93de4ba8
    • A
      btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105 · f085381e
      Anand Jain 提交于
      bdev is null when disk has disappeared and mounted with
      the degrade option
      
      stack trace
      ---------
      btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
      open_ctree+0x15f3/0x1fe0 [btrfs]
      btrfs_mount+0x5db/0x790 [btrfs]
      ? alloc_pages_current+0xa4/0x160
      mount_fs+0x34/0x1b0
      vfs_kern_mount+0x62/0xf0
      do_mount+0x22e/0xa80
      ? __get_free_pages+0x9/0x40
      ? copy_mount_options+0x31/0x170
      SyS_mount+0x7e/0xc0
      system_call_fastpath+0x16/0x1b
      ---------
      
      reproducer:
      -------
      mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
      (detach a disk)
      devmgt detach /dev/sdc [1]
      mount -o degrade /dev/sdd /btrfs
      -------
      
      [1] github.com/anajain/devmgt.git
      Signed-off-by: NAnand Jain <Anand.Jain@oracle.com>
      Tested-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f085381e