1. 17 2月, 2014 1 次提交
  2. 16 2月, 2014 2 次提交
    • T
      ext4: fix online resize with a non-standard blocks per group setting · 3d2660d0
      Theodore Ts'o 提交于
      The set_flexbg_block_bitmap() function assumed that the number of
      blocks in a blockgroup was sb->blocksize * 8, which is normally true,
      but not always!  Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
      bitmap corruption after:
      
      mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
      mount -t ext4 /dev/vdd /vdd
      resize2fs /dev/vdd 8G
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NJon Bernard <jbernard@tuxion.com>
      Cc: stable@vger.kernel.org
      3d2660d0
    • T
      ext4: fix online resize with very large inode tables · b93c9535
      Theodore Ts'o 提交于
      If a file system has a large number of inodes per block group, all of
      the metadata blocks in a flex_bg may be larger than what can fit in a
      single block group.  Unfortunately, ext4_alloc_group_tables() in
      resize.c was never tested to see if it would handle this case
      correctly, and there were a large number of bugs which caused the
      following sequence to result in a BUG_ON:
      
      kernel bug at fs/ext4/resize.c:409!
         ...
      call trace:
       [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
       [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
       [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
       [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
       [<ffffffff811b9df2>] ? final_putname+0x22/0x50
       [<ffffffff811c1371>] sys_ioctl+0x81/0xa0
       [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
      code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
      rip  [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180
      
      
      This can be reproduced with the following command sequence:
      
         mke2fs -t ext4 -i 4096 /dev/vdd 1G
         mount -t ext4 /dev/vdd /vdd
         resize2fs /dev/vdd 8G
      
      To fix this, we need to make sure the right thing happens when a block
      group's inode table straddles two block groups, which means the
      following bugs had to be fixed:
      
      1) Not clearing the BLOCK_UNINIT flag in the second block group in
         ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.
      
      2) Incorrectly determining how many block groups contained contiguous
         free blocks in ext4_alloc_group_tables().
      
      3) Incorrectly setting the start of the next block range to be marked
         in use after a discontinuity in setup_new_flex_group_blocks().
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      b93c9535
  3. 13 2月, 2014 2 次提交
  4. 12 2月, 2014 1 次提交
    • E
      ext4: fix xfstest generic/299 block validity failures · 15cc1767
      Eric Whitney 提交于
      Commit a115f749 (ext4: remove wait for unwritten extent conversion from
      ext4_truncate) exposed a bug in ext4_ext_handle_uninitialized_extents().
      It can be triggered by xfstest generic/299 when run on a test file
      system created without a journal.  This test continuously fallocates and
      truncates files to which random dio/aio writes are simultaneously
      performed by a separate process.  The test completes successfully, but
      if the test filesystem is mounted with the block_validity option, a
      warning message stating that a logical block has been mapped to an
      illegal physical block is posted in the kernel log.
      
      The bug occurs when an extent is being converted to the written state
      by ext4_end_io_dio() and ext4_ext_handle_uninitialized_extents()
      discovers a mapping for an existing uninitialized extent. Although it
      sets EXT4_MAP_MAPPED in map->m_flags, it fails to set map->m_pblk to
      the discovered physical block number.  Because map->m_pblk is not
      otherwise initialized or set by this function or its callers, its
      uninitialized value is returned to ext4_map_blocks(), where it is
      stored as a bogus mapping in the extent status tree.
      
      Since map->m_pblk can accidentally contain illegal values that are
      larger than the physical size of the file system,  calls to
      check_block_validity() in ext4_map_blocks() that are enabled if the
      block_validity mount option is used can fail, resulting in the logged
      warning message.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org  # 3.11+
      15cc1767
  5. 10 2月, 2014 1 次提交
    • A
      fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Al Viro 提交于
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
      when sync_page_range() had been introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later correct variant had been killed off when
      everything switched to use of generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations useless for the majority of
      calls.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d311d79d
  6. 09 2月, 2014 6 次提交
    • F
      Btrfs: fix data corruption when reading/updating compressed extents · a2aa75e1
      Filipe David Borba Manana 提交于
      When using a mix of compressed file extents and prealloc extents, it
      is possible to fill a page of a file with random, garbage data from
      some unrelated previous use of the page, instead of a sequence of zeroes.
      
      A simple sequence of steps to get into such case, taken from the test
      case I made for xfstests, is:
      
         _scratch_mkfs
         _scratch_mount "-o compress-force=lzo"
         $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
      This results in the following file items in the fs tree:
      
         item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
             inode generation 6 transid 6 size 542872 block group 0 mode 100600
         item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
             inode ref index 2 namelen 6 name: foobar
         item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
             extent data disk byte 0 nr 0 gen 6
             extent data offset 0 nr 24576 ram 266240
             extent compression 0
         item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
             prealloc data disk byte 12849152 nr 241664 gen 6
             prealloc data offset 0 nr 241664
         item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
             extent data disk byte 12845056 nr 4096 gen 6
             extent data offset 0 nr 20480 ram 20480
             extent compression 2
         item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
             prealloc data disk byte 13090816 nr 405504 gen 6
             prealloc data offset 0 nr 258048
      
      The on disk extent at offset 266240 (which corresponds to 1 single disk block),
      contains 5 compressed chunks of file data. Each of the first 4 compress 4096
      bytes of file data, while the last one only compresses 3024 bytes of file data.
      Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
      1072 bytes) should always return zeroes (our next extent is a prealloc one).
      
      The solution here is the compression code path to zero the remaining (untouched)
      bytes of the last page it uncompressed data into, as the information about how
      much space the file data consumes in the last page is not known in the upper layer
      fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
      the remainder of the page but only if it corresponds to the last page of the inode
      and if the inode's size is not a multiple of the page size.
      
      This would cause not only returning random data on reads, but also permanently
      storing random data when updating parts of the region that should be zeroed.
      For the example above, it means updating a single byte in the region [285648 ; 286720[
      would store that byte correctly but also store random data on disk.
      
      A test case for xfstests follows soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a2aa75e1
    • J
      Btrfs: don't loop forever if we can't run because of the tree mod log · 27a377db
      Josef Bacik 提交于
      A user reported a 100% cpu hang with my new delayed ref code.  Turns out I
      forgot to increase the count check when we can't run a delayed ref because of
      the tree mod log.  If we can't run any delayed refs during this there is no
      point in continuing to look, and we need to break out.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      27a377db
    • D
      btrfs: reserve no transaction units in btrfs_ioctl_set_features · 8051aa1a
      David Sterba 提交于
      Added in patch "btrfs: add ioctls to query/change feature bits online"
      modifications to superblock don't need to reserve metadata blocks when
      starting a transaction.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      8051aa1a
    • J
      btrfs: commit transaction after setting label and features · d0270aca
      Jeff Mahoney 提交于
      The set_fslabel ioctl uses btrfs_end_transaction, which means it's
      possible that the change will be lost if the system crashes, same for
      the newly set features. Let's use btrfs_commit_transaction instead.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      d0270aca
    • J
      Btrfs: fix assert screwup for the pending move stuff · 6cc98d90
      Josef Bacik 提交于
      Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't
      reproduce.  Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set,
      which meant that a key part of Filipe's original patch was not being built in.
      This appears to be a mess up with merging Filipe's patch as it does not exist in
      his original patch.  Fix this by changing how we make sure del_waiting_dir_move
      asserts that it did not error and take the function out of the ifdef check.
      This makes btrfs/030 pass with the assert on or off.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Reviewed-by: NFilipe Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      6cc98d90
    • D
      jfs: fix generic posix ACL regression · c18f7b51
      Dave Kleikamp 提交于
      I missed a couple errors in reviewing the patches converting jfs
      to use the generic posix ACL function. Setting ACL's currently
      fails with -EOPNOTSUPP.
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Reported-by: NMichael L. Semon <mlsemon35@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c18f7b51
  7. 07 2月, 2014 2 次提交
  8. 06 2月, 2014 2 次提交
    • L
      execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds 提交于
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: NIgor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4ad8f98
    • T
      kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag · da9846ae
      Tejun Heo 提交于
      kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
      before performing lockdep annotations and ends up feeding
      uninitialized lockdep_map to lockdep triggering warning like the
      following on USB stick hotunplug.
      
       usb 1-2: USB disconnect, device number 2
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
        ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
        ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
        ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
       Call Trace:
        [<ffffffff81cfb6bd>] dump_stack+0x4e/0x7a
        [<ffffffff810f8530>] __lock_acquire+0x1910/0x1e70
        [<ffffffff810f931a>] lock_acquire+0x9a/0x1d0
        [<ffffffff8127c75e>] kernfs_deactivate+0xee/0x130
        [<ffffffff8127d4c8>] kernfs_addrm_finish+0x38/0x60
        [<ffffffff8127d701>] kernfs_remove_by_name_ns+0x51/0xa0
        [<ffffffff8127b4f1>] remove_files.isra.1+0x41/0x80
        [<ffffffff8127b7e7>] sysfs_remove_group+0x47/0xa0
        [<ffffffff8127b873>] sysfs_remove_groups+0x33/0x50
        [<ffffffff8177d66d>] device_remove_attrs+0x4d/0x80
        [<ffffffff8177e25e>] device_del+0x12e/0x1d0
        [<ffffffff819722c2>] usb_disconnect+0x122/0x1a0
        [<ffffffff819749b5>] hub_thread+0x3c5/0x1290
        [<ffffffff810c6a6d>] kthread+0xed/0x110
        [<ffffffff81d0a56c>] ret_from_fork+0x7c/0xb0
      
      Fix it by making kernfs_deactivate() perform lockdep annotations only
      if KERNFS_LOCKDEP is set.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NFabio Estevam <festevam@gmail.com>
      Reported-by: NAlan Stern <stern@rowland.harvard.edu>
      Reported-by: NJiri Kosina <jkosina@suse.cz>
      Reported-by: NDave Jones <davej@redhat.com>
      Tested-by: NFabio Estevam <fabio.estevam@freescale.com>
      Tested-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da9846ae
  9. 04 2月, 2014 6 次提交
  10. 03 2月, 2014 2 次提交
    • M
      hpfs: optimize quad buffer loading · 1c0b8a7a
      Mikulas Patocka 提交于
      HPFS needs to load 4 consecutive 512-byte sectors when accessing the
      directory nodes or bitmaps.  We can't switch to 2048-byte block size
      because files are allocated in the units of 512-byte sectors.
      
      Previously, the driver would allocate a 2048-byte area using kmalloc,
      copy the data from four buffers to this area and eventually copy them
      back if they were modified.
      
      In the current implementation of the buffer cache, buffers are allocated
      in the pagecache.  That means that 4 consecutive 512-byte buffers are
      stored in consecutive areas in the kernel address space.  So, we don't
      need to allocate extra memory and copy the content of the buffers there.
      
      This patch optimizes the code to avoid copying the buffers.  It checks
      if the four buffers are stored in contiguous memory - if they are not,
      it falls back to allocating a 2048-byte area and copying data there.
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0b8a7a
    • M
      hpfs: remember free space · 2cbe5c76
      Mikulas Patocka 提交于
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembes the free space and on next invocation of
      statfs it returns the value instantly.
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes
      user-visible problem (excessive level load times in wine).
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2cbe5c76
  11. 02 2月, 2014 3 次提交
  12. 01 2月, 2014 6 次提交
  13. 31 1月, 2014 5 次提交
  14. 30 1月, 2014 1 次提交