1. 26 11月, 2014 4 次提交
    • J
      ext4: move handling of list of shrinkable inodes into extent status code · b0dea4c1
      Jan Kara 提交于
      Currently callers adding extents to extent status tree were responsible
      for adding the inode to the list of inodes with freeable extents. This
      is error prone and puts list handling in unnecessarily many places.
      
      Just add inode to the list automatically when the first non-delay extent
      is added to the tree and remove inode from the list when the last
      non-delay extent is removed.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b0dea4c1
    • Z
      ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca
      Zheng Liu 提交于
      In this commit we discard the lru algorithm for inodes with extent
      status tree because it takes significant effort to maintain a lru list
      in extent status tree shrinker and the shrinker can take a long time to
      scan this lru list in order to reclaim some objects.
      
      We replace the lru ordering with a simple round-robin.  After that we
      never need to keep a lru list.  That means that the list needn't be
      sorted if the shrinker can not reclaim any objects in the first round.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      edaa53ca
    • Z
      ext4: cache extent hole in extent status tree for ext4_da_map_blocks() · 2f8e0a7c
      Zheng Liu 提交于
      Currently extent status tree doesn't cache extent hole when a write
      looks up in extent tree to make sure whether a block has been allocated
      or not.  In this case, we don't put extent hole in extent cache because
      later this extent might be removed and a new delayed extent might be
      added back.  But it will cause a defect when we do a lot of writes.  If
      we don't put extent hole in extent cache, the following writes also need
      to access extent tree to look at whether or not a block has been
      allocated.  It brings a cache miss.  This commit fixes this defect.
      Also if the inode doesn't have any extent, this extent hole will be
      cached as well.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2f8e0a7c
    • J
      ext4: fix block reservation for bigalloc filesystems · cbd7584e
      Jan Kara 提交于
      For bigalloc filesystems we have to check whether newly requested inode
      block isn't already part of a cluster for which we already have delayed
      allocation reservation. This check happens in ext4_ext_map_blocks() and
      that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if
      ext4_da_map_blocks() finds in extent cache information about the block,
      we don't call into ext4_ext_map_blocks() and thus we always end up
      getting new reservation even if the space for cluster is already
      reserved. This results in overreservation and premature ENOSPC reports.
      
      Fix the problem by checking for existing cluster reservation already in
      ext4_da_map_blocks(). That simplifies the logic and actually allows us
      to get rid of the EXT4_MAP_FROM_CLUSTER flag completely.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      cbd7584e
  2. 23 11月, 2014 4 次提交
    • E
      ext4: fix end of region partial cluster handling · 0756b908
      Eric Whitney 提交于
      ext4_ext_remove_space() can incorrectly free a partial_cluster if
      EAGAIN is encountered while truncating or punching.  Extent removal
      should be retried in this case.
      
      It also fails to free a partial cluster when the punched region begins
      at the start of a file on that unaligned cluster and where the entire
      file has not been punched.  Remove the requirement that all blocks in
      the file must have been freed in order to free the partial cluster.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0756b908
    • E
      ext4: miscellaneous partial cluster cleanups · 345ee947
      Eric Whitney 提交于
      Add some casts and rearrange a few statements for improved readability.
      Some code can also be simplified and made more readable if we set
      partial_cluster to 0 rather than to a negative value when we can tell
      we've hit the left edge of the punched region.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      345ee947
    • E
      ext4: fix end of leaf partial cluster handling · 5bf43760
      Eric Whitney 提交于
      The fix in commit ad6599ab ("ext4: fix premature freeing of
      partial clusters split across leaf blocks"), intended to avoid
      dereferencing an invalid extent pointer when determining whether a
      partial cluster should be freed, wasn't quite good enough.  Assure that
      at least one extent remains at the start of the leaf once the hole has
      been punched.  Otherwise, the pointer to the extent to the right of the
      hole will be invalid and a partial cluster will be incorrectly freed.
      
      Set partial_cluster to 0 when we can tell we've hit the left edge of
      the punched region within the leaf.  This prevents incorrect freeing
      of a partial cluster when ext4_ext_rm_leaf is called one last time
      during extent tree traversal after the punched region has been removed.
      
      Adjust comments to reflect code changes and a correction.  Remove a bit
      of dead code.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5bf43760
    • E
      ext4: fix partial cluster initialization · f4226d9e
      Eric Whitney 提交于
      The partial_cluster variable is not always initialized correctly when
      hole punching on bigalloc file systems.  Although commit c0634493
      ("ext4: fix partial cluster handling for bigalloc file systems")
      addressed the case where the right edge of the punched region and the
      next extent to its right were within the same leaf, it didn't handle
      the case where the next extent to its right is in the next leaf.  This
      causes xfstest generic/300 to fail.
      
      Fix this by replacing the code in c0634493922 with a more general
      solution that can continue the search for the first cluster to the
      right of the punched region into the next leaf if present.  If found,
      partial_cluster is initialized to this cluster's negative value.
      There's no need to determine if that cluster is actually shared;  we
      simply record it so its blocks won't be freed in the event it does
      happen to be shared.
      
      Also, minimize the burden on non-bigalloc file systems with some minor
      code simplification.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f4226d9e
  3. 21 11月, 2014 1 次提交
  4. 06 11月, 2014 1 次提交
    • D
      ext4: move_extent improve bh vanishing success factor · 88c6b61f
      Dmitry Monakhov 提交于
      Xiaoguang Wang has reported sporadic EBUSY failures of ext4/302
      Unfortunetly there is nothing we can do if some other task holds BH's
      refenrence.  So we must return EBUSY in this case.  But we can try
      kicking the journal to see if the other task releases the bh reference
      after the commit is complete.  Also decrease false positives by
      properly checking for ENOSPC and retrying the allocation after kicking
      the journal --- which is done by ext4_should_retry_alloc().
      
      [ Modified by tytso to properly check for ENOSPC. ]
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      88c6b61f
  5. 30 10月, 2014 9 次提交
    • J
      ext4: make ext4_ext_convert_to_initialized() return proper number of blocks · ae9e9c6a
      Jan Kara 提交于
      ext4_ext_convert_to_initialized() can return more blocks than are
      actually allocated from map->m_lblk in case where initial part of the
      on-disk extent is zeroed out. Luckily this doesn't have serious
      consequences because the caller currently uses the return value
      only to unmap metadata buffers. Anyway this is a data
      corruption/exposure problem waiting to happen so fix it.
      
      Coverity-id: 1226848
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ae9e9c6a
    • J
      ext4: bail early when clearing inode journal flag fails · 4f879ca6
      Jan Kara 提交于
      When clearing inode journal flag, we call jbd2_journal_flush() to force
      all the journalled data to their final locations. Currently we ignore
      when this fails and continue clearing inode journal flag. This isn't a
      big problem because when jbd2_journal_flush() fails, journal is likely
      aborted anyway. But it can still lead to somewhat confusing results so
      rather bail out early.
      
      Coverity-id: 989044
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4f879ca6
    • J
      ext4: bail out from make_indexed_dir() on first error · 6050d47a
      Jan Kara 提交于
      When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
      fail, there's really something wrong with the fs and there's no point in
      continuing further. Just return error from make_indexed_dir() in that
      case. Also initialize frames array so that if we return early due to
      error, dx_release() doesn't try to dereference uninitialized memory
      (which could happen also due to error in do_split()).
      
      Coverity-id: 741300
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      6050d47a
    • D
      ext4: prevent bugon on race between write/fcntl · a41537e6
      Dmitry Monakhov 提交于
      O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked
      twice inside ext4_file_write_iter() and __generic_file_write() which
      result in BUG_ON inside ext4_direct_IO.
      
      Let's initialize iocb->private unconditionally.
      
      TESTCASE: xfstest:generic/036  https://patchwork.ozlabs.org/patch/402445/
      
      #TYPICAL STACK TRACE:
      kernel BUG at fs/ext4/inode.c:2960!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
      task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000
      RIP: 0010:[<ffffffff811fabf2>]  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
      RSP: 0018:ffff88080f90bb58  EFLAGS: 00010246
      RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818
      RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001
      RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80
      R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28
      FS:  00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0
      Stack:
       ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30
       0000000000000200 0000000000000200 0000000000000001 0000000000000200
       ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200
      Call Trace:
       [<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180
       [<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370
       [<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400
       [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
       [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810abd94>] ? __lock_acquire+0x274/0x700
       [<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0
       [<ffffffff811bd756>] aio_run_iocb+0x286/0x410
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0
       [<ffffffff811bde3b>] do_io_submit+0x55b/0x740
       [<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740
       [<ffffffff811be030>] SyS_io_submit+0x10/0x20
       [<ffffffff815ce192>] system_call_fastpath+0x16/0x1b
      Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c
      24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89
      RIP  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
       RSP <ffff88080f90bb58>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
      a41537e6
    • D
      ext4: remove extent status procfs files if journal load fails · 50460fe8
      Darrick J. Wong 提交于
      If we can't load the journal, remove the procfs files for the extent
      status information file to avoid leaking resources.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      50460fe8
    • D
      ext4: disallow changing journal_csum option during remount · 6b992ff2
      Darrick J. Wong 提交于
      ext4 does not permit changing the metadata or journal checksum feature
      flag while mounted.  Until we decide to support that, don't allow a
      remount to change the journal_csum flag (right now we silently fail to
      change anything).
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      6b992ff2
    • D
      ext4: enable journal checksum when metadata checksum feature enabled · 98c1a759
      Darrick J. Wong 提交于
      If metadata checksumming is turned on for the FS, we need to tell the
      journal to use checksumming too.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      98c1a759
    • J
      ext4: fix oops when loading block bitmap failed · 599a9b77
      Jan Kara 提交于
      When we fail to load block bitmap in __ext4_new_inode() we will
      dereference NULL pointer in ext4_journal_get_write_access(). So check
      for error from ext4_read_block_bitmap().
      
      Coverity-id: 989065
      Cc: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      599a9b77
    • J
      ext4: fix overflow when updating superblock backups after resize · 9378c676
      Jan Kara 提交于
      When there are no meta block groups update_backups() will compute the
      backup block in 32-bit arithmetics thus possibly overflowing the block
      number and corrupting the filesystem. OTOH filesystems without meta
      block groups larger than 16 TB should be rare. Fix the problem by doing
      the counting in 64-bit arithmetics.
      
      Coverity-id: 741252
      CC: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      9378c676
  6. 24 10月, 2014 1 次提交
  7. 14 10月, 2014 1 次提交
    • D
      ext4: check s_chksum_driver when looking for bg csum presence · 813d32f9
      Darrick J. Wong 提交于
      Convert the ext4_has_group_desc_csum predicate to look for a checksum
      driver instead of the metadata_csum flag and change the bg checksum
      calculation function to look for GDT_CSUM before taking the crc16
      path.
      
      Without this patch, if we mount with ^uninit_bg,^metadata_csum and
      later metadata_csum gets turned on by accident, the block group
      checksum functions will incorrectly assume that checksumming is
      enabled (metadata_csum) but that crc16 should be used
      (!s_chksum_driver).  This is totally wrong, so fix the predicate
      and the checksum formula selection.
      
      (Granted, if the metadata_csum feature bit gets enabled on a live FS
      then something underhanded is going on, but we could at least avoid
      writing garbage into the on-disk fields.)
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
      813d32f9
  8. 13 10月, 2014 2 次提交
    • D
      ext4: move error report out of atomic context in ext4_init_block_bitmap() · aef4885a
      Dmitry Monakhov 提交于
      Error report likely result in IO so it is bad idea to do it from
      atomic context.
      
      This patch should fix following issue:
      
      BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349
      in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1
      5 locks held by kworker/u128:1/137:
       #0:  ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
       #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
       #2:  (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0
       #3:  (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430
       #4:  (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630
      CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
      Workqueue: writeback bdi_writeback_workfn (flush-1:0)
       0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288
       ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38
       ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000
      Call Trace:
       [<ffffffff815c7fdc>] dump_stack+0x51/0x6d
       [<ffffffff8108fb30>] __might_sleep+0xf0/0x100
       [<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0
       [<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20
       [<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230
       [<ffffffff8120fa03>] save_error_info+0x23/0x30
       [<ffffffff8120fd06>] __ext4_error+0xb6/0xd0
       [<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190
       [<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630
       [<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0
       [<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60
       [<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80
       [<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280
       [<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190
       [<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410
       [<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380
       [<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0
       [<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180
       [<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0
       [<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0
       [<ffffffff81221797>] ? ext4_find_extent+0x97/0x320
       [<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050
       [<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430
       [<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430
       [<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50
       [<ffffffff811feb44>] ext4_writepages+0x564/0xaf0
       [<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40
       [<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0
       [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
       [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
       [<ffffffff811377e3>] do_writepages+0x23/0x40
       [<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280
       [<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490
       [<ffffffff811a0664>] wb_writeback+0x174/0x2d0
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff811a0863>] wb_do_writeback+0xa3/0x200
       [<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230
       [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
       [<ffffffff810856cd>] process_one_work+0x2dd/0x4d0
       [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
       [<ffffffff81085c1d>] worker_thread+0x35d/0x460
       [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
       [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
       [<ffffffff8108a885>] kthread+0xf5/0x100
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff8108a790>] ? __init_kthread_work
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      aef4885a
    • D
      ext4: Replace open coded mdata csum feature to helper function · 9aa5d32b
      Dmitry Monakhov 提交于
      Besides the fact that this replacement improves code readability
      it also protects from errors caused direct EXT4_S(sb)->s_es manipulation
      which may result attempt to use uninitialized  csum machinery.
      
      #Testcase_BEGIN
      IMG=/dev/ram0
      MNT=/mnt
      mkfs.ext4 $IMG
      mount $IMG $MNT
      #Enable feature directly on disk, on mounted fs
      tune2fs -O metadata_csum  $IMG
      # Provoke metadata update, likey result in OOPS
      touch $MNT/test
      umount $MNT
      #Testcase_END
      
      # Replacement script
      @@
      expression E;
      @@
      - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
      + ext4_has_metadata_csum(E)
      
      https://bugzilla.kernel.org/show_bug.cgi?id=82201Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      9aa5d32b
  9. 12 10月, 2014 2 次提交
    • X
      ext4: delete useless comments about ext4_move_extents · 65dd8327
      Xiaoguang Wang 提交于
      In patch 'ext4: refactor ext4_move_extents code base',  Dmitry Monakhov has
      refactored ext4_move_extents' implementation, but forgot to update the
      corresponding comments, this patch will try to delete some useless comments.
      Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NXiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      65dd8327
    • E
      ext4: fix reservation overflow in ext4_da_write_begin · 0ff8947f
      Eric Sandeen 提交于
      Delalloc write journal reservations only reserve 1 credit,
      to update the inode if necessary.  However, it may happen
      once in a filesystem's lifetime that a file will cross
      the 2G threshold, and require the LARGE_FILE feature to
      be set in the superblock as well, if it was not set already.
      
      This overruns the transaction reservation, and can be
      demonstrated simply on any ext4 filesystem without the LARGE_FILE
      feature already set:
      
      dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
      	conv=notrunc of=testfile
      sync
      dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
      	conv=notrunc of=testfile
      
      leads to:
      
      EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
      EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
      EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
      EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
      EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
      
      Adjust the number of credits based on whether the flag is
      already set, and whether the current write may extend past the
      LARGE_FILE limit.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Cc: stable@vger.kernel.org
      0ff8947f
  10. 06 10月, 2014 2 次提交
    • T
      ext4: add ext4_iget_normal() which is to be used for dir tree lookups · f4bb2981
      Theodore Ts'o 提交于
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: NSami Liedes <sami.liedes@iki.fi>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      f4bb2981
    • T
      ext4: don't orphan or truncate the boot loader inode · e2bfb088
      Theodore Ts'o 提交于
      The boot loader inode (inode #5) should never be visible in the
      directory hierarchy, but it's possible if the file system is corrupted
      that there will be a directory entry that points at inode #5.  In
      order to avoid accidentally trashing it, when such a directory inode
      is opened, the inode will be marked as a bad inode, so that it's not
      possible to modify (or read) the inode from userspace.
      
      Unfortunately, when we unlink this (invalid/illegal) directory entry,
      we will put the bad inode on the ophan list, and then when try to
      unlink the directory, we don't actually remove the bad inode from the
      orphan list before freeing in-memory inode structure.  This means the
      in-memory orphan list is corrupted, leading to a kernel oops.
      
      In addition, avoid truncating a bad inode in ext4_destroy_inode(),
      since truncating the boot loader inode is not a smart thing to do.
      Reported-by: NSami Liedes <sami.liedes@iki.fi>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      e2bfb088
  11. 04 10月, 2014 1 次提交
    • D
      ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT · 3e67cfad
      Dmitry Monakhov 提交于
      Otherwise this provokes complain like follows:
      WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030
       0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8
       ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000
      Call Trace:
       [<ffffffff815c7dfc>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81202008>] ? ext4_ioctl+0x9e8/0xeb0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff8122867e>] ext4_journal_check_start+0x4e/0xa0
       [<ffffffff81228c10>] __ext4_journal_start_sb+0x90/0x110
       [<ffffffff81202008>] ext4_ioctl+0x9e8/0xeb0
       [<ffffffff8107b0bd>] ? ptrace_stop+0x24d/0x2f0
       [<ffffffff81088530>] ? alloc_pid+0x480/0x480
       [<ffffffff8107b1f2>] ? ptrace_do_notify+0x92/0xb0
       [<ffffffff81186545>] do_vfs_ioctl+0x4e5/0x550
       [<ffffffff815cdbcb>] ? _raw_spin_unlock_irq+0x2b/0x40
       [<ffffffff81186603>] SyS_ioctl+0x53/0x80
       [<ffffffff815ce2ce>] tracesys+0xd0/0xd5
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      3e67cfad
  12. 02 10月, 2014 5 次提交
  13. 19 9月, 2014 3 次提交
  14. 18 9月, 2014 1 次提交
  15. 17 9月, 2014 2 次提交
  16. 11 9月, 2014 1 次提交