1. 07 4月, 2014 1 次提交
    • A
      ext4: initialize multi-block allocator before checking block descriptors · 00764937
      Azat Khuzhin 提交于
      With EXT4FS_DEBUG ext4_count_free_clusters() will call
      ext4_read_block_bitmap() without s_group_info initialized, so we need to
      initialize multi-block allocator before.
      
      And dependencies that must be solved, to allow this:
      - multi-block allocator needs in group descriptors
      - need to install s_op before initializing multi-block allocator,
        because in ext4_mb_init_backend() new inode is created.
      - initialize number of group desc blocks (s_gdb_count) otherwise
        number of clusters returned by ext4_free_clusters_after_init() is not correct.
        (see ext4_bg_num_gdb_nometa())
      
      Here is the stack backtrace:
      
      (gdb) bt
       #0  ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430
       #1  ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000,
           desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0,
           bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358
       #2  0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000,
           block_group=block_group@entry=0,
           bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476
       #3  0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000,
           block_group=block_group@entry=0) at balloc.c:489
       #4  0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665
       #5  0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>,
           sb=0xffff880079a10000) at super.c:2143
       #6  ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>,
           data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851
           ...
      Signed-off-by: NAzat Khuzhin <a3at.mail@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      00764937
  2. 25 3月, 2014 1 次提交
  3. 19 3月, 2014 1 次提交
    • T
      ext4: each filesystem creates and uses its own mb_cache · 9c191f70
      T Makphaibulchoke 提交于
      This patch adds new interfaces to create and destory cache,
      ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove
      the cache creation and destory calls from ex4_init_xattr() and
      ext4_exitxattr() in fs/ext4/xattr.c.
      
      fs/ext4/super.c has been changed so that when a filesystem is mounted
      a cache is allocated and attched to its ext4_sb_info structure.
      
      fs/mbcache.c has been changed so that only one slab allocator is
      allocated and used by all mbcache structures.
      Signed-off-by: NT. Makphaibulchoke <tmac@hp.com>
      9c191f70
  4. 14 3月, 2014 1 次提交
  5. 13 3月, 2014 1 次提交
    • T
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o 提交于
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  6. 18 2月, 2014 1 次提交
  7. 13 2月, 2014 1 次提交
    • T
      ext4: don't try to modify s_flags if the the file system is read-only · 23301410
      Theodore Ts'o 提交于
      If an ext4 file system is created by some tool other than mke2fs
      (perhaps by someone who has a pathalogical fear of the GPL) that
      doesn't set one or the other of the EXT2_FLAGS_{UN}SIGNED_HASH flags,
      and that file system is then mounted read-only, don't try to modify
      the s_flags field.  Otherwise, if dm_verity is in use, the superblock
      will change, causing an dm_verity failure.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      23301410
  8. 09 12月, 2013 2 次提交
  9. 08 11月, 2013 1 次提交
  10. 18 10月, 2013 1 次提交
    • T
      ext4: add ratelimiting to ext4 messages · efbed4dc
      Theodore Ts'o 提交于
      In the case of a storage device that suddenly disappears, or in the
      case of significant file system corruption, this can result in a huge
      flood of messages being sent to the console.  This can overflow the
      file system containing /var/log/messages, or if a serial console is
      configured, this can slow down the system so much that a hardware
      watchdog can end up triggering forcing a system reboot.
      
      Google-Bug-Id: 7258357
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      efbed4dc
  11. 04 9月, 2013 1 次提交
    • C
      direct-io: Implement generic deferred AIO completions · 7b7a8665
      Christoph Hellwig 提交于
      Add support to the core direct-io code to defer AIO completions to user
      context using a workqueue.  This replaces opencoded and less efficient
      code in XFS and ext4 (we save a memory allocation for each direct IO)
      and will be needed to properly support O_(D)SYNC for AIO.
      
      The communication between the filesystem and the direct I/O code requires
      a new buffer head flag, which is a bit ugly but not avoidable until the
      direct I/O code stops abusing the buffer_head structure for communicating
      with the filesystems.
      
      Currently this creates a per-superblock unbound workqueue for these
      completions, which is taken from an earlier patch by Jan Kara.  I'm
      not really convinced about this use and would prefer a "normal" global
      workqueue with a high concurrency limit, but this needs further discussion.
      
      JK: Fixed ext4 part, dynamic allocation of the workqueue.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7b7a8665
  12. 29 8月, 2013 1 次提交
    • E
      ext4: allow specifying external journal by pathname mount option · ad4eec61
      Eric Sandeen 提交于
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE" would
      help, since then we can do i.e.
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0 
      # mkfs.ext4 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile 
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT4-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      ad4eec61
  13. 20 8月, 2013 1 次提交
  14. 09 8月, 2013 2 次提交
  15. 27 7月, 2013 1 次提交
  16. 12 7月, 2013 1 次提交
    • T
      ext4: don't show usrquota/grpquota twice in /proc/mounts · ad065dd0
      Theodore Ts'o 提交于
      We now print mount options in a generic fashion in
      ext4_show_options(), so we shouldn't be explicitly printing the
      {usr,grp}quota options in ext4_show_quota_options().
      
      Without this patch, /proc/mounts can look like this:
      
       /dev/vdb /vdb ext4 rw,relatime,quota,usrquota,data=ordered,usrquota 0 0
                                            ^^^^^^^^              ^^^^^^^^
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      ad065dd0
  17. 06 7月, 2013 1 次提交
    • T
      ext4: fix ext4_get_group_number() · 960fd856
      Theodore Ts'o 提交于
      The function ext4_get_group_number() was introduced as an optimization
      in commit bd86298e.  Unfortunately, this commit incorrectly
      calculate the group number for file systems with a 1k block size (when
      s_first_data_block is 1 instead of zero).  This could cause the
      following kernel BUG:
      
      [  568.877799] ------------[ cut here ]------------
      [  568.877833] kernel BUG at fs/ext4/mballoc.c:3728!
      [  568.877840] Oops: Exception in kernel mode, sig: 5 [#1]
      [  568.877845] SMP NR_CPUS=32 NUMA pSeries
      [  568.877852] Modules linked in: binfmt_misc
      [  568.877861] CPU: 1 PID: 3516 Comm: fs_mark Not tainted 3.10.0-03216-g7c6809ff-dirty #1
      [  568.877867] task: c0000001fb0b8000 ti: c0000001fa954000 task.ti: c0000001fa954000
      [  568.877873] NIP: c0000000002f42a4 LR: c0000000002f4274 CTR: c000000000317ef8
      [  568.877879] REGS: c0000001fa956ed0 TRAP: 0700   Not tainted  (3.10.0-03216-g7c6809ff-dirty)
      [  568.877884] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 24000428  XER: 00000000
      [  568.877902] SOFTE: 1
      [  568.877905] CFAR: c0000000002b5464
      [  568.877908]
      GPR00: 0000000000000001 c0000001fa957150 c000000000c6a408 c0000001fb588000
      GPR04: 0000000000003fff c0000001fa9571c0 c0000001fa9571c4 000138098c50625f
      GPR08: 1301200000000000 0000000000000002 0000000000000001 0000000000000000
      GPR12: 0000000024000422 c00000000f33a300 0000000000008000 c0000001fa9577f0
      GPR16: c0000001fb7d0100 c000000000c29190 c0000000007f46e8 c000000000a14672
      GPR20: 0000000000000001 0000000000000008 ffffffffffffffff 0000000000000000
      GPR24: 0000000000000100 c0000001fa957278 c0000001fdb2bc78 c0000001fa957288
      GPR28: 0000000000100100 c0000001fa957288 c0000001fb588000 c0000001fdb2bd10
      [  568.877993] NIP [c0000000002f42a4] .ext4_mb_release_group_pa+0xec/0x1c0
      [  568.877999] LR [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0
      [  568.878004] Call Trace:
      [  568.878008] [c0000001fa957150] [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 (unreliable)
      [  568.878017] [c0000001fa957200] [c0000000002fb070] .ext4_mb_discard_lg_preallocations+0x394/0x444
      [  568.878025] [c0000001fa957340] [c0000000002fb45c] .ext4_mb_release_context+0x33c/0x734
      [  568.878032] [c0000001fa957440] [c0000000002fbcf8] .ext4_mb_new_blocks+0x4a4/0x5f4
      [  568.878039] [c0000001fa957510] [c0000000002ef56c] .ext4_ext_map_blocks+0xc28/0x1178
      [  568.878047] [c0000001fa957640] [c0000000002c1a94] .ext4_map_blocks+0x2c8/0x490
      [  568.878054] [c0000001fa957730] [c0000000002c536c] .ext4_writepages+0x738/0xc60
      [  568.878062] [c0000001fa957950] [c000000000168a78] .do_writepages+0x5c/0x80
      [  568.878069] [c0000001fa9579d0] [c00000000015d1c4] .__filemap_fdatawrite_range+0x88/0xb0
      [  568.878078] [c0000001fa957aa0] [c00000000015d23c] .filemap_write_and_wait_range+0x50/0xfc
      [  568.878085] [c0000001fa957b30] [c0000000002b8edc] .ext4_sync_file+0x220/0x3c4
      [  568.878092] [c0000001fa957be0] [c0000000001f849c] .vfs_fsync_range+0x64/0x80
      [  568.878098] [c0000001fa957c70] [c0000000001f84f0] .vfs_fsync+0x38/0x4c
      [  568.878105] [c0000001fa957d00] [c0000000001f87f4] .do_fsync+0x54/0x90
      [  568.878111] [c0000001fa957db0] [c0000000001f8894] .SyS_fsync+0x28/0x3c
      [  568.878120] [c0000001fa957e30] [c000000000009c88] syscall_exit+0x0/0x7c
      [  568.878125] Instruction dump:
      [  568.878130] 60000000 813d0034 81610070 38000000 7f8b4800 419e001c 813f007c 7d2bfe70
      [  568.878144] 7d604a78 7c005850 54000ffe 7c0007b4 <0b000000> e8a10076 e87f0090 7fa4eb78
      [  568.878160] ---[ end trace 594d911d9654770b ]---
      
      In addition fix the STD_GROUP optimization so that it works for
      bigalloc file systems as well.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NLi Zhong <lizhongfs@gmail.com>
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      Cc: stable@vger.kernel.org  # 3.10
      960fd856
  18. 01 7月, 2013 2 次提交
    • J
      ext4: reduce object size when !CONFIG_PRINTK · e7c96e8e
      Joe Perches 提交于
      Reduce the object size ~10% could be useful for embedded systems.
      
      Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and
      arguments, passing " " to functions when !CONFIG_PRINTK and still
      verifying format and arguments with no_printk.
      
      $ size fs/ext4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       239375	    610	    888	 240873	  3ace9	fs/ext4/built-in.o.new
       264167	    738	    888	 265793	  40e41	fs/ext4/built-in.o.old
      
          $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config
          # CONFIG_PRINTK is not set
          CONFIG_EXT4_FS=y
          CONFIG_EXT4_USE_FOR_EXT23=y
          CONFIG_EXT4_FS_POSIX_ACL=y
          # CONFIG_EXT4_FS_SECURITY is not set
          # CONFIG_EXT4_DEBUG is not set
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e7c96e8e
    • Z
      ext4: improve extent cache shrink mechanism to avoid to burn CPU time · d3922a77
      Zheng Liu 提交于
      Now we maintain an proper in-order LRU list in ext4 to reclaim entries
      from extent status tree when we are under heavy memory pressure.  For
      keeping this order, a spin lock is used to protect this list.  But this
      lock burns a lot of CPU time.  We can use the following steps to trigger
      it.
      
        % cd /dev/shm
        % dd if=/dev/zero of=ext4-img bs=1M count=2k
        % mkfs.ext4 ext4-img
        % mount -t ext4 -o loop ext4-img /mnt
        % cd /mnt
        % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
        % for ((i=0;i<160;i++)); do cp $i /dev/null &; done
        % perf record -a -g
        % perf report
      
      This commit tries to fix this problem.  Now a new member called
      i_touch_when is added into ext4_inode_info to record the last access
      time for an inode.  Meanwhile we never need to keep a proper in-order
      LRU list.  So this can avoid to burns some CPU time.  When we try to
      reclaim some entries from extent status tree, we use list_sort() to get
      a proper in-order list.  Then we traverse this list to discard some
      entries.  In ext4_sb_info, we use s_es_last_sorted to record the last
      time of sorting this list.  When we traverse the list, we skip the inode
      that is newer than this time, and move this inode to the tail of LRU
      list.  When the head of the list is newer than s_es_last_sorted, we will
      sort the LRU list again.
      
      In this commit, we break the loop if s_extent_cache_cnt == 0 because
      that means that all extents in extent status tree have been reclaimed.
      
      Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
      changed to save a local variable in these functions.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d3922a77
  19. 17 6月, 2013 1 次提交
  20. 13 6月, 2013 2 次提交
  21. 05 6月, 2013 2 次提交
  22. 28 5月, 2013 2 次提交
  23. 07 5月, 2013 1 次提交
  24. 21 4月, 2013 1 次提交
  25. 12 4月, 2013 1 次提交
  26. 10 4月, 2013 3 次提交
    • T
      ext4: fix miscellaneous big endian warnings · d6a77105
      Theodore Ts'o 提交于
      None of these result in any bug, but they makes sparse complain.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d6a77105
    • L
      ext4: introduce reserved space · 27dd4385
      Lukas Czerner 提交于
      Currently in ENOSPC condition when writing into unwritten space, or
      punching a hole, we might need to split the extent and grow extent tree.
      However since we can not allocate any new metadata blocks we'll have to
      zero out unwritten part of extent or punched out part of extent, or in
      the worst case return ENOSPC even though use actually does not allocate
      any space.
      
      Also in delalloc path we do reserve metadata and data blocks for the
      time we're going to write out, however metadata block reservation is
      very tricky especially since we expect that logical connectivity implies
      physical connectivity, however that might not be the case and hence we
      might end up allocating more metadata blocks than previously reserved.
      So in future, metadata reservation checks should be removed since we can
      not assure that we do not under reserve.
      
      And this is where reserved space comes into the picture. When mounting
      the file system we slice off a little bit of the file system space (2%
      or 4096 clusters, whichever is smaller) which can be then used for the
      cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.
      
      The number of reserved clusters can be set via sysfs, however it can
      never be bigger than number of free clusters in the file system.
      
      Note that this patch fixes the failure of xfstest 274 as expected.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      27dd4385
    • A
      procfs: new helper - PDE_DATA(inode) · d9dda78b
      Al Viro 提交于
      The only part of proc_dir_entry the code outside of fs/proc
      really cares about is PDE(inode)->data.  Provide a helper
      for that; static inline for now, eventually will be moved
      to fs/proc, along with the knowledge of struct proc_dir_entry
      layout.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d9dda78b
  27. 09 4月, 2013 1 次提交
  28. 04 4月, 2013 3 次提交
    • L
      ext4: make ext4_block_in_group() much more efficient · 68911009
      Lukas Czerner 提交于
      Currently in when getting the block group number for a particular
      block in ext4_block_in_group() we're using
      ext4_get_group_no_and_offset() which uses do_div() to get the block
      group and the remainer which is offset within the group.
      
      We don't need all of that in ext4_block_in_group() as we only need to
      figure out the group number.
      
      This commit changes ext4_block_in_group() to calculate group number
      directly. This shows as a big improvement with regards to cpu
      utilization. Measuring fallocate -l 15T on fresh file system with perf
      showed that 23% of cpu time was spend in the
      ext4_get_group_no_and_offset(). With this change it completely
      disappears from the list only bumping the occurrence of
      ext4_init_block_bitmap() which is the biggest user of
      ext4_block_in_group() by 4%. As the result of this change on my system
      the fallocate call was approx. 10% faster.
      
      However since there is '-g' option in mkfs which allow us setting
      different groups size (mostly for developers) I've introduced new per
      file system flag whether we have a standard block group size or
      not. The flag is used to determine whether we can use the bit shift
      optimization or not.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      68911009
    • D
      ext4: unregister es_shrinker if mount failed · a75ae78f
      Dmitry Monakhov 提交于
      Otherwise destroyed ext_sb_info will be part of global shinker list
      and result in the following OOPS:
      
      JBD2: corrupted journal superblock
      JBD2: recovery failed
      EXT4-fs (dm-2): error loading journal
      general protection fault: 0000 [#1] SMP
      Modules linked in: fuse acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode sg button sd_mod crc_t10dif ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_\
      mod
      CPU 1
      Pid: 2758, comm: mount Not tainted 3.8.0-rc3+ #136                  /DH55TC
      RIP: 0010:[<ffffffff811bfb2d>]  [<ffffffff811bfb2d>] unregister_shrinker+0xad/0xe0
      RSP: 0000:ffff88011d5cbcd8  EFLAGS: 00010207
      RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b53 RCX: 0000000000000006
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
      RBP: ffff88011d5cbce8 R08: 0000000000000002 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff88011cd3f848
      R13: ffff88011cd3f830 R14: ffff88011cd3f000 R15: 0000000000000000
      FS:  00007f7b721dd7e0(0000) GS:ffff880121a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007fffa6f75038 CR3: 000000011bc1c000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process mount (pid: 2758, threadinfo ffff88011d5ca000, task ffff880116aacb80)
      Stack:
      ffff88011cd3f000 ffffffff8209b6c0 ffff88011d5cbd18 ffffffff812482f1
      00000000000003f3 00000000ffffffea ffff880115f4c200 0000000000000000
      ffff88011d5cbda8 ffffffff81249381 ffff8801219d8bf8 ffffffff00000000
      Call Trace:
      [<ffffffff812482f1>] deactivate_locked_super+0x91/0xb0
      [<ffffffff81249381>] mount_bdev+0x331/0x340
      [<ffffffff81376730>] ? ext4_alloc_flex_bg_array+0x180/0x180
      [<ffffffff81362035>] ext4_mount+0x15/0x20
      [<ffffffff8124869a>] mount_fs+0x9a/0x2e0
      [<ffffffff81277e25>] vfs_kern_mount+0xc5/0x170
      [<ffffffff81279c02>] do_new_mount+0x172/0x2e0
      [<ffffffff8127aa56>] do_mount+0x376/0x380
      [<ffffffff8127ab98>] sys_mount+0x138/0x150
      [<ffffffff818ffed9>] system_call_fastpath+0x16/0x1b
      Code: 8b 05 88 04 eb 00 48 3d 90 ff 06 82 48 8d 58 e8 75 19 4c 89 e7 e8 e4 d7 2c 00 48 c7 c7 00 ff 06 82 e8 58 5f ef ff 5b 41 5c c9 c3 <48> 8b 4b 18 48 8b 73 20 48 89 da 31 c0 48 c7 c7 c5 a0 e4 81 e\
      8
      RIP  [<ffffffff811bfb2d>] unregister_shrinker+0xad/0xe0
      RSP <ffff88011d5cbcd8>
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      a75ae78f
    • D
      ext4: fix journal callback list traversal · 5d3ee208
      Dmitry Monakhov 提交于
      It is incorrect to use list_for_each_entry_safe() for journal callback
      traversial because ->next may be removed by other task:
      ->ext4_mb_free_metadata()
        ->ext4_mb_free_metadata()
          ->ext4_journal_callback_del()
      
      This results in the following issue:
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      This patch fix the issue as follows:
      - ext4_journal_commit_callback() make list truly traversial safe
        simply by always starting from list_head
      - fix race between two ext4_journal_callback_del() and
        ext4_journal_callback_try_del()
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.com
      5d3ee208
  29. 13 3月, 2013 1 次提交
    • E
      fs: Readd the fs module aliases. · fa7614dd
      Eric W. Biederman 提交于
      I had assumed that the only use of module aliases for filesystems
      prior to "fs: Limit sys_mount to only request filesystem modules."
      was in request_module.  It turns out I was wrong.  At least mkinitcpio
      in Arch linux uses these aliases.
      
      So readd the preexising aliases, to keep from breaking userspace.
      
      Userspace eventually will have to follow and use the same aliases the
      kernel does.  So at some point we may be delete these aliases without
      problems.  However that day is not today.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      fa7614dd
  30. 12 3月, 2013 1 次提交