1. 29 8月, 2013 4 次提交
  2. 17 8月, 2013 15 次提交
    • J
      ext4: fix lost truncate due to race with writeback · 90e775b7
      Jan Kara 提交于
      The following race can lead to a loss of i_disksize update from truncate
      thus resulting in a wrong inode size if the inode size isn't updated
      again before inode is reclaimed:
      
      ext4_setattr()				mpage_map_and_submit_extent()
        EXT4_I(inode)->i_disksize = attr->ia_size;
        ...					  ...
      					  disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
      					  /* False because i_size isn't
      					   * updated yet */
      					  if (disksize > i_size_read(inode))
      					  /* True, because i_disksize is
      					   * already truncated */
      					  if (disksize > EXT4_I(inode)->i_disksize)
      					    /* Overwrite i_disksize
      					     * update from truncate */
      					    ext4_update_i_disksize()
        i_size_write(inode, attr->ia_size);
      
      For other places updating i_disksize such race cannot happen because
      i_mutex prevents these races. Writeback is the only place where we do
      not hold i_mutex and we cannot grab it there because of lock ordering.
      
      We fix the race by doing both i_disksize and i_size update in truncate
      atomically under i_data_sem and in mpage_map_and_submit_extent() we move
      the check against i_size under i_data_sem as well.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      90e775b7
    • J
      ext4: simplify truncation code in ext4_setattr() · 5208386c
      Jan Kara 提交于
      Merge conditions in ext4_setattr() handling inode size changes, also
      move ext4_begin_ordered_truncate() call somewhat earlier because it
      simplifies error recovery in case of failure. Also add error handling in
      case i_disksize update fails.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      5208386c
    • J
      ext4: fix ext4_writepages() in presence of truncate · 5f1132b2
      Jan Kara 提交于
      Inode size can arbitrarily change while writeback is in progress. When
      ext4_writepages() has prepared a long extent for mapping and truncate
      then reduces i_size, mpage_map_and_submit_buffers() will always map just
      one buffer in a page instead of all of them due to lblk < blocks check.
      So we end up not using all blocks we've allocated (thus leaking them)
      and also delalloc accounting goes wrong manifesting as a warning like:
      
      ext4_da_release_space:1333: ext4_da_release_space: ino 12, to_free 1
      with only 0 reserved data blocks
      
      Note that the problem can happen only when blocksize < pagesize because
      otherwise we have only a single buffer in the page.
      
      Fix the problem by removing the size check from the mapping loop. We
      have an extent allocated so we have to use it all before checking for
      i_size. We also rename add_page_bufs_to_extent() to
      mpage_process_page_bufs() and make that function submit the page for IO
      if all buffers (upto EOF) in it are mapped.
      Reported-by: NDave Jones <davej@redhat.com>
      Reported-by: NZheng Liu <gnehzuil.liu@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      5f1132b2
    • J
      ext4: move test whether extent to map can be extended to one place · 09930042
      Jan Kara 提交于
      Currently the logic whether the current buffer can be added to an extent
      of buffers to map is split between mpage_add_bh_to_extent() and
      add_page_bufs_to_extent(). Move the whole logic to
      mpage_add_bh_to_extent() which makes things a bit more straightforward
      and make following i_size fixes easier.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      09930042
    • J
      ext4: fix warning in ext4_da_update_reserve_space() · 7d734532
      Jan Kara 提交于
      reaim workfile.dbase test easily triggers warning in
      ext4_da_update_reserve_space():
      
      EXT4-fs warning (device ram0): ext4_da_update_reserve_space:365:
      ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1
      blocks with reserved 9 data blocks)
      
      The problem is that (one of) tests creates file and then randomly writes
      to it with O_SYNC. That results in writing back pages of the file in
      random order so we create extents for written blocks say 0, 2, 4, 6, 8
      - this last allocation also allocates new block for extents. Then we
      writeout block 1 so we have extents 0-2, 4, 6, 8 and we release
      indirect extent block because extents fit in the inode again. Then we
      writeout block 10 and we need to allocate indirect extent block again
      which triggers the warning because we don't have the reservation
      anymore.
      
      Fix the problem by giving back freed metadata blocks resulting from
      extent merging into inode's reservation pool.
      Signed-off-by: NJan Kara <jack@suse.cz>
      7d734532
    • J
      quota: provide interface for readding allocated space into reserved space · 1c8924eb
      Jan Kara 提交于
      ext4 needs to convert allocated (metadata) blocks back into blocks
      reserved for delayed allocation. Add functions into quota code for
      supporting such operation.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      1c8924eb
    • T
      ext4: avoid reusing recently deleted inodes in no journal mode · 19883bd9
      Theodore Ts'o 提交于
      In no journal mode, if an inode has recently been deleted, we
      shouldn't reuse it right away.  Otherwise it's possible, after an
      unclean shutdown, to hit a situation where a recently deleted inode
      gets reused for some other purpose before the inode table block has
      been written to disk.  However, if the directory entry has been
      updated, then the directory entry will be pointing at the old inode
      contents.
      
      E2fsck will make sure the file system is consistent after the
      unclean shutdown.  However, if the recently deleted inode is a
      character mode device, or an inode with the immutable bit set, even
      after the file system has been fixed up by e2fsck, it can be
      possible for a *.pyc file to be pointing at a character mode
      device, and when python tries to open the *.pyc file, Hilarity
      Ensues.  We could change all of userspace to be very suspicious
      about stat'ing files before opening them, and clearing the
      immutable flag if necessary --- or we can just avoid reusing an
      inode number if it has been recently deleted.
      
      Google-Bug-Id: 10017573
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      19883bd9
    • T
      ext4: allocate delayed allocation blocks before rename · 0e202704
      Theodore Ts'o 提交于
      When ext4_rename() overwrites an already existing file, call
      ext4_alloc_da_blocks() before starting the journal handle which
      actually does the rename, instead of doing this afterwards.  This
      improves the likelihood that the contents will survive a crash if an
      application replaces a file using the sequence:
      
      1)  write replacement contents to foo.new
      2)  <omit fsync of foo.new>
      3)  rename foo.new to foo
      
      It is still not a guarantee, since ext4_alloc_da_blocks() is *not*
      doing a file integrity sync; this means if foo.new is a very large
      file, it may not be completely flushed out to disk.
      
      However, for files smaller than a megabyte or so, any dirty pages
      should be flushed out before we do the rename operation, and so at the
      next journal commit, the CACHE FLUSH command will make sure al of
      these pages are safely on the disk platter.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0e202704
    • T
      ext4: start handle at least possible moment when renaming files · 5b61de75
      Theodore Ts'o 提交于
      In ext4_rename(), don't start the journal handle until the the
      directory entries have been successfully looked up.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5b61de75
    • T
      ext4: add support for extent pre-caching · 7869a4a6
      Theodore Ts'o 提交于
      Add a new fiemap flag which forces the all of the extents in an inode
      to be cached in the extent_status tree.  This is critically important
      when using AIO to a preallocated file, since if we need to read in
      blocks from the extent tree, the io_submit(2) system call becomes
      synchronous, and the AIO is no longer "A", which is bad.
      
      In addition, for most files which have an external leaf tree block,
      the cost of caching the information in the extent status tree will be
      less than caching the entire 4k block in the buffer cache.  So it is
      generally a win to keep the extent information cached.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7869a4a6
    • T
      ext4: cache all of an extent tree's leaf block upon reading · 107a7bd3
      Theodore Ts'o 提交于
      When we read in an extent tree leaf block from disk, arrange to have
      all of its entries cached.  In nearly all cases the in-memory
      representation will be more compact than the on-disk representation in
      the buffer cache, and it allows us to get the information without
      having to traverse the extent tree for successive extents.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      107a7bd3
    • T
      ext4: use unsigned int for es_status values · 3be78c73
      Theodore Ts'o 提交于
      Don't use an unsigned long long for the es_status flags; this requires
      that we pass 64-bit values around which is painful on 32-bit systems.
      Instead pass the extent status flags around using the low 4 bits of an
      unsigned int, and shift them into place when we are reading or writing
      es_pblk.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      3be78c73
    • T
      ext4: print the block number of invalid extent tree blocks · c349179b
      Theodore Ts'o 提交于
      When we find an invalid extent tree block, report the block number of
      the bad block for debugging purposes.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      c349179b
    • T
      ext4: refactor code to read the extent tree block · 7d7ea89e
      Theodore Ts'o 提交于
      Refactor out the code needed to read the extent tree block into a
      single read_extent_tree_block() function.  In addition to simplifying
      the code, it also makes sure that we call the ext4_ext_load_extent
      tracepoint whenever we need to read an extent tree block from disk.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      7d7ea89e
    • J
      jbd2: Fix oops in jbd2_journal_file_inode() · a361293f
      Jan Kara 提交于
      Commit 0713ed0c added
      jbd2_journal_file_inode() call into ext4_block_zero_page_range().
      However that function gets called from truncate path and thus inode
      needn't have jinode attached - that happens in ext4_file_open() but
      the file needn't be ever open since mount. Calling
      jbd2_journal_file_inode() without jinode attached results in the oops.
      
      We fix the problem by attaching jinode to inode also in ext4_truncate()
      and ext4_punch_hole() when we are going to zero out partial blocks.
      Reported-by: Nmajianpeng <majianpeng@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a361293f
  3. 12 8月, 2013 2 次提交
    • J
      jbd2: Fix use after free after error in jbd2_journal_dirty_metadata() · 91aa11fa
      Jan Kara 提交于
      When jbd2_journal_dirty_metadata() returns error,
      __ext4_handle_dirty_metadata() stops the handle. However callers of this
      function do not count with that fact and still happily used now freed
      handle. This use after free can result in various issues but very likely
      we oops soon.
      
      The motivation of adding __ext4_journal_stop() into
      __ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to
      improve error reporting. So replace __ext4_journal_stop() with
      ext4_journal_abort_handle() which was there before that commit and add
      WARN_ON_ONCE() to dump stack to provide useful information.
      Reported-by: NSage Weil <sage@inktank.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org	# 3.2+
      91aa11fa
    • T
      ext4: flush the extent status cache during EXT4_IOC_SWAP_BOOT · cde2d7a7
      Theodore Ts'o 提交于
      Previously we weren't swapping only some of the extent_status LRU
      fields during the processing of the EXT4_IOC_SWAP_BOOT ioctl.  The
      much safer thing to do is to just completely flush the extent status
      tree when doing the swap.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <gnehzuil.liu@gmail.com>
      Cc: stable@vger.kernel.org
      cde2d7a7
  4. 09 8月, 2013 2 次提交
  5. 30 7月, 2013 2 次提交
  6. 27 7月, 2013 2 次提交
  7. 21 7月, 2013 2 次提交
    • Z
      ext3: fix a BUG when opening a file with O_TMPFILE flag · dda5690d
      Zheng Liu 提交于
      When we try to open a file with O_TMPFILE flag, we will trigger a bug.
      The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
      this check always fails because we set ->i_nlink = 1 in
      inode_init_always().  We can use the following program to trigger it:
      
      int main(int argc, char *argv[])
      {
      	int fd;
      
      	fd = open(argv[1], O_TMPFILE, 0666);
      	if (fd < 0) {
      		perror("open ");
      		return -1;
      	}
      	close(fd);
      	return 0;
      }
      
      The oops message looks like this:
      
      kernel: kernel BUG at fs/ext3/namei.c:1992!
      kernel: invalid opcode: 0000 [#1] SMP
      kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
      kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
      kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
      kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000
      kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
      kernel: RSP: 0018:ffff8801124d5cc8  EFLAGS: 00010202
      kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0
      kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8
      kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f
      kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000
      kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8
      kernel: FS:  00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0
      kernel: Stack:
      kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
      kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
      kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
      kernel: Call Trace:
      kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
      kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
      kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7
      kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30
      kernel: [<ffffffff82065fa2>] ?  __dequeue_entity+0x33/0x38
      kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d
      kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102
      kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd
      kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20
      kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b
      kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
      kernel: RIP  [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
      kernel: RSP <ffff8801124d5cc8>
      
      Here we couldn't call clear_nlink() directly because in d_tmpfile() we
      will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
      tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      dda5690d
    • Z
      ext4: fix a BUG when opening a file with O_TMPFILE flag · e94bd349
      Zheng Liu 提交于
      When we try to open a file with O_TMPFILE flag, we will trigger a bug.
      The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
      this check always fails because we set ->i_nlink = 1 in
      inode_init_always().  We can use the following program to trigger it:
      
      int main(int argc, char *argv[])
      {
      	int fd;
      
      	fd = open(argv[1], O_TMPFILE, 0666);
      	if (fd < 0) {
      		perror("open ");
      		return -1;
      	}
      	close(fd);
      	return 0;
      }
      
      The oops message looks like this:
      
      kernel BUG at fs/ext4/namei.c:2572!
      invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli
      nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir
      da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp
      kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
      CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12
      Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
      task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000
      RIP: 0010:[<ffffffff8125ce69>]  [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0
      RSP: 0018:ffff88010f7b7cf8  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001
      RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668
      R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0
      FS:  00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0
      DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Stack:
       0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00
       ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c
       ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004
      Call Trace:
       [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180
       [<ffffffff811cba78>] path_openat+0x238/0x700
       [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80
       [<ffffffff811cc647>] do_filp_open+0x47/0xa0
       [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200
       [<ffffffff811ba2e4>] do_sys_open+0x124/0x210
       [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290
       [<ffffffff811ba3ee>] SyS_open+0x1e/0x20
       [<ffffffff816ca8d4>] tracesys+0xdd/0xe2
       [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0
      Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00
      
      Here we couldn't call clear_nlink() directly because in d_tmpfile() we
      will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
      tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Tested-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Tested-by: NDave Jones <davej@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      e94bd349
  8. 20 7月, 2013 6 次提交
    • A
      livelock avoidance in sget() · acfec9a5
      Al Viro 提交于
      Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
      to fail.  The superblock is on ->fs_supers, ->s_umount is held exclusive,
      ->s_active is 1.  Along comes two more processes, trying to mount the same
      thing; sget() in each is picking that superblock, bumping ->s_count and
      trying to grab ->s_umount.  ->s_active is 3 now.  Original mount(2)
      finally gets to deactivate_locked_super() on failure; ->s_active is 2,
      superblock is still ->fs_supers because shutdown will *not* happen until
      ->s_active hits 0.  ->s_umount is dropped and now we have two processes
      chasing each other:
      s_active = 2, A acquired ->s_umount, B blocked
      A sees that the damn thing is stillborn, does deactivate_locked_super()
      s_active = 1, A drops ->s_umount, B gets it
      A restarts the search and finds the same superblock.  And bumps it ->s_active.
      s_active = 2, B holds ->s_umount, A blocked on trying to get it
      ... and we are in the earlier situation with A and B switched places.
      
      The root cause, of course, is that ->s_active should not grow until we'd
      got MS_BORN.  Then failing ->mount() will have deactivate_locked_super()
      shut the damn thing down.  Fortunately, it's easy to do - the key point
      is that grab_super() is called only for superblocks currently on ->fs_supers,
      so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
      bump ->s_active; we must never increment ->s_count for superblocks past
      ->kill_sb(), but grab_super() is never called for those.
      
      The bug is pretty old; we would've caught it by now, if not for accidental
      exclusion between sget() for block filesystems; the things like cgroup or
      e.g. mtd-based filesystems don't have anything of that sort, so they get
      bitten.  The right way to deal with that is obviously to fix sget()...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      acfec9a5
    • A
      allow O_TMPFILE to work with O_WRONLY · ba57ea64
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ba57ea64
    • S
      Btrfs: fix wrong write offset when replacing a device · 115930cb
      Stefan Behrens 提交于
      Miao Xie reported the following issue:
      
      The filesystem was corrupted after we did a device replace.
      
      Steps to reproduce:
       # mkfs.btrfs -f -m single -d raid10 <device0>..<device3>
       # mount <device0> <mnt>
       # btrfs replace start -rfB 1 <device4> <mnt>
       # umount <mnt>
       # btrfsck <device4>
      
      The reason for the issue is that we changed the write offset by mistake,
      introduced by commit 625f1c8d.
      
      We read the data from the source device at first, and then write the
      data into the corresponding place of the new device. In order to
      implement the "-r" option, the source location is remapped using
      btrfs_map_block(). The read takes place on the mapped location, and
      the write needs to take place on the unmapped location. Currently
      the write is using the mapped location, and this commit changes it
      back by undoing the change to the write address that the aforementioned
      commit added by mistake.
      Reported-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: <stable@vger.kernel.org> # 3.10+
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      115930cb
    • J
      Btrfs: re-add root to dead root list if we stop dropping it · d29a9f62
      Josef Bacik 提交于
      If we stop dropping a root for whatever reason we need to add it back to the
      dead root list so that we will re-start the dropping next transaction commit.
      The other case this happens is if we recover a drop because we will add a root
      without adding it to the fs radix tree, so we can leak it's root and commit root
      extent buffer, adding this to the dead root list makes this cleanup happen.
      Thanks,
      
      Cc: stable@vger.kernel.org
      Reported-by: NAlex Lyakas <alex.btrfs@zadarastorage.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      d29a9f62
    • J
      Btrfs: fix lock leak when resuming snapshot deletion · fec386ac
      Josef Bacik 提交于
      We aren't setting path->locks[level] when we resume a snapshot deletion which
      means we won't unlock the buffer when we free the path.  This causes deadlocks
      if we happen to re-allocate the block before we've evicted the extent buffer
      from cache.  Thanks,
      
      Cc: stable@vger.kernel.org
      Reported-by: NAlex Lyakas <alex.btrfs@zadarastorage.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fec386ac
    • J
      Btrfs: update drop progress before stopping snapshot dropping · 3c8f2422
      Josef Bacik 提交于
      Alex pointed out a problem and fix that exists in the drop one snapshot at a
      time patch.  If we decide we need to exit for whatever reason (umount for
      example) we will just exit the snapshot dropping without updating the drop
      progress.  So the next time we go to resume we will BUG_ON() because we can't
      find the extent we left off at because we never updated it.  This patch fixes
      the problem.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAlex Lyakas <alex.btrfs@zadarastorage.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      3c8f2422
  9. 18 7月, 2013 2 次提交
  10. 17 7月, 2013 2 次提交
  11. 16 7月, 2013 1 次提交
    • T
      ext4: call ext4_es_lru_add() after handling cache miss · 63b99968
      Theodore Ts'o 提交于
      If there are no items in the extent status tree, ext4_es_lru_add() is
      a no-op.  So it is not sufficient to call ext4_es_lru_add() before we
      try to lookup an entry in the extent status tree.  We also need to
      call it at the end of ext4_ext_map_blocks(), after items have been
      added to the extent status tree.
      
      This could lead to inodes with that have extent status trees but which
      are not in the LRU list, which means they won't get considered for
      eviction by the es_shrinker.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
      63b99968