1. 18 Dec 2013: 1 commit
    • ext4: fix deadlock when writing in ENOSPC conditions · 34cf865d
      Jan Kara committed
      Akira-san has been reporting rare deadlocks on his machine when running
      xfstests test 269 on ext4 filesystem. The problem turned out to be in
      ext4_da_reserve_metadata() and ext4_da_reserve_space() which called
      ext4_should_retry_alloc() while holding i_data_sem. Since
      ext4_should_retry_alloc() can force a transaction commit, this is a
      lock ordering violation and leads to deadlocks.
      
      Fix the problem by just removing the retry loops. These functions should
      just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that
      function must take care of retrying after dropping all necessary locks.
      
      Reported-and-tested-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
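A minimal C sketch of the locking rule above. It assumes a userspace pthread mutex stands in for i_data_sem. The function that runs under the lock only reports ENOSPC, and the caller retries after unlocking. The names reserve_space_locked(), should_retry_alloc() and write_begin(), and the block counters, are hypothetical and are not ext4 code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for i_data_sem and the free-block counter. */
static pthread_mutex_t data_sem = PTHREAD_MUTEX_INITIALIZER;
static long free_blocks = 0;
static int commits = 0;   /* counts simulated transaction commits */

/* Runs with data_sem held: never waits or retries, just reports ENOSPC. */
static int reserve_space_locked(long nblocks)
{
    if (free_blocks < nblocks)
        return -ENOSPC;
    free_blocks -= nblocks;
    return 0;
}

/* Hypothetical stand-in for ext4_should_retry_alloc(): it may "commit" a
 * transaction that frees blocks, so it must run with no locks held. */
static bool should_retry_alloc(int *retries)
{
    if (++*retries > 3)
        return false;
    commits++;
    free_blocks += 4;     /* pretend the commit released some blocks */
    return true;
}

/* Caller-side retry loop, in the spirit of ext4_da_write_begin(). */
static int write_begin(long nblocks)
{
    int retries = 0;
    int ret;
retry:
    pthread_mutex_lock(&data_sem);
    ret = reserve_space_locked(nblocks);
    pthread_mutex_unlock(&data_sem);
    /* Retry only after every lock has been dropped. */
    if (ret == -ENOSPC && should_retry_alloc(&retries))
        goto retry;
    return ret;
}
```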
  2. 09 Dec 2013: 5 commits
  3. 04 Dec 2013: 2 commits
    • ext4: check for overlapping extents in ext4_valid_extent_entries() · 5946d089
      Eryu Guan committed
      A corrupted ext4 may have out-of-order leaf extents, e.g.:
      
      extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
      extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
                   ^^^^ overlap with previous extent
      
      Reading such extent could hit BUG_ON() in ext4_es_cache_extent().
      
      	BUG_ON(end < lblk);
      
      The problem is that __read_extent_tree_block() tries to cache holes as
      well, but assumes 'lblk' is greater than 'prev' and passes an underflowed
      length to ext4_es_cache_extent(). Fix it by checking for overlapping
      extents in ext4_valid_extent_entries().
      
      I hit this when fuzz testing ext4, and am able to reproduce it by
      modifying the on-disk extent by hand.
      
      Also add a check for (ee_block + len - 1) in ext4_valid_extent() to
      make sure the value does not overflow.
      
      Ran xfstests on the patched ext4; no regressions.
      
      Cc: Lukáš Czerner <lczerner@redhat.com>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
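A simplified C sketch of the two checks described above. It assumes a hypothetical struct extent with a 32-bit start block and a 16-bit length, which is not the on-disk struct ext4_extent. First, an extent's last block must not wrap around. Second, each extent must start after the previous one ends.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record (not the on-disk ext4_extent). */
struct extent {
    uint32_t lblk;   /* first logical block */
    uint16_t len;    /* number of blocks */
};

/* One extent: reject zero length and a last block that wraps past 2^32. */
static bool valid_extent(const struct extent *ex)
{
    uint32_t last = ex->lblk + ex->len - 1;   /* unsigned wrap is defined */
    return ex->len != 0 && last >= ex->lblk;
}

/* Leaf entries must each be valid and be sorted with no overlap. */
static bool valid_extent_entries(const struct extent *ex, size_t n)
{
    uint32_t prev_last = 0;
    for (size_t i = 0; i < n; i++) {
        if (!valid_extent(&ex[i]))
            return false;
        /* Every extent after the first must start past the previous end. */
        if (i > 0 && ex[i].lblk <= prev_last)
            return false;
        prev_last = ex[i].lblk + ex[i].len - 1;
    }
    return true;
}
```

With the extents from the commit message, {0, 1024} followed by {1000, 1024} is rejected because 1000 is not past block 1023.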
    • ext4: fix use-after-free in ext4_mb_new_blocks · 4e8d2139
      Junho Ryu committed
      ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
      While ext4_mb_use_preallocated checks pa->pa_deleted first and then
      increments pa->pa_count later, ext4_mb_put_pa decrements pa->pa_count
      before taking pa->pa_lock and then sets pa->pa_deleted.
      
      * Free sequence
      ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
      ext4_mb_put_pa (2):		lock pa->pa_lock
      ext4_mb_put_pa (3):			check pa->pa_deleted
      ext4_mb_put_pa (4):			set pa->pa_deleted=1
      ext4_mb_put_pa (5):		unlock pa->pa_lock
      ext4_mb_put_pa (6):		remove pa from a list
      ext4_mb_pa_callback:		free pa
      
      * Use sequence
      ext4_mb_use_preallocated (1):	iterate over preallocation
      ext4_mb_use_preallocated (2):	lock pa->pa_lock
      ext4_mb_use_preallocated (3):		check pa->pa_deleted
      ext4_mb_use_preallocated (4):		increase pa->pa_count
      ext4_mb_use_preallocated (5):	unlock pa->pa_lock
      ext4_mb_release_context:	access pa
      
      * Use-after-free sequence
      [initial status]		<pa->pa_deleted = 0, pa_count = 1>
      ext4_mb_use_preallocated (1):	iterate over preallocation
      ext4_mb_use_preallocated (2):	lock pa->pa_lock
      ext4_mb_use_preallocated (3):		check pa->pa_deleted
      ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
      [pa_count decremented]		<pa->pa_deleted = 0, pa_count = 0>
      ext4_mb_use_preallocated (4):		increase pa->pa_count
      [pa_count incremented]		<pa->pa_deleted = 0, pa_count = 1>
      ext4_mb_use_preallocated (5):	unlock pa->pa_lock
      ext4_mb_put_pa (2):		lock pa->pa_lock
      ext4_mb_put_pa (3):			check pa->pa_deleted
      ext4_mb_put_pa (4):			set pa->pa_deleted=1
      [race condition!]		<pa->pa_deleted = 1, pa_count = 1>
      ext4_mb_put_pa (5):		unlock pa->pa_lock
      ext4_mb_put_pa (6):		remove pa from a list
      ext4_mb_pa_callback:		free pa
      ext4_mb_release_context:	access pa
      
      AddressSanitizer has detected a use-after-free in ext4_mb_new_blocks.
      Bug report: http://goo.gl/rG1On3
      
      Signed-off-by: Junho Ryu <jayr@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
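A C sketch of the ordering the fix needs. It assumes a pthread mutex in place of pa->pa_lock. The decrement, the zero test and the deleted flag are all handled under the same lock that the use side takes, so the interleaving in the use-after-free sequence can no longer happen. The struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified preallocation descriptor. */
struct prealloc {
    pthread_mutex_t lock;  /* plays the role of pa->pa_lock */
    int count;             /* plays the role of pa->pa_count */
    bool deleted;          /* plays the role of pa->pa_deleted */
};

/* Use side: take a reference only if the descriptor is not being freed. */
static bool use_prealloc(struct prealloc *pa)
{
    bool ok = false;
    pthread_mutex_lock(&pa->lock);
    if (!pa->deleted) {
        pa->count++;
        ok = true;
    }
    pthread_mutex_unlock(&pa->lock);
    return ok;
}

/* Put side with the fixed ordering: decrement, zero test and setting
 * "deleted" happen under the lock, so a concurrent use_prealloc() either
 * sees deleted == true or its increment keeps the count above zero.
 * Returns true if the caller must free pa. */
static bool put_prealloc(struct prealloc *pa)
{
    bool last;
    pthread_mutex_lock(&pa->lock);
    last = (--pa->count == 0) && !pa->deleted;
    if (last)
        pa->deleted = true;
    pthread_mutex_unlock(&pa->lock);
    return last;
}
```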
  4. 02 Dec 2013: 1 commit
    • ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails · ae1495b1
      Theodore Ts'o committed
      While it's true that errors can only happen if there is a bug in
      jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
      the kernel or remount the file system read-only in order to avoid
      further data loss.  The ext4_journal_abort_handle() function doesn't
      do any of this, and while this call (since it doesn't adjust
      refcounts) will likely result in the file system eventually
      deadlocking, since the current transaction will never be able to close,
      it's much cleaner to let ext4's error handling system deal with
      this situation.
      
      There's a separate bug here, which is that if certain jbd2 errors
      occur and the file system is mounted errors=continue, the file
      system will probably eventually grind to a halt as described
      above.  But things have been this way for a long time, and usually when
      we have these sorts of errors it's pretty much a disaster, and
      that's why the jbd2 layer aggressively retries memory allocations,
      which is the most likely cause of these jbd2 errors.
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
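A rough C sketch of routing a failed metadata update into a central error handler that honours the errors= mount option, instead of silently aborting the handle. The enum values, struct fs_state, handle_fs_error() and dirty_metadata() are hypothetical models, not the ext4 or jbd2 API. Where a kernel would panic or remount read-only, this sketch only records the choice.

```c
#include <stdbool.h>

/* Hypothetical model of the errors= mount option. */
enum errors_behavior { ERRORS_CONTINUE, ERRORS_RO, ERRORS_PANIC };
enum action { ACT_NONE, ACT_REMOUNT_RO, ACT_PANIC };

struct fs_state {
    enum errors_behavior errors;
    bool read_only;
};

/* Central error handler: the one place that decides how to react. */
static enum action handle_fs_error(struct fs_state *fs)
{
    switch (fs->errors) {
    case ERRORS_RO:
        fs->read_only = true;   /* stop further writes to limit damage */
        return ACT_REMOUNT_RO;
    case ERRORS_PANIC:
        return ACT_PANIC;       /* a real kernel would halt here */
    case ERRORS_CONTINUE:
    default:
        return ACT_NONE;        /* keep going, as errors=continue asks */
    }
}

/* Hypothetical wrapper around a journal "dirty metadata" call. */
static int dirty_metadata(struct fs_state *fs, int journal_err)
{
    if (journal_err)
        handle_fs_error(fs);
    return journal_err;
}
```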
  5. 29 Nov 2013: 1 commit
  6. 28 Nov 2013: 2 commits
  7. 25 Nov 2013: 1 commit
    • [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload · f19e84df
      Steve French committed
      Change cifs.ko to use CIFS_IOCTL_COPYCHUNK instead
      of BTRFS_IOC_CLONE to avoid confusion about whether
      copy-on-write is required or optional for this operation.
      
      SMB2/SMB3 copy offload had used the BTRFS_IOC_CLONE ioctl since
      they both speed up copy by offloading the copy rather than
      passing many read and write requests back and forth and both have
      identical syntax (passing file handles), but for SMB2/SMB3
      CopyChunk the server is not required to use copy-on-write
      to make a copy of the file (although some do), and Christoph
      has commented that since CopyChunk does not require
      copy-on-write we should not reuse BTRFS_IOC_CLONE.
      
      This patch renames the ioctl to use a cifs-specific ioctl,
      CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
      for SMB2/SMB3, since large file copies over the network can otherwise
      be very slow; with it they are often more than 100 times
      faster, putting less load on server and client.
      
      Note that if a copy syscall is ever introduced, depending on
      its requirements/format it could end up using one of the other
      three methods that CIFS/SMB2/SMB3 can do for copy offload,
      but this method is particularly useful for file copy
      and broadly supported (not just by Samba server).
      
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: David Disseldorp <ddiss@samba.org>
  8. 24 Nov 2013: 8 commits
    • ceph: allocate non-zero page to fscache in readpage() · ff638b7d
      Li Wang committed
      ceph_osdc_readpages() returns the number of bytes read, but currently
      the code only stores full-zero pages in fscache; this patch
      fixes that.
      
      Signed-off-by: Li Wang <liwang@ubuntukylin.com>
      Reviewed-by: Milosz Tanski <milosz@adfin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
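A small C sketch of a readpage completion that treats the return value as a byte count. It assumes a hypothetical 16-byte PAGE_SZ and uses a to_cache flag in place of the actual fscache call. A short read has its tail zero-filled, and any successful read is marked for caching. This illustrates the idea; it is not the ceph code.

```c
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ 16   /* hypothetical tiny page so the example is easy to check */

/* `ret` is bytes read, or a negative errno. On success the unread tail is
 * zero-filled and the page is marked for caching whatever the byte count.
 * Returns 0 or the negative error. */
static int finish_readpage(char *page, int ret, bool *to_cache)
{
    *to_cache = false;
    if (ret < 0)
        return ret;                              /* read failed: never cache */
    if (ret < PAGE_SZ)
        memset(page + ret, 0, PAGE_SZ - ret);    /* zero-fill the tail */
    *to_cache = true;                            /* cache real data too */
    return 0;
}
```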
    • ceph: wake up 'safe' waiters when unregistering request · fc55d2c9
      Yan, Zheng committed
      We also need to wake up 'safe' waiters if an error occurs or the
      request is aborted. Otherwise sync(2)/fsync(2) may hang forever.
      
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
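A pthread sketch of why unregistering must wake both kinds of waiters. It assumes a hypothetical request with a "reply" stage and a "safe" stage, and wait_safe() stands in for the fsync(2) path. If unregister_request() set only got_reply on error or abort, wait_safe() would block forever.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request with two completion stages. */
struct request {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool got_reply;
    bool got_safe;
    int result;
};

/* Unregistering for any reason (success, error, abort) completes both
 * stages and wakes every waiter. */
static void unregister_request(struct request *req, int result)
{
    pthread_mutex_lock(&req->lock);
    req->result = result;
    req->got_reply = true;
    req->got_safe = true;
    pthread_cond_broadcast(&req->cond);
    pthread_mutex_unlock(&req->lock);
}

/* Blocks until the request is safe, then returns its result. */
static int wait_safe(struct request *req)
{
    int r;
    pthread_mutex_lock(&req->lock);
    while (!req->got_safe)
        pthread_cond_wait(&req->cond, &req->lock);
    r = req->result;
    pthread_mutex_unlock(&req->lock);
    return r;
}
```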
    • ceph: cleanup aborted requests when re-sending requests. · eb1b8af3
      Yan, Zheng committed
      Aborted requests usually get cleared when the reply is received.
      If the MDS crashes, no reply will be received, so we need to clean up
      aborted requests when re-sending requests.
      
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Greg Farnum <greg@inktank.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
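A C sketch of the resend-time sweep. It assumes pending requests sit on a singly linked list. struct req, its resent flag and resend_requests() are hypothetical, and setting resent stands in for actually re-sending the request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-request node; only the fields the sketch needs. */
struct req {
    int tid;
    bool aborted;
    bool resent;
    struct req *next;
};

/* Aborted requests are normally reaped when their reply arrives, but after
 * an MDS crash no reply will come, so unlink them here and re-send only the
 * live ones. Returns the number of requests re-sent. */
static int resend_requests(struct req **head)
{
    int sent = 0;
    struct req **pp = head;

    while (*pp) {
        struct req *r = *pp;
        if (r->aborted) {
            *pp = r->next;   /* unlink; freeing is left to the owner */
            continue;
        }
        r->resent = true;    /* stand-in for sending it again */
        sent++;
        pp = &r->next;
    }
    return sent;
}
```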
    • ceph: handle race between cap reconnect and cap release · 99a9c273
      Yan, Zheng committed
      A cap may get released while the cap reconnect message is being
      composed. In that case we should skip queuing the release message if
      the cap hasn't been added to the cap reconnect message.
      
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: set caps count after composing cap reconnect message · 44c99757
      Yan, Zheng committed
      It's possible that some caps get released while composing the cap
      reconnect message.
      
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
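A C sketch of setting the count only after the message is composed. It assumes a hypothetical layout of a 32-bit count followed by 32-bit cap ids in host byte order, which differs from the real ceph encoding. The count slot is reserved first and patched once the surviving caps are known, so caps released mid-composition are neither encoded nor counted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reserve the count field, append surviving caps, then write the real
 * count. Returns the number of bytes written to buf. */
static size_t compose_reconnect(uint8_t *buf, const uint32_t *caps,
                                const int *released, size_t ncaps)
{
    size_t off = sizeof(uint32_t);   /* leave room for the count */
    uint32_t count = 0;

    for (size_t i = 0; i < ncaps; i++) {
        if (released[i])             /* cap went away meanwhile: skip */
            continue;
        memcpy(buf + off, &caps[i], sizeof caps[i]);
        off += sizeof caps[i];
        count++;
    }
    memcpy(buf, &count, sizeof count);   /* patch the header last */
    return off;
}
```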
    • ceph: queue cap release in __ceph_remove_cap() · a096b09a
      Yan, Zheng committed
      Call __queue_cap_release() in __ceph_remove_cap(); this avoids
      acquiring s_cap_lock twice.
      
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • sysfs: use a separate locking class for open files depending on mmap · 027a485d
      Tejun Heo committed
      The following two commits implemented mmap support in the regular file
      path and merged bin file support into the regular path.
      
       73d97146 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
       3124eb16 ("sysfs: merge regular and bin file handling")
      
      After the merge, the following commands trigger a spurious lockdep
      warning.  "test-mmap-read" simply mmaps the file and dumps the
      content.
      
        $ cat /sys/block/sda/trace/act_mask
        $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.12.0-work+ #378 Not tainted
        -------------------------------------------------------
        test-mmap-read/567 is trying to acquire lock:
         (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&mm->mmap_sem){++++++}:
        ...
        -> #2 (sr_mutex){+.+.+.}:
        ...
        -> #1 (&bdev->bd_mutex){+.+.+.}:
        ...
        -> #0 (&of->mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
         &of->mutex --> sr_mutex --> &mm->mmap_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(sr_mutex);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex);
      
         *** DEADLOCK ***
      
        1 lock held by test-mmap-read/567:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        stack backtrace:
        CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
         ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
         0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
        Call Trace:
         [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
         [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
         [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
         [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
         [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
         [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
         [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
         [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
         [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
         [<ffffffff81008282>] SyS_mmap+0x22/0x30
         [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
      
      This happens because one file nests sr_mutex, which itself nests
      mm->mmap_sem, under of->mutex, while the mmap implementation naturally
      nests of->mutex under mm->mmap_sem.  The warning is a false positive, as
      of->mutex is per open file and the two paths belong to two different
      files.  This warning didn't trigger before regular and bin file
      support was merged, because only bin files supported mmap and the
      other side of the locking happened only on regular files, which used
      equivalent but separate locking.
      
      It would be best to give each file a separate locking class, but we
      can't easily do that.  Let's differentiate on ->mmap() for now.  Later
      we'll add an explicit file operations struct and can add a per-ops
      lockdep key there.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: handle duplicate removal attempts in sysfs_remove_group() · 54d71145
      Mika Westerberg committed
      Commit bcdde7e2 (sysfs: make __sysfs_remove_dir() recursive) changed
      the behavior so that directory removals will be done recursively. This
      means that the sysfs group might already be removed if its parent directory
      has been removed.
      
      The current code outputs warnings similar to the following log snippet
      when it detects that there is no group for the given kobject:
      
       WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
       sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
       Modules linked in:
       CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
       Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
       Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
        ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
        ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
       Call Trace:
        [<ffffffff817daab1>] dump_stack+0x45/0x56
        [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
        [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
        [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
        [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
        [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
        [<ffffffff8142a0d0>] device_del+0x40/0x1b0
        [<ffffffff8142a24d>] device_unregister+0xd/0x20
        [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
        [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
        [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
        [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
        [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
        [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
        [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
        [<ffffffff8142a1b1>] device_del+0x121/0x1b0
        [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
        [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
        [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
        [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
        [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
        [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
        [<ffffffff8105db29>] worker_thread+0x119/0x390
        [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
        [<ffffffff81063a5d>] kthread+0xcd/0xf0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
      
      On this particular machine I see ~16 of these messages during
      Thunderbolt hot-unplug.
      
      Fix this in a similar way to what was done for sysfs_remove_one(), by
      checking whether the parent directory has already been removed and
      bailing out early.
      
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
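A C sketch of the early bail-out. It assumes a hypothetical dirent_node with a removed flag that the recursive directory removal sets. remove_group() returns quietly when the parent is already gone and only warns when a live directory genuinely has no group. This illustrates the idea; it is not the sysfs implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical directory node. */
struct dirent_node {
    bool removed;   /* already torn down by a recursive removal */
    int ngroups;    /* hypothetical count of attribute groups */
};

static int warnings = 0;

static void remove_group(struct dirent_node *dir)
{
    /* Parent already removed together with its children: nothing to do. */
    if (dir == NULL || dir->removed)
        return;
    if (dir->ngroups == 0) {
        warnings++;
        fprintf(stderr, "sysfs group not found\n");
        return;
    }
    dir->ngroups--;
}
```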
  9. 22 Nov 2013: 2 commits
    • configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi committed
      There is a race window in configfs that starts when a dentry is
      unhashed and ends before configfs_d_iput() is called.  If a lookup
      happens in this window, a new dentry is allocated because the original
      dentry was unhashed, and configfs_attach_attr() then updates
      sd->s_dentry to point to the new dentry.  Then, in configfs_d_iput(),
      BUG_ON(sd->s_dentry != dentry) is triggered and the system panics.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                  but sd->dentry still points
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix it, change configfs_d_iput() not to update sd->s_dentry if
      sd->s_count > 2, which means that another dentry is using the sd
      besides the one that is being put.  Use configfs_dirent_lock in
      configfs_attach_attr() to synchronize with configfs_d_iput().
      
      The bug can be reproduced with the following steps:
      
      1. Enable ocfs2; this mounts configfs at /sys/kernel/config and
         populates it with the cluster configuration.
      
      2. Run the following script:
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
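A pthread sketch of the fixed put path. It assumes a hypothetical cfs_dirent whose s_count holds one base reference plus one per dentry; in this model, that is where the threshold of 2 comes from. d_iput() clears s_dentry only when no other dentry still uses the dirent, and both sides take the same lock. The names are illustrative, not the configfs source.

```c
#include <pthread.h>
#include <stddef.h>

struct dentry { int id; };   /* hypothetical, only used by address */

struct cfs_dirent {
    int s_count;               /* base reference + one per dentry */
    struct dentry *s_dentry;   /* dentry currently representing it */
};

static pthread_mutex_t dirent_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup side: a new dentry takes a reference and becomes current. */
static void attach_attr(struct cfs_dirent *sd, struct dentry *d)
{
    pthread_mutex_lock(&dirent_lock);
    sd->s_count++;
    sd->s_dentry = d;
    pthread_mutex_unlock(&dirent_lock);
}

/* Put side: instead of asserting s_dentry == the dentry being put, clear
 * s_dentry only when this dentry is the dirent's sole user. */
static void d_iput(struct cfs_dirent *sd)
{
    pthread_mutex_lock(&dirent_lock);
    if (sd->s_count <= 2)
        sd->s_dentry = NULL;
    sd->s_count--;
    pthread_mutex_unlock(&dirent_lock);
}
```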
    • GFS2: Fix ref count bug relating to atomic_open · ea0341e0
      Steven Whitehouse committed
      When atomic_open calls finish_no_open() with the dentry that was
      supplied to gfs2_atomic_open(), an extra reference is required.
      This patch fixes that issue, preventing a bug trap from triggering
      at umount time.
      
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
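A C sketch of the reference-count rule. It assumes that whatever dentry is passed to a hypothetical finish_no_open_stub() becomes owned by the open file and is dropped again at close. When the supplied dentry is reused, an extra dget() keeps the caller's own reference intact. Without it, the caller's final dput() would drive the count below zero. All names here are hypothetical stand-ins, not the VFS or GFS2 API.

```c
#include <stddef.h>

/* Hypothetical refcounted dentry; only the count matters here. */
struct dentry { int d_count; };

static struct dentry *dget(struct dentry *d) { d->d_count++; return d; }
static void dput(struct dentry *d) { d->d_count--; }

/* Hypothetical open file: owns one reference on its dentry. */
struct file_stub { struct dentry *dentry; };

static void finish_no_open_stub(struct file_stub *f, struct dentry *d)
{
    f->dentry = d;   /* takes over the reference it is given */
}

static void close_stub(struct file_stub *f)
{
    dput(f->dentry);
    f->dentry = NULL;
}

/* If lookup produced no new dentry, the supplied one is handed over, so
 * take an extra reference first; the caller still drops its own later. */
static void atomic_open_stub(struct file_stub *f, struct dentry *supplied,
                             struct dentry *looked_up)
{
    struct dentry *d = looked_up ? looked_up : dget(supplied);
    finish_no_open_stub(f, d);
}
```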
  10. 21 Nov 2013: 17 commits