1. 07 1月, 2014 1 次提交
  2. 19 12月, 2013 1 次提交
  3. 13 12月, 2013 2 次提交
    • J
      procfs: also fix proc_reg_get_unmapped_area() for !MMU case · ae5758a1
      Jan Beulich 提交于
      Commit fad1a86e ("procfs: call default get_unmapped_area on
      MMU-present architectures"), as its title says, took care of only the
      MMU case, leaving the !MMU side still in the regressed state (returning
      -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).
      
      From the fad1a86e changelog:
      
       "Commit c4fe2448 ("sparc: fix PCI device proc file mmap(2)") added
        proc_reg_get_unmapped_area in proc_reg_file_ops and
        proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
        get_unmapped_area method is not defined for the target procfs file, which
        causes regression of mmap on /proc/vmcore.
      
        To address this issue, like get_unmapped_area(), call default
        current->mm->get_unmapped_area on MMU-present architectures if
        pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
        in the procfs file, is not defined"
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ae5758a1
    • W
      dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce
      Will Deacon 提交于
      When explicitly hashing the end of a string with the word-at-a-time
      interface, we have to be careful which end of the word we pick up.
      
      On big-endian CPUs, the upper-bits will contain the data we're after, so
      ensure we generate our masks accordingly (and avoid hashing whatever
      random junk may have been sitting after the string).
      
      This patch adds a new dcache helper, bytemask_from_count, which creates
      a mask appropriate for the CPU endianness.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5c21dce
  4. 12 12月, 2013 7 次提交
    • D
      Btrfs: fix access_ok() check in btrfs_ioctl_send() · 700ff4f0
      Dan Carpenter 提交于
      The closing parenthesis is in the wrong place.  We want to check
      "sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
      "sizeof(*arg->clone_sources * arg->clone_sources_count)".
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      700ff4f0
    • W
      Btrfs: make sure we cleanup all reloc roots if error happens · 467bb1d2
      Wang Shilong 提交于
      I hit an oops when merging reloc roots fails, the reason is that
      new reloc roots may be added and we should make sure we cleanup
      all reloc roots.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      467bb1d2
    • W
      Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation · 66463748
      Wang Shilong 提交于
      Quota tree and UUID Tree is only cowed, they can not be snapshoted.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      66463748
    • W
      Btrfs: fix an oops when doing balance relocation · c974c464
      Wang Shilong 提交于
      I hit an oops when inserting reloc root into @reloc_root_tree(it can be
      easily triggered when forcing cow for relocation root)
      
      [  866.494539]  [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs]
      [  866.495321]  [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs]
      [  866.496109]  [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs]
      [  866.496908]  [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs]
      [  866.497703]  [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs]
      
      This is because reloc root inserted into @reloc_root_tree is not within one
      transaction,reloc root may be cowed and root block bytenr will be reused then
      oops happens.We should update reloc root in @reloc_root_tree when cow reloc
      root node, fix it.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c974c464
    • F
      Btrfs: don't miss skinny extent items on delayed ref head contention · 639eefc8
      Filipe David Borba Manana 提交于
      Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup
      of skinny extent items. This can happen when the execution flow is the
      following:
      
      * We do an extent tree lookup and fail to find a skinny extent item;
      
      * As a result, we attempt to see if a non-skinny extent item exists,
        either by looking at previous item in the leaf or by doing another
        full extent tree search;
      
      * We have a transaction and then we check for a matching delayed ref
        head in the transaction's delayed refs rbtree;
      
      * We find such delayed ref head and then we try to lock it with a
        call to mutex_trylock();
      
      * The lock was contended so we jump to the label "again", which repeats
        the extent tree search but for a non-skinny extent item, because we set
        previously metadata variable to 0 and the search key to look for a
        non-skinny extent-item;
      
      * After the jump (and after releasing the transaction's delayed refs
        lock), a skinny extent item might have been added to the extent tree
        but we will miss it because metadata is set to 0 and the search key
        is set for a non-skinny extent-item.
      
      The fix here is to not reset metadata to 0 and to jump to the initial search
      key setup if the delayed ref head is contended, instead of jumping directly
      to the extent tree search label ("again").
      
      This issue was found while investigating the issue reported at Bugzilla 64961.
      
      David Sterba suspected this function was missing extent items, and that
      this could be caused by the last change to this function, which was made
      in the following patch:
      
          [PATCH] Btrfs: optimize btrfs_lookup_extent_info()
          (commit 74be9510)
      
      But in fact this issue already existed before, because after failing to find
      a skinny extent item, the code set the search key for a non-skinny extent
      item, and on contention of a matching delayed ref head it would not search
      the extent tree for a skinny extent item anymore.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      639eefc8
    • D
      btrfs: call mnt_drop_write after interrupted subvol deletion · e43f998e
      David Sterba 提交于
      If btrfs_ioctl_snap_destroy blocks on the mutex and the process is
      killed, mnt_write count is unbalanced and leads to unmountable
      filesystem.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      e43f998e
    • M
      Btrfs: don't clear the default compression type · a7e252af
      Miao Xie 提交于
      We met a oops caused by the wrong compression type:
      [  556.512356] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  556.512370] IP: [<ffffffff811dbaa0>] __list_del_entry+0x1/0x98
      [SNIP]
      [  556.512490]  [<ffffffff811dbb44>] ? list_del+0xd/0x2b
      [  556.512539]  [<ffffffffa05dd5ce>] find_workspace+0x97/0x175 [btrfs]
      [  556.512546]  [<ffffffff813c14b5>] ? _raw_spin_lock+0xe/0x10
      [  556.512576]  [<ffffffffa05de276>] btrfs_compress_pages+0x2d/0xa2 [btrfs]
      [  556.512601]  [<ffffffffa05af060>] compress_file_range.constprop.54+0x1f2/0x4e8 [btrfs]
      [  556.512627]  [<ffffffffa05af388>] async_cow_start+0x32/0x4d [btrfs]
      [  556.512655]  [<ffffffffa05cc7a1>] worker_loop+0x144/0x4c3 [btrfs]
      [  556.512661]  [<ffffffff81059404>] ? finish_task_switch+0x80/0xb8
      [  556.512689]  [<ffffffffa05cc65d>] ? btrfs_queue_worker+0x244/0x244 [btrfs]
      [  556.512695]  [<ffffffff8104fa4e>] kthread+0x8d/0x95
      [  556.512699]  [<ffffffff81050000>] ? bit_waitqueue+0x34/0x7d
      [  556.512704]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
      [  556.512709]  [<ffffffff813c7eec>] ret_from_fork+0x7c/0xb0
      [  556.512713]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
      
      Steps to reproduce:
       # mkfs.btrfs -f <dev>
       # mount -o nodatacow <dev> <mnt>
       # touch <mnt>/<file>
       # chattr =c <mnt>/<file>
       # dd if=/dev/zero of=<mnt>/<file> bs=1M count=10
      
      It is because we cleared the default compression type when setting the
      nodatacow. In fact, we needn't do it because we have used COMPRESS flag to
      indicate if we need compressed the file data or not, needn't use the
      variant -- compress_type -- in btrfs_info to do the same thing, and just
      use it to hold the default compression type. Or we would get a wrong compress
      type for a file whose own compress flag is set but the compress flag of its
      filesystem is not set.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a7e252af
  5. 11 12月, 2013 3 次提交
    • J
      nfsd: when reusing an existing repcache entry, unhash it first · 781c2a5a
      Jeff Layton 提交于
      The DRC code will attempt to reuse an existing, expired cache entry in
      preference to allocating a new one. It'll then search the cache, and if
      it gets a hit it'll then free the cache entry that it was going to
      reuse.
      
      The cache code doesn't unhash the entry that it's going to reuse
      however, so it's possible for it end up designating an entry for reuse
      and then subsequently freeing the same entry after it finds it.  This
      leads it to a later use-after-free situation and usually some list
      corruption warnings or an oops.
      
      Fix this by simply unhashing the entry that we intend to reuse. That
      will mean that it's not findable via a search and should prevent this
      situation from occurring.
      
      Cc: stable@vger.kernel.org # v3.10+
      Reported-by: NChristoph Hellwig <hch@infradead.org>
      Reported-by: Ng. artim <gartim@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      781c2a5a
    • D
      xfs: growfs overruns AGFL buffer on V4 filesystems · f94c4457
      Dave Chinner 提交于
      This loop in xfs_growfs_data_private() is incorrect for V4
      superblocks filesystems:
      
      		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
      			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
      
      For V4 filesystems, we don't have a agfl header structure, and so
      XFS_AGFL_SIZE() returns an entire sector's worth of entries, which
      we then index from an offset into the sector. Hence: buffer overrun.
      
      This problem was introduced in 3.10 by commit 77c95bba ("xfs: add
      CRC checks to the AGFL") which changed the AGFL structure but failed
      to update the growfs code to handle the different structures.
      
      Fix it by using the correct offset into the buffer for both V4 and
      V5 filesystems.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit b7d961b3)
      f94c4457
    • J
      xfs: don't perform discard if the given range length is less than block size · 2f42d612
      Jie Liu 提交于
      For discard operation, we should return EINVAL if the given range length
      is less than a block size, otherwise it will go through the file system
      to discard data blocks as the end range might be evaluated to -1, e.g,
      # fstrim -v -o 0 -l 100 /xfs7
      /xfs7: 9811378176 bytes were trimmed
      
      This issue can be triggered via xfstests/generic/288.
      
      Also, it seems to get the request queue pointer via bdev_get_queue()
      instead of the hard code pointer dereference is not a bad thing.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit f9fd0135)
      2f42d612
  6. 10 12月, 2013 1 次提交
  7. 06 12月, 2013 1 次提交
  8. 05 12月, 2013 2 次提交
    • H
      nfs: fix do_div() warning by instead using sector_div() · 3873d064
      Helge Deller 提交于
      When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
      shown below.  Fix this warning by instead using sector_div() which is provided
      by the kernel.h header file.
      
      fs/nfs/blocklayout/extents.c: In function ‘normalize’:
      include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
      nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
      fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
      include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
       extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      3873d064
    • T
      NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery · f22e5edd
      Trond Myklebust 提交于
      Andy Adamson reports:
      
      The state manager is recovering expired state and recovery OPENs are being
      processed. If kswapd is pruning inodes at the same time, a deadlock can occur
      when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
      resultant layoutreturn gets an error that the state mangager is to handle,
      causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
      
      At the same time an open is waiting for the inode deletion to complete in
      __wait_on_freeing_inode.
      
      If the open is either the open called by the state manager, or an open from
      the same open owner that is holding the NFSv4 sequence id which causes the
      OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
      then the state is deadlocked with kswapd.
      
      The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
      We already know that layouts are dropped on all server reboots, and that
      it has to be coded to deal with the "forgetful client model" that doesn't
      send layoutreturns.
      Reported-by: NAndy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.comSigned-off-by: NTrond Myklebust <Trond.Myklebust@primarydata.com>
      f22e5edd
  9. 03 12月, 2013 2 次提交
    • A
      epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled · 95f19f65
      Amit Pundir 提交于
      Drop EPOLLWAKEUP from epoll events mask if CONFIG_PM_SLEEP is disabled.
      Signed-off-by: NAmit Pundir <amit.pundir@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      95f19f65
    • L
      vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229
      Linus Torvalds 提交于
      The pipe code was trying (and failing) to be very careful about freeing
      the pipe info only after the last access, with a pattern like:
      
              spin_lock(&inode->i_lock);
              if (!--pipe->files) {
                      inode->i_pipe = NULL;
                      kill = 1;
              }
              spin_unlock(&inode->i_lock);
              __pipe_unlock(pipe);
              if (kill)
                      free_pipe_info(pipe);
      
      where the final freeing is done last.
      
      HOWEVER.  The above is actually broken, because while the freeing is
      done at the end, if we have two racing processes releasing the pipe
      inode info, the one that *doesn't* free it will decrement the ->files
      count, and unlock the inode i_lock, but then still use the
      "pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
      
      This is *very* hard to trigger in practice, since the race window is
      very small, and adding debug options seems to just hide it by slowing
      things down.
      
      Simon originally reported this way back in July as an Oops in
      kmem_cache_allocate due to a single bit corruption (due to the final
      "spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
      allocation that had re-used the free'd pipe-info), it's taken this long
      to figure out.
      
      Since the 'pipe->files' accesses aren't even protected by the pipe lock
      (we very much use the inode lock for that), the simple solution is to
      just drop the pipe lock early.  And since there were two users of this
      pattern, create a helper function for it.
      
      Introduced commit ba5bb147 ("pipe: take allocation and freeing of
      pipe_inode_info out of ->i_mutex").
      Reported-by: NSimon Kirby <sim@hostway.ca>
      Reported-by: NIan Applegate <ia@cloudflare.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org   # v3.10+
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b0d8d229
  10. 29 11月, 2013 1 次提交
  11. 28 11月, 2013 2 次提交
  12. 25 11月, 2013 2 次提交
    • S
      [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload · f19e84df
      Steve French 提交于
      Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
      of BTRFS_IOC_CLONE to avoid confusion about whether
      copy-on-write is required or optional for this operation.
      
      SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
      they both speed up copy by offloading the copy rather than
      passing many read and write requests back and forth and both have
      identical syntax (passing file handles), but for SMB2/SMB3
      CopyChunk the server is not required to use copy-on-write
      to make a copy of the file (although some do), and Christoph
      has commented that since CopyChunk does not require
      copy-on-write we should not reuse BTRFS_IOC_CLONE.
      
      This patch renames the ioctl to use a cifs specific IOCTL
      CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
      for SMB2/SMB3 since large file copy over the network otherwise
      can be very slow, and with this is often more than 100 times
      faster putting less load on server and client.
      
      Note that if a copy syscall is ever introduced, depending on
      its requirements/format it could end up using one of the other
      three methods that CIFS/SMB2/SMB3 can do for copy offload,
      but this method is particularly useful for file copy
      and broadly supported (not just by Samba server).
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NDavid Disseldorp <ddiss@samba.org>
      f19e84df
    • K
      block: submit_bio_wait() conversions · c170bbb4
      Kent Overstreet 提交于
      It was being open coded in a few places.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c170bbb4
  13. 24 11月, 2013 9 次提交
    • P
      Squashfs: fix failure to unlock pages on decompress error · 6d565409
      Phillip Lougher 提交于
      Direct decompression into the page cache.  If we fall back
      to using an intermediate buffer (because we cannot grab all the
      page cache pages) and we get a decompress fail, we forgot to
      release the pages.
      Reported-by: NRoman Peniaev <r.peniaev@gmail.com>
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      6d565409
    • L
      ceph: allocate non-zero page to fscache in readpage() · ff638b7d
      Li Wang 提交于
      ceph_osdc_readpages() returns number of bytes read, currently,
      the code only allocate full-zero page into fscache, this patch
      fixes this.
      Signed-off-by: NLi Wang <liwang@ubuntukylin.com>
      Reviewed-by: NMilosz Tanski <milosz@adfin.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      ff638b7d
    • Y
      ceph: wake up 'safe' waiters when unregistering request · fc55d2c9
      Yan, Zheng 提交于
      We also need to wake up 'safe' waiters if error occurs or request
      aborted. Otherwise sync(2)/fsync(2) may hang forever.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NSage Weil <sage@inktank.com>
      fc55d2c9
    • Y
      ceph: cleanup aborted requests when re-sending requests. · eb1b8af3
      Yan, Zheng 提交于
      Aborted requests usually get cleared when the reply is received.
      If MDS crashes, no reply will be received. So we need to cleanup
      aborted requests when re-sending requests.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NGreg Farnum <greg@inktank.com>
      Signed-off-by: NSage Weil <sage@inktank.com>
      eb1b8af3
    • Y
      ceph: handle race between cap reconnect and cap release · 99a9c273
      Yan, Zheng 提交于
      When a cap get released while composing the cap reconnect message.
      We should skip queuing the release message if the cap hasn't been
      added to the cap reconnect message.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      99a9c273
    • Y
      ceph: set caps count after composing cap reconnect message · 44c99757
      Yan, Zheng 提交于
      It's possible that some caps get released while composing the cap
      reconnect message.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      44c99757
    • Y
      ceph: queue cap release in __ceph_remove_cap() · a096b09a
      Yan, Zheng 提交于
      call __queue_cap_release() in __ceph_remove_cap(), this avoids
      acquiring s_cap_lock twice.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      a096b09a
    • T
      sysfs: use a separate locking class for open files depending on mmap · 027a485d
      Tejun Heo 提交于
      The following two commits implemented mmap support in the regular file
      path and merged bin file support into the regular path.
      
       73d97146 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
       3124eb16 ("sysfs: merge regular and bin file handling")
      
      After the merge, the following commands trigger a spurious lockdep
      warning.  "test-mmap-read" simply mmaps the file and dumps the
      content.
      
        $ cat /sys/block/sda/trace/act_mask
        $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.12.0-work+ #378 Not tainted
        -------------------------------------------------------
        test-mmap-read/567 is trying to acquire lock:
         (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&mm->mmap_sem){++++++}:
        ...
        -> #2 (sr_mutex){+.+.+.}:
        ...
        -> #1 (&bdev->bd_mutex){+.+.+.}:
        ...
        -> #0 (&of->mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
         &of->mutex --> sr_mutex --> &mm->mmap_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(sr_mutex);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex);
      
         *** DEADLOCK ***
      
        1 lock held by test-mmap-read/567:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        stack backtrace:
        CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
         ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
         0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
        Call Trace:
         [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
         [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
         [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
         [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
         [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
         [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
         [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
         [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
         [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
         [<ffffffff81008282>] SyS_mmap+0x22/0x30
         [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
      
      This happens because one file nests sr_mutex, which nests mm->mmap_sem
      under it, under of->mutex while mmap implementation naturally nests
      of->mutex under mm->mmap_sem.  The warning is false positive as
      of->mutex is per open-file and the two paths belong to two different
      files.  This warning didn't trigger before regular and bin file
      supports were merged because only bin file supported mmap and the
      other side of locking happened only on regular files which used
      equivalent but separate locking.
      
      It'd be best if we give separate locking classes per file but we can't
      easily do that.  Let's differentiate on ->mmap() for now.  Later we'll
      add explicit file operations struct and can add per-ops lockdep key
      there.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      027a485d
    • M
      sysfs: handle duplicate removal attempts in sysfs_remove_group() · 54d71145
      Mika Westerberg 提交于
      Commit bcdde7e2 (sysfs: make __sysfs_remove_dir() recursive) changed
      the behavior so that directory removals will be done recursively. This
      means that the sysfs group might already be removed if its parent directory
      has been removed.
      
      The current code outputs warnings similar to following log snippet when it
      detects that there is no group for the given kobject:
      
       WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
       sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
       Modules linked in:
       CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
       Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
       Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
        ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
        ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
       Call Trace:
        [<ffffffff817daab1>] dump_stack+0x45/0x56
        [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
        [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
        [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
        [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
        [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
        [<ffffffff8142a0d0>] device_del+0x40/0x1b0
        [<ffffffff8142a24d>] device_unregister+0xd/0x20
        [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
        [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
        [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
        [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
        [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
        [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
        [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
        [<ffffffff8142a1b1>] device_del+0x121/0x1b0
        [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
        [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
        [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
        [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
        [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
        [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
        [<ffffffff8105db29>] worker_thread+0x119/0x390
        [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
        [<ffffffff81063a5d>] kthread+0xcd/0xf0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
      
      On this particular machine I see ~16 of these message during Thunderbolt
      hot-unplug.
      
      Fix this in similar way that was done for sysfs_remove_one() by checking
      if the parent directory has already been removed and bailing out early.
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54d71145
  14. 22 11月, 2013 2 次提交
    • J
      configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi 提交于
      A race window in configfs, it starts from one dentry is UNHASHED and end
      before configfs_d_iput is called.  In this window, if a lookup happen,
      since the original dentry was UNHASHED, so a new dentry will be
      allocated, and then in configfs_attach_attr(), sd->s_dentry will be
      updated to the new dentry.  Then in configfs_d_iput(),
      BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->dentry still point
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix it, change configfs_d_iput to not update sd->s_dentry if
      sd->s_count > 2, that means there are another dentry is using the sd
      beside the one that is going to be put.  Use configfs_dirent_lock in
      configfs_attach_attr to sync with configfs_d_iput.
      
      With the following steps, you can reproduce the bug.
      
      1. enable ocfs2, this will mount configfs at /sys/kernel/config and
         fill configure in it.
      
      2. run the following script.
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      76ae281f
    • S
      GFS2: Fix ref count bug relating to atomic_open · ea0341e0
      Steven Whitehouse 提交于
      In the case that atomic_open calls finish_no_open() with
      the dentry that was supplied to gfs2_atomic_open() an
      extra reference count is required. This patch fixes that
      issue preventing a bug trap triggering at umount time.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ea0341e0
  15. 21 11月, 2013 4 次提交