1. 18 8月, 2015 1 次提交
    • D
      writeback: plug writeback at a high level · d353d758
      Dave Chinner 提交于
      Doing writeback on lots of little files causes terrible IOPS storms
      because of the per-mapping writeback plugging we do. This
      essentially causes imeediate dispatch of IO for each mapping,
      regardless of the context in which writeback is occurring.
      
      IOWs, running a concurrent write-lots-of-small 4k files using fsmark
      on XFS results in a huge number of IOPS being issued for data
      writes.  Metadata writes are sorted and plugged at a high level by
      XFS, so aggregate nicely into large IOs. However, data writeback IOs
      are dispatched in individual 4k IOs, even when the blocks of two
      consecutively written files are adjacent.
      
      Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
      metadata CRCs enabled.
      
      Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)
      
      Test:
      
      $ ./fs_mark  -D  10000  -S0  -n  10000  -s  4096  -L  120  -d
      /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d
      /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d
      /mnt/scratch/6  -d  /mnt/scratch/7
      
      Result:
      
      		wall	sys	create rate	Physical write IO
      		time	CPU	(avg files/s)	 IOPS	Bandwidth
      		-----	-----	------------	------	---------
      unpatched	6m56s	15m47s	24,000+/-500	26,000	130MB/s
      patched		5m06s	13m28s	32,800+/-600	 1,500	180MB/s
      improvement	-26.44%	-14.68%	  +36.67%	-94.23%	+38.46%
      
      If I use zero length files, this workload at about 500 IOPS, so
      plugging drops the data IOs from roughly 25,500/s to 1000/s.
      3 lines of code, 35% better throughput for 15% less CPU.
      
      The benefits of plugging at this layer are likely to be higher for
      spinning media as the IO patterns for this workload are going make a
      much bigger difference on high IO latency devices.....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Tested-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d353d758
  2. 07 8月, 2015 7 次提交
    • S
      ipc: use private shmem or hugetlbfs inodes for shm segments. · e1832f29
      Stephen Smalley 提交于
      The shm implementation internally uses shmem or hugetlbfs inodes for shm
      segments.  As these inodes are never directly exposed to userspace and
      only accessed through the shm operations which are already hooked by
      security modules, mark the inodes with the S_PRIVATE flag so that inode
      security initialization and permission checking is skipped.
      
      This was motivated by the following lockdep warning:
      
        ======================================================
         [ INFO: possible circular locking dependency detected ]
         4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G        W
        -------------------------------------------------------
         httpd/1597 is trying to acquire lock:
         (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
         but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
         which lock already depends on the new lock.
         the existing dependency chain (in reverse order) is:
         -> #3 (&mm->mmap_sem){++++++}:
              lock_acquire+0xc7/0x270
              __might_fault+0x7a/0xa0
              filldir+0x9e/0x130
              xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
              xfs_readdir+0x1b4/0x330 [xfs]
              xfs_file_readdir+0x2b/0x30 [xfs]
              iterate_dir+0x97/0x130
              SyS_getdents+0x91/0x120
              entry_SYSCALL_64_fastpath+0x12/0x76
         -> #2 (&xfs_dir_ilock_class){++++.+}:
              lock_acquire+0xc7/0x270
              down_read_nested+0x57/0xa0
              xfs_ilock+0x167/0x350 [xfs]
              xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
              xfs_attr_get+0xbd/0x190 [xfs]
              xfs_xattr_get+0x3d/0x70 [xfs]
              generic_getxattr+0x4f/0x70
              inode_doinit_with_dentry+0x162/0x670
              sb_finish_set_opts+0xd9/0x230
              selinux_set_mnt_opts+0x35c/0x660
              superblock_doinit+0x77/0xf0
              delayed_superblock_init+0x10/0x20
              iterate_supers+0xb3/0x110
              selinux_complete_init+0x2f/0x40
              security_load_policy+0x103/0x600
              sel_write_load+0xc1/0x750
              __vfs_write+0x37/0x100
              vfs_write+0xa9/0x1a0
              SyS_write+0x58/0xd0
              entry_SYSCALL_64_fastpath+0x12/0x76
        ...
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Reported-by: NMorten Stevens <mstevens@fedoraproject.org>
      Acked-by: NHugh Dickins <hughd@google.com>
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e1832f29
    • J
      ocfs2: fix shift left overflow · 32e5a2a2
      Joseph Qi 提交于
      When using a large volume, for example 9T volume with 2T already used,
      frequent creation of small files with O_DIRECT when the IO is not
      cluster aligned may clear sectors in the wrong place.  This will cause
      filesystem corruption.
      
      This is because p_cpos is a u32.  When calculating the corresponding
      sector it should be converted to u64 first, otherwise it may overflow.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>	[4.0+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32e5a2a2
    • J
      fsnotify: fix oops in fsnotify_clear_marks_by_group_flags() · 8f2f3eb5
      Jan Kara 提交于
      fsnotify_clear_marks_by_group_flags() can race with
      fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
      drops mark_mutex, a mark from the list iterated by
      fsnotify_clear_marks_by_group_flags() can be freed and thus the next
      entry pointer we have cached may become stale and we dereference free
      memory.
      
      Fix the problem by first moving marks to free to a special private list
      and then always free the first entry in the special list.  This method
      is safe even when entries from the list can disappear once we drop the
      lock.
      Signed-off-by: NJan Kara <jack@suse.com>
      Reported-by: NAshish Sangwan <a.sangwan@samsung.com>
      Reviewed-by: NAshish Sangwan <a.sangwan@samsung.com>
      Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f2f3eb5
    • A
      signalfd: fix information leak in signalfd_copyinfo · 3ead7c52
      Amanieu d'Antras 提交于
      This function may copy the si_addr_lsb field to user mode when it hasn't
      been initialized, which can leak kernel stack data to user mode.
      
      Just checking the value of si_code is insufficient because the same
      si_code value is shared between multiple signals.  This is solved by
      checking the value of si_signo in addition to si_code.
      Signed-off-by: NAmanieu d'Antras <amanieu@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ead7c52
    • J
      ocfs2: fix BUG in ocfs2_downconvert_thread_do_work() · 209f7512
      Joseph Qi 提交于
      The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
      ocfs2_downconvert_thread_do_work can be triggered in the following case:
      
      ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
      processed, and then processes the dentry lockres.  During the dentry
      put, it calls iput and then deletes rw, inode and open lockres from
      blocked list in ocfs2_mark_lockres_freeing.  And this causes the
      variable `processed' to not reflect the number of blocked lockres to be
      processed, which triggers the BUG.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      209f7512
    • M
      fs, file table: reinit files_stat.max_files after deferred memory initialisation · 4248b0da
      Mel Gorman 提交于
      Dave Hansen reported the following;
      
      	My laptop has been behaving strangely with 4.2-rc2.  Once I log
      	in to my X session, I start getting all kinds of strange errors
      	from applications and see this in my dmesg:
      
              	VFS: file-max limit 8192 reached
      
      The problem is that the file-max is calculated before memory is fully
      initialised and miscalculates how much memory the kernel is using.  This
      patch recalculates file-max after deferred memory initialisation.  Note
      that using memory hotplug infrastructure would not have avoided this
      problem as the value is not recalculated after memory hot-add.
      
      4.1:             files_stat.max_files = 6582781
      4.2-rc2:         files_stat.max_files = 8192
      4.2-rc2 patched: files_stat.max_files = 6562467
      
      Small differences with the patch applied and 4.1 but not enough to matter.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alex Ng <alexng@microsoft.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4248b0da
    • Q
      btrfs: qgroup: Fix a regression in qgroup reserved space. · c05f9429
      Qu Wenruo 提交于
      During the change to new btrfs extent-oriented qgroup implement, due to
      it doesn't use the old __qgroup_excl_accounting() for exclusive extent,
      it didn't free the reserved bytes.
      
      The bug will cause limit function go crazy as the reserved space is
      never freed, increasing limit will have no effect and still cause
      EQOUT.
      
      The fix is easy, just free reserved bytes for newly created exclusive
      extent as what it does before.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NYang Dongsheng <yangds.fnst@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c05f9429
  3. 05 8月, 2015 1 次提交
  4. 02 8月, 2015 1 次提交
    • A
      link_path_walk(): be careful when failing with ENOTDIR · 97242f99
      Al Viro 提交于
      In RCU mode we might end up with dentry evicted just we check
      that it's a directory.  In such case we should return ECHILD
      rather than ENOTDIR, so that pathwalk would be retries in non-RCU
      mode.
      
      Breakage had been introduced in commit b18825a7 - prior to that
      we were looking at nd->inode, which had been fetched before
      verifying that ->d_seq was still valid.  That form of check
      would only be satisfied if at some point the pathname prefix
      would indeed have resolved to a non-directory.  The fix consists
      of checking ->d_seq after we'd run into a non-directory dentry,
      and failing with ECHILD in case of mismatch.
      
      Note that all branches since 3.12 have that problem...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      97242f99
  5. 01 8月, 2015 1 次提交
    • J
      nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid · 8fcd461d
      Jeff Layton 提交于
      Currently, preprocess_stateid_op calls nfs4_check_olstateid which
      verifies that the open stateid corresponds to the current filehandle in the
      call by calling nfs4_check_fh.
      
      If the stateid is a NFS4_DELEG_STID however, then no such check is done.
      This could cause incorrect enforcement of permissions, because the
      nfsd_permission() call in nfs4_check_file uses current the current
      filehandle, but any subsequent IO operation will use the file descriptor
      in the stateid.
      
      Move the call to nfs4_check_fh into nfs4_check_file instead so that it
      can be done for all stateid types.
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Cc: stable@vger.kernel.org
      [bfields: moved fh check to avoid NULL deref in special stateid case]
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      8fcd461d
  6. 31 7月, 2015 2 次提交
  7. 29 7月, 2015 3 次提交
    • D
      xfs: remote attributes need to be considered data · df150ed1
      Dave Chinner 提交于
      We don't log remote attribute contents, and instead write them
      synchronously before we commit the block allocation and attribute
      tree update transaction. As a result we are writing to the allocated
      space before the allcoation has been made permanent.
      
      As a result, we cannot consider this allocation to be a metadata
      allocation. Metadata allocation can take blocks from the free list
      and so reuse them before the transaction that freed the block is
      committed to disk. This behaviour is perfectly fine for journalled
      metadata changes as log recovery will ensure the free operation is
      replayed before the overwrite, but for remote attribute writes this
      is not the case.
      
      Hence we have to consider the remote attribute blocks to contain
      data and allocate accordingly. We do this by dropping the
      XFS_BMAPI_METADATA flag from the block allocation. This means the
      allocation will not use blocks that are on the busy list without
      first ensuring that the freeing transaction has been committed to
      disk and the blocks removed from the busy list. This ensures we will
      never overwrite a freed block without first ensuring that it is
      really free.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      df150ed1
    • D
      xfs: remote attribute headers contain an invalid LSN · e3c32ee9
      Dave Chinner 提交于
      In recent testing, a system that crashed failed log recovery on
      restart with a bad symlink buffer magic number:
      
      XFS (vda): Starting recovery (logdev: internal)
      XFS (vda): Bad symlink block magic!
      XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060
      
      On examination of the log via xfs_logprint, none of the symlink
      buffers in the log had a bad magic number, nor were any other types
      of buffer log format headers mis-identified as symlink buffers.
      Tracing was used to find the buffer the kernel was tripping over,
      and xfs_db identified it's contents as:
      
      000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
      020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
      040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      .....
      
      This is a remote attribute buffer, which are notable in that they
      are not logged but are instead written synchronously by the remote
      attribute code so that they exist on disk before the attribute
      transactions are committed to the journal.
      
      The above remote attribute block has an invalid LSN in it - cycle
      0xd002000, block 0 - which means when log recovery comes along to
      determine if the transaction that writes to the underlying block
      should be replayed, it sees a block that has a future LSN and so
      does not replay the buffer data in the transaction. Instead, it
      validates the buffer magic number and attaches the buffer verifier
      to it.  It is this buffer magic number check that is failing in the
      above assert, indicating that we skipped replay due to the LSN of
      the underlying buffer.
      
      The problem here is that the remote attribute buffers cannot have a
      valid LSN placed into them, because the transaction that contains 
      the attribute tree pointer changes and the block allocation that the
      attribute data is being written to hasn't yet been committed. Hence
      the LSN field in the attribute block is completely unwritten,
      thereby leaving the underlying contents of the block in the LSN
      field. It could have any value, and hence a future overwrite of the
      block by log recovery may or may not work correctly.
      
      Fix this by always writing an invalid LSN to the remote attribute
      block, as any buffer in log recovery that needs to write over the
      remote attribute should occur. We are protected from having old data
      written over the attribute by the fact that freeing the block before
      the remote attribute is written will result in the buffer being
      marked stale in the log and so all changes prior to the buffer stale
      transaction will be cancelled by log recovery.
      
      Hence it is safe to ignore the LSN in the case or synchronously
      written, unlogged metadata such as remote attribute blocks, and to
      ensure we do that correctly, we need to write an invalid LSN to all
      remote attribute blocks to trigger immediate recovery of metadata
      that is written over the top.
      
      As a further protection for filesystems that may already have remote
      attribute blocks with bad LSNs on disk, change the log recovery code
      to always trigger immediate recovery of metadata over remote
      attribute blocks.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e3c32ee9
    • D
      xfs: call dax_fault on read page faults for DAX · b2442c5a
      Dave Chinner 提交于
      When modifying the patch series to handle the XFS MMAP_LOCK nesting
      of page faults, I botched the conversion of the read page fault
      path, and so it is only every calling through the page cache. Re-add
      the necessary __dax_fault() call for such files.
      
      Because the get_blocks callback on read faults may not set up the
      mapping buffer correctly to allow unwritten extent completion to be
      run, we need to allow callers of __dax_fault() to pass a null
      complete_unwritten() callback. The DAX code always zeros the
      unwritten page when it is read faulted so there are no stale data
      exposure issues with not doing the conversion. The only downside
      will be the potential for increased CPU overhead on repeated read
      faults of the same page. If this proves to be a problem, then the
      filesystem needs to fix it's get_block callback and provide a
      convert_unwritten() callback to the read fault path.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMatthew Wilcox <willy@linux.intel.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b2442c5a
  8. 28 7月, 2015 2 次提交
    • K
      nfs: Fix an oops caused by using other thread's stack space in ASYNC mode · a49c2691
      Kinglong Mee 提交于
      An oops caused by using other thread's stack space in sunrpc ASYNC sending thread.
      
      [ 9839.007187] ------------[ cut here ]------------
      [ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910!
      [ 9839.008069] invalid opcode: 0000 [#1] SMP
      [ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
      [ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1
      [ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc]
      [ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000
      [ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>]  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
      [ 9839.008069] RSP: 0018:ffff88003667ba58  EFLAGS: 00010246
      [ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800
      [ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0
      [ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003
      [ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000
      [ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98
      [ 9839.008069] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
      [ 9839.008069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0
      [ 9839.008069] Stack:
      [ 9839.008069]  ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003
      [ 9839.008069]  0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9
      [ 9839.008069]  0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3
      [ 9839.008069] Call Trace:
      [ 9839.008069]  [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4]
      [ 9839.008069]  [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0
      [ 9839.008069]  [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30
      [ 9839.008069]  [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0
      [ 9839.008069]  [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70
      [ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
      [ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
      [ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
      [ 9839.008069]  [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4]
      [ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
      [ 9839.008069]  [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc]
      [ 9839.008069]  [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4]
      [ 9839.008069]  [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4]
      [ 9839.008069]  [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc]
      [ 9839.008069]  [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc]
      [ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
      [ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
      [ 9839.008069]  [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc]
      [ 9839.008069]  [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc]
      [ 9839.008069]  [<ffffffff810b452b>] process_one_work+0x1bb/0x410
      [ 9839.008069]  [<ffffffff810b47d3>] worker_thread+0x53/0x470
      [ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
      [ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
      [ 9839.008069]  [<ffffffff810ba7b8>] kthread+0xd8/0xf0
      [ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
      [ 9839.008069]  [<ffffffff81786418>] ret_from_fork+0x58/0x90
      [ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
      [ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3
      [ 9839.008069] RIP  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
      [ 9839.008069]  RSP <ffff88003667ba58>
      [ 9839.071114] ---[ end trace cc14c03adb522e94 ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      a49c2691
    • J
      nfs: plug memory leak when ->prepare_layoutcommit fails · 3471648a
      Jeff Layton 提交于
      "data" is currently leaked when the prepare_layoutcommit operation
      returns an error. Put the cred before taking the spinlock in that
      case, take the lock and then goto out_unlock which will drop the
      lock and then free "data".
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3471648a
  9. 27 7月, 2015 3 次提交
  10. 25 7月, 2015 2 次提交
    • J
      f2fs: call set_page_dirty to attach i_wb for cgroup · 6282adbf
      Jaegeuk Kim 提交于
      The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
      is called, __inc_wb_stat() updates i_wb's stat.
      
      So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
      any writebacking pages.
      
      This patch should resolve the following kernel panic reported by Andreas Reis.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=101801
      
      --- Comment #2 from Andreas Reis <andreas.reis@gmail.com> ---
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
      PGD 2951ff067 PUD 2df43f067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
      Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
      T01 02/03/2015
      task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
      RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
      __percpu_counter_add+0x1a/0x90
      RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
      RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
      RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
      RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
      R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
      R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
      FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
       ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
       0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
      Call Trace:
       [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
       [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
       [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
       [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
       [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
       [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
       [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
       [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
       [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
       [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
       [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
      Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
      89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
      65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
      RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
       RSP <ffff880295143ac8>
      CR2: 00000000000000a8
      ---[ end trace 5132449a58ed93a3 ]---
      note: gcc[10356] exited with preempt_count 2
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6282adbf
    • J
      f2fs: handle error cases in move_encrypted_block · 548aedac
      Jaegeuk Kim 提交于
      This patch fixes some missing error handlers.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      548aedac
  11. 24 7月, 2015 2 次提交
  12. 23 7月, 2015 10 次提交
    • E
      mnt: Clarify and correct the disconnect logic in umount_tree · f2d0a123
      Eric W. Biederman 提交于
      rmdir mntpoint will result in an infinite loop when there is
      a mount locked on the mountpoint in another mount namespace.
      
      This is because the logic to test to see if a mount should
      be disconnected in umount_tree is buggy.
      
      Move the logic to decide if a mount should remain connected to
      it's mountpoint into it's own function disconnect_mount so that
      clarity of expression instead of terseness of expression becomes
      a virtue.
      
      When the conditions where it is invalid to leave a mount connected
      are first ruled out, the logic for deciding if a mount should
      be disconnected becomes much clearer and simpler.
      
      Fixes: e0c9c0af mnt: Update detach_mounts to leave mounts connected
      Fixes: ce07d891 mnt: Honor MNT_LOCKED when detaching mounts
      Cc: stable@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      f2d0a123
    • F
      Btrfs: fix quick exhaustion of the system array in the superblock · 00d80e34
      Filipe Manana 提交于
      Omar reported that after commit 4fbcdf66 ("Btrfs: fix -ENOSPC when
      finishing block group creation"), introduced in 4.2-rc1, the following
      test was failing due to exhaustion of the system array in the superblock:
      
        #!/bin/bash
      
        truncate -s 100T big.img
        mkfs.btrfs big.img
        mount -o loop big.img /mnt/loop
      
        num=5
        sz=10T
        for ((i = 0; i < $num; i++)); do
            echo fallocate $i $sz
            fallocate -l $sz /mnt/loop/testfile$i
        done
        btrfs filesystem sync /mnt/loop
      
        for ((i = 0; i < $num; i++)); do
              echo rm $i
              rm /mnt/loop/testfile$i
              btrfs filesystem sync /mnt/loop
        done
        umount /mnt/loop
      
      This made btrfs_add_system_chunk() fail with -EFBIG due to excessive
      allocation of system block groups. This happened because the test creates
      a large number of data block groups per transaction and when committing
      the transaction we start the writeout of the block group caches for all
      the new new (dirty) block groups, which results in pre-allocating space
      for each block group's free space cache using the same transaction handle.
      That in turn often leads to creation of more block groups, and all get
      attached to the new_bgs list of the same transaction handle to the point
      of getting a list with over 1500 elements, and creation of new block groups
      leads to the need of reserving space in the chunk block reserve and often
      creating a new system block group too.
      
      So that made us quickly exhaust the chunk block reserve/system space info,
      because as of the commit mentioned before, we do reserve space for each
      new block group in the chunk block reserve, unlike before where we would
      not and would at most allocate one new system block group and therefore
      would only ensure that there was enough space in the system space info to
      allocate 1 new block group even if we ended up allocating thousands of
      new block groups using the same transaction handle. That worked most of
      the time because the computed required space at check_system_chunk() is
      very pessimistic (assumes a chunk tree height of BTRFS_MAX_LEVEL/8 and
      that all nodes/leafs in a path will be COWed and split) and since the
      updates to the chunk tree all happen at btrfs_create_pending_block_groups
      it is unlikely that a path needs to be COWed more than once (unless
      writepages() for the btree inode is called by mm in between) and that
      compensated for the need of creating any new nodes/leads in the chunk
      tree.
      
      So fix this by ensuring we don't accumulate a too large list of new block
      groups in a transaction's handles new_bgs list, inserting/updating the
      chunk tree for all accumulated new block groups and releasing the unused
      space from the chunk block reserve whenever the list becomes sufficiently
      large. This is a generic solution even though the problem currently can
      only happen when starting the writeout of the free space caches for all
      dirty block groups (btrfs_start_dirty_block_groups()).
      Reported-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Tested-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      00d80e34
    • A
      btrfs: its btrfs_err() instead of btrfs_error() · 3e303ea6
      Anand Jain 提交于
      sorry I indented to use btrfs_err() and I have no idea
      how btrfs_error() got there.
      infact I was thinking about these kind of oversights
      since these two func are too closely named.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3e303ea6
    • Z
      btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail · 95ab1f64
      Zhao Lei 提交于
      When read_tree_block() failed, we can see following dmesg:
       [  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000063
       [  134.372236] IP: [<ffffffff813a4a51>] free_extent_buffer+0x21/0x90
       [  134.372236] PGD 0
       [  134.372236] Oops: 0000 [#1] SMP
       [  134.372236] Modules linked in:
       [  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f0_+ #115
       [  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
       [  134.372236] task: ffff88003b6e1a00 ti: ffff880011e60000 task.ti: ffff880011e60000
       [  134.372236] RIP: 0010:[<ffffffff813a4a51>]  [<ffffffff813a4a51>] free_extent_buffer+0x21/0x90
       ...
       [  134.372236] Call Trace:
       [  134.372236]  [<ffffffff81379aa1>] free_root_extent_buffers+0x91/0xb0
       [  134.372236]  [<ffffffff81379c3d>] free_root_pointers+0x17d/0x190
       [  134.372236]  [<ffffffff813801b0>] open_ctree+0x1ca0/0x25b0
       [  134.372236]  [<ffffffff8144d017>] ? disk_name+0x97/0xb0
       [  134.372236]  [<ffffffff813558aa>] btrfs_mount+0x8fa/0xab0
       ...
      
      Reason:
       read_tree_block() changed to return error number on fail,
       and this value(not NULL) is set to tree_root->node, then subsequent
       code will run to:
        free_root_pointers()
        ->free_root_extent_buffers()
        ->free_extent_buffer()
        ->atomic_read((extent_buffer *)(-E_XXX)->refs);
       and trigger above error.
      
      Fix:
       Set tree_root->node to NULL on fail to make error_handle code
       happy.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      95ab1f64
    • Z
      btrfs: Fix lockdep warning of btrfs_run_delayed_iputs() · 8a733013
      Zhao Lei 提交于
      Liu Bo <bo.li.liu@oracle.com> reported a lockdep warning of
      delayed_iput_sem in xfstests generic/241:
        [ 2061.345955] =============================================
        [ 2061.346027] [ INFO: possible recursive locking detected ]
        [ 2061.346027] 4.1.0+ #268 Tainted: G        W
        [ 2061.346027] ---------------------------------------------
        [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
        [ 2061.346027]  (&fs_info->delayed_iput_sem){++++..}, at:
        [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
        [ 2061.346027] but task is already holding lock:
        [ 2061.346027]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
        [ 2061.346027] other info that might help us debug this:
        [ 2061.346027]  Possible unsafe locking scenario:
      
        [ 2061.346027]        CPU0
        [ 2061.346027]        ----
        [ 2061.346027]   lock(&fs_info->delayed_iput_sem);
        [ 2061.346027]   lock(&fs_info->delayed_iput_sem);
        [ 2061.346027]
         *** DEADLOCK ***
      It is rarely happened, about 1/400 in my test env.
      
      The reason is recursion of btrfs_run_delayed_iputs():
        cleaner_kthread
        -> btrfs_run_delayed_iputs() *1
        -> get delayed_iput_sem lock *2
        -> iput()
        -> ...
        -> btrfs_commit_transaction()
        -> btrfs_run_delayed_iputs() *1
        -> get delayed_iput_sem lock (dead lock) *2
        *1: recursion of btrfs_run_delayed_iputs()
        *2: warning of lockdep about delayed_iput_sem
      
      When fs is in high stress, new iputs may added into fs_info->delayed_iputs
      list when btrfs_run_delayed_iputs() is running, which cause
      second btrfs_run_delayed_iputs() run into down_read(&fs_info->delayed_iput_sem)
      again, and cause above lockdep warning.
      
      Actually, it will not cause real problem because both locks are read lock,
      but to avoid lockdep warning, we can do a fix.
      
      Fix:
        Don't do btrfs_run_delayed_iputs() in btrfs_commit_transaction() for
        cleaner_kthread thread to break above recursion path.
        cleaner_kthread is calling btrfs_run_delayed_iputs() explicitly in code,
        and don't need to call btrfs_run_delayed_iputs() again in
        btrfs_commit_transaction(), it also give us a bonus to avoid stack overflow.
      
      Test:
        No above lockdep warning after patch in 1200 generic/241 tests.
      Reported-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8a733013
    • T
      NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability · cd812599
      Trond Myklebust 提交于
      Setting the change attribute has been mandatory for all NFS versions, since
      commit 3a1556e8 ("NFSv2/v3: Simulate the change attribute"). We should
      therefore not have anything be conditional on it being set/unset.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      cd812599
    • T
      NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised · 5c675d64
      Trond Myklebust 提交于
      We can't allow caching of data until the change attribute has been
      initialised correctly.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      5c675d64
    • T
      NFS: Don't revalidate the mapping if both size and change attr are up to date · 85a23cee
      Trond Myklebust 提交于
      If we've ensured that the size and the change attribute are both correct,
      then there is no point in marking those attributes as needing revalidation
      again. Only do so if we know the size is incorrect and was not updated.
      
      Fixes: f2467b6f ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...")
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      85a23cee
    • T
      NFSv4/pnfs: Ensure we don't miss a file extension · 2b83d3de
      Trond Myklebust 提交于
      pNFS writes don't return attributes, however that doesn't mean that we
      should ignore the fact that they may be extending the file. This patch
      ensures that if a write is seen to extend the file, then we always set
      an attribute barrier, and update the cached file size.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      2b83d3de
    • T
      NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked · 3c38cbe2
      Trond Myklebust 提交于
      Otherwise, nfs4_select_rw_stateid() will always return the zero stateid
      instead of the correct open stateid.
      
      Fixes: f95549cf ("NFSv4: More CLOSE/OPEN races")
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3c38cbe2
  13. 22 7月, 2015 1 次提交
    • L
      Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()" · d725e66c
      Linus Torvalds 提交于
      This reverts commit a2673b6e.
      
      Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:
      
       "Thanks for report! You are right that my patch introduces a race
        between fsnotify kthread and fsnotify_destroy_group() which can result
        in leaking inotify event on group destruction.
      
        I haven't yet decided whether the right fix is not to queue events for
        dying notification group (as that is pointless anyway) or whether we
        should just fix the original problem differently...  Whenever I look
        at fsnotify code mark handling I get lost in the maze of locks, lists,
        and subtle differences between how different notification systems
        handle notification marks :( I'll think about it over night"
      
      and after thinking about it, Jan says:
      
       "OK, I have looked into the code some more and I found another
        relatively simple way of fixing the original oops.  It will be IMHO
        better than trying to fixup this issue which has more potential for
        breakage.  I'll ask Linus to revert the fsnotify fix he already merged
        and send a new fix"
      Reported-by: NKinglong Mee <kinglongmee@gmail.com>
      Requested-by: NJan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d725e66c
  14. 21 7月, 2015 2 次提交
    • K
      nfsd: Fix a file leak on nfsd4_layout_setlease failure · 1ca4b88e
      Kinglong Mee 提交于
      If nfsd4_layout_setlease fails, nfsd will not put ls->ls_file.
      
      Fix commit c5c707f9 "nfsd: implement pNFS layout recalls".
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1ca4b88e
    • K
      nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem · c2227a39
      Kinglong Mee 提交于
      On an absent filesystem (one served by another server), we need to be
      able to handle requests for certain attributest (like fs_locations, so
      the client can find out which server does have the filesystem), but
      others we can't.
      
      We forgot to take that into account when adding another attribute
      bitmask work for the SECURITY_LABEL attribute.
      
      There an export entry with the "refer" option can result in:
      
      [   88.414272] kernel BUG at fs/nfsd/nfs4xdr.c:2249!
      [   88.414828] invalid opcode: 0000 [#1] SMP
      [   88.415368] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iosf_mbi ppdev btrfs coretemp crct10dif_pclmul crc32_pclmul crc32c_intel xor ghash_clmulni_intel raid6_pq vmw_balloon parport_pc parport i2c_piix4 shpchp vmw_vmci acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi mptscsih serio_raw mptbase e1000 scsi_transport_spi ata_generic pata_acpi [last unloaded: nfsd]
      [   88.417827] CPU: 0 PID: 2116 Comm: nfsd Not tainted 4.0.7-300.fc22.x86_64 #1
      [   88.418448] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [   88.419093] task: ffff880079146d50 ti: ffff8800785d8000 task.ti: ffff8800785d8000
      [   88.419729] RIP: 0010:[<ffffffffa04b3c10>]  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
      [   88.420376] RSP: 0000:ffff8800785db998  EFLAGS: 00010206
      [   88.421027] RAX: 0000000000000001 RBX: 000000000018091a RCX: ffff88006668b980
      [   88.421676] RDX: 00000000fffef7fc RSI: 0000000000000000 RDI: ffff880078d05000
      [   88.422315] RBP: ffff8800785dbb58 R08: ffff880078d043f8 R09: ffff880078d4a000
      [   88.422968] R10: 0000000000010000 R11: 0000000000000002 R12: 0000000000b0a23a
      [   88.423612] R13: ffff880078d05000 R14: ffff880078683100 R15: ffff88006668b980
      [   88.424295] FS:  0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
      [   88.424944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   88.425597] CR2: 00007f40bc370f90 CR3: 0000000035af5000 CR4: 00000000001407f0
      [   88.426285] Stack:
      [   88.426921]  ffff8800785dbaa8 ffffffffa049e4af ffff8800785dba08 ffffffff813298f0
      [   88.427585]  ffff880078683300 ffff8800769b0de8 0000089d00000001 0000000087f805e0
      [   88.428228]  ffff880000000000 ffff880079434a00 0000000000000000 ffff88006668b980
      [   88.428877] Call Trace:
      [   88.429527]  [<ffffffffa049e4af>] ? exp_get_by_name+0x7f/0xb0 [nfsd]
      [   88.430168]  [<ffffffff813298f0>] ? inode_doinit_with_dentry+0x210/0x6a0
      [   88.430807]  [<ffffffff8123833e>] ? d_lookup+0x2e/0x60
      [   88.431449]  [<ffffffff81236133>] ? dput+0x33/0x230
      [   88.432097]  [<ffffffff8123f214>] ? mntput+0x24/0x40
      [   88.432719]  [<ffffffff812272b2>] ? path_put+0x22/0x30
      [   88.433340]  [<ffffffffa049ac87>] ? nfsd_cross_mnt+0xb7/0x1c0 [nfsd]
      [   88.433954]  [<ffffffffa04b54e0>] nfsd4_encode_dirent+0x1b0/0x3d0 [nfsd]
      [   88.434601]  [<ffffffffa04b5330>] ? nfsd4_encode_getattr+0x40/0x40 [nfsd]
      [   88.435172]  [<ffffffffa049c991>] nfsd_readdir+0x1c1/0x2a0 [nfsd]
      [   88.435710]  [<ffffffffa049a530>] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
      [   88.436447]  [<ffffffffa04abf30>] nfsd4_encode_readdir+0x120/0x220 [nfsd]
      [   88.437011]  [<ffffffffa04b58cd>] nfsd4_encode_operation+0x7d/0x190 [nfsd]
      [   88.437566]  [<ffffffffa04aa6dd>] nfsd4_proc_compound+0x24d/0x6f0 [nfsd]
      [   88.438157]  [<ffffffffa0496103>] nfsd_dispatch+0xc3/0x220 [nfsd]
      [   88.438680]  [<ffffffffa006f0cb>] svc_process_common+0x43b/0x690 [sunrpc]
      [   88.439192]  [<ffffffffa0070493>] svc_process+0x103/0x1b0 [sunrpc]
      [   88.439694]  [<ffffffffa0495a57>] nfsd+0x117/0x190 [nfsd]
      [   88.440194]  [<ffffffffa0495940>] ? nfsd_destroy+0x90/0x90 [nfsd]
      [   88.440697]  [<ffffffff810bb728>] kthread+0xd8/0xf0
      [   88.441260]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
      [   88.441762]  [<ffffffff81789e58>] ret_from_fork+0x58/0x90
      [   88.442322]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
      [   88.442879] Code: 0f 84 93 05 00 00 83 f8 ea c7 85 a0 fe ff ff 00 00 27 30 0f 84 ba fe ff ff 85 c0 0f 85 a5 fe ff ff e9 e3 f9 ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 be 04 00 00 00 4c 89 ef 4c 89 8d 68 fe
      [   88.444052] RIP  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
      [   88.444658]  RSP <ffff8800785db998>
      [   88.445232] ---[ end trace 6cb9d0487d94a29f ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      c2227a39
  15. 18 7月, 2015 2 次提交