1. 14 4月, 2011 2 次提交
  2. 13 4月, 2011 6 次提交
  3. 12 4月, 2011 16 次提交
    • J
      cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3) · ca83ce3d
      Jeff Layton 提交于
      This is more or less the same patch as before, but with some merge
      conflicts fixed up.
      
      If a process has a dirty page mapped into its page tables, then it has
      the ability to change it while the client is trying to write the data
      out to the server. If that happens after the signature has been
      calculated then that signature will then be wrong, and the server will
      likely reset the TCP connection.
      
      This patch adds a page_mkwrite handler for CIFS that simply takes the
      page lock. Because the page lock is held over the life of writepage and
      writepages, this prevents the page from becoming writeable until
      the write call has completed.
      
      With this, we can also remove the "sign_zero_copy" module option and
      always inline the pages when writing.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      ca83ce3d
    • S
      [CIFS] Warn on requesting default security (ntlm) on mount · d9b94201
      Steve French 提交于
      Warn once if default security (ntlm) requested. We will
      update the default to the stronger security mechanism
      (ntlmv2) in 2.6.41.  Kerberos is also stronger than
      ntlm, but more servers support ntlmv2 and ntlmv2
      does not require an upcall, so ntlmv2 is a better
      default.
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      CC: Suresh Jayaraman <sjayaraman@suse.de>
      Reviewed-by: NShirish Pargaonkar <shirishp@us.ibm.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      d9b94201
    • S
      [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood · fd88ce93
      Steve French 提交于
      When the TCP_Server_Info is first allocated and connected, tcpStatus ==
      CifsGood means that the NEGOTIATE_PROTOCOL request has completed and the
      socket is ready for other calls. cifs_reconnect however sets tcpStatus
      to CifsGood as soon as the socket is reconnected and the optional
      RFC1001 session setup is done. We have no clear way to tell the
      difference between these two states, and we need to know this in order
      to know whether we can send an echo or not.
      
      Resolve this by adding a new statusEnum value -- CifsNeedNegotiate. When
      the socket has been connected but has not yet had a NEGOTIATE_PROTOCOL
      request done, set it to this value. Once the NEGOTIATE is done,
      cifs_negotiate_protocol will set tcpStatus to CifsGood.
      
      This also fixes and cleans the logic in cifs_reconnect and
      cifs_reconnect_tcon. The old code checked for specific states when what
      it really wants to know is whether the state has actually changed from
      CifsNeedReconnect.
      Reported-and-Tested-by: NJG <jg@cms.ac>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      fd88ce93
    • J
      cifs: wrap received signature check in srv_mutex · 157c2491
      Jeff Layton 提交于
      While testing my patchset to fix asynchronous writes, I hit a bunch
      of signature problems when testing with signing on. The problem seems
      to be that signature checks on receive can be running at the same
      time as a process that is sending, or even that multiple receives can
      be checking signatures at the same time, clobbering the same data
      structures.
      
      While we're at it, clean up the comments over cifs_calculate_signature
      and add a note that the srv_mutex should be held when calling this
      function.
      
      This patch seems to fix the problems for me, but I'm not clear on
      whether it's the best approach. If it is, then this should probably
      go to stable too.
      
      Cc: stable@kernel.org
      Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      157c2491
    • J
      cifs: clean up various nits in unicode routines (try #2) · 581ade4d
      Jeff Layton 提交于
      Minor revision to the original patch. Don't abuse the __le16 variable
      on the stack by casting it to wchar_t and handing it off to char2uni.
      Declare an actual wchar_t on the stack instead. This fixes a valid
      sparse warning.
      
      Fix the spelling of UNI_ASTERISK. Eliminate the unneeded len_remaining
      variable in cifsConvertToUCS.
      
      Also, as David Howells points out. We were better off making
      cifsConvertToUCS *not* use put_unaligned_le16 since it means that we
      can't optimize the mapped characters at compile time. Switch them
      instead to use cpu_to_le16, and simply use put_unaligned to set them
      in the string.
      Reported-and-acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      581ade4d
    • J
      cifs: clean up length checks in check2ndT2 · c0c7b905
      Jeff Layton 提交于
      Thus spake David Howells:
      
      The code that follows this:
      
        	remaining = total_data_size - data_in_this_rsp;
      	if (remaining == 0)
      		return 0;
      	else if (remaining < 0) {
      
      generates better code if you drop the 'remaining' variable and compare
      the values directly.
      
      Clean it up per his recommendation...
      Reported-and-acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      c0c7b905
    • J
      cifs: set ra_pages in backing_dev_info · 2b6c26a0
      Jeff Layton 提交于
      Commit 522440ed made cifs set backing_dev_info on the mapping attached
      to new inodes. This change caused a fairly significant read performance
      regression, as cifs started doing page-sized reads exclusively.
      
      By virtue of the fact that they're allocated as part of cifs_sb_info by
      kzalloc, the ra_pages on cifs BDIs get set to 0, which prevents any
      readahead. This forces the normal read codepaths to use readpage instead
      of readpages causing a four-fold increase in the number of read calls
      with the default rsize.
      
      Fix it by setting ra_pages in the BDI to the same value as that in the
      default_backing_dev_info.
      
      Fixes https://bugzilla.kernel.org/show_bug.cgi?id=31662
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: NTill <till2.schaefer@uni-dortmund.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      2b6c26a0
    • J
      cifs: fix broken BCC check in is_valid_oplock_break · 8679b0db
      Jeff Layton 提交于
      The BCC is still __le16 at this point, and in any case we need to
      use the get_bcc_le macro to make sure we don't hit alignment
      problems.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      8679b0db
    • J
      cifs: always do is_path_accessible check in cifs_mount · 70945643
      Jeff Layton 提交于
      Currently, we skip doing the is_path_accessible check in cifs_mount if
      there is no prefixpath. I have a report of at least one server however
      that allows a TREE_CONNECT to a share that has a DFS referral at its
      root. The reporter in this case was using a UNC that had no prefixpath,
      so the is_path_accessible check was not triggered and the box later hit
      a BUG() because we were chasing a DFS referral on the root dentry for
      the mount.
      
      This patch fixes this by removing the check for a zero-length
      prefixpath.  That should make the is_path_accessible check be done in
      this situation and should allow the client to chase the DFS referral at
      mount time instead.
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: NYogesh Sharma <ysharma@cymer.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      70945643
    • S
      various endian fixes to cifs · 5443d130
      Steve French 提交于
      make modules C=2 M=fs/cifs CF=-D__CHECK_ENDIAN__
      
      Found for example:
      
       CHECK   fs/cifs/cifssmb.c
      fs/cifs/cifssmb.c:728:22: warning: incorrect type in assignment (different base types)
      fs/cifs/cifssmb.c:728:22:    expected unsigned short [unsigned] [usertype] Tid
      fs/cifs/cifssmb.c:728:22:    got restricted __le16 [usertype] <noident>
      fs/cifs/cifssmb.c:1883:45: warning: incorrect type in assignment (different base types)
      fs/cifs/cifssmb.c:1883:45:    expected long long [signed] [usertype] fl_start
      fs/cifs/cifssmb.c:1883:45:    got restricted __le64 [usertype] start
      fs/cifs/cifssmb.c:1884:54: warning: restricted __le64 degrades to integer
      fs/cifs/cifssmb.c:1885:58: warning: restricted __le64 degrades to integer
      fs/cifs/cifssmb.c:1886:43: warning: incorrect type in assignment (different base types)
      fs/cifs/cifssmb.c:1886:43:    expected unsigned int [unsigned] fl_pid
      fs/cifs/cifssmb.c:1886:43:    got restricted __le32 [usertype] pid
      
      In checking new smb2 code for missing endian conversions, I noticed
      some endian errors had crept in over the last few releases into the
      cifs code (symlink, ntlmssp, posix lock, and also a less problematic warning
      in fscache).  A followon patch will address a few smb2 endian
      problems.
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      5443d130
    • S
      Elminate sparse __CHECK_ENDIAN__ warnings on port conversion · 6da97910
      Steve French 提交于
      Ports are __be16 not unsigned short int
      
      Eliminates the remaining fixable endian warnings:
      
      ~/cifs-2.6$ make modules C=1 M=fs/cifs CF=-D__CHECK_ENDIAN__
        CHECK   fs/cifs/connect.c
      fs/cifs/connect.c:2408:23: warning: incorrect type in assignment (different base types)
      fs/cifs/connect.c:2408:23:    expected unsigned short *sport
      fs/cifs/connect.c:2408:23:    got restricted __be16 *<noident>
      fs/cifs/connect.c:2410:23: warning: incorrect type in assignment (different base types)
      fs/cifs/connect.c:2410:23:    expected unsigned short *sport
      fs/cifs/connect.c:2410:23:    got restricted __be16 *<noident>
      fs/cifs/connect.c:2416:24: warning: incorrect type in assignment (different base types)
      fs/cifs/connect.c:2416:24:    expected unsigned short [unsigned] [short] <noident>
      fs/cifs/connect.c:2416:24:    got restricted __be16 [usertype] <noident>
      fs/cifs/connect.c:2423:24: warning: incorrect type in assignment (different base types)
      fs/cifs/connect.c:2423:24:    expected unsigned short [unsigned] [short] <noident>
      fs/cifs/connect.c:2423:24:    got restricted __be16 [usertype] <noident>
      fs/cifs/connect.c:2326:23: warning: incorrect type in assignment (different base types)
      fs/cifs/connect.c:2326:23:    expected unsigned short [unsigned] sport
      fs/cifs/connect.c:2326:23:    got restricted __be16 [usertype] sin6_port
      fs/cifs/connect.c:2330:23: warning: incorrect type in assignment (different base types)
      fs/cifs/connect.c:2330:23:    expected unsigned short [unsigned] sport
      fs/cifs/connect.c:2330:23:    got restricted __be16 [usertype] sin_port
      fs/cifs/connect.c:2394:22: warning: restricted __be16 degrades to integer
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      6da97910
    • S
      Max share size is too small · 2e325d59
      Steve French 提交于
      Max share name was set to 64, and (at least for Windows)
      can be 80.
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      2e325d59
    • S
      Allow user names longer than 32 bytes · 8727c8a8
      Steve French 提交于
      We artificially limited the user name to 32 bytes, but modern servers handle
      larger.  Set the maximum length to a reasonable 256, and make the user name
      string dynamically allocated rather than a fixed size in session structure.
      Also clean up old checkpatch warning.
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      8727c8a8
    • J
      cifs: replace /proc/fs/cifs/Experimental with a module parm · bdf1b03e
      Jeff Layton 提交于
      This flag currently only affects whether we allow "zero-copy" writes
      with signing enabled. Typically we map pages in the pagecache directly
      into the write request. If signing is enabled however and the contents
      of the page change after the signature is calculated but before the
      write is sent then the signature will be wrong. Servers typically
      respond to this by closing down the socket.
      
      Still, this can provide a performance benefit so the "Experimental" flag
      was overloaded to allow this. That's really not a good place for this
      option however since it's not clear what that flag does.
      
      Move that flag instead to a new module parameter that better describes
      its purpose. That's also better since it can be set at module insertion
      time by configuring modprobe.d.
      Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      bdf1b03e
    • J
      cifs: check for private_data before trying to put it · 77970693
      Jeff Layton 提交于
      cifs_close doesn't check that the filp->private_data is non-NULL before
      trying to put it. That can cause an oops in certain error conditions
      that can occur on open or lookup before the private_data is set.
      Reported-by: NBen Greear <greearb@candelatech.com>
      CC: Stable <stable@kernel.org>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      77970693
    • L
      xfs_destroy_workqueues() should not be tagged with__exit · 39411f81
      Luck, Tony 提交于
      ia64 throws away .exit sections for the built-in CONFIG case, so routines
      that are used in other circumstances should not be tagged as __exit.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      39411f81
  4. 11 4月, 2011 5 次提交
    • T
      ext4: fix data corruption regression by reverting commit 6de9843d · c8205636
      Theodore Ts'o 提交于
      Revert commit 6de9843d, since it
      caused a data corruption regression with BitTorrent downloads.  Thanks
      to Damien for discovering and bisecting to find the problem commit.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=32972Reported-by: NDamien Grassart <damien@grassart.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c8205636
    • K
      ext4: Allow indirect-block file to grow the file size to max file size · f80da1e7
      Kazuya Mio 提交于
      We can create 4402345721856 byte file with indirect block mapping.
      However, if we grow an indirect-block file to the size with ftruncate(),
      we can see an ext4 warning. The following patch fixes this problem.
      
      How to reproduce:
      # dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
      0+0 records in
      0+0 records out
      0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
      # tail -n 1 /var/log/messages
      Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12
      Signed-off-by: NKazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f80da1e7
    • Y
      ext4: allow an active handle to be started when freezing · be4f27d3
      Yongqiang Yang 提交于
      ext4_journal_start_sb() should not prevent an active handle from being
      started due to s_frozen.  Otherwise, deadlock is easy to happen, below
      is a situation.
      
      ================================================
           freeze         |       truncate
      ================================================
                          |  ext4_ext_truncate()
          freeze_super()  |   starts a handle
          sets s_frozen   |
                          |  ext4_ext_truncate()
                          |  holds i_data_sem
        ext4_freeze()     |
        waits for updates |
                          |  ext4_free_blocks()
                          |  calls dquot_free_block()
                          |
                          |  dquot_free_blocks()
                          |  calls ext4_dirty_inode()
                          |
                          |  ext4_dirty_inode()
                          |  trys to start an active
                          |  handle
                          |
                          |  block due to s_frozen
      ================================================
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NAmir Goldstein <amir73il@users.sf.net>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      be4f27d3
    • C
      ext4: sync the directory inode in ext4_sync_parent() · 0893ed45
      Curt Wohlgemuth 提交于
      ext4 has taken the stance that, in the absence of a journal,
      when an fsync/fdatasync of an inode is done, the parent
      directory should be sync'ed if this inode entry is new.
      ext4_sync_parent(), which implements this, does indeed sync
      the dirent pages for parent directories, but it does not
      sync the directory *inode*.  This patch fixes this.
      
      Also now return error status from ext4_sync_parent().
      
      I tested this using a power fail test, which panics a
      machine running a file server getting requests from a
      client.  Without this patch, on about every other test run,
      the server is missing many, many files that had been synced.
      With this patch, on > 6 runs, I see zero files being lost.
      
      Google-Bug-Id: 4179519
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0893ed45
    • J
      nfsd4: fix oops on lock failure · 23fcf2ec
      J. Bruce Fields 提交于
      Lock stateid's can have access_bmap 0 if they were only partially
      initialized (due to a failed lock request); handle that case in
      free_generic_stateid.
      
      ------------[ cut here ]------------
      kernel BUG at fs/nfsd/nfs4state.c:380!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/kernel/mm/ksm/run
      Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf]
      
      Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      EIP: 0060:[<e24f180d>] EFLAGS: 00010297 CPU: 0
      EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd]
      EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004
      ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000)
      Stack:
       dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20
       ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf
       dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68
      Call Trace:
       [<e24f19ca>] free_generic_stateid+0x1c/0x3e [nfsd]
       [<e24f1a5d>] release_lockowner+0x71/0x8a [nfsd]
       [<e24f52fd>] nfsd4_lock+0x617/0x66c [nfsd]
       [<e24e57b6>] ? nfsd_setuser+0x199/0x1bb [nfsd]
       [<e24e056c>] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd]
       [<c07a0052>] ? _cond_resched+0x8/0x1c
       [<c04ca61f>] ? slab_pre_alloc_hook.clone.33+0x23/0x27
       [<c04cac01>] ? kmem_cache_alloc+0x1a/0xd2
       [<c04835a0>] ? __call_rcu+0xd7/0xdd
       [<e24e0dfb>] ? fh_verify+0x401/0x452 [nfsd]
       [<e24f0b61>] ? nfsd4_encode_operation+0x52/0x117 [nfsd]
       [<e24ea0d7>] ? nfsd4_putfh+0x33/0x3b [nfsd]
       [<e24f4ce6>] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd]
       [<e24ea2c9>] nfsd4_proc_compound+0x1ea/0x33e [nfsd]
       [<e24de6ee>] nfsd_dispatch+0xd1/0x1a5 [nfsd]
       [<e1d6e1c7>] svc_process_common+0x282/0x46f [sunrpc]
       [<e1d6e578>] svc_process+0xdc/0xfa [sunrpc]
       [<e24de0fa>] nfsd+0xd6/0x115 [nfsd]
       [<e24de024>] ? nfsd_shutdown+0x24/0x24 [nfsd]
       [<c0454322>] kthread+0x62/0x67
       [<c04542c0>] ? kthread_worker_fn+0x114/0x114
       [<c07a6ebe>] kernel_thread_helper+0x6/0x10
      Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b <0f> 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d
      EIP: [<e24f180d>] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0
      ---[ end trace 2b0bf6c6557cb284 ]---
      
      The trace route is:
      
       -> nfsd4_lock()
         -> if (lock->lk_is_new) {
           -> alloc_init_lock_stateid()
      
              3739: stp->st_access_bmap = 0;
      
         ->if (status && lock->lk_is_new && lock_sop)
           -> release_lockowner()
            -> free_generic_stateid()
             -> nfs4_access_bmap_to_omode()
                -> nfs4_access_to_omode()
      
              380: BUG();   *****
      
      This problem was introduced by 0997b173.
      Reported-by: NMi Jinlong <mijinlong@cn.fujitsu.com>
      Tested-by: NMi Jinlong <mijinlong@cn.fujitsu.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      23fcf2ec
  5. 08 4月, 2011 11 次提交
    • C
      xfs: use proper interfaces for on-stack plugging · a1b7ea5d
      Christoph Hellwig 提交于
      Add proper blk_start_plug/blk_finish_plug pairs for the two places where
      we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and
      xfs_buf_iowait, given that context switches already flush the per-process
      plugging lists.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      a1b7ea5d
    • C
      xfs: fix xfs_debug warnings · 957935dc
      Christoph Hellwig 提交于
      For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
      effect in xfs_debug:
      
      fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
      fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
      
      The reason for that is that the various new xfs message functions have a
      return value which is never used, and in case of the non-debug build
      xfs_debug the macro evaluates to a plain 0 which produces the above
      warnings.  This can be fixed by turning xfs_debug into an inline function
      instead of a macro, but in addition to that I've also changed all the
      message helpers to return void as we never use their return values.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      957935dc
    • C
      xfs: fix variable set but not used warnings · ecb697c1
      Christoph Hellwig 提交于
      GCC 4.6 now warnings about variables set but not used.  Fix the trivially
      fixable warnings of this sort.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ecb697c1
    • D
      xfs: convert log tail checking to a warning · da8a1a4a
      Dave Chinner 提交于
      On the Power platform, the log tail debug checks fire excessively
      causing the system to panic early in testing. The debug checks are
      known to be racy, though on x86_64 there is no evidence that they
      trigger at all.
      
      We want to keep the checks active on debug systems to alert us to
      problems with log space accounting, but we need to reduce the impact
      of a racy check on testing on the Power platform.
      
      As a result, convert the ASSERT conditions to warnings, and
      allow them to fire only once per filesystem mount. This will prevent
      false positives from interfering with testing, whilst still
      providing us with the indication that they may be a problem with log
      space accounting should that occur.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      da8a1a4a
    • D
      xfs: catch bad block numbers freeing extents. · be65b18a
      Dave Chinner 提交于
      A fuzzed filesystem crashed a kernel when freeing an extent with a
      block number beyond the end of the filesystem. Convert all the debug
      asserts in xfs_free_extent() to active checks so that we catch bad
      extents and return that the filesytsem is corrupted rather than
      crashing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      be65b18a
    • D
      xfs: push the AIL from memory reclaim and periodic sync · fd074841
      Dave Chinner 提交于
      When we are short on memory, we want to expedite the cleaning of
      dirty objects.  Hence when we run short on memory, we need to kick
      the AIL flushing into action to clean as many dirty objects as
      quickly as possible.  To implement this, sample the lsn of the log
      item at the head of the AIL and use that as the push target for the
      AIL flush.
      
      Further, we keep items in the AIL that are dirty that are not
      tracked any other way, so we can get objects sitting in the AIL that
      don't get written back until the AIL is pushed. Hence to get the
      filesystem to the idle state, we might need to push the AIL to flush
      out any remaining dirty objects sitting in the AIL. This requires
      the same push mechanism as the reclaim push.
      
      This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
      match the new xfs_ail_max_lsn() function introduced in this patch.
      Similarly for xfs_trans_ail_push -> xfs_ail_push.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      fd074841
    • D
      xfs: clean up code layout in xfs_trans_ail.c · cd4a3c50
      Dave Chinner 提交于
      This patch rearranges the location of functions in xfs_trans_ail.c
      to remove the need for forward declarations of those functions in
      preparation for adding new functions without the need for forward
      declarations.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      cd4a3c50
    • D
      xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd
      Dave Chinner 提交于
      Similar to the xfssyncd, the per-filesystem xfsaild threads can be
      converted to a global workqueue and run periodically by delayed
      works. This makes sense for the AIL pushing because it uses
      variable timeouts depending on the work that needs to be done.
      
      By removing the xfsaild, we simplify the AIL pushing code and
      remove the need to spread the code to implement the threading
      and pushing across multiple files.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      0bf6a5bd
    • D
      xfs: introduce background inode reclaim work · a7b339f1
      Dave Chinner 提交于
      Background inode reclaim needs to run more frequently that the XFS
      syncd work is run as 30s is too long between optimal reclaim runs.
      Add a new periodic work item to the xfs syncd workqueue to run a
      fast, non-blocking inode reclaim scan.
      
      Background inode reclaim is kicked by the act of marking inodes for
      reclaim.  When an AG is first marked as having reclaimable inodes,
      the background reclaim work is kicked. It will continue to run
      periodically untill it detects that there are no more reclaimable
      inodes. It will be kicked again when the first inode is queued for
      reclaim.
      
      To ensure shrinker based inode reclaim throttles to the inode
      cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the
      background inode reclaim so that when we are low on memory we are
      trying to reclaim inodes as efficiently as possible. This kick shoul
      d not be necessary, but it will protect against failures to kick the
      background reclaim when inodes are first dirtied.
      
      To provide the rate throttling, make the shrinker pass do
      synchronous inode reclaim so that it blocks on inodes under IO. This
      means that the shrinker will reclaim inodes rather than just
      skipping over them, but it does not adversely affect the rate of
      reclaim because most dirty inodes are already under IO due to the
      background reclaim work the shrinker kicked.
      
      These two modifications solve one of the two OOM killer invocations
      Chris Mason reported recently when running a stress testing script.
      The particular workload trigger for the OOM killer invocation is
      where there are more threads than CPUs all unlinking files in an
      extremely memory constrained environment. Unlike other solutions,
      this one does not have a performance impact on performance when
      memory is not constrained or the number of concurrent threads
      operating is <= to the number of CPUs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      a7b339f1
    • D
      xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55
      Dave Chinner 提交于
      On of the problems with the current inode flush at ENOSPC is that we
      queue a flush per ENOSPC event, regardless of how many are already
      queued. Thi can result in    hundreds of queued flushes, most of
      which simply burn CPU scanned and do no real work. This simply slows
      down allocation at ENOSPC.
      
      We really only need one active flush at a time, and we can easily
      implement that via the new xfs_syncd_wq. All we need to do is queue
      a flush if one is not already active, then block waiting for the
      currently active flush to complete. The result is that we only ever
      have a single ENOSPC inode flush active at a time and this greatly
      reduces the overhead of ENOSPC processing.
      
      On my 2p test machine, this results in tests exercising ENOSPC
      conditions running significantly faster - 042 halves execution time,
      083 drops from 60s to 5s, etc - while not introducing test
      regressions.
      
      This allows us to remove the old xfssyncd threads and infrastructure
      as they are no longer used.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      89e4cb55
    • D
      xfs: introduce a xfssyncd workqueue · c6d09b66
      Dave Chinner 提交于
      All of the work xfssyncd does is background functionality. There is
      no need for a thread per filesystem to do this work - it can al be
      managed by a global workqueue now they manage concurrency
      effectively.
      
      Introduce a new gglobal xfssyncd workqueue, and convert the periodic
      work to use this new functionality. To do this, use a delayed work
      construct to schedule the next running of the periodic sync work
      for the filesystem. When the sync work is complete, queue a new
      delayed work for the next running of the sync work.
      
      For laptop mode, we wait on completion for the sync works, so ensure
      that the sync work queuing interface can flush and wait for work to
      complete to enable the work queue infrastructure to replace the
      current sequence number and wakeup that is used.
      
      Because the sync work does non-trivial amounts of work, mark the
      new work queue as CPU intensive.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      c6d09b66