1. 31 Jan, 2011 1 commit
    • A
      NTFS: Fix invalid pointer dereference in ntfs_mft_record_alloc(). · af5eb745
      Anton Altaparmakov committed
      In ntfs_mft_record_alloc() when mapping the new extent mft record with
      map_extent_mft_record() we overwrite @m with the return value and on
      error, we then try to use the old @m but that is no longer there as @m
      now contains an error code instead so we crash when dereferencing the
      error code as if it were a pointer.
      
      The simple fix is to use a temporary variable to store the return value
      thus preserving the original @m for later use.  This is a backport from
      the commercial Tuxera-NTFS driver and is well tested...
      
      Thanks go to Julia Lawall for pointing this out (whilst I had fixed it
      in the commercial driver I had failed to fix it in the Linux kernel).
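
The temporary-variable pattern described in this message can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the actual patch: `err_ptr()`, `is_err()`, `ptr_err()`, `struct record`, `map_record()` and `alloc_record()` are hypothetical stand-ins for the kernel's error-pointer helpers and the NTFS functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical record type standing in for an mft record. */
struct record { int in_use; };

/* Hypothetical mapping step: returns the record or an encoded error. */
static struct record *map_record(struct record *r, int fail)
{
	return fail ? err_ptr(-EIO) : r;
}

/* Returns 0 on success or a negative errno; on failure the record is released. */
static long alloc_record(struct record *m, int fail)
{
	struct record *m_tmp;

	m->in_use = 1;
	m_tmp = map_record(m, fail);	/* store the result in a temporary */
	if (is_err(m_tmp)) {
		m->in_use = 0;		/* safe: m still points at the record */
		return ptr_err(m_tmp);
	}
	return 0;
}
```

Had the return value been assigned to `m` directly, the cleanup line would have dereferenced an encoded error value instead of the record.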
      Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 29 Jan, 2011 3 commits
    • C
      NFS: NFSv4 readdir loses entries · d1205f87
      Chuck Lever committed
      On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
      NFSv4 mounts of OpenSolaris with something like:
      
      > ./test6: readdir
      > 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
      > 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
      > 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
      > 	./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
      > basic tests failed
      > Tests failed, leaving /mnt/klimt mounted
      > [cel@matisse cthon04]$
      
      I narrowed the problem down to nfs4_decode_dirent() reporting that the
      decode buffer had overflowed while decoding the entries for those
      missing files.
      
      verify_attr_len() assumes both its pointer arguments reside on the
      same page.  When these arguments point to locations on two different
      pages, verify_attr_len() can report false errors.  This can happen now
      that a large NFSv4 readdir result can span pages.
      
      We have reasonably good checking in nfs4_decode_dirent() anyway, so
      it should be safe to simply remove the extra checking.
      
      At a guess, this was introduced by commit 6650239a, "NFS: Don't use
      vm_map_ram() in readdir".
      
      Cc: stable@kernel.org [2.6.37]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • C
      NFS: Micro-optimize nfs4_decode_dirent() · c08e76d0
      Chuck Lever committed
      Make the decoding of NFSv4 directory entries slightly more efficient
      by:
      
        1.  Avoiding unnecessary byte swapping when checking XDR booleans,
            and
      
        2.  Not bumping "p" when its value will be immediately replaced by
            xdr_inline_decode()
      
      This commit makes nfs4_decode_dirent() consistent with similar logic
      in the other two decode_dirent() functions.
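
The first optimisation in the list above relies on zero having the same representation in every byte order, so an XDR boolean can be tested without converting it first. A minimal userspace sketch, assuming hypothetical helper names `xdr_bool_swapped()` and `xdr_bool_unswapped()` (neither is a kernel function):

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Straightforward check: convert from network order, then test. */
static int xdr_bool_swapped(uint32_t wire_word)
{
	return ntohl(wire_word) != 0;
}

/*
 * Cheaper check: an all-zero word is zero in any byte order, so the
 * raw wire word can be compared against zero without a byte swap.
 */
static int xdr_bool_unswapped(uint32_t wire_word)
{
	return wire_word != 0;
}
```

Both helpers agree for every input; the second simply skips the conversion.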
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • T
      NFS: Fix an NFS client lockdep issue · e00b8a24
      Trond Myklebust committed
      There is no reason to be freeing the delegation cred in the rcu callback,
      and doing so is resulting in a lockdep complaint that rpc_credcache_lock
      is being called from both softirq and non-softirq contexts.
      Reported-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
  3. 28 Jan, 2011 9 commits
    • B
      xfs: xfs_bmap_add_extent_delay_real should init br_startblock · 24446fc6
      bpm@sgi.com committed
      When filling in the middle of a previous delayed allocation in
      xfs_bmap_add_extent_delay_real, set br_startblock of the new delay
      extent to the right to nullstartblock instead of 0 before inserting
      the extent into the ifork (xfs_iext_insert), rather than setting
      br_startblock afterward.
      
      Adding the extent into the ifork with br_startblock=0 can lead to
      the extent being copied into the btree by xfs_bmap_extent_to_btree
      if we happen to convert from extents format to btree format before
      updating br_startblock with the correct value.  The unexpected
      addition of this delay extent to the btree can cause subsequent
      XFS_WANT_CORRUPTED_GOTO filesystem shutdown in several
      xfs_bmap_add_extent_delay_real cases where we are converting a delay
      extent to real and unexpectedly find an extent already inserted.
      For example:
      
      911         case BMAP_LEFT_FILLING:
      912                 /*
      913                  * Filling in the first part of a previous delayed allocation.
      914                  * The left neighbor is not contiguous.
      915                  */
      916                 trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_);
      917                 xfs_bmbt_set_startoff(ep, new_endoff);
      918                 temp = PREV.br_blockcount - new->br_blockcount;
      919                 xfs_bmbt_set_blockcount(ep, temp);
      920                 xfs_iext_insert(ip, idx, 1, new, state);
      921                 ip->i_df.if_lastex = idx;
      922                 ip->i_d.di_nextents++;
      923                 if (cur == NULL)
      924                         rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
      925                 else {
      926                         rval = XFS_ILOG_CORE;
      927                         if ((error = xfs_bmbt_lookup_eq(cur, new->br_startoff,
      928                                         new->br_startblock, new->br_blockcount,
      929                                         &i)))
      930                                 goto done;
      931                         XFS_WANT_CORRUPTED_GOTO(i == 0, done);
      
      With the bogus extent in the btree we shutdown the filesystem at
      931.  The conversion from extents to btree format happens when the
      number of extents in the inode increases above ip->i_df.if_ext_max.
      xfs_bmap_extent_to_btree copies extents from the ifork into the
      btree, ignoring all delalloc extents which are denoted by
      br_startblock having some value of nullstartblock.
      
      SGI-PV: 1013221
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: fix dquot shaker deadlock · 0fbca4d1
      Dave Chinner committed
      Commit 368e1361 ("xfs: remove duplicate code from dquot reclaim") fails
      to unlock the dquot freelist when the number of loop restarts is
      exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
      reclaim.
      
      Rework the loop control logic into an unwind stack that all the
      different cases jump into. This means there is only one set of code
      that processes the loop exit criteria, and simplifies the unlocking
      of all the items from different points in the loop. It also fixes a
      double increment of the restart counter from the qi_dqlist_lock
      case.
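
The idea of routing every loop exit through shared labels, so that unlocking and restart counting live in one place, can be sketched as follows. This is an illustration only, not the XFS code: the lock is a plain counter, and `struct fake_lock`, `reclaim_one()`, the item states and `MAX_RESTARTS` are all hypothetical.

```c
#define MAX_RESTARTS 3

enum { ITEM_FREE = 0, ITEM_IN_USE = 1, ITEM_CONTENDED = 2 };

/* Counter standing in for a real lock so the balance can be inspected. */
struct fake_lock { int held; };

static void take_lock(struct fake_lock *l) { l->held++; }
static void drop_lock(struct fake_lock *l) { l->held--; }

/* Returns the index of the first reclaimable item, or -1 if none was found. */
static int reclaim_one(struct fake_lock *freelist, const int *state, int n)
{
	int restarts = 0;
	int found = -1;
	int i;

again:
	take_lock(freelist);
	for (i = 0; i < n; i++) {
		if (state[i] == ITEM_CONTENDED)
			goto out_restart;	/* rescan from the start */
		if (state[i] == ITEM_IN_USE)
			continue;
		found = i;
		goto out_unlock;
	}
	/* Nothing reclaimable: fall through to the common exit. */

out_unlock:
	drop_lock(freelist);
	return found;

out_restart:
	drop_lock(freelist);		/* the only place a restart unlocks */
	if (++restarts < MAX_RESTARTS)	/* the only place the counter is bumped */
		goto again;
	return -1;
}
```

Because every path leaves through one of the two labels, the lock is released exactly once per acquisition, including when the restart limit is exceeded.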
      Reported-by: Malcolm Scott <lkml@malc.org.uk>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: handle CIL transaction commit failures correctly · c6f990d1
      Dave Chinner committed
      Failure to commit a transaction into the CIL is not handled
      correctly. This currently can only happen when racing with a
      shutdown and requires an explicit shutdown check, so it is rare and can
      be avoided. Remove the shutdown check and make the CIL commit a void
      function to indicate it will always succeed, thereby removing the
      incorrectly handled failure case.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: limit extsize to size of AGs and/or MAXEXTLEN · 5315837d
      Dave Chinner committed
      The extent size hint can be set to larger than an AG. This means
      that the alignment process can push the range to be allocated
      outside the bounds of the AG, resulting in assert failures or
      corrupted bmbt records. Similarly, if the extsize is larger than the
      maximum extent size supported, the alignment process will produce
      extents that are too large to fit into the bmbt records, resulting
      in a different type of assert/corruption failure.
      
      Fix this by limiting extsize at the time it is set firstly to be
      less than MAXEXTLEN, then to be a maximum of half the size of the
      AGs in the filesystem for non-realtime inodes. Realtime inodes do
      not allocate out of AGs, so don't have to be restricted by the size
      of AGs.
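
The two limits described above can be expressed as a small validation helper. This is a sketch under assumptions: `check_extsize()` and its parameters are hypothetical, the limits are passed in rather than taken from XFS headers, and the exact boundary handling (whether a value equal to the limit is accepted) is a guess.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate an extent size hint, in filesystem blocks.
 * max_extlen: largest extent a bmbt record can describe.
 * ag_blocks:  size of an allocation group.
 * realtime:   non-zero for realtime inodes, which do not allocate from AGs.
 */
static int check_extsize(uint32_t extsize, uint32_t max_extlen,
			 uint32_t ag_blocks, int realtime)
{
	if (extsize > max_extlen)
		return -EINVAL;		/* would not fit in a bmbt record */
	if (!realtime && extsize > ag_blocks / 2)
		return -EINVAL;		/* alignment could leave the AG */
	return 0;
}
```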
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: prevent extsize alignment from exceeding maximum extent size · 4ce15989
      Dave Chinner committed
      When doing delayed allocation, if the allocation size is for a
      maximally sized extent, extent size alignment can push it over this
      limit. This results in an assert failure in xfs_bmbt_set_allf() as
      the extent length is too large to fit in the extent record.
      
      Fix this by ensuring that we allow for space that extent size
      alignment requires (up to 2 * (extsize - 1) blocks as we have to
      handle both head and tail alignment) when limiting the maximum size
      of the extent.
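
The arithmetic behind this limit is short enough to write out. In the sketch below, `max_request_len()` is a hypothetical helper, the maximum extent length is a parameter, and returning 0 when the slack exceeds the limit is an assumption of this example rather than something the commit states.

```c
#include <stdint.h>

/*
 * Largest length that can be requested so that head and tail alignment,
 * which may add up to 2 * (extsize - 1) blocks, cannot push the extent
 * past max_extlen.
 */
static uint32_t max_request_len(uint32_t max_extlen, uint32_t extsize)
{
	uint64_t slack;

	if (extsize == 0)
		return max_extlen;	/* no hint, so no alignment slack */

	slack = 2 * (uint64_t)(extsize - 1);
	if (slack >= max_extlen)
		return 0;		/* assumption: nothing can be requested */
	return max_extlen - (uint32_t)slack;
}
```

For example, with a limit of 1000 blocks and an extsize of 16, up to 30 blocks are reserved for alignment and the request is capped at 970.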
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: limit extent length for allocation to AG size · 14b064ce
      Dave Chinner committed
      Delayed allocation extents can be larger than AGs, so when trying to
      convert a large range we may scan every AG inside
      xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
      an AG. We should stop when we find the first AG with a maximum
      possible allocation size. This causes excessive CPU usage when there
      are lots of AGs.
      
      The same problem occurs when doing preallocation of a range larger
      than an AG.
      
      Fix the problem by limiting real allocation lengths to the maximum
      that an AG can support. This means if we have empty AGs, we'll stop
      the search at the first of them. If there are no empty AGs, we'll
      still scan them all, but that is a different problem....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: speculative delayed allocation uses rounddown_power_of_2 badly · b8fc8263
      Dave Chinner committed
      rounddown_power_of_2() returns an undefined result when passed a
      value of zero. The speculative delayed allocation code is doing this
      when the inode is zero length. Hence occasionally the preallocation
      is much, much larger than is necessary (e.g. 8GB for a 270 _byte_
      file). Ensure we don't ever pass a zero value to this function so
      the result of preallocation is always the desired size.
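
A guard of the kind described here can be sketched in userspace C. The helpers `rounddown_pow2()` and `prealloc_blocks()` are hypothetical, and falling back to a caller-supplied default for a zero-length file is an assumption of this example, not necessarily what the patch does.

```c
#include <stdint.h>

/* Largest power of two that is <= n. Only meaningful for n != 0. */
static uint64_t rounddown_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Pick a preallocation size without ever rounding down a zero value. */
static uint64_t prealloc_blocks(uint64_t isize_blocks, uint64_t default_blocks)
{
	if (isize_blocks == 0)
		return default_blocks;	/* zero-length file: skip the rounding */
	return rounddown_pow2(isize_blocks);
}
```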
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: fix efi item leak on forced shutdown · e34a314c
      Dave Chinner committed
      After test 139, kmemleak shows:
      
      unreferenced object 0xffff880078b405d8 (size 400):
        comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
        hex dump (first 32 bytes):
          60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
          [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
          [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
          [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
          [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
          [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
          [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
          [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
          [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
          [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
          [<ffffffff8117dc00>] notify_change+0x170/0x2e0
          [<ffffffff81163bf6>] do_truncate+0x66/0xa0
          [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
          [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The cause of the leak is that the "remove" parameter of IOP_UNPIN()
      is never set when a CIL push is aborted. This means that the EFI
      item is never freed if it was in the push being cancelled. The
      problem is specific to delayed logging, but has uncovered a couple
      of problems with the handling of IOP_UNPIN(remove).
      
      Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
      in the CIL commit failure path or the iclog write failure path
      because for delayed logging we have no transaction context. Hence we
      must only call xfs_trans_del_item() if the log item being unpinned
      has an active log item descriptor.
      
      Secondly, xfs_trans_uncommit() does not handle log item descriptor
      freeing during the traversal of log items on a transaction. It can
      reference a freed log item descriptor when unpinning an EFI item.
      Hence it needs to use a safe list traversal method to allow items to
      be removed from the transaction during IOP_UNPIN().
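
A "safe" traversal reads the next pointer before the current element can be freed. The sketch below shows the idea on a plain singly linked list; `struct item`, `push()` and `unpin_all()` are hypothetical and unrelated to the XFS data structures, and "odd id" is just a stand-in for "this item gets removed during the walk".

```c
#include <stdlib.h>

struct item {
	int id;
	struct item *next;
};

/* Prepend a new item; aborts if memory runs out. */
static void push(struct item **head, int id)
{
	struct item *it = malloc(sizeof(*it));

	if (it == NULL)
		abort();
	it->id = id;
	it->next = *head;
	*head = it;
}

/* Free every item with an odd id; returns how many items remain. */
static int unpin_all(struct item **head)
{
	struct item **link = head;
	struct item *it, *next;
	int kept = 0;

	for (it = *head; it != NULL; it = next) {
		next = it->next;	/* read before 'it' may be freed */
		if (it->id % 2) {
			*link = next;	/* unlink, then free */
			free(it);
		} else {
			link = &it->next;
			kept++;
		}
	}
	return kept;
}
```

Advancing with `it = it->next` after the `free()` would touch freed memory, which is the class of bug the commit message describes.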
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • S
      cifs: More crypto cleanup (try #2) · ee2c9258
      Shirish Pargaonkar committed
      Replaced md4 hashing function local to cifs module with kernel crypto APIs.
      As a result, md4 hashing function and its supporting functions in
      file md4.c are not needed anymore.
      
      Cleaned up function declarations, removed forward function declarations,
      and removed a header file that is being deleted from being included.
      
      Verified that sec=ntlm/i, sec=ntlmv2/i, and sec=ntlmssp/i work correctly.
      Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
  4. 27 Jan, 2011 1 commit
    • D
      xfs: fix log ticket leak on forced shutdown. · 7db37c5e
      Dave Chinner committed
      The kmemleak detector shows this after test 139:
      
      unreferenced object 0xffff880079b88bb0 (size 264):
        comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff  ........H{......
        backtrace:
          [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
          [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
          [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
          [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
          [<ffffffff8148f394>] xlog_ticket_alloc+0x34/0x170
          [<ffffffff81494444>] xlog_cil_push+0xa4/0x3f0
          [<ffffffff81494eca>] xlog_cil_force_lsn+0x15a/0x160
          [<ffffffff814933a5>] _xfs_log_force_lsn+0x75/0x2d0
          [<ffffffff814a264d>] _xfs_trans_commit+0x2bd/0x2f0
          [<ffffffff8148bfdd>] xfs_iomap_write_allocate+0x1ad/0x350
          [<ffffffff814ac17f>] xfs_map_blocks+0x21f/0x370
          [<ffffffff814ad1b7>] xfs_vm_writepage+0x1c7/0x550
          [<ffffffff8112200a>] __writepage+0x1a/0x50
          [<ffffffff81122df2>] write_cache_pages+0x1c2/0x4c0
          [<ffffffff81123117>] generic_writepages+0x27/0x30
          [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80
      
      By inspection, the leak occurs when xlog_write() returns an error
      and we jump to the abort path without dropping the reference on the
      active ticket.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
  5. 26 Jan, 2011 17 commits
    • A
      NFS construct consistent co_ownerid for v4.1 · c7a360b0
      Andy Adamson committed
      As stated in section 2.4 of RFC 5661, subsequent instances of the client need
      to present the same co_ownerid. Concatenate the client's IP dot address,
      host name, and the rpc_auth pseudoflavor to form the co_ownerid.
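
Building an identifier from stable inputs means the same client produces the same string after every restart. A rough sketch follows; `build_co_ownerid()` is a hypothetical helper and the `/` separators are an assumption, since the commit message does not give the exact format.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Concatenate address, host name and auth flavor into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int build_co_ownerid(char *buf, size_t len, const char *ip_addr,
			    const char *hostname, unsigned int flavor)
{
	int n = snprintf(buf, len, "%s/%s/%u", ip_addr, hostname, flavor);

	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated result */
	return n;
}
```

Because the result depends only on its inputs, two calls with the same address, host name and flavor yield identical identifiers.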
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • T
      console: rename acquire/release_console_sem() to console_lock/unlock() · ac751efa
      Torben Hohn committed
      The -rt patches change the console_semaphore to console_mutex.  As a
      result, a quite large chunk of the patches changes all
      acquire/release_console_sem() to acquire/release_console_mutex()
      
      This commit makes things use more neutral function names which don't make
      implications about the underlying lock.
      
      The only real change is the return value of console_trylock which is
      inverted from try_acquire_console_sem()
      
      This patch also paves the way to switching console_sem from a semaphore to
      a mutex.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
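
The inverted return convention is easy to get wrong at call sites, so a small model may help. Everything here is hypothetical: the lock is a plain flag, and the old helper is assumed to follow the semaphore-style convention of returning 0 on success, which the message implies but does not spell out.

```c
/* Flag standing in for the console lock. */
static int console_busy;

/* Old convention (assumed): 0 on success, non-zero when the lock is busy. */
static int try_acquire_old(void)
{
	if (console_busy)
		return 1;
	console_busy = 1;
	return 0;
}

/* New convention from the commit: 1 on success, 0 when the lock is busy. */
static int trylock_new(void)
{
	return !try_acquire_old();
}

static void unlock_console(void)
{
	console_busy = 0;
}
```

A caller written as `if (try_acquire_old()) return;` therefore becomes `if (!trylock_new()) return;` after the rename.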
      Signed-off-by: Torben Hohn <torbenh@gmx.de>
      Cc: Thomas Gleixner <tglx@tglx.de>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      squashfs: fix use of uninitialised variable in zlib & xz decompressors · 3689456b
      Phillip Lougher committed
      Fix potential use of uninitialised variable caused by recent
      decompressor code optimisations.
      
      In zlib_uncompress (zlib_wrapper.c) we have
      
      	int zlib_err, zlib_init = 0;
      	...
      	do {
      		...
      			if (avail == 0) {
      				offset = 0;
      				put_bh(bh[k++]);
      				continue;
      			}
      		...
      		zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH);
      		...
      	} while (zlib_err == Z_OK);
      
      If continue is executed (avail == 0) then the while condition will be
      evaluated testing zlib_err, which is uninitialised first time around the
      loop.
      
      Fix this by getting rid of the 'if (avail == 0)' condition test, this
      edge condition should not be being handled in the decompressor code, and
      instead handle it generically in the caller code.
      
      Similarly for xz_wrapper.c.
      
      Incidentally, on most architectures (bar Mips and Parisc), no
      uninitialised variable warning is generated by gcc, this is because the
      while condition test on continue is optimised out and not performed
      (when executing continue zlib_err has not been changed since entering
      the loop, and logically if the while condition was true previously, then
      it's still true).
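
The control-flow point, that `continue` inside a `do ... while` loop jumps to the condition test, can be checked with a small standalone function. `iterations()` is a hypothetical example; unlike the code in the report, the status variable is initialised from a parameter so the effect of its starting value is visible.

```c
/*
 * The first pass always hits 'continue', so the loop condition is
 * evaluated before 'status' has been assigned inside the loop.
 * Returns the number of passes made.
 */
static int iterations(int initial_status)
{
	int status = initial_status;
	int n = 0;

	do {
		n++;
		if (n == 1)
			continue;	/* jumps to the while test below */
		status = (n < 3) ? 0 : 1;
	} while (status == 0);

	return n;
}
```

With a starting value of 0 the loop keeps going after the first `continue`; with any other starting value it stops after a single pass, which is why an uninitialised variable there makes the behaviour unpredictable.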
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
      Reported-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount · 27dc1cd3
      Trond Myklebust committed
      If the call to nfs_wcc_update_inode() results in an attribute update, we
      need to ensure that the inode's attr_gencount gets bumped too, otherwise
      we are not protected against races with other GETATTR calls.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • A
      NFS improve pnfs_put_deviceid_cache debug print · b2a2897d
      Andy Adamson committed
      What we really want to know is the ref count.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • A
      NFS fix cb_sequence error processing · 2c4cdf8f
      Andy Adamson committed
      Always assign the cb_process_state nfs_client pointer so a processing error
      in cb_sequence after the nfs_client is found and referenced returns
      a non-NULL cb_process_state nfs_client and the matching nfs_put_client in
      nfs4_callback_compound dereferences the client.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • A
      NFS do not find client in NFSv4 pg_authenticate · 778be232
      Andy Adamson committed
      The information required to find the nfs_client corresponding to the incoming
      back channel request is contained in the NFS layer. Perform minimal checking
      in the RPC layer pg_authenticate method, and push more detailed checking into
      the NFS layer where the nfs_client can be found.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • C
      NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" · 80c30e8d
      Chuck Lever committed
      Nick Bowler <nbowler@elliptictech.com> reports:
      
      > We were just having some NFS server troubles, and my client machine
      > running 2.6.38-rc1+ (specifically, commit 2b1caf6e) crashed
      > hard (syslog output appended to this mail).
      >
      > I'm not sure what the exact timeline was or how to reproduce this,
      > but the server was rebooted during all this.  Since I've never seen
      > this happen before, it is possibly a regression from previous kernel
      > releases.  However, I recently updated my nfs-utils (on the client) to
      > version 1.2.3, so that might be related as well.
      
        [ BUG output redacted ]
      
      When done searching, the for_each_host loop in next_host_state() falls
      through and returns the final host on the host chain without bumping
      its reference count.
      
      Since the host's ref count is only one at that point, releasing the
      host in nlm_host_rebooted() attempts to destroy the host prematurely,
      and therefore hits a BUG().
      
      Likely, the original intent of the for_each_host behavior in
      next_host_state() was to handle the case when the host chain is empty.
      Searching the chain and finding no suitable host to return needs to be
      handled as well.
      
      Defensively restructure next_host_state() always to return NULL when
      the loop falls through.
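
The defensive shape described here, taking a reference only on a match and returning NULL explicitly when the search falls through, looks roughly like the sketch below. `struct host` and `next_stale_host()` are hypothetical, and a NULL-terminated list replaces the kernel's hash chain.

```c
#include <stddef.h>

struct host {
	int state;
	int refs;
	struct host *next;
};

/*
 * Find the first host whose state differs from new_state, update it and
 * take a reference for the caller. Returns NULL if no host qualifies.
 */
static struct host *next_stale_host(struct host *chain, int new_state)
{
	struct host *h;

	for (h = chain; h != NULL; h = h->next) {
		if (h->state != new_state) {
			h->state = new_state;
			h->refs++;	/* reference is taken only on a match */
			return h;
		}
	}
	return NULL;	/* never hand back the loop cursor on fall-through */
}
```

The caller can then drop its reference only when the result is non-NULL, so a host that was never referenced is never released.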
      
      Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".
      
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • C
      NFS: Prevent memory allocation failure in nfsacl_encode() · f61f6da0
      Chuck Lever committed
      nfsacl_encode() allocates memory in certain cases.  This of course
      is not guaranteed to work.
      
      Since commit 9f06c719 "SUNRPC: New xdr_streams XDR encoder API", the
      kernel's XDR encoders can't return a result indicating possibly a
      failure, so a memory allocation failure in nfsacl_encode() has become
      fatal (ie, the XDR code Oopses) in some cases.
      
      However, the allocated memory is a tiny fixed amount, on the order
      of 40-50 bytes.  We can easily use a stack-allocated buffer for
      this, with only a wee bit of nose-holding.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • C
      NFS: nfsacl_{encode,decode} should return signed integer · 731f3f48
      Chuck Lever committed
      Clean up.
      
      The nfsacl_encode() and nfsacl_decode() functions return negative
      errno values, and each call site verifies that the returned value
      is not negative.  Change the synopsis of both of these functions
      to reflect this usage.
      
      Document the synopsis and return values.
      Reported-by: Trond Myklebust <trond.myklebust@netapp.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • C
      NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" · ee5dc773
      Chuck Lever committed
      Milan Broz <mbroz@redhat.com> reports:
      
      > on today Linus' tree I get OOps if using nfs.
      >
      > server (2.6.36) exports dir:
      > /dir   172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
      >
      > on client it is mounted  in fstab
      > server:/dir  /mnt/tst  nfs  rw,soft 0 0
      >
      > and these commands OOpses it (simplified from a configure script):
      >
      > cd /dir
      > touch x
      > install x y
      >
      > [  105.327701] ------------[ cut here ]------------
      > [  105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
      > [  105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
      > [  105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
      > [  105.328349] Modules linked in: usbcore dm_mod
      > [  105.328553]
      > [  105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
      > [  105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0
      > [  105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
      > [  105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
      > [  105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
      > [  105.329431]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      > [  105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
      > [  105.336600] Stack:
      > [  105.336693]  ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
      > [  105.336982]  ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
      > [  105.337182]  ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
      > [  105.337405] Call Trace:
      > [  105.337695]  [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f
      > [  105.337806]  [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15
      > [  105.337898]  [<c116c00b>] ? nfs3_xdr_enc_setacl3args+0x0/0x98
      > [  105.337988]  [<c12e698d>] call_transmit+0x17e/0x1e8
      > [  105.338072]  [<c12ec307>] __rpc_execute+0x6d/0x1a6
      > [  105.338155]  [<c12ec474>] rpc_execute+0x34/0x37
      > [  105.338235]  [<c12e738d>] rpc_run_task+0xb5/0xbd
      > [  105.338316]  [<c12e7474>] rpc_call_sync+0x3d/0x58
      > [  105.338402]  [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f
      > [  105.338493]  [<c10b3f76>] ? __kmalloc+0x148/0x1c4
      > [  105.338579]  [<c10ecd01>] ? posix_acl_alloc+0x12/0x22
      > [  105.338665]  [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca
      > [  105.338748]  [<c116d69c>] nfs3_setxattr+0x62/0x88
      > [  105.338834]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
      > [  105.338926]  [<c116d63a>] ? nfs3_setxattr+0x0/0x88
      > [  105.339026]  [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95
      > [  105.339114]  [<c10cfb43>] vfs_setxattr+0x5b/0x76
      > [  105.339211]  [<c10cfbfb>] setxattr+0x9d/0xc3
      > [  105.339298]  [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb
      > [  105.339428]  [<c1091ff6>] ? __free_pages+0x1a/0x23
      > [  105.339517]  [<c10498ea>] ? up_read+0x16/0x2c
      > [  105.339599]  [<c10b8365>] ? fget+0x0/0xa3
      > [  105.339677]  [<c10b8365>] ? fget+0x0/0xa3
      > [  105.339760]  [<c1025d23>] ? get_parent_ip+0xb/0x31
      > [  105.339843]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
      > [  105.339931]  [<c10cfc72>] sys_fsetxattr+0x51/0x79
      > [  105.340014]  [<c1002853>] sysenter_do_call+0x12/0x32
      > [  105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
      > [  105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
      > [  105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---
      
      nfs3_xdr_enc_setacl3args() is not properly setting up the target
      buffer before nfsacl_encode() attempts to encode the ACL.
      
      Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
      functions for NFSv3."
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • C
      NFS: Fix "kernel BUG at fs/aio.c:554!" · 839f7ad6
      Chuck Lever committed
      Nick Piggin reports:
      
      > I'm getting use after frees in aio code in NFS
      >
      > [ 2703.396766] Call Trace:
      > [ 2703.396858]  [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
      > [ 2703.396959]  [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
      > [ 2703.397058]  [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
      > [ 2703.397159]  [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
      > [ 2703.397260]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
      > [ 2703.397361]  [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
      > [ 2703.397464]  [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
      > [ 2703.397564]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
      > [ 2703.397662]  [<ffffffff811627db>] aio_put_req+0x2b/0x60
      > [ 2703.397761]  [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
      > [ 2703.397895]  [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
      > [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
      >
      > Adding some tracing, it is due to nfs completing the request then
      > returning something other than -EIOCBQUEUED, so aio.c
      > also completes the request.
      
      To address this, prevent the NFS direct I/O engine from completing
      async iocbs when the forward path returns an error without starting
      any I/O.
      
      This fix appears to survive ^C during both "xfstest no. 208" and "fsx
      -Z."
      
      It's likely this bug has existed for a very long while, as we are seeing
      very similar symptoms in OEL 5.  Copying stable.
      
      Cc: Stable <stable@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • J
      NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). · ad3d2eed
      Jesper Juhl committed
      On Mon, 17 Jan 2011, Mi Jinlong wrote:
      
      >
      >
      > Jesper Juhl:
      > > strrchr() can return NULL if nothing is found. If this happens we'll
      > > dereference a NULL pointer in
      > > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds().
      > >
      > > I tried to find some other code that guarantees that this can never
      > > happen but I was unsuccessful. So, unless someone else can point to some
      > > code that ensures this can never be a problem, I believe this patch is
      > > needed.
      > >
      > > While I was changing this code I also noticed that all the dprintk()
      > > statements, except one, start with "%s:". I added the missing ":" to
      > > that one.
      >
      >   Maybe another one also should be changed at decode_and_add_ds() at line 243:
      >
      >    243  printk("%s Decoded address and port %s\n", __func__, buf);
      >
      Missed that one. Thanks.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      ad3d2eed
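      The class of bug fixed here is easy to reproduce outside the
      kernel. The sketch below is an illustration only:
      split_at_last() is a hypothetical helper, not the code in
      decode_and_add_ds(). It shows the check that strrchr() requires
      before its return value is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: cut @s at the last occurrence of @sep and return
 * the part after it, or NULL when @sep does not occur in @s at all. */
char *split_at_last(char *s, int sep)
{
	char *p = strrchr(s, sep);

	if (!p)			/* nothing found: do not dereference */
		return NULL;

	*p = '\0';		/* terminate the leading part in place */
	return p + 1;		/* trailing part starts after the separator */
}

/* A string without the separator must be reported, not dereferenced. */
void check_split_at_last(void)
{
	char with_sep[] = "a.b";
	char without_sep[] = "ab";

	assert(split_at_last(with_sep, '.') != NULL);
	assert(split_at_last(without_sep, '.') == NULL);
}
```

      A caller would treat the NULL return as a malformed address and
      bail out with an error instead of continuing to parse.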
    • P
      CIFS: Add strictcache mount option · d39454ff
      Pavel Shilovsky committed
      Used to switch on strict cache mode. In this mode the client
      reads from the cache as long as it holds a Level II oplock;
      otherwise it reads from the server. For writes, the client stores
      the data in the cache if it holds an Exclusive oplock; otherwise
      it writes directly to the server.
      Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      d39454ff
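      The strict cache rules in this commit message amount to a small
      decision table. The sketch below is a user-space model with
      hypothetical names (oplock_model, strict_read_target(),
      strict_write_target()), not the CIFS implementation. It assumes
      that an Exclusive oplock also permits cached reads, which the
      message does not state explicitly.

```c
#include <assert.h>

/* Hypothetical oplock levels, ordered from weakest to strongest. */
enum oplock_model { OPLOCK_NONE, OPLOCK_LEVEL_II, OPLOCK_EXCLUSIVE };

/* Where an I/O request is served from. */
enum io_target { IO_SERVER, IO_CACHE };

/* Reads may use the cache with a Level II oplock.  Treating Exclusive
 * as at least as strong as Level II is an assumption of this sketch. */
enum io_target strict_read_target(enum oplock_model oplock)
{
	return oplock >= OPLOCK_LEVEL_II ? IO_CACHE : IO_SERVER;
}

/* Writes may stay in the cache only with an Exclusive oplock. */
enum io_target strict_write_target(enum oplock_model oplock)
{
	return oplock == OPLOCK_EXCLUSIVE ? IO_CACHE : IO_SERVER;
}

/* With a Level II oplock reads are cached but writes go to the server. */
void check_level_ii(void)
{
	assert(strict_read_target(OPLOCK_LEVEL_II) == IO_CACHE);
	assert(strict_write_target(OPLOCK_LEVEL_II) == IO_SERVER);
}
```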
    • P
      CIFS: Implement cifs_strict_writev (try #4) · 72432ffc
      Pavel Shilovsky committed
      If we don't hold an Exclusive oplock, we write the data to the
      server. Also set the invalidate_mapping flag on the inode if we
      wrote something to the server. Add cifs_iovec_write to let the
      client write iovec buffers through CIFSSMBWrite2.
      Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      72432ffc
    • S
      [CIFS] Replace cifs md5 hashing functions with kernel crypto APIs · 93c100c0
      Steve French committed
      Replace remaining use of md5 hash functions local to cifs module
      with kernel crypto APIs.
      Remove header and source file containing those local functions.
      Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      93c100c0
    • S
      ceph: avoid picking MDS that is not active · d66bbd44
      Sage Weil committed
      Ignore replication or auth frag data if it indicates an MDS that is not
      active.  This can happen if the MDS shuts down and the client has stale
      data about the namespace distribution across the MDS cluster.  If that's
      the case, fall back to directing the request based on the auth cap (which
      should always be accurate).
      Signed-off-by: Sage Weil <sage@newdream.net>
      d66bbd44
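      The fallback described in this commit can be sketched as a small
      selection function. This is a user-space model under stated
      assumptions: choose_mds(), its parameters, and the plain array of
      "active" flags are hypothetical and do not mirror the ceph client
      data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical selection: use the MDS suggested by replication or auth
 * frag data only if it is known and active; otherwise fall back to the
 * MDS named by the auth cap. */
int choose_mds(const bool *active, int num_mds, int hinted, int auth_cap_mds)
{
	if (hinted >= 0 && hinted < num_mds && active[hinted])
		return hinted;

	return auth_cap_mds;	/* hint is stale or missing */
}

/* A hint that points at an inactive MDS must be ignored. */
void check_choose_mds(void)
{
	const bool active[2] = { true, false };

	assert(choose_mds(active, 2, 1, 0) == 0);
}
```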
  6. 24 Jan, 2011 2 commits
    • R
      Make CIFS mount work in a container. · f1d0c998
      Rob Landley committed
      Teach cifs about network namespaces, so mounting uses addresses/routing
      visible from the container rather than from init context.
      
      A container is a chroot on steroids that changes more than just the root
      filesystem the new processes see.  One thing containers can isolate is
      "network namespaces", meaning each container can have its own set of
      ethernet interfaces, each with its own IP address and routing to the
      outside world.  And if you open a socket in _userspace_ from processes
      within such a container, this works fine.
      
      But sockets opened from within the kernel still use a single global
      networking context in a lot of places, meaning the new socket's address
      and routing are correct for PID 1 on the host, but are _not_ what
      userspace processes in the container get to use.
      
      So when you mount a network filesystem from within a container, the
      mount code in the CIFS driver uses the host's networking context and not
      the container's networking context, so it gets the wrong address, uses
      the wrong routing, and may even try to go out an interface that the
      container can't even access...  Bad stuff.
      
      This patch copies the mount process's network context into the CIFS
      structure that stores the rest of the server information for that mount
      point, and changes the socket open code to use the saved network context
      instead of the global network context.  I.E. "when you attempt to use
      these addresses, do so relative to THIS set of network interfaces and
      routing rules, not the old global context from back before we supported
      containers".
      
      The big long HOWTO sets up a test environment on the assumption you've
      never used containers before.  It basically says:
      
      1) configure and build a new kernel that has container support
      2) build a new root filesystem that includes the userspace container
      control package (LXC)
      3) package/run them under KVM (so you don't have to mess up your host
      system in order to play with containers).
      4) set up some containers under the KVM system
      5) set up contradictory routing in the KVM system and the container so
      that the host and the container see different things for the same address
      6) try to mount a CIFS share from both contexts so you can both force it
      to work and force it to fail.
      
      For a long drawn out test reproduction sequence, see:
      
        http://landley.livejournal.com/47024.html
        http://landley.livejournal.com/47205.html
        http://landley.livejournal.com/47476.html
      Signed-off-by: Rob Landley <rlandley@parallels.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      f1d0c998
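      The idea behind this patch, saving the mounting process's network
      context and using it when the socket is opened, can be modelled in
      user space. Everything below is hypothetical (net_model,
      server_model, server_setup(), socket_net_old(), socket_net_new());
      it is a sketch of the before/after behaviour, not the CIFS or
      kernel networking API.

```c
#include <assert.h>

/* Hypothetical stand-in for a network namespace. */
struct net_model {
	const char *name;
};

/* Models the single global context that belongs to the host. */
static struct net_model init_net_model = { "host" };

/* Hypothetical per-mount server information. */
struct server_model {
	struct net_model *net;	/* context captured at mount time */
};

/* At mount time, remember the network context of the mounting process. */
void server_setup(struct server_model *srv, struct net_model *mounter_net)
{
	srv->net = mounter_net;
}

/* Old behaviour: sockets always use the global host context. */
struct net_model *socket_net_old(const struct server_model *srv)
{
	(void)srv;
	return &init_net_model;
}

/* New behaviour: sockets use the context saved with the server info. */
struct net_model *socket_net_new(const struct server_model *srv)
{
	return srv->net;
}

/* A mount made from a container must not end up in the host context. */
void check_container_mount(void)
{
	struct net_model container = { "container" };
	struct server_model srv;

	server_setup(&srv, &container);
	assert(socket_net_new(&srv) == &container);
	assert(socket_net_old(&srv) == &init_net_model);
}
```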
    • J
      CIFS: Remove pointless variable assignment in cifs_dfs_do_automount() · 3f391c79
      Jesper Juhl committed
      In fs/cifs/cifs_dfs_ref.c::cifs_dfs_do_automount() we have this code:
      
      	...
      	mnt = ERR_PTR(-EINVAL);
      	if (IS_ERR(tlink)) {
      		mnt = ERR_CAST(tlink);
      		goto free_full_path;
      	}
      	ses = tlink_tcon(tlink)->ses;
      
      	rc = get_dfs_path(xid, ses, full_path + 1, cifs_sb->local_nls,
      		&num_referrals, &referrals,
      		cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR);
      
      	cifs_put_tlink(tlink);
      
      	mnt = ERR_PTR(-ENOENT);
      	...
      
      The assignment of 'mnt = ERR_PTR(-EINVAL);' is completely pointless. If we
      take the 'if (IS_ERR(tlink))' branch we'll set 'mnt' again and we'll also
      do so if we do not take the branch. There is no way we'll ever use 'mnt'
      with the assigned 'ERR_PTR(-EINVAL)' value, so we may as well just remove
      the pointless assignment.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      3f391c79
  7. 23 Jan, 2011 1 commit
  8. 22 Jan, 2011 1 commit
    • R
      nilfs2: fix crash after one superblock became unavailable · 0ca7a5b9
      Ryusuke Konishi committed
      Fixes the following kernel oops in nilfs_setup_super() which could
      arise if one of two super-blocks is unavailable.
      
      > BUG: unable to handle kernel NULL pointer dereference at   (null)
      > Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
      > EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3
      > EIP is at memcpy+0xc/0x1b
      > Call Trace:
      >  [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
      >  [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
      >  [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2]
      >  [<c02745cf>] ? kstrdup+0x36/0x3f
      >  [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2]
      >  [<c0293940>] ? vfs_kern_mount+0x4d/0x12c
      >  [<c02a5100>] ? get_fs_type+0x76/0x8f
      >  [<c0293a68>] ? do_kern_mount+0x33/0xbf
      >  [<c02a784a>] ? do_mount+0x2ed/0x714
      >  [<c02a6171>] ? copy_mount_options+0x28/0xfc
      >  [<c02a7ce3>] ? sys_mount+0x72/0xaf
      >  [<c0473085>] ? syscall_call+0x7/0xb
      Reported-by: Wakko Warner <wakko@animx.eu.org>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Wakko Warner <wakko@animx.eu.org>
      Cc: stable <stable@kernel.org> [2.6.37, 2.6.36]
      LKML-Reference: <20110121024918.GA29598@animx.eu.org>
      0ca7a5b9
  9. 21 Jan, 2011 5 commits