1. 18 Dec, 2007 11 commits
  2. 13 Dec, 2007 2 commits
    •
      NFS: Fix an Oops in NFS unmount · a10db50a
      Trond Myklebust committed
      Ensure that the dummy 'root dentry' is invisible to d_find_alias(). If not,
      then it may be spliced into the tree if a parent directory from the same
      filesystem gets mounted at a later time.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    •
      Revert "NFS: Ensure we return zero if applications attempt to write zero bytes" · a5576cfa
      Trond Myklebust committed
      This reverts commit b9148c6b.
      
      On Wed, 12 Dec 2007 10:57:30 -0500, Chuck Lever wrote
      > commit b9148c6b should be reverted.  It was recently forward-ported
      > from some years-old patches, and is clearly not needed now.
      >
      > On Dec 11, 2007, at 5:21 PM, Adrian Bunk wrote:
      >
      >> This code became dead after commit
      >> b9148c6b
      >> (which BTW doesn't seem to have changed any behaviour) and can
      >> therefore
      >> be removed.
      >>
      >> Spotted by the Coverity checker.
      >>
      >> Signed-off-by: Adrian Bunk <bunk@kernel.org>
      >>
      >> ---
      >> --- linux-2.6/fs/nfs/direct.c.old     2007-12-02 21:54:53.000000000 +0100
      >> +++ linux-2.6/fs/nfs/direct.c 2007-12-02 21:55:10.000000000 +0100
      >> @@ -897,15 +897,12 @@ ssize_t nfs_file_direct_write(struct kio
      >>       if (!count)
      >>               goto out;       /* return 0 */
      >>
      >>       retval = -EINVAL;
      >>       if ((ssize_t) count < 0)
      >>               goto out;
      >> -     retval = 0;
      >> -     if (!count)
      >> -             goto out;
      >>
      >>       retval = nfs_sync_mapping(mapping);
      >>       if (retval)
      >>               goto out;
      >>
      >>       retval = nfs_direct_write(iocb, iov, nr_segs, pos, count);
      >>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  3. 12 Dec, 2007 2 commits
    •
      NFSv2/v3: Fix a memory leak when using -onolock · 5cef338b
      Trond Myklebust committed
      Neil Brown said:
      > Hi Trond,
      > 
      > We found that a machine which made moderately heavy use of
      > 'automount' was leaking some nfs data structures - particularly the
      > 4K allocated by rpc_alloc_iostats.
      > It turns out that this only happens with filesystems with -onolock
      > set.
      >
      > The problem is that if NFS_MOUNT_NONLM is set, nfs_start_lockd doesn't
      > set server->destroy, so when the filesystem is unmounted, the
      > ->client_acl is not shutdown, and so several resources are still
      > held.  Multiple mount/umount cycles will slowly eat away memory
      > several pages at a time.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: NeilBrown <neilb@suse.de>
    •
      NFS: Fix NFS mountpoint crossing... · 4584f520
      Trond Myklebust committed
      The check that was added to nfs_xdev_get_sb() to work around broken
      servers, works fine for NFSv2, but causes mountpoint crossing on NFSv3 to
      always return ESTALE.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  4. 11 Dec, 2007 1 commit
    •
      proc: remove/Fix proc generic d_revalidate · 3790ee4b
      Eric W. Biederman committed
      Ultimately to implement /proc perfectly we need an implementation of
      d_revalidate because files and directories can be removed behind the back
      of the VFS, and d_revalidate is the only way we can let the VFS know that
      this has happened.
      
      Unfortunately the linux VFS can not cope with anything in the path to a
      mount point going away.  So a proper d_revalidate method that calls d_drop
      also needs to call have_submounts which is moderately expensive, so you
      really don't want a d_revalidate method that unconditionally calls it, but
      instead only calls it when the backing object has really gone away.
      
      proc generic entries only disappear on module_unload (when not counting the
      fledgling network namespace) so it is quite rare that we actually encounter
      that case and has not actually caused us real world trouble yet.
      
      So until we get a proper test for keeping dentries in the dcache fix the
      current d_revalidate method by completely removing it.  This returns us to
      the current status quo.
      
      So with CONFIG_NETNS=n things should look as they have always looked.
      
      For CONFIG_NETNS=y things work most of the time but there are a few rare
      corner cases that don't behave properly.  As the network namespace is
      barely present in 2.6.24 this should not be a problem.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "Denis V. Lunev" <den@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 10 Dec, 2007 8 commits
    •
      [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC · cf10e82b
      David Chinner committed
      The recent I_LOCK->I_SYNC changes mistakenly changed xfs_ichgtime to look
      at I_SYNC instead of I_LOCK. This was incorrect and prevents newly created
      inodes from moving to the dirty list. Change this to the correct check
      which is for I_NEW, not I_LOCK or I_SYNC so that behaviour is correct.
      
      SGI-PV: 974225
      SGI-Modid: xfs-linux-melb:xfs-kern:30204a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    •
      [XFS] Make xfsbufd threads freezable · 978c7b2f
      Rafael J. Wysocki committed
      Fix breakage caused by commit 83144186
      that did not introduce the necessary call to set_freezable() in
      xfs/linux-2.6/xfs_buf.c .
      
      SGI-PV: 974224
      SGI-Modid: xfs-linux-melb:xfs-kern:30203a
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    •
      [XFS] revert to double-buffering readdir · e89bc612
      Christoph Hellwig committed
      The current readdir implementation deadlocks on btree buffer locks
      because nfsd calls back into ->lookup from the filldir callback. The only
      short-term fix for this is to revert to the old inefficient
      double-buffering scheme.
      
      SGI-PV: 973377
      SGI-Modid: xfs-linux-melb:xfs-kern:30201a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    •
      [XFS] Fix broken inode cluster setup. · a7430847
      David Chinner committed
      The radix tree based inode caches did away with the inode cluster hashes,
      replacing them with a bunch of masking and gang lookups on the radix tree.
      
      This masking got broken when moving the code to per-ag radix trees and
      indexing by agino # rather than straight inode number. The result is
      clustered inode writeback does not cluster and things can go extremely
      slowly when there are lots of inodes to write.
      
      Fix it up by comparing the agino # of the inode we just looked up to the
      index of the cluster we are looking for.
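
The comparison described above can be sketched in plain C. This is only an illustration under assumptions: the function names, the power-of-two cluster size, and the exact mask form are hypothetical and are not taken from the XFS source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: first AG-relative inode number (agino) of the
 * cluster that contains `agino`. Assumes the number of inodes per
 * cluster is a non-zero power of two so that masking works.
 */
uint32_t sketch_cluster_first(uint32_t agino, uint32_t inodes_per_cluster)
{
	assert(inodes_per_cluster != 0 &&
	       (inodes_per_cluster & (inodes_per_cluster - 1)) == 0);
	return agino & ~(inodes_per_cluster - 1);
}

/*
 * After a gang lookup, keep an inode only if its agino maps back to the
 * cluster index we were looking for.
 */
bool sketch_same_cluster(uint32_t found_agino, uint32_t first_index,
			 uint32_t inodes_per_cluster)
{
	return sketch_cluster_first(found_agino, inodes_per_cluster) ==
	       first_index;
}
```

The point of the sketch is that both sides of the comparison use the same AG-relative numbering; mixing a full inode number with an agino-based index is the kind of mismatch the commit message describes.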
      Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>

      SGI-PV: 972915
      SGI-Modid: xfs-linux-melb:xfs-kern:30033a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    •
      [XFS] Clear XBF_READ_AHEAD flag on I/O completion. · 77be55a5
      Lachlan McIlroy committed
      SGI-PV: 972554
      SGI-Modid: xfs-linux-melb:xfs-kern:30128a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
    •
      [XFS] Fixed a few bugs in xfs_buf_associate_memory() · d1afb678
      Lachlan McIlroy committed
      - calculation of 'page_count' was incorrect as it did not
        consider the offset of 'mem' into the first page. The
        logic to bump 'page_count' didn't work if 'len' was <=
        PAGE_CACHE_SIZE (ie offset = 3k, len = 2k).
      - setting b_buffer_length to 'len' is incorrect if 'offset'
        is > 0. Set it to the total length of the buffer.
      - I suspect that passing a non-aligned address into
        mem_to_page() for the first page may have been causing
        issues - don't know but just tidy up that code anyway.
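
The page-count arithmetic from the first item can be checked with a small standalone sketch. It assumes a 4 KiB page (the kernel uses PAGE_CACHE_SIZE) and uses a hypothetical function name; it is not the code from xfs_buf_associate_memory().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration only. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Number of pages touched by a buffer of `len` bytes that begins
 * `offset` bytes into its first page. The offset must be included
 * before rounding up, otherwise a short buffer that straddles a page
 * boundary is undercounted.
 */
size_t sketch_page_count(size_t offset, size_t len)
{
	assert(offset < SKETCH_PAGE_SIZE);
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With the example from the commit message (offset = 3k, len = 2k) the buffer ends at 5k, so it spans two pages even though `len` alone is smaller than one page.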
      
      SGI-PV: 971596
      SGI-Modid: xfs-linux-melb:xfs-kern:30143a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
    •
      [XFS] 971064 Various fixups for xfs_bulkstat(). · cd57e594
      Lachlan McIlroy committed
      - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()
      - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This
        special case causes bulkstat to fail because the special case uses
        xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions
        have different semantics.  xfs_bulkstat() will return the next inode
        after the one supplied while skipping internal inodes (ie quota inodes).
        xfs_bulkstat_single() will only look up the inode supplied and return
        an error if it is an internal inode.
      - in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied
        so in cases where we return without examining any inodes the scan won't
        restart back at zero.
      - sanity check for valid *ubcountp values. Cannot sanity check for valid
        ubuffer here because some users of xfs_bulkstat() don't supply a buffer.
      - checks against 'ubleft' (the space left in the user's buffer) should be
        against 'statstruct_size' which is the supplied minimum object size.
        The mixture of checks against statstruct_size and 0 was one of the
        reasons we were skipping inodes.
      - if the formatter function returns BULKSTAT_RV_NOTHING and an error and
        the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT
        is for inodes that are no longer valid and we just skip them. EINVAL is
        returned if we try to lookup an internal inode so we skip them too. For
        a DMF scan if the inode and DMF attribute cannot fit into the space left
        in the user's buffer it would return ERANGE. We didn't handle this error
        and skipped the inode. We would continue to skip inodes until one fitted
        into the user's buffer or we completed the scan.
      - put back the recalculation of agino (that got removed with the last fix)
        at the end of the while loop. This is because the code at the start of
        the loop expects agino to be the last inode examined if it is non-zero.
      - if we found some inodes but then encountered an error, return success
        this time and the error next time. If the formatter aborted with ENOMEM
        we will now return this error but only if we couldn't read any inodes.
        Previously if we encountered ENOMEM without reading any inodes we
        returned a zero count and no error which falsely indicated the scan was
        complete.
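
The last item's return-value rule can be sketched as a tiny C function. The name and signature are hypothetical and only illustrate the decision; the real logic lives inside xfs_bulkstat().

```c
#include <assert.h>

/*
 * Hypothetical helper: decide what a scan call should return.
 * If at least one inode was copied out, report success now; the caller
 * will hit the same error again on its next call. If nothing was
 * copied out, report the error so that a zero count is never mistaken
 * for "scan complete".
 */
int sketch_bulkstat_result(int inodes_returned, int error)
{
	assert(inodes_returned >= 0);
	if (inodes_returned > 0)
		return 0;
	return error;
}
```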
      
      SGI-PV: 973431
      SGI-Modid: xfs-linux-melb:xfs-kern:30089a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
    •
      [XFS] Fix dbflush panic in xfs_qm_sync. · d757762b
      Donald Douwsma committed
      The recent behaviour layer removal dropped the check for quotas that have
      been requested at mount time but have subsequently been turned off. This
      results in a panic when accessing m_quotainfo which has been freed.
      
      This patch adds the check originally made by xfs_qm_syncall() to
      xfs_qm_sync().
      
      SGI-PV: 969769
      SGI-Modid: xfs-linux-melb:xfs-kern:29908a
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  6. 06 Dec, 2007 7 commits
    •
      remove nonsense force-casts from ocfs2 · 97bd7919
      Al Viro committed
      endianness annotations in networking code had been in place for quite a
      while; in particular, sin_port and s_addr are annotated as big-endian.
      
      Code in ocfs2 had __force casts added apparently to shut the sparse
      warnings up; of course, these days they only serve to *produce* warnings
      for no reason whatsoever...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      regression: bfs endianness bug · 7e46aa5c
      Al Viro committed
      BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little-
      endian fields), not struct bfs_inode_info * (in-core stuff, with host-
      endian ones).
      
      It's a macro and fields with the right names are present in
      bfs_inode_info, so it compiles, but on big-endian host it gives bogus
      results.
      
      Introduced in commit f433dc56 ("Fixes to
      the BFS filesystem driver").
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      regression: cifs endianness bug · 9b5e6857
      Al Viro committed
      access_flags_to_mode() gets on-the-wire data (little-endian) and treats
      it as host-endian.
      
      Introduced in commit e01b6400 ("[CIFS]
      enable get mode from ACL when cifsacl mount option specified")
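
As a userspace illustration of this class of bug, the sketch below decodes a 32-bit little-endian wire value in a way that gives the same result on any host. The function name is hypothetical; in the kernel the equivalent conversion is done with le32_to_cpu() on an __le32 field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble a host-order value from four little-endian wire bytes.
 * Building the value byte by byte avoids any dependence on the host's
 * own byte order, unlike reading the bytes through a uint32_t pointer.
 */
uint32_t sketch_le32_to_host(const unsigned char *wire)
{
	assert(wire != NULL);
	return (uint32_t)wire[0] |
	       ((uint32_t)wire[1] << 8) |
	       ((uint32_t)wire[2] << 16) |
	       ((uint32_t)wire[3] << 24);
}
```

Treating the raw bytes as host-endian happens to work on little-endian machines, which is why such bugs tend to surface only on big-endian hosts.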
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      proc: fix proc_dir_entry refcounting · 5a622f2d
      Alexey Dobriyan committed
      Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
      Switch to usual scheme:
      * PDE is created with refcount 1
      * every de_get does +1
      * every de_put() and remove_proc_entry() do -1
      * once refcount reaches 0, PDE is freed.
      
      This elegantly fixes at least the following two races (both observed) without
      introducing new locks, without abusing old locks, without spreading
      lock_kernel():
      
      1) PDE leak
      
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      if (atomic_read(&de->count) == 0)
      					if (atomic_dec_and_test(&de->count))
      						if (de->deleted)
      							/* also not taken! */
      							free_proc_entry(de);
      else
      	de->deleted = 1;
      		[refcount=0, deleted=1]
      
      2) use after free
      
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      
      					if (atomic_dec_and_test(&de->count))
      if (atomic_read(&de->count) == 0)
      	free_proc_entry(de);
      						/* boom! */
      						if (de->deleted)
      							free_proc_entry(de);
      
      BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
      printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c086340 #4)
      EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
      EIP is at strnlen+0x6/0x18
      EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
      ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
      Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
             c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
             f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
      Call Trace:
       [<c10ac4f0>] vsnprintf+0x2ad/0x49b
       [<c10ac779>] vscnprintf+0x14/0x1f
       [<c1018e6b>] vprintk+0xc5/0x2f9
       [<c10379f1>] handle_fasteoi_irq+0x0/0xab
       [<c1004f44>] do_IRQ+0x9f/0xb7
       [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
       [<c100264e>] need_resched+0x1f/0x21
       [<c10190ba>] printk+0x1b/0x1f
       [<c107c8ad>] de_put+0x3d/0x50
       [<c107c8f8>] proc_delete_inode+0x38/0x41
       [<c107c8c0>] proc_delete_inode+0x0/0x41
       [<c1066298>] generic_delete_inode+0x5e/0xc6
       [<c1065aa9>] iput+0x60/0x62
       [<c1063c8e>] d_kill+0x2d/0x46
       [<c1063fa9>] dput+0xdc/0xe4
       [<c10571a1>] __fput+0xb0/0xcd
       [<c1054e49>] filp_close+0x48/0x4f
       [<c1055ee9>] sys_close+0x67/0xa5
       [<c10026b6>] sysenter_past_esp+0x5f/0x85
      =======================
      Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
      EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44
      
      Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
      module is already pinned and remove_proc_entry() can't happen => nobody
      can mark PDE deleted.
      
      Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
      never get it, it's just for proper /proc/net removal. I double checked
      CLONE_NETNS continues to work.
      
      Patch survives many hours of modprobe/rmmod/cat loops without new bugs
      which can be attributed to refcounting.
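
The "usual scheme" listed at the top of this message can be sketched in userspace C11. The struct and function names are hypothetical stand-ins for struct proc_dir_entry, de_get() and de_put(), and malloc/free stand in for the kernel allocator; this is not the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct proc_dir_entry. */
struct sketch_pde {
	atomic_int count;
};

/* A new entry starts with refcount 1, held by whoever created it. */
struct sketch_pde *sketch_pde_create(void)
{
	struct sketch_pde *de = malloc(sizeof(*de));

	if (de)
		atomic_init(&de->count, 1);
	return de;
}

/* Every get adds one reference. */
void sketch_pde_get(struct sketch_pde *de)
{
	atomic_fetch_add(&de->count, 1);
}

/*
 * Every put (including the one done on removal) drops one reference.
 * The decrement and the "was this the last one?" test are a single
 * atomic step, so exactly one caller frees the entry. Returns true if
 * this call freed it.
 */
bool sketch_pde_put(struct sketch_pde *de)
{
	assert(atomic_load(&de->count) > 0);
	if (atomic_fetch_sub(&de->count, 1) == 1) {
		free(de);
		return true;
	}
	return false;
}
```

Because no separate "deleted" flag is read after the counter, there is no window in which one path sees a zero count while another is about to free, which is the shape of both races shown above.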
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      jbd: Fix assertion failure in fs/jbd/checkpoint.c · d4beaf4a
      Jan Kara committed
      Before we start committing a transaction, we call
      __journal_clean_checkpoint_list() to cleanup transaction's written-back
      buffers.
      
      If this call happens to remove all of them (and there were already some
      buffers), __journal_remove_checkpoint() will decide to free the transaction
      because it isn't (yet) a committing transaction and soon we fail some
      assertion - the transaction really isn't ready to be freed :).
      
      We change the check in __journal_remove_checkpoint() to free only a
      transaction in T_FINISHED state.  The locking there is subtle though (as
      everywhere in JBD ;().  We use j_list_lock to protect the check and a
      subsequent call to __journal_drop_transaction() and do the same in the end
      of journal_commit_transaction() which is the only place where a transaction
      can get to T_FINISHED state.
      
      Probably I'm too paranoid here and such locking is not really necessary -
      checkpoint lists are processed only from log_do_checkpoint() where a
      transaction must be already committed to be processed or from
      __journal_clean_checkpoint_list() where kjournald itself calls it and thus
      transaction cannot change state either.  Better be safe if something
      changes in future...
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      ufs: fix nexstep dir block size · 0c664f97
      Evgeniy Dushistov committed
      This patch fixes a regression present since 2.6.16.  The NextStep variant
      of UFS, like OpenStep, uses a directory block size of 1024.  Without this
      change, ufs_check_page fails in many cases.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Dave Bailey <dsbailey@pacbell.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      aio: only account I/O wait time in read_events if there are active requests · e00ba3da
      Jeff Moyer committed
      On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
      running (but completely idle).  The UML code sits in io_getevents waiting for
      an event to be submitted and completed.
      
      Fix this by checking ctx->reqs_active before scheduling to determine whether
      or not we are waiting for I/O.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Dec, 2007 1 commit
  8. 01 Dec, 2007 1 commit
    •
      [NETNS]: Fix /proc/net breakage · 2b1e300a
      Eric W. Biederman committed
      Well I clearly goofed when I added the initial network namespace support
      for /proc/net.  Currently things work but there are odd details visible to
      user space, even when we have a single network namespace.
      
      Since we do not cache proc_dir_entry dentries at the moment we can just
      modify ->lookup to return a different directory inode depending on the
      network namespace of the process looking at /proc/net, replacing the
      current technique of using a magic and fragile follow_link method.
      
      To accomplish that this patch:
      - introduces a shadow_proc method to allow different dentries to
        be returned from proc_lookup.
      - Removes the old /proc/net follow_link magic
      - Fixes a weakness in our not caching of proc generic dentries.
      
      As shadow_proc uses a task struct to decide which dentry to return we can
      go back later and fix the proc generic caching without modifying any code
      that uses the shadow_proc method.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  9. 30 Nov, 2007 7 commits