1. 25 Jun, 2006 5 commits
    • NFS: Eliminate nfs_get_user_pages() · 06cf6f2e
      Committed by Chuck Lever
      Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
      likely to fail if the I/O is large enough to require the allocation of more
      than a single page to keep track of all the pinned pages in the user's
      buffer.
      
      Instead of tracking one large page array per dreq/iocb, track pages per
      nfs_read/write_data, just like the cached I/O path does.  An array for
      pages is already allocated for us by nfs_readdata_alloc() (and the write
      and commit equivalents).
      
      This is also required for adding support for vectored I/O to the NFS direct
      I/O path.
      
      The original reason to pin the user buffer and allocate all the NFS data
      structures before trying to schedule I/O was to ensure all needed resources
      are allocated on the client before starting to send requests.  This reduces
      the chance that resource exhaustion on the client will cause a short read
      or write.
      
      On the other hand, for an application making very large I/O
      requests, this means that it will be nearly impossible for the application
      to make forward progress on a resource-limited client.
      
      Thus, moving the buffer pinning functionality into the I/O scheduling
      loops should be good for scalability.  The next patch will do the same for
      NFS data structure allocation.
      Signed-off-by: Chuck Lever <cel@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
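The idea of pinning pages per I/O chunk, rather than for the whole user buffer at once, can be illustrated with a small user-space sketch. This is not the kernel code: `SKETCH_PAGE_SIZE`, `sketch_count_pages()` and `sketch_schedule()` are hypothetical names, and the page size and chunk size are assumptions chosen for illustration. The sketch only shows that the page array needed at any one time is bounded by the chunk size, not by the total request size.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t sketch_count_pages(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / SKETCH_PAGE_SIZE;
    uintptr_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/*
 * Walk a user buffer in chunks of at most chunk_size bytes, as an I/O
 * scheduling loop might.  Returns the number of chunks and stores the
 * largest per-chunk page count in *max_pages.
 */
static size_t sketch_schedule(uintptr_t addr, size_t count,
                              size_t chunk_size, size_t *max_pages)
{
    size_t chunks = 0;

    *max_pages = 0;
    if (chunk_size == 0)
        return 0;

    while (count > 0) {
        size_t bytes = count < chunk_size ? count : chunk_size;
        size_t npages = sketch_count_pages(addr, bytes);

        /* A real implementation would pin npages pages here and
         * issue the I/O for this chunk only. */
        if (npages > *max_pages)
            *max_pages = npages;

        addr += bytes;
        count -= bytes;
        chunks++;
    }
    return chunks;
}
```

With these assumed sizes, a 1 MiB buffer starting 100 bytes into a page spans 257 pages in total, but scheduling it in 32 KiB chunks never needs more than 9 page pointers at a time.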
    • NFS: refactor nfs_direct_free_user_pages · 9c93ab7d
      Committed by Chuck Lever
      Clean-up and fix a minor bug: the logic was dirtying page cache pages on
      both read and write operations.
      Signed-off-by: Chuck Lever <cel@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: remove user_addr, user_count, and pos from nfs_direct_req · 51a7bc6c
      Committed by Chuck Lever
      Make the user_addr, user_count, and pos parameters explicit to the
      scheduler routines, and remove the fields from nfs_direct_req.  The
      iovec API will be passing in a series of these, not just one set.
      Signed-off-by: Chuck Lever <cel@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: "open code" the NFS direct write rescheduler · fedb595c
      Committed by Chuck Lever
      An NFSv3/v4 client must reschedule on-the-wire writes if the writes are
      UNSTABLE, and the server reboots before the client can complete a
      subsequent COMMIT request.
      
      To support direct asynchronous scatter-gather writes, the write
      rescheduler in fs/nfs/direct.c must not depend on the I/O parameters
      in the controlling nfs_direct_req structure.  iovecs can be somewhat
      arbitrarily complex, so there could be an unbounded amount of information
      to save for a rarely encountered requirement.
      
      Refactor the direct write rescheduler so it uses information from each
      nfs_write_data structure to reschedule writes, instead of caching that
      information in the controlling nfs_direct_req structure.
      Signed-off-by: Chuck Lever <cel@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
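A rough user-space sketch of rescheduling from per-write records is shown here. The `struct sketch_write` type and `sketch_reschedule()` are hypothetical and do not reproduce the layout of nfs_write_data; resending each write as stable is an assumption made for illustration. The point is only that every record carries its own offset and length, so the rescheduler needs nothing from a controlling request structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-write record: each one remembers its own I/O parameters. */
struct sketch_write {
    uint64_t offset;             /* file offset of this write */
    size_t len;                  /* bytes covered by this write */
    int stable;                  /* 0 = sent UNSTABLE, 1 = sent stable */
    int queued;                  /* 1 = waiting to be sent again */
    struct sketch_write *next;   /* next record on the list */
};

/*
 * Re-queue every write on the list using only per-write fields.
 * Returns the number of bytes that were re-queued.
 */
static size_t sketch_reschedule(struct sketch_write *head)
{
    size_t total = 0;

    for (struct sketch_write *w = head; w != NULL; w = w->next) {
        w->stable = 1;   /* assumption: resend as a stable write */
        w->queued = 1;
        total += w->len;
    }
    return total;
}
```

Because the parameters live in the records themselves, the amount of saved state grows with the number of outstanding writes rather than with the complexity of the caller's iovec.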
    • NFS: Separate functions for counting outstanding NFS direct I/Os · b1c5921c
      Committed by Chuck Lever
      Factor out the logic that increments and decrements the outstanding I/O
      count.  This will be a commonly used bit of code in upcoming patches.
      Also make this an atomic_t again, since it will be very often manipulated
      outside dreq->spin lock.
      Signed-off-by: Chuck Lever <cel@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
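The pattern of factoring an outstanding-I/O count into a pair of helpers can be sketched in user space with C11 atomics. The kernel uses its own atomic_t type; `struct sketch_dreq`, `sketch_get_dreq()` and `sketch_put_dreq()` below are hypothetical names for illustration and are not the functions added by the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request structure holding only the outstanding I/O count. */
struct sketch_dreq {
    atomic_int outstanding;
};

/* Record one more outstanding I/O; safe to call without holding a lock. */
static void sketch_get_dreq(struct sketch_dreq *dreq)
{
    atomic_fetch_add(&dreq->outstanding, 1);
}

/*
 * Record completion of one I/O.  Returns true when the caller has just
 * completed the last outstanding I/O and should finish the request.
 */
static bool sketch_put_dreq(struct sketch_dreq *dreq)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    return atomic_fetch_sub(&dreq->outstanding, 1) == 1;
}
```

A caller would invoke `sketch_get_dreq()` before dispatching each I/O and `sketch_put_dreq()` in each completion path; only the path that sees `true` completes the request.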
  2. 09 Jun, 2006 1 commit
    • NFS: Split fs/nfs/inode.c · f7b422b1
      Committed by David Howells
      As fs/nfs/inode.c is rather large, heterogeneous and unwieldy, the attached
      patch splits it up into a number of files:
      
       (*) fs/nfs/inode.c
      
           Strictly inode specific functions.
      
       (*) fs/nfs/super.c
      
           Superblock management functions for NFS and NFS4, normal access, clones
           and referrals.  The NFS4 superblock functions _could_ move out into a
           separate conditionally compiled file, but it's probably not worth it as
           there're so many common bits.
      
       (*) fs/nfs/namespace.c
      
           Some namespace-specific functions have been moved here.
      
       (*) fs/nfs/nfs4namespace.c
      
           NFS4-specific namespace functions (this could be merged into the previous
           file).  This file is conditionally compiled.
      
       (*) fs/nfs/internal.h
      
           Inter-file declarations, plus a few simple utility functions moved from
           fs/nfs/inode.c.
      
           Additionally, all the in-.c-file externs have been moved here, and the
           files they were moved from now include this file.
      
      For the most part, the functions have not been changed, only some multiplexor
      functions have changed significantly.
      
      I've also:
      
       (*) Added some extra banner comments above some functions.
      
       (*) Rearranged the function order within the files to be more logical and
           better grouped (IMO), though someone may prefer a different order.
      
       (*) Reduced the number of #ifdefs in .c files.
      
       (*) Added missing __init and __exit directives.
      Signed-Off-By: David Howells <dhowells@redhat.com>
  3. 20 Apr, 2006 1 commit
  4. 24 Mar, 2006 2 commits
    • [PATCH] cpuset memory spread: slab cache format · fffb60f9
      Committed by Paul Jackson
      Rewrap the overly long source code lines resulting from the previous
      patch's addition of the slab cache flag SLAB_MEM_SPREAD.  This patch
      contains only formatting changes, and no function change.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread: slab cache filesystems · 4b6a9316
      Committed by Paul Jackson
      Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
      memory spreading.
      
      If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
      in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
      from such a slab cache, the allocations are spread evenly over all the
      memory nodes (task->mems_allowed) allowed to that task, instead of favoring
      allocation on the node local to the current cpu.
      
      The following inode and similar caches are marked SLAB_MEM_SPREAD:
      
          file                               cache
          ====                               =====
          fs/adfs/super.c                    adfs_inode_cache
          fs/affs/super.c                    affs_inode_cache
          fs/befs/linuxvfs.c                 befs_inode_cache
          fs/bfs/inode.c                     bfs_inode_cache
          fs/block_dev.c                     bdev_cache
          fs/cifs/cifsfs.c                   cifs_inode_cache
          fs/coda/inode.c                    coda_inode_cache
          fs/dquot.c                         dquot
          fs/efs/super.c                     efs_inode_cache
          fs/ext2/super.c                    ext2_inode_cache
          fs/ext2/xattr.c (fs/mbcache.c)     ext2_xattr
          fs/ext3/super.c                    ext3_inode_cache
          fs/ext3/xattr.c (fs/mbcache.c)     ext3_xattr
          fs/fat/cache.c                     fat_cache
          fs/fat/inode.c                     fat_inode_cache
          fs/freevxfs/vxfs_super.c           vxfs_inode
          fs/hpfs/super.c                    hpfs_inode_cache
          fs/isofs/inode.c                   isofs_inode_cache
          fs/jffs/inode-v23.c                jffs_fm
          fs/jffs2/super.c                   jffs2_i
          fs/jfs/super.c                     jfs_ip
          fs/minix/inode.c                   minix_inode_cache
          fs/ncpfs/inode.c                   ncp_inode_cache
          fs/nfs/direct.c                    nfs_direct_cache
          fs/nfs/inode.c                     nfs_inode_cache
          fs/ntfs/super.c                    ntfs_big_inode_cache_name
          fs/ntfs/super.c                    ntfs_inode_cache
          fs/ocfs2/dlm/dlmfs.c               dlmfs_inode_cache
          fs/ocfs2/super.c                   ocfs2_inode_cache
          fs/proc/inode.c                    proc_inode_cache
          fs/qnx4/inode.c                    qnx4_inode_cache
          fs/reiserfs/super.c                reiser_inode_cache
          fs/romfs/inode.c                   romfs_inode_cache
          fs/smbfs/inode.c                   smb_inode_cache
          fs/sysv/inode.c                    sysv_inode_cache
          fs/udf/super.c                     udf_inode_cache
          fs/ufs/super.c                     ufs_inode_cache
          net/socket.c                       sock_inode_cache
          net/sunrpc/rpc_pipe.c              rpc_inode_cache
      
      The choice of which slab caches to so mark was quite simple.  I marked
      those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
      inode_cache, and buffer_head, which were marked in a previous patch.  Even
      though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
      potentially large file system i/o related slab caches as we need for memory
      spreading.
      
      Given that the rule now becomes "wherever you would have used a
      SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
      the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
      Future file system writers will just copy one of the existing file system
      slab cache setups and tend to get it right without thinking.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
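The spreading behaviour described in this commit, where allocations rotate over the nodes a task is allowed to use instead of favoring the local node, can be sketched as a simple round-robin over a node bitmask. This is a user-space illustration only: `struct sketch_task`, its `rotor` field and `sketch_spread_node()` are hypothetical, and a 64-bit mask stands in for the kernel's node mask type.

```c
#include <stdint.h>

#define SKETCH_MAX_NODES 64

/* Hypothetical task state for the sketch. */
struct sketch_task {
    uint64_t mems_allowed;   /* bit n set means node n is allowed */
    int rotor;               /* last node handed out, or -1 initially */
};

/*
 * Return the next allowed node after the rotor, wrapping around.
 * Returns -1 if no node is allowed.
 */
static int sketch_spread_node(struct sketch_task *t)
{
    int node = t->rotor;

    if (t->mems_allowed == 0)
        return -1;

    for (int i = 0; i < SKETCH_MAX_NODES; i++) {
        node = (node + 1) % SKETCH_MAX_NODES;
        if (t->mems_allowed & (UINT64_C(1) << node)) {
            t->rotor = node;
            return node;
        }
    }
    return -1;
}
```

For a task allowed on nodes 1, 3 and 4, successive calls yield 1, 3, 4, 1, and so on, so a large inode cache filled by that task ends up distributed over all three nodes.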
  5. 21 Mar, 2006 24 commits
  6. 14 Mar, 2006 1 commit
    • [PATCH] NFS: Fix a potential panic in O_DIRECT · 143f412e
      Committed by Trond Myklebust
      Based on an original patch by Mike O'Connor and Greg Banks of SGI.
      
      Mike states:
      
      A normal user can panic an NFS client and cause a local DoS with
      'judicious'(?) use of O_DIRECT.  Any O_DIRECT write to an NFS file where the
      user buffer starts with a valid mapped page and contains an unmapped page,
      will crash in this way.  I haven't followed the code, but O_DIRECT reads with
      similar user buffers will probably also crash albeit in different ways.
      
      Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
      correctly handles get_user_pages() returning an error, which happens if the
      first page covered by the user buffer's address range is unmapped.  However,
      if the first page is mapped but some subsequent page isn't, get_user_pages()
      will return a positive number which is less than the number of pages requested
      (this behaviour is sort of analogous to a short write() call and appears to be
      intentional).  nfs_get_user_pages() doesn't detect this and hands off the
      array of pages (whose last few elements are random rubbish from the newly
      allocated array memory) to its caller, whence they go to
      nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
      and calculates its own idea of how many pages are in the array from the user
      buffer length.  Needless to say, when it comes to transmit those uninitialised
      page* pointers, we see a crash in the network stack.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
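The failure mode described above comes down to treating a short, positive return from a page-pinning call as success. A minimal user-space sketch of the defensive check follows. Nothing here is the kernel API: `sketch_pin_pages()` is a hypothetical stand-in that pins pages until it reaches an unmapped one, and `sketch_get_pages_checked()` shows one plausible way a caller could turn a short count into an error after releasing what was pinned.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a pinning call: mapped[i] != 0 means page i is
 * mapped.  Pins pages until the first unmapped one.  Returns the number of
 * pages pinned, or -EFAULT if the very first page is unmapped.
 */
static long sketch_pin_pages(int *mapped, size_t nr_pages, void **pages)
{
    size_t i;

    for (i = 0; i < nr_pages && mapped[i]; i++)
        pages[i] = &mapped[i];
    if (i == 0 && nr_pages > 0)
        return -EFAULT;
    return (long)i;
}

/* Drop the first n pinned pages (here: just clear the pointers). */
static void sketch_release_pages(void **pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i] = NULL;
}

/*
 * Pin exactly nr_pages pages or fail.  A short count is treated as an
 * error so the caller never sees a partially filled page array.
 */
static long sketch_get_pages_checked(int *mapped, size_t nr_pages,
                                     void **pages)
{
    long got = sketch_pin_pages(mapped, nr_pages, pages);

    if (got < 0)
        return got;
    if ((size_t)got < nr_pages) {
        sketch_release_pages(pages, (size_t)got);
        return -EFAULT;
    }
    return got;
}
```

The caller can then rely on a non-negative return meaning that every slot in the array is valid, instead of recomputing a page count from the buffer length.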
  7. 02 Feb, 2006 1 commit
  8. 07 Jan, 2006 5 commits