1. 07 May, 2007 6 commits
    • M
      locks: add fl_grant callback for asynchronous lock return · 2beb6614
      Marc Eshel committed
      Acquiring a lock on a cluster filesystem may require communication with
      remote hosts, and to avoid blocking lockd or nfsd threads during such
      communication, we allow the results to be returned asynchronously.
      
      When a ->lock() call needs to block, the file system will return
      -EINPROGRESS, and then later return the results with a call to the
      routine in the fl_grant field of the lock_manager_operations struct.
      
      This differs from the case when ->lock returns -EAGAIN to a blocking
      lock request; in that case, the filesystem calls fl_notify when the lock
      is granted, and the caller retries the original lock.  So while
      fl_notify is merely a hint to the caller that it should retry, fl_grant
      actually communicates the final result of the lock operation (with the
      lock already acquired in the successful case).
      
      Therefore fl_grant takes a lock, a status and, for the test lock case, a
      conflicting lock.  We also allow fl_grant to return an error to the
      filesystem, to handle the case where the fl_grant request arrives after
      the lock manager has already given up waiting for it.
      Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      2beb6614
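
The difference between a retry hint and a final-result callback, as described in the commit above, can be sketched in user-space C. This is only an illustration under stated assumptions: `struct lock_req`, `struct lock_mgr_ops`, `mgr_grant`, `fs_lock` and `fs_lock_done` are hypothetical names, not the kernel's `file_lock` or `lock_manager_operations` definitions, and `-ENOENT` is an arbitrary choice for the "lock manager gave up" error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kernel structures. */
struct lock_req;

struct lock_mgr_ops {
    /* Final-result callback. Returns 0, or an error if the lock manager
     * has already stopped waiting for this request. */
    int (*grant)(struct lock_req *req, int status);
};

struct lock_req {
    const struct lock_mgr_ops *ops;
    int waiting;   /* 1 while the lock manager still wants the result */
    int result;    /* final status delivered through grant() */
};

/* Lock manager side: record the final result, or refuse it if we gave up. */
static int mgr_grant(struct lock_req *req, int status)
{
    if (!req->waiting)
        return -ENOENT;   /* too late: the filesystem must undo the lock */
    req->result = status;
    req->waiting = 0;
    return 0;
}

static const struct lock_mgr_ops mgr_ops = { .grant = mgr_grant };

/* Filesystem side: the request needs a remote round trip, so instead of
 * blocking the caller we report that the operation is in progress. */
static int fs_lock(struct lock_req *req)
{
    req->waiting = 1;
    return -EINPROGRESS;
}

/* Filesystem side, later: the remote answer arrived, so hand the final
 * status to the lock manager and pass its verdict back to the caller. */
static int fs_lock_done(struct lock_req *req, int status)
{
    return req->ops->grant(req, status);
}
```

A caller would initialise a request with `.ops = &mgr_ops`, call `fs_lock()`, and treat `-EINPROGRESS` as "the answer will arrive through `grant`". A second delivery for the same request is rejected, which mirrors the case where the grant arrives after the lock manager has stopped waiting.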
    • M
      locks: add lock cancel command · 9b9d2ab4
      Marc Eshel committed
      Lock managers need to be able to cancel pending lock requests.  In the case
      where the exported filesystem manages its own locks, it's not sufficient just
      to call posix_unblock_lock(); we need to let the filesystem know what's
      happening too.
      
      We do this by adding a new fcntl lock command: FL_CANCELLK.  Some day this
      might also be made available to userspace applications that could benefit from
      an asynchronous locking api.
      Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      9b9d2ab4
    • M
      locks: allow {vfs,posix}_lock_file to return conflicting lock · 150b3934
      Marc Eshel committed
      The nfsv4 protocol's lock operation, in the case of a conflict, returns
      information about the conflicting lock.
      
      It's unclear how clients can use this, so for now we're not going so far as to
      add a filesystem method that can return a conflicting lock, but we may as well
      return something in the local case when it's easy to.
      Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      150b3934
    • M
      locks: factor out generic/filesystem switch from setlock code · 7723ec97
      Marc Eshel committed
      Factor out the code that switches between generic and filesystem-specific lock
      methods; eventually we want to call this from lock managers (lockd and nfsd)
      too; currently they only call the generic methods.
      
      This patch does that for all the setlk code.
      Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      7723ec97
    • J
      locks: factor out generic/filesystem switch from test_lock · 3ee17abd
      J. Bruce Fields committed
      Factor out the code that switches between generic and filesystem-specific lock
      methods; eventually we want to call this from lock managers (lockd and nfsd)
      too; currently they only call the generic methods.
      
      This patch does that for test_lock.
      
      Note that this hasn't been necessary until recently, because the few
      filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS.
      However GFS (and, in the future, other cluster filesystems) need to implement
      their own locking to get cluster-coherent locking, and also want to be able to
      export locking to NFS (lockd and NFSv4).
      
      So we accomplish this by factoring out code such as this and exporting it for
      the use of lockd and nfsd.
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      3ee17abd
    • M
      locks: give posix_test_lock same interface as ->lock · 9d6a8c5c
      Marc Eshel committed
      posix_test_lock() and ->lock() do the same job but have gratuitously
      different interfaces.  Modify posix_test_lock() so the two agree,
      simplifying some code in the process.
      Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      9d6a8c5c
  2. 13 Feb, 2007 3 commits
  3. 12 Feb, 2007 3 commits
  4. 12 Jan, 2007 2 commits
    • D
      [PATCH] Revert bd_mount_mutex back to a semaphore · f73ca1b7
      David Chinner committed
      Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
      xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.
      
      (XFS unlocks the semaphore from a different task, by design.  The mutex
      code warns about this)
      Signed-off-by: Dave Chinner <dgc@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f73ca1b7
    • T
      [PATCH] NFS: Fix race in nfs_release_page() · e3db7691
      Trond Myklebust committed
          NFS: Fix race in nfs_release_page()
      
          invalidate_inode_pages2() may find the dirty bit has been set on a page
          owing to the fact that the page may still be mapped after it was locked.
          Only after the call to unmap_mapping_range() are we sure that the page
          can no longer be dirtied.
          In order to fix this, NFS has hooked the releasepage() method and tries
          to write the page out between the call to unmap_mapping_range() and the
          call to remove_mapping(). This, however, leads to deadlocks in the page
          reclaim code, where the page may be locked without holding a reference
          to the inode or dentry.
      
          Fix is to add a new address_space_operation, launder_page(), which will
          attempt to write out a dirty page without releasing the page lock.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      
          Also, the bare SetPageDirty() can skew all sorts of accounting, leading to
          other nasties.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e3db7691
  5. 14 Dec, 2006 1 commit
  6. 09 Dec, 2006 2 commits
  7. 08 Dec, 2006 5 commits
    • A
      [PATCH] Save some bytes in struct inode · 12d40e43
      Arnaldo Carvalho de Melo committed
      [acme@newtoy net-2.6.20]$ pahole --cacheline 64 fs/inode.o inode
      /* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/dcache.h:86 */
      struct inode {
              struct hlist_node          i_hash;               /*     0     8 */
              struct list_head           i_list;               /*     8     8 */
              struct list_head           i_sb_list;            /*    16     8 */
              struct list_head           i_dentry;             /*    24     8 */
              long unsigned int          i_ino;                /*    32     4 */
              atomic_t                   i_count;              /*    36     4 */
              umode_t                    i_mode;               /*    40     2 */
      
              /* XXX 2 bytes hole, try to pack */
      
              unsigned int               i_nlink;              /*    44     4 */
              uid_t                      i_uid;                /*    48     4 */
              gid_t                      i_gid;                /*    52     4 */
              dev_t                      i_rdev;               /*    56     4 */
              loff_t                     i_size;               /*    60     8 */
              struct timespec            i_atime;              /*    68     8 */
              struct timespec            i_mtime;              /*    76     8 */
              struct timespec            i_ctime;              /*    84     8 */
              unsigned int               i_blkbits;            /*    92     4 */
              long unsigned int          i_version;            /*    96     4 */
              blkcnt_t                   i_blocks;             /*   100     4 */
              short unsigned int         i_bytes;              /*   104     2 */
      
              /* XXX 2 bytes hole, try to pack */
      
              spinlock_t                 i_lock;               /*   108    40 */
              struct mutex               i_mutex;              /*   148    76 */
              struct rw_semaphore        i_alloc_sem;          /*   224    64 */
              struct inode_operations *  i_op;                 /*   288     4 */
              const struct file_operations  * i_fop;           /*   292     4 */
              struct super_block *       i_sb;                 /*   296     4 */
              struct file_lock *         i_flock;              /*   300     4 */
              struct address_space *     i_mapping;            /*   304     4 */
              struct address_space       i_data;               /*   308   188 */
              struct list_head           i_devices;            /*   496     8 */
              union                      ;                     /*   504     4 */
              int                        i_cindex;             /*   508     4 */
              __u32                      i_generation;         /*   512     4 */
              /* ---------- cacheline 8 boundary ---------- */
              long unsigned int          i_dnotify_mask;       /*   516     4 */
              struct dnotify_struct *    i_dnotify;            /*   520     4 */
              struct list_head           inotify_watches;      /*   524     8 */
              struct mutex               inotify_mutex;        /*   532    76 */
              long unsigned int          i_state;              /*   608     4 */
              long unsigned int          dirtied_when;         /*   612     4 */
              unsigned int               i_flags;              /*   616     4 */
              atomic_t                   i_writecount;         /*   620     4 */
              void *                     i_security;           /*   624     4 */
              void *                     i_private;            /*   628     4 */
      }; /* size: 632, sum members: 628, holes: 2, sum holes: 4 */
      
      [acme@newtoy net-2.6.20]$
      
      So just moving i_mode to after i_bytes we save 4 bytes by nuking both holes:
      
      [acme@newtoy net-2.6.20]$ codiff -V /tmp/inode.o.before fs/inode.o
      /pub/scm/linux/kernel/git/acme/net-2.6.20/fs/inode.c:
        struct inode |   -4
          i_mode;
           from: umode_t               /*    40(0)     2(0) */
           to:   umode_t               /*   102(0)     2(0) */
       1 struct changed
      [acme@newtoy net-2.6.20]$
      
      I've pruned all the other offset changes; only this one is of interest here.
      
      So now we have:
      
      [acme@newtoy net-2.6.20]$ pahole --cacheline 64 ../OUTPUT/qemu/net-2.6.20/fs/inode.o inode
      /* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/dcache.h:86 */
      struct inode {
              struct hlist_node          i_hash;               /*     0     8 */
              struct list_head           i_list;               /*     8     8 */
              struct list_head           i_sb_list;            /*    16     8 */
              struct list_head           i_dentry;             /*    24     8 */
              long unsigned int          i_ino;                /*    32     4 */
              atomic_t                   i_count;              /*    36     4 */
              unsigned int               i_nlink;              /*    40     4 */
              uid_t                      i_uid;                /*    44     4 */
              gid_t                      i_gid;                /*    48     4 */
              dev_t                      i_rdev;               /*    52     4 */
              loff_t                     i_size;               /*    56     8 */
              /* ---------- cacheline 1 boundary ---------- */
              struct timespec            i_atime;              /*    64     8 */
              struct timespec            i_mtime;              /*    72     8 */
              struct timespec            i_ctime;              /*    80     8 */
              unsigned int               i_blkbits;            /*    88     4 */
              long unsigned int          i_version;            /*    92     4 */
              blkcnt_t                   i_blocks;             /*    96     4 */
              short unsigned int         i_bytes;              /*   100     2 */
              umode_t                    i_mode;               /*   102     2 */
              spinlock_t                 i_lock;               /*   104    40 */
              struct mutex               i_mutex;              /*   144    76 */
              struct rw_semaphore        i_alloc_sem;          /*   220    64 */
              struct inode_operations *  i_op;                 /*   284     4 */
              const struct file_operations  * i_fop;           /*   288     4 */
              struct super_block *       i_sb;                 /*   292     4 */
              struct file_lock *         i_flock;              /*   296     4 */
              struct address_space *     i_mapping;            /*   300     4 */
              struct address_space       i_data;               /*   304   188 */
              struct list_head           i_devices;            /*   492     8 */
              union                      ;                     /*   500     4 */
              int                        i_cindex;             /*   504     4 */
              __u32                      i_generation;         /*   508     4 */
              /* ---------- cacheline 8 boundary ---------- */
              long unsigned int          i_dnotify_mask;       /*   512     4 */
              struct dnotify_struct *    i_dnotify;            /*   516     4 */
              struct list_head           inotify_watches;      /*   520     8 */
              struct mutex               inotify_mutex;        /*   528    76 */
              long unsigned int          i_state;              /*   604     4 */
              long unsigned int          dirtied_when;         /*   608     4 */
              unsigned int               i_flags;              /*   612     4 */
              atomic_t                   i_writecount;         /*   616     4 */
              void *                     i_security;           /*   620     4 */
              void *                     i_private;            /*   624     4 */
      }; /* size: 628 */
      
      [acme@newtoy net-2.6.20]$
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      12d40e43
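
The hole-packing effect reported by pahole above can be reproduced on a much smaller scale. The sketch below uses two hypothetical five-field structs (not the real `struct inode`) with fixed-width types so the layout is predictable; the sizes in the comments assume a common ABI where `uint32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the "before" layout: each 2-byte field is
 * followed by a 4-byte-aligned field, so the compiler pads after it. */
struct inode_before {
    uint32_t i_count;   /* offset  0 */
    uint16_t i_mode;    /* offset  4, then a 2-byte hole */
    uint32_t i_nlink;   /* offset  8 */
    uint16_t i_bytes;   /* offset 12, then a 2-byte hole */
    uint32_t i_lock;    /* offset 16 */
};                      /* size 20 on common ABIs */

/* "After" layout: i_mode moved next to i_bytes, so the two 2-byte
 * fields share one 4-byte slot and both holes disappear. */
struct inode_after {
    uint32_t i_count;   /* offset  0 */
    uint32_t i_nlink;   /* offset  4 */
    uint16_t i_bytes;   /* offset  8 */
    uint16_t i_mode;    /* offset 10 */
    uint32_t i_lock;    /* offset 12 */
};                      /* size 16 on common ABIs */

/* Bytes saved by the reordering. */
static size_t bytes_saved(void)
{
    return sizeof(struct inode_before) - sizeof(struct inode_after);
}
```

As in the commit, moving one 2-byte member removes both holes and saves 4 bytes; `offsetof()` can be used to confirm where each member lands.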
    • E
      [PATCH] fs: reorder some 'struct inode' fields to speedup i_size manipulations · 83b7b44e
      Eric Dumazet committed
      On 32-bit SMP platforms, the 64-bit i_size is protected by a seqcount
      (i_size_seqcount).
      
      When i_size is read or written, i_size_seqcount is read/written as well, so
      it makes sense to group these two fields together in the same cache line.
      
      This patch moves i_size_seqcount next to i_size, and also moves i_version
      so that offsetof(struct inode, i_size) is 0x40 instead of 0x3c (for
      32-bit platforms).
      
      For 64-bit platforms, i_size_seqcount doesn't exist, and the move of a
      'long i_version' should not introduce a new hole because of padding.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      83b7b44e
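
Why every access to a 64-bit size on a 32-bit machine also touches a sequence counter can be seen in a small user-space sketch. It uses C11 atomics rather than the kernel's seqcount API, assumes a single writer at a time, and all names (`struct sized_obj`, `size_write`, `size_read`) are hypothetical. The counter and the two halves of the size sit next to each other in the struct, echoing the grouping the commit argues for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical object: a 64-bit size stored as two 32-bit halves,
 * guarded by a sequence counter placed right next to it. */
struct sized_obj {
    atomic_uint      seq;      /* odd while a write is in progress */
    _Atomic uint32_t size_lo;
    _Atomic uint32_t size_hi;
};

/* Writer (assumed to be serialised externally): bump the counter to an
 * odd value, update both halves, then bump it to the next even value. */
static void size_write(struct sized_obj *o, uint64_t size)
{
    unsigned s = atomic_load_explicit(&o->seq, memory_order_relaxed);

    atomic_store_explicit(&o->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&o->size_lo, (uint32_t)size, memory_order_relaxed);
    atomic_store_explicit(&o->size_hi, (uint32_t)(size >> 32),
                          memory_order_relaxed);
    atomic_store_explicit(&o->seq, s + 2, memory_order_release);
}

/* Reader: retry until both halves were read under one even, unchanged
 * counter value, so a torn (half old, half new) size is never returned. */
static uint64_t size_read(struct sized_obj *o)
{
    unsigned s1, s2;
    uint32_t lo, hi;

    do {
        s1 = atomic_load_explicit(&o->seq, memory_order_acquire);
        lo = atomic_load_explicit(&o->size_lo, memory_order_relaxed);
        hi = atomic_load_explicit(&o->size_hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&o->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    return ((uint64_t)hi << 32) | lo;
}
```

Each read touches `seq` twice and the size once, so keeping them in the same cache line avoids an extra cache miss per access.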
    • J
      [PATCH] constify inode accessors · 48ed214d
      Jan Engelhardt committed
      Change the signature of i_size_read(), IMINOR() and IMAJOR() because they,
      or the functions they call, will never modify the argument.
      Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      48ed214d
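
What constifying a read-only accessor looks like can be shown with a minimal sketch. The struct, the `demo_` helpers and the 20-bit minor split are assumptions made for illustration; they are not the kernel's `struct inode`, `IMINOR()` or `IMAJOR()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed device-number split, for illustration only. */
#define DEMO_MINORBITS 20u
#define DEMO_MINORMASK ((1u << DEMO_MINORBITS) - 1u)

/* Hypothetical stand-in carrying only a device number. */
struct demo_inode {
    uint32_t i_rdev;
};

/* Taking a pointer-to-const documents that the accessor never modifies
 * its argument, and lets callers pass const objects without a cast. */
static unsigned demo_iminor(const struct demo_inode *inode)
{
    return inode->i_rdev & DEMO_MINORMASK;
}

static unsigned demo_imajor(const struct demo_inode *inode)
{
    return inode->i_rdev >> DEMO_MINORBITS;
}
```

With the `const` qualifier in the signature, a caller holding a `const struct demo_inode *` can use both helpers directly.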
    • C
      [PATCH] slab: remove SLAB_KERNEL · e94b1766
      Christoph Lameter committed
      SLAB_KERNEL is an alias of GFP_KERNEL.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e94b1766
    • C
      [PATCH] Move names_cachep to linux/fs.h · b86c089b
      Christoph Lameter committed
      The names_cachep is used for getname() and putname().  So let's put it into
      fs.h near those two definitions.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b86c089b
  8. 04 Dec, 2006 1 commit
  9. 20 Oct, 2006 3 commits
  10. 17 Oct, 2006 1 commit
  11. 03 Oct, 2006 2 commits
    • A
      [PATCH] dm: export blkdev_driver_ioctl · 7006f6ec
      Alasdair G Kergon committed
      Export blkdev_driver_ioctl for device-mapper.
      
      If we get as far as the device-mapper ioctl handler, we know the ioctl is not
      a standard block layer BLK* one, so we don't need to check for them a second
      time and can call blkdev_driver_ioctl() directly.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7006f6ec
    • D
      [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers · afefdbb2
      David Howells committed
      These patches make the kernel pass 64-bit inode numbers internally when
      communicating to userspace, even on a 32-bit system.  They are required
      because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
      for example.  The 64-bit inode numbers are then propagated to userspace
      automatically where the arch supports it.
      
      Problems have been seen with userspace (e.g. ld.so) using the 64-bit inode
      number returned by stat64() or getdents64() to differentiate files, and
      failing because the 64-bit inode number space was compressed to 32 bits, so
      that overlaps occur.
      
      This patch:
      
      Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
      inode number so that 64-bit inode numbers can be passed back to userspace.
      
      The stat functions then return the full 64-bit inode number where
      available and where possible.  If it is not possible to represent the inode
      number supplied by the filesystem in the field provided by userspace, then
      error EOVERFLOW will be issued.
      
      Similarly, the getdents/readdir functions now pass the full 64-bit inode
      number to userspace where possible, returning EOVERFLOW instead when a
      directory entry is encountered that can't be properly represented.
      
      Note that this means that some inodes will not be stat'able on a 32-bit
      system with old libraries where they were before - but it does mean that
      there will be no ambiguity over what a 32-bit inode number refers to.
      
      Note similarly that directory scans may be cut short with an error on a
      32-bit system with old libraries where the scan would work before for the
      same reasons.
      
      It is judged unlikely that this situation will occur because modern glibc
      uses 64-bit capable versions of stat and getdents class functions
      exclusively, and older systems are unlikely to encounter
      unrepresentable inode numbers anyway.
      
      [akpm: alpha build fix]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      afefdbb2
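
The EOVERFLOW rule described in the commit above amounts to a round-trip check when a 64-bit inode number is copied into a narrower userspace field. The sketch below is a user-space illustration only: `struct demo_old_stat` and `demo_fill_old_stat` are hypothetical names, not the kernel's stat structures or helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical legacy stat layout with only a 32-bit inode field. */
struct demo_old_stat {
    uint32_t st_ino;
};

/* Store ino only if it survives narrowing to the 32-bit field unchanged.
 * Otherwise leave dst untouched and report -EOVERFLOW, instead of
 * silently handing back a truncated (and possibly colliding) number. */
static int demo_fill_old_stat(struct demo_old_stat *dst, uint64_t ino)
{
    uint32_t narrow = (uint32_t)ino;

    if ((uint64_t)narrow != ino)
        return -EOVERFLOW;

    dst->st_ino = narrow;
    return 0;
}
```

An inode number such as 42 fits and is stored, while `(1ULL << 32) | 42` would truncate to the same 42; failing with an error is what removes the ambiguity the commit message mentions.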
  12. 02 Oct, 2006 2 commits
  13. 01 Oct, 2006 9 commits