1. 06 Feb 2008, 1 commit
    • Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Committed by Christoph Lameter
      Simplify page cache zeroing of segments of pages through three functions:
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the positions where the
              zeroing starts and ends, which avoids length calculations and
              makes the code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. It's a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 means a lot of code no longer has to deal with the
      special casing for HIGHMEM. Dealing with kmap is only necessary for
      HIGHMEM configurations; in those configurations we use KM_USER0, as
      we do for a series of other functions defined in highmem.h.
      
      Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
      function could not be an inline function. The zero_user_* functions
      introduced here can be inline because that constant is not used when
      these functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
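      
      As an illustration only (a hypothetical write-path helper, not part of
      this patch), zeroing the bytes of a page outside a written range goes
      from open-coded kmap/memset/flush to a single call:
      
      #include <linux/highmem.h>	/* zero_user_segments() is declared here */
      
      /* Hypothetical helper: zero everything outside [from, to) on 'page'. */
      static void zero_partial_page(struct page *page, unsigned from, unsigned to)
      {
              /* one call replaces two kmap_atomic/memset/flush_dcache_page
               * sequences and never mentions KM_USER0 at the call site */
              zero_user_segments(page, 0, from, to, PAGE_CACHE_SIZE);
      }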
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 11 Jan 2008, 1 commit
  3. 21 Dec 2007, 2 commits
  4. 18 Dec 2007, 2 commits
  5. 10 Dec 2007, 8 commits
    • [XFS] Fix xfs_ichgtime()'s broken usage of I_SYNC · cf10e82b
      Committed by David Chinner
      The recent I_LOCK->I_SYNC changes mistakenly made xfs_ichgtime() look
      at I_SYNC instead of I_LOCK. This was incorrect and prevents newly
      created inodes from moving to the dirty list. Change this to the
      correct check, which is for I_NEW rather than I_LOCK or I_SYNC, so
      that behaviour is correct.
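      
      A minimal sketch of the kind of check described (illustrative only,
      not the exact diff):
      
      /* sketch: only skip dirtying for inodes still being set up (I_NEW),
       * not for inodes that merely have writeback in progress (I_SYNC) */
      if (!(inode->i_state & I_NEW))
              mark_inode_dirty_sync(inode);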
      
      SGI-PV: 974225
      SGI-Modid: xfs-linux-melb:xfs-kern:30204a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Make xfsbufd threads freezable · 978c7b2f
      Committed by Rafael J. Wysocki
      Fix breakage caused by commit 83144186, which did not introduce the
      necessary call to set_freezable() in xfs/linux-2.6/xfs_buf.c.
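      
      The missing piece is essentially the following pattern at the top of a
      freezable kthread (an illustrative sketch, not a verbatim excerpt):
      
      set_freezable();                /* opt this kthread back in to the freezer */
      ...
      if (freezing(current))
              refrigerator();         /* park here while the system suspends */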
      
      SGI-PV: 974224
      SGI-Modid: xfs-linux-melb:xfs-kern:30203a
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] revert to double-buffering readdir · e89bc612
      Committed by Christoph Hellwig
      The current readdir implementation deadlocks on btree buffer locks
      because nfsd calls back into ->lookup from the filldir callback. The
      only short-term fix for this is to revert to the old, inefficient
      double-buffering scheme.
      
      SGI-PV: 973377
      SGI-Modid: xfs-linux-melb:xfs-kern:30201a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Fix broken inode cluster setup. · a7430847
      Committed by David Chinner
      The radix tree based inode caches did away with the inode cluster hashes,
      replacing them with a bunch of masking and gang lookups on the radix tree.
      
      This masking got broken when moving the code to per-ag radix trees and
      indexing by agino # rather than the straight inode number. The result
      is that clustered inode writeback does not cluster, and things can go
      extremely slowly when there are lots of inodes to write.
      
      Fix it up by comparing the agino # of the inode we just looked up to the
      index of the cluster we are looking for.
      Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      
      SGI-PV: 972915
      SGI-Modid: xfs-linux-melb:xfs-kern:30033a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Clear XBF_READ_AHEAD flag on I/O completion. · 77be55a5
      Committed by Lachlan McIlroy
      SGI-PV: 972554
      SGI-Modid: xfs-linux-melb:xfs-kern:30128a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
    • [XFS] Fixed a few bugs in xfs_buf_associate_memory() · d1afb678
      Committed by Lachlan McIlroy
      - the calculation of 'page_count' was incorrect as it did not
        consider the offset of 'mem' into the first page, and the logic to
        bump 'page_count' didn't work if 'len' was <= PAGE_CACHE_SIZE
        (e.g. offset = 3k, len = 2k); see the sketch after this list
      - setting b_buffer_length to 'len' is incorrect if 'offset'
        is > 0. Set it to the total length of the buffer.
      - I suspect that passing a non-aligned address into mem_to_page()
        for the first page may have been causing issues; not certain, but
        tidy up that code anyway
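      
      A sketch of the corrected page-count arithmetic (assuming the usual
      PAGE_CACHE_* macros from linux/pagemap.h; not a verbatim excerpt):
      
      /* pages spanned by 'len' bytes starting at 'mem', accounting for the
       * offset into the first page: offset = 3k, len = 2k spans 2 pages */
      unsigned long offset = (unsigned long)mem & ~PAGE_CACHE_MASK;
      int page_count = PAGE_CACHE_ALIGN(len + offset) >> PAGE_CACHE_SHIFT;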
      
      SGI-PV: 971596
      SGI-Modid: xfs-linux-melb:xfs-kern:30143a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
    • [XFS] 971064 Various fixups for xfs_bulkstat(). · cd57e594
      Committed by Lachlan McIlroy
      - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()
      - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This
        special case causes bulkstat to fail because it uses
        xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions
        have different semantics.  xfs_bulkstat() will return the next inode
        after the one supplied while skipping internal inodes (ie quota
        inodes).  xfs_bulkstat_single() will only look up the inode supplied
        and return an error if it is an internal inode.
      - in xfs_bulkstat(), we need to initialise 'lastino' to the inode
        supplied so that in cases where we return without examining any
        inodes the scan won't restart back at zero.
      - sanity check for valid *ubcountp values. Cannot sanity check for valid
        ubuffer here because some users of xfs_bulkstat() don't supply a buffer.
      - checks against 'ubleft' (the space left in the user's buffer) should be
        against 'statstruct_size' which is the supplied minimum object size.
        The mixture of checks against statstruct_size and 0 was one of the
        reasons we were skipping inodes.
      - if the formatter function returns BULKSTAT_RV_NOTHING and an error and
        the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT
        is for inodes that are no longer valid and we just skip them. EINVAL is
        returned if we try to lookup an internal inode so we skip them too. For
        a DMF scan if the inode and DMF attribute cannot fit into the space left
        in the user's buffer it would return ERANGE. We didn't handle this error
        and skipped the inode. We would continue to skip inodes until one fitted
        into the user's buffer or we completed the scan.
      - put back the recalculation of agino (that got removed with the last fix)
        at the end of the while loop. This is because the code at the start of
        the loop expects agino to be the last inode examined if it is non-zero.
      - if we found some inodes but then encountered an error, return success
        this time and the error next time (see the sketch after this list).
        If the formatter aborted with ENOMEM we will now return this error,
        but only if we couldn't read any inodes.  Previously, if we
        encountered ENOMEM without reading any inodes, we returned a zero
        count and no error, which falsely indicated the scan was complete.
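      
      Roughly, the error-deferral policy in the last point looks like this
      (an illustrative sketch, not the actual xfs_bulkstat() code; the
      variable names are made up):
      
      /* sketch: defer the error if some inodes were already copied out */
      if (error) {
              if (inodes_returned == 0)
                      return error;   /* nothing to show: report it now */
              return 0;               /* partial result: report success; the
                                         error resurfaces on the next call */
      }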
      
      SGI-PV: 973431
      SGI-Modid: xfs-linux-melb:xfs-kern:30089a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
    • [XFS] Fix dbflush panic in xfs_qm_sync. · d757762b
      Committed by Donald Douwsma
      The recent behaviour layer removal dropped the check for quotas that
      have been requested at mount time but have subsequently been turned
      off. This results in a panic when accessing m_quotainfo, which has
      already been freed.
      
      This patch adds the check originally made by xfs_qm_syncall() to
      xfs_qm_sync().
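      
      A minimal sketch of that check (assuming the existing
      XFS_IS_QUOTA_ON() predicate; not a verbatim excerpt of the patch):
      
      /* quotas were requested at mount time but have since been turned off,
       * so m_quotainfo may already be freed - bail out before touching it */
      if (!XFS_IS_QUOTA_ON(mp))
              return 0;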
      
      SGI-PV: 969769
      SGI-Modid: xfs-linux-melb:xfs-kern:29908a
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  6. 22 Oct 2007, 2 commits
  7. 19 Oct 2007, 2 commits
  8. 17 Oct 2007, 7 commits
    • introduce I_SYNC · 1c0eeaf5
      Committed by Joern Engel
      I_LOCK was used for several unrelated purposes, which caused deadlock
      situations in certain filesystems as a side effect.  One of the purposes
      now uses the new I_SYNC bit.
      
      Also document the various bits and change their order from historical to
      logical.
      
      [bunk@stusta.de: make fs/inode.c:wake_up_inode() static]
      Signed-off-by: Joern Engel <joern@wohnheim.fh-wedel.de>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Anton Altaparmakov <aia21@cam.ac.uk>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: remove pages_skipped accounting in __block_write_full_page() · 1f7decf6
      Committed by Fengguang Wu
      Miklos Szeredi <miklos@szeredi.hu> and I identified a writeback bug:
      
      > The following strange behavior can be observed:
      >
      > 1. large file is written
      > 2. after 30 seconds, nr_dirty goes down by 1024
      > 3. then for some time (< 30 sec) nothing happens (disk idle)
      > 4. then nr_dirty again goes down by 1024
      > 5. repeat from 3. until whole file is written
      >
      > So basically a 4Mbyte chunk of the file is written every 30 seconds.
      > I'm quite sure this is not the intended behavior.
      
      It can be produced by the following test scheme:
      
      # cat bin/test-writeback.sh
      grep nr_dirty /proc/vmstat
      echo 1 > /proc/sys/fs/inode_debug
      dd if=/dev/zero of=/var/x bs=1K count=204800&
      while true; do grep nr_dirty /proc/vmstat; sleep 1; done
      
      # bin/test-writeback.sh
      nr_dirty 19207
      nr_dirty 19207
      nr_dirty 30924
      204800+0 records in
      204800+0 records out
      209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
      nr_dirty 47150
      nr_dirty 47141
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47205
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47215
      nr_dirty 47216
      nr_dirty 47216
      nr_dirty 47216
      nr_dirty 47154
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47134
      nr_dirty 47134
      nr_dirty 47135
      nr_dirty 47135
      nr_dirty 47135
      nr_dirty 46097 <== -1038
      nr_dirty 46098
      nr_dirty 46098
      nr_dirty 46098
      [...]
      nr_dirty 46091
      nr_dirty 46092
      nr_dirty 46092
      nr_dirty 45069 <== -1023
      nr_dirty 45056
      nr_dirty 45056
      nr_dirty 45056
      [...]
      nr_dirty 37822
      nr_dirty 36799 <== -1023
      [...]
      nr_dirty 36781
      nr_dirty 35758 <== -1023
      [...]
      nr_dirty 34708
      nr_dirty 33672 <== -1024
      [...]
      nr_dirty 33692
      nr_dirty 32669 <== -1023
      
      % ls -li /var/x
      847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x
      
      % dmesg|grep 847824  # generated by a debug printk
      [  529.263184] redirtied inode 847824 line 548
      [  564.250872] redirtied inode 847824 line 548
      [  594.272797] redirtied inode 847824 line 548
      [  629.231330] redirtied inode 847824 line 548
      [  659.224674] redirtied inode 847824 line 548
      [  689.219890] redirtied inode 847824 line 548
      [  724.226655] redirtied inode 847824 line 548
      [  759.198568] redirtied inode 847824 line 548
      
      # line 548 in fs/fs-writeback.c:
      543                 if (wbc->pages_skipped != pages_skipped) {
      544                         /*
      545                          * writeback is not making progress due to locked
      546                          * buffers.  Skip this inode for now.
      547                          */
      548                         redirty_tail(inode);
      549                 }
      
      More debugging shows that __block_write_full_page()
      never has the chance to call submit_bh() for that big dirty file:
      the buffer head is *clean*. So basically no page I/O is issued by
      __block_write_full_page(), hence pages_skipped goes up.
      
      Also the comment in generic_sync_sb_inodes():
      
      544                         /*
      545                          * writeback is not making progress due to locked
      546                          * buffers.  Skip this inode for now.
      547                          */
      
      and the comment in __block_write_full_page():
      
      1713                 /*
      1714                  * The page was marked dirty, but the buffers were
      1715                  * clean.  Someone wrote them back by hand with
      1716                  * ll_rw_block/submit_bh.  A rare case.
      1717                  */
      
      do not quite agree with each other. Page writeback should only be
      skipped for locked buffers, but here the buffers are clean!
      
      This patch fixes this bug. Though I'm not sure why
      __block_write_full_page() is called only to do nothing, or who
      actually issued the writeback for us.
      
      These are the two possible new behaviors after the patch:
      
      1) pretty nice: wait 30s and write ALL:)
      2) not so good:
      	- during the dd: ~16M
      	- after 30s:      ~4M
      	- after 5s:       ~4M
      	- after 5s:     ~176M
      
      The next patch will fix case (2).
      
      Cc: David Chinner <dgc@sgi.com>
      Cc: Ken Chen <kenchen@google.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Slab API: remove useless ctor parameter and reorder parameters · 4ba9b9d0
      Committed by Christoph Lameter
      Slab constructors currently have a flags parameter that is never used.  And
      the order of the arguments is opposite to other slab functions.  The object
      pointer is placed before the kmem_cache pointer.
      
      Convert
      
              ctor(void *object, struct kmem_cache *s, unsigned long flags)
      
      to
      
              ctor(struct kmem_cache *s, void *object)
      
      throughout the kernel.
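      
      For example, a filesystem's inode-cache constructor (a hypothetical
      call site named foo_init_once, typical of the ones converted) changes
      from:
      
              /* before: unused 'flags', object pointer before the cache pointer */
              static void foo_init_once(void *object, struct kmem_cache *cachep,
                                        unsigned long flags)
              {
                      struct foo_inode_info *fi = object;
                      inode_init_once(&fi->vfs_inode);
              }
      
      to:
      
              /* after: cache pointer first, no flags parameter */
              static void foo_init_once(struct kmem_cache *cachep, void *object)
              {
                      struct foo_inode_info *fi = object;
                      inode_init_once(&fi->vfs_inode);
              }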
      
      [akpm@linux-foundation.org: coupla fixes]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [XFS] eagerly remove vmap mappings to avoid upsetting Xen · 7f015072
      Committed by Jeremy Fitzhardinge
      XFS leaves stray mappings around when it vmaps memory to make it
      virtually contiguous. This upsets Xen if one of those pages is being
      recycled into a pagetable, since it finds an extra writable mapping of
      the page.
      
      This patch solves the problem in a brute force way, by making XFS always
      eagerly unmap its mappings.
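      
      In rough terms (a sketch with hypothetical buffer-field names, not the
      actual xfs_buf teardown code), the idea is to drop the alias as soon
      as the buffer is freed:
      
      /* sketch: unmap the vmap() alias immediately instead of caching it,
       * so no stray writable mapping of the pages survives for Xen to see */
      if (bp->b_addr) {
              vunmap(bp->b_addr);
              bp->b_addr = NULL;
      }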
      
      SGI-PV: 971902
      SGI-Modid: xfs-linux-melb:xfs-kern:29886a
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • [XFS] simplify validata_fields · 6572bc28
      Committed by Christoph Hellwig
      Stop using xfs_getattr and an on-stack bhv_vattr_t just to get three
      fields from the underlying inode; open-code copying from the inode
      fields instead.
      
      SGI-PV: 970662
      SGI-Modid: xfs-linux-melb:xfs-kern:29711a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • xfs: eagerly remove vmap mappings to avoid upsetting Xen · ace2e92e
      Committed by Jeremy Fitzhardinge
      XFS leaves stray mappings around when it vmaps memory to make it
      virtually contiguous.  This upsets Xen if one of those pages is being
      recycled into a pagetable, since it finds an extra writable mapping of
      the page.
      
      This patch solves the problem in a brute force way, by making XFS
      always eagerly unmap its mappings.  David Chinner says this shouldn't
      have any performance impact on filesystems with default block sizes;
      it will only affect filesystems with large block sizes.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Acked-by: David Chinner <dgc@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: XFS masters <xfs-masters@oss.sgi.com>
      Cc: Stable kernel <stable@kernel.org>
      Cc: Morten Bøgeskov <xen-users@morten.bogeskov.dk>
      Cc: Mark Williamson <mark.williamson@cl.cam.ac.uk>
    • xfs: convert to new aops · d79689c7
      Committed by Nick Piggin
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Timothy Shimmin <tes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 16 Oct 2007, 15 commits