1. 25 6月, 2014 1 次提交
  2. 24 10月, 2013 2 次提交
  3. 31 5月, 2013 1 次提交
    • D
      xfs: rework remote attr CRCs · 7bc0dc27
      Dave Chinner 提交于
      Note: this changes the on-disk remote attribute format. I assert
      that this is OK to do as CRCs are marked experimental and the first
      kernel it is included in has not yet reached release yet. Further,
      the userspace utilities are still evolving and so anyone using this
      stuff right now is a developer or tester using volatile filesystems
      for testing this feature. Hence changing the format right now to
      save longer term pain is the right thing to do.
      
      The fundamental change is to move from a header per extent in the
      attribute to a header per filesytem block in the attribute. This
      means there are more header blocks and the parsing of the attribute
      data is slightly more complex, but it has the advantage that we
      always know the size of the attribute on disk based on the length of
      the data it contains.
      
      This is where the header-per-extent method has problems. We don't
      know the size of the attribute on disk without first knowing how
      many extents are used to hold it. And we can't tell from a
      mapping lookup, either, because remote attributes can be allocated
      contiguously with other attribute blocks and so there is no obvious
      way of determining the actual size of the atribute on disk short of
      walking and mapping buffers.
      
      The problem with this approach is that if we map a buffer
      incorrectly (e.g. we make the last buffer for the attribute data too
      long), we then get buffer cache lookup failure when we map it
      correctly. i.e. we get a size mismatch on lookup. This is not
      necessarily fatal, but it's a cache coherency problem that can lead
      to returning the wrong data to userspace or writing the wrong data
      to disk. And debug kernels will assert fail if this occurs.
      
      I found lots of niggly little problems trying to fix this issue on a
      4k block size filesystem, finally getting it to pass with lots of
      fixes. The thing is, 1024 byte filesystems still failed, and it was
      getting really complex handling all the corner cases that were
      showing up. And there were clearly more that I hadn't found yet.
      
      It is complex, fragile code, and if we don't fix it now, it will be
      complex, fragile code forever more.
      
      Hence the simple fix is to add a header to each filesystem block.
      This gives us the same relationship between the attribute data
      length and the number of blocks on disk as we have without CRCs -
      it's a linear mapping and doesn't require us to guess anything. It
      is simple to implement, too - the remote block count calculated at
      lookup time can be used by the remote attribute set/get/remove code
      without modification for both CRC and non-CRC filesystems. The world
      becomes sane again.
      
      Because the copy-in and copy-out now need to iterate over each
      filesystem block, I moved them into helper functions so we separate
      the block mapping and buffer manupulations from the attribute data
      and CRC header manipulations. The code becomes much clearer as a
      result, and it is a lot easier to understand and debug. It also
      appears to be much more robust - once it worked on 4k block size
      filesystems, it has worked without failure on 1k block size
      filesystems, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ad1858d7)
      7bc0dc27
  4. 24 5月, 2013 1 次提交
    • D
      xfs: rework remote attr CRCs · ad1858d7
      Dave Chinner 提交于
      Note: this changes the on-disk remote attribute format. I assert
      that this is OK to do as CRCs are marked experimental and the first
      kernel it is included in has not yet reached release yet. Further,
      the userspace utilities are still evolving and so anyone using this
      stuff right now is a developer or tester using volatile filesystems
      for testing this feature. Hence changing the format right now to
      save longer term pain is the right thing to do.
      
      The fundamental change is to move from a header per extent in the
      attribute to a header per filesytem block in the attribute. This
      means there are more header blocks and the parsing of the attribute
      data is slightly more complex, but it has the advantage that we
      always know the size of the attribute on disk based on the length of
      the data it contains.
      
      This is where the header-per-extent method has problems. We don't
      know the size of the attribute on disk without first knowing how
      many extents are used to hold it. And we can't tell from a
      mapping lookup, either, because remote attributes can be allocated
      contiguously with other attribute blocks and so there is no obvious
      way of determining the actual size of the atribute on disk short of
      walking and mapping buffers.
      
      The problem with this approach is that if we map a buffer
      incorrectly (e.g. we make the last buffer for the attribute data too
      long), we then get buffer cache lookup failure when we map it
      correctly. i.e. we get a size mismatch on lookup. This is not
      necessarily fatal, but it's a cache coherency problem that can lead
      to returning the wrong data to userspace or writing the wrong data
      to disk. And debug kernels will assert fail if this occurs.
      
      I found lots of niggly little problems trying to fix this issue on a
      4k block size filesystem, finally getting it to pass with lots of
      fixes. The thing is, 1024 byte filesystems still failed, and it was
      getting really complex handling all the corner cases that were
      showing up. And there were clearly more that I hadn't found yet.
      
      It is complex, fragile code, and if we don't fix it now, it will be
      complex, fragile code forever more.
      
      Hence the simple fix is to add a header to each filesystem block.
      This gives us the same relationship between the attribute data
      length and the number of blocks on disk as we have without CRCs -
      it's a linear mapping and doesn't require us to guess anything. It
      is simple to implement, too - the remote block count calculated at
      lookup time can be used by the remote attribute set/get/remove code
      without modification for both CRC and non-CRC filesystems. The world
      becomes sane again.
      
      Because the copy-in and copy-out now need to iterate over each
      filesystem block, I moved them into helper functions so we separate
      the block mapping and buffer manupulations from the attribute data
      and CRC header manipulations. The code becomes much clearer as a
      result, and it is a lot easier to understand and debug. It also
      appears to be much more robust - once it worked on 4k block size
      filesystems, it has worked without failure on 1k block size
      filesystems, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ad1858d7
  5. 28 4月, 2013 3 次提交
    • D
      xfs: add buffer types to directory and attribute buffers · d75afeb3
      Dave Chinner 提交于
      Add buffer types to the buffer log items so that log recovery can
      validate the buffers and calculate CRCs correctly after the buffers
      are recovered.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      d75afeb3
    • D
      xfs: add CRC protection to remote attributes · d2e448d5
      Dave Chinner 提交于
      There are two ways of doing this - the first is to add a CRC to the
      remote attribute entry in the attribute block. The second is to
      treat them similar to the remote symlink, where each fragment has
      it's own header and identifies fragment location in the attribute.
      
      The problem with the CRC in the remote attr entry is that we cannot
      identify the owner of the metadata from the metadata blocks
      themselves, or where the blocks fit into the remote attribute. The
      down side to this approach is that we never know when the attribute
      has been read from disk or not and so we have to verify it every
      time it is read, and we must calculate it during the create
      transaction and log it. We do not log CRCs for any other metadata,
      and so this creates a unique set of coherency problems that, in
      general, are best avoided.
      
      Adding an identifying header to each allocated block allows us to
      identify each fragment and where in the attribute it is located. It
      enables us to rebuild the remote attribute from just the raw blocks
      containing the attribute. It also provides us to do per-block CRCs
      verification at IO time rather than during the transaction context
      that creates it or every time it is read into a user buffer. Hence
      it avoids all the problems that an external, logged CRC has, and
      provides all the benefits of self identifying metadata.
      
      The only complexity is that we have to add a header per fragment,
      and we don't know how many fragments will be needed prior to
      allocations. If we take the symlink example, the header is 56 bytes
      and hence for a 4k block size filesystem, in the worst case 16
      headers requires 1 extra block for the 64k attribute data. For 512
      byte filesystems the worst case is an extra block for every 9
      fragments (i.e. 16 extra blocks in the worse case). This will be
      very rare and so it's not really a major concern.
      
      Because allocation is done in two steps - the first finds a hole
      large enough in the attribute file, the second does the allocation -
      we only need to find a hole big enough for a worst case allocation.
      We only need to allocate enough extra blocks for number of headers
      required by the fragments, and we can calculate that as we go....
      
      Hence it really only makes sense to use the same model as for
      symlinks - it doesn't add that much complexity, does not require an
      attribute tree format change, and does not require logging
      calculated CRC values.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      d2e448d5
    • D
      xfs: split remote attribute code out · 95920cd6
      Dave Chinner 提交于
      Adding CRC support to remote attributes adds a significant amount of
      remote attribute specific code. Split the existing remote attribute
      code out into it's own file so that all the relevant remote
      attribute code is in a single, easy to find place.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      95920cd6
  6. 13 8月, 2011 1 次提交
  7. 13 7月, 2011 1 次提交
  8. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  9. 30 10月, 2008 1 次提交
  10. 29 4月, 2008 1 次提交
  11. 18 4月, 2008 1 次提交
  12. 07 2月, 2008 1 次提交
  13. 14 7月, 2007 1 次提交
    • D
      [XFS] Concurrent Multi-File Data Streams · 2a82b8be
      David Chinner 提交于
      In media spaces, video is often stored in a frame-per-file format. When
      dealing with uncompressed realtime HD video streams in this format, it is
      crucial that files do not get fragmented and that multiple files a placed
      contiguously on disk.
      
      When multiple streams are being ingested and played out at the same time,
      it is critical that the filesystem does not cross the streams and
      interleave them together as this creates seek and readahead cache miss
      latency and prevents both ingest and playout from meeting frame rate
      targets.
      
      This patch set creates a "stream of files" concept into the allocator to
      place all the data from a single stream contiguously on disk so that RAID
      array readahead can be used effectively. Each additional stream gets
      placed in different allocation groups within the filesystem, thereby
      ensuring that we don't cross any streams. When an AG fills up, we select a
      new AG for the stream that is not in use.
      
      The core of the functionality is the stream tracking - each inode that we
      create in a directory needs to be associated with the directories' stream.
      Hence every time we create a file, we look up the directories' stream
      object and associate the new file with that object.
      
      Once we have a stream object for a file, we use the AG that the stream
      object point to for allocations. If we can't allocate in that AG (e.g. it
      is full) we move the entire stream to another AG. Other inodes in the same
      stream are moved to the new AG on their next allocation (i.e. lazy
      update).
      
      Stream objects are kept in a cache and hold a reference on the inode.
      Hence the inode cannot be reclaimed while there is an outstanding stream
      reference. This means that on unlink we need to remove the stream
      association and we also need to flush all the associations on certain
      events that want to reclaim all unreferenced inodes (e.g. filesystem
      freeze).
      
      SGI-PV: 964469
      SGI-Modid: xfs-linux-melb:xfs-kern:29096a
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      Signed-off-by: NBarry Naujok <bnaujok@sgi.com>
      Signed-off-by: NDonald Douwsma <donaldd@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      Signed-off-by: NVlad Apostolov <vapo@sgi.com>
      2a82b8be
  14. 11 11月, 2006 1 次提交
  15. 09 11月, 2005 1 次提交
    • O
      [PATCH] changing CONFIG_LOCALVERSION rebuilds too much, for no good reason · 733482e4
      Olaf Hering 提交于
      This patch removes almost all inclusions of linux/version.h.  The 3
      #defines are unused in most of the touched files.
      
      A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
      unfortunatly in linux/version.h.
      
      There are also lots of #ifdef for long obsolete kernels, this was not
      touched.  In a few places, the linux/version.h include was move to where
      the LINUX_VERSION_CODE was used.
      
      quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`
      
      search pattern:
      /UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h
      Signed-off-by: NOlaf Hering <olh@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      733482e4
  16. 02 11月, 2005 2 次提交
  17. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4