1. 25 Jul, 2013 1 commit
    • xfs: di_flushiter considered harmful · e1b4271a
      Committed by Dave Chinner
      When we made all inode updates transactional, we no longer needed
      the log recovery detection for inodes being newer on disk than the
      transaction being replayed - it was redundant as replay of the log
      would always result in the latest version of the inode being on
      disk. It was redundant, but left in place because it wasn't
      considered to be a problem.
      
      However, with the new "don't read inodes on create" optimisation,
      flushiter has come back to bite us. Essentially, the optimisation
      always initialises flushiter to zero in the create transaction,
      and so if we then crash and run recovery and the inode already on
      disk has a non-zero flushiter it will skip recovery of that inode.
      As a result, log recovery does the wrong thing and we end up with a
      corrupt filesystem.
      
      Because we have to support old kernel to new kernel upgrades, we
      can't just get rid of the flushiter support in log recovery as we
      might be upgrading from a kernel that doesn't have fully transactional
      inode updates.  Unfortunately, for v4 superblocks there is no way to
      guarantee that log recovery knows about this fact.
      
      We cannot add a new inode format flag to say it's a "special inode
      create" because it won't be understood by older kernels and so
      recovery could do the wrong thing on downgrade. We cannot specially
      detect the combination of zero mode/non-zero flushiter on disk to
      non-zero mode, zero flushiter in the log item during recovery
      because wrapping of the flushiter can result in false detection.
      
      Hence that makes this "don't use flushiter" optimisation limited to
      a disk format that guarantees that we don't need it. And that means
      the only fix here is to limit the "no read IO on create"
      optimisation to version 5 superblocks....
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit e60896d8)
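
The failure described in this message can be sketched in a few lines of C. This is a simplified illustration, not the kernel's recovery code: `demo_inode`, `demo_skip_replay` and `demo_can_create_without_read` are hypothetical names, the counter wrap handling is left out, and the version test stands in for whatever superblock feature check the real code uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down inode core: only the fields the message talks about. */
struct demo_inode {
    uint16_t mode;      /* zero means the inode is free */
    uint16_t flushiter; /* legacy flush counter */
};

/*
 * Legacy recovery rule (wrap handling omitted): if the copy on disk has a
 * higher flush counter than the logged copy, assume disk is newer and skip.
 */
static bool demo_skip_replay(const struct demo_inode *on_disk,
                             const struct demo_inode *in_log)
{
    return in_log->flushiter < on_disk->flushiter;
}

/*
 * The fix as described: only skip the read on create when the on-disk
 * format does not depend on flushiter (assumed here to mean version >= 5).
 */
static bool demo_can_create_without_read(int sb_version)
{
    return sb_version >= 5;
}
```

With a stale freed inode on disk (`mode == 0`, `flushiter == 7`) and a freshly created inode in the log (`flushiter == 0`), `demo_skip_replay` returns true and the create is never replayed, which is the corruption the message reports.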
  2. 10 Jul, 2013 1 commit
  3. 22 Apr, 2013 2 commits
    • xfs: add version 3 inode format with CRCs · 93848a99
      Committed by Christoph Hellwig
      Add a new inode version with a larger core.  The primary objective is
      to allow for a crc of the inode, and location information (uuid and ino)
      to verify it was written in the right place.  We also extend it by:
      
      	a creation time (for Samba);
      	a changecount (for NFSv4);
      	a flush sequence (in LSN format for recovery);
      	an additional inode flags field; and
      	some additional padding.
      
      These additional fields are not implemented yet, but already laid
      out in the structure.
      
      [dchinner@redhat.com] Added LSN and flags field, some factoring and rework to
      capture all the necessary information in the crc calculation.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
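
As a rough picture of the fields listed above, the sketch below lays them out in a C struct and shows the "written in the right place" check. The struct name, field names, widths and padding are illustrative assumptions and do not reproduce the on-disk layout; `demo_inode_in_right_place` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the fields the larger inode core adds. */
struct demo_v3_inode_tail {
    uint32_t crc;          /* checksum of the inode */
    uint64_t changecount;  /* change counter (for NFSv4) */
    uint64_t lsn;          /* flush sequence in LSN format */
    uint64_t flags2;       /* additional inode flags */
    uint8_t  pad[16];      /* spare space for later use */
    int32_t  crtime_sec;   /* creation time (for Samba) */
    int32_t  crtime_nsec;
    uint64_t ino;          /* location info: inode number */
    uint8_t  uuid[16];     /* location info: filesystem uuid */
};

/* An inode is "in the right place" if its self-description matches
 * the inode number we looked up and the filesystem it was read from. */
static bool demo_inode_in_right_place(const struct demo_v3_inode_tail *t,
                                      uint64_t expected_ino,
                                      const uint8_t fs_uuid[16])
{
    return t->ino == expected_ino && memcmp(t->uuid, fs_uuid, 16) == 0;
}
```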
    • xfs: add support for large btree blocks · ee1a47ab
      Committed by Christoph Hellwig
      Add support for larger btree blocks that contain a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and an LSN
      field so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
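
The idea of a self-describing block header can be sketched as below. This is an assumption-laden illustration: `demo_btree_hdr` and its helpers are hypothetical, the field layout is not the on-disk one, and the checksum convention (CRC over the header with the crc field zeroed) is chosen for simplicity rather than taken from the kernel. The bitwise CRC32C routine uses the standard reflected Castagnoli polynomial.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t demo_crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical header; the fields are sized so there is no padding. */
struct demo_btree_hdr {
    uint32_t magic;
    uint32_t crc;      /* checksum of the header */
    uint64_t blkno;    /* where the block was meant to be written */
    uint64_t lsn;      /* last modification that reached disk */
    uint64_t owner;    /* owner, to allow reverse mapping */
    uint8_t  uuid[16]; /* filesystem uuid */
};

/* Checksum the header with the crc field treated as zero. */
static uint32_t demo_hdr_crc(const struct demo_btree_hdr *h)
{
    struct demo_btree_hdr tmp = *h;

    tmp.crc = 0;
    return demo_crc32c(&tmp, sizeof(tmp));
}

static void demo_hdr_stamp(struct demo_btree_hdr *h)
{
    h->crc = demo_hdr_crc(h);
}

/* Detect corruption (crc), misdirected writes (blkno), foreign blocks (uuid). */
static bool demo_hdr_verify(const struct demo_btree_hdr *h, uint64_t read_from,
                            const uint8_t fs_uuid[16])
{
    return h->crc == demo_hdr_crc(h) &&
           h->blkno == read_from &&
           memcmp(h->uuid, fs_uuid, 16) == 0;
}
```

A block that passes the checksum but carries a different `blkno` than the address it was read from is the "out of place write" case the message mentions.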
  4. 15 Mar, 2013 1 commit
  5. 15 Jun, 2012 1 commit
  6. 26 Jul, 2011 1 commit
  7. 19 Oct, 2010 1 commit
  8. 29 Mar, 2009 1 commit
  9. 01 Dec, 2008 3 commits
  10. 30 Oct, 2008 2 commits
    • [XFS] Always use struct xfs_btree_block instead of short / longform structures. · 7cc95a82
      Committed by Christoph Hellwig
      
      Always use the generic xfs_btree_block type instead of the short / long
      structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for
      the length of a short / long form block. The rationale for this is that we
      will grow more btree block header variants to support CRCs and other RAS
      information, and always accessing them through the same datatype with
      unions for the short / long form pointers makes implementing this much
      easier.
      
      SGI-PV: 988146
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32300a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: David Chinner <david@fromorbit.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Cleanup maxrecs calculation. · 60197e8d
      Committed by Christoph Hellwig
      Clean up the way the maximum and minimum records for the btree blocks are
      calculated. For the alloc and inobt btrees all the values are
      pre-calculated in xfs_mount_common, and we switch the current loop around
      the ugly generic macros that use cpp token pasting to generate type names
      to two small helpers in normal C code. For the bmbt and bmdr trees these
      helpers also exist, but can be called during runtime, too. Here we also
      kill various macros dealing with them and inline the logic into the
      get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c.
      
      Note that all these new helpers take an xfs_mount * argument which will be
      needed to determine the size of a btree block once we add support for
      extended btree blocks with CRCs and other RAS information.
      
      SGI-PV: 988146
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32292a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
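
The kind of small helper this message describes might look like the sketch below. It is an illustration under assumptions: `demo_mount`, `demo_maxrecs` and `demo_minrecs` are hypothetical names, the sizes are passed in rather than taken from real record types, and the minimum is assumed to be half the maximum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mount structure: only what the helper needs. */
struct demo_mount {
    size_t blocksize; /* filesystem block size in bytes */
    size_t hdr_len;   /* btree block header length; may grow with CRCs */
};

/*
 * Records that fit in one btree block. Leaf blocks hold records;
 * node blocks hold key/pointer pairs. Taking the mount as an argument
 * lets the header length vary per filesystem.
 */
static size_t demo_maxrecs(const struct demo_mount *mp, bool leaf,
                           size_t rec_len, size_t key_len, size_t ptr_len)
{
    size_t space = mp->blocksize - mp->hdr_len;

    if (leaf)
        return space / rec_len;
    return space / (key_len + ptr_len);
}

/* Assumed convention: a block must stay at least half full. */
static size_t demo_minrecs(const struct demo_mount *mp, bool leaf,
                           size_t rec_len, size_t key_len, size_t ptr_len)
{
    return demo_maxrecs(mp, leaf, rec_len, key_len, ptr_len) / 2;
}
```

For a 4096-byte block with a 16-byte header, 8-byte records, 8-byte keys and 4-byte pointers, this gives 510 leaf records and 340 node entries.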
  11. 07 Feb, 2008 2 commits
    • [XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros. · 45ba598e
      Committed by Christoph Hellwig
      Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of
      XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode
      that embeds an xfs_icdinode_core and XFS_DFORK_* operates on an
      xfs_dinode that embeds an xfs_dinode_core, one will have to do endian
      swapping while the other doesn't. Instead of having the current mess with
      the CFORK macros that have byteswapping and non-byteswapping versions
      (which are inconsistently named while we're at it) just define each family
      of the macros to stand by itself and simplify the whole matter.
      
      A few direct references to the CFORK variants were cleaned up to use IFORK
      or DFORK to make this possible.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30163a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config · 71ddabb9
      Committed by Eric Sandeen
      Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if
      CONFIG_XFS_RT is off. This should be safe because mount checks in
      xfs_rtmount_init:
      
      so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be
      encountered after that.
      
      Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space,
      presumably gcc can optimize around the various "if (0)" type checks:
      
      xfs_alloc_file_space -8
      xfs_bmap_adjacent -16
      xfs_bmapi -8
      xfs_bmap_rtalloc -16
      xfs_bunmapi -28
      xfs_free_file_space -64
      xfs_imap +8 <-- ? hmm.
      xfs_iomap_write_direct -12
      xfs_qm_dqusage_adjust -4
      xfs_qm_vop_chown_reserve -4
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30014a
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
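
The pattern of compiling a predicate down to a constant can be sketched as follows. All `DEMO_`/`demo_` names here are hypothetical stand-ins, including the config symbol and the flag value; the point is only that with the option off, every `if` on the macro becomes `if (0)` and the compiler can drop the realtime branch.

```c
#include <stdint.h>

#define DEMO_DIFLAG_REALTIME 0x1u /* hypothetical flag bit */

struct demo_inode {
    uint32_t flags;
};

/* With realtime support compiled out, the test is a compile-time zero. */
#ifdef DEMO_CONFIG_RT
#define DEMO_IS_REALTIME_INODE(ip) ((ip)->flags & DEMO_DIFLAG_REALTIME)
#else
#define DEMO_IS_REALTIME_INODE(ip) (0)
#endif

enum demo_dev { DEMO_DEV_DATA, DEMO_DEV_RT };

/* Callers keep one code path; the dead branch costs nothing when RT is off. */
static enum demo_dev demo_target_dev(const struct demo_inode *ip)
{
    (void)ip; /* unused when the macro expands to 0 */
    if (DEMO_IS_REALTIME_INODE(ip))
        return DEMO_DEV_RT;
    return DEMO_DEV_DATA;
}
```

This is only safe if something earlier guarantees no realtime inode can be seen without the option, which is the role the message assigns to the mount-time check.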
  12. 15 Oct, 2007 1 commit
    • [XFS] dinode endianess annotations · 347d1c01
      Committed by Christoph Hellwig
      Biggest bit is duplicating the dinode structure so we have one annotated for
      native endianness and one for disk endianness. The other significant change
      is that xfs_xlate_dinode_core is split into one helper per direction to
      allow for proper annotations, everything else is trivial.
      
      As a side note, splitting out the incore dinode means we can move it into
      xfs_inode.h in a later patch and severely improve on the include hell in
      xfs.
      
      SGI-PV: 968563
      SGI-Modid: xfs-linux-melb:xfs-kern:29476a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
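
The split into two structures and one helper per direction can be sketched in portable C. This is an illustration, not the kernel's code: the `demo_` types and functions are hypothetical, only two fields are shown, and byte arrays stand in for the annotated big-endian types so the example compiles anywhere.

```c
#include <stdint.h>

/* Disk form: big-endian bytes, never read directly as integers. */
struct demo_dinode_disk {
    uint8_t mode[2];
    uint8_t nlink[4];
};

/* In-core form: native-endian integers. */
struct demo_dinode_core {
    uint16_t mode;
    uint32_t nlink;
};

/* One helper per direction, so each side of the copy has a fixed type. */
static void demo_dinode_to_disk(struct demo_dinode_disk *to,
                                const struct demo_dinode_core *from)
{
    to->mode[0] = (uint8_t)(from->mode >> 8);
    to->mode[1] = (uint8_t)(from->mode & 0xff);
    to->nlink[0] = (uint8_t)(from->nlink >> 24);
    to->nlink[1] = (uint8_t)((from->nlink >> 16) & 0xff);
    to->nlink[2] = (uint8_t)((from->nlink >> 8) & 0xff);
    to->nlink[3] = (uint8_t)(from->nlink & 0xff);
}

static void demo_dinode_from_disk(struct demo_dinode_core *to,
                                  const struct demo_dinode_disk *from)
{
    to->mode = (uint16_t)((from->mode[0] << 8) | from->mode[1]);
    to->nlink = ((uint32_t)from->nlink[0] << 24) |
                ((uint32_t)from->nlink[1] << 16) |
                ((uint32_t)from->nlink[2] << 8) |
                (uint32_t)from->nlink[3];
}
```

With distinct types on each side, passing a disk structure where an in-core one is expected becomes a compile error instead of a silent byte-order bug.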
  13. 14 Jul, 2007 1 commit
    • [XFS] Concurrent Multi-File Data Streams · 2a82b8be
      Committed by David Chinner
      In media spaces, video is often stored in a frame-per-file format. When
      dealing with uncompressed realtime HD video streams in this format, it is
      crucial that files do not get fragmented and that multiple files are placed
      contiguously on disk.
      
      When multiple streams are being ingested and played out at the same time,
      it is critical that the filesystem does not cross the streams and
      interleave them together as this creates seek and readahead cache miss
      latency and prevents both ingest and playout from meeting frame rate
      targets.
      
      This patch set introduces a "stream of files" concept into the allocator to
      place all the data from a single stream contiguously on disk so that RAID
      array readahead can be used effectively. Each additional stream gets
      placed in different allocation groups within the filesystem, thereby
      ensuring that we don't cross any streams. When an AG fills up, we select a
      new AG for the stream that is not in use.
      
      The core of the functionality is the stream tracking - each inode that we
      create in a directory needs to be associated with the directory's stream.
      Hence every time we create a file, we look up the directory's stream
      object and associate the new file with that object.
      
      Once we have a stream object for a file, we use the AG that the stream
      object points to for allocations. If we can't allocate in that AG (e.g. it
      is full) we move the entire stream to another AG. Other inodes in the same
      stream are moved to the new AG on their next allocation (i.e. lazy
      update).
      
      Stream objects are kept in a cache and hold a reference on the inode.
      Hence the inode cannot be reclaimed while there is an outstanding stream
      reference. This means that on unlink we need to remove the stream
      association and we also need to flush all the associations on certain
      events that want to reclaim all unreferenced inodes (e.g. filesystem
      freeze).
      
      SGI-PV: 964469
      SGI-Modid: xfs-linux-melb:xfs-kern:29096a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Barry Naujok <bnaujok@sgi.com>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Vlad Apostolov <vapo@sgi.com>
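
The stream-to-AG association and the lazy move on a full AG can be sketched as a toy model. Everything here is hypothetical (`demo_fs`, `demo_stream`, the fixed AG count, the first-fit choice of a new AG); the real cache, reference counting and AG selection policy are not modelled.

```c
#include <stdbool.h>

#define DEMO_NUM_AGS 4

/* Toy filesystem: free space per AG and whether a stream owns the AG. */
struct demo_fs {
    long free_blocks[DEMO_NUM_AGS];
    bool in_use[DEMO_NUM_AGS];
};

/* One stream object per directory; files share it by pointer. */
struct demo_stream {
    int ag;
};

/* First AG that no stream is using and that can hold the request. */
static int demo_pick_unused_ag(const struct demo_fs *fs, long need)
{
    for (int ag = 0; ag < DEMO_NUM_AGS; ag++)
        if (!fs->in_use[ag] && fs->free_blocks[ag] >= need)
            return ag;
    return -1;
}

/*
 * Allocate for any file of the stream. If the stream's AG cannot satisfy
 * the request, move the whole stream; other files of the stream follow on
 * their next allocation because they read the AG through the shared object.
 */
static int demo_stream_alloc(struct demo_fs *fs, struct demo_stream *s,
                             long blocks)
{
    if (fs->free_blocks[s->ag] < blocks) {
        int next = demo_pick_unused_ag(fs, blocks);

        if (next < 0)
            return -1;
        fs->in_use[s->ag] = false;
        fs->in_use[next] = true;
        s->ag = next;
    }
    fs->free_blocks[s->ag] -= blocks;
    return s->ag;
}
```

Two streams started in different AGs never share one in this model, since a moving stream only considers AGs that no other stream currently holds.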
  14. 20 Jun, 2006 1 commit
  15. 09 Jun, 2006 1 commit
  16. 11 Jan, 2006 2 commits
  17. 02 Nov, 2005 2 commits
  18. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Committed by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!