- 07 3月, 2016 14 次提交
-
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Darrick J. Wong 提交于
We need to create a new ioend if the current writepage call isn't logically contiguous with the range contained in the previous ioend. Hopefully writepage gets called in order of increasing file offset. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
-
由 Brian Foster 提交于
XFS uses CRC verification over a sub-range of the head of the log to detect and handle torn writes. This torn log write detection currently runs unconditionally at mount time, regardless of whether the log is dirty or clean. This is problematic in cases where a filesystem might end up being moved across different, incompatible (i.e., opposite byte-endianness) architectures. The problem lies in the fact that log data is not necessarily written in an architecture independent format. For example, certain bits of data are written in native endian format. Further, the size of certain log data structures differs (i.e., struct xlog_rec_header) depending on the word size of the cpu. This leads to false positive crc verification errors and ultimately failed mounts when a cleanly unmounted filesystem is mounted on a system with an incompatible architecture from data that was written near the head of the log. Update the log head/tail discovery code to run torn write detection only when the log is not clean. This means something other than an unmount record resides at the head of the log and log recovery is imminent. It is a requirement to run log recovery on the same type of host that had written the content of the dirty log and therefore CRC failures are legitimate corruptions in that scenario. Reported-by: NJan Beulich <JBeulich@suse.com> Tested-by: NJan Beulich <JBeulich@suse.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Once the record at the head of the log is identified and verified, the in-core log state is updated based on the record. This includes information such as the current head block and cycle, the start block of the last record written to the log, the tail lsn, etc. Once torn write detection is conditional, this logic will need to be reused. Factor the code to update the in-core log data structures into a new helper function. This patch does not change behavior. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Once the mount sequence has identified the head and tail blocks of the physical log, the record at the head of the log is located and examined for an unmount record to determine if the log is clean. This currently occurs after torn write verification of the head region of the log. This must ultimately be separated from torn write verification and may need to be called again if the log head is walked back due to a torn write (to determine whether the new head record is an unmount record). Separate this logic into a new helper function. This patch does not change behavior. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
The code that locates the log record at the head of the log is buried in the log head verification function. This is fine when torn write verification occurs unconditionally, but this behavior is problematic for filesystems that might be moved across systems with different architectures. In preparation for separating examination of the log head for unmount records from torn write detection, lift the record location logic out of the log verification function and into the caller. This patch does not change behavior. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 01 3月, 2016 4 次提交
-
-
由 Dave Chinner 提交于
If the block size of a filesystem is not at least PAGE_SIZEd, then at this point in time DAX cannot be used due to the fact we can't guarantee extents are page sized or aligned without further work. Hence disallow setting the DAX flag on an inode if the block size is too small. Also, be defensive and check the block size when reading an inode in off disk. In future, we want to allow DAX to work on any filesystem, so this is temporary while we sort of the correct conbination of extent size hints and allocation alignment configurations needed to guarantee page sized and aligned extent allocation for DAX enabled files. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When we set or clear the XFS_DIFLAG2_DAX flag, we should also set/clear the S_DAX flag in the VFS inode. To do this, we need to ensure that we first flush and remove any cached entries in the radix tree to ensure the correct data access method is used when we next try to read or write data. We ahve to be especially careful here to lock out page faults so they don't race with the flush and invalidation before we change the access mode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Only regular files can use DAX for data operations, so we should restrict setting it on the VFS inode to regular files. Setting it on metadata inodes may cause the VFS to do the wrong thing for such inodes, so avoid potential problems by restricting the scope of the flag to what we know is supported. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Only file data can use DAX, so we should onyl be able to set this flag on regular files. However, the flag also serves as an "inherit" flag at file create time when set on directories, so limit the FS_IOC_FSSETXATTR ioctl to only set this flag on regular files and directories. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 29 2月, 2016 1 次提交
-
-
由 Jan Kara 提交于
When AIO DIO fails e.g. due to IO error, we must not convert unwritten extents as that will expose uninitialized data. Handle this case by clearing unwritten flag from io_end in case of error and thus preventing extent conversion. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 15 2月, 2016 6 次提交
-
-
由 Dave Chinner 提交于
Currently we can build a long ioend chain during ->writepages that gets attached to the writepage context. IO submission only then occurs when we finish all the writepage processing. This means we can have many ioends allocated and pending, and this violates the mempool guarantees that we need to give about forwards progress. i.e. we really should only have one ioend being built at a time, otherwise we may drain the mempool trying to allocate a new ioend and that blocks submission, completion and freeing of ioends that are already in progress. To prevent this situation from happening, we need to submit ioends for IO as soon as they are ready for dispatch rather than queuing them for later submission. This means the ioends have bios built immediately and they get queued on any plug that is current active. Hence if we schedule away from writeback, the ioends that have been built will make forwards progress due to the plug flushing on context switch. This will also prevent context switches from creating unnecessary IO submission latency. We can't completely avoid having nested IO allocation - when we have a block size smaller than a page size, we still need to hold the ioend submission until after we have marked the current page dirty. Hence we may need multiple ioends to be held while the current page is completely mapped and made ready for IO dispatch. We cannot avoid this problem - the current code already has this ioend chaining within a page so we can mostly ignore that it occurs. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Separate out the bufferhead based mapping from the writepage code so that we have a clear separation of the page operations and the bufferhead state. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_cluster_write() is not necessary now that xfs_vm_writepages() aggregates writepage calls across a single mapping. This means we no longer need to do page lookups in xfs_cluster_write, so writeback only needs to look up th epage cache once per page being written. This also removes a large amount of mostly duplicate code between xfs_do_writepage() and xfs_convert_page(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_vm_writepages() calls generic_writepages to writeback a range of a file, but then xfs_vm_writepage() clusters pages itself as it does not have any context it can pass between->writepage calls from __write_cache_pages(). Introduce a writeback context for xfs_vm_writepages() and call __write_cache_pages directly with our own writepage callback so that we can pass that context to each writepage invocation. This encapsulates the current mapping, whether it is valid or not, the current ioend and it's IO type and the ioend chain being built. This requires us to move the ioend submission up to the level where the writepage context is declared. This does mean we do not submit IO until we packaged the entire writeback range, but with the block plugging in the writepages call this is the way IO is submitted, anyway. It also means that we need to handle discontiguous page ranges. If the pages sent down by write_cache_pages to the writepage callback are discontiguous, we need to detect this and put each discontiguous page range into individual ioends. This is needed to ensure that the ioend accurately represents the range of the file that it covers so that file size updates during IO completion set the size correctly. Failure to take into account the discontiguous ranges results in files being too small when writeback patterns are non-sequential. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We currently have code to cancel ioends being built because we change bufferhead state as we build the ioend. On error, this needs to be unwound and so we have cancelling code that walks the buffers on the ioend chain and undoes these state changes. However, the IO submission path already handles state changes for buffers when a submission error occurs, so we don't really need a separate cancel function to do this - we can simply submit the ioend chain with the specific error and it will be cancelled rather than submitted. Hence we can remove the explicit cancel code and just rely on submission to deal with the error correctly. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Remove the nonblocking optimisation done for mapping lookups during writeback. It's not clear that leaving a hole in the writeback range just because we couldn't get a lock is really a win, as it makes us do another small random IO later on rather than a large sequential IO now. As this gets in the way of sane error handling later on, just remove for the moment and we can re-introduce an equivalent optimisation in future if we see problems due to extent map lock contention. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 10 2月, 2016 6 次提交
-
-
由 Dave Chinner 提交于
The places where we use this macro already clear unnecessary IO flags (e.g. through xfs_bwrite()) or never have unexpected IO flags set on them in the first place (e.g. iclog buffers). Remove the macro from these locations, and where necessary clear only the specific flags that are conditional in the current buffer context. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 09 2月, 2016 9 次提交
-
-
由 Dave Chinner 提交于
Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We can store the di_changecount in the i_version field of the VFS inode and remove another 8 bytes from the xfs_icdinode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Pull another 4 bytes out of the xfs_icdinode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The VFS tracks the inode nlink just like the xfs_icdinode. We can remove the variable from the icdinode and use the VFS inode variable everywhere, reducing the size of the xfs_icdinode by a further 4 bytes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We are going to keep certain on-disk information in the VFS inode rather than in a separate XFS specific stucture, so we have to be careful of the VFS code clearing that information when we re-initialise reclaimable cached inodes during lookup. If we don't do this, then we lose critical information from the inode and that results in corruption being detected. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
So we don't have to carry an di_onlink variable around anymore, move the inode conversion from v1 inode format to v2 inode format into xfs_inode_from_disk(). This means we can remove the di_onlink fields from the struct xfs_icdinode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Now that the struct xfs_icdinode is not directly related to the on-disk format, we can cull things in it we really don't need to store: - magic number never changes - padding is not necessary - next_unlinked is never used - inode number is redundant - uuid is redundant - lsn is accessed directly from dinode - inode CRC is only accessed directly from dinode Hence we can remove these from the struct xfs_icdinode and redirect the code that uses them to the xfs_dinode appripriately. This reduces the size of the struct icdinode from 152 bytes to 88 bytes, and removes a fair chunk of unnecessary code, too. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The struct xfs_inode has two copies of the current timestamps in it, one in the vfs inode and one in the struct xfs_icdinode. Now that we no longer log the struct xfs_icdinode directly, we don't need to keep the timestamps in this structure. instead we can copy them straight out of the VFS inode when formatting the inode log item or the on-disk inode. This reduces the struct xfs_inode in size by 24 bytes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We currently carry around and log an entire inode core in the struct xfs_inode. A lot of the information in the inode core is duplicated in the VFS inode, but we cannot remove this duplication of infomration because the inode core is logged directly in xfs_inode_item_format(). Add a new function xfs_inode_item_format_core() that copies the inode core data into a struct xfs_icdinode that is pulled directly from the log vector buffer. This means we no longer directly copy the inode core, but copy the structures one member at a time. This will be slightly less efficient than copying, but will allow us to remove duplicate and unnecessary items from the struct xfs_inode. To enable us to do this, call the new structure a xfs_log_dinode, so that we know it's different to the physical xfs_dinode and the in-core xfs_icdinode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-