- 06 1月, 2009 1 次提交
-
-
由 Joel Becker 提交于
The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 15 10月, 2008 2 次提交
-
-
由 Joel Becker 提交于
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Joel Becker 提交于
Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 14 10月, 2008 9 次提交
-
-
由 Mark Fasheh 提交于
This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
-
由 Joel Becker 提交于
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Joel Becker 提交于
The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Joel Becker 提交于
We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types, ocfs2_inode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch. All the old callers in file.c etc will call ocfs2_dinode_insert_extent(), while the other two handle the xattr issue. And the init of extent tree are handled by these functions. When storing xattr value which is too large, we will allocate some clusters for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In order to re-use the b-tree operation code, a new parameter named "private" is added into ocfs2_extent_tree and it is used to indicate the root of ocfs2_exent_list. The reason is that we can't deduce the root from the buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse, in any place in an ocfs2_xattr_bucket. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
In the old extent tree operation, we take the hypothesis that we are using the ocfs2_extent_list in ocfs2_dinode as the tree root. As xattr will also use ocfs2_extent_list to store large value for a xattr entry, we refactor the tree operation so that xattr can use it directly. The refactoring includes 4 steps: 1. Abstract set/get of last_eb_blk and update_clusters since they may be stored in different location for dinode and xattr. 2. Add a new structure named ocfs2_extent_tree to indicate the extent tree the operation will work on. 3. Remove all the use of fe_bh and di, use root_bh and root_el in extent tree instead. So now all the fe_bh is replaced with et->root_bh, el with root_el accordingly. 4. Make ocfs2_lock_allocators generic. Now it is limited to be only used in file extend allocation. But the whole function is useful when we want to store large EAs. Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used for anything other than truncate inode data btrees. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations. But they are all limited to an inode btree because they use a struct ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list (the part of an ocfs2_dinode they actually use) so that the xattr btree code can use these functions. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
ocfs2_num_free_extents() is used to find the number of free extent records in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to use this for extended attribute trees in the future, so genericize the interface the take a buffer head. A future patch will allow that buffer_head to contain any structure rooting an ocfs2 btree. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 10 9月, 2008 1 次提交
-
-
由 Tao Ma 提交于
ocfs2 will become read-only if we try to read the bytes which pass the end of i_size. This can be easily reproduced by following steps: 1. mkfs a ocfs2 volume with bs=4k cs=4k and nosparse. 2. create a small file(say less than 100 bytes) and we will create the file which is allocated 1 cluster. 3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit. 4. The ocfs2 volume becomes read-only and dmesg shows: OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks: Inode 66010 has a hole at block 1 File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted. So suppress the ERROR message. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 01 8月, 2008 1 次提交
-
-
由 Sunil Mushran 提交于
This patch fixes an oops that is reproduced when one races writes to a mmap-ed region with another process truncating the file. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 17 7月, 2008 1 次提交
-
-
由 Coly Li 提交于
This patch fixes a mmap_truncate bug which was found by ocfs2 test suite. In an ocfs2 cluster more than 1 node, run program mmap_truncate, which races mmap writes and truncates from multiple processes. While the test is running, a stat from another node forces writeout, causing an oops in ocfs2_get_block() because it sees a buffer to write which isn't allocated. This patch fixed the bug by clear dirty and uptodate bits in buffer, leave the buffer unmapped and return. Fix is suggested by Mark Fasheh, and I code up the patch. Signed-off-by: NColy Li <coyli@suse.de> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 18 4月, 2008 1 次提交
-
-
由 Julia Lawall 提交于
The function ocfs2_start_trans always returns either a valid pointer or a value made with ERR_PTR, so its result should be tested with IS_ERR, not with a test for 0. Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 04 3月, 2008 1 次提交
-
-
由 Julia Lawall 提交于
In commit e6bafba5, a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
- 06 2月, 2008 1 次提交
-
-
由 Christoph Lameter 提交于
Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 1月, 2008 4 次提交
-
-
由 Jan Kara 提交于
In ocfs2_read_inline_data() we should store file size in loff_t. Although the file size should fit in 32 bits we cannot be sure in case filesystem is corrupted. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
Add ->readpages support to Ocfs2. This is rather trivial - all it required is a small update to ocfs2_get_block (for mapping full extents via b_size) and an ocfs2_readpages() function which partially mirrors ocfs2_readpage(). Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
- 28 11月, 2007 1 次提交
-
-
由 Mark Fasheh 提交于
This was causing us to prematurely push out inline data by one byte. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
- 07 11月, 2007 1 次提交
-
-
由 Mark Fasheh 提交于
On file systems which don't support sparse files, Ocfs2_map_page_blocks() was reading blocks on appending writes. This caused write performance to suffer dramatically. Fix this by detecting an appending write on a nonsparse fs and skipping the read. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
- 17 10月, 2007 1 次提交
-
-
由 Nick Piggin 提交于
Plug ocfs2 into the ->write_begin and ->write_end aops. A bunch of custom code is now gone - the iovec iteration stuff during write and the ocfs2 splice write actor. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 10月, 2007 4 次提交
-
-
由 Mark Fasheh 提交于
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline inode data. For the most part, the changes to the core write code can be relied on to do the heavy lifting. Any code calling ocfs2_write_begin (including shared writeable mmap) can count on it doing the right thing with respect to growing inline data to an extent tree. Size reducing truncates, including UNRESVP can simply zero that portion of the inode block being removed. Size increasing truncatesm, including RESVP have to be a little bit smarter and grow the inode to an extent tree if necessary. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com> Reviewed-by: NJoel Becker <joel.becker@oracle.com>
-
由 Mark Fasheh 提交于
This hooks up ocfs2_readpage() to populate a page with data from an inode block. Direct IO reads from inline data are modified to fall back to buffered I/O. Appropriate checks are also placed in the extent map code to avoid reading an extent list when inline data might be stored. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com> Reviewed-by: NJoel Becker <joel.becker@oracle.com>
-
由 Mark Fasheh 提交于
We'll want to reuse most of this when pushing inline data back out to an extent. Keeping this part as a seperate patch helps to keep the upcoming changes for write support uncluttered. The core portion of ocfs2_zero_cluster_pages() responsible for making sure a page is mapped and properly dirtied is abstracted out into it's own function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change, though zeroing becomes optional. We also turn part of ocfs2_free_write_ctxt() into a common function for unlocking and freeing a page array. This operation is very common (and uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense to keep the code in one place. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com> Reviewed-by: NJoel Becker <joel.becker@oracle.com>
-
由 Mark Fasheh 提交于
By doing this, we can remove any higher level logic which has to have knowledge of btree functionality - any callers of ocfs2_write_begin() can now expect it to do anything necessary to prepare the inode for new data. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com> Reviewed-by: NJoel Becker <joel.becker@oracle.com>
-
- 21 9月, 2007 2 次提交
-
-
由 Mark Fasheh 提交于
The target page offsets were being incorrectly set a second time in ocfs2_prepare_page_for_write(), which was causing problems on a 16k page size kernel. Additionally, ocfs2_write_failure() was incorrectly using those parameters instead of the parameters for the individual page being cleaned up. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
This was broken for file systems whose cluster size is greater than page size. Pos needs to be incremented as we loop through the descriptors, and len needs to be capped to the size of a single cluster. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
- 12 9月, 2007 1 次提交
-
-
由 tao.ma@oracle.com 提交于
In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated by the byte length only. This may cause some problems if we start to write at some position in the end of one cluster and last to a second cluster while the "len" is smaller than a cluster size. In that case, we have to write 2 clusters actually. So we have to take the start position into consideration also. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
- 20 7月, 2007 1 次提交
-
-
由 Nick Piggin 提交于
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 7月, 2007 7 次提交
-
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
We can easily seperate out the write descriptor setup and manipulation into helper functions. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-
由 Mark Fasheh 提交于
Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
-