- 17 2月, 2015 2 次提交
-
-
由 Joseph Qi 提交于
If one node has crashed with orphan entry leftover, another node which do append O_DIRECT write to the same file will override the i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If this case happens, we let it wait for orphan recovery first. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
Define two orphan recovery types, which indicates if need truncate file or not. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2015 1 次提交
-
-
由 Daeseok Youn 提交于
Signed-off-by: NDaeseok Youn <daeseok.youn@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 11月, 2014 1 次提交
-
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 6月, 2014 1 次提交
-
-
由 Joseph Qi 提交于
Once JBD2_ABORT is set, ocfs2_commit_cache will fail in ocfs2_commit_thread. Then it will get into a loop with mass logs. This will meaninglessly consume a larger number of resource and may lead to the system hanging. So limit printk in this case. [akpm@linux-foundation.org: document the msleep] Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 4月, 2014 1 次提交
-
-
由 Jan Kara 提交于
The flag was never set, delete it. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 9月, 2013 2 次提交
-
-
由 Junxiao Bi 提交于
Though ocfs2 uses inode->i_mutex to protect i_size, there are both i_size_read/write() and direct accesses. Clean up all direct access to eliminate confusion. Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Younger Liu 提交于
The issue scenario is as following: When fallocating a very large disk space for a small file, __ocfs2_extend_allocation attempts to get a very large transaction. For some journal sizes, there may be not enough room for this transaction, and the fallocate will fail. The patch below extends & restarts the transaction as necessary while allocating space, and should work with even the smallest journal. This patch refers ext4 resize. Test: # mkfs.ocfs2 -b 4K -C 32K -T datafiles /dev/sdc ...(jounral size is 32M) # mount.ocfs2 /dev/sdc /mnt/ocfs2/ # touch /mnt/ocfs2/1.log # fallocate -o 0 -l 400G /mnt/ocfs2/1.log fallocate: /mnt/ocfs2/1.log: fallocate failed: Cannot allocate memory # tail -f /var/log/messages [ 7372.278591] JBD: fallocate wants too many credits (2051 > 2048) [ 7372.278597] (fallocate,6438,0):__ocfs2_extend_allocation:709 ERROR: status = -12 [ 7372.278603] (fallocate,6438,0):ocfs2_allocate_unwritten_extents:1504 ERROR: status = -12 [ 7372.278607] (fallocate,6438,0):__ocfs2_change_file_space:1955 ERROR: status = -12 ^C With this patch, the test works well. Signed-off-by: NYounger Liu <younger.liu@huawei.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 6月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 2月, 2013 1 次提交
-
-
由 Tim Gardner 提交于
smatch analysis indicates a number of redundant NULL checks before calling kfree(), eg: fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info: redundant null check on *tl_copy calling kfree() fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info: redundant null check on pages calling kfree() etc.... [akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()] Signed-off-by: NTim Gardner <tim.gardner@canonical.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 7月, 2012 1 次提交
-
-
由 Jan Kara 提交于
Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze protection. We also protect several ioctl entry points which were missing the protection. Finally, we add freeze protection to the journaling mechanism so that iput() of unlinked inode cannot modify a frozen filesystem. CC: Mark Fasheh <mfasheh@suse.com> CC: Joel Becker <jlbec@evilplan.org> CC: ocfs2-devel@oss.oracle.com Acked-by: NJoel Becker <jlbec@evilplan.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 25 7月, 2011 2 次提交
-
-
由 Sunil Mushran 提交于
Add a comment that explains the reason as to why orphan scan scans all the slots. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
-
由 Sunil Mushran 提交于
Convert useful messages from ML_NOTICE to KERN_NOTICE to improve readability. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
-
- 14 5月, 2011 1 次提交
-
-
由 Sunil Mushran 提交于
Patch skips mount recovery for hard-ro mounts which otherwise leads to an oops. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
- 31 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 24 2月, 2011 1 次提交
-
-
由 Tao Ma 提交于
Remove mlog(0) from fs/ocfs2/journal.c and the masklog JOURNAL. Signed-off-by: NTao Ma <boyu.mt@taobao.com>
-
- 07 3月, 2011 1 次提交
-
-
由 Tao Ma 提交于
mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: NTao Ma <boyu.mt@taobao.com>
-
- 21 2月, 2011 1 次提交
-
-
由 Tao Ma 提交于
ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: NTao Ma <boyu.mt@taobao.com>
-
- 10 9月, 2010 3 次提交
-
-
由 Tao Ma 提交于
Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
Now orphan scan worker has no trace log, so it is very hard to tell whether it is finished or blocked. So add 2 mlog trace log so that we can tell whether the current orphan scan worker is blocked or not. It does help when I analyzed a orphan scan bug. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 04 8月, 2010 1 次提交
-
-
由 Theodore Ts'o 提交于
Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 7月, 2010 1 次提交
-
-
由 Jan Kara 提交于
OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 16 6月, 2010 1 次提交
-
-
由 Tao Ma 提交于
We used to let orphan scan work in the default work queue, but there is a corner case which will make the system deadlock. The scenario is like this: 1. set heartbeat threadshold to 200. this will allow us to have a great chance to have a orphan scan work before our quorum decision. 2. mount node 1. 3. after 1~2 minutes, mount node 2(in order to make the bug easier to reproduce, better add maxcpus=1 to kernel command line). 4. node 1 do orphan scan work. 5. node 2 do orphan scan work. 6. node 1 do orphan scan work. After this, node 1 hold the orphan scan lock while node 2 know node 1 is the master. 7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection). Now when node 2 begins orphan scan, the system queue is blocked. The root cause is that both orphan scan work and quorum decision work will use the system event work queue. orphan scan has a chance of blocking the event work queue(in dlm_wait_for_node_death) so that there is no chance for quorum decision work to proceed. This patch resolve it by moving orphan scan work to ocfs2_wq. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 06 5月, 2010 2 次提交
-
-
由 Tao Ma 提交于
In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's blocks. But if jbd2_journal_extend() fails, it will only restart with the the new number of blocks. This tends to be awkward since in most cases we want additional reserved blocks. It makes our code harder to mantain since the caller can't be sure all the original blocks will not be accessed and dirtied again. There are 15 callers of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add h_buffer_credits before they call ocfs2_extend_trans(). This makes ocfs2_extend_trans() really extend atop the original block count. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Joel Becker 提交于
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 26 1月, 2010 1 次提交
-
-
由 Sunil Mushran 提交于
Patch removes trailing whitespaces. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 04 12月, 2009 1 次提交
-
-
由 André Goddard Rosa 提交于
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 23 9月, 2009 1 次提交
-
-
由 Tao Ma 提交于
Add metaecc and journal trigger for ocfs2_refcount_block. Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
- 05 9月, 2009 2 次提交
-
-
由 Joel Becker 提交于
The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Joel Becker 提交于
We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 09 7月, 2009 1 次提交
-
-
由 Jeff Mahoney 提交于
If the mount fails for any reason, ocfs2_dismount_volume calls ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init be called to setup the mutex and work queues, but that doesn't happen if the mount has failed and we oops accessing an uninitialized work queue. This patch splits the init and startup of the orphan scan, eliminating the oops. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 23 6月, 2009 3 次提交
-
-
由 Sunil Mushran 提交于
Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Sunil Mushran 提交于
We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Sunil Mushran 提交于
Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 04 6月, 2009 2 次提交
-
-
由 Srinivas Eeda 提交于
Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Srinivas Eeda 提交于
When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 04 4月, 2009 3 次提交
-
-
由 Srinivas Eeda 提交于
During recovery, a node recovers orphans in it's slot and the dead node(s). But if the dead nodes were holding orphans in offline slots, they will be left unrecovered. If the dead node is the last one to die and is holding orphans in other slots and is the first one to mount, then it only recovers it's own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Mark Fasheh 提交于
This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Acked-by: NJoel Becker <joel.becker@oracle.com>
-
由 Sunil Mushran 提交于
Move the definition of struct recovery_map from journal.c to journal.h. This is preparation for the next patch. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-