- 01 2月, 2018 1 次提交
-
-
由 Gang He 提交于
Introduce a new dlm lock resource, which will be used to communicate during fstrimming of an ocfs2 device from cluster nodes. Link: http://lkml.kernel.org/r/1513228484-2084-1-git-send-email-ghe@suse.comSigned-off-by: NGang He <ghe@suse.com> Reviewed-by: NChangwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 9月, 2017 1 次提交
-
-
由 Jun Piao 提交于
clean up some unused functions and parameters. Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NAlex Chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 2月, 2017 1 次提交
-
-
由 Eric Ren 提交于
We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl = ocfs2_iop_get_acl, .set_acl = ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. [sfr@canb.auug.org.au remove some inlines] Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.comSigned-off-by: NEric Ren <zren@suse.com> Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 12月, 2016 1 次提交
-
-
由 Deepa Dinamani 提交于
struct timespec is not y2038 safe. Use time64_t which is y2038 safe to represent orphan scan times. time64_t is sufficient here as only the seconds delta times are relevant. Also use appropriate time functions that return time in time64_t format. Time functions now return monotonic time instead of real time as only delta scan times are relevant and these values are not persistent across reboots. The format string for the debug print is still using long as this is only the time elapsed since the last scan and long is sufficient to represent this value. Link: http://lkml.kernel.org/r/1475365138-20567-1-git-send-email-deepa.kernel@gmail.comSigned-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 4月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 3月, 2016 1 次提交
-
-
由 jiangyiwen 提交于
This patch fixes a deadlock, as follows: Node 1 Node 2 Node 3 1)volume a and b are only mount vol a only mount vol b mounted 2) start to mount b start to mount a 3) check hb of Node 3 check hb of Node 2 in vol a, qs_holds++ in vol b, qs_holds++ 4) -------------------- all nodes' network down -------------------- 5) progress of mount b the same situation as failed, and then call Node 2 ocfs2_dismount_volume. but the process is hung, since there is a work in ocfs2_wq cannot beo completed. This work is about vol a, because ocfs2_wq is global wq. BTW, this work which is scheduled in ocfs2_wq is ocfs2_orphan_scan_work, and the context in this work needs to take inode lock of orphan_dir, because lockres owner are Node 1 and all nodes' nework has been down at the same time, so it can't get the inode lock. 6) Why can't this node be fenced when network disconnected? Because the process of mount is hung what caused qs_holds is not equal 0. Because all works in the ocfs2_wq are relative to the super block. The solution is to change the ocfs2_wq from global to local. In other words, move it into struct ocfs2_super. Signed-off-by: NYiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 9月, 2015 1 次提交
-
-
由 Goldwyn Rodrigues 提交于
OCFS2 is often used in high-availaibility systems. However, ocfs2 converts the filesystem to read-only at the drop of the hat. This may not be necessary, since turning the filesystem read-only would affect other running processes as well, decreasing availability. This attempt is to add errors=continue, which would return the EIO to the calling process and terminate furhter processing so that the filesystem is not corrupted further. However, the filesystem is not converted to read-only. As a future plan, I intend to create a small utility or extend fsck.ocfs2 to fix small errors such as in the inode. The input to the utility such as the inode can come from the kernel logs so we don't have to schedule a downtime for fixing small-enough errors. The patch changes the ocfs2_error to return an error. The error returned depends on the mount option set. If none is set, the default is to turn the filesystem read-only. Perhaps errors=continue is not the best option name. Historically it is used for making an attempt to progress in the current process itself. Should we call it errors=eio? or errors=killproc? Suggestions/Comments welcome. Sources are available at: https://github.com/goldwynr/linux/tree/error-contSigned-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 6月, 2015 1 次提交
-
-
由 Joseph Qi 提交于
contig_blocks gotten from ocfs2_extent_map_get_blocks cannot be compared with clusters_to_alloc. So convert it to clusters first. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NWeiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 3月, 2015 1 次提交
-
-
由 Mark Fasheh 提交于
It turns out that making this feature ro_compat isn't quite enough to prevent accidental corruption on mount from older kernels. Ocfs2 (like other file systems) will process orphaned inodes even when the user mounts in 'ro' mode. So for the case of a filesystem not knowing the append_dio feature, mounting the filesystem could result in orphaned-for-dio files being deleted, which we clearly don't want. So instead, turn this into an incompat flag. Btw, this is kind of my fault - initially I asked that we add a flag to cover the feature and even suggested that we use an ro flag. It wasn't until I was looking through our commits for v4.0-rc1 that I realized we actually want this to be incompat. Signed-off-by: NMark Fasheh <mfasheh@suse.de> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 2月, 2015 3 次提交
-
-
由 Joseph Qi 提交于
Intruduce a bit OCFS2_FEATURE_RO_COMPAT_APPEND_DIO and check it in write flow. If the bit is not set, fall back to the old way. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
Implement ocfs2_direct_IO_write. Add the inode to orphan dir first, and then delete it once append O_DIRECT finished. This is to make sure block allocation and inode size are consistent. [akpm@linux-foundation.org: fix it for "block: Add discard flag to blkdev_issue_zeroout() function"] Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
Define two orphan recovery types, which indicates if need truncate file or not. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2015 1 次提交
-
-
由 alex chen 提交于
Add a mount option to support JBD2 feature: JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is opened, journal commit block can be written to disk without waiting for descriptor blocks, which can improve journal commit performance. This option will enable 'journal_checksum' internally. Using the fs_mark benchmark, using journal_async_commit shows a 50% improvement, the files per second go up from 215.2 to 317.5. test script: fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000 default: FSUse% Count Size Files/sec App Overhead 0 1000 10240 215.2 17878 with journal_async_commit option: FSUse% Count Size Files/sec App Overhead 0 1000 10240 317.5 17881 Signed-off-by: NAlex Chen <alex.chen@huawei.com> Signed-off-by: NWeiwei Wang <wangww631@huawei.comm> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 12月, 2014 1 次提交
-
-
由 Xue jiufei 提交于
ocfs2_readpages() use nonblocking flag to avoid page lock inversion. It will trigger cluster hang because that flag OCFS2_LOCK_UPCONVERT_FINISHING is not cleared if nonblocking lock cannot be granted at once. The flag would prevent dc thread from downconverting. So other nodes cannot acheive this lockres for ever. So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving ast if nonblocking lock had already returned. Signed-off-by: Njoyce.xue <xuejiufei@huawei.com> Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 6月, 2014 1 次提交
-
-
由 Xue jiufei 提交于
Revert commit 75f82eaa ("ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously") because it may cause a umount hang while shutting down the truncate log. fix NULL pointer dereference when dismount and ocfs2rec simultaneously The situation is as followes: ocfs2_dismout_volume -> ocfs2_recovery_exit -> free osb->recovery_map -> ocfs2_truncate_shutdown -> lock global bitmap inode -> ocfs2_wait_for_recovery -> check whether osb->recovery_map->rm_used is zero Because osb->recovery_map is already freed, rm_used can be any other values, so it may yield umount hang. To prevent NULL pointer dereference while getting sys_root_inode, we use a osb_tl_disable flag to disable schedule osb_truncate_log_wq after truncate log shutdown. Signed-off-by: Njoyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 4月, 2014 3 次提交
-
-
由 jiangyiwen 提交于
The following case may lead to the same system inode ref in confusion. A thread B thread ocfs2_get_system_file_inode ->get_local_system_inode ->_ocfs2_get_system_file_inode because of *arr == NULL, ocfs2_get_system_file_inode ->get_local_system_inode ->_ocfs2_get_system_file_inode gets first ref thru _ocfs2_get_system_file_inode, gets second ref thru igrab and set *arr = inode at the moment, B thread also gets two refs, so lead to one more inode ref. So add mutex lock to avoid multi thread set two inode ref once at the same time. Signed-off-by: Njiangyiwen <jiangyiwen@huawei.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Goldwyn Rodrigues 提交于
The following patches are reverted in this patch because these patches caused performance regression in the remote unlink() calls. ea455f8a - ocfs2: Push out dropping of dentry lock to ocfs2_wq f7b1aa69 - ocfs2: Fix deadlock on umount 5fd13189 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount Previous patches in this series removed the possible deadlocks from downconvert thread so the above patches shouldn't be needed anymore. The regression is caused because these patches delay the iput() in case of dentry unlocks. This also delays the unlocking of the open lockres. The open lockresource is required to test if the inode can be wiped from disk or not. When the deleting node does not get the open lock, it marks it as orphan (even though it is not in use by another node/process) and causes a journal checkpoint. This delays operations following the inode eviction. This also moves the inode to the orphaned inode which further causes more I/O and a lot of unneccessary orphans. The following script can be used to generate the load causing issues: declare -a create declare -a remove declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384) unique="`mktemp -u XXXXX`" script="/tmp/idontknow-${unique}.sh" cat <<EOF > "${script}" for n in {1..8}; do mkdir -p test/dir\${n} eval touch test/dir\${n}/foo{1.."\$1"} done EOF chmod 700 "${script}" function fcreate () { exec 2>&1 /usr/bin/time --format=%E "${script}" "$1" } function fremove () { exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*" } function fcp () { exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new" } echo ------------------------------------------------- echo "| # files | create #s | copy #s | remove #s |" echo ------------------------------------------------- for ((x=0; x < ${#iterations[*]} ; x++)) do create[$x]="`fcreate ${iterations[$x]}`" copy[$x]="`fcp ${iterations[$x]}`" remove[$x]="`fremove`" printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]} done rm "${script}" echo "------------------------" Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We cannot drop last dquot reference from downconvert thread as that creates the following deadlock: NODE 1 NODE2 holds dentry lock for 'foo' holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE dquot_initialize(bar) ocfs2_dquot_acquire() ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE) ... downconvert thread (triggered from another node or a different process from NODE2) ocfs2_dentry_post_unlock() ... iput(foo) ocfs2_evict_inode(foo) ocfs2_clear_inode(foo) dquot_drop(inode) ... ocfs2_dquot_release() ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE) - blocks finds we need more space in quota file ... ocfs2_extend_no_holes() ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE) - deadlocks waiting for downconvert thread We solve the problem by postponing dropping of the last dquot reference to a workqueue if it happens from the downconvert thread. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 1月, 2014 1 次提交
-
-
由 Goldwyn Rodrigues 提交于
This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM handling up to the times with respect to DLM (>=4.0.1) and corosync (2.3.x). AFAIK, cman also is being phased out for a unified corosync cluster stack. fs/dlm performs all the functions with respect to fencing and node management and provides the API's to do so for ocfs2. For all future references, DLM stands for fs/dlm code. The advantages are: + No need to run an additional userspace daemon (ocfs2_controld) + No controld device handling and controld protocol + Shifting responsibilities of node management to DLM layer For backward compatibility, we are keeping the controld handling code. Once enough time has passed we can remove a significant portion of the code. This was tested by using the kernel with changes on older unmodified tools. The kernel used ocfs2_controld as expected, and displayed the appropriate warning message. This feature requires modification in the userspace ocfs2-tools. The changes can be found at: https://github.com/goldwynr/ocfs2-tools branch: nocontrold Currently, not many checks are present in the userspace code, but that would change soon. This patch (of 6): Add clustername to cluster connection. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 7月, 2013 1 次提交
-
-
由 Goldwyn Rodrigues 提交于
Code cleanup: needs_checkpoint is assigned to but never used. Delete the variable. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Cc: Jeff Liu <jeff.liu@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 12月, 2011 1 次提交
-
-
由 Akinobu Mita 提交于
The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned, but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(), ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These are wrapper macros for ext2_*_bit() which need to take an unsigned long aligned address (though some architectures are able to handle unaligned address correctly) So some 64bit architectures may not be able to access the dqc_bitmap correctly. This avoids such unaligned access by using another wrapper functions for ext2_*_bit(). The code is taken from fs/ext4/mballoc.c which also need to handle unaligned bitmap access. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
- 01 6月, 2011 1 次提交
-
-
由 Akinobu Mita 提交于
Using __test_and_{set,clear}_bit_le() with ignoring its return value can be replaced with __{set,clear}_bit_le(). Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: ocfs2-devel@oss.oracle.com Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
- 24 3月, 2011 1 次提交
-
-
由 Akinobu Mita 提交于
As a preparation for removing ext2 non-atomic bit operations from asm/bitops.h. This converts ext2 non-atomic bit operations to little-endian bit operations. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 2月, 2011 1 次提交
-
-
由 Sunil Mushran 提交于
Patch makes use of the hrtimer to track times in ocfs2 lock stats. The patch is a bit involved to ensure no additional impact on the memory footprint. The size of ocfs2_inode_cache remains 1280 bytes on 32-bit systems. A related change was to modify the unit of the max wait time from nanosec to microsec allowing us to track max time larger than 4 secs. This change necessitated the bumping of the output version in the debugfs file, locking_state, from 2 to 3. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
- 16 12月, 2010 1 次提交
-
-
由 Tao Ma 提交于
Recently, one of our colleagues meet with a problem that if we write/delete a 32mb files repeatly, we will get an ENOSPC in the end. And the corresponding bug is 1288. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288 The real problem is that although we have freed the clusters, they are in truncate log and they will be summed up so that we can free them once in a whole. So this patch just try to resolve it. In case we see -ENOSPC in ocfs2_write_begin_no_lock, we will check whether the truncate log has enough clusters for our need, if yes, we will try to flush the truncate log at that point and try again. This method is inspired by Mark Fasheh <mfasheh@suse.com>. Thanks. Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 19 11月, 2010 1 次提交
-
-
由 Milton Miller 提交于
Commit 1c66b360 (Change some lock status member in ocfs2_lock_res to char.) states that these fields need to be signed due to comparision to -1, but only changed the type from unsigned char to char. However, it is a compiler option if char is a signed or unsigned type. Change these fields to signed char so the code will work with all compilers. Signed-off-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 13 11月, 2010 1 次提交
-
-
由 Tao Ma 提交于
Commit 83fd9c7f changes l_level, l_requested and l_blocking of ocfs2_lock_res from int to unsigned char. But actually it is initially as -1(ocfs2_lock_res_init_common) which correspoding to 255 for unsigned char. So the whole dlm lock mechanism doesn't work now which means a disaster to ocfs2. Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 12 10月, 2010 1 次提交
-
-
由 Tristan Ye 提交于
Currently, the default behavior of O_DIRECT writes was allowing concurrent writing among nodes to the same file, with no cluster coherency guaranteed (no EX lock held). This can leave stale data in the cache for buffered reads on other nodes. The new mount option introduce a chance to choose two different behaviors for O_DIRECT writes: * coherency=full, as the default value, will disallow concurrent O_DIRECT writes by taking EX locks. * coherency=buffered, allow concurrent O_DIRECT writes without EX lock among nodes, which gains high performance at risk of getting stale data on other nodes. Signed-off-by: NTristan Ye <tristan.ye@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 08 10月, 2010 1 次提交
-
-
由 Sunil Mushran 提交于
ocfs2: Add support for heartbeat=global mount option Adds support for heartbeat=global mount option. It ensures that the heartbeat mode passed matches the one enabled on disk. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
-
- 10 10月, 2010 1 次提交
-
-
由 Sunil Mushran 提交于
ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for both userspace and o2cb cluster stacks. It also allows us to extend cluster info to include stack flags. This patch also adds stackflags to sb->s_clusterinfo. It also introduces a clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled global heartbeat mode. This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
-
- 10 9月, 2010 2 次提交
-
-
由 Tao Ma 提交于
Durring orphan scan, if we are slot 0, and we are replaying orphan_dir:0001, the general process is that for every file in this dir: 1. we will iget orphan_dir:0001, since there is no inode for it. we will have to create an inode and read it from the disk. 2. do the normal work, such as delete_inode and remove it from the dir if it is allowed. 3. call iput orphan_dir:0001 when we are done. In this case, since we have no dcache for this inode, i_count will reach 0, and VFS will have to call clear_inode and in ocfs2_clear_inode we will checkpoint the inode which will let ocfs2_cmt and journald begin to work. 4. We loop back to 1 for the next file. So you see, actually for every deleted file, we have to read the orphan dir from the disk and checkpoint the journal. It is very time consuming and cause a lot of journal checkpoint I/O. A better solution is that we can have another reference for these inodes in ocfs2_super. So if there is no other race among nodes(which will let dlmglue to checkpoint the inode), for step 3, clear_inode won't be called and for step 1, we may only need to read the inode for the 1st time. This is a big win for us. So this patch will try to cache system inodes of other slots so that we will have one more reference for these inodes and avoid the extra inode read and journal checkpoint. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Goldwyn Rodrigues 提交于
Thanks for the comments. I have incorportated them all. CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled. Statistics now look like - ocfs2_write_ctxt: 2144 - 2136 = 8 ocfs2_inode_info: 1960 - 1848 = 112 ocfs2_journal: 168 - 160 = 8 ocfs2_lock_res: 336 - 304 = 32 ocfs2_refcount_tree: 512 - 472 = 40 Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 06 5月, 2010 4 次提交
-
-
由 Mark Fasheh 提交于
The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Mark Fasheh 提交于
I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Mark Fasheh 提交于
This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Mark Fasheh 提交于
This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inodes window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 24 3月, 2010 1 次提交
-
-
由 Mark Fasheh 提交于
When the local alloc file changes windows, unused bits are freed back to the global bitmap. By defnition, those bits can not be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediatly. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 22 4月, 2010 1 次提交
-
-
由 Tao Ma 提交于
The fixes include: 1. some endian problems. 2. we should use bit/bpc in ocfs2_block_group_grow_discontig to allocate clusters. 3. set num_clusters properly in __ocfs2_claim_clusters. 4. change name from ocfs2_supports_discontig_bh to ocfs2_supports_discontig_bg. Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
- 13 4月, 2010 1 次提交
-
-
由 Joel Becker 提交于
If we cannot get a contiguous region for a block group, allocate a discontiguous one when the filesystem supports it. Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
- 03 3月, 2010 1 次提交
-
-
由 Tristan Ye 提交于
Currently we were adding ioctl cmds/structures for ocfs2 into ocfs2_fs.h which was used for define ocfs2 on-disk layout. That sounds a little bit confusing, and it may be quickly polluted espcially when growing the ocfs2_info_request ioctls afterwards(it will grow i bet). As a result, such OCFS2 IOCs do need to be placed somewhere other than ocfs2_fs.h, a separated ocfs2_ioctl.h will be added to store such ioctl structures and definitions which could also be used from userspace to invoke ioctls call. Signed-off-by: NTristan Ye <tristan.ye@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-