提交 · 0d74125a6a68d4f1969ecaf0b3543f315916ccdc · openeuler / raspberrypi-kernel

03 2月, 2010 3 次提交

ocfs2: Do not downconvert if the lock level is already compatible · 0d74125a

由 Sunil Mushran 提交于 1月 29, 2010

During upconvert, if the master were to send a BAST, dlmglue will detect the
upconversion in process and send a cancel convert to the master. Upon receiving
the AST for the cancel convert, it will re-process the lock resource to determine
whether it needs downconverting. Say, the up was from PR to EX and the BAST was
for EX. After the cancel convert, it will need to downconvert to NL.

However, if the node was originally upconverting from NL to EX, then there would
be no reason to downconvert (assuming the same message sequence).

This patch makes dlmglue consider the possibility that the current lock level
is already compatible and that downconverting is not required.

Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.

Fixes ossbz#1178
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178Reported-by: NColy Li <coly.li@suse.de>
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

0d74125a

ocfs2: Prevent a livelock in dlmglue · a1912826

由 Sunil Mushran 提交于 1月 21, 2010

There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were
to get an ast for an upconvert request, followed immediately by a bast,
there is a small window where the fs may downconvert the lock before the
process requesting the upconvert is able to take the lock.

This patch adds a new flag to indicate that the upconvert is still in
progress and that the dc thread should not downconvert it right now.

Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker
<joel.becker@oracle.com> contributed heavily to this patch.
Reported-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

a1912826

ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast · 0b94a909

由 Wengang Wang 提交于 1月 21, 2010

During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to
downconverted.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Acked-by: NSunil Mushran <sunil.mushran@oracle.com>
Acked-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

0b94a909

26 1月, 2010 1 次提交

ocfs2/trivial: Remove trailing whitespaces · 2bd63216

由 Sunil Mushran 提交于 1月 25, 2010

Patch removes trailing whitespaces.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

2bd63216

04 12月, 2009 1 次提交

tree-wide: fix assorted typos all over the place · af901ca1

由 André Goddard Rosa 提交于 11月 14, 2009

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.
Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

af901ca1

23 9月, 2009 3 次提交

dlmglue.c: add missed mlog lines · d92bc512

由 Coly Li 提交于 8月 28, 2009

This patch adds the missed mlog_exit() and mlog_exit_void() lines when routines
return.
Signed-off-by: NColy Li <coly.li@suse.de>
Acked-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

d92bc512

ocfs2: Add new refcount tree lock resource in dlmglue. · 8dec98ed

由 Tao Ma 提交于 8月 18, 2009

refcount tree lock resource is used to protect refcount
tree read/write among multiple nodes.
Signed-off-by: NTao Ma <tao.ma@oracle.com>

8dec98ed

ocfs2: Abstract caching info checkpoint. · a4338481

由 Tao Ma 提交于 8月 18, 2009

In meta downconvert, we need to checkpoint the metadata in an inode.
For refcount tree, we also need it. So abstract the process out.

Signed-off-by: Tao Ma <tao.ma@oracle.com>

a4338481

05 9月, 2009 2 次提交

ocfs2: Pass struct ocfs2_caching_info to the journal functions. · 0cf2f763

由 Joel Becker 提交于 2月 12, 2009

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions.  Thus the
journal locks a metadata cache with the cache io_lock function.  It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

0cf2f763

ocfs2: Take the inode out of the metadata read/write paths. · 8cb471e8

由 Joel Becker 提交于 2月 10, 2009

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache.  This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

8cb471e8

23 6月, 2009 4 次提交

ocfs2: Add lockdep annotations · cb25797d

由 Jan Kara 提交于 6月 04, 2009

Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with such type of locking.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

cb25797d

ocfs2: Disable orphan scanning for local and hard-ro mounts · df152c24

由 Sunil Mushran 提交于 6月 22, 2009

Local and Hard-RO mounts do not need orphan scanning.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

df152c24

ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() · 3211949f

由 Sunil Mushran 提交于 6月 19, 2009

We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialized an uninitialzed LVB with our
scan sequence number plus one.  This starts a normal orphan scan
cycle.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

3211949f

ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API. · 1c520dfb

由 Joel Becker 提交于 6月 19, 2009

The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
the DLM cannot reconstruct its state.  Clients of the DLM need to know
this.

ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
track of the state.  This is not a standard behavior, but ocfs2 has
always relied on it.  Thus, an o2dlm LVB is always "valid".

ocfs2 now supports both o2dlm and fs/dlm via the stack glue.  When
fs/dlm loses track of an LVBs state, it sets a flag
(DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB).  The contents of
the LVB may be garbage or merely stale.

ocfs2 doesn't want to try to guess at the validity of the stale LVB.
Instead, it should be checking the VALNOTVALID flag.  As this is the
'standard' way of treating LVBs, we will promote this behavior.

We add a stack glue API ocfs2_dlm_lvb_valid().  It returns non-zero when
the LVB is valid.  o2dlm will always return valid, while fs/dlm will
check VALNOTVALID.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Acked-by: NMark Fasheh <mfasheh@suse.com>

1c520dfb

04 6月, 2009 1 次提交

ocfs2: timer to queue scan of all orphan slots · 83273932

由 Srinivas Eeda 提交于 6月 03, 2009

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.
Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

83273932

04 4月, 2009 1 次提交

ocfs2: fix rare stale inode errors when exporting via nfs · 6ca497a8

由 wengang wang 提交于 3月 06, 2009

For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross cluster lock. this leads to the file system loading a
stale inode.

This patch fixes above problem.

Solution is that in case of inode is not in memory, we get the cluster
lock(PR) of alloc inode where the inode in question is allocated from (this
causes node on which deletion is done sync the alloc inode) before reading
out the inode itsself. then we check the bitmap in the group (the inode in
question allcated from) to see if the bit is clear. if it's clear then it's
stale. if the bit is set, we then check generation as the existing code
does.

We have to read out the inode in question from disk first to know its alloc
slot and allot bit. And if its not stale we read it out using ocfs2_iget().
The second read should then be from cache.

And also we have to add a per superblock nfs_sync_lock to cover the lock for
alloc inode and that for inode in question. this is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
that mutliple ocfs2_delete_inode() can run concurrently in normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

6ca497a8

27 2月, 2009 1 次提交

ocfs2: Cleanup the lockname print in dlmglue.c · c74ff8bb

由 Sunil Mushran 提交于 2月 03, 2009

The dentry lock has a different format than other locks. This patch fixes
ocfs2_log_dlm_error() macro to make it print the dentry lock correctly.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

c74ff8bb

03 2月, 2009 1 次提交

ocfs2: Wakeup the downconvert thread after a successful cancel convert · a4b91965

由 Sunil Mushran 提交于 1月 29, 2009

When two nodes holding PR locks on a resource concurrently attempt to
upconvert the locks to EX, the master sends a BAST to one of the nodes. This
message tells that node to first cancel convert the upconvert request,
followed by downconvert to a NL. Only when this lock is downconverted to NL,
can the master upconvert the first node's lock to EX.

While the fs was doing the cancel convert, it was forgetting to wake up the
dc thread after a successful cancel, leading to a deadlock.
Reported-and-Tested-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

a4b91965

09 1月, 2009 1 次提交

fix similar typos to successfull · 73ac36ea

由 Coly Li 提交于 1月 07, 2009

When I review ocfs2 code, find there are 2 typos to "successfull".  After
doing grep "successfull " in kernel tree, 22 typos found totally -- great
minds always think alike :)

This patch fixes all the similar typos. Thanks for Randy's ack and comments.
Signed-off-by: NColy Li <coyli@suse.de>
Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NRoland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73ac36ea

06 1月, 2009 4 次提交

ocfs2: remove unneeded lvb casts · a641dc2a

由 Mark Fasheh 提交于 12月 24, 2008

dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb().
This is pointless however, as ocfs2_dlm_lvb() returns void *.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

a641dc2a

ocfs2: Fix ocfs2_read_quota_block() error handling. · 85eb8b73

由 Joel Becker 提交于 11月 25, 2008

ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to
match ocfs2_read_blocks().  The quota code, converting from
ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in
ocfs2_read_quota_block().  Unfortunately, the prototype of
ocfs2_read_quota_block() matches the old prototype of ocfs2_bread().

The problem is that ocfs2_bread() returned the buffer head, and callers
assumed that a NULL pointer was indicative of error.  It wasn't.  This
is why ocfs2_bread() took an int*err argument as well.

The new prototype of ocfs2_read_virt_blocks() avoids this error handling
confusion.  Let's change ocfs2_read_quota_block() to match.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

85eb8b73

ocfs2: Implementation of local and global quota file handling · 9e33d69f

由 Jan Kara 提交于 8月 25, 2008

For each quota type each node has local quota file. In this file it stores
changes users have made to disk usage via this node. Once in a while this
information is synced to global file (and thus with other nodes) so that
limits enforcement at least aproximately works.

Global quota files contain all the information about usage and limits. It's
mostly handled by the generic VFS code (which implements a trie of structures
inside a quota file). We only have to provide functions to convert structures
from on-disk format to in-memory one. We also have to provide wrappers for
various quota functions starting transactions and acquiring necessary cluster
locks before the actual IO is really started.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

9e33d69f

ocfs2: Wrap inode block reads in a dedicated function. · b657c95c

由 Joel Becker 提交于 11月 13, 2008

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call.  Each place that does this has a different set
of sanity checks it performs.  Some check only the signature.  A couple
validate the block number (the block read vs di->i_blkno).  A couple
others check for VALID_FL.  Only one place validates i_fs_generation.  A
couple check nothing.  Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block().  This will validate
all the above fields, going readonly if they are invalid (they never
should be).  ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags.  Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

b657c95c

02 12月, 2008 1 次提交

ocfs2: fix wake_up in unlock_ast · 07f9eebc

由 David Teigland 提交于 11月 17, 2008

In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
the spin lock on it.  As soon as the spin lock is released, the
lockres can be freed.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

07f9eebc

15 10月, 2008 2 次提交

ocfs2: Simplify ocfs2_read_block() · 0fcaa56a

由 Joel Becker 提交于 10月 09, 2008

More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set. Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

0fcaa56a

ocfs2: Require an inode for ocfs2_read_block(s)(). · 31d33073

由 Joel Becker 提交于 10月 09, 2008

Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode.  Use it
unconditionally.  Since it's there, we don't need to pass the
ocfs2_super either.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

31d33073

15 7月, 2008 2 次提交

ocfs2: fix printk format warnings with OCFS2_FS_STATS=n · dd25e55e

由 Randy Dunlap 提交于 5月 28, 2008

Fix printk format warnings when OCFS2_FS_STATS=n:

linux-next-20080528/fs/ocfs2/dlmglue.c: In function 'ocfs2_dlm_seq_show':
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'int'
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'int'
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'int'
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'int'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

dd25e55e

[PATCH 2/2] ocfs2: Instrument fs cluster locks · 8ddb7b00

由 Sunil Mushran 提交于 5月 13, 2008

This patch adds code to track the number of times the fs takes
various cluster locks as well as the times associated with it.
The information is made available to users via debugfs.

This patch was originally written by Jan Kara <jack@suse.cz>.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

8ddb7b00

11 7月, 2008 1 次提交

ocfs2: Fix flags in ocfs2_file_lock · e988cf1c

由 Mark Fasheh 提交于 7月 10, 2008

The stack-glue merge changed the way we use flags in dlmglue in that we now
use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
code only partially updated. This took a while to show up though, because
the lock level constants are actually identical between o2dlm and fs/dlm.
The *_CONVERT and *_NOQUEUE flags have different values though, which is
eventually causing a crash in flags_to_o2dlm().
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

e988cf1c

18 4月, 2008 11 次提交

ocfs2: Add the 'cluster_stack' sysfs file. · 9c6c877c

由 Joel Becker 提交于 2月 01, 2008

Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file.  By default, it is 'o2cb', which is
the classic stack.  Thus, old tools that do not know how to modify this
file will work just fine.  The stack cannot be modified if there is a
live filesystem.

ocfs2_cluster_connect() now takes the expected cluster stack as an
argument.  This way, the filesystem and the stack glue ensure they are
speaking to the same backend.

If the stack is 'o2cb', the o2cb stack plugin is used.  For any other
value, the fsdlm stack plugin is selected.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

9c6c877c

ocfs2: Break out stackglue into modules. · 286eaa95

由 Joel Becker 提交于 2月 01, 2008

We define the ocfs2_stack_plugin structure to represent a stack driver.
The o2cb stack code is split into stack_o2cb.c.  This becomes the
ocfs2_stack_o2cb.ko module.

The stackglue generic functions are similarly split into the
ocfs2_stackglue.ko module.  This module now provides an interface to
register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
part of this interface, ocfs2_stackglue can load drivers on demand.
This is accomplished in ocfs2_cluster_connect().

ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
If a hangup is pending, it will not release the driver module and will
let _hangup() do that.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

286eaa95

ocfs2: Clean up stackglue initialization · 63e0c48a

由 Joel Becker 提交于 1月 30, 2008

The stack glue initialization function needs a better name so that it can be
used cleanly when stackglue becomes a module.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

63e0c48a

ocfs2: Abstract out a debugging function for underlying dlms. · cf0acdcd

由 Joel Becker 提交于 1月 29, 2008

dlmglue.c was still referencing a raw o2dlm lksb in one instance.  Let's
create a generic ocfs2_dlm_dump_lksb() function.  This allows underlying
DLMs to print whatever they want about their lock.

We then move the o2dlm dump into stackglue.c where it belongs.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

cf0acdcd

ocfs2: handle async EAGAIN from NOQUEUE request · 1693a5c0

由 David Teigland 提交于 1月 30, 2008

When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE
requests. Fix up dlmglue to expect this.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

1693a5c0

ocfs2: Remove CANCELGRANT from the view of dlmglue. · de551246

由 Joel Becker 提交于 2月 01, 2008

o2dlm has the non-standard behavior of providing a cancel callback
(unlock_ast) even when the cancel has failed (the locking operation
succeeded without canceling). This is called CANCELGRANT after the
status code sent to the callback. fs/dlm does not provide this
callback, so dlmglue must be changed to live without it.
o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
needs to check for it. ocfs2_locking_ast() must catch that a cancel was
tried and clear the cancel state.

Making these changes opens up a locking race. dlmglue uses the the
OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
one time. But dlmglue must unlock the lockres before calling into the
dlm. In the small window of time between unlocking the lockres and
calling the dlm, the downconvert thread can try to cancel the lock. The
downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
know that ocfs2_dlm_lock() has not yet been called.

Because ocfs2_dlm_lock() has not yet been called, the cancel operation
will just be a no-op. There's nothing to cancel. With CANCELGRANT,
dlmglue uses the CANCELGRANT callback to clear up the cancel state.
When it comes around again, it will retry the cancel. Eventually, the
first thread will have called into ocfs2_dlm_lock(), and either the
lock or the cancel will succeed. The downconvert thread can then do its
downconvert.

Without CANCELGRANT, there is nothing to clean up the cancellation
state. The downconvert thread does not know to retry its operations.
More importantly, the original lock may be blocking on the other node
that is trying to cancel us. With neither able to make progress, the
ast is never called and the cancellation state is never cleaned up that
way. dlmglue is deadlocked.

The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is
set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread
can check whether the lock is cancelable. If not, it just loops around
to try again. Once ocfs2_dlm_lock() is called, the thread then clears
OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the
downconvert thread finds the lock BUSY, it can safely try to cancel it.
Whether the cancel works or not, the state will be properly set and the
lock processing can continue.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

de551246

ocfs2: Fill node number during cluster stack init · 0abd6d18

由 Mark Fasheh 提交于 1月 29, 2008

It doesn't make sense to query for a node number before connecting to the
cluster stack. This should be safe to do because node_num is only just
printed,
and we're actually only moving the setting of node num a small amount
further in the mount process.

[ Disconnect when node query fails -- Joel ]
Reviewed-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

0abd6d18

ocfs2: Move o2hb functionality into the stack glue. · 6953b4c0

由 Joel Becker 提交于 1月 29, 2008

The last bit of classic stack used directly in ocfs2 code is o2hb.
Specifically, the check for heartbeat during mount and the call to
ocfs2_hb_ctl during unmount.

We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.

The check for heartbeat is moved into ocfs2_cluster_connect().  It will
be matched by a similar check for other stacks.

With this change, only stackglue.c includes cluster/ headers.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

6953b4c0

ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. · 4670c46d

由 Joel Becker 提交于 2月 01, 2008

This step introduces a cluster stack agnostic API for initializing and
exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
connect to the stack. It is all handled in stackglue.c.

heartbeat.c no longer needs to know how it gets called.
ocfs2_do_node_down() is now a clean recovery trigger.

The big gotcha is the ordering of initializations and de-initializations done
underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
o2dlm initialization in one block. Thus, the o2dlm functionality of
ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
however, did a few things between de-registration of the eviction
callback and actually shutting down the domain. Now de-registration and
shutdown of the domain are wrapped within the single
ocfs2_cluster_disconnect() call. I've checked the code paths to make
sure we can safely tear down things in ocfs2_dlm_shutdown() before
calling ocfs2_cluster_disconnect(). The filesystem has already set
itself to ignore the callback.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

4670c46d

ocfs2: Create the lock status block union. · 8f2c9c1b

由 Joel Becker 提交于 2月 01, 2008

Wrap the lock status block (lksb) in a union.  Later we will add a union
element for the fs/dlm lksb.  Create accessors for the status and lvb
fields.

Other than a debugging function, dlmglue.c does not directly reference
the o2dlm locking path anymore.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

8f2c9c1b

ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API. · 7431cd7e

由 Joel Becker 提交于 2月 01, 2008

Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
This is the first step towards elminiating dlm_status in
fs/ocfs2/dlmglue.c.  The change also passes -errno values to
->unlock_ast().

[ Fix a return code in dlmglue.c and change the error translation table into
  an array of ints. --Mark ]
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

7431cd7e