提交 · dad30e9031c5927c30b402f73ac57ffbe09dc9ee · openanolis / cloud-kernel

24 4月, 2012 1 次提交

GFS2: Clean up log write code path · e8c92ed7

由 Steven Whitehouse 提交于 4月 16, 2012

Prior to this patch, we have two ways of sending i/o to the log.
One of those is used when we need to allocate both the data
to be written itself and also a buffer head to submit it. This
is done via sb_getblk and friends. This is used mostly for writing
log headers.

The other method is used when writing blocks which have some
in-place counterpart. This is the case for all the metadata
blocks which are journalled, and when journaled data is in use,
for unescaped journalled data blocks.

This patch replaces both of those two methods, and about half
a dozen separate i/o submission points with a single i/o
submission function. We also go direct to bio rather than
using buffer heads, since this allows us to build i/o
requests of the maximum size for the block device in
question. It also reduces the memory required for flushing
the log, which can be very useful in low memory situations.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

e8c92ed7

05 3月, 2012 1 次提交

GFS2: Eliminate sd_rindex_mutex · 6aad1c3d

由 Bob Peterson 提交于 3月 05, 2012

Over time, we've slowly eliminated the use of sd_rindex_mutex.
Up to this point, it was only used in two places: function
gfs2_ri_total (which totals the file system size by reading
and parsing the rindex file) and function gfs2_rindex_update
which updates the rgrps in memory. Both of these functions have
the rindex glock to protect them, so the rindex is unnecessary.
Since gfs2_grow writes to the rindex via the meta_fs, the mutex
is in the wrong order according to the normal rules. This patch
eliminates the mutex entirely to avoid the problem.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

6aad1c3d

29 2月, 2012 1 次提交

GFS2: glock statistics gathering · a245769f

由 Steven Whitehouse 提交于 1月 20, 2012

The stats are divided into two sets: those relating to the
super block and those relating to an individual glock. The
super block stats are done on a per cpu basis in order to
try and reduce the overhead of gathering them. They are also
further divided by glock type.

In the case of both the super block and glock statistics,
the same information is gathered in each case. The super
block statistics are used to provide default values for
most of the glock statistics, so that newly created glocks
should have, as far as possible, a sensible starting point.

The statistics are divided into three pairs of mean and
variance, plus two counters. The mean/variance pairs are
smoothed exponential estimates and the algorithm used is
one which will be very familiar to those used to calculation
of round trip times in network code.

The three pairs of mean/variance measure the following
things:

 1. DLM lock time (non-blocking requests)
 2. DLM lock time (blocking requests)
 3. Inter-request time (again to the DLM)

A non-blocking request is one which will complete right
away, whatever the state of the DLM lock in question. That
currently means any requests when (a) the current state of
the lock is exclusive (b) the requested state is either null
or unlocked or (c) the "try lock" flag is set. A blocking
request covers all the other lock requests.

There are two counters. The first is there primarily to show
how many lock requests have been made, and thus how much data
has gone into the mean/variance calculations. The other counter
is counting queueing of holders at the top layer of the glock
code. Hopefully that number will be a lot larger than the number
of dlm lock requests issued.

So why gather these statistics? There are several reasons
we'd like to get a better idea of these timings:

1. To be able to better set the glock "min hold time"
2. To spot performance issues more easily
3. To improve the algorithm for selecting resource groups for
allocation (to base it on lock wait time, rather than blindly
using a "try lock")
Due to the smoothing action of the updates, a step change in
some input quantity being sampled will only fully be taken
into account after 8 samples (or 4 for the variance) and this
needs to be carefully considered when interpreting the
results.

Knowing both the time it takes a lock request to complete and
the average time between lock requests for a glock means we
can compute the total percentage of the time for which the
node is able to use a glock vs. time that the rest of the
cluster has its share. That will be very useful when setting
the lock min hold time.

The other point to remember is that all times are in
nanoseconds. Great care has been taken to ensure that we
measure exactly the quantities that we want, as accurately
as possible. There are always inaccuracies in any
measuring system, but I hope this is as accurate as we
can reasonably make it.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

a245769f

11 1月, 2012 3 次提交

GFS2: fail mount if journal recovery fails · 376d3778

由 David Teigland 提交于 1月 09, 2012

If the first mounter fails to recover one of the journals
during mount, the mount should fail.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

376d3778

GFS2: let spectator mount do read only recovery · e8ca5cc5

由 David Teigland 提交于 1月 09, 2012

Previously, a spectator mount would not even attempt to do
journal recovery for a failed node.  This meant that if all
mounted nodes were spectators, everyone would be stuck after
a node failed, all waiting for recovery to be performed.
This is unnecessary since the failed node had a clean journal.

Instead, allow a spectator mount to do a partial "read only"
recovery, which means it will check if the failed journal is
clean, and if so, report a successful recovery.  If the failed
journal is not clean, it reports that journal recovery failed.
This makes it work the same as a read only mount on a read only
block device.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

e8ca5cc5

GFS2: dlm based recovery coordination · e0c2a9aa

由 David Teigland 提交于 1月 09, 2012

This new method of managing recovery is an alternative to
the previous approach of using the userland gfs_controld.

- use dlm slot numbers to assign journal id's
- use dlm recovery callbacks to initiate journal recovery
- use a dlm lock to determine the first node to mount fs
- use a dlm lock to track journals that need recovery
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

e0c2a9aa

22 11月, 2011 1 次提交

GFS2: decouple quota allocations from block allocations · 564e12b1

由 Bob Peterson 提交于 11月 21, 2011

This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

564e12b1

18 11月, 2011 1 次提交

GFS2: remove vestigial al_alloced · b9f417f3

由 Bob Peterson 提交于 11月 16, 2011

This patch removes the vestigial variable al_alloced from
the gfs2_alloc structure. This is another baby step toward
multi-block reservations.

My next planned step is to decouple the quota variables
from the gfs2_alloc structure so we can use a different
method for allocations.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

b9f417f3

21 10月, 2011 6 次提交

GFS2: Remove two unused variables · 9ae32429

由 Steven Whitehouse 提交于 9月 20, 2011

The two variables being initialised in gfs2_inplace_reserve
to track the file & line number of the caller are never
used, so we might as well remove them.

If something does go wrong, then a stack trace is probably
more useful anyway.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

9ae32429

GFS2: rewrite fallocate code to write blocks directly · 64dd153c

由 Benjamin Marzinski 提交于 9月 12, 2011

GFS2's fallocate code currently goes through the page cache. Since it's only
writing to the end of the file or to holes in it, it doesn't need to, and it
was causing issues on low memory environments. This patch pulls in some of
Steve's block allocation work, and uses it to simply allocate the blocks for
the file, and zero them out at allocation time. It provides a slight
performance increase, and it dramatically simplifies the code.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

64dd153c

GFS2: Cache the most recently used resource group in the inode · 54335b1f

由 Steven Whitehouse 提交于 9月 01, 2011

This means that after the initial allocation for any inode, the
last used resource group is cached in the inode for future use.
This drastically reduces the number of lookups of resource
groups in the common case, and this the contention on that
data structure.

The allocation algorithm is the same as previously, except that we
always check to see if the goal block is within the cached rgrp
first before going to the rbtree to look one up.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

54335b1f

GFS2: Make resource groups "append only" during life of fs · 8339ee54

由 Steven Whitehouse 提交于 8月 31, 2011

Since we have ruled out supporting online filesystem shrink,
it is possible to make the resource group list append only
during the life of a super block. This gives several benefits:

Firstly, we only need to read new rindex elements as they are added
rather than needing to reread the whole rindex file each time one
element is added.

Secondly, the rindex glock can be held for much shorter periods of
time, and is completely removed from the fast path for allocations.
The lock is taken in shared mode only when updating the resource
groups when the first allocation occurs, and after a grow has
taken place.

Thirdly, this results in a reduction in code size, and everything
gets a lot simpler to understand in this area.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

8339ee54

GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme · 7c9ca621

由 Bob Peterson 提交于 8月 31, 2011

Here is an update of Bob's original rbtree patch which, in addition, also
resolves the rather strange ref counting that was being done relating to
the bitmap blocks.

Originally we had a dual system for journaling resource groups. The metadata
blocks were journaled and also the rgrp itself was added to a list. The reason
for adding the rgrp to the list in the journal was so that the "repolish
clones" code could be run to update the free space, and potentially send any
discard requests when the log was flushed. This was done by comparing the
"cloned" bitmap with what had been written back on disk during the transaction
commit.

Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
until the journal had been flushed. For that reason, there was a rather
complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
count on the buffers.

However, the journal maintains a reference count on the buffers anyway, since
they are being journaled as metadata buffers. So by moving the code which deals
with the post-journal accounting for bitmap blocks to the metadata journaling
code, we can entirely dispense with the rather strange buffer ref counting
scheme and also the requirement to journal the rgrps.

The net result of all this is that the ->sd_rindex_spin is left to do exactly
one job, and that is to look after the rbtree or rgrps.

This patch is designed to be a stepping stone towards using RCU for the rbtree
of resource groups, however the reduction in the number of uses of the
->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
anyway.

The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also
be removed in future in favour of calling the functions directly where required
in the code. That will allow locking of resource groups without needing to
actually read them in - something that could be useful in speeding up statfs.

In the mean time though it is valid to dereference ->bi_bh only when the rgrp
is locked. This is basically the same rule as before, modulo the references not
being valid until the following journal flush.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Cc: Benjamin Marzinski <bmarzins@redhat.com>

7c9ca621

GFS2: Fix inode allocation error path · 40ac218f

由 Steven Whitehouse 提交于 8月 02, 2011

If we have got far enough through the inode allocation code
path that an inode has already been allocated, then we must
call iput to dispose of it, if an error occurs during a
later part of the process. This will always be the final iput
since there will be no other references to the inode.

Unlike when the inode has been unlinked, its block state will
be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
to skip the test in ->evict_inode() for this one case in order
to ensure that it will be deallocated correctly. This patch adds
a new flag in order to ensure that this will happen correctly.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

40ac218f

15 7月, 2011 2 次提交

GFS2: Automatically adjust glock min hold time · 7cf8dcd3

由 Bob Peterson 提交于 6月 15, 2011

This patch is a performance improvement for GFS2 in a clustered
environment. It makes the glock hold time self-adjusting.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

7cf8dcd3

GFS2: Cache dir hash table in a contiguous buffer · 17d539f0

由 Steven Whitehouse 提交于 6月 15, 2011

This patch adds a cache for the hash table to the directory code
in order to help simplify the way in which the hash table is
accessed. This is intended to be a first step towards introducing
some performance improvements in the directory code.

There are two follow ups that I'm hoping to see fairly shortly. One
is to simplify the hash table reading code now that we always read the
complete hash table, whether we want one entry or all of them. The
other is to introduce readahead on the heads of the hash chains
which are referred to from the table.

The hash table is a maximum of 128k in size, so it is not worth trying
to read it in small chunks.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

17d539f0

12 7月, 2011 1 次提交

GFS2: Fix race during filesystem mount · 3942ae53

由 Steven Whitehouse 提交于 7月 11, 2011

There is a potential race during filesystem mounting which has recently
been reported. It occurs when the userland gfs_controld is able to
process requests fast enough that it tries to use the sysfs interface
before the lock module is properly initialised. This is a pretty
unusual case as normally the lock module initialisation is very quick
compared with gfs_controld.

This patch adds an interruptible completion which is used to ensure that
userland will wait for the initialisation of the lock module to
complete.

There are other potential solutions to this problem, but this is the
quickest at this stage and has been tested both with and without
mount.gfs2 present in the system.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Reported-by: NDavid Booher <dbooher@adams.net>

3942ae53

10 5月, 2011 1 次提交

GFS2: Use UUID field in generic superblock · 32e471ef

由 Steven Whitehouse 提交于 5月 10, 2011

The VFS superblock structure now has a UUID field, so we can use that
in preference to the UUID field in the GFS2 superblock now.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

32e471ef

20 4月, 2011 3 次提交

GFS2: Make writeback more responsive to system conditions · 4667a0ec

由 Steven Whitehouse 提交于 4月 18, 2011

This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

4667a0ec

GFS2: Optimise glock lru and end of life inodes · f42ab085

由 Steven Whitehouse 提交于 4月 14, 2011

The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

f42ab085

GFS2: Improve tracing support (adds two flags) · 627c10b7

由 Steven Whitehouse 提交于 4月 14, 2011

This adds support for two new flags. One keeps track of whether
the glock is on the LRU list or not. The other isn't really a
flag as such, but an indication of whether the glock has an
attached object or not. This indication is reported without
any locking, which is ok since we do not dereference the object
pointer but merely report whether it is NULL or not.

Also, this fixes one place where a tracepoint was missing, which
was at the point we remove deallocated blocks from the journal.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

627c10b7

11 3月, 2011 1 次提交

GFS2: introduce AIL lock · d6a079e8

由 Dave Chinner 提交于 3月 11, 2011

The log lock is currently used to protect the AIL lists and
the movements of buffers into and out of them. The lists
are self contained and no log specific items outside the
lists are accessed when starting or emptying the AIL lists.

Hence the operation of the AIL does not require the protection
of the log lock so split them out into a new AIL specific lock
to reduce the amount of traffic on the log lock. This will
also reduce the amount of serialisation that occurs when
the gfs2_logd pushes on the AIL to move it forward.

This reduces the impact of log pushing on sequential write
throughput.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

d6a079e8

09 3月, 2011 1 次提交

GFS2: quota allows exceeding hard limit · 662e3a55

由 Abhijith Das 提交于 3月 08, 2011

Immediately after being synced to disk, cached quotas are zeroed out and a
subsequent access of the cached quotas results in incorrect zero values. This
meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage
values it found in the cached quotas and comparison against warn/limits never
triggered a quota violation.

This patch adds a new flag QDF_REFRESH that is set after a sync so that the
cached quotas are forcefully refreshed from disk on a subsequent access on
seeing this flag set.

Resolves: rhbz#675944
Signed-off-by: NAbhi Das <adas@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

662e3a55

21 1月, 2011 1 次提交

GFS2: Use RCU for glock hash table · bc015cb8

由 Steven Whitehouse 提交于 1月 19, 2011

This has a number of advantages:

 - Reduces contention on the hash table lock
 - Makes the code smaller and simpler
 - Should speed up glock dumps when under load
 - Removes ref count changing in examine_bucket
 - No longer need hash chain lock in glock_put() in common case

There are some further changes which this enables and which
we may do in the future. One is to look at using SLAB_RCU,
and another is to look at using a per-cpu counter for the
per-sb glock counter, since that is touched twice in the
lifetime of each glock (but only used at umount time).
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

bc015cb8

11 1月, 2011 1 次提交

headers: kobject.h redux · 57cc7215

由 Alexey Dobriyan 提交于 1月 10, 2011

Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57cc7215

30 11月, 2010 1 次提交

GFS2: Merge glock state fields into a bitfield · 47a25380

由 Steven Whitehouse 提交于 11月 30, 2010

We can only merge the fields into a bitfield if the locking
rules for them are the same. In this case gl_spin covers all
of the fields (write side) but a couple of them are used
with GLF_LOCK as the read side lock, which should be ok
since we know that the field in question won't be changing
at the time.

The gl_req setting has to be done earlier (in glock.c) in order
to place it under gl_spin. The gl_reply setting also has to be
brought under gl_spin in order to comply with the new rules.

This saves 4*sizeof(unsigned int) per glock.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>

47a25380

29 9月, 2010 1 次提交

GFS2: Improve journal allocation via sysfs · feb47ca9

由 Steven Whitehouse 提交于 9月 29, 2010

Recently a feature was added to GFS2 to allow journal id allocation
via sysfs. This patch builds upon that so that a negative journal id
will be treated as an error code to be passed back as the return code
from mount. This allows termination of the mount process if there is
a failure.

Also, the process has been updated so that the kernel will wait
for a journal id, even in the "spectator" case. This is required
in order to avoid mounting a filesystem in case there is an error
while joining the cluster. In the spectator case, 0 is written into
the file to indicate that all is well, and that mount should continue.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

feb47ca9

24 9月, 2010 1 次提交

GFS2: Remove upgrade mount option · c80dbb58

由 Steven Whitehouse 提交于 9月 24, 2010

This option has never done anything useful. Also at the same time
this cleans up the sb checks which are done at mount time. The
debug option will be accepted, but ignored in future. Since it
didn't do anything, there didn't seem much point in retaining it.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

c80dbb58

23 9月, 2010 2 次提交

GFS2: Remove localcaching mount option · c2048b00

由 Steven Whitehouse 提交于 9月 23, 2010

This option defaulted to on for lock_nolock mounts and off
otherwise. The only function was to avoid the revalidation of
dentries. In the cluster case, that is entirely pointless and
liable to cause coherency problems.

The patch changes the revalidation to depend upon whether the
fs is a local or cluster fs (i.e. it follows the existing default
behaviour). I very much doubt anybody ever used this option as
there is no reason to. Even so we will continue to accept it
on the mount command line, but ignore it.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

c2048b00

GFS2: Remove ignore_local_fs mount argument · f57a024e

由 Steven Whitehouse 提交于 9月 23, 2010

This is been a no-op for a very long time now. I'm pretty sure
nobody uses it, but just in case we'll still accept it on the
command line, but ignore it.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

f57a024e

20 9月, 2010 3 次提交

GFS2: Don't enforce min hold time when two demotes occur in rapid succession · 7b5e3d5f

由 Steven Whitehouse 提交于 9月 03, 2010

Due to the design of the VFS, it is quite usual for operations on GFS2
to consist of a lookup (requiring a shared lock) followed by an
operation requiring an exclusive lock. If a remote node has cached an
exclusive lock, then it will receive two demote events in rapid succession
firstly for a shared lock and then to unlocked. The existing min hold time
code was triggering in this case, even if the node was otherwise idle
since the state change time was being updated by the initial demote.

This patch introduces logic to skip the min hold timer in the case that
a "double demote" of this kind has occurred. The min hold timer will
still be used in all other cases.

A new glock flag is introduced which is used to keep track of whether
there have been any newly queued holders since the last glock state
change. The min hold time is only applied if the flag is set.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Tested-by: NAbhijith Das <adas@redhat.com>

7b5e3d5f

GFS2: fallocate support · 3921120e

由 Benjamin Marzinski 提交于 8月 20, 2010

This patch adds support for fallocate to gfs2. Since the gfs2 does not support
uninitialized data blocks, it must write out zeros to all the blocks. However,
since it does not need to lock any pages to read from, gfs2 can write out the
zero blocks much more efficiently. On a moderately full filesystem, fallocate
works around 5 times faster on average. The fallocate call also allows gfs2 to
add blocks to the file without changing the filesize, which will make it
possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
grow a completely full filesystem.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

3921120e

GFS2: Remove i_disksize · a2e0f799

由 Steven Whitehouse 提交于 8月 11, 2010

With the update of the truncate code, ip->i_disksize and
inode->i_size are merely copies of each other. This means
we can remove ip->i_disksize and use inode->i_size exclusively
reducing the size of a GFS2 inode by 8 bytes.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

a2e0f799

29 7月, 2010 1 次提交

GFS2: Wait for journal id on mount if not specified on mount command line · ba6e9364

由 Steven Whitehouse 提交于 6月 14, 2010

This patch implements a wait for the journal id in the case that it has
not been specified on the command line. This is to allow the future
removal of the mount.gfs2 helper. The journal id would instead be
directly communicated by gfs_controld to the file system. Here is a
comparison of the two systems:

Current:
1. mount calls mount.gfs2
2. mount.gfs2 connects to gfs_controld to retrieve the journal id
3. mount.gfs2 adds the journal id to the mount command line and calls
the mount system call
4. gfs_controld receives the status of the mount request via a uevent

Proposed:
1. mount calls the mount system call (no mount.gfs2 helper)
2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
about already
3. gfs_controld assigns a journal id to it via sysfs
4. the mount system call then completes as normal (sending a uevent
according to status)

The advantage of the proposed system is that it is completely backward
compatible with the current system both at the kernel and at the
userland levels. The "first" parameter can also be set the same way,
with the restriction that it must be set before the journal id is
assigned.

In addition, if mount becomes stuck waiting for a reply from
gfs_controld which never arrives, then it is killable and will abort the
mount gracefully.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

ba6e9364

23 7月, 2010 1 次提交

gfs2: use workqueue instead of slow-work · 6ecd7c2d

由 Tejun Heo 提交于 7月 20, 2010

Workqueue can now handle high concurrency.  Convert gfs to use
workqueue instead of slow-work.

* Steven pointed out that recovery path might be run from allocation
  path and thus requires forward progress guarantee without memory
  allocation.  Create and use gfs_recovery_wq with rescuer.  Please
  note that forward progress wasn't guaranteed with slow-work.

* Updated to use non-reentrant workqueue.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>

6ecd7c2d

06 5月, 2010 1 次提交

GFS2: Add some useful messages · 913a71d2

由 Steven Whitehouse 提交于 5月 06, 2010

The following patch adds a message to indicate when barriers have been
disabled due to a block device which doesn't support them. You could
already tell this via the mount options in /proc/mounts, but all the
other filesystems also log a message at the same time.

Also, the same mechanisms are used to indicate when the lock
demote interface has been used (only ever used for debugging)
which is a request from our support team.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

913a71d2

05 5月, 2010 1 次提交

GFS2: Various gfs2_logd improvements · 5e687eac

由 Benjamin Marzinski 提交于 5月 04, 2010

This patch contains various tweaks to how log flushes and active item writeback
work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits
for gfs2_logd to do the log flushing. Multiple functions were rewritten to
remove the need to call gfs2_log_lock(). Instead of using one test to see if
gfs2_logd had work to do, there are now seperate tests to check if there
are two many buffers in the incore log or if there are two many items on the
active items list.

This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
some minor changes. Since gfs2_ail1_start always submits all the active items,
it no longer needs to keep track of the first ai submitted, so this has been
removed. In gfs2_log_reserve(), the order of the calls to
prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
been switched. If it called wake_up first there was a small window for a race,
where logd could run and return before gfs2_log_reserve was ready to get woken
up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
would be left waiting for gfs2_logd to eventualy run because it timed out.
Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
out, and flushes the log, can now be set on mount with ar_commit.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

5e687eac

11 3月, 2010 1 次提交

GFS2: Allow the number of committed revokes to temporarily be negative · 2e95e3f6

由 Benjamin Marzinski 提交于 3月 10, 2010

GFS2 tracks the number of revokes and unrevokes that are part of committed
transactions via sd_log_commited_revoke. It is possible for one process to add
revokes during its transaction, while another process unrevokes them during its
transaction. If the second process finishes its transaction first,
sd_log_commited_revoke will be decremented by the number of unrevokes that the
second process did, without first being incremented by the number of revokes
the first process did. This is fine, since all started transactions must be
completed before the journal can be flushed.  However, sd_log_commited_revoke
is an unsigned integer, and log_refund() causes an assertion failure if it
would go negative at the end of a transaction.  This patch makes
sd_log_commited_revoke a signed integer and allows it to go negative.
__gfs2_log_flush() still checks that it mataches the actual number of revokes.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

2e95e3f6

01 3月, 2010 2 次提交

GFS2: Remove loopy umount code · c1184f8a

由 Steven Whitehouse 提交于 1月 08, 2010

As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

c1184f8a

GFS2: Metadata address space clean up · 009d8518

由 Steven Whitehouse 提交于 12月 08, 2009

Since the start of GFS2, an "extra" inode has been used to store
the metadata belonging to each inode. The only reason for using
this inode was to have an extra address space, the other fields
were unused. This means that the memory usage was rather inefficient.

The reason for keeping each inode's metadata in a separate address
space is that when glocks are requested on remote nodes, we need to
be able to efficiently locate the data and metadata which relating
to that glock (inode) in order to sync or sync and invalidate it
(depending on the remotely requested lock mode).

This patch adds a new type of glock, which has in addition to
its normal fields, has an address space. This applies to all
inode and rgrp glocks (but to no other glock types which remain
as before). As a result, we no longer need to have the second
inode.

This results in three major improvements:
 1. A saving of approx 25% of memory used in caching inodes
 2. A removal of the circular dependency between inodes and glocks
 3. No confusion between "normal" and "metadata" inodes in super.c

Although the first of these is the more immediately apparent, the
second is just as important as it now enables a number of clean
ups at umount time. Those will be the subject of future patches.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

009d8518

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功