1. 24 April 2012 (12 commits)
  2. 10 April 2012 (1 commit)
    • GFS2: Allow caching of rindex glock · ca9248d8
      Authored by Bob Peterson
      This patch allows caching of the rindex glock. We were previously
      setting the GL_NOCACHE bit when the glock was released. That forced
      the rindex inode to be invalidated, which caused us to re-read
      rindex at the next access. However, it caused the glock to be
      unnecessarily bounced around the cluster. This patch allows
      the glock to remain cached, but it still causes the rindex to be
      re-read once it has been written to by gfs2_grow.
      
      Ben and I have tested single-node gfs2_grow cases and I've tested
      clustered gfs2_grow cases on my four-node cluster.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
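      
      Before this change the holder for the rindex glock was flagged
      GL_NOCACHE before release, forcing invalidation. A minimal sketch of
      the idea, assuming the usual GFS2 holder API (the helper names here
      are hypothetical, not the actual call sites in the patch):
      
          /* old behaviour: drop the rindex glock and discard cached state */
          static void rindex_put_old(struct gfs2_holder *gh)
          {
                  gh->gh_flags |= GL_NOCACHE;   /* invalidate on dequeue */
                  gfs2_glock_dq_uninit(gh);
          }
      
          /* new behaviour: keep the glock (and the inode) cached */
          static void rindex_put_new(struct gfs2_holder *gh)
          {
                  gfs2_glock_dq_uninit(gh);     /* no GL_NOCACHE */
          }
      
      The rindex is then re-read, via gfs2_rindex_update(), only after
      gfs2_grow has actually written to it.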
  3. 05 April 2012 (1 commit)
  4. 01 April 2012 (1 commit)
  5. 26 March 2012 (2 commits)
  6. 21 March 2012 (1 commit)
  7. 20 March 2012 (2 commits)
  8. 09 March 2012 (2 commits)
    • GFS2: call gfs2_write_alloc_required for each chunk · 58a7d5fb
      Authored by Benjamin Marzinski
      gfs2_fallocate was calling gfs2_write_alloc_required() once at the start of
      the function. This caused problems since gfs2_write_alloc_required used a
      long unsigned int for the len, but gfs2_fallocate could allocate a much
      larger amount. This patch moves the call into the loop where the
      chunks are actually allocated and zeroed out. This will keep the allocation
      size under the limit, and also allow gfs2_fallocate to quickly skip over
      sections of the file that are already completely allocated.
      
      fallocate_chunk was also not correctly setting the file size. It was
      using the len variable to find the last block written to, but by the
      time it was setting the size, the len variable had already been
      decremented to 0.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
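      
      A minimal sketch of the fixed loop shape (simplified, with an
      illustrative chunk cap; not the exact kernel diff):
      
          static long fallocate_loop(struct inode *inode, int mode,
                                     loff_t offset, loff_t len)
          {
                  struct gfs2_inode *ip = GFS2_I(inode);
                  loff_t max_chunk = 256 << 20;   /* illustrative cap */
                  loff_t bytes;
                  long error = 0;
      
                  while (len > 0) {
                          bytes = min(len, max_chunk);
                          if (!gfs2_write_alloc_required(ip, offset, bytes)) {
                                  /* chunk already fully allocated: skip it */
                                  offset += bytes;
                                  len -= bytes;
                                  continue;
                          }
                          error = fallocate_chunk(inode, offset, bytes, mode);
                          if (error)
                                  break;
                          offset += bytes;
                          len -= bytes;
                  }
                  return error;
          }
      
      Checking each chunk separately keeps the length passed to
      gfs2_write_alloc_required() within the range of its len argument.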
    • GFS2: Clean up log flush header writing · 34cc1781
      Authored by Steven Whitehouse
      We already send both a pre and post flush to the block device
      when writing a journal header. There is no need to wait for
      the previous I/O specifically when we do this, unless we've
      turned "barriers" off.
      
      As a side effect, this also cleans up the code path for flushing
      the journal and makes it more readable.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
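      
      A sketch of the resulting logic, assuming the 3.x-era block layer
      flags and the log_flush_wait() helper from fs/gfs2/log.c (simplified,
      not the literal diff):
      
          int rw = WRITE_FLUSH_FUA | REQ_META;  /* pre-flush + FUA write */
      
          if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags)) {
                  /* barriers off: wait for in-flight journal I/O instead */
                  log_flush_wait(sdp);
                  rw = WRITE_SYNC | REQ_META;
          }
          submit_bh(rw, bh);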
  9. 08 3月, 2012 1 次提交
    • GFS2: Remove a __GFP_NOFAIL allocation · 75ca61c1
      Authored by Steven Whitehouse
      In order to ensure that we've got enough buffer heads for flushing
      the journal, the original code used __GFP_NOFAIL when performing
      this allocation. Here we dispense with that in favour of using a
      mempool. This should improve efficiency in low memory conditions:
      since flushing the journal is a good way to get memory back, we
      don't want to be spinning, waiting on memory allocations. The
      buffers which are allocated via this mempool are fairly short lived,
      so that we'll recycle them pretty quickly.
      
      Although there are other memory allocations which occur during the
      journal flush process, this is the one which can potentially require
      the most memory, so it is the most important one to fix.
      
      The amount of memory reserved is a fixed amount, and we should not need
      to scale it when there are a greater number of filesystems in use.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
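      
      A minimal sketch of the approach, using the kernel mempool API (the
      pool name and size here are illustrative assumptions):
      
          #include <linux/mempool.h>
          #include <linux/buffer_head.h>
      
          static mempool_t *bh_pool;   /* illustrative pool name */
      
          static void *gfs2_bh_alloc(gfp_t mask, void *data)
          {
                  return alloc_buffer_head(mask);
          }
      
          static void gfs2_bh_free(void *ptr, void *data)
          {
                  free_buffer_head(ptr);
          }
      
          /* at init time: reserve enough buffer heads that journal
           * flushes make progress even under memory pressure */
          bh_pool = mempool_create(1024, gfs2_bh_alloc, gfs2_bh_free, NULL);
      
      Allocations then go through mempool_alloc(bh_pool, GFP_NOFS) and are
      returned with mempool_free(), which keeps the reserve topped up.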
  10. 07 March 2012 (1 commit)
  11. 05 March 2012 (2 commits)
    • GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd · 58884c4d
      Authored by Bob Peterson
      This patch adds a call to gfs2_rindex_update in function gfs2_blk2rgrpd
      and removes the calls elsewhere that this makes redundant. The problem is
      that a gfs2_grow can add rgrps to the rindex, then put those rgrps into
      use, thus rendering the rindex we read in at mount time incomplete.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
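      
      A minimal sketch of the shape of the fix (simplified; the lookup
      helper at the end is hypothetical, standing in for the real rgrp
      tree walk):
      
          struct gfs2_rgrpd *gfs2_blk2rgrpd(struct gfs2_sbd *sdp, u64 blk)
          {
                  /* make rgrps added by gfs2_grow since mount visible
                   * before searching for the one containing blk */
                  if (gfs2_rindex_update(sdp))
                          return NULL;
      
                  return rgrp_search(sdp, blk);  /* hypothetical helper */
          }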
    • GFS2: Eliminate sd_rindex_mutex · 6aad1c3d
      Authored by Bob Peterson
      Over time, we've slowly eliminated the use of sd_rindex_mutex.
      Up to this point, it was only used in two places: function
      gfs2_ri_total (which totals the file system size by reading
      and parsing the rindex file) and function gfs2_rindex_update
      which updates the rgrps in memory. Both of these functions have
      the rindex glock to protect them, so the mutex is unnecessary.
      Since gfs2_grow writes to the rindex via the meta_fs, the mutex
      is in the wrong order according to the normal rules. This patch
      eliminates the mutex entirely to avoid the problem.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  12. 01 March 2012 (1 commit)
  13. 29 February 2012 (5 commits)
    • GFS2: Make bd_cmp() static · 08728f2d
      Authored by Steven Whitehouse
      Add missing static to bd_cmp()
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Sort the ordered write list · 4a36d08d
      Authored by Bob Peterson
      This patch sorts the ordered write list for GFS2 writes.
      This increases the throughput for simultaneous writes.
      For example, if you have ten processes, all doing:
      dd if=/dev/zero of=/mnt/gfs2/fileX
      on different files, the throughput will be much better.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
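      
      A sketch of the approach, close to the shape of bd_cmp() in
      fs/gfs2/lops.c (field names as in the 3.3-era gfs2_bufdata; treat the
      details as illustrative):
      
          static int bd_cmp(void *priv, struct list_head *a,
                            struct list_head *b)
          {
                  struct gfs2_bufdata *bda, *bdb;
      
                  bda = list_entry(a, struct gfs2_bufdata, bd_le.le_list);
                  bdb = list_entry(b, struct gfs2_bufdata, bd_le.le_list);
      
                  if (bda->bd_bh->b_blocknr < bdb->bd_bh->b_blocknr)
                          return -1;
                  if (bda->bd_bh->b_blocknr > bdb->bd_bh->b_blocknr)
                          return 1;
                  return 0;
          }
      
          /* sort the ordered write list by block number before submission
           * so that writeback is close to sequential on disk */
          list_sort(NULL, &sdp->sd_log_le_ordered, &bd_cmp);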
    • GFS2: FITRIM ioctl support · 66fc061b
      Authored by Steven Whitehouse
      The FITRIM ioctl provides an alternative way to send discard requests to
      the underlying device. Using the discard mount option results in every
      freed block generating a discard request to the block device. This can
      be slow, since many block devices can only process discard requests of
      larger sizes, and also such operations can be time consuming.
      
      Rather than using the discard mount option, FITRIM allows the
      filesystem to be swept on an occasional basis, and also allows
      discard requests for smaller regions to be skipped.
      
      In GFS2, FITRIM works at resource group granularity. There is a flag
      for each resource group which keeps track of which resource groups have
      been trimmed. This flag is reset whenever a deallocation occurs in the
      resource group, and set whenever a successful FITRIM of that resource
      group has taken place. This helps to reduce repeated discard requests
      for the same block ranges, again improving performance.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
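      
      From userspace, invoking the ioctl looks like this; a minimal sketch
      assuming the filesystem is mounted at /mnt/gfs2 (the path and minlen
      value are placeholders):
      
          #include <fcntl.h>
          #include <linux/fs.h>     /* FITRIM, struct fstrim_range */
          #include <stdint.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <unistd.h>
      
          int main(void)
          {
                  struct fstrim_range r = {
                          .start  = 0,
                          .len    = UINT64_MAX,   /* whole filesystem   */
                          .minlen = 1024 * 1024,  /* skip small extents */
                  };
                  int fd = open("/mnt/gfs2", O_RDONLY);
      
                  if (fd < 0 || ioctl(fd, FITRIM, &r) < 0) {
                          perror("FITRIM");
                          return 1;
                  }
                  /* on return, r.len holds the number of bytes trimmed */
                  printf("trimmed %llu bytes\n", (unsigned long long)r.len);
                  close(fd);
                  return 0;
          }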
    • GFS2: Move two functions from log.c to lops.c · 47ac5537
      Authored by Steven Whitehouse
      gfs2_log_get_buf() and gfs2_log_fake_buf() are both used
      only in lops.c, so move them next to their callers and they
      can then become static.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: glock statistics gathering · a245769f
      Authored by Steven Whitehouse
      The stats are divided into two sets: those relating to the
      super block and those relating to an individual glock. The
      super block stats are done on a per cpu basis in order to
      try and reduce the overhead of gathering them. They are also
      further divided by glock type.
      
      In the case of both the super block and glock statistics,
      the same information is gathered in each case. The super
      block statistics are used to provide default values for
      most of the glock statistics, so that newly created glocks
      should have, as far as possible, a sensible starting point.
      
      The statistics are divided into three pairs of mean and
      variance, plus two counters. The mean/variance pairs are
      smoothed exponential estimates and the algorithm used is
      one which will be very familiar to those used to calculating
      round trip times in network code.
      
      The three pairs of mean/variance measure the following
      things:
      
       1. DLM lock time (non-blocking requests)
       2. DLM lock time (blocking requests)
       3. Inter-request time (again to the DLM)
      
      A non-blocking request is one which will complete right
      away, whatever the state of the DLM lock in question. That
      currently means any request when (a) the current state of
      the lock is exclusive, (b) the requested state is either null
      or unlocked, or (c) the "try lock" flag is set. A blocking
      request covers all the other lock requests.
      
      There are two counters. The first is there primarily to show
      how many lock requests have been made, and thus how much data
      has gone into the mean/variance calculations. The other counter
      counts the queuing of holders at the top layer of the glock
      code. Hopefully that number will be a lot larger than the number
      of dlm lock requests issued.
      
      So why gather these statistics? There are several reasons
      we'd like to get a better idea of these timings:
      
      1. To be able to better set the glock "min hold time"
      2. To spot performance issues more easily
      3. To improve the algorithm for selecting resource groups for
      allocation (to base it on lock wait time, rather than blindly
      using a "try lock")
      
      Due to the smoothing action of the updates, a step change in
      some input quantity being sampled will only fully be taken
      into account after 8 samples (or 4 for the variance) and this
      needs to be carefully considered when interpreting the
      results.
      
      Knowing both the time it takes a lock request to complete and
      the average time between lock requests for a glock means we
      can compute the total percentage of the time for which the
      node is able to use a glock vs. time that the rest of the
      cluster has its share. That will be very useful when setting
      the lock min hold time.
      
      The other point to remember is that all times are in
      nanoseconds. Great care has been taken to ensure that we
      measure exactly the quantities that we want, as accurately
      as possible. There are always inaccuracies in any
      measuring system, but I hope this is as accurate as we
      can reasonably make it.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
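      
      A sketch of the estimator described above, with the same gains as the
      classic TCP round-trip code (1/8 for the mean, 1/4 for the mean
      absolute deviation used as the "variance"); the helper name is
      illustrative:
      
          static inline void update_stats(s64 *mean, s64 *var, s64 sample)
          {
                  s64 delta = sample - *mean;
      
                  *mean += delta >> 3;          /* mean += err / 8 */
                  if (delta < 0)
                          delta = -delta;
                  *var += (delta - *var) >> 2;  /* var += (|err| - var) / 4 */
          }
      
      With these gains a step change in the input is fully reflected in the
      mean after roughly 8 samples, and in the variance after roughly 4,
      matching the note above.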
  14. 28 February 2012 (4 commits)
  15. 11 January 2012 (4 commits)