Commits · 01b172b7b10146cf5f02604047bee065cfb49946 · openanolis / cloud-kernel

12 Mar, 2014 1 commit

GFS2: Ensure workqueue is scheduled after noexp request · 01b172b7

Authored by Bob Peterson on Mar 12, 2014

This patch closes a small timing window whereby a request to hold the
transaction glock can get stuck. The problem is that after the DLM has
granted the lock, it can get into a state whereby it doesn't transition
the glock to a held state, due to not having requeued the glock state
machine to finish the transition.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

07 Mar, 2014 2 commits

GFS2: Use pr_<level> more consistently · d77d1b58

Authored by Joe Perches on Mar 06, 2014

Add pr_fmt, remove embedded "GFS2: " prefixes.
This now consistently emits lower case "gfs2: " for each message.

Other miscellanea around these changes:

o Add missing newlines
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
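
The pr_fmt convention in this commit can be illustrated with a small userspace sketch. It is only an approximation: the `pr_info_to()` helper and the buffer-based output are assumptions made so that the example runs outside the kernel, where pr_fmt is defined ahead of the printk headers and the pr_<level>() macros expand through it.

```c
#include <stdio.h>

/* Userspace stand-in: the kernel's pr_<level>() macros expand pr_fmt(fmt). */
#define pr_fmt(fmt) "gfs2: " fmt

/* Hypothetical helper: formats into a buffer so the result can be inspected.
 * ##__VA_ARGS__ is a GNU extension accepted by gcc and clang. */
#define pr_info_to(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

/* Example caller: no embedded "GFS2: " prefix, and a trailing newline. */
static int report_error(char *buf, size_t size, int error)
{
	return pr_info_to(buf, size, "fatal: error %d\n", error);
}
```

Because the prefix is supplied in one place, individual messages no longer need to spell it out, which is how the lower case "gfs2: " ends up on every message.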

GFS2: global conversion to pr_foo() · fc554ed3

Authored by Fabian Frederick on Mar 05, 2014

- All printk(KERN_foo ...) calls converted to pr_foo().
- Messages updated to fit in 80 columns.
- fs_macros converted as well.
- fs_printk removed.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

16 Jan, 2014 1 commit

GFS2: Don't use ENOBUFS when ENOMEM is the correct error code · ac3beb6a

Authored by Steven Whitehouse on Jan 16, 2014

Al Viro has tactfully pointed out that we are using the incorrect
error code in some cases. This patch fixes that, and also removes
the (unused) return value for glock dumping.

>        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>

02 Jan, 2014 1 commit

GFS2: Fix unsafe dereference in dump_holder() · 0b3a2c99

Authored by Tetsuo Handa on Jan 02, 2014

GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that
RCU read lock is held when using task_struct returned from pid_task().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

21 Nov, 2013 1 commit

GFS2: fix potential NULL pointer dereference · e3c4269d

Authored by Michal Nazarewicz on Nov 12, 2013

Commit [e66cf161: GFS2: Use lockref for glocks] replaced call:
    atomic_read(&gi->gl->gl_ref) == 0
with:
    __lockref_is_dead(&gl->gl_lockref)
therefore changing how gl is accessed, from gi->gl to plain gl.
However, gl can be a NULL pointer, and so gi->gl needs to be
used instead (which is guaranteed not to be NULL because of
the while loop checking that condition).
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

15 Oct, 2013 1 commit

GFS2: Use lockref for glocks · e66cf161

Authored by Steven Whitehouse on Oct 15, 2013

Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. The intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjunction
with the spinlock to replace the previous atomic counter.

As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.

The conversion is relatively straightforward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.

A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.

The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

11 Sep, 2013 2 commits

fs: convert fs shrinkers to new scan/count API · 1ab6c499

Authored by Dave Chinner on Aug 28, 2013

Convert the filesystem shrinkers to use the new API, and standardise some
of the behaviours of the shrinkers at the same time.  For example,
nr_to_scan means the number of objects to scan, not the number of objects
to free.

I refactored the CIFS idmap shrinker a little - it really needs to be
broken up into a shrinker per tree and keep an item count with the tree
root so that we don't need to walk the tree every time the shrinker needs
to count the number of objects in the tree (i.e.  all the time under
memory pressure).

[glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
[assorted fixes folded in]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

super: fix calculation of shrinkable objects for small numbers · 55f841ce

Authored by Glauber Costa on Aug 28, 2013

The sysctl knob sysctl_vfs_cache_pressure is used to determine which
percentage of the shrinkable objects in our cache we should actively try
to shrink.

It works great in situations in which we have many objects (at least more
than 100), because the approximation errors will be negligible.  But if
this is not the case, especially when total_objects < 100, we may end up
concluding that we have no objects at all (total / 100 = 0, if total <
100).

This is certainly not the biggest killer in the world, but may matter in
very low kernel memory situations.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Viro <viro@zeniv.linux.org.uk>
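
The truncation described in this commit message is easy to reproduce in userspace. The sketch below is an illustration under assumptions: `cache_pressure`, `shrinkable_naive()` and `shrinkable_frac()` are hypothetical names, and the second function follows the general idea of scaling quotient and remainder separately (the approach of the kernel's mult_frac() helper) rather than quoting the patch itself.

```c
/* Hypothetical stand-in for the sysctl; 100 means "no bias". */
static unsigned long cache_pressure = 100;

/* Dividing first truncates: any total below 100 yields 0. */
static unsigned long shrinkable_naive(unsigned long total)
{
	return (total / 100) * cache_pressure;
}

/* Scale quotient and remainder separately so small totals survive. */
static unsigned long shrinkable_frac(unsigned long total)
{
	unsigned long quot = total / 100;
	unsigned long rem = total % 100;

	return quot * cache_pressure + (rem * cache_pressure) / 100;
}
```

With the default pressure of 100 and 50 objects, the first form reports no shrinkable objects at all, while the second reports all 50.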

04 Sep, 2013 1 commit

GFS2: Remove unnecessary memory barrier · 068213f7

Authored by Bob Peterson on Jul 25, 2013

Function test_and_clear_bit implies a memory barrier, so subsequent
memory barriers are unnecessary.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

20 Aug, 2013 1 commit

GFS2: Take glock reference in examine_bucket() · 7286b31e

Authored by Steven Whitehouse on Aug 20, 2013

We need to check the glock ref counter in a race free way
in order to ensure that the gfs2_glock_hold() call will
succeed. The easiest way to do that is to simply take the
reference count early in the common code of examine_bucket,
skipping any glocks with zero ref count.

That means that the examiner functions all need to put their
reference on the glock once they've performed their function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
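
The "take a reference only if the count is non-zero" idea from this commit can be sketched in userspace with C11 atomics. This is an illustration, not the kernel code: `struct obj`, `obj_get_not_zero()` and `obj_put()` are hypothetical names, and a real glock carries far more state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature object with only a reference count. */
struct obj {
	atomic_int ref;
};

/* Take a reference only if the count is currently non-zero. */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->ref, 1) == 1;
}
```

A bucket walker built this way skips any object for which `obj_get_not_zero()` fails, and each examiner function is then responsible for calling `obj_put()` when it is done.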

19 Aug, 2013 1 commit

GFS2: alloc_workqueue() doesn't return an ERR_PTR · dfc4616d

Authored by Dan Carpenter on Aug 15, 2013

alloc_workqueue() returns NULL on error; it doesn't return an ERR_PTR.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
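
Why the distinction matters can be shown with a userspace sketch. The `is_err()` function mirrors the idea behind the kernel's IS_ERR() (error codes occupy the top 4095 values of the address space); `alloc_thing()` and the two check functions are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace copy of the idea behind IS_ERR(): encoded error codes live
 * in the top MAX_ERRNO values of the address space. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure with NULL. */
static void *alloc_thing(bool fail)
{
	static int thing;

	return fail ? NULL : &thing;
}

/* Wrong: an ERR_PTR-style check never fires for a NULL return. */
static bool failed_wrong_check(const void *p)
{
	return is_err(p);
}

/* Right: a NULL-returning allocator must be tested against NULL. */
static bool failed_right_check(const void *p)
{
	return p == NULL;
}
```

Since NULL is not in the error range, the ERR_PTR-style test treats a failed allocation as success and the caller goes on to use a NULL pointer.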

29 Apr, 2013 1 commit

gfs2: Convert print_symbol to %pSR · 7af584d3

Authored by Joe Perches on Dec 12, 2012

Use the new vsprintf extension to avoid any possible
message interleaving.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

26 Apr, 2013 1 commit

GFS2: Flush work queue before clearing glock hash tables · 222cb538

Authored by Bob Peterson on Apr 25, 2013

There was a timing window when a GFS2 file system was unmounted
that caused GFS2 to call BUG() and panic the kernel. The call
to BUG() is meant to ensure that the glock reference count,
gl_ref, never gets down to zero and bounces back up again. What was
happening during umount is that function gfs2_put_super was dequeuing
its glocks for well-known files. In particular, we saw it on the
journal glock, sd_jinode_gh. The dequeue caused delayed work to be
queued for the glock state machine, to transition the lock to an
"unlocked" state. While the work was still queued, gfs2_put_super
called gfs2_gl_hash_clear to clear out the glock hash tables.
If the timing was just so, the glock work function would drop the
reference count at the time when it was being checked for zero,
and that caused BUG() to be called. This patch calls
flush_workqueue before clearing the glock hash tables, thereby
ensuring that the delayed work is executed before the hash tables
are cleared, and therefore the reference count never goes to zero
until the glock is cleared.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

10 Apr, 2013 2 commits

GFS2: Add origin indicator to glock demote tracing · 7bd8b2eb

Authored by Steven Whitehouse on Apr 10, 2013

This adds the origin indicator to the trace point for glock
demotion, so that it is possible to see where demote requests
have come from.

Note that requests generated from the demote_rq sysfs interface
will show as remote, since they are intended to replicate
exactly the effect of a demote request from a remote node. It
is still possible to tell these apart by looking at the process
which initiated the demote request.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add origin indicator to glock callbacks · 81ffbf65

Authored by Steven Whitehouse on Apr 10, 2013

This patch adds a bool indicating whether the demote
request was originated locally or remotely. This is then
used by the iopen ->go_callback() to make 100% sure that
it will only respond to remote callbacks.

Since ->evict_inode() uses GL_NOCACHE when it attempts to
get an exclusive lock on the iopen lock, this may result
in extra scheduling of the workqueue in case that the
exclusive promotion request failed. This patch prevents
that from happening.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

08 Apr, 2013 1 commit

GFS2: Remove gfs2_refresh_inode from inode creation path · 28fb3027

Authored by Steven Whitehouse on Feb 26, 2013

The original method for creating inodes used in GFS2 was to fill
out a buffer, with all the information, and then to read that
buffer into the in-core inode, using gfs2_refresh_inode().

The problem with this approach is that all the inode's fields
need to be calculated ahead of time, and were stored in various
variables making the code rather complicated.

The new approach is simply to allocate the in-core inode earlier
and fill in as many fields as possible ahead of time. These can
then be used to initialise the on disk representation. The
code has been working towards the point where it is possible
to remove gfs2_refresh_inode() because all the fields are
correctly initialised ahead of time. We've now reached that
milestone, and have reversed the order of setting up the in
core and on disk inodes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

02 Feb, 2013 1 commit

GFS2: Split glock lru processing into two parts · 4506a519

Authored by Steven Whitehouse on Feb 01, 2013

The intent here is to split the processing of the glock lru
list into two parts, so that the selection of glocks and the
disposal are separate functions. The plan is that further
updates can then be made to these functions in the future
to improve the selection of glocks and also the efficiency of
glock disposal.

The new feature which this patch brings is sorting the
glocks to be disposed of into glock number (and thus also
disk block number) order. Not all glocks will need i/o in
order to dispose of them, but some will, and at least we'll
generate mostly disk block order i/o now.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
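
The sorting step described in this commit can be sketched as follows. This is a userspace approximation under assumptions: the kernel works on linked lists rather than arrays, and `struct mini_glock`, `cmp_by_number()` and `sort_for_disposal()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced glock: only the number used as the sort key. */
struct mini_glock {
	uint64_t number;
};

/* Compare by number without subtracting, to avoid overflow. */
static int cmp_by_number(const void *a, const void *b)
{
	const struct mini_glock *ga = a;
	const struct mini_glock *gb = b;

	if (ga->number > gb->number)
		return 1;
	if (ga->number < gb->number)
		return -1;
	return 0;
}

/* Sort the batch selected for disposal into ascending number order. */
static void sort_for_disposal(struct mini_glock *batch, size_t n)
{
	qsort(batch, n, sizeof(*batch), cmp_by_number);
}
```

Since the glock number tracks the disk block number, disposing of the batch in this order means any i/o it triggers is issued in mostly ascending block order.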

29 Jan, 2013 1 commit

GFS2: Separate LRU scanning from shrinker · 2a005855

Authored by Steven Whitehouse on Dec 14, 2012

This breaks out the LRU scanning function from the shrinker in
preparation for adding other callers to the LRU scanner.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

12 Dec, 2012 1 commit

mm: redefine address_space.assoc_mapping · 252aa6f5

Authored by Rafael Aquini on Dec 11, 2012

Overhaul struct address_space.assoc_mapping renaming it to
address_space.private_data and its type is redefined to void*.  By this
approach we consistently name the .private_* elements from struct
address_space as well as allow extended usage for address_space
association with other data structures through ->private_data.

Also, all users of old ->assoc_mapping element are converted to reflect
its new name and type change (->private_data).
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15 Nov, 2012 2 commits

GFS2: remove redundant lvb pointer · 4e2f8849

Authored by David Teigland on Nov 14, 2012

The lksb struct already contains a pointer to the lvb,
so another directly from the glock struct is not needed.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: only use lvb on glocks that need it · dba2d70c

Authored by David Teigland on Nov 14, 2012

Save the effort of allocating, reading and writing
the lvb for most glocks that do not use it.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

14 Nov, 2012 1 commit

GFS2: skip dlm_unlock calls in unmount · fb6791d1

Authored by David Teigland on Nov 13, 2012

When unmounting, gfs2 does a full dlm_unlock operation on every
cached lock. This can create a very large amount of work and can
take a long time to complete. However, the vast majority of these
dlm unlock operations are unnecessary because after all the unlocks
are done, gfs2 leaves the dlm lockspace, which automatically clears
the locks of the leaving node, without unlocking each one individually.
So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
remove the locks implicitly. The one exception is when the lock's lvb is
being used. In this case, dlm_unlock is called because it may update the
lvb of the resource.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

07 Nov, 2012 2 commits

GFS2: Rename glops go_xmote_th to go_sync · 06dfc306

Authored by Bob Peterson on Oct 24, 2012

[Editorial: This is a nit, but has been a minor irritation for a long time:]

This patch renames glops structure item for go_xmote_th to go_sync.
The functionality is unchanged; it's just for readability.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Review bug traps in glops.c · 8eae1ca0

Authored by Steven Whitehouse on Oct 15, 2012

Two of the bug traps here could really be warnings. The others are
converted from BUG() to GLOCK_BUG_ON() since we'll most likely
need to know the glock state in order to debug any issues which
arise. As a result of this, __dump_glock has to be renamed and
is no longer static.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

24 Sep, 2012 4 commits

GFS2: Eliminate redundant calls to may_grant · e5dc76b9

Authored by Bob Peterson on Aug 09, 2012

Function add_to_queue was checking may_grant for the passed-in
holder for every iteration of its gh2 loop. Now it only checks it
once at the beginning to see if a try lock is futile.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote · 81e1d450

Authored by Bob Peterson on Aug 09, 2012

Function gfs2_glock_dq_wait called two-line function wait_on_demote,
so they were combined.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Combine functions gfs2_glock_wait and wait_on_holder · 07a79049

Authored by Bob Peterson on Aug 09, 2012

Function gfs2_glock_wait only called function wait_on_holder and
returned its return code, so they were combined for readability.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: inline __gfs2_glock_schedule_for_reclaim · 4abb6ad9

Authored by Bob Peterson on Aug 09, 2012

Since function gfs2_glock_schedule_for_reclaim is only two
significant lines, we can eliminate it, simplifying the code
and making it more readable.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

11 Jun, 2012 2 commits

GFS2: Size seq_file buffer more carefully · 0fe2f1e9

Authored by Steven Whitehouse on Jun 11, 2012

This places a limit on the buffer size for archs with larger
PAGE_SIZE.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>

GFS2: Use seq_vprintf for glocks debugfs file · 1bb49303

Authored by Steven Whitehouse on Jun 11, 2012

Make use of the newly added seq_vprintf() function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>

08 Jun, 2012 2 commits

GFS2: Use lvbs for storing rgrp information with mount option · 90306c41

Authored by Benjamin Marzinski on May 29, 2012

Instead of reading in the resource groups when gfs2 is checking
for free space to allocate from, gfs2 can store the necessary information
in the resource group's lvb. Also, instead of searching for unlinked
inodes in every resource group that's checked for free space, gfs2 can
store the number of unlinked inodes in the lvb, and only check for
unlinked inodes if it will find some.

The first time a resource group is locked, the lvb must be initialized.
Since this involves counting the unlinked inodes in the resource group,
this takes a little extra time. But after that, if the resource group
is locked with GL_SKIP, the buffer head won't be read in unless it's
actually needed.

Enabling the resource group lvbs is done via the rgrplvb mount option. If
this option isn't set, the lvbs will still be set and updated, but they won't
be verified or used by the filesystem. To safely turn on this option, all of
the nodes mounting the filesystem must be running code with this patch, and
the filesystem must have been completely unmounted since they were updated.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Cache last hash bucket for glock seq_files · ba1ddcb6

Authored by Steven Whitehouse on Jun 08, 2012

For the glocks and glstats seq_files, which are exposed via debugfs
we should cache the most recent hash bucket, along with the offset
into that bucket. This allows us to restart from that point, rather
than having to begin at the beginning each time.

This is an idea from Eric Dumazet, however I've slightly extended it
so that if the position from which we are due to start is at any
point beyond the last cached point, we start from the last cached
point, plus whatever is the appropriate offset. I don't really expect
people to be lseeking around these files, but if they did so with only
positive offsets, then we'd still get some of the benefit of using a
cached offset.

With my simple test of around 200k entries in the file, I'm seeing
an approx 10x speed up.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
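
The caching scheme described in this commit can be sketched over a simplified table. Everything here is an assumption made for illustration: buckets are represented only by their lengths, and `struct table`, `struct cursor` and `cursor_seek()` are hypothetical names, not the seq_file code itself.

```c
#include <stddef.h>

/* Hypothetical table: bucket_len[i] is the number of entries in bucket i. */
struct table {
	const size_t *bucket_len;
	size_t nbuckets;
};

/* Iterator that remembers where the previous lookup ended. */
struct cursor {
	size_t bucket;   /* cached bucket index */
	size_t offset;   /* cached offset within that bucket */
	size_t pos;      /* overall position the cache corresponds to */
	size_t steps;    /* entries walked so far, for illustration only */
};

/* Move to overall position pos; returns 0 on success, -1 if past the end. */
static int cursor_seek(const struct table *t, struct cursor *c, size_t pos)
{
	size_t todo;

	/* Going backwards: the cache is no help, restart from the beginning. */
	if (pos < c->pos) {
		c->bucket = 0;
		c->offset = 0;
		c->pos = 0;
	}

	todo = pos - c->pos;
	while (c->bucket < t->nbuckets) {
		size_t left = t->bucket_len[c->bucket] - c->offset;

		if (todo < left) {
			c->offset += todo;
			c->steps += todo;
			c->pos = pos;
			return 0;
		}
		/* Skip the rest of this bucket and continue in the next. */
		todo -= left;
		c->steps += left;
		c->pos += left;
		c->bucket++;
		c->offset = 0;
	}
	return -1;
}
```

A sequential reader asks for positions 0, 1, 2, ... and so only ever walks one entry per call, instead of re-walking the table from the first bucket each time. A forward seek beyond the cached point also starts from the cache, which is the extension the commit message describes.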

07 Jun, 2012 1 commit

GFS2: Increase buffer size for glocks and glstats debugfs files · df5d2f55

Authored by Steven Whitehouse on Jun 07, 2012

As per Al Viro's suggestion, this increases the buffer size used
for these two files. This provides a speed up of slightly less than
8x (i.e. proportional to the buffer size) for cases when we have
large numbers of glocks.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

29 Feb, 2012 1 commit

GFS2: glock statistics gathering · a245769f

Authored by Steven Whitehouse on Jan 20, 2012

The stats are divided into two sets: those relating to the
super block and those relating to an individual glock. The
super block stats are done on a per cpu basis in order to
try and reduce the overhead of gathering them. They are also
further divided by glock type.

In the case of both the super block and glock statistics,
the same information is gathered in each case. The super
block statistics are used to provide default values for
most of the glock statistics, so that newly created glocks
should have, as far as possible, a sensible starting point.

The statistics are divided into three pairs of mean and
variance, plus two counters. The mean/variance pairs are
smoothed exponential estimates and the algorithm used is
one which will be very familiar to those used to calculating
round trip times in network code.

The three pairs of mean/variance measure the following
things:

 1. DLM lock time (non-blocking requests)
 2. DLM lock time (blocking requests)
 3. Inter-request time (again to the DLM)

A non-blocking request is one which will complete right
away, whatever the state of the DLM lock in question. That
currently means any requests when (a) the current state of
the lock is exclusive (b) the requested state is either null
or unlocked or (c) the "try lock" flag is set. A blocking
request covers all the other lock requests.

There are two counters. The first is there primarily to show
how many lock requests have been made, and thus how much data
has gone into the mean/variance calculations. The other counter
is counting queueing of holders at the top layer of the glock
code. Hopefully that number will be a lot larger than the number
of dlm lock requests issued.

So why gather these statistics? There are several reasons
we'd like to get a better idea of these timings:

1. To be able to better set the glock "min hold time"
2. To spot performance issues more easily
3. To improve the algorithm for selecting resource groups for
allocation (to base it on lock wait time, rather than blindly
using a "try lock")

Due to the smoothing action of the updates, a step change in
some input quantity being sampled will only fully be taken
into account after 8 samples (or 4 for the variance) and this
needs to be carefully considered when interpreting the
results.

Knowing both the time it takes a lock request to complete and
the average time between lock requests for a glock means we
can compute the total percentage of the time for which the
node is able to use a glock vs. time that the rest of the
cluster has its share. That will be very useful when setting
the lock min hold time.

The other point to remember is that all times are in
nanoseconds. Great care has been taken to ensure that we
measure exactly the quantities that we want, as accurately
as possible. There are always inaccuracies in any
measuring system, but I hope this is as accurate as we
can reasonably make it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
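
The smoothed mean/variance pairs in this commit can be sketched in the style of the classic round trip time estimator. The gains of 1/8 for the mean and 1/4 for the variance are an assumption inferred from the "8 samples (or 4 for the variance)" remark; `struct smoothed`, `abs64()` and `smoothed_update()` are hypothetical names, and the kernel's own code may differ in detail.

```c
#include <stdint.h>

/* Hypothetical pair of smoothed estimates, in nanoseconds. */
struct smoothed {
	int64_t mean;
	int64_t var;   /* smoothed mean deviation, used as the "variance" */
};

static int64_t abs64(int64_t v)
{
	return v < 0 ? -v : v;
}

/* Fold one sample in, with gains of 1/8 (mean) and 1/4 (variance). */
static void smoothed_update(struct smoothed *s, int64_t sample)
{
	int64_t delta = sample - s->mean;

	s->mean += delta / 8;
	s->var += (abs64(delta) - s->var) / 4;
}
```

Starting from zero and feeding a constant sample of 800 ns, the mean moves to 100 after the first update and 187 after the second, which shows why a step change in the input is only gradually reflected in the estimates.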

28 Feb, 2012 1 commit

GFS2: Fix race between lru_list and glock ref count · 4043b886

Authored by Steven Whitehouse on Jan 16, 2012

This patch fixes a narrow race window between the glock ref count
hitting zero and glocks being removed from the lru_list.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

11 Jan, 2012 1 commit

GFS2: dlm based recovery coordination · e0c2a9aa

Authored by David Teigland on Jan 09, 2012

This new method of managing recovery is an alternative to
the previous approach of using the userland gfs_controld.

- use dlm slot numbers to assign journal id's
- use dlm recovery callbacks to initiate journal recovery
- use a dlm lock to determine the first node to mount fs
- use a dlm lock to track journals that need recovery
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

15 Jul, 2011 1 commit

GFS2: Automatically adjust glock min hold time · 7cf8dcd3

Authored by Bob Peterson on Jun 15, 2011

This patch is a performance improvement for GFS2 in a clustered
environment. It makes the glock hold time self-adjusting.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

25 May, 2011 2 commits

vmscan: change shrinker API by passing shrink_control struct · 1495f230

Authored by Ying Han on May 24, 2011

Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

GFS2: Processes waiting on inode glock that no processes are holding · f90e5b5b

Authored by Bob Peterson on May 24, 2011

This patch fixes a race in the GFS2 glock state machine that may
result in lockups.  The symptom is that all nodes but one will
hang, waiting for a particular glock.  All the holder records
will have the "W" (Waiting) bit set.  The other node will
typically have the glock stuck in Exclusive mode (EX) with no
holder records, but the dinode will be cached.  In other words,
an entry with "I:" will appear in the glock dump for that glock,
but nothing else.

The race has to do with the glock "Pending Demote" bit, which
can be set, then immediately reset, thus losing the fact that
another node needs the glock.  The sequence of events is:

1. Something schedules the glock workqueue (e.g. glock request from fs)
2. The glock workqueue gets to the point between the test of the reply pending
bit and the spin lock:

        if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                finish_xmote(gl, gl->gl_reply);
                drop_ref = 1;
        }
        down_read(&gfs2_umount_flush_sem);         <---- i.e. here
        spin_lock(&gl->gl_spin);

3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
            (b) the demote request which sets GLF_PENDING_DEMOTE

4. The following test is executed:

        if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
            gl->gl_state != LM_ST_UNLOCKED &&
            gl->gl_demote_state != LM_ST_EXCLUSIVE) {

This resets the pending demote flag, and gl->gl_demote_state is not equal to
exclusive. However, because the reply from the dlm arrived after we checked for
the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE flag.

The patch closes the timing window by only transitioning the
"Pending demote" bit to the "demote" flag once we know the
other conditions (not unlocked and not exclusive) are met.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
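
The difference between the old and the fixed ordering can be shown in a single-threaded userspace sketch. It is an illustration only: the flag and state names are hypothetical simplifications, plain integer bit operations stand in for the kernel's atomic bitops, and the real work function does considerably more.

```c
#include <stdbool.h>

enum state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };

#define F_PENDING_DEMOTE (1u << 0)
#define F_DEMOTE         (1u << 1)

/* Hypothetical reduced glock; a real one is updated concurrently. */
struct mini_glock {
	unsigned int flags;
	enum state state;
	enum state demote_state;
};

/* Old pattern: the pending bit is cleared before the other tests run. */
static void promote_demote_old(struct mini_glock *gl)
{
	bool pending = gl->flags & F_PENDING_DEMOTE;

	gl->flags &= ~F_PENDING_DEMOTE;   /* lost if the tests below fail */
	if (pending && gl->state != ST_UNLOCKED &&
	    gl->demote_state != ST_EXCLUSIVE)
		gl->flags |= F_DEMOTE;
}

/* Fixed pattern: only clear the pending bit once all conditions hold. */
static void promote_demote_fixed(struct mini_glock *gl)
{
	if ((gl->flags & F_PENDING_DEMOTE) &&
	    gl->state != ST_UNLOCKED &&
	    gl->demote_state != ST_EXCLUSIVE) {
		gl->flags &= ~F_PENDING_DEMOTE;
		gl->flags |= F_DEMOTE;
	}
}
```

With the glock still in the unlocked state, the old pattern leaves neither flag set, so the remote node's request is forgotten. The fixed pattern keeps the pending bit, and a later pass promotes it to a demote once the state has changed.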
