Commits · 504913fbc84c00bba7224d73e4aab525c1731f7d · openeuler / raspberrypi-kernel

25 Oct, 2010 5 commits

NFS: ask for layouttypes during v4 fsinfo call · 504913fb

Authored by Andy Adamson on Oct 20, 2010

This information will be used to determine which layout driver,
if any, to use for subsequent IO on this filesystem.  Each driver
is assigned an integer id, with 0 reserved to indicate no driver.

The server can in theory return multiple ids.  However, our current
client implementation only notes the first entry and ignores the
rest.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
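The "first entry only" behaviour described in this commit can be pictured with a small userspace C sketch. The helper name `pick_layout_type` and the flat array of ids are assumptions made for illustration and are not the kernel's fsinfo decoding code; only the convention that id 0 means "no driver" comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: id 0 is reserved to mean "no layout driver". */
#define LAYOUT_NONE_SKETCH 0u

/*
 * Given the list of layout type ids returned by a server, note only the
 * first entry and ignore the rest. An empty list yields 0 (no driver).
 */
static uint32_t pick_layout_type(const uint32_t *ids, size_t count)
{
    if (ids == NULL || count == 0)
        return LAYOUT_NONE_SKETCH;
    return ids[0];
}
```

A caller would pass whatever list was decoded from the reply and then look up a driver by the returned id, skipping pNFS entirely when the result is 0.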

NFS: change stateid to be a union · 94499252

Authored by Alexandros Batsakis on Oct 20, 2010

In NFSv4.1 the stateid consists of the other and seqid fields. For layout
processing we need to numerically compare the seqid value of layout stateids.
To do so, introduce a union to nfs4_stateid to switch between opaque(16 bytes)
and opaque(12 bytes) / __be32.
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
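A minimal userspace sketch of the union idea follows. The type and field names (`stateid_sketch`, `fields`, `seqid_to_host`) are illustrative assumptions, not the kernel's definition; the field order (seqid first, then 12 opaque bytes) follows the `stateid4` wire layout in RFC 5661. The comparison is a plain numeric one and deliberately ignores seqid wraparound.

```c
#include <stdint.h>

#define STATEID_SIZE_SKETCH 16

typedef union {
    unsigned char data[STATEID_SIZE_SKETCH];   /* opaque 16-byte view */
    struct {
        uint32_t seqid;                        /* big-endian, as on the wire */
        unsigned char other[STATEID_SIZE_SKETCH - sizeof(uint32_t)];
    } fields;                                  /* structured 4 + 12 byte view */
} stateid_sketch;

/* Convert the big-endian seqid to host order without relying on htonl(). */
static uint32_t seqid_to_host(const stateid_sketch *s)
{
    const unsigned char *p = (const unsigned char *)&s->fields.seqid;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns nonzero if a's seqid is numerically greater than b's. */
static int stateid_seqid_newer(const stateid_sketch *a, const stateid_sketch *b)
{
    return seqid_to_host(a) > seqid_to_host(b);
}
```

Code that only copies or compares whole stateids keeps using the 16-byte view, while layout code can reach the seqid through the structured view.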

NFSv4.1: pnfsd, pnfs: protocol level pnfs constants · c772567d

Authored by Dean Hildebrand on Oct 20, 2010

Use only layoutreturn constant for both returns and recalls.
(return_* works better for recall_type rather than the other way around)
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: define xdr_decode_opaque_fixed · 35b61e63

Authored by Benny Halevy on Oct 20, 2010

A helper for decoding a fixed length opaque value.
Returns a pointer to the next item in the xdr stream.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
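The behaviour described here can be sketched in userspace C as below. The names `decode_opaque_fixed` and `QUADLEN_SKETCH` are stand-ins chosen for this example, and plain `uint32_t` replaces the kernel's `__be32`; the point is only that XDR pads opaque data to a multiple of four bytes, so the returned pointer advances by whole 32-bit words.

```c
#include <stdint.h>
#include <string.h>

/* Number of 32-bit XDR words needed to hold len bytes, rounded up. */
#define QUADLEN_SKETCH(len) (((len) + 3u) >> 2)

/*
 * Copy len bytes of fixed-length opaque data from the XDR buffer at p
 * into dst, and return a pointer to the next item in the buffer.
 */
static uint32_t *decode_opaque_fixed(uint32_t *p, void *dst, unsigned int len)
{
    memcpy(dst, p, len);
    return p + QUADLEN_SKETCH(len);
}
```

For a 5-byte value the pointer moves forward by two words, skipping the three bytes of padding that follow the data.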

Revalidate caches on lock · 6b96724e

Authored by Ricardo Labiaga on Oct 12, 2010

Instead of blindly zapping the caches, attempt to revalidate them if
the server has indicated that it uses high resolution timestamps.

NFSv4 should be able to always revalidate the cache since the
protocol requires the update of the change attribute on modification of
the data. In reality, there are servers (the Linux NFS server
for example) that do not obey this requirement and use ctime as the
basis for change attribute. Long term, the server needs to be fixed.
At this time, and to be on the safe side, continue zapping caches if
the server indicates that it does not have a high resolution timestamp.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

24 Oct, 2010 4 commits

NFS: Readdir plus in v4 · 82f2e547

Authored by Bryan Schumaker on Oct 21, 2010

By requesting more attributes during a readdir, we can mimic the readdir plus
operation that was in NFSv3.

To test, I ran the command `ls -lU --color=none` on directories with various
numbers of files.  Without readdir plus, I see this:

n files |    100    |   1,000   |  10,000   |  100,000  | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
access  | 3         | 1         | 1         | 4         | 31
getattr | 2         | 1         | 1         | 1         | 1
lookup  | 104       | 1,003     | 10,003    | 100,003   | 1,000,003
readdir | 2         | 16        | 158       | 1,575     | 15,749
total   | 111       | 1,021     | 10,163    | 101,583   | 1,015,784

With readdir plus enabled, I see this:

n files |    100    |   1,000   |  10,000   |  100,000  | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
access  | 3         | 1         | 1         | 1         | 7
getattr | 2         | 1         | 1         | 1         | 1
lookup  | 4         | 3         | 3         | 3         | 3
readdir | 6         | 62        | 630       | 6,300     | 62,993
total   | 15        | 67        | 635       | 6,305     | 63,004

With readdir plus disabled, there are about 16 times as many RPC calls, and
listing large directories is 4 - 5 times slower.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: readdir with vmapped pages · 56e4ebf8

Authored by Bryan Schumaker on Oct 20, 2010

We can use vmapped pages to read more information from the network at once.
This will reduce the number of calls needed to complete a readdir.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[trondmy: Added #include for <linux/vmalloc.h> in fs/nfs/dir.c]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: decode_dirent should use an xdr_stream · babddc72

Authored by Bryan Schumaker on Oct 20, 2010

Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
kernel oops that has been occurring when reading a vmapped page.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add a helper function xdr_inline_peek · ba8e452a

Authored by Trond Myklebust on Oct 19, 2010

We sometimes need to be able to read ahead in an xdr_stream without
incrementing the current pointer position.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
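The difference between peeking and decoding can be illustrated with a toy cursor. Everything below (`stream_sketch`, `stream_peek`, `stream_decode`) is a hypothetical userspace model, not the SUNRPC `xdr_stream` API: both functions check that enough data remains, but only the decoding variant moves the current position.

```c
#include <stddef.h>
#include <stdint.h>

struct stream_sketch {
    const uint32_t *p;    /* current position */
    const uint32_t *end;  /* one past the last valid word */
};

/*
 * Return a pointer to the next nbytes of data without advancing the
 * stream, or NULL if fewer than nbytes (rounded up to words) remain.
 */
static const uint32_t *stream_peek(const struct stream_sketch *s, size_t nbytes)
{
    size_t nwords = (nbytes + 3) / 4;

    if ((size_t)(s->end - s->p) < nwords)
        return NULL;
    return s->p;
}

/* Same bounds check, but consume the words on success. */
static const uint32_t *stream_decode(struct stream_sketch *s, size_t nbytes)
{
    const uint32_t *q = stream_peek(s, nbytes);

    if (q != NULL)
        s->p += (nbytes + 3) / 4;
    return q;
}
```

Peeking lets a decoder inspect, say, a length or discriminant word and decide how to proceed before committing to consuming it.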

08 Oct, 2010 1 commit

NFS: new idmapper · 955a857e

Authored by Bryan Schumaker on Sep 29, 2010

This patch creates a new idmapper system that uses the request-key function to
place a call into userspace to map user and group ids to names.  The old
idmapper was single threaded, which prevented more than one request from running
at a single time.  This means that a user would have to wait for an upcall to
finish before accessing a cached result.

The upcall result is stored on a keyring of type id_resolver.  See the file
Documentation/filesystems/nfs/idmapper.txt for instructions.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

24 Sep, 2010 1 commit

NFSv4.1: keep seq_res.sr_slot as pointer rather than an index · dfb4f309

Authored by Benny Halevy on Sep 24, 2010

Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
resulted in numerous bugs.  Keeping the current slot as a pointer
to the slot table is more straightforward and robust as it's
implicitly set up to NULL wherever the seq_res member is initialized
to zeroes.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
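The hazard this commit removes can be shown with two toy structures. The names below (`seq_res_index`, `seq_res_ptr`, `MAX_SLOTS_SKETCH`) are assumptions for illustration, not the kernel's types: with an index, a zero-initialized result looks like "slot 0 is in use" unless someone remembers to store the sentinel, whereas a zero-initialized pointer already means "no slot".

```c
#include <stddef.h>

#define MAX_SLOTS_SKETCH 16   /* hypothetical sentinel: "no slot assigned" */

struct slot_sketch {
    unsigned int seq_nr;
};

/* Index-based variant: "no slot" must be stored explicitly. */
struct seq_res_index {
    unsigned int slotid;
};

/* Pointer-based variant: "no slot" is simply NULL. */
struct seq_res_ptr {
    struct slot_sketch *slot;
};

static int index_has_slot(const struct seq_res_index *r)
{
    return r->slotid != MAX_SLOTS_SKETCH;
}

static int ptr_has_slot(const struct seq_res_ptr *r)
{
    return r->slot != NULL;
}
```

After `struct seq_res_index ri = {0};` the index variant wrongly reports a slot, while `struct seq_res_ptr rp = {0};` correctly reports none.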

23 Sep, 2010 1 commit

nfs: introduce mount option '-olocal_lock' to make locks local · 5eebde23

Authored by Suresh Jayaraman on Sep 23, 2010

NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
locks. Due to this, some Windows applications which seem to use both flock
(share mode lock mapped as flock by Samba) and fcntl locks sequentially on
the same file can't lock as they falsely assume the file is already locked.
The problem was reported on a setup with Windows clients accessing Excel files
on a Samba exported share which is originally an NFS mount from a NetApp filer.

Older NFS clients (< 2.6.12) did not see this problem as flock locks were
considered local. To support legacy flock behavior, this patch adds a mount
option "-olocal_lock=" which can take the following values:

   'none'  - Neither flock locks nor POSIX locks are local
   'flock' - flock locks are local
   'posix' - fcntl/POSIX locks are local
   'all'   - Both flock locks and POSIX locks are local

Testing:

   - This patch was tested by using -olocal_lock option with different values
     and the NLM calls were noted from the network packet captured.

     'none'  - NLM calls were seen during both flock() and fcntl(), flock lock
               was granted, fcntl was denied
     'flock' - no NLM calls for flock(), NLM call was seen for fcntl(),
               granted
     'posix' - NLM call was seen for flock() - granted, no NLM call for fcntl()
     'all'   - no NLM calls were seen during both flock() and fcntl()

   - No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
     reboot recovery.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
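The four option values listed in this commit map naturally onto two independent bits. The sketch below is a userspace illustration only: the flag names and the `parse_local_lock` function are hypothetical and do not claim to match the kernel's mount-option parser.

```c
#include <string.h>

/* Hypothetical flag bits: which lock families are handled locally. */
#define LOCAL_FLOCK_SKETCH 0x1
#define LOCAL_POSIX_SKETCH 0x2

/*
 * Translate the value given to local_lock= into a flag mask.
 * Returns -1 for a value that is not one of the four documented ones.
 */
static int parse_local_lock(const char *value)
{
    if (strcmp(value, "none") == 0)
        return 0;
    if (strcmp(value, "flock") == 0)
        return LOCAL_FLOCK_SKETCH;
    if (strcmp(value, "posix") == 0)
        return LOCAL_POSIX_SKETCH;
    if (strcmp(value, "all") == 0)
        return LOCAL_FLOCK_SKETCH | LOCAL_POSIX_SKETCH;
    return -1;
}
```

A lock request would then be sent to the server only when the bit for its family is clear, which matches the NLM traffic observed in the test notes above.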

22 Sep, 2010 1 commit

SUNRPC: Refactor logic to NUL-terminate strings in pages · b4687da7

Authored by Chuck Lever on Sep 21, 2010

Clean up: Introduce a helper to '\0'-terminate XDR strings
that are placed in a page in the page cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

18 Sep, 2010 4 commits

nfs: make sillyrename an async operation · d3d4152a

Authored by Jeff Layton on Sep 17, 2010

A synchronous rename can be interrupted by a SIGKILL. If that happens
during a sillyrename operation, it's possible for the rename call to
be sent to the server, but the task exits before processing the
reply. If this happens, the sillyrenamed file won't get cleaned up
during nfs_dentry_iput and the server is left with a dangling .nfs* file
hanging around.

Fix this problem by turning sillyrename into an asynchronous operation
and have the task doing the sillyrename just wait on the reply. If the
task is killed before the sillyrename completes, it'll still proceed
to completion.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: move nfs_sillyrename to unlink.c · 779c5179

Authored by Jeff Layton on Sep 17, 2010

...since that's where most of the sillyrenaming code lives. A comment
block is added to the beginning as well to clarify how sillyrenaming
works. Also, make nfs_async_unlink static as nfs_sillyrename is the only
caller.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: standardize the rename response container · e8582a8b

Authored by Jeff Layton on Sep 17, 2010

Right now, v3 and v4 have their own variants. Create a standard struct
that will work for v3 and v4. v2 doesn't get anything but a simple error
and so isn't affected by this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: standardize the rename args container · 920769f0

Authored by Jeff Layton on Sep 17, 2010

Each NFS version has its own version of the rename args container.
Standardize them on a common one that's identical to the one NFSv4
uses.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

17 Sep, 2010 5 commits

NFS: Add an 'open_context' element to struct nfs_rpc_ops · 2b484297

Authored by Trond Myklebust on Sep 17, 2010

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Clean up nfs4_proc_create() · c0204fd2

Authored by Trond Myklebust on Sep 17, 2010

Remove all remaining references to the struct nameidata from the low level
NFS layers. Again pass down a partially initialised struct nfs_open_context
when we want to do atomic open+create.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Clean up nfs4_atomic_open · cd9a1c0e

Authored by Trond Myklebust on Sep 17, 2010

Start moving the 'struct nameidata' dependent code out of the lower level
NFS code in preparation for the removal of open intents.

Instead of the struct nameidata, we pass down a partially initialised
struct nfs_open_context that will be fully initialised by the atomic open
upon success.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Remove rpcb_getport_sync() · 859d5024

Authored by Chuck Lever on Sep 17, 2010

Clean up: rpcb_getport_sync() has no more users, so remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Use super.c for NFSROOT mount option parsing · 56463e50

Authored by Chuck Lever on Sep 17, 2010

Replace duplicate code in NFSROOT for mounting an NFS server on '/'
with logic that uses the existing mainline text-based logic in the NFS
client.

Add documenting comments where appropriate.

Note that this means NFSROOT mounts now use the same default settings
as v2/v3 mounts done via mount(2) from user space.

  vers=3,tcp,rsize=<negotiated default>,wsize=<negotiated default>

As before, however, no version/protocol negotiation with the server is
done.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

29 Aug, 2010 1 commit

NOMMU: Stub out vm_get_page_prot() if there's no MMU · bad849b3

Authored by David Howells on Aug 26, 2010

Stub out vm_get_page_prot() if there's no MMU.

This was added by commit 804af2cf ("[AGPGART] remove private page
protection map") and is used in commit c07fbfd1 ("fbmem: VM_IO set,
but not propagated") in the fbmem video driver, but the function doesn't
exist on NOMMU, resulting in an undefined symbol at link time.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Aug, 2010 1 commit

fanotify: resize pid and reorder structure · 0fb85621

Authored by Tvrtko Ursulin on Aug 20, 2010

Resize pid and reorder the fanotify_event_metadata so it is naturally
aligned and we can work towards dropping the packed attribute.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Cc: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Eric Paris <eparis@redhat.com>
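"Naturally aligned" here means that every field already sits at an offset that is a multiple of its own size, so the compiler inserts no padding and the packed attribute changes nothing. The sketch below demonstrates that property with an illustrative field list; the struct names and the exact fields are assumptions for this example and are not claimed to be the kernel's definition of `fanotify_event_metadata`. The `__attribute__((packed))` syntax is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields ordered so that each one falls on its natural boundary. */
struct event_natural_sketch {
    uint32_t event_len;   /* offset 0  */
    uint32_t vers;        /* offset 4  */
    uint64_t mask;        /* offset 8  */
    int32_t  fd;          /* offset 16 */
    int32_t  pid;         /* offset 20 */
};

/* The same fields with packing forced. */
struct event_packed_sketch {
    uint32_t event_len;
    uint32_t vers;
    uint64_t mask;
    int32_t  fd;
    int32_t  pid;
} __attribute__((packed));
```

Because both variants come out at 24 bytes with identical offsets, the packed attribute could later be dropped from such a layout without changing what user space sees.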

27 Aug, 2010 1 commit

vgaarb: Wrap vga_(get|put) in CONFIG_VGA_ARB · 04cbe1de

Authored by Chris Wilson on Aug 19, 2010

Fix link failure without the vga arbitrator.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

25 Aug, 2010 1 commit

guard page for stacks that grow upwards · 8ca3eb08

Authored by Luck, Tony on Aug 24, 2010

PA-RISC and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that do not ever use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
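The "define the flag as zero" trick can be sketched as follows. The macro names, the configuration symbol `ARCH_STACK_GROWS_UP_SKETCH`, and the numeric values are placeholders for this example rather than the kernel's definitions. The idea is that the single `#ifdef` lives next to the flag definition, and every test of the flag becomes a constant-false expression on architectures that never set it.

```c
/* Illustrative flag values only. */
#define VM_GROWSDOWN_SKETCH 0x0100UL

#ifdef ARCH_STACK_GROWS_UP_SKETCH
#define VM_GROWSUP_SKETCH   0x0200UL
#else
#define VM_GROWSUP_SKETCH   0x0UL   /* never set: tests compile to "false" */
#endif

/*
 * No #ifdef is needed at the point of use: when the flag is 0 the
 * expression is always false and the compiler can drop the branch.
 */
static int needs_upward_guard_check(unsigned long vm_flags)
{
    return (vm_flags & VM_GROWSUP_SKETCH) != 0;
}

static int needs_downward_guard_check(unsigned long vm_flags)
{
    return (vm_flags & VM_GROWSDOWN_SKETCH) != 0;
}
```

Built without the configuration symbol, `needs_upward_guard_check()` returns 0 for any input, so the upward-growth path costs nothing on those architectures.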

24 Aug, 2010 2 commits

USB: gadget: fix composite kernel-doc warnings · d187abb9

Authored by Randy Dunlap on Aug 11, 2010

Warning(include/linux/usb/composite.h:284): No description found for parameter 'disconnect'
Warning(drivers/usb/gadget/composite.c:744): No description found for parameter 'c'
Warning(drivers/usb/gadget/composite.c:744): Excess function parameter 'cdev' description in 'usb_string_ids_n'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

kobject: Break the kobject namespace defs into their own header · 8488a38f

Authored by David Howells on Aug 11, 2010

Break the kobject namespace defs into their own header to avoid a header file
inclusion ordering problem between linux/sysfs.h and linux/kobject.h.

This fixes the build breakage on older versions of gcc.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

23 Aug, 2010 2 commits

header: fix broken headers for user space · 09cd2b99

Authored by Changli Gao on Aug 22, 2010

__packed is only defined in kernel space, so we should use
__attribute__((packed)) for the code shared between kernel and user space.

Two __attribute() annotations are replaced with __attribute__() too.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fanotify: flush outstanding perm requests on group destroy · 2eebf582

Authored by Eric Paris on Aug 18, 2010

When an fanotify listener is closing it may cause a deadlock between the
listener and the original task doing an fs operation. If the original task
is waiting for a permissions response it will be holding the srcu lock. The
listener cannot clean up and exit until after that srcu lock is synchronized.
Thus deadlock. The fix introduced here is to stop accepting new permissions
events when a listener is shutting down and to grant permission for all
outstanding events. Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.
Reported-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

21 Aug, 2010 5 commits

mm: make the vma list be doubly linked · 297c5eee

Authored by Linus Torvalds on Aug 20, 2010

It's a really simple list, and several of the users want to go backwards
in it to find the previous vma.  So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
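A toy version of the list change is sketched below. The struct and function names (`vma_sketch`, `vma_link`) are hypothetical and the fields are reduced to the bare minimum; the point is that once each node carries a `prev` pointer, finding the previous mapping is a single pointer read instead of a separate lookup.

```c
#include <stddef.h>

struct vma_sketch {
    unsigned long start, end;
    struct vma_sketch *next, *prev;
};

/*
 * Insert node after prev_node, or at the head of the list when
 * prev_node is NULL, keeping both next and prev links consistent.
 */
static void vma_link(struct vma_sketch **head,
                     struct vma_sketch *prev_node,
                     struct vma_sketch *node)
{
    struct vma_sketch *next;

    node->prev = prev_node;
    if (prev_node != NULL) {
        next = prev_node->next;
        prev_node->next = node;
    } else {
        next = *head;
        *head = node;
    }
    node->next = next;
    if (next != NULL)
        next->prev = node;
}
```

With this in place, code such as a stack guard check that needs the mapping just below a given one can read `node->prev` directly.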

Input: uinput - add devname alias to allow module on-demand load · 8905aaaf

Authored by Kay Sievers on Aug 19, 2010

Recent modprobe and udev versions can create device nodes
for modules which are not loaded. Only the first access will cause
the in-kernel module loader to pull in the module. Systems which
never access the device node will not needlessly load the module,
and no longer need init scripts or other facilities to unconditionally
load it.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

USB: drop tty argument from usb_serial_handle_sysrq_char() · 6ee9f4b4

Authored by Dmitry Torokhov on Aug 17, 2010

Since handle_sysrq() does not take tty as argument anymore we can
drop it from usb_serial_handle_sysrq_char() as well.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: sysrq - drop tty argument from handle_sysrq() · f335397d

Authored by Dmitry Torokhov on Aug 17, 2010

Sysrq operations do not accept tty argument anymore so no need to pass
it to us.

[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
 caused by sysrq using bool but not including linux/types.h]

[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboard
 driver]
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

kfifo: implement missing __kfifo_skip_r() · b35de43b

Authored by Andrea Righi on Aug 19, 2010

kfifo_skip() is currently broken because the internal helper function
is missing.  Add it.
Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Aug, 2010 1 commit

Input: sysrq - drop tty argument from sysrq ops handlers · 1495cc9d

Authored by Dmitry Torokhov on Aug 17, 2010

No one is using the tty argument, so let's get rid of it.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

19 Aug, 2010 2 commits

netfilter: fix userspace header warning · e243f5b6

Authored by Sam Ravnborg on Aug 15, 2010

"make headers_check" issued the following warning:

  CHECK   include/linux/netfilter (64 files)
usr/include/linux/netfilter/xt_ipvs.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>

Fix this by including linux/types.h, as suggested.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add Fast Ethernet driver for PXA168. · a49f37ee

Authored by Sachin Sanap on Aug 13, 2010

Signed-off-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Aug, 2010 2 commits

fs: scale files_lock · 6416ccb7

Authored by Nick Piggin on Aug 18, 2010

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from a different CPU's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files that are subsequently freed by N
CPUs degenerates to contending on a single lock, which is no worse than before.
When more than one CPU is allocating files, even if they are always freed by
different CPUs, there will be more parallelism than in the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed by the same CPU that added it over 90% of the time.
It remains within the same node over 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single-threaded performance difference was within the noise of microbenchmarks.
That is not to say that no penalty exists: the code is larger and requires more
memory accesses, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lglock: introduce special lglock and brlock spin locks · 2dc91abe

Authored by Nick Piggin on Aug 18, 2010

This patch introduces "local-global" locks (lglocks). These can be used to:

- Provide fast exclusive access to per-CPU data, with exclusive access to
  another CPU's data allowed but possibly subject to contention, and to provide
  very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
  very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use
case.

Thanks to Paul for local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
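The local/global split can be modelled in userspace with one small spin lock per CPU. The sketch below uses C11 `atomic_flag` and invented names (`lglock_sketch`, `lg_local_lock`, `lg_global_lock`, a fixed `NR_CPUS_SKETCH`); it is an assumption-laden illustration of the concept described in this commit, not the kernel implementation, and it leaves out preemption control and per-CPU data placement entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4   /* illustrative CPU count */

struct lglock_sketch {
    atomic_flag cpu_lock[NR_CPUS_SKETCH];   /* one spin lock per CPU */
};

static void lg_init(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        atomic_flag_clear(&lg->cpu_lock[i]);
}

/* Try to take the lock of a single CPU; returns true on success. */
static bool lg_local_trylock(struct lglock_sketch *lg, int cpu)
{
    return !atomic_flag_test_and_set_explicit(&lg->cpu_lock[cpu],
                                              memory_order_acquire);
}

/* Fast path: spin only on the lock belonging to one CPU. */
static void lg_local_lock(struct lglock_sketch *lg, int cpu)
{
    while (!lg_local_trylock(lg, cpu))
        ;   /* busy-wait until the holder releases it */
}

static void lg_local_unlock(struct lglock_sketch *lg, int cpu)
{
    atomic_flag_clear_explicit(&lg->cpu_lock[cpu], memory_order_release);
}

/* Slow path: take every per-CPU lock, always in the same order. */
static void lg_global_lock(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        lg_local_lock(lg, i);
}

static void lg_global_unlock(struct lglock_sketch *lg)
{
    for (int i = NR_CPUS_SKETCH - 1; i >= 0; i--)
        lg_local_unlock(lg, i);
}
```

Local lockers on different CPUs never touch the same flag, which is where the scalability comes from; the global path pays for that by having to acquire all of them.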