Commits · 79637a41e466bbe7dfe394bac3c9d86a92fd55b1 · openeuler / Kernel

07 Sep, 2010 1 commit

agp/intel: Fix cache control for Sandybridge · f8f235e5

Committed by Zhenyu Wang on Aug 27, 2010

The Sandybridge GTT has new cache control bits in the PTE, which control
whether graphics pages are cached in LLC or LLC/MLC, so we need to extend
the mask function to respect the new bits.

Also set cache control to LLC only by default on Gen6.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

04 Sep, 2010 1 commit

serial: fix port type conflict between NS16550A & U6_16550A · 71cad055

Committed by Philippe Langlais on Aug 31, 2010

Bug seen by Dr. David Alan Gilbert with sparse.
Signed-off-by: Philippe Langlais <philippe.langlais@stericsson.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

03 Sep, 2010 1 commit

mutex: Fix annotations to include it in kernel-locking docbook · ef5dc121

Committed by Randy Dunlap on Sep 02, 2010

Fix kernel-doc notation in linux/mutex.h and kernel/mutex.c,
then add these 2 files to the kernel-locking docbook as the
Mutex API reference chapter.

Add one API function to mutex-design.txt and correct a typo in
that file.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20100902154816.6cc2f9ad.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

01 Sep, 2010 1 commit

powerpc/85xx: Add P1021 PCI IDs and quirks · a28dec2f

Committed by Anton Vorontsov on Aug 08, 2010

This is needed for proper PCI-E support on P1021 SoCs.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

29 Aug, 2010 1 commit

NOMMU: Stub out vm_get_page_prot() if there's no MMU · bad849b3

Committed by David Howells on Aug 26, 2010

Stub out vm_get_page_prot() if there's no MMU.

This was added by commit 804af2cf ("[AGPGART] remove private page
protection map") and is used in commit c07fbfd1 ("fbmem: VM_IO set,
but not propagated") in the fbmem video driver, but the function doesn't
exist on NOMMU, resulting in an undefined symbol at link time.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Aug, 2010 1 commit

fanotify: resize pid and reorder structure · 0fb85621

Committed by Tvrtko Ursulin on Aug 20, 2010

Resize pid and reorder the fanotify_event_metadata structure so it is
naturally aligned and we can work towards dropping the packed attribute.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Cc: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Eric Paris <eparis@redhat.com>

27 Aug, 2010 1 commit

vgaarb: Wrap vga_(get|put) in CONFIG_VGA_ARB · 04cbe1de

Committed by Chris Wilson on Aug 19, 2010

Fix link failure without the vga arbitrator.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

25 Aug, 2010 6 commits

tcp: Combat per-cpu skew in orphan tests. · ad1af0fe

Committed by David S. Miller on Aug 25, 2010

As reported by Anton Blanchard, when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_online_cpus() * batch (batch is 32
by default), which on a 128-CPU machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.
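
The two-stage check described above can be sketched in userspace C. This is only an illustration under stated assumptions: the toy_* types and functions are hypothetical stand-ins for the kernel's percpu_counter helpers, and the 128-CPU size and batch of 32 are taken from the example in the message.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 128  /* assumed machine size, as in the message */
#define TOY_BATCH   32   /* a per-CPU delta is folded in once it reaches this */

/* Hypothetical stand-in for the kernel's struct percpu_counter. */
struct toy_percpu_counter {
    long count;               /* approximate global value */
    long local[TOY_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Add on one CPU; touch the shared count only when the delta is large. */
static void toy_counter_add(struct toy_percpu_counter *c, int cpu, long n)
{
    c->local[cpu] += n;
    if (c->local[cpu] >= TOY_BATCH || c->local[cpu] <= -TOY_BATCH) {
        c->count += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Cheap read: global value only, clamped at zero. */
static long toy_counter_read_positive(const struct toy_percpu_counter *c)
{
    return c->count > 0 ? c->count : 0;
}

/* Expensive read: walk every CPU to get the exact value. */
static long toy_counter_sum_positive(const struct toy_percpu_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum > 0 ? sum : 0;
}

/* Pay for the exact sum only when the cheap check says "over limit". */
static bool toy_too_many_orphans(const struct toy_percpu_counter *c, long limit)
{
    if (toy_counter_read_positive(c) <= limit)
        return false;
    return toy_counter_sum_positive(c) > limit;
}
```

With 128 CPUs each holding an unfolded delta just under the batch size, the cheap read can differ from the exact sum by almost 128 * 32 = 4096, which is why the slow path is needed before refusing a socket.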
Reported-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>

ad1af0fe

workqueue: fix cwq->nr_active underflow · 8a2e8e5d

Committed by Tejun Heo on Aug 25, 2010

cwq->nr_active is used to keep track of how many work items are active
for the cpu workqueue, where 'active' is defined as either pending on
global worklist or executing.  This is used to implement the
max_active limit and workqueue freezing.  If a work item is queued
after nr_active has already reached max_active, the work item doesn't
increment nr_active and is put on the delayed queue and gets activated
later as previous active work items retire.

try_to_grab_pending() which is used in the cancellation path
unconditionally decremented nr_active whether the work item being
cancelled is currently active or delayed, so cancelling a delayed work
item makes nr_active underflow.  This breaks max_active enforcement
and triggers BUG_ON() in destroy_workqueue() later on.

This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which
is set while a work item is on the delayed list, and making
try_to_grab_pending() decrement nr_active iff the work item is
currently active.
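
The accounting rule can be illustrated with a small userspace sketch. All toy_* names are hypothetical; this is not the kernel's workqueue code, and it leaves out the activation of delayed items when active ones retire.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_WORK_DELAYED 0x1u  /* hypothetical analogue of WORK_STRUCT_DELAYED */

struct toy_work {
    unsigned int flags;
    bool queued;
};

struct toy_cwq {
    int nr_active;   /* items counted against max_active */
    int max_active;
    int nr_delayed;  /* items parked on the delayed list */
};

/* Queue an item: it is active if there is room, otherwise delayed. */
static void toy_queue_work(struct toy_cwq *cwq, struct toy_work *w)
{
    w->queued = true;
    if (cwq->nr_active < cwq->max_active) {
        cwq->nr_active++;
        w->flags &= ~TOY_WORK_DELAYED;
    } else {
        cwq->nr_delayed++;
        w->flags |= TOY_WORK_DELAYED;
    }
}

/* Cancel an item: decrement nr_active only if the item was active. */
static void toy_cancel_work(struct toy_cwq *cwq, struct toy_work *w)
{
    if (!w->queued)
        return;
    if (w->flags & TOY_WORK_DELAYED) {
        cwq->nr_delayed--;
        w->flags &= ~TOY_WORK_DELAYED;
    } else {
        cwq->nr_active--;
    }
    w->queued = false;
}
```

Without the flag test in toy_cancel_work(), cancelling a delayed item would decrement nr_active for an item that never incremented it, which is the underflow described above.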

The addition of the flag enlarges cwq alignment to 256 bytes which is
getting a bit too large.  It's scheduled to be reduced back to 128
bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
devel cycle.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Berg <johannes@sipsolutions.net>

ACPI/PCI: Negotiate _OSC control bits before requesting them · 75fb60f2

Committed by Rafael J. Wysocki on Aug 23, 2010

It is possible that the BIOS will not grant control of all _OSC
features requested via acpi_pci_osc_control_set(), so it is
recommended to negotiate the final set of _OSC features with the
query flag set before calling _OSC to request control of these
features.

To implement it, rework acpi_pci_osc_control_set() so that the caller
can specify the mask of _OSC control bits to negotiate and the mask
of _OSC control bits that are absolutely necessary to it.  Then,
acpi_pci_osc_control_set() will run _OSC queries in a loop until
the mask of _OSC control bits returned by the BIOS is equal to the
mask passed to it.  Also, before running the _OSC request
acpi_pci_osc_control_set() will check if the caller's required
control bits are present in the final mask.
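
A minimal sketch of such a negotiation loop, assuming a hypothetical firmware model: toy_firmware_query() and the TOY_OSC_* bits are invented for illustration and do not correspond to real _OSC bit definitions or to the ACPI API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define TOY_OSC_A (1u << 0)
#define TOY_OSC_B (1u << 1)
#define TOY_OSC_C (1u << 2)

typedef uint32_t (*toy_osc_query_fn)(uint32_t requested);

/*
 * Mock firmware: C is always granted, B is never granted, and A is
 * granted only when B is requested alongside it.
 */
static uint32_t toy_firmware_query(uint32_t requested)
{
    uint32_t answer = requested & TOY_OSC_C;

    if ((requested & TOY_OSC_A) && (requested & TOY_OSC_B))
        answer |= TOY_OSC_A;
    return answer;
}

/*
 * Query repeatedly, shrinking the mask until the firmware returns
 * exactly what was asked for, then check the required bits.
 * Returns 0 and stores the final mask on success, -1 otherwise.
 */
static int toy_osc_negotiate(toy_osc_query_fn query, uint32_t wanted,
                             uint32_t required, uint32_t *granted)
{
    uint32_t mask = wanted;

    if ((wanted & required) != required)
        return -1;

    while (mask) {
        uint32_t answer = query(mask) & mask;

        if (answer == mask)
            break;      /* stable: every bit asked for is available */
        mask = answer;  /* retry with the smaller set */
    }

    if ((mask & required) != required)
        return -1;

    *granted = mask;
    return 0;
}
```

Because each retry can only remove bits, the loop terminates. With the mock above, a request for A, B and C settles on C alone after three queries, so a caller that requires A is refused before any control request is made.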

Using this mechanism we will be able to avoid situations in which the
BIOS doesn't grant control of certain _OSC features, because they
depend on some other _OSC features that have not been requested.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

ACPI/PCI: Do not preserve _OSC control bits returned by a query · 2b8fd918

Committed by Rafael J. Wysocki on Aug 23, 2010

There is the assumption in acpi_pci_osc_control_set() that it is
always sufficient to compare the mask of _OSC control bits to be
requested with the result of an _OSC query where all of the known
control bits have been checked.  However, in general, that need not
be the case.  For example, if an _OSC feature A depends on an _OSC
feature B and control of A, B plus another _OSC feature C is
requested simultaneously, the BIOS may return A, B, C, while it would
only return C if A and C were requested without B.

That may result in passing a wrong mask of _OSC control bits to an
_OSC control request, in which case the BIOS may only grant control
of a subset of the requested features.  Moreover, acpi_pci_run_osc()
will return an error code if that happens and the caller of
acpi_pci_osc_control_set() will not know that it's been granted
control of some _OSC features.  Consequently, the system will
generally not work as expected.

Apart from this, acpi_pci_osc_control_set() always uses the mask
of _OSC control bits returned by the very first invocation of
acpi_pci_query_osc(), but that is done with the second argument
equal to OSC_PCI_SEGMENT_GROUPS_SUPPORT which generally happens
to affect the returned _OSC control bits.

For these reasons, make acpi_pci_osc_control_set() always check if
control of the requested _OSC features will be granted before making
the final control request.  As a result, the osc_control_qry and
osc_queried members of struct acpi_pci_root are not necessary any
more, so drop them and remove the remaining code referring to them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

guard page for stacks that grow upwards · 8ca3eb08

Committed by Luck, Tony on Aug 24, 2010

PA-RISC and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that never use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

workqueue: improve destroy_workqueue() debuggability · e41e704b

Committed by Tejun Heo on Aug 24, 2010

Now that the worklist is global, having works pending after wq
destruction can easily lead to an oops, and destroy_workqueue() has
several BUG_ON()s to catch these cases.  Unfortunately, BUG_ON()
doesn't tell much about how the work became pending after the final
flush_workqueue().

This patch adds WQ_DYING which is set before the final flush begins.
If a work is requested to be queued on a dying workqueue,
WARN_ON_ONCE() is triggered and the request is ignored.  This clearly
indicates which caller is trying to queue a work on a dying workqueue
and keeps the system working in most cases.
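
The behaviour described above can be sketched as follows. The toy_* names are hypothetical; the per-queue warned field stands in for WARN_ON_ONCE(), and resetting nr_queued stands in for the final flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_WQ_DYING 0x1u  /* hypothetical analogue of WQ_DYING */

struct toy_wq {
    unsigned int flags;
    int nr_queued;
    bool warned;  /* warn only once */
};

/* Refuse new work on a dying queue, warning the first time it happens. */
static bool toy_wq_queue(struct toy_wq *wq)
{
    if (wq->flags & TOY_WQ_DYING) {
        if (!wq->warned) {
            fprintf(stderr, "toy_wq: queueing on a dying workqueue\n");
            wq->warned = true;
        }
        return false;  /* request ignored, caller keeps running */
    }
    wq->nr_queued++;
    return true;
}

/* Mark the queue dying before the final flush begins. */
static void toy_wq_destroy(struct toy_wq *wq)
{
    wq->flags |= TOY_WQ_DYING;
    wq->nr_queued = 0;  /* stands in for the final flush */
}
```

The warning fires in the context of the offending caller, which is what makes the late queueing traceable, whereas a check at the end of destruction only shows that something went wrong.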

Locking rule comment is updated such that the 'I' rule includes
modifying the field from destruction path.
Signed-off-by: Tejun Heo <tj@kernel.org>

24 Aug, 2010 2 commits

USB: gadget: fix composite kernel-doc warnings · d187abb9

Committed by Randy Dunlap on Aug 11, 2010

Warning(include/linux/usb/composite.h:284): No description found for parameter 'disconnect'
Warning(drivers/usb/gadget/composite.c:744): No description found for parameter 'c'
Warning(drivers/usb/gadget/composite.c:744): Excess function parameter 'cdev' description in 'usb_string_ids_n'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

kobject: Break the kobject namespace defs into their own header · 8488a38f

Committed by David Howells on Aug 11, 2010

Break the kobject namespace defs into their own header to avoid a header file
inclusion ordering problem between linux/sysfs.h and linux/kobject.h.

This fixes the build breakage on older versions of gcc.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

23 Aug, 2010 5 commits

xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield · 9c35e90c

Committed by Ian Campbell on Aug 23, 2010

by defining in terms of (1<<N).

XEN_UNPLUG_UNNECESSARY and XEN_UNPLUG_NEVER are only used within the
kernel and are not defined as a bit on the unplug IO port. Therefore
use a bit which is outside the potentially valid range of the 16 bit
IO port.
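
A sketch of the idea, using hypothetical TOY_UNPLUG_* names and illustrative bit positions rather than the kernel's actual definitions: hardware-visible flags occupy the low 16 bits, while kernel-internal flags sit above them so they can never reach the port.

```c
#include <assert.h>
#include <stdint.h>

/* Bits assumed to be meaningful on the 16-bit unplug IO port (illustrative). */
#define TOY_UNPLUG_ALL_IDE_DISKS (1u << 0)
#define TOY_UNPLUG_ALL_NICS      (1u << 1)
#define TOY_UNPLUG_AUX_IDE_DISKS (1u << 2)

/* Kernel-internal flags: placed above bit 15, outside the port's range. */
#define TOY_UNPLUG_UNNECESSARY   (1u << 16)
#define TOY_UNPLUG_NEVER         (1u << 17)

#define TOY_IO_PORT_MASK 0xffffu

/* Value that would be written to the port: internal flags are masked off. */
static uint16_t toy_port_value(uint32_t flags)
{
    return (uint16_t)(flags & TOY_IO_PORT_MASK);
}
```

Writing each flag as (1 << N) makes it obvious at a glance which bit it occupies and that no two flags collide.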
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>

xen: pvhvm: rename xen_emul_unplug=ignore to =unnecessary · 1dc7ce99

Committed by Ian Campbell on Aug 23, 2010

It is not immediately clear what this option causes to become
ignored. The actual meaning is that it is not necessary to unplug the
emulated devices to safely use the PV ones, even if the platform does
not support the unplug protocol. (Presumably the user will only add
this option if they have ensured that their domain configuration is
safe.)

I think xen_emul_unplug=unnecessary better captures this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>

xen: pvhvm: allow user to request no emulated device unplug · c93a4dfb

Committed by Ian Campbell on Aug 23, 2010

This allows the user to disable pvhvm and revert to emulated devices
in case of a system misconfiguration (e.g. initramfs with only
emulated drivers in it).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>

header: fix broken headers for user space · 09cd2b99

Committed by Changli Gao on Aug 22, 2010

__packed is only defined in kernel space, so we should use
__attribute__((packed)) for the code shared between kernel and user space.
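
As an illustration, a header meant to compile in both kernel and user space would spell the attribute out in full. The struct names below are hypothetical, and the example assumes GCC or Clang, which provide this attribute syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed layout: no padding is inserted between the members. */
struct toy_wire_header {
    uint8_t  type;
    uint32_t length;
} __attribute__((packed));

/* Same members without the attribute: the compiler may add padding. */
struct toy_plain_header {
    uint8_t  type;
    uint32_t length;
};
```

The __packed shorthand is a macro from kernel-internal headers, so a user-space program including the exported header would otherwise see an undefined identifier.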

Two __attribute() annotations are replaced with __attribute__() too.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fanotify: flush outstanding perm requests on group destroy · 2eebf582

Committed by Eric Paris on Aug 18, 2010

When an fanotify listener is closing it may cause a deadlock between the
listener and the original task doing an fs operation. If the original task
is waiting for a permissions response it will be holding the srcu lock. The
listener cannot clean up and exit until after that srcu lock is synchronized.
Thus deadlock. The fix introduced here is to stop accepting new permissions
events when a listener is shutting down and to grant permission for all
outstanding events. Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.
Reported-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

22 Aug, 2010 1 commit

workqueue: Add basic tracepoints to track workqueue execution · e36c886a

Committed by Arjan van de Ven on Aug 21, 2010

With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.

This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.

With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):

Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Aug, 2010 5 commits

mm: make the vma list be doubly linked · 297c5eee

Committed by Linus Torvalds on Aug 20, 2010

It's a really simple list, and several of the users want to go backwards
in it to find the previous vma.  So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
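
A minimal sketch of what the back pointer buys, using a hypothetical struct toy_vma rather than the kernel's vm_area_struct:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down VMA with both list pointers. */
struct toy_vma {
    unsigned long start, end;
    struct toy_vma *next, *prev;
};

/* Link vma after prev, or at the head of the list when prev is NULL. */
static void toy_vma_link(struct toy_vma **head, struct toy_vma *prev,
                         struct toy_vma *vma)
{
    vma->prev = prev;
    if (prev) {
        vma->next = prev->next;
        prev->next = vma;
    } else {
        vma->next = *head;
        *head = vma;
    }
    if (vma->next)
        vma->next->prev = vma;  /* keep the back pointer consistent */
}
```

With the prev pointer maintained on insertion, finding the previous mapping is a single pointer read instead of a separate lookup.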
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Input: uinput - add devname alias to allow module on-demand load · 8905aaaf

Committed by Kay Sievers on Aug 19, 2010

Recent modprobe and udev versions allow device nodes to be created
for modules which are not loaded. Only the first access will cause
the in-kernel module loader to pull in the module. Systems which
never access the device node will not needlessly load the module,
and no longer need init scripts or other facilities to unconditionally
load it.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

USB: drop tty argument from usb_serial_handle_sysrq_char() · 6ee9f4b4

Committed by Dmitry Torokhov on Aug 17, 2010

Since handle_sysrq() does not take tty as argument anymore we can
drop it from usb_serial_handle_sysrq_char() as well.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: sysrq - drop tty argument from handle_sysrq() · f335397d

Committed by Dmitry Torokhov on Aug 17, 2010

Sysrq operations do not accept tty argument anymore so no need to pass
it to us.

[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
 caused by sysrq using bool but not including linux/types.h]

[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboard
 driver]
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

kfifo: implement missing __kfifo_skip_r() · b35de43b

Committed by Andrea Righi on Aug 19, 2010

kfifo_skip() is currently broken because the internal helper function
is missing.  Add it.
Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Aug, 2010 1 commit

Input: sysrq - drop tty argument from sysrq ops handlers · 1495cc9d

Committed by Dmitry Torokhov on Aug 17, 2010

No one is using the tty argument, so let's get rid of it.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

19 Aug, 2010 4 commits

tracing: Fix timer tracing · ede1b429

Committed by Arjan van de Ven on Aug 18, 2010

PowerTOP would like to be able to trace timers.

Unfortunately, the current timer tracing is not very useful: the
actual timer function is not recorded in the trace at the start
of timer execution.

Although this is recorded for timer "start" time (when it gets
armed), this is not useful; most timers get started early, and a
tracer like PowerTOP will never see this event, but will only
see the actual running of the timer.

This patch just adds the function to the timer tracing; I've
verified with PowerTOP that now it can get useful information
about timers.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: xiaoguangrong@cn.fujitsu.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org> # .35.x, .34.x, .33.x
LKML-Reference: <4C6C5FA9.3000405@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

netfilter: fix userspace header warning · e243f5b6

Committed by Sam Ravnborg on Aug 15, 2010

"make headers_check" issued the following warning:

  CHECK   include/linux/netfilter (64 files)
usr/include/linux/netfilter/xt_ipvs.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>

Fix this by including linux/types.h, as suggested.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add Fast Ethernet driver for PXA168. · a49f37ee

Committed by Sachin Sanap on Aug 13, 2010

Signed-off-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Fix the declaration of sys_execve() in asm-generic/syscalls.h · d15ca320

Committed by David Howells on Aug 18, 2010

Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Aug, 2010 9 commits

ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter) · 56385a12

Committed by Jaroslav Kysela on Aug 18, 2010

With some hardware combinations, the PCM interrupts are acknowledged
before the period boundary from the emu10k1 chip. The midlevel PCM code
gets confused and the playback stream is interrupted.

It seems that the interrupt processing shift by 2 samples is enough
to fix this issue. This default value does not harm other,
non-affected hardware.

More information: Kernel bugzilla bug#16300

[A compile warning fixed by tiwai]
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

fs: scale files_lock · 6416ccb7

Committed by Nick Piggin on Aug 18, 2010

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from a different CPU's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files that are subsequently freed by N
CPUs degenerates to contending on a single lock, which is no worse than before.
When more than one CPU is allocating files, even if they are always freed by
different CPUs, there will be more parallelism than in the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single-threaded performance difference was within the noise of microbenchmarks.
That is not to say a penalty does not exist: the code is larger and more memory
accesses are required, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lglock: introduce special lglock and brlock spin locks · 2dc91abe

Committed by Nick Piggin on Aug 18, 2010

This patch introduces "local-global" locks (lglocks). These can be used to:

- Provide fast exclusive access to per-CPU data, with exclusive access to
  another CPU's data allowed but possibly subject to contention, and to provide
  very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
  very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use
case.
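
The local/global split can be sketched in userspace with one spinlock per CPU. This is an assumption-laden toy, not the kernel's lglock implementation: the toy_* names, the fixed CPU count and the explicit cpu argument are all invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CPUS 4  /* assumed CPU count for the sketch */

/* One spinlock per CPU; hypothetical stand-in for an lglock. */
struct toy_lglock {
    atomic_flag cpu_lock[TOY_NR_CPUS];
};

static void toy_lg_init(struct toy_lglock *lg)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        atomic_flag_clear(&lg->cpu_lock[cpu]);
}

/* Fast path: take only the lock belonging to one CPU. */
static void toy_lg_local_lock_cpu(struct toy_lglock *lg, int cpu)
{
    while (atomic_flag_test_and_set_explicit(&lg->cpu_lock[cpu],
                                             memory_order_acquire))
        ;  /* spin */
}

/* Non-blocking variant: returns 1 if the lock was acquired, 0 otherwise. */
static int toy_lg_local_trylock_cpu(struct toy_lglock *lg, int cpu)
{
    return !atomic_flag_test_and_set_explicit(&lg->cpu_lock[cpu],
                                              memory_order_acquire);
}

static void toy_lg_local_unlock_cpu(struct toy_lglock *lg, int cpu)
{
    atomic_flag_clear_explicit(&lg->cpu_lock[cpu], memory_order_release);
}

/* Slow path: take every per-CPU lock, always in the same order. */
static void toy_lg_global_lock(struct toy_lglock *lg)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        toy_lg_local_lock_cpu(lg, cpu);
}

static void toy_lg_global_unlock(struct toy_lglock *lg)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        toy_lg_local_unlock_cpu(lg, cpu);
}
```

Local lockers touch only their own lock, so they do not contend with each other, while a global locker pays for every lock in turn. That is the trade-off the message describes: fast local access and very slow global access.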

Thanks to Paul for local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

tty: fix fu_list abuse · d996b62a

Committed by Nick Piggin on Aug 18, 2010

tty code abuses fu_list, which causes a bug in remount,ro handling.

If a tty device node is opened on a filesystem and the last link to the inode
is then removed, the filesystem will be allowed to be remounted read-only. This
is because fs_may_remount_ro does not find the 0-link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".

Taking the idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes,
avoids meddling with the vfs, and avoids this bug.

The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: cleanup files_lock locking · ee2ffa0d

Committed by Nick Piggin on Aug 18, 2010

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: fs_struct rwlock to spinlock · 2a4419b5

Committed by Nick Piggin on Aug 18, 2010

struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.

Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

remove SWRITE* I/O types · 9cb569d6

Committed by Christoph Hellwig on Aug 11, 2010

These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers.  Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill BH_Ordered flag · 87e99511

Committed by Christoph Hellwig on Aug 11, 2010

Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

spi.h: missing kernel-doc notation, please fix · 5c79a5ae

Committed by Ernst Schwab on Aug 16, 2010

Added comments in kernel-doc notation for previously added struct fields.
Signed-off-by: Ernst Schwab <eschwab@online.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
