提交 · db5e53fbf0abf5cadc83be57032242e5e7c6c394 · OpenHarmony / kernel_linux

29 12月, 2008 1 次提交

由 Akinobu Mita 提交于 12月 23, 2008

Currently fault-injection capability for SLAB allocator is only
available to SLAB. This patch makes it available to SLUB, too.

[penberg@cs.helsinki.fi: unify slab and slub implementations]
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

773ff60e

25 12月, 2008 3 次提交

libcrc32c: Select CRYPTO in Kconfig · 93027354

由 Herbert Xu 提交于 11月 13, 2008

Selecting CRYPTO_CRC32C is not enough as CRYPTO which CRYPTO_CRC32C
depends on may be disabled. This patch adds the select on CRYPTO.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

93027354

libcrc32c: Fix "crc32c undefined" compilation error · 53b146ae

由 Adrian-Ken Rueegsegger 提交于 11月 11, 2008

The latest shash changes leave crc32c undefined:

[...]
Building modules, stage 2.
  MODPOST 1381 modules
  ERROR: "crc32c" [net/sctp/sctp.ko] undefined!
  ERROR: "crc32c" [net/ipv4/netfilter/nf_nat_proto_sctp.ko] undefined!

Adding EXPORT_SYMBOL(crc32c) to lib/libcrc32c.c fixes the compile error.
This patch has been compile-tested only.
Signed-off-by: NAdrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

53b146ae

libcrc32c: Move implementation to crypto crc32c · 69c35efc

由 Herbert Xu 提交于 11月 07, 2008

This patch swaps the role of libcrc32c and crc32c.  Previously
the implementation was in libcrc32c and crc32c was a wrapper.
Now the code is in crc32c and libcrc32c just calls the crypto
layer.

The reason for the change is to tap into the algorithm selection
capability of the crypto API so that optimised implementations
such as the one utilising Intel's CRC32C instruction can be
used where available.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

69c35efc

19 12月, 2008 1 次提交

"Tree RCU": scalable classic RCU implementation · 64db4cff

由 Paul E. McKenney 提交于 12月 18, 2008

This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.

I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.

A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:

http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.

o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.

o Apply Ingo's printk-standardization patch.

o Substitute local variables for repeated accesses to global
variables.

o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).

o Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)

o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.

o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.

o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.

o Add more data to tracing.

o Remove unused "rcu_barrier" field from rcu_data structure.

o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.

o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o Fix a number of checkpatch.pl complaints.

o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.

o Fix several bugs in !CONFIG_SMP builds.

o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.

o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).

o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.

o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.

o A number of optimizations and usability improvements:

o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.

o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.

o Rearrange struct fields to improve struct layout.

o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.

o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.

o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.

o Document tracing files and formats for both rcupreempt
and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)

o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.

o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.

o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)

o Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.

Patches will be provided as required.

o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.

Patches will be provided as required.

o The memory footprint of this version is several KB larger
than rcuclassic.

A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.

Credits:

o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)

o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.

o Thomas Gleixner for much-needed help with some timer issues
(see patches below).

o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

64db4cff

18 12月, 2008 8 次提交

driver core: add newlines to debugging enabled/disabled messages · aa6f3c64

由 Marcel Holtmann 提交于 11月 30, 2008

Both messages are missing the newline and thus dmesg output gets
scrambled.
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

aa6f3c64

driver core: fix using 'ret' variable in unregister_dynamic_debug_module · 1c93ca09

由 Johann Felix Soden 提交于 10月 30, 2008

The 'ret' variable is assigned, but not used in the return statement. Fix this.
Signed-off-by: NJohann Felix Soden <johfel@users.sourceforge.net>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

1c93ca09

swiotlb: consolidate swiotlb info message printing · 2e5b2b86

由 Ian Campbell 提交于 12月 16, 2008

Impact: clean up swiotlb printks

Remove duplicated swiotlb info printing, and make it more detailed.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2e5b2b86

swiotlb: support bouncing of HighMem pages · ef9b1893

由 Jeremy Fitzhardinge 提交于 12月 16, 2008

Impact: prepare the swiotlb code for HighMem struct pages

This requires us to treat DMA regions in terms of page+offset rather
than virtual addressing since a HighMem page may not have a mapping.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ef9b1893

swiotlb: factor out copy to/from device · 1b548f66

由 Jeremy Fitzhardinge 提交于 12月 16, 2008

Impact: generalize IO bounce memcpys
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1b548f66

swiotlb: add arch hook to force mapping · b81ea27b

由 Ian Campbell 提交于 12月 16, 2008

Impact: generalize the sw-IOTLB range checks

Some architectures require special rules to determine whether a range
needs mapping or not.  This adds a weak function for architectures to
override.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b81ea27b

swiotlb: allow architectures to override phys<->bus<->phys conversions · e08e1f7a

由 Ian Campbell 提交于 12月 16, 2008

Impact: generalize phys<->bus<->phys conversions in the swiotlb code

Architectures may need to override these conversions. Implement a
__weak hook point containing the default implementation.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e08e1f7a

swiotlb: add comment where we handle the overflow of a dma mask on 32 bit · a5ddde4a

由 Ian Campbell 提交于 12月 16, 2008

Impact: cleanup
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a5ddde4a

17 12月, 2008 3 次提交

swiotlb: move some definitions to header · 0016fdee

由 Ian Campbell 提交于 12月 16, 2008

Impact: cleanup
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0016fdee

swiotlb: allow architectures to override swiotlb pool allocation · 8c5df16b

由 Jeremy Fitzhardinge 提交于 12月 16, 2008

Impact: generalize swiotlb allocation code

Architectures may need to allocate memory specially for use with
the swiotlb.  Create the weak function swiotlb_alloc_boot() and
swiotlb_alloc() defaulting to the current behaviour.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8c5df16b

allow bug table entries to use relative pointers (and use it on x86-64) · b93a531e

由 Jan Beulich 提交于 12月 16, 2008

Impact: reduce bug table size

This allows reducing the bug table size by half. Perhaps there are
other 64-bit architectures that could also make use of this.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b93a531e

11 12月, 2008 4 次提交

lib/idr.c: Fix bug introduced by RCU fix · 711a49a0

由 Manfred Spraul 提交于 12月 10, 2008

The last patch to lib/idr.c caused a bug if idr_get_new_above() was
called on an empty idr.

Usually, nodes stay on the same layer.  New layers are added to the top
of the tree.

The exception is idr_get_new_above() on an empty tree: In this case, the
new root node is first added on layer 0, then moved upwards.  p->layer
was not updated.

As usual: You shall never rely on the source code comments, they will
only mislead you.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

711a49a0

revert "percpu_counter: new function percpu_counter_sum_and_set" · 02d21168

由 Andrew Morton 提交于 12月 09, 2008

Revert

    commit e8ced39d
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 11 19:27:31 2008 -0400

        percpu_counter: new function percpu_counter_sum_and_set

As described in

	revert "percpu counter: clean up percpu_counter_sum_and_set()"

the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs.  Revert that change.

This means that ext4 will be slow again.  But correct.
Reported-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>		[2.6.27.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02d21168

revert "percpu counter: clean up percpu_counter_sum_and_set()" · 71c5576f

由 Andrew Morton 提交于 12月 09, 2008

Revert

    commit 1f7c14c6
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Thu Oct 9 12:50:59 2008 -0400

        percpu counter: clean up percpu_counter_sum_and_set()

Before this patch we had the following:

percpu_counter_sum(): return the percpu_counter's value

percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.

After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.

Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong.  It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.

This patch reverts 1f7c14c6.  This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.

Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.

Note that this revert patch changes percpu_counter_sum() semantics.

Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.

After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.

If there is any code in the tree which was introduced after
e8ced39d was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.
Reported-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71c5576f

percpu_counter: fix CPU unplug race in percpu_counter_destroy() · fd3d664f

由 Eric Dumazet 提交于 12月 09, 2008

We should first delete the counter from percpu_counters list
before freeing memory, or a percpu_counter_hotcpu_callback()
could dereference a NULL pointer.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd3d664f

02 12月, 2008 1 次提交

lib/idr.c: fix rcu related race with idr_find · 6ff2d39b

由 Manfred Spraul 提交于 12月 01, 2008

2nd part of the fixes needed for
http://bugzilla.kernel.org/show_bug.cgi?id=11796.

When the idr tree is either grown or shrunk, then the update to the number
of layers and the top pointer were not atomic.  This race caused crashes.

The attached patch fixes that by replicating the layers counter in each
layer, thus idr_find doesn't need idp->layers anymore.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Clement Calmels <cboulte@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ff2d39b

26 11月, 2008 1 次提交

debugobjects: add boot parameter default value · 3ae70205

由 Ingo Molnar 提交于 11月 26, 2008

Impact: add .config driven boot parameter default value

Right now debugobjects can only be activated if the debug_objects
boot parameter is passed in via the boot command line.

Make this more convenient (and randomizable) by also providing
a .config method. Enable it by default. (DEBUG_OBJECTS itself
is default-off)
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3ae70205

25 11月, 2008 1 次提交

aoe: remove private mac address format function · 411c41ee

由 Harvey Harrison 提交于 11月 25, 2008

Add %pm to omit the colons when printing a mac address.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

411c41ee

20 11月, 2008 1 次提交

lib/scatterlist.c: fix kunmap() argument in sg_miter_stop() · f652c521

由 Arjan van de Ven 提交于 11月 19, 2008

kunmap() takes as argument the struct page that orginally got kmap()'d,
however the sg_miter_stop() function passed it the kernel virtual address
instead, resulting in weird stuff.

Somehow I ended up fixing this bug by accident while looking for a bug in
the same area.

Reported-by: kerneloops.org
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: <stable@kernel.org>		[2.6.27.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f652c521

17 11月, 2008 1 次提交

swiotlb: use coherent_dma_mask in alloc_coherent · 1e74f300

由 FUJITA Tomonori 提交于 11月 17, 2008

Impact: fix DMA buffer allocation coherency bug in certain configs

This patch fixes swiotlb to use dev->coherent_dma_mask in
swiotlb_alloc_coherent().

coherent_dma_mask is a subset of dma_mask (equal to it most of
the time), enumerating the address range that a given device
is able to DMA to/from in a cache-coherent way.

But currently, swiotlb uses dev->dma_mask in alloc_coherent()
implicitly via address_needs_mapping(), but alloc_coherent is really
supposed to use coherent_dma_mask.

This bug could break drivers that uses smaller coherent_dma_mask than
dma_mask (though the current code works for the majority that use the
same mask for coherent_dma_mask and dma_mask).
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: tony.luck@intel.com
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1e74f300

14 11月, 2008 2 次提交

CRED: Inaugurate COW credentials · d84f4f99

由 David Howells 提交于 11月 14, 2008

Inaugurate copy-on-write credentials management.  This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:

	struct cred *new = prepare_creds();
	int ret = blah(new);
	if (ret < 0) {
		abort_creds(new);
		return ret;
	}
	return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const.  The purpose of this is compile-time
discouragement of altering credentials through those pointers.  Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:

  (1) Its reference count may incremented and decremented.

  (2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     This now prepares and commits credentials in various places in the
     security code rather than altering the current creds directly.

 (2) Temporary credential overrides.

     do_coredump() and sys_faccessat() now prepare their own credentials and
     temporarily override the ones currently on the acting thread, whilst
     preventing interference from other threads by holding cred_replace_mutex
     on the thread being dumped.

     This will be replaced in a future patch by something that hands down the
     credentials directly to the functions being called, rather than altering
     the task's objective credentials.

 (3) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_capset_check(), ->capset_check()
     (*) security_capset_set(), ->capset_set()

     	 Removed in favour of security_capset().

     (*) security_capset(), ->capset()

     	 New.  This is passed a pointer to the new creds, a pointer to the old
     	 creds and the proposed capability sets.  It should fill in the new
     	 creds or return an error.  All pointers, barring the pointer to the
     	 new creds, are now const.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()

     	 Changed; now returns a value, which will cause the process to be
     	 killed if it's an error.

     (*) security_task_alloc(), ->task_alloc_security()

     	 Removed in favour of security_prepare_creds().

     (*) security_cred_free(), ->cred_free()

     	 New.  Free security data attached to cred->security.

     (*) security_prepare_creds(), ->cred_prepare()

     	 New. Duplicate any security data attached to cred->security.

     (*) security_commit_creds(), ->cred_commit()

     	 New. Apply any security effects for the upcoming installation of new
     	 security by commit_creds().

     (*) security_task_post_setuid(), ->task_post_setuid()

     	 Removed in favour of security_task_fix_setuid().

     (*) security_task_fix_setuid(), ->task_fix_setuid()

     	 Fix up the proposed new credentials for setuid().  This is used by
     	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
     	 setuid() changes.  Changes are made to the new credentials, rather
     	 than the task itself as in security_task_post_setuid().

     (*) security_task_reparent_to_init(), ->task_reparent_to_init()

     	 Removed.  Instead the task being reparented to init is referred
     	 directly to init's credentials.

	 NOTE!  This results in the loss of some state: SELinux's osid no
	 longer records the sid of the thread that forked it.

     (*) security_key_alloc(), ->key_alloc()
     (*) security_key_permission(), ->key_permission()

     	 Changed.  These now take cred pointers rather than task pointers to
     	 refer to the security context.

 (4) sys_capset().

     This has been simplified and uses less locking.  The LSM functions it
     calls have been merged.

 (5) reparent_to_kthreadd().

     This gives the current thread the same credentials as init by simply using
     commit_thread() to point that way.

 (6) __sigqueue_alloc() and switch_uid()

     __sigqueue_alloc() can't stop the target task from changing its creds
     beneath it, so this function gets a reference to the currently applicable
     user_struct which it then passes into the sigqueue struct it returns if
     successful.

     switch_uid() is now called from commit_creds(), and possibly should be
     folded into that.  commit_creds() should take care of protecting
     __sigqueue_alloc().

 (7) [sg]et[ug]id() and co and [sg]et_current_groups.

     The set functions now all use prepare_creds(), commit_creds() and
     abort_creds() to build and check a new set of credentials before applying
     it.

     security_task_set[ug]id() is called inside the prepared section.  This
     guarantees that nothing else will affect the creds until we've finished.

     The calling of set_dumpable() has been moved into commit_creds().

     Much of the functionality of set_user() has been moved into
     commit_creds().

     The get functions all simply access the data directly.

 (8) security_task_prctl() and cap_task_prctl().

     security_task_prctl() has been modified to return -ENOSYS if it doesn't
     want to handle a function, or otherwise return the return value directly
     rather than through an argument.

     Additionally, cap_task_prctl() now prepares a new set of credentials, even
     if it doesn't end up using it.

 (9) Keyrings.

     A number of changes have been made to the keyrings code:

     (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
     	 all been dropped and built in to the credentials functions directly.
     	 They may want separating out again later.

     (b) key_alloc() and search_process_keyrings() now take a cred pointer
     	 rather than a task pointer to specify the security context.

     (c) copy_creds() gives a new thread within the same thread group a new
     	 thread keyring if its parent had one, otherwise it discards the thread
     	 keyring.

     (d) The authorisation key now points directly to the credentials to extend
     	 the search into rather pointing to the task that carries them.

     (e) Installing thread, process or session keyrings causes a new set of
     	 credentials to be created, even though it's not strictly necessary for
     	 process or session keyrings (they're shared).

(10) Usermode helper.

     The usermode helper code now carries a cred struct pointer in its
     subprocess_info struct instead of a new session keyring pointer.  This set
     of credentials is derived from init_cred and installed on the new process
     after it has been cloned.

     call_usermodehelper_setup() allocates the new credentials and
     call_usermodehelper_freeinfo() discards them if they haven't been used.  A
     special cred function (prepare_usermodeinfo_creds()) is provided
     specifically for call_usermodehelper_setup() to call.

     call_usermodehelper_setkeys() adjusts the credentials to sport the
     supplied keyring as the new session keyring.

(11) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) selinux_setprocattr() no longer does its check for whether the
     	 current ptracer can access processes with the new SID inside the lock
     	 that covers getting the ptracer's SID.  Whilst this lock ensures that
     	 the check is done with the ptracer pinned, the result is only valid
     	 until the lock is released, so there's no point doing it inside the
     	 lock.

(12) is_single_threaded().

     This function has been extracted from selinux_setprocattr() and put into
     a file of its own in the lib/ directory as join_session_keyring() now
     wants to use it too.

     The code in SELinux just checked to see whether a task shared mm_structs
     with other tasks (CLONE_VM), but that isn't good enough.  We really want
     to know if they're part of the same thread group (CLONE_THREAD).

(13) nfsd.

     The NFS server daemon now has to use the COW credentials to set the
     credentials it is going to use.  It really needs to pass the credentials
     down to the functions it calls, but it can't do that until other patches
     in this series have been applied.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

d84f4f99

CRED: Rename is_single_threaded() to is_wq_single_threaded() · 6cc88bc4

由 David Howells 提交于 11月 14, 2008

Rename is_single_threaded() to is_wq_single_threaded() so that a new
is_single_threaded() can be created that refers to tasks rather than
waitqueues.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

6cc88bc4

10 11月, 2008 1 次提交

cpumask: introduce new API, without changing anything, v3 · 984f2f37

由 Rusty Russell 提交于 11月 08, 2008

Impact: cleanup

Clean up based on feedback from Andrew Morton and others:

 - change to inline functions instead of macros
 - add __init to bootmem method
 - add a missing debug check
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

984f2f37

07 11月, 2008 1 次提交

cpumask: new API, v2 · cd83e42c

由 Rusty Russell 提交于 11月 07, 2008

- add cpumask_of()
- add free_bootmem_cpumask_var()
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cd83e42c

06 11月, 2008 1 次提交

cpumask: introduce new API, without changing anything · 2d3854a3

由 Rusty Russell 提交于 11月 05, 2008

Impact: introduce new APIs

We want to deprecate cpumasks on the stack, as we are headed for
gynormous numbers of CPUs.  Eventually, we want to head towards an
undefined 'struct cpumask' so they can never be declared on stack.

1) New cpumask functions which take pointers instead of copies.
   (cpus_* -> cpumask_*)

2) Several new helpers to reduce requirements for temporary cpumasks
   (cpumask_first_and, cpumask_next_and, cpumask_any_and)

3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
   (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)

4) 'struct cpumask' for explicitness and to mark new-style code.

5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
   not NR_CPUS for time efficiency and for smaller dynamic allocations
   in future.

6) cpumask_copy() so we can allocate less than a full cpumask eventually
   (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
   definition eventually.

7) work_on_cpu() helper for doing task on a CPU, rather than saving old
   cpumask for current thread and manipulating it.

8) smp_call_function_many() which is smp_call_function_mask() except
   taking a cpumask pointer.

Note that this patch simply introduces the new functions and leaves
the obsolescent ones in place.  This is to simplify the transition
patches.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2d3854a3

04 11月, 2008 1 次提交

printk: ipv4 address digits printed in reverse order · b9ac9985

由 Harvey Harrison 提交于 11月 03, 2008

put_dec_trunc prints the digits in reverse order and is reversed
inside number(). Continue using put_dec_trunc, but reverse each quad
in ip4_addr_string.

[Noticed by Julius Volz]
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9ac9985

30 10月, 2008 3 次提交

Driver core: fix 'dynamic_debug' cmd line parameter · 11332830

由 Jason Baron 提交于 10月 27, 2008

In testing 2.6.28-rc1, I found that passing 'dynamic_printk' on the command
line didn't activate the debug code. The problem is that dynamic_printk_setup()
(which activates the debugging) is being called before dynamic_printk_init() is
called (which initializes infrastructure). Fix this by setting setting the
state to 'DYNAMIC_ENABLED_ALL' in dynamic_printk_setup(), which will also
cause all subsequent modules to have debugging automatically started, which is
probably the behavior we want.
Signed-off-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

11332830

printk: remove %p6 format specifier, fix up comments · 6b9a1066

由 Harvey Harrison 提交于 10月 29, 2008

Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b9a1066

printk: add %I4, %I6, %i4, %i6 format specifiers · 4aa99606

由 Harvey Harrison 提交于 10月 29, 2008

For use in printing IPv4, or IPv6 addresses in the usual way:

%i4 and %I4 are currently equivalent and print the address in
dot-separated decimal x.x.x.x

%I6 prints 16-bit network order hex with colon separators:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx

%i6 omits the colons.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4aa99606

29 10月, 2008 1 次提交

printk: add %p6 format specifier for IPv6 addresses · 689afa7d

由 Harvey Harrison 提交于 10月 28, 2008

Takes a pointer to a IPv6 address and formats it in the usual
colon-separated hex format:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx

Each 16 bit word is printed in network-endian byteorder.

%#p6 is also supported and will omit the colons.

%p6 is a replacement for NIP6_FMT and NIP6()
%#p6 is a replacement for NIP6_SEQFMT and NIP6()

Note that NIP6() took a struct in6_addr whereas this takes a pointer
to a struct in6_addr.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

689afa7d

28 10月, 2008 1 次提交

printk: add %pM format specifier for MAC addresses · dd45c9cf

由 Harvey Harrison 提交于 10月 27, 2008

Add format specifiers for printing out six colon-separated bytes:

MAC addresses (%pM):
xx:xx:xx:xx:xx:xx

%#pM is also supported and omits the colon separators.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd45c9cf

24 10月, 2008 2 次提交

swiotlb: remove panic for alloc_coherent failure · a2b89b59

由 FUJITA Tomonori 提交于 10月 23, 2008

swiotlb_alloc_coherent calls panic() when allocated swiotlb pages is
not fit for a device's dma mask. However, alloc_coherent failure is
not a disaster at all. AFAIK, none of other x86 and IA64 IOMMU
implementations don't crash in case of alloc_coherent failure.

There are some drivers that don't check alloc_coherent failure but not
many (about ten and I've already started to fix some of
them). alloc_coherent returns NULL in case of failure so it's likely
that these guilty drivers crash immediately. So swiotlb doesn't need
to call panic() just for them.
Reported-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a2b89b59

[SCSI] lib: string_get_size(): don't hang on zero; no decimals on exact · a8659597

由 H. Peter Anvin 提交于 10月 14, 2008

We would hang forever when passing a zero to string_get_size().
Furthermore, string_get_size() would produce decimals on a value small
enough to be exact.  Finally, a few formatting issues are inconsistent
with standard SI style guidelines.

- If the value is less than the divisor, skip the entire rounding
  step.  This prints out all small values including zero as integers,
  without decimals.
- Add a space between the value and the symbol for the unit,
  consistent with standard SI practice.
- Lower case k in kB since we are talking about powers of 10.
- Finally, change "int" to "unsigned int" in one place to shut up a
  gcc warning when compiling the code out-of-kernel for testing.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>

a8659597

21 10月, 2008 2 次提交

ftrace: rename FTRACE to FUNCTION_TRACER · 606576ce

由 Steven Rostedt 提交于 10月 06, 2008

Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.

This patch was generated mostly by script, and partially by hand.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

606576ce

Implement %pR to print struct resource content · 332d2e78

由 Linus Torvalds 提交于 10月 20, 2008

Add a %pR option to the kernel vsnprintf that prints the range of
addresses inside a struct resource passed by pointer.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

332d2e78

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多