Commits · e933a73f48e3b2d40cfa56d81e2646f194b5a66a · OpenHarmony / kernel_linux

14 Aug, 2009 · 12 commits

percpu: kill lpage first chunk allocator · e933a73f

Committed by Tejun Heo on Aug 14, 2009

With x86 converted to embedding allocator, lpage doesn't have any user
left.  Kill it along with cpa handling code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>

percpu: update embedding first chunk allocator to handle sparse units · c8826dd5

Committed by Tejun Heo on Aug 14, 2009

Now that percpu core can handle very sparse units, given that vmalloc
space is large enough, embedding first chunk allocator can use any
memory to build the first chunk.  This patch teaches
pcpu_embed_first_chunk() about distances between cpus and to use
alloc/free callbacks to allocate node specific areas for each group
and use them for the first chunk.

This brings the benefits of embedding allocator to NUMA configurations
- no extra TLB pressure with the flexibility of unified dynamic
allocator and no need to restructure arch code to build memory layout
suitable for percpu.  With units put into atom_size aligned groups
according to cpu distances, using large pages for dynamic chunks is
also easily possible, falling back to regular pages if a large
allocation fails.

Embedding allocator users are converted to specify NULL
cpu_distance_fn, so this patch doesn't cause any visible behavior
difference.  Following patches will convert them.
Signed-off-by: Tejun Heo <tj@kernel.org>

vmalloc: implement pcpu_get_vm_areas() · ca23e405

Committed by Tejun Heo on Aug 14, 2009

To directly use spread NUMA memories for percpu units, percpu
allocator will be updated to allow sparsely mapping units in a chunk.
As the distances between units can be very large, this makes
allocating single vmap area for each chunk undesirable.  This patch
implements pcpu_get_vm_areas() and pcpu_free_vm_areas() which
allocate and free sparse congruent vmap areas.

pcpu_get_vm_areas() takes @offsets and @sizes arrays which define
distances and sizes of vmap areas.  It scans down from the top of
vmalloc area looking for the top-most address which can accommodate all
the areas.  The top-down scan is to avoid interacting with regular
vmallocs which can push these congruent areas up little by little,
ending up wasting address space and page table.

To speed up top-down scan, the highest possible address hint is
maintained.  Although the scan is linear from the hint, given the
usual large holes between the memory addresses of NUMA nodes, the
scanning is highly likely to finish after finding the first hole for
the last unit which is scanned first.
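
The top-down search described above can be sketched in user space. This is a simplified model, not the kernel implementation: the function names, the half-open (start, end) busy ranges and the fixed `align` step are assumptions made for illustration, and the highest-address hint is left out.

```python
def overlaps(start, end, busy):
    """Return True if [start, end) intersects any busy [s, e) range."""
    return any(s < end and start < e for s, e in busy)


def find_congruent_base(busy, offsets, sizes, vm_start, vm_end, align):
    """Scan down from the top of the range for the highest base at which
    every area [base + off, base + off + size) is free.

    Simplified: steps one `align` at a time instead of jumping between
    existing areas, and keeps no hint between calls.
    """
    span = max(off + size for off, size in zip(offsets, sizes))
    base = (vm_end - span) // align * align   # highest aligned candidate
    while base >= vm_start:
        if not any(overlaps(base + off, base + off + size, busy)
                   for off, size in zip(offsets, sizes)):
            return base
        base -= align
    return None
```

With one busy range at the top of a 0..1000 window, two 50-byte areas placed 200 apart end up at base 650, the highest position where both fit below the busy range.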
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <npiggin@suse.de>

percpu: add pcpu_unit_offsets[] · fb435d52

Committed by Tejun Heo on Aug 14, 2009

Currently units are mapped sequentially into address space.  This
patch adds pcpu_unit_offsets[] which allows units to be mapped to
arbitrary offsets from the chunk base address.  This is necessary to
allow sparse embedding which might need to allocate address
ranges and memory areas which are aligned not to unit size but to
allocation atom size (page or large page size).  This also simplifies
things a bit by removing the need to calculate offset from unit
number.
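
A small model of the address calculation before and after this change may help. The function names and the example offsets are hypothetical, and indexing the table by cpu is an assumption of this sketch; only the idea of replacing `unit * unit_size` with a table lookup comes from the text above.

```python
def unit_addr_sequential(chunk_base, unit, unit_size, off):
    """Old scheme: unit N starts at chunk_base + N * unit_size."""
    return chunk_base + unit * unit_size + off


def unit_addr_table(chunk_base, unit_offsets, cpu, off):
    """New scheme: the start of each cpu's unit is looked up in a table,
    so units need not be evenly spaced or even close to each other."""
    return chunk_base + unit_offsets[cpu] + off
```

A table of `[0, 0x8000]` reproduces the sequential layout for a unit size of 0x8000, while `[0, 0x200000]` places the second unit far away without changing the lookup code.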

With this change, there's no need for the arch code to know
pcpu_unit_size.  Update pcpu_setup_first_chunk() and first chunk
allocators to return regular 0 or -errno return code instead of unit
size or -errno.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David S. Miller <davem@davemloft.net>

percpu: introduce pcpu_alloc_info and pcpu_group_info · fd1e8a1f

Committed by Tejun Heo on Aug 14, 2009

Till now, non-linear cpu->unit map was expressed using an integer
array which maps each cpu to a unit and used only by lpage allocator.
Although how many units have been placed in a single contiguous area
(group) is known while building unit_map, the information is lost when
the result is recorded into the unit_map array.  For lpage allocator,
as all allocations are done by lpages and whether two adjacent lpages
are in the same group or not is irrelevant, this didn't cause any
problem.  Non-linear cpu->unit mapping will be used for sparse
embedding and this grouping information is necessary for that.

This patch introduces pcpu_alloc_info which contains all the
information necessary for initializing percpu allocator.
pcpu_alloc_info contains array of pcpu_group_info which describes how
units are grouped and mapped to cpus.  pcpu_group_info also has
base_offset field to specify its offset from the chunk's base address.
pcpu_build_alloc_info() initializes this field as if all groups are
allocated back-to-back as is currently done but this will be used to
sparsely place groups.

pcpu_alloc_info is a rather complex data structure which contains a
flexible array which in turn points to nested cpu_map arrays.

* pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to
  help dealing with pcpu_alloc_info.

* pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info,
  generalized and renamed to pcpu_build_alloc_info().
  @cpu_distance_fn may be NULL indicating that all cpus are of
  LOCAL_DISTANCE.

* pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info,
  generalized and renamed to pcpu_dump_alloc_info().  It now also
  prints which group each alloc unit belongs to.

* pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the
  separate parameters.  All first chunk allocators are updated to use
  pcpu_build_alloc_info() to build alloc_info and call
  pcpu_setup_first_chunk() with it.  This has the side effect of
  packing units for sparse possible cpus, i.e. if cpus 0, 2 and 4 are
  possible, they'll be assigned units 0, 1 and 2 instead of 0, 2 and 4.

* x86 setup_pcpu_lpage() is updated to deal with alloc_info.

* sparc64 setup_per_cpu_areas() is updated to build alloc_info.
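
The packing side effect mentioned in the list above can be illustrated with a reduced model of the grouping step. `build_unit_map` is a hypothetical helper, `LOCAL_DISTANCE = 10` is an assumed constant, and the real pcpu_build_alloc_info() also deals with allocation-unit alignment, which this sketch omits.

```python
LOCAL_DISTANCE = 10  # assumed "same node" distance for this sketch


def build_unit_map(possible_cpus, cpu_distance_fn=None):
    """Group cpus by distance and hand out consecutive unit numbers.

    Returns (groups, unit_map).  A missing distance function (None,
    standing in for NULL) puts every cpu into a single group.
    """
    groups = []
    for cpu in sorted(possible_cpus):
        for group in groups:
            if cpu_distance_fn is None or all(
                    cpu_distance_fn(cpu, other) <= LOCAL_DISTANCE
                    for other in group):
                group.append(cpu)
                break
        else:
            groups.append([cpu])      # no close group found: start a new one

    unit_map = {}
    unit = 0
    for group in groups:
        for cpu in group:
            unit_map[cpu] = unit      # units are packed, not indexed by cpu id
            unit += 1
    return groups, unit_map
```

Calling `build_unit_map([0, 2, 4])` yields one group and the mapping `{0: 0, 2: 1, 4: 2}`, matching the example in the text.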

Although the changes made by this patch are pretty pervasive, it
doesn't cause any behavior difference other than packing of sparse
cpus.  It mostly changes how information is passed among
initialization functions and makes room for more flexibility.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>

percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward · 033e48fb

Committed by Tejun Heo on Aug 14, 2009

Unit map handling will be generalized and extended and used for
embedding sparse first chunk and other purposes.  Relocate two
unit_map related functions upward in preparation.  This patch just
moves the code without any actual change.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: add @align to pcpu_fc_alloc_fn_t · 3cbc8565

Committed by Tejun Heo on Aug 14, 2009

pcpu_fc_alloc_fn_t is about to see more interesting usage, add @align
parameter.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() · 1d9d3257

Committed by Tejun Heo on Aug 14, 2009

Now that all actual first chunk allocation and copying happen in the
first chunk allocators and helpers, there's no reason for
pcpu_setup_first_chunk() to try to determine @dyn_size automatically.
The only remaining user is the page first chunk allocator.  Make it determine
dyn_size like other allocators and make @dyn_size mandatory for
pcpu_setup_first_chunk().
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: drop @static_size from first chunk allocators · 9a773769

Committed by Tejun Heo on Aug 14, 2009

First chunk allocators assume percpu areas have been linked using one
of PERCPU_*() macros and depend on __per_cpu_load symbol defined by
those macros, so there isn't much point in passing in static area size
explicitly when it can be easily calculated from __per_cpu_start and
__per_cpu_end.  Drop @static_size from all percpu first chunk
allocators and helpers.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: generalize first chunk allocator selection · f58dc01b

Committed by Tejun Heo on Aug 14, 2009

Now that all first chunk allocators are in mm/percpu.c, it makes sense
to generalize the percpu_alloc kernel parameter. Define PCPU_FC_*
and set pcpu_chosen_fc using early_param() in mm/percpu.c. Arch code
can use the set value to determine which first chunk allocator to use.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: build first chunk allocators selectively · 08fc4580

Committed by Tejun Heo on Aug 14, 2009

There's no need to build unused first chunk allocators in.  Define
CONFIG_NEED_PER_CPU_*_FIRST_CHUNK and let archs enable them
selectively.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: rename 4k first chunk allocator to page · 00ae4064

Committed by Tejun Heo on Aug 14, 2009

Page size isn't always 4k depending on arch and configuration.  Rename
4k first chunk allocator to page.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Howells <dhowells@redhat.com>

13 Aug, 2009 · 2 commits

perf: Rework/fix the whole read vs group stuff · 3dab77fb

Committed by Peter Zijlstra on Aug 13, 2009

Replace PERF_SAMPLE_GROUP with PERF_SAMPLE_READ and introduce
PERF_FORMAT_GROUP to deal with group reads in a more generic
way.

This allows you to get group reads out of read() as well.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey J Ashford <cjashfor@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
LKML-Reference: <20090813103655.117411814@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

perf_counter: Provide hw_perf_counter_setup_online() APIs · 28402971

Committed by Ingo Molnar on Aug 13, 2009

Provide weak aliases for hw_perf_counter_setup_online(). This is
used by the BTS patches (for v2.6.32), but it interacts with
fixes so propagate this upstream. (it has no effect as of yet)

Also export perf_counter_output() to architecture code.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

12 Aug, 2009 · 1 commit

NFS: Fix an O_DIRECT Oops... · 1ae88b2e

Committed by Trond Myklebust on Aug 12, 2009

We can't call nfs_readdata_release()/nfs_writedata_release() without
first initialising and referencing args.context. Doing so inside
nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
causes an Oops.

We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
those cases.

Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
referencing the nfs_open_context for us. Since the readdata and writedata
structures carry a reference to that, we can simplify things by getting rid
of the extra nfs_open_context references, so that we can replace all
instances of nfs_readdata_release()/nfs_writedata_release().
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

10 Aug, 2009 · 2 commits

locking, sched: Give waitqueue spinlocks their own lockdep classes · 2fc39111

Committed by Peter Zijlstra on Aug 10, 2009

Give waitqueue spinlocks their own lockdep classes when they
are initialised from init_waitqueue_head().  This means that
struct wait_queue::func functions can operate on other waitqueues.

This is used by CacheFiles to catch the page from a backing fs
being unlocked and to wake up another thread to take a copy of
it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: linux-cachefs@redhat.com
Cc: torvalds@osdl.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

perf_counter: Correct PERF_SAMPLE_RAW output · a044560c

Committed by Peter Zijlstra on Aug 10, 2009

PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896447.17467.74.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

09 Aug, 2009 · 2 commits

perf_counter: Fix tracepoint sampling to be part of generic sampling · 3a43ce68

Committed by Frederic Weisbecker on Aug 08, 2009

Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:

- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record

We want the mechanism that transports raw tracepoint sample
events into the perf ring buffer to be generalized and
usable by any type of counter.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

perf_counter: Fix/complete ftrace event records sampling · f413cdb8

Committed by Frederic Weisbecker on Aug 07, 2009

This patch implements the kernel side support for ftrace event
record sampling.

A new counter sampling attribute is added:

   PERF_SAMPLE_TP_RECORD

which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.

Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:

 perf record -f -F 1 -a -e workqueue:workqueue_execution
 perf report -D

 0x21e18 [0x48]: event: 9
 .
 . ... raw event: size 72 bytes
 .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
 .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
 .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
 .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
 .  0040:  e0 b1 31 81 ff ff ff ff                          .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33

The raw ftrace binary record starts at offset 0020.

Translation:

 struct trace_entry {
	type		= 0x2b = 43;
	flags		= 1;
	preempt_count	= 2;
	pid		= 0xa = 10;
	tgid		= 0xa = 10;
 }

 thread_comm = "events/1"
 thread_pid  = 0xa = 10;
 func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
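
As a cross-check of the translation above, the 40 bytes starting at offset 0020 can be unpacked with Python's struct module. The field layout (little-endian, a trace_entry header of 12 bytes followed by a 16-byte comm, a 32-bit pid and a 64-bit function pointer) is inferred from the dump and the values listed above, so treat it as an assumption rather than the definitive record format.

```python
import struct

# Bytes 0x20..0x47 of the hex dump above (the raw ftrace record).
RAW = bytes.fromhex(
    "2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e "
    "74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 "
    "e0 b1 31 81 ff ff ff ff"
)

# Assumed layout: u16 type, u8 flags, u8 preempt_count, s32 pid, s32 tgid,
# char thread_comm[16], s32 thread_pid, u64 func.
LAYOUT = "<HBBii16siQ"


def decode_record(raw):
    """Unpack one raw record into a dict keyed by field name."""
    (type_, flags, preempt_count, pid, tgid,
     comm, thread_pid, func) = struct.unpack(LAYOUT, raw)
    return {
        "type": type_,
        "flags": flags,
        "preempt_count": preempt_count,
        "pid": pid,
        "tgid": tgid,
        "thread_comm": comm.split(b"\0", 1)[0].decode("ascii"),
        "thread_pid": thread_pid,
        "func": func,
    }


record = decode_record(RAW)
```

Under this layout `record` reproduces the values listed in the translation: type 43, flags 1, preempt_count 2, pid and tgid 10, thread_comm "events/1", thread_pid 10 and func 0xffffffff8131b1e0.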

What will come next?

 - Userspace support ('perf trace'), 'flight data recorder' mode
   for perf trace, etc.

 - The unconditional copy from the profiling callback has a cost
   even when nobody wants such sampling, and needs to be fixed in
   the future. For that we need instant access to the perf
   counter attribute. This is a matter of adding a flag to
   struct ftrace_event.

 - Take care of event recursion! Don't ever try to record
   a lock event for example; it seems some locking is used in
   the profiling fast path, which leads to tracing recursion.
   That will be fixed using a raw spinlock or recursion
   protection.

 - [...]

 - Profit! :-)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

08 Aug, 2009 · 4 commits

bzip2/lzma/gzip: fix comments describing decompressor API · daeb6b6f

Committed by Phillip Lougher on Aug 06, 2009

Fix and improve comments in decompress/generic.h that describe the
decompressor API.  Also remove an unused definition, and rename INBUF_LEN
in lib/decompress_inflate.c to conform to bzip2/lzma naming.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware · 4bfc4495

Committed by KAMEZAWA Hiroyuki on Aug 06, 2009

Initially, init_task's mems_allowed is set as follows:
 init_task->mems_allowed == node_state[N_POSSIBLE]

And cpuset's top_cpuset mask is initialized as follows:
 top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY]

Before 2.6.29, the policy's mems_allowed is initialized as follows:

  1. update task->mems_allowed from its cpuset->mems_allowed.
  2. policy->mems_allowed = nodes_and(task->mems_allowed, user's mask)

The task's mems_allowed is updated with reference to top_cpuset's, and
cpuset's mems_allowed is always aware of N_HIGH_MEMORY.

In 2.6.30, after commit 58568d2a
("cpuset,mm: update tasks' mems_allowed in time"), the policy's mems_allowed
is initialized as follows:

  1. policy->mems_allowed = nodes_and(task->mems_allowed, user's mask)

Here, if the task is in top_cpuset, task->mems_allowed is not updated from
init's.  Assume the user executes a command such as
# numactl --interleave=all ...

  policy->mems_allowed = nodes_and(N_POSSIBLE, ALL_SET_MASK)

Then the policy's mems_allowed can include a possible node which has no pgdat.

MPOL's INTERLEAVE just scans the nodemask of task->mems_allowed and accesses

  NODE_DATA(nid)->zonelist

directly, even if NODE_DATA(nid) == NULL.

So what we need is to make policy->mems_allowed aware of
N_HIGH_MEMORY.  This patch does that.  But to do so, an extra nodemask
would have to live on the stack.  Because I know cpumask has a new
interface, CPUMASK_ALLOC(), I added the equivalent for nodemask.

This patch follows the old behavior.  But I feel this fix itself is just a
Band-Aid.  To do a fundamental fix, we have to take care of memory
hotplug, and that takes time.  (task->mems_allowed should be N_HIGH_MEMORY, I
think.)

mpol_set_nodemask() should be aware of N_HIGH_MEMORY and the policy's nodemask
should include only online nodes.

In the old behavior, this was guaranteed by frequent references to cpuset's
code.  Now most of them are removed and mempolicy has to check it by
itself.

To do the check, a few nodemask_t will be used for calculating the nodemask.
But the size of nodemask_t can be big and it's not good to allocate them on
the stack.

Now, cpumask_t has CPUMASK_ALLOC/FREE, an easy way to get a scratch area.
NODEMASK_ALLOC/FREE should be there too.
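
The effect of the missing N_HIGH_MEMORY filter can be shown with plain sets standing in for nodemasks. The node numbers and the `policy_nodes` helper are hypothetical; the sketch only mirrors the nodes_and() steps described above.

```python
def policy_nodes(user_mask, task_mems_allowed, nodes_with_memory):
    """Nodes an interleave policy may use: requested by the user, allowed
    for the task, and actually backed by memory (N_HIGH_MEMORY above)."""
    return user_mask & task_mems_allowed & nodes_with_memory


n_possible = {0, 1, 2, 3}   # hypothetical machine with 4 possible nodes
n_high_memory = {0, 1}      # only nodes 0 and 1 have memory

# 2.6.30 behaviour as described: the user asks for "all" and the task
# still carries init's N_POSSIBLE mask, so memoryless nodes slip through.
unfiltered = n_possible & n_possible

# With the extra intersection, only nodes with memory remain.
filtered = policy_nodes(n_possible, n_possible, n_high_memory)
```

Here `unfiltered` still contains nodes 2 and 3, which have no memory in this example, while `filtered` is reduced to nodes 0 and 1.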

[akpm@linux-foundation.org: cleanups & tweaks]
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: add __destroy_inode · 2e00c97e

Committed by Christoph Hellwig on Aug 07, 2009

When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.

This patch provides the __destroy_inode helper needed to fix this;
the actual fix will be in the next patch. As XFS was the only reason
destroy_inode was exported, we shift the export to the new __destroy_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>

vfs: fix inode_init_always calling convention · 54e34621

Committed by Christoph Hellwig on Aug 07, 2009

Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails. That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful as ->destroy_inode might delete the inode from
a radix tree that it was never added to. This in turn might end up
deleting the inode for the same inum that has been instantiated by
another process and cause lots of subtle problems.

Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>

06 Aug, 2009 · 2 commits

Input: matrix_keypad - make matrix keymap size dynamic · d82f1c35

Committed by Eric Miao on Aug 05, 2009

Remove assumptions about the shift and size of rows/columns from the
matrix_keypad driver.
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

ftrace: Fix perf-tracepoint OOPS · af6af30c

Committed by Peter Zijlstra on Aug 05, 2009

Not all tracepoints are created equal; specifically, the ftrace
tracepoints are created with TRACE_EVENT_FORMAT(), which does
not generate the needed bits to tie them into perf counters.

For those events, don't create the 'id' file and fail
->profile_enable when their ID is specified through other
means.
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1249497664.5890.4.camel@laptop>
[ v2: fix build error in the !CONFIG_EVENT_PROFILE case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

05 Aug, 2009 · 2 commits

KVM: fix ack not being delivered when msi present · 5116d8f6

Committed by Michael S. Tsirkin on Jul 26, 2009

kvm_notify_acked_irq does not check irq type, so that it sometimes
interprets msi vector as irq.  As a result, ack notifiers are not
called, which typically hangs the guest.  The fix is to track and
check irq type.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

tty-ldisc: make refcount be atomic_t 'users' count · 18eac1cc

Committed by Linus Torvalds on Aug 03, 2009

This is pure preparation for changing the ldisc reference counting to be
a true refcount that defines the lifetime of the ldisc.  But this is a
purely syntactic change for now to make the next steps easier.

This patch should make no semantic changes at all. But I wanted to make
the ldisc refcount be an atomic (I will be touching it without locks
soon enough), and I wanted to rename it so that there isn't quite as
much confusion between 'ldo->refcount' (ldisc operations refcount) and
'ld->refcount' (ldisc refcount itself) in the same file.

So it's now an atomic 'ld->users' count. It still starts at zero,
despite having a reference from 'tty->ldisc', but that will change once
we turn it into a _real_ refcount.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

03 Aug, 2009 · 2 commits

mtd: fix the conversion from dev to mtd_info · 6afc4fdb

Committed by Saeed Bishara on Jul 28, 2009

This patch fixes a bug in the conversion from dev to mtd_info by using
the drvdata of the dev.  The previous code used
container_of(dev, struct mtd_info, dev), which doesn't work for the mtdXro
devices as they are created without being contained inside an mtd_info
structure.
Signed-off-by: Saeed Bishara <saeed@marvell.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

mtd: let include/linux/mtd/partitions.h stand on its own · 7699ad35

Committed by Nicolas Pitre on Jun 15, 2009

When declaring static MTD partitions in board specific code, only
including <linux/mtd/partitions.h> should suffice without
gcc nagging us with:

In file included from arch/arm/mach-kirkwood/sheevaplug-setup.c:14:
include/linux/mtd/partitions.h:50: warning: 'struct mtd_info' declared inside parameter list
include/linux/mtd/partitions.h:50: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/mtd/partitions.h:51: warning: 'struct mtd_info' declared inside parameter list
include/linux/mtd/partitions.h:61: warning: 'struct mtd_info' declared inside parameter list
include/linux/mtd/partitions.h:67: warning: 'struct mtd_info' declared inside parameter list
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

02 Aug, 2009 · 1 commit

perf_counter: Full task tracing · 9f498cc5

Committed by Peter Zijlstra on Jul 23, 2009

In order to be able to distinguish between no samples due to
inactivity and no samples due to the task having ended, Arjan asked for
PERF_EVENT_EXIT events. This is useful to the boot delay
instrumentation (bootchart) app.

This patch changes the PERF_EVENT_FORK to be emitted on every
clone, and adds PERF_EVENT_EXIT to be emitted on task exit,
after the task's counters have been closed.

This task tracing is controlled through: attr.comm || attr.mmap
and through the new attr.task field.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ cleaned up perf_counter.h a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

01 Aug, 2009 · 1 commit

block: Add a wrapper for setting minimum request size without a queue · 7c958e32

Committed by Martin K. Petersen on Jul 31, 2009

Introduce blk_limits_io_min() and make blk_queue_io_min() call it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

31 Jul, 2009 · 4 commits

clocksource: Save mult_orig in clocksource_disable() · c7121843

Committed by Magnus Damm on Jul 28, 2009

To fix the common case where ->enable() does not set up
mult, make sure mult_orig is saved in mult on disable.

Also add comments to explain why we do this.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: johnstul@us.ibm.com
Cc: lethal@linux-sh.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090618152432.10136.9932.sendpatchset@rx1.opensource.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

cb710: use SG_MITER_TO_SG/SG_MITER_FROM_SG · 4b2a108c

Committed by Sebastian Andrzej Siewior on Jun 22, 2009

The code already uses flush_kernel_dcache_page(). This patch updates the
driver to the recent sg API changes which require that either SG_MITER_TO_SG
or SG_MITER_FROM_SG is set. SG_MITER_TO_SG calls flush_kernel_dcache_page()
in sg_miter_stop().
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

lib/scatterlist: add a flags to signalize mapping direction · 6de7e356

Committed by Sebastian Andrzej Siewior on Jun 18, 2009

sg_miter_start() is currently unaware of the direction of the copy
process (to or from the scatter list). It is important to know the
direction because the page has to be flushed in case the data written
is seen on a different mapping in user land on cache incoherent
architectures.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

io context: fix ref counting · cbb4f264

Committed by Li Zefan on Jul 31, 2009

Commit d9c7d394
("block: prevent possible io_context->refcount overflow") mistakenly
changed atomic_inc(&ioc->nr_tasks) to atomic_long_inc(&ioc->refcount).
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

30 Jul, 2009 · 5 commits

lguest and virtio: cleanup struct definitions to Linux style. · 1842f23c

Committed by Rusty Russell on Jul 30, 2009

I've been doing this for years, and akpm picked me up on it about 12
months ago.  lguest partly serves as example code, so let's do it Right.

Also, remove two unused fields in struct vblk_info in the example launcher.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>

lguest: fix comment style · 2e04ef76

Committed by Rusty Russell on Jul 30, 2009

I don't really notice it (except to begrudge the extra vertical
space), but Ingo does.  And he pointed out that since one excuse for
lguest is as a teaching tool, it should set a good example.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>

uio: mark uio.h functions __KERNEL__ only · 812ed032

Committed by Jiri Slaby on Jul 29, 2009

To avoid userspace build failures such as:

.../linux/uio.h:37: error: expected `=', `,', `;', `asm' or `__attribute__' before `iov_length'
.../linux/uio.h:47: error: expected declaration specifiers or `...' before `size_t'

move uio functions inside a __KERNEL__ block.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib: flexible array implementation · 534acc05

Committed by Dave Hansen on Jul 29, 2009

Once a structure goes over PAGE_SIZE*2, we see occasional allocation
failures.  Some people have chosen to switch over to things like vmalloc()
that will let them keep array-like access to such large structures.
But, vmalloc() has plenty of downsides.

Here's an alternative.  I think it's what Andrew was suggesting here:

	http://lkml.org/lkml/2009/7/2/518

I call it a flexible array.  It does all of its work in PAGE_SIZE pieces, so
it never does an order>0 allocation.  The base level has
PAGE_SIZE-2*sizeof(int) bytes of storage for pointers to the second level.
So, with a 32-bit arch, you get about 4MB (4186112 bytes) of total
storage when the objects pack nicely into a page.  It is half that on
64-bit because the pointers are twice the size.  There's a table detailing
this in the code.
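
The size arithmetic can be checked with a few lines of Python. The sketch assumes 4096-byte pages, 4-byte ints, and pointer sizes of 4 and 8 bytes; `flex_array_capacity` is a hypothetical helper, not part of the kernel API.

```python
def flex_array_capacity(page_size, pointer_size, int_size=4):
    """Total second-level storage reachable from one base page."""
    base_bytes = page_size - 2 * int_size   # room left for part pointers
    nr_parts = base_bytes // pointer_size   # number of second-level pages
    return nr_parts * page_size


cap_32bit = flex_array_capacity(4096, 4)
cap_64bit = flex_array_capacity(4096, 8)
```

Under these assumptions the formula gives 1022 parts and 4186112 bytes with 4-byte pointers, and 511 parts and 2093056 bytes with 8-byte pointers, which is exactly half.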

There are kerneldocs for the functions, but here's an
overview:

flex_array_alloc() - dynamically allocate a base structure
flex_array_free() - free the array and all of the
		    second-level pages
flex_array_free_parts() - free the second-level pages, but
			  not the base (for static bases)
flex_array_put() - copy into the array at the given index
flex_array_get() - copy out of the array at the given index
flex_array_prealloc() - preallocate the second-level pages
			between the given indexes to
			guarantee no allocs will occur at
			put() time.
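
To make the relationship between these functions concrete, here is a toy user-space model of a two-level array. It is only a sketch: the class name, the `per_part` parameter and the use of Python lists are assumptions, and the real functions copy fixed-size elements into PAGE_SIZE parts rather than storing object references.

```python
class FlexArrayModel:
    """Toy two-level array: a base list of parts, each part allocated
    lazily and holding a fixed number of elements."""

    def __init__(self, total, per_part):
        self.total = total
        self.per_part = per_part
        self.parts = [None] * -(-total // per_part)   # ceiling division

    def prealloc(self, start, end):
        """Allocate every part covering indexes start..end inclusive."""
        for p in range(start // self.per_part, end // self.per_part + 1):
            if self.parts[p] is None:
                self.parts[p] = [None] * self.per_part

    def put(self, index, value):
        if not 0 <= index < self.total:
            raise IndexError(index)
        self.prealloc(index, index)       # allocate the part on demand
        self.parts[index // self.per_part][index % self.per_part] = value

    def get(self, index):
        if not 0 <= index < self.total:
            raise IndexError(index)
        part = self.parts[index // self.per_part]
        return None if part is None else part[index % self.per_part]

    def free_parts(self):
        """Drop all second-level parts but keep the base."""
        self.parts = [None] * len(self.parts)
```

In this model a `put()` into an untouched region allocates exactly one part, and calling `prealloc()` beforehand moves that allocation out of the `put()` path, which is the guarantee the overview describes.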

We could also potentially just pass the "element_size" into each of the
API functions instead of storing it internally.  That would get us one
more base pointer on 32-bit.

I've been testing this by running it in userspace.  The header and patch
that I've been using are here, as well as the little script I'm using to
generate the size table which goes in the kerneldocs.

	http://sr71.net/~dave/linux/flexarray/

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pps.h needs <linux/types.h> · f5a55efa

Committed by Dave Jones on Jul 29, 2009

Found with make headers_check

/usr/include/linux/pps.h:52: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

OpenHarmony / kernel_linux · last synced more than 4 years ago
