提交 · f30d704fe1244c44a984d3d1f47bc648bcc6c9f7 · openeuler / Kernel

08 8月, 2013 1 次提交

aio: table lookup: verify ctx pointer · f30d704f

由 Benjamin LaHaise 提交于 8月 07, 2013

Another shortcoming of the table lookup patch was revealed where the pointer
was not being tested before being dereferenced.  Verify this to avoid the
NULL pointer dereference.
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

f30d704f

06 8月, 2013 2 次提交

staging/lustre: kiocb->ki_left is removed · 0bdd5ca5

由 Peng Tao 提交于 8月 06, 2013

We also missed ki_nbytes...

Cc: Kent Overstreet <koverstreet@google.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NPeng Tao <tao.peng@emc.com>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

0bdd5ca5

aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" · da90382c

由 Benjamin LaHaise 提交于 8月 05, 2013

In the patch "aio: convert the ioctx list to table lookup v3", incorrect
handling in the ioctx_alloc() error path was introduced that lead to an
ioctx being added via ioctx_add_table() while freed when the ioctx_alloc()
call returned -EAGAIN due to hitting the aio_max_nr limit. Fix this by
only calling ioctx_add_table() as the last step in ioctx_alloc().

Also, several unnecessary rcu_dereference() calls were added that lead to
RCU warnings where the system was already protected by a spin lock for
accessing mm->ioctx_table.
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

da90382c

31 7月, 2013 3 次提交

aio: be defensive to ensure request batching is non-zero instead of BUG_ON() · 6878ea72

由 Benjamin LaHaise 提交于 7月 31, 2013

In the event that an overflow/underflow occurs while calculating req_batch,
clamp the minimum at 1 request instead of doing a BUG_ON().
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

6878ea72

aio: convert the ioctx list to table lookup v3 · db446a08

由 Benjamin LaHaise 提交于 7月 30, 2013

On Wed, Jun 12, 2013 at 11:14:40AM -0700, Kent Overstreet wrote:
> On Mon, Apr 15, 2013 at 02:40:55PM +0300, Octavian Purdila wrote:
> > When using a large number of threads performing AIO operations the
> > IOCTX list may get a significant number of entries which will cause
> > significant overhead. For example, when running this fio script:
> >
> > rw=randrw; size=256k ;directory=/mnt/fio; ioengine=libaio; iodepth=1
> > blocksize=1024; numjobs=512; thread; loops=100
> >
> > on an EXT2 filesystem mounted on top of a ramdisk we can observe up to
> > 30% CPU time spent by lookup_ioctx:
> >
> >  32.51%  [guest.kernel]  [g] lookup_ioctx
> >   9.19%  [guest.kernel]  [g] __lock_acquire.isra.28
> >   4.40%  [guest.kernel]  [g] lock_release
> >   4.19%  [guest.kernel]  [g] sched_clock_local
> >   3.86%  [guest.kernel]  [g] local_clock
> >   3.68%  [guest.kernel]  [g] native_sched_clock
> >   3.08%  [guest.kernel]  [g] sched_clock_cpu
> >   2.64%  [guest.kernel]  [g] lock_release_holdtime.part.11
> >   2.60%  [guest.kernel]  [g] memcpy
> >   2.33%  [guest.kernel]  [g] lock_acquired
> >   2.25%  [guest.kernel]  [g] lock_acquire
> >   1.84%  [guest.kernel]  [g] do_io_submit
> >
> > This patchs converts the ioctx list to a radix tree. For a performance
> > comparison the above FIO script was run on a 2 sockets 8 core
> > machine. This are the results (average and %rsd of 10 runs) for the
> > original list based implementation and for the radix tree based
> > implementation:
> >
> > cores         1         2         4         8         16        32
> > list       109376 ms  69119 ms  35682 ms  22671 ms  19724 ms  16408 ms
> > %rsd         0.69%      1.15%     1.17%     1.21%     1.71%     1.43%
> > radix       73651 ms  41748 ms  23028 ms  16766 ms  15232 ms   13787 ms
> > %rsd         1.19%      0.98%     0.69%     1.13%    0.72%      0.75%
> > % of radix
> > relative    66.12%     65.59%    66.63%    72.31%   77.26%     83.66%
> > to list
> >
> > To consider the impact of the patch on the typical case of having
> > only one ctx per process the following FIO script was run:
> >
> > rw=randrw; size=100m ;directory=/mnt/fio; ioengine=libaio; iodepth=1
> > blocksize=1024; numjobs=1; thread; loops=100
> >
> > on the same system and the results are the following:
> >
> > list        58892 ms
> > %rsd         0.91%
> > radix       59404 ms
> > %rsd         0.81%
> > % of radix
> > relative    100.87%
> > to list
>
> So, I was just doing some benchmarking/profiling to get ready to send
> out the aio patches I've got for 3.11 - and it looks like your patch is
> causing a ~1.5% throughput regression in my testing :/
... <snip>

I've got an alternate approach for fixing this wart in lookup_ioctx()...
Instead of using an rbtree, just use the reserved id in the ring buffer
header to index an array pointing the ioctx.  It's not finished yet, and
it needs to be tidied up, but is most of the way there.

		-ben
--
"Thought is the essence of where you are now."
--
kmo> And, a rework of Ben's code, but this was entirely his idea
kmo>		-Kent

bcrl> And fix the code to use the right mm_struct in kill_ioctx(), actually
free memory.
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

db446a08

aio: double aio_max_nr in calculations · 4cd81c3d

由 Benjamin LaHaise 提交于 7月 30, 2013

With the changes to use percpu counters for aio event ring size calculation,
existing increases to aio_max_nr are now insufficient to allow for the
allocation of enough events. Double the value used for aio_max_nr to account
for the doubling introduced by the percpu slack.
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

4cd81c3d

30 7月, 2013 9 次提交

aio: Kill ki_dtor · d29c445b

由 Kent Overstreet 提交于 3月 18, 2013

sock_aio_dtor() is dead code - and stuff that does need to do cleanup
can simply do it before calling aio_complete().
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

d29c445b

aio: Kill ki_users · 57282d8f

由 Kent Overstreet 提交于 5月 13, 2013

The kiocb refcount is only needed for cancellation - to ensure a kiocb
isn't freed while a ki_cancel callback is running. But if we restrict
ki_cancel callbacks to not block (which they currently don't), we can
simply drop the refcount.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

57282d8f

aio: Kill unneeded kiocb members · 8bc92afc

由 Kent Overstreet 提交于 2月 25, 2013

The old aio retry infrastucture needed to save the various arguments to
to aio operations. But with the retry infrastructure gone, we can trim
struct kiocb quite a bit.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

8bc92afc

aio: Kill aio_rw_vect_retry() · 73a7075e

由 Kent Overstreet 提交于 5月 09, 2013

This code doesn't serve any purpose anymore, since the aio retry
infrastructure has been removed.

This change should be safe because aio_read/write are also used for
synchronous IO, and called from do_sync_read()/do_sync_write() - and
there's no looping done in the sync case (the read and write syscalls).
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

73a7075e

aio: Don't use ctx->tail unnecessarily · 5ffac122

由 Kent Overstreet 提交于 5月 09, 2013

aio_complete() (arguably) needs to keep its own trusted copy of the tail
pointer, but io_getevents() doesn't have to use it - it's already using
the head pointer from the ring buffer.

So convert it to use the tail from the ring buffer so it touches fewer
cachelines and doesn't contend with the cacheline aio_complete() needs.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

5ffac122

aio: io_cancel() no longer returns the io_event · bec68faa

由 Kent Overstreet 提交于 5月 13, 2013

Originally, io_event() was documented to return the io_event if
cancellation succeeded - the io_event wouldn't be delivered via the ring
buffer like it normally would.

But this isn't what the implementation was actually doing; the only
driver implementing cancellation, the usb gadget code, never returned an
io_event in its cancel function. And aio_complete() was recently changed
to no longer suppress event delivery if the kiocb had been cancelled.

This gets rid of the unused io_event argument to kiocb_cancel() and
kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
kiocb->ki_cancel() returned success.

Also tweak the refcounting in kiocb_cancel() to make more sense.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

bec68faa

aio: percpu ioctx refcount · 723be6e3

由 Kent Overstreet 提交于 5月 28, 2013

This just converts the ioctx refcount to the new generic dynamic percpu
refcount code.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

723be6e3

aio: percpu reqs_available · e1bdd5f2

由 Kent Overstreet 提交于 4月 26, 2013

See the previous patch ("aio: reqs_active -> reqs_available") for why we
want to do this - this basically implements a per cpu allocator for
reqs_available that doesn't actually allocate anything.

Note that we need to increase the size of the ringbuffer we allocate,
since a single thread won't necessarily be able to use all the
reqs_available slots - some (up to about half) might be on other per cpu
lists, unavailable for the current thread.

We size the ringbuffer based on the nr_events userspace passed to
io_setup(), so this is a slight behaviour change - but nr_events wasn't
being used as a hard limit before, it was being rounded up to the next
page before so this doesn't change the actual semantics.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

e1bdd5f2

aio: reqs_active -> reqs_available · 34e83fc6

由 Kent Overstreet 提交于 4月 26, 2013

The number of outstanding kiocbs is one of the few shared things left that
has to be touched for every kiocb - it'd be nice to make it percpu.

We can make it per cpu by treating it like an allocation problem: we have
a maximum number of kiocbs that can be outstanding (i.e.  slots) - then we
just allocate and free slots, and we know how to write per cpu allocators.

So as prep work for that, we convert reqs_active to reqs_available.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

34e83fc6

17 7月, 2013 1 次提交

aio: fix build when migration is disabled · 0c45355f

由 Benjamin LaHaise 提交于 7月 17, 2013

When "fs/aio: Add support to aio ring pages migration" was applied, it
broke the build when CONFIG_MIGRATION was disabled.  Wrap the migration
code with a test for CONFIG_MIGRATION to fix this and save a few bytes
when migration is disabled.
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

0c45355f

16 7月, 2013 2 次提交

fs/aio: Add support to aio ring pages migration · 36bc08cc

由 Gu Zheng 提交于 7月 16, 2013

As the aio job will pin the ring pages, that will lead to mem migrated
failed. In order to fix this problem we use an anon inode to manage the aio ring
pages, and setup the migratepage callback in the anon inode's address space, so
that when mem migrating the aio ring pages will be moved to other mem node safely.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

36bc08cc

fs/anon_inode: Introduce a new lib function anon_inode_getfile_private() · 55708698

由 Gu Zheng 提交于 7月 16, 2013

Introduce a new lib function anon_inode_getfile_private(), it creates a new file
instance by hooking it up to an anonymous inode, and a dentry that describe the
"class" of the file, similar to anon_inode_getfile(), but each file holds a
single inode. Furthermore, anyone who wants to create a private anon file will
benefit from this change.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>

55708698

15 7月, 2013 5 次提交

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 47188d39

由 Linus Torvalds 提交于 7月 14, 2013

Pull ext4 bugfixes from Ted Ts'o:
 "Various regression and bug fixes for ext4"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't allow ext4_free_blocks() to fail due to ENOMEM
  ext4: fix spelling errors and a comment in extent_status tree
  ext4: rate limit printk in buffer_io_error()
  ext4: don't show usrquota/grpquota twice in /proc/mounts
  ext4: fix warning in ext4_evict_inode()
  ext4: fix ext4_get_group_number()
  ext4: silence warning in ext4_writepages()

47188d39

L

Linux 3.11-rc1 · ad81f054
由 Linus Torvalds 提交于 7月 14, 2013

ad81f054

Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux · 54be8200

由 Linus Torvalds 提交于 7月 14, 2013

Pull slab update from Pekka Enberg:
 "Highlights:

  - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

  - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

  - Fix for excessive slab freelist draining (Wanpeng Li)

  - SLUB and SLOB cleanups and fixes (various people)"

I ended up editing the branch, and this avoids two commits at the end
that were immediately reverted, and I instead just applied the oneliner
fix in between myself.

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
  slub: Check for page NULL before doing the node_match check
  mm/slab: Give s_next and s_stop slab-specific names
  slob: Check for NULL pointer before calling ctor()
  slub: Make cpu partial slab support configurable
  slab: add kmalloc() to kernel API documentation
  slab: fix init_lock_keys
  slob: use DIV_ROUND_UP where possible
  slub: do not put a slab to cpu partial list when cpu_partial is 0
  mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
  mm/slub: Drop unnecessary nr_partials
  mm/slab: Fix /proc/slabinfo unwriteable for slab
  mm/slab: Sharing s_next and s_stop between slab and slub
  mm/slab: Fix drain freelist excessively
  slob: Rework #ifdeffery in slab.h
  mm, slab: moved kmem_cache_alloc_node comment to correct place

54be8200

slub: Check for page NULL before doing the node_match check · c25f195e

由 Steven Rostedt 提交于 1月 17, 2013

In the -rt kernel (mrg), we hit the following dump:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
PGD a2d39067 PUD b1641067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU 3
Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
Stack:
 ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
 0000000001200011 0000000001200011 0000000000000000 0000000000000000
 00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
Call Trace:
 [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
 [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
 [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
 [<ffffffff8104369a>] do_fork+0x5a/0x360
 [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
 [<ffffffff8100b068>] sys_clone+0x28/0x30
 [<ffffffff81527423>] stub_clone+0x13/0x20
 [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
 RSP <ffff8800a9b17d70>
CR2: 0000000000000000
---[ end trace 0000000000000002 ]---

Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
disable migration. But the SLUB code is relatively lockless, and the
spin_locks there are raw_spin_locks (not converted to mutexes), thus I
believe this bug can happen in mainline without -rt features. The -rt
patch is just good at triggering mainline bugs ;-)

Anyway, looking at where this crashed, it seems that the page variable
can be NULL when passed to the node_match() function (which does not
check if it is NULL). When this happens we get the above panic.

As page is only used in slab_alloc() to check if the node matches, if
it's NULL I'm assuming that we can say it doesn't and call the
__slab_alloc() code. Is this a correct assumption?
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c25f195e

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 41d9884c

由 Linus Torvalds 提交于 7月 14, 2013

Pull more vfs stuff from Al Viro:
 "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
  making simple_lookup() usable for filesystems with non-NULL s_d_op,
  which allows us to get rid of quite a bit of ugliness"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sunrpc: now we can just set ->s_d_op
  cgroup: we can use simple_lookup() now
  efivarfs: we can use simple_lookup() now
  make simple_lookup() usable for filesystems that set ->s_d_op
  configfs: don't open-code d_alloc_name()
  __rpc_lookup_create_exclusive: pass string instead of qstr
  rpc_create_*_dir: don't bother with qstr
  llist: llist_add() can use llist_add_batch()
  llist: fix/simplify llist_add() and llist_add_batch()
  fput: turn "list_head delayed_fput_list" into llist_head
  fs/file_table.c:fput(): add comment
  Safer ABI for O_TMPFILE

41d9884c

14 7月, 2013 17 次提交

A
sunrpc: now we can just set ->s_d_op · dae3794f
由 Al Viro 提交于 7月 14, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
dae3794f
A
cgroup: we can use simple_lookup() now · 786e1448
由 Al Viro 提交于 7月 14, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
786e1448
A
efivarfs: we can use simple_lookup() now · 6e8cd2cb
由 Al Viro 提交于 7月 14, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6e8cd2cb
A
make simple_lookup() usable for filesystems that set ->s_d_op · 74931da7
由 Al Viro 提交于 7月 14, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
74931da7
A
configfs: don't open-code d_alloc_name() · ec193cf5
由 Al Viro 提交于 7月 14, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
ec193cf5

__rpc_lookup_create_exclusive: pass string instead of qstr · d3db90b0

由 Al Viro 提交于 7月 14, 2013

... and use d_hash_and_lookup() instead of open-coding it, for fsck sake...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d3db90b0

A
rpc_create_*_dir: don't bother with qstr · a95e691f
由 Al Viro 提交于 7月 14, 2013
```
just pass the name
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a95e691f

Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86 · 63345b47

由 Linus Torvalds 提交于 7月 13, 2013

Pull x86 platform driver updates from Matthew Garrett:
 "Nothing overly exciting here - a couple of new drivers that don't do a
  great deal, along with some miscellaneous fixes and a couple of small
  feature enablement patches"

* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
  x86 platform drivers: fix gpio leak
  toshiba_acpi: Add dependency on SERIO_I8042
  asus-nb-wmi: set wapf=4 for ASUSTeK COMPUTER INC. 1015E/U
  Add trivial driver to disable Intel Smart Connect
  Add support driver for Intel Rapid Start Technology
  hp-wmi: add supports for POST code error
  asus-wmi: control wlan-led only if wapf == 4
  drivers/platform/x86/intel_ips: Convert to module_pci_driver
  asus-nb-wmi: ignore ALS notification key code
  asus-wmi: append newline to messages
  x86: asus-laptop: fix invalid point access
  x86: msi-laptop: fix memleak
  amilo-rfkill: Add dependency on SERIO_I8042
  dell-laptop: fix error return code in dell_init()
  hp-wmi: Enable hotkeys on some systems

63345b47

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 18fb38e2

由 Linus Torvalds 提交于 7月 13, 2013

Pull second round of input updates from Dmitry Torokhov:
 "An update to Elantech driver to support hardware v7, fix to the new
  cyttsp4 driver to use proper addressing, ads7846 device tree support
  and nspire-keypad got a small cleanup."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: nspire-keypad - replace magic offset with define
  Input: elantech - fix for newer hardware versions (v7)
  Input: cyttsp4 - use 16bit address for I2C/SPI communication
  Input: ads7846 - add device tree bindings
  Input: ads7846 - make sure we do not change platform data

18fb38e2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · be9c6d91

由 Linus Torvalds 提交于 7月 13, 2013

Pull networking fixes from David Miller:
 "Just a bunch of small fixes and tidy ups:

   1) Finish the "busy_poll" renames, from Eliezer Tamir.

   2) Fix RCU stalls in IFB driver, from Ding Tianhong.

   3) Linearize buffers properly in tun/macvtap zerocopy code.

   4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

   5) Spinlock used before init in alx driver, from Maarten Lankhorst.

   6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
      Kravkov.

   7) Dummy and ifb driver load failure paths can oops, fixes from Tan
      Xiaojun and Ding Tianhong.

   8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

   9) Account all TCP retransmits in SNMP stats properly, from Yuchung
      Cheng.

  10) atl1e and via-rhine do not handle DMA mapping failures properly,
      from Neil Horman.

  11) Various equal-cost multipath route fixes in ipv6 from Hannes
      Frederic Sowa"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
  ipv6: only static routes qualify for equal cost multipathing
  via-rhine: fix dma mapping errors
  atl1e: fix dma mapping warnings
  tcp: account all retransmit failures
  usb/net/r815x: fix cast to restricted __le32
  usb/net/r8152: fix integer overflow in expression
  net: access page->private by using page_private
  net: strict_strtoul is obsolete, use kstrtoul instead
  drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
  net/usb: add relative mii functions for r815x
  net/tipc: use %*phC to dump small buffers in hex form
  qlcnic: Adding Maintainers.
  gre: Fix MTU sizing check for gretap tunnels
  pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
  pkt_sched: sch_qfq: improve efficiency of make_eligible
  gso: Update tunnel segmentation to support Tx checksum offload
  inet: fix spacing in assignment
  ifb: fix oops when loading the ifb failed
  ...

be9c6d91

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 03ce3ca4

由 Linus Torvalds 提交于 7月 13, 2013

Pull final round of SCSI updates from James Bottomley:
 "This is the remaining set of SCSI patches for the merge window.  It's
  mostly driver updates (scsi_debug, qla2xxx, storvsc, mp3sas).  There
  are also several bug fixes in fcoe, libfc, and megaraid_sas.  We also
  have a couple of core changes to try to make device destruction more
  deterministic"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (46 commits)
  [SCSI] scsi constants: command, sense key + additional sense strings
  fcoe: Reduce number of sparse warnings
  fcoe: Stop fc_rport_priv structure leak
  libfcoe: Fix meaningless log statement
  libfc: Differentiate echange timer cancellation debug statements
  libfc: Remove extra space in fc_exch_timer_cancel definition
  fcoe: fix the link error status block sparse warnings
  fcoe: Fix smatch warning in fcoe_fdmi_info function
  libfc: Reject PLOGI from nodes with incompatible role
  [SCSI] enable destruction of blocked devices which fail LUN scanning
  [SCSI] Fix race between starved list and device removal
  [SCSI] megaraid_sas: fix a bug for 64 bit arches
  [SCSI] scsi_debug: reduce duplication between prot_verify_read and prot_verify_write
  [SCSI] scsi_debug: simplify offset calculation for dif_storep
  [SCSI] scsi_debug: invalidate protection info for unmapped region
  [SCSI] scsi_debug: fix NULL pointer dereference with parameters dif=0 dix=1
  [SCSI] scsi_debug: fix incorrectly nested kmap_atomic()
  [SCSI] scsi_debug: fix invalid address passed to kunmap_atomic()
  [SCSI] mpt3sas: Bump driver version to v02.100.00.00
  [SCSI] mpt3sas: when async scanning is enabled then while scanning, devices are removed but their transport layer entries are not removed
  ...

03ce3ca4

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f8acc450

由 Linus Torvalds 提交于 7月 13, 2013

Pull scheduler fix from Thomas Gleixner:
 "Fix a potential deadlock versus hrtimers"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix HRTICK

f8acc450

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 505608d2

由 Linus Torvalds 提交于 7月 13, 2013

Pull irq updates from Thomas Gleixner:
 - core fix for missing round up in the generic irq chip implementation
 - new irq chip for MOXA SoCs
 - a few fixes and cleanups in the irqchip drivers

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Add support for MOXA ART SoCs
  genirq: generic chip: Use DIV_ROUND_UP to calculate numchips
  irqchip: nvic: Fix wrong num_ct argument for irq_alloc_domain_generic_chips()
  irqchip: sun4i: Staticize sun4i_irq_ack()
  irqchip: vt8500: Staticize local symbols

505608d2

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0da27366

由 Linus Torvalds 提交于 7月 13, 2013

Pull timer updates from Thomas Gleixner:
 - watchdog fixes for full dynticks
 - improved debug output for full dynticks
 - remove an obsolete full dynticks check
 - two ARM SoC clocksource drivers for sharing across SoCs
 - tick broadcast fix for CPU hotplug

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: broadcast: Check broadcast mode on CPU hotplug
  clocksource: arm_global_timer: Add ARM global timer support
  clocksource: Add Marvell Orion SoC timer
  nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs
  watchdog: Boot-disable by default on full dynticks
  watchdog: Rename confusing state variable
  watchdog: Register / unregister watchdog kthreads on sysctl control
  nohz: Warn if the machine can not perform nohz_full

0da27366

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 560ae371

由 Linus Torvalds 提交于 7月 13, 2013

Pull perf fixes from Thomas Gleixner:
 - fix for do_div() abuse on x86
 - locking fix in perf core
 - a pile of (build) fixes and cleanups in perf tools

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86: Fix incorrect use of do_div() in NMI warning
  perf: Fix perf_lock_task_context() vs RCU
  perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario
  perf: Clone child context from parent context pmu
  perf script: Fix broken include in Context.xs
  perf tools: Fix -ldw/-lelf link test when static linking
  perf tools: Revert regression in configuration of Python support
  perf tools: Fix perf version generation
  perf stat: Fix per-socket output bug for uncore events
  perf symbols: Fix vdso list searching
  perf evsel: Fix missing increment in sample parsing
  perf tools: Update symbol_conf.nr_events when processing attribute events
  perf tools: Fix new_term() missing free on error path
  perf tools: Fix parse_events_terms() segfault on error path
  perf evsel: Fix count parameter to read call in event_format__new
  perf tools: fix a typo of a Power7 event name
  perf tools: Fix -x/--exclude-other option for report command
  perf evlist: Enhance perf_evlist__start_workload()
  perf record: Remove -f/--force option
  perf record: Remove -A/--append option
  ...

560ae371

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4fa109b1

由 Linus Torvalds 提交于 7月 13, 2013

Pull core locking updates from Thomas Gleixner:
 "Header cleanup as requested by Linus"

(This is the "don't include support for ww_mutex in a header file that
everybody wants, when almost nobody wants the ww part" change)

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Move ww_mutex definitions to ww_mutex.h

4fa109b1

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 9663398a

由 Linus Torvalds 提交于 7月 13, 2013

Pull ARM SoC fixes from Olof Johansson:
 "This is our first set of fixes from arm-soc for 3.11.
   - A handful of build and warning fixes from Arnd
   - A collection of OMAP fixes
   - defconfig updates to make the default configs more useful for real
     use (and testing) out of the box on hardware

  And a couple of other small fixes.  Some of these have been recently
  applied but it's normally how we deal with fixes, with less bake time
  in -next needed"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (32 commits)
  arm: multi_v7_defconfig: Tweaks for omap and sunxi
  arm: multi_v7_defconfig: add i.MX options and NFS root
  ARM: omap2: add select of TI_PRIV_EDMA
  ARM: exynos: select PM_GENERIC_DOMAINS only when used
  ARM: ixp4xx: avoid circular header dependency
  ARM: OMAP: omap_common_late_init may be unused
  ARM: sti: move DEBUG_STI_UART into alphabetical order
  ARM: OMAP: build mach-omap code only if needed
  ARM: zynq: use DT_MACHINE_START
  ARM: omap5: omap5 has SCU and TWD
  ARM: OMAP2+: omap2plus_defconfig: Enable appended DTB support
  ARM: OMAP2+: Enable TI_EDMA in omap2plus_defconfig
  ARM: OMAP2+: omap2plus_defconfig: enable DRA752 thermal support by default
  ARM: OMAP2+: omap2plus_defconfig: enable TI bandgap driver
  ARM: OMAP2+: devices: remove duplicated include from devices.c
  ARM: OMAP3: igep0020: Set DSS pins in correct mux mode.
  ARM: OMAP2+: N900: enable N900-specific drivers even if device tree is enabled
  ARM: OMAP2+: Cocci spatch "ptr_ret.spatch"
  ARM: OMAP2+: Remove obsolete Makefile line
  ARM: OMAP5: Enable Cortex A15 errata 798181
  ...

9663398a

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功