提交 · 88b8dac0a14c511ff41486b83a8c3d688936eec0 · openanolis / cloud-kernel

24 7月, 2012 2 次提交

sched: Improve scalability via 'CPU buddies', which withstand random perturbations · 970e1789

由 Mike Galbraith 提交于 6月 12, 2012

Traversing an entire package is not only expensive, it also leads to tasks
bouncing all over a partially idle and possible quite large package.  Fix
that up by assigning a 'buddy' CPU to try to motivate.  Each buddy may try
to motivate that one other CPU, if it's busy, tough, it may then try its
SMT sibling, but that's all this optimization is allowed to cost.

Sibling cache buddies are cross-wired to prevent bouncing.

4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better:

 clients     1       2       4        8       16       32       64      128
 ..........................................................................
 pre        30      41     118      645     3769     6214    12233    14312
 post      299     603    1211     2418     4697     6847    11606    14557

A nice increase in performance.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

970e1789

cpusets, hotplug: Restructure functions that are invoked during hotplug · 7ddf96b0

由 Srivatsa S. Bhat 提交于 5月 24, 2012

Separate out the cpuset related handling for CPU/Memory online/offline.
This also helps us exploit the most obvious and basic level of optimization
that any notification mechanism (CPU/Mem online/offline) has to offer us:
"We *know* why we have been invoked. So stop pretending that we are lost,
and do only the necessary amount of processing!".

And while at it, rename scan_for_empty_cpusets() to
scan_cpusets_upon_hotplug(), which is more appropriate considering how
it is restructured.
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7ddf96b0

22 7月, 2012 2 次提交

Remove SYSTEM_SUSPEND_DISK system state · bff9d186

由 Rafael J. Wysocki 提交于 7月 21, 2012

The SYSTEM_SUSPEND_DISK system state is never used, so drop it.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bff9d186

printk: Implement some unlocked kmsg_dump functions · 533827c9

由 Anton Vorontsov 提交于 7月 20, 2012

If used from KDB, the locked variants are prone to deadlocks (suppose we
got to the debugger w/ the logbuf lock held).

So, we have to implement a few routines that grab no logbuf lock.

Yet we don't need these functions in modules, so we don't export them.
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

533827c9

19 7月, 2012 1 次提交

Make wait_for_device_probe() also do scsi_complete_async_scans() · eea03c20

由 Linus Torvalds 提交于 7月 18, 2012

Commit a7a20d10 ("sd: limit the scope of the async probe domain")
make the SCSI device probing run device discovery in it's own async
domain.

However, as a result, the partition detection was no longer synchronized
by async_synchronize_full() (which, despite the name, only synchronizes
the global async space, not all of them).  Which in turn meant that
"wait_for_device_probe()" would not wait for the SCSI partitions to be
parsed.

And "wait_for_device_probe()" was what the boot time init code relied on
for mounting the root filesystem.

Now, most people never noticed this, because not only is it
timing-dependent, but modern distributions all use initrd.  So the root
filesystem isn't actually on a disk at all.  And then before they
actually mount the final disk filesystem, they will have loaded the
scsi-wait-scan module, which not only does the expected
wait_for_device_probe(), but also does scsi_complete_async_scans().

[ Side note: scsi_complete_async_scans() had also been partially broken,
  but that was fixed in commit 43a8d39d ("fix async probe
  regression"), so that same commit a7a20d10 had actually broken
  setups even if you used scsi-wait-scan explicitly ]

Solve this problem by just moving the scsi_complete_async_scans() call
into wait_for_device_probe().  Everybody who wants to wait for device
probing to finish really wants the SCSI probing to complete, so there's
no reason not to do this.

So now "wait_for_device_probe()" really does what the name implies, and
properly waits for device probing to finish.  This also removes the now
unnecessary extra calls to scsi_complete_async_scans().
Reported-and-tested-by: NArtem S. Tashkinov <t.artem@mailcity.com>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eea03c20

18 7月, 2012 2 次提交

libceph: fix messenger retry · 5bdca4e0

由 Sage Weil 提交于 7月 10, 2012

In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time. This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: NSage Weil <sage@inktank.com>

5bdca4e0

PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND · d9914cf6

由 Michael Kerrisk 提交于 7月 17, 2012

As discussed in
http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990,
the capability introduced in 4d7e30d9
to govern EPOLLWAKEUP seems misnamed: this capability is about governing
the ability to suspend the system, not using a particular API flag
(EPOLLWAKEUP). We should make the name of the capability more general
to encourage reuse in related cases. (Whether or not this capability
should also be used to govern the use of /sys/power/wake_lock is a
question that needs to be separately resolved.)

This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure
that the old capability name doesn't make it out into the wild, could you
please apply and push up the tree to ensure that it is incorporated
for the 3.5 release.
Signed-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

d9914cf6

12 7月, 2012 5 次提交

memblock: free allocated memblock_reserved_regions later · 29f67386

由 Yinghai Lu 提交于 7月 11, 2012

memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() would double reserved.regions too, so we could free the
old range for reserved.regions.

Also tj said there is another bug which could be related to this.

| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing.  We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.

in that case, when DEBUG_PAGEALLOC, will get panic:

     memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
  BUG: unable to handle kernel paging request at ffff88102febd948
  IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
  PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU 0
  Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
  RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155

See the discussion at https://lkml.org/lkml/2012/6/13/469

So try to allocate with PAGE_SIZE alignment and free it later.
Reported-by: NSasha Levin <levinsasha928@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29f67386

mm: sparse: fix usemap allocation above node descriptor section · 99ab7b19

由 Yinghai Lu 提交于 7月 11, 2012

After commit f5bf18fa ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section.  This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.

The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.

Fix this by trying the allocation with the limit at the section end,
then retry without if that fails.  This keeps the fix from f5bf18fa
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>	[3.3.x, 3.4.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

99ab7b19

memory hotplug: fix invalid memory access caused by stale kswapd pointer · d8adde17

由 Jiang Liu 提交于 7月 11, 2012

kswapd_stop() is called to destroy the kswapd work thread when all memory
of a NUMA node has been offlined.  But kswapd_stop() only terminates the
work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
pointer will prevent kswapd_run() from creating a new work thread when
adding memory to the memory-less NUMA node again.  Eventually the stale
pointer may cause invalid memory access.

An example stack dump as below. It's reproduced with 2.6.32, but latest
kernel has the same issue.

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
  PGD 0
  Oops: 0000 [#1] SMP
  last sysfs file: /sys/devices/system/memory/memory391/state
  CPU 11
  Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
  RIP: 0010:exit_creds+0x12/0x78
  RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
  RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
  RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
  R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
  R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
  FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
  Stack:
   ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
   ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
   0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
  Call Trace:
    __put_task_struct+0x5d/0x97
    kthread_stop+0x50/0x58
    offline_pages+0x324/0x3da
    memory_block_change_state+0x179/0x1db
    store_mem_state+0x9e/0xbb
    sysfs_write_file+0xd0/0x107
    vfs_write+0xad/0x169
    sys_write+0x45/0x6e
    system_call_fastpath+0x16/0x1b
  Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
  RIP  exit_creds+0x12/0x78
   RSP <ffff8806044f1d78>
  CR2: 0000000000000000

[akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid Rientjes <rientjes@google.com>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8adde17

timekeeping: Provide hrtimer update function · f6c06abf

由 Thomas Gleixner 提交于 7月 10, 2012

To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPrarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f6c06abf

hrtimer: Provide clock_was_set_delayed() · f55a6faa

由 John Stultz 提交于 7月 10, 2012

clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
  rid of all this ifdeffery ASAP ]
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Reported-by: NJan Engelhardt <jengelh@inai.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPrarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f55a6faa

11 7月, 2012 1 次提交

PCI: EHCI: fix crash during suspend on ASUS computers · dbf0e4c7

由 Alan Stern 提交于 7月 09, 2012

Quite a few ASUS computers experience a nasty problem, related to the
EHCI controllers, when going into system suspend.  It was observed
that the problem didn't occur if the controllers were not put into the
D3 power state before starting the suspend, and commit
151b6128 (USB: EHCI: fix crash during
suspend on ASUS computers) was created to do this.

It turned out this approach messed up other computers that didn't have
the problem -- it prevented USB wakeup from working.  Consequently
commit c2fb8a3f (USB: add
NO_D3_DURING_SLEEP flag and revert 151b6128) was merged; it
reverted the earlier commit and added a whitelist of known good board
names.

Now we know the actual cause of the problem.  Thanks to AceLan Kao for
tracking it down.

According to him, an engineer at ASUS explained that some of their
BIOSes contain a bug that was added in an attempt to work around a
problem in early versions of Windows.  When the computer goes into S3
suspend, the BIOS tries to verify that the EHCI controllers were first
quiesced by the OS.  Nothing's wrong with this, but the BIOS does it
by checking that the PCI COMMAND registers contain 0 without checking
the controllers' power state.  If the register isn't 0, the BIOS
assumes the controller needs to be quiesced and tries to do so.  This
involves making various MMIO accesses to the controller, which don't
work very well if the controller is already in D3.  The end result is
a system hang or memory corruption.

Since the value in the PCI COMMAND register doesn't matter once the
controller has been suspended, and since the value will be restored
anyway when the controller is resumed, we can work around the BIOS bug
simply by setting the register to 0 during system suspend.  This patch
(as1590) does so and also reverts the second commit mentioned above,
which is now unnecessary.

In theory we could do this for every PCI device.  However to avoid
introducing new problems, the patch restricts itself to EHCI host
controllers.

Finally the affected systems can suspend with USB wakeup working
properly.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728Based-on-patch-by: NAceLan Kao <acelan.kao@canonical.com>
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Tested-by: NDâniel Fraga <fragabr@gmail.com>
Tested-by: NJavier Marcet <jmarcet@gmail.com>
Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
Tested-by: NOleksij Rempel <bug-track@fisher-privat.net>
Tested-by: NPavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: stable <stable@vger.kernel.org>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

dbf0e4c7

07 7月, 2012 1 次提交

security: Minor improvements to no_new_privs documentation · c540521b

由 Andy Lutomirski 提交于 7月 05, 2012

The documentation didn't actually mention how to enable no_new_privs.
This also adds a note about possible interactions between
no_new_privs and LSMs (i.e. why teaching systemd to set no_new_privs
is not necessarily a good idea), and it references the new docs
from include/linux/prctl.h.
Suggested-by: NRob Landley <rob@landley.net>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

c540521b

06 7月, 2012 1 次提交

sched/nohz: Rewrite and fix load-avg computation -- again · 5167e8d5

由 Peter Zijlstra 提交于 6月 22, 2012

Thanks to Charles Wang for spotting the defects in the current code:

 - If we go idle during the sample window -- after sampling, we get a
   negative bias because we can negate our own sample.

 - If we wake up during the sample window we get a positive bias
   because we push the sample to a known active period.

So rewrite the entire nohz load-avg muck once again, now adding
copious documentation to the code.
Reported-and-tested-by: NDoug Smythies <dsmythies@telus.net>
Reported-and-tested-by: NCharles Wang <muming.wq@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins
[ minor edits ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

5167e8d5

05 7月, 2012 2 次提交

gpio: fix bits conflict for gpio flags · f567fde2

由 Laxman Dewangan 提交于 6月 20, 2012

The bit 2 and 3 in GPIO flag are allocated for the
flag OPEN_DRAIN/OPEN_SOURCE. These bits are reused
for the flag EXPORT/EXPORT_CHANGEABLE and so creating
conflict.
Fix this conflict by assigning bit 4 and 5 for the
flag EXPORT/EXPORT_CHANGEABLE.
Signed-off-by: NLaxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>

f567fde2

aio: make kiocb->private NUll in init_sync_kiocb() · 2dfd0603

由 Junxiao Bi 提交于 6月 27, 2012

Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
commit a11f7e63 ocfs2: serialize unaligned aio, the unaligned
io flag is involved in it to serialize the unaligned aio. As
*private is not initialized in init_sync_kiocb() of do_sync_write(),
this unaligned io flag may be unexpectly set in an aligned dio.
And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
will hang forever at ocfs2_aiodio_wait() in ocfs2_file_aio_write().
Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJoel Becker <jlbec@evilplan.org>

2dfd0603

04 7月, 2012 2 次提交

rpmsg: make sure inflight messages don't invoke just-removed callbacks · 15fd943a

由 Ohad Ben-Cohen 提交于 6月 07, 2012

When inbound messages arrive, rpmsg core looks up their associated
endpoint (by destination address) and then invokes their callback.

We've made sure that endpoints will never be de-allocated after they
were found by rpmsg core, but we also need to protect against the
(rare) scenario where the rpmsg driver was just removed, and its
callback function isn't available anymore.

This is achieved by introducing a callback mutex, which must be taken
before the callback is invoked, and, obviously, before it is removed.

Cc: stable <stable@vger.kernel.org>
Reported-by: NFernando Guzman Lugo <fernando.lugo@ti.com>
Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>

15fd943a

rpmsg: avoid premature deallocation of endpoints · 5a081caa

由 Ohad Ben-Cohen 提交于 6月 06, 2012

When an inbound message arrives, the rpmsg core looks up its
associated endpoint and invokes the registered callback.

If a message arrives while its endpoint is being removed (because
the rpmsg driver was removed, or a recovery of a remote processor
has kicked in) we must ensure atomicity, i.e.:

- Either the ept is removed before it is found

or

- The ept is found but will not be freed until the callback returns

This is achieved by maintaining a per-ept reference count, which,
when drops to zero, will trigger deallocation of the ept.

With this in hand, it is now forbidden to directly deallocate
epts once they have been added to the endpoints idr.

Cc: stable <stable@vger.kernel.org>
Reported-by: NFernando Guzman Lugo <fernando.lugo@ti.com>
Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>

5a081caa

03 7月, 2012 2 次提交

KVM: Pass kvm_irqfd to functions · d4db2935

由 Alex Williamson 提交于 6月 29, 2012

Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

d4db2935

Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation" · cba6d0d6

由 Paul E. McKenney 提交于 7月 02, 2012

This reverts commit 616c310e.
(Move PREEMPT_RCU preemption to switch_to() invocation).
Testing by Sasha Levin <levinsasha928@gmail.com> showed that this
can result in deadlock due to invoking the scheduler when one of
the runqueue locks is held.  Because this commit was simply a
performance optimization, revert it.
Reported-by: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NSasha Levin <levinsasha928@gmail.com>

cba6d0d6

01 7月, 2012 1 次提交

linux/irq.h: fix kernel-doc warning · 87fac288

由 Randy Dunlap 提交于 6月 30, 2012

Fix kernel-doc warning.  This struct member was removed in commit
87568264 ("irq: Remove irq_chip->release()") so remove its
associated kernel-doc entry also.

  Warning(include/linux/irq.h:338): Excess struct/union/enum/typedef member 'release' description in 'irq_chip'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87fac288

21 6月, 2012 3 次提交

vga_switcheroo: Add include guard · d3decf3a

由 Ozan Çağlayan 提交于 6月 14, 2012

Guard vga_switcheroo.h against multiple inclusion.
Signed-off-by: NOzan Çağlayan <ozancag@gmail.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

d3decf3a

Viresh has moved · 10d8935f

由 Viresh Kumar 提交于 6月 20, 2012

viresh.kumar@st.com email-id doesn't exist anymore as I have left the
company.  Replace ST's id with viresh.linux@gmail.com.

It also updates .mailmap file to fix address for 'git shortlog'
Signed-off-by: NViresh Kumar <viresh.linux@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

10d8935f

mm: fix slab->page _count corruption when using slub · abca7c49

由 Pravin B Shelar 提交于 6月 20, 2012

On arches that do not support this_cpu_cmpxchg_double() slab_lock is used
to do atomic cmpxchg() on double word which contains page->_count.  The
page count can be changed from get_page() or put_page() without taking
slab_lock.  That corrupts page counter.

Fix it by moving page->_count out of cmpxchg_double data.  So that slub
does no change it while updating slub meta-data in struct page.

[akpm@linux-foundation.org: use standard comment layout, tweak comment text]
Reported-by: NAmey Bhide <abhide@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

abca7c49

19 6月, 2012 1 次提交

kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation · 246f6f2f

由 Kay Sievers 提交于 6月 19, 2012

Signed-off-by: NKay Sievers <kay@vrfy.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
Reported-by: NFengguang Wu <wfg@linux.intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

246f6f2f

18 6月, 2012 2 次提交

ftrace: Make all inline tags also include notrace · 93b3cca1

由 Steven Rostedt 提交于 6月 14, 2012

Commit 5963e317 ("ftrace/x86: Do not change stacks in DEBUG when
calling lockdep") prevented lockdep calls from the int3 breakpoint handler
from reseting the stack if a function that was called was in the process
of being converted for tracing and had a breakpoint on it. The idea is,
before calling the lockdep code, do a load_idt() to the special IDT that
kept the breakpoint stack from reseting. This worked well as a quick fix
for this kernel release, until a certain config caused a lockup in the
function tracer start up tests.

Investigating it, I found that the load_idt that was used to prevent
the int3 from changing stacks was itself being traced!

Even though the config had CONFIG_OPTIMIZE_INLINING disabled, and
all 'inline' tags were set to always inline, there were still cases that
it did not inline! This was caused by CONFIG_PARAVIRT_GUEST, where it
would add a pointer to the native_load_idt() which made that function
to be traced.

Commit 45959ee7 ("ftrace: Do not function trace inlined functions")
only touched the 'inline' tags when CONFIG_OPMITIZE_INLINING was enabled.
PARAVIRT_GUEST shows that this was not enough and we need to also
mark always_inline with notrace as well.
Reported-by: NFengguang Wu <wfg@linux.intel.com>
Tested-by: NFengguang Wu <wfg@linux.intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

93b3cca1

NFSv4.1: Fix umount when filelayout DS is also the MDS · 2a4c8994

由 Trond Myklebust 提交于 6月 14, 2012

Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client. The result is the umount program returns,
but the nfs_client is not freed, and the cl_session hearbeat continues.

The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DS lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.

The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.
Reported-by: NJorge Mora <Jorge.Mora@netapp.com>
Reported-by: NAndy Adamson <andros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

2a4c8994

16 6月, 2012 4 次提交

vga_switcheroo.h: fix pci_dev warning · f8fee8f5

由 Randy Dunlap 提交于 6月 15, 2012

Fix warnings on some architectures/configs (not on x86):

include/linux/vga_switcheroo.h:28:30: warning: 'struct pci_dev' declared inside parameter list [enabled by default]
include/linux/vga_switcheroo.h:28:30: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc: Takashi Iwai <tiwai@suse.de>
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NDave Airlie <airlied@redhat.com>

f8fee8f5

swap: fix shmem swapping when more than 8 areas · 9b15b817

由 Hugh Dickins 提交于 6月 15, 2012

Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.

Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there. Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.

Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().

This does not reduce the usable size of a swap area any further, it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE. It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.

Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check. Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.
Reported-and-Reviewed-and-Tested-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b15b817

net: remove skb_orphan_try() · 62b1a8ab

由 Eric Dumazet 提交于 6月 14, 2012

Orphaning skb in dev_hard_start_xmit() makes bonding behavior
unfriendly for applications sending big UDP bursts : Once packets
pass the bonding device and come to real device, they might hit a full
qdisc and be dropped. Without orphaning, the sender is automatically
throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming
sk_sndbuf is not too big)

We could try to defer the orphaning adding another test in
dev_hard_start_xmit(), but all this seems of little gain,
now that BQL tends to make packets more likely to be parked
in Qdisc queues instead of NIC TX ring, in cases where performance
matters.

Reverts commits :
fc6055a5 net: Introduce skb_orphan_try()
87fd308c net: skb_tx_hash() fix relative to skb_orphan_try()
and removes SKBTX_DRV_NEEDS_SK_REF flag
Reported-and-bisected-by: NJean-Michel Hautbois <jhautbois@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NOliver Hartkopp <socketcan@hartkopp.net>
Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62b1a8ab

kmsg - kmsg_dump() use iterator to receive log buffer content · e2ae715d

由 Kay Sievers 提交于 6月 15, 2012

Provide an iterator to receive the log buffer content, and convert all
kmsg_dump() users to it.

The structured data in the kmsg buffer now contains binary data, which
should no longer be copied verbatim to the kmsg_dump() users.

The iterator should provide reliable access to the buffer data, and also
supports proper log line-aware chunking of data while iterating.
Signed-off-by: NKay Sievers <kay@vrfy.org>
Tested-by: NTony Luck <tony.luck@intel.com>
Reported-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Tested-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e2ae715d

15 6月, 2012 1 次提交

block: Drop dead function blk_abort_queue() · 76aaa510

由 Asias He 提交于 6月 14, 2012

This function was only used by btrfs code in btrfs_abort_devices()
(seems in a wrong way).

It was removed in commit d07eb911,
So, Let's remove the dead code to avoid any confusion.

Changes in v2: update commit log, btrfs_abort_devices() was removed
already.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: David Sterba <dave@jikos.cz>
Signed-off-by: NAsias He <asias@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

76aaa510

14 6月, 2012 4 次提交

pstore/ram_core: Factor persistent_ram_zap() out of post_init() · fce39793

由 Anton Vorontsov 提交于 5月 26, 2012

A handy function that we will use outside of ram_core soon. But
so far just factor it out and start using it in post_init().
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fce39793

pstore/ram: Should update old dmesg buffer before reading · 201e4aca

由 Anton Vorontsov 提交于 5月 26, 2012

Without the update, we'll only see the new dmesg buffer after the
reboot, but previously we could see it right away. Making an oops
visible in pstore filesystem before reboot is a somewhat dubious
feature, but removing it wasn't an intentional change, so let's
restore it.

For this we have to make persistent_ram_save_old() safe for calling
multiple times, and also extern it.
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

201e4aca

USB: add NO_D3_DURING_SLEEP flag and revert · c2fb8a3f

由 Alan Stern 提交于 6月 13, 2012

This patch (as1558) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers.  Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.

After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state.  Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.

The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep.  Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.

A similar patch has already been applied as commit
151b6128 (USB: EHCI: fix crash during
suspend on ASUS computers).  The patch supersedes that one and reverts
it.  There are two differences:

	The old patch added the flag at the USB level; this patch
	adds it at the PCI level.

	The old patch applied to all chipsets with the same vendor,
	subsystem vendor, and product IDs; this patch makes an
	exception for a known-good system (based on DMI information).
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Tested-by: NDâniel Fraga <fragabr@gmail.com>
Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
Tested-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c2fb8a3f

splice: fix racy pipe->buffers uses · 047fe360

由 Eric Dumazet 提交于 6月 12, 2012

Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14d (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: NDave Jones <davej@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

047fe360

12 6月, 2012 2 次提交

Input: fix input.h kernel-doc warning · 2177905c

由 Randy Dunlap 提交于 6月 11, 2012

Fix kernel-doc warning in input.h:

Warning(include/linux/input.h:140): No description found for parameter 'len'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

2177905c

[media] Fix regression in ioctl numbering · 1761a110

由 Hans Verkuil 提交于 6月 07, 2012

Yuck. The VIDIOC_(TRY_)DECODER_CMD ioctls already had ioctl numbers 96 and 97,
and after merging the timings API I forgot to continue numbering from 98. So
now we have two ioctls with number 96 and two with 97.

With the new table-driver ioctl handling in v4l2-ioctl.c it is essential that
each ioctl has its own unique number, so let's fix this quickly for 3.5.
Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>

1761a110

11 6月, 2012 1 次提交

ASoC: fix pxa-ssp compiling issue under mach-mmp · 972a55b6

由 Qiao Zhou 提交于 6月 04, 2012

pxa-ssp.c uses API like cpu_is_pxa3xx(), cpu_is_pxa2xx(), which is
defined under arch-pxa architecture, and drivers under mach-mmp
can't find it. so just use ssp->type to replace that API.
Signed-off-by: NQiao Zhou <zhouqiao@marvell.com>
Acked-by: NHaojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>

972a55b6

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功