提交 · 90e97820619dc912b52cc9d103272819d8b51259 · openanolis / cloud-kernel

05 2月, 2015 1 次提交

resources: Move struct resource_list_entry from ACPI into resource core · 90e97820

由 Jiang Liu 提交于 2月 05, 2015

Currently ACPI, PCI and pnp all implement the same resource list
management with different data structure. We need to transfer from
one data structure into another when passing resources from one
subsystem into another subsystem. So move struct resource_list_entry
from ACPI into resource core and rename it as resource_entry,
then it could be reused by different subystems and avoid the data
structure conversion.

Introduce dedicated header file resource_ext.h instead of embedding
it into ioport.h to avoid header file inclusion order issues.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Acked-by: NVinod Koul <vinod.koul@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

90e97820

04 2月, 2015 3 次提交

ACPI: Introduce helper function acpi_dev_filter_resource_type() · 62d1141f

由 Jiang Liu 提交于 2月 02, 2015

Introduce helper function acpi_dev_filter_resource_type(), which may
be used by acpi_dev_get_resources() to filer out resource based on
resource type.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

62d1141f

ACPI: Add field offset to struct resource_list_entry · 93286f47

由 Jiang Liu 提交于 2月 02, 2015

Add field offset to struct resource_list_entry to host address space
translation offset so it could be used to represent bridge resources.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

93286f47

ACPI: Return translation offset when parsing ACPI address space resources · a49170b5

由 Jiang Liu 提交于 2月 02, 2015

Change function acpi_dev_resource_address_space() and
acpi_dev_resource_ext_address_space() to return address space
translation offset.

It's based on a patch from Yinghai Lu <yinghai@kernel.org>.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a49170b5

02 2月, 2015 1 次提交

sched: don't cause task state changes in nested sleep debugging · 00845eb9

由 Linus Torvalds 提交于 2月 01, 2015

Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
on nested sleep conditions, which we generally want to avoid because the
inner sleeping operation can re-set the thread state to TASK_RUNNING,
but that will then cause the outer sleep loop not actually sleep when it
calls schedule.

However, that's actually valid traditional behavior, with the inner
sleep being some fairly rare case (like taking a sleeping lock that
normally doesn't actually need to sleep).

And the debug code would actually change the state of the task to
TASK_RUNNING internally, which makes that kind of traditional and
working code not work at all, because now the nested sleep doesn't just
sometimes cause the outer one to not block, but will cause it to happen
every time.

In particular, it will cause the cardbus kernel daemon (pccardd) to
basically busy-loop doing scheduling, converting a laptop into a heater,
as reported by Bruno Prémont.  But there may be other legacy uses of
that nested sleep model in other drivers that are also likely to never
get converted to the new model.

This fixes both cases:

 - don't set TASK_RUNNING when the nested condition happens (note: even
   if WARN_ONCE() only _warns_ once, the return value isn't whether the
   warning happened, but whether the condition for the warning was true.
   So despite the warning only happening once, the "if (WARN_ON(..))"
   would trigger for every nested sleep.

 - in the cases where we knowingly disable the warning by using
   "sched_annotate_sleep()", don't change the task state (that is used
   for all core scheduling decisions), instead use '->task_state_change'
   that is used for the debugging decision itself.

(Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
suggested change to use 'task_state_change' as part of the test)
Reported-and-bisected-by: NBruno Prémont <bonbons@linux-vserver.org>
Tested-by: NRafael J Wysocki <rjw@rjwysocki.net>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>,
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00845eb9

30 1月, 2015 1 次提交

vm: add VM_FAULT_SIGSEGV handling support · 33692f27

由 Linus Torvalds 提交于 1月 29, 2015

The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.

That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works.  However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV.  And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.

However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space.  And user space really
expected SIGSEGV, not SIGBUS.

To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it.  They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.

This is the mindless minimal patch to do this.  A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.

Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
Tested-by: NJan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

33692f27

28 1月, 2015 2 次提交

perf: Tighten (and fix) the grouping condition · c3c87e77

由 Peter Zijlstra 提交于 1月 23, 2015

The fix from 9fc81d87 ("perf: Fix events installation during
moving group") was incomplete in that it failed to recognise that
creating a group with events for different CPUs is semantically
broken -- they cannot be co-scheduled.

Furthermore, it leads to real breakage where, when we create an event
for CPU Y and then migrate it to form a group on CPU X, the code gets
confused where the counter is programmed -- triggered in practice
as well by me via the perf fuzzer.

Fix this by tightening the rules for creating groups. Only allow
grouping of counters that can be co-scheduled in the same context.
This means for the same task and/or the same cpu.

Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c3c87e77

quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff

由 Jan Kara 提交于 10月 09, 2014

Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
tracks space limits and usage in 512-byte blocks. However VFS quotas
track usage in bytes (as some filesystems require that) and we need to
somehow pass this information. Upto now it wasn't a problem because we
didn't do any unit conversion (thus VFS quota routines happily stuck
number of bytes into d_bcount field of struct fd_disk_quota). Only if
you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
/ Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
tried this but reportedly some Samba users hit the problem in practice.
So when we want interfaces compatible we need to fix this.

We bite the bullet and define another quota structure used for passing
information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
to have more conversion routines in fs/quota/quota.c and another copying
of quota structure slows down getting of quota information by about 2%
but it seems cleaner than overloading e.g. units of d_bcount to bytes.

CC: stable@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

14bf61ff

27 1月, 2015 4 次提交

ipv4: try to cache dst_entries which would cause a redirect · df4d9254

由 Hannes Frederic Sowa 提交于 1月 23, 2015

Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f8864972 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NMarcelo Leitner <mleitner@redhat.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df4d9254

printk: add dummy routine for when CONFIG_PRINTK=n · 07261edb

由 Pranith Kumar 提交于 1月 26, 2015

There are missing dummy routines for log_buf_addr_get() and
log_buf_len_get() for when CONFIG_PRINTK is not set causing build
failures.

This patch adds these dummy routines at the appropriate location.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NPetr Mladek <pmladek@suse.cz>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

07261edb

mm: page_alloc: embed OOM killing naturally into allocation slowpath · 9879de73

由 Johannes Weiner 提交于 1月 26, 2015

The OOM killing invocation does a lot of duplicative checks against the
task's allocation context.  Rework it to take advantage of the existing
checks in the allocator slowpath.

The OOM killer is invoked when the allocator is unable to reclaim any
pages but the allocation has to keep looping.  Instead of having a check
for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM
invocation to the true branch of should_alloc_retry().  The __GFP_FS
check from oom_gfp_allowed() can then be moved into the OOM avoidance
branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test.

__alloc_pages_may_oom() can then signal to the caller whether the OOM
killer was invoked, instead of requiring it to duplicate the order and
high_zoneidx checks to guess this when deciding whether to continue.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9879de73

i2c: Only include slave support if selected · d5fd120e

由 Jean Delvare 提交于 1月 26, 2015

Make the slave support depend on CONFIG_I2C_SLAVE. Otherwise it gets
included unconditionally, even when it is not needed.

I2C bus drivers which implement slave support must select
I2C_SLAVE.
Signed-off-by: NJean Delvare <jdelvare@suse.de>
Signed-off-by: NWolfram Sang <wsa@the-dreams.de>

d5fd120e

26 1月, 2015 2 次提交

ACPICA: Resources: Provide common part for struct acpi_resource_address structures. · a45de93e

由 Lv Zheng 提交于 1月 26, 2015

struct acpi_resource_address and struct acpi_resource_extended_address64 share substracts
just at different offsets. To unify the parsing functions, OSPMs like Linux
need a new ACPI_ADDRESS64_ATTRIBUTE as their substructs, so they can
extract the shared data.

This patch also synchronizes the structure changes to the Linux kernel.
The usages are searched by matching the following keywords:
1. acpi_resource_address
2. acpi_resource_extended_address
3. ACPI_RESOURCE_TYPE_ADDRESS
4. ACPI_RESOURCE_TYPE_EXTENDED_ADDRESS
And we found and fixed the usages in the following files:
 arch/ia64/kernel/acpi-ext.c
 arch/ia64/pci/pci.c
 arch/x86/pci/acpi.c
 arch/x86/pci/mmconfig-shared.c
 drivers/xen/xen-acpi-memhotplug.c
 drivers/acpi/acpi_memhotplug.c
 drivers/acpi/pci_root.c
 drivers/acpi/resource.c
 drivers/char/hpet.c
 drivers/pnp/pnpacpi/rsparser.c
 drivers/hv/vmbus_drv.c

Build tests are passed with defconfig/allnoconfig/allyesconfig and
defconfig+CONFIG_ACPI=n.
Original-by: NThomas Gleixner <tglx@linutronix.de>
Original-by: NJiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: NLv Zheng <lv.zheng@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a45de93e

ACPI: Introduce acpi_unload_parent_table() usages in Linux kernel · e044d8f9

由 Lv Zheng 提交于 1月 26, 2015

ACPICA has implemented acpi_unload_parent_table() which can exactly replace
the acpi_get_id()/acpi_unload_table_id() implemented in Linux kernel.  The
acpi_unload_parent_table() has been unit tested in ACPICA simulation
environment.

This patch can also help to reduce the source code differences between
Linux and ACPICA.
Signed-off-by: NLv Zheng <lv.zheng@intel.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Tested-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e044d8f9

22 1月, 2015 1 次提交

module: make module_refcount() a signed integer. · d5db139a

由 Rusty Russell 提交于 1月 22, 2015

James Bottomley points out that it will be -1 during unload.  It's
only used for diagnostics, so let's not hide that as it could be a
clue as to what's gone wrong.

Cc: Jason Wessel <jason.wessel@windriver.com>
Acked-and-documention-added-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: NMasami Hiramatsu <maasami.hiramatsu.pt@hitachi.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

d5db139a

20 1月, 2015 2 次提交

module: remove mod arg from module_free, rename module_memfree(). · be1f221c

由 Rusty Russell 提交于 1月 20, 2015

Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.

This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org

be1f221c

module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded

由 Rusty Russell 提交于 1月 20, 2015

Archs have been abusing module_free() to clean up their arch-specific
allocations.  Since module_free() is also (ab)used by BPF and trace code,
let's keep it to simple allocations, and provide a hook called before
that.

This means that avr32, ia64, parisc and s390 no longer need to implement
their own module_free() at all.  avr32 doesn't need module_finalize()
either.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org

d453cded

19 1月, 2015 2 次提交

libata: allow sata_sil24 to opt-out of tag ordered submission · 72dd299d

由 Dan Williams 提交于 1月 16, 2015

Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
    "Since commit 8a4aeec8 "libata/ahci: accommodate tag ordered
    controllers" the access to the harddisk on the first SATA-port is
    failing on its first access. The access to the harddisk on the
    second port is working normal.

    When reverting the above commit, access to both harddisks is working
    fine again."

Maintain tag ordered submission as the default, but allow sata_sil24 to
continue with the old behavior.

Cc: <stable@vger.kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: NRonny Hegewald <Ronny.Hegewald@online.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

72dd299d

KVM: fix sparse warning in include/trace/events/kvm.h · cdef5119

由 Christian Borntraeger 提交于 1月 15, 2015

sparse complains about
include/trace/events/kvm.h:163:1: error: directive in argument list
include/trace/events/kvm.h:167:1: error: directive in argument list
include/trace/events/kvm.h:169:1: error: directive in argument list
and sparse is right. Preprocessing directives in an argument of a
macro are undefined behaviour as of C99 6.10.3p11.

Lets use an indirection to fix this.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cdef5119

17 1月, 2015 4 次提交

genetlink: synchronize socket closing and family removal · ee1c2442

由 Johannes Berg 提交于 1月 16, 2015

In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee1c2442

genetlink: document parallel_ops · f555f3d7

由 Johannes Berg 提交于 1月 16, 2015

The kernel-doc for the parallel_ops family struct member is
missing, add it.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f555f3d7

PCI: Add pci_claim_bridge_resource() to clip window if necessary · 8505e729

由 Yinghai Lu 提交于 1月 15, 2015

Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window. This is
like regular pci_claim_resource(), except that if we fail to claim the
window, we check to see if we can reduce the size of the window and try
again.

This is for scenarios like this:

pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]

The 00:01.0 window is illegal: it starts before the host bridge window, so
we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible. We
can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref].

Previously we discarded the 00:01.0 window and tried to reassign that part
of the hierarchy from scratch. That is a problem because Linux doesn't
always assign things optimally. For example, in this case, BIOS put the
01:00.0 device in a prefetchable window below 4GB, but after 5b285415,
Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0
device can't use it.

Clipping the 00:01.0 window is less intrusive than completely reassigning
things and is sufficient to let us use most of the BIOS configuration. Of
course, it's possible that devices below 00:01.0 will no longer fit. If
that's the case, we'll have to reassign things. But that's a separate
problem.

[bhelgaas: changelog, split into separate patch]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491Reported-by: NMarek Kordik <kordikmarek@gmail.com>
Fixes: 5b285415 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.16+

8505e729

PCI: Add flag for devices where we can't use bus reset · f331a859

由 Alex Williamson 提交于 1月 15, 2015

Enable a mechanism for devices to quirk that they do not behave when
doing a PCI bus reset. We require a modest level of spec compliant
behavior in order to do a reset, for instance the device should come
out of reset without throwing errors and PCI config space should be
accessible after reset. This is too much to ask for some devices.

Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.orgSigned-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.14+

f331a859

15 1月, 2015 3 次提交

can: m_can: tag current CAN FD controllers as non-ISO · 6cfda7fb

由 Oliver Hartkopp 提交于 1月 05, 2015

During the CAN FD standardization process within the ISO it turned out that
the failure detection capability has to be improved.

The CAN in Automation organization (CiA) defined the already implemented CAN
FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937

Finally there will be three types of CAN FD controllers in the future:

1. ISO compliant (fixed)
2. non-ISO compliant (fixed, like the M_CAN IP v3.0.1 in m_can.c)
3. ISO/non-ISO CAN FD controllers (switchable, like the PEAK USB FD)

So the current M_CAN driver for the M_CAN IP v3.0.1 has to expose its non-ISO
implementation by setting the CAN_CTRLMODE_FD_NON_ISO ctrlmode at startup.
As this bit cannot be switched at configuration time CAN_CTRLMODE_FD_NON_ISO
must not be set in ctrlmode_supported of the current M_CAN driver.
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

6cfda7fb

openvswitch: packet messages need their own probe attribtue · 1ba39804

由 Thomas Graf 提交于 1月 14, 2015

User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.

Fixes: 05da5898 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Reviewed-by: NJesse Gross <jesse@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ba39804

netdevice: Add missing parentheses in macro · 4ccce02e

由 Benjamin Poirier 提交于 1月 14, 2015

For example, one could conceivably call
	for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
and get an unexpected result.
Signed-off-by: NBenjamin Poirier <bpoirier@suse.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ccce02e

14 1月, 2015 3 次提交

ARM: dt: GIC: Spelling s/specific/specifier/, s/flaggs/flags/ · d6613aa7

由 Geert Uytterhoeven 提交于 12月 10, 2014

Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: NRob Herring <robh@kernel.org>

d6613aa7

kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val) · 43239cbe

由 Christian Borntraeger 提交于 1月 13, 2015

Feedback has shown that WRITE_ONCE(x, val) is easier to use than
ASSIGN_ONCE(val,x).
There are no in-tree users yet, so lets change it for 3.19.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NDavidlohr Bueso <dave@stgolabs.net>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

43239cbe

net: Corrected the comment describing the ndo operations to reflect the actual... · 5d632cb7

由 B Viswanath 提交于 1月 12, 2015

net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations

Corrected the comment describing the ndo operations to
reflect the actual prototype for couple of operations
Signed-off-by: NB Viswanath <marichika4@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d632cb7

13 1月, 2015 2 次提交

x86/xen: properly retrieve NMI reason · f221b04f

由 Jan Beulich 提交于 1月 13, 2015

Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.

Note that the hook can (and should) be used irrespective of whether
being in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

f221b04f

mm: mmu_gather: use tlb->end != 0 only for TLB invalidation · 721c21c1

由 Will Deacon 提交于 1月 12, 2015

When batching up address ranges for TLB invalidation, we check tlb->end
!= 0 to indicate that some pages have actually been unmapped.

As of commit f045bbb9 ("mmu_gather: fix over-eager
tlb_flush_mmu_free() calling"), we use the same check for freeing these
pages in order to avoid a performance regression where we call
free_pages_and_swap_cache even when no pages are actually queued up.

Unfortunately, the range could have been reset (tlb->end = 0) by
tlb_end_vma, which has been shown to cause memory leaks on arm64.
Furthermore, investigation into these leaks revealed that the fullmm
case on task exit no longer invalidates the TLB, by virtue of tlb->end
 == 0 (in 3.18, need_flush would have been set).

This patch resolves the problem by reverting commit f045bbb9, using
instead tlb->local.nr as the predicate for page freeing in
tlb_flush_mmu_free and ensuring that tlb->end is initialised to a
non-zero value in the fullmm case.
Tested-by: NMark Langsdorf <mlangsdo@redhat.com>
Tested-by: NDave Hansen <dave@sr71.net>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

721c21c1

12 1月, 2015 2 次提交

mmc: sdhci: Disable re-tuning for HS400 · b5540ce1

由 Adrian Hunter 提交于 12月 05, 2014

Re-tuning for HS400 mode must be done in HS200
mode. Currently there is no support for that.
That needs to be reflected in the code.
Specifically, if tuning is executed in HS400 mode
then return an error, and do not start the
tuning timer if HS200 tuning is being done prior
to switching to HS400.

Note that periodic re-tuning is not expected
to be needed for HS400 but re-tuning is still
needed after the host controller has lost power.
In the case of suspend/resume that is not necessary
because the card is fully re-initialised. That
just leaves runtime suspend/resume with no support
for HS400 re-tuning.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

b5540ce1

Input: uinput - fix ioctl nr overflow for UI_GET_SYSNAME/VERSION · 029b1836

由 Gabriel Laskar 提交于 1月 11, 2015

Request number for ioctls are encoded as 8bit numbers, but unfortunately
UI_GET_SYSNAME and UI_GET_VERSION specifu values larger than that, so they
get truncated to 44 (0x2c) and 45 (0x2d). This change makes requested
values match their effective values (the ABI stays intact).
Signed-off-by: NGabriel Laskar <gabriel@lse.epita.fr>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

029b1836

10 1月, 2015 1 次提交

target: Drop left-over fabric_max_sectors attribute · 7216dc07

由 Nicholas Bellinger 提交于 1月 06, 2015

Now that fabric_max_sectors is no longer used to enforce the maximum
I/O size, go ahead and drop it's left-over usage in target-core and
associated backend drivers.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>

7216dc07

09 1月, 2015 6 次提交

perf: Move task_pt_regs sampling into arch code · 88a7c26a

由 Andy Lutomirski 提交于 1月 04, 2015

On x86_64, at least, task_pt_regs may be only partially initialized
in many contexts, so x86_64 should not use it without extra care
from interrupt context, let alone NMI context.

This will allow x86_64 to override the logic and will supply some
scratch space to use to make a cleaner copy of user regs.
Tested-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/e431cd4c18c2e1c44c774f10758527fb2d1025c4.1420396372.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

88a7c26a

vfs: renumber FMODE_NONOTIFY and add to uniqueness check · 75069f2b

由 David Drysdale 提交于 1月 08, 2015

Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc.  The
clashing O_PATH value was added in commit 5229645b ("vfs: add
nonconflicting values for O_PATH") but this can't be changed as it is
user-visible.

FMODE_NONOTIFY is only used internally in the kernel, but it is in the
same numbering space as the other O_* flags, as indicated by the comment
at the top of include/uapi/asm-generic/fcntl.h (and its use in
fs/notify/fanotify/fanotify_user.c).  So renumber it to avoid the clash.

All of this has happened before (commit 12ed2e36: "fanotify:
FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
happen again -- so update the uniqueness check in fcntl_init() to
include __FMODE_NONOTIFY.
Signed-off-by: NDavid Drysdale <drysdale@google.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NJan Kara <jack@suse.cz>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

75069f2b

mm: protect set_page_dirty() from ongoing truncation · 2d6d7f98

由 Johannes Weiner 提交于 1月 08, 2015

Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:

__set_page_dirty_nobuffers()       __delete_from_page_cache()
  if (TestSetPageDirty(page))
                                     page->mapping = NULL
				     if (PageDirty())
				       dec_zone_page_state(page, NR_FILE_DIRTY);
				       dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
    if (page->mapping)
      account_page_dirtied(page)
        __inc_zone_page_state(page, NR_FILE_DIRTY);
	__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);

which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.

Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held.  The notable exception to this rule, though,
is do_wp_page(), for which this race exists.  However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes.  Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.

Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed.  Remove it.
Reported-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d6d7f98

mm: prevent endless growth of anon_vma hierarchy · 7a3ef208

由 Konstantin Khlebnikov 提交于 1月 08, 2015

Constantly forking task causes unlimited grow of anon_vma chain.  Each
next child allocates new level of anon_vmas and links vma to all
previous levels because pages might be inherited from any level.

This patch adds heuristic which decides to reuse existing anon_vma
instead of forking new one.  It adds counter anon_vma->degree which
counts linked vmas and directly descending anon_vmas and reuses anon_vma
if counter is lower than two.  As a result each anon_vma has either vma
or at least two descending anon_vmas.  In such trees half of nodes are
leafs with alive vmas, thus count of anon_vmas is no more than two times
bigger than count of vmas.

This heuristic reuses anon_vmas as few as possible because each reuse
adds false aliasing among vmas and rmap walker ought to scan more ptes
when it searches where page is might be mapped.

Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
Fixes: 5beb4930 ("mm: change anon_vma linking to fix multi-process server scalability issue")
[akpm@linux-foundation.org: fix typo, per Rik]
Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
Reported-by: NDaniel Forrest <dan.forrest@ssec.wisc.edu>
Tested-by: NMichal Hocko <mhocko@suse.cz>
Tested-by: NJerome Marchand <jmarchan@redhat.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>	[2.6.34+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7a3ef208

regulator: s2mps11: Fix wrong calculation of register offset · ad26aa6c

由 Jonghwa Lee 提交于 1月 08, 2015

This patch adds missing registers('BUCK7_SW' & 'LDO29_CTRL'). Since BUCK7 has
1 more register (BUCK7_SW) than others, register offset should
be added one more for which has bigger address than BUCK7 registers.

Fixes: 76b9840b(regulator: s2mps11: Add support S2MPS13 regulator device)
Signed-off-by: NJonghwa Lee <jonghwa3.lee@samsung.com>
Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: NMark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>

ad26aa6c

libceph: fix sparse endianness warnings · d7d5a007

由 Ilya Dryomov 提交于 12月 19, 2014

The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

d7d5a007

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功