1. 05 Feb 2015, 1 commit
  2. 04 Feb 2015, 3 commits
  3. 02 Feb 2015, 1 commit
    • sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Committed by Linus Torvalds
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop to not actually sleep
      when it calls schedule().
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all: now the nested sleep doesn't just
      sometimes cause the outer one to not block, it causes it to not block
      every single time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep.)
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
      Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00845eb9
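      For context, a minimal sketch of the traditional nested-sleep pattern
      this commit keeps working; "condition" and may_sleep_briefly() are
      placeholders for illustration, not symbols from the patch:

        set_current_state(TASK_INTERRUPTIBLE);
        while (!condition) {
                /* inner op (e.g. a sleeping lock) may briefly sleep and
                 * reset the state to TASK_RUNNING */
                may_sleep_briefly();
                if (!condition)
                        schedule();     /* may or may not block -- historically fine */
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);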
  4. 30 Jan 2015, 1 commit
    • vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Committed by Linus Torvalds
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
      Tested-by: Jan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33692f27
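      As an illustration of what each architecture's fault handler gains, a
      hedged sketch of the error dispatch; the goto labels vary per
      architecture and are illustrative here:

        fault = handle_mm_fault(mm, vma, address, flags);
        if (unlikely(fault & VM_FAULT_ERROR)) {
                if (fault & VM_FAULT_OOM)
                        goto out_of_memory;
                else if (fault & VM_FAULT_SIGSEGV)  /* new: deliver SIGSEGV */
                        goto bad_area;
                else if (fault & VM_FAULT_SIGBUS)
                        goto do_sigbus;
                BUG();
        }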
  5. 28 Jan 2015, 2 commits
    • perf: Tighten (and fix) the grouping condition · c3c87e77
      Committed by Peter Zijlstra
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused about where the counter is programmed -- I also triggered this
      in practice via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c3c87e77
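      A hedged user-space illustration of the case that is now rejected: a
      group whose leader is bound to one CPU and whose sibling is bound to
      another can never be co-scheduled (the exact errno isn't spelled out
      in the changelog, so treat the failure mode as an assumption):

        #include <linux/perf_event.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <string.h>
        #include <stdio.h>

        int main(void)
        {
                struct perf_event_attr attr;
                memset(&attr, 0, sizeof(attr));
                attr.size   = sizeof(attr);
                attr.type   = PERF_TYPE_HARDWARE;
                attr.config = PERF_COUNT_HW_CPU_CYCLES;

                /* leader: all tasks on CPU 0; sibling: all tasks on CPU 1 */
                int leader  = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
                int sibling = syscall(__NR_perf_event_open, &attr, -1, 1, leader, 0);
                if (sibling < 0)
                        perror("cross-CPU group rejected");  /* expected after this fix */
                return 0;
        }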
    • quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Committed by Jan Kara
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Up to now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck the
      number of bytes into the d_bcount field of struct fs_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas) did you get bogus results. Hardly anyone
      tried this but reportedly some Samba users hit the problem in practice.
      So if we want the interfaces to be compatible we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad we
      have to have more conversion routines in fs/quota/quota.c, and another
      copying of the quota structure slows down getting quota information by
      about 2%, but it seems cleaner than overloading e.g. the units of
      d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      14bf61ff
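      A hedged sketch of the direction described above: a separate,
      byte-based structure for the ->get_dqblk()/->set_dqblk() interface.
      Structure and field names here are illustrative, not necessarily the
      ones the patch introduces:

        /* limits and usage carried in bytes, so the VFS quota code no
         * longer has to abuse 512-byte-block fields */
        struct vfs_qc_dqblk {
                u64 d_spc_hardlimit;    /* absolute limit on used space, bytes */
                u64 d_spc_softlimit;    /* preferred limit on used space, bytes */
                u64 d_space;            /* current space usage, bytes */
                u64 d_ino_hardlimit;    /* maximum number of inodes */
                u64 d_ino_softlimit;    /* preferred inode limit */
                u64 d_ino_count;        /* current number of inodes */
        };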
  6. 27 Jan 2015, 3 commits
  7. 22 Jan 2015, 1 commit
  8. 20 Jan 2015, 2 commits
    • module: remove mod arg from module_free, rename module_memfree(). · be1f221c
      Committed by Rusty Russell
      Nothing needs the module pointer any more, and the next patch will
      call it from RCU, where the module itself might no longer exist.
      Removing the arg is the safest approach.
      
      This just codifies the use of the module_alloc/module_free pattern
      which ftrace and bpf use.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Cc: netdev@vger.kernel.org
      be1f221c
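      A hedged before/after sketch of the interface change; the generic
      implementation is assumed to stay a plain vfree of the region:

        /* before: the module pointer was required but unused */
        void module_free(struct module *mod, void *module_region);

        /* after: only the region is needed, so RCU callbacks that outlive
         * the module itself can still free it */
        void __weak module_memfree(void *module_region)
        {
                vfree(module_region);
        }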
    • module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded
      Committed by Rusty Russell
      Archs have been abusing module_free() to clean up their arch-specific
      allocations.  Since module_free() is also (ab)used by BPF and trace code,
      let's keep it to simple allocations, and provide a hook called before
      that.
      
      This means that avr32, ia64, parisc and s390 no longer need to implement
      their own module_free() at all.  avr32 doesn't need module_finalize()
      either.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      d453cded
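      A hedged sketch of the new hook and its ordering relative to freeing
      the init region; the weak default and the call site are simplified:

        /* generic no-op that archs like ia64, parisc or s390 can override */
        void __weak module_arch_freeing_init(struct module *mod)
        {
        }

        /* freeing the init sections: arch cleanup first, then the memory */
        module_arch_freeing_init(mod);
        module_memfree(mod->module_init);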
  9. 19 Jan 2015, 1 commit
  10. 17 Jan 2015, 3 commits
    • genetlink: synchronize socket closing and family removal · ee1c2442
      Committed by Johannes Berg
      In addition to the problem Jeff Layton reported, I looked at the code
      and reproduced the same warning by subscribing and removing the genl
      family with a socket still open. This is a fairly tricky race which
      originates in the fact that generic netlink allows the family to go
      away while sockets are still open - unlike regular netlink which has
      a module refcount for every open socket so in general this cannot be
      triggered.
      
      Trying to resolve this issue with the obvious locking isn't possible as
      it will result in deadlocks between unregistration and group unbind
      notification (which, incidentally, lockdep doesn't find due to the
      home-grown locking in the netlink table).
      
      To really resolve this, introduce a "closing socket" reference counter
      (for generic netlink only, as it's the only affected family) in the
      core netlink code and use that in generic netlink to wait for all the
      sockets that are being closed at the same time as a generic netlink
      family is removed.
      
      This fixes the race: when a socket is closed, it should call unbind, but
      if the family is removed at the same time the unbind will not find it,
      leading to the warning. The real problem though is that in this case the
      unbind could actually find a new family that is registered to have a
      multicast group with the same ID, and call its mcast_unbind(), leading
      to confusion.
      
      Also remove the warning since it would still trigger, but is now no
      longer a problem.
      
      This also moves the code in af_netlink.c to before unreferencing the
      module to avoid having the same problem in the normal non-genl case.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee1c2442
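      A hedged sketch of the mechanism described above; the counter and
      waitqueue names are illustrative, not necessarily those in the patch:

        static atomic_t genl_sk_closing_cnt = ATOMIC_INIT(0);
        static DECLARE_WAIT_QUEUE_HEAD(genl_sk_closing_waitq);

        /* netlink core: a genl socket has started closing */
        atomic_inc(&genl_sk_closing_cnt);

        /* ... once its unbind work has run ... */
        if (atomic_dec_and_test(&genl_sk_closing_cnt))
                wake_up(&genl_sk_closing_waitq);

        /* genl_unregister_family(): wait out sockets closing concurrently */
        wait_event(genl_sk_closing_waitq,
                   atomic_read(&genl_sk_closing_cnt) == 0);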
    • PCI: Add pci_claim_bridge_resource() to clip window if necessary · 8505e729
      Committed by Yinghai Lu
      Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window.  This is
      like regular pci_claim_resource(), except that if we fail to claim the
      window, we check to see if we can reduce the size of the window and try
      again.
      
      This is for scenarios like this:
      
        pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
        pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
        pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
      
      The 00:01.0 window is illegal: it starts before the host bridge window, so
      we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible.  We
      can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref].
      
      Previously we discarded the 00:01.0 window and tried to reassign that part
      of the hierarchy from scratch.  That is a problem because Linux doesn't
      always assign things optimally.  For example, in this case, BIOS put the
      01:00.0 device in a prefetchable window below 4GB, but after 5b285415,
      Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0
      device can't use it.
      
      Clipping the 00:01.0 window is less intrusive than completely reassigning
      things and is sufficient to let us use most of the BIOS configuration.  Of
      course, it's possible that devices below 00:01.0 will no longer fit.  If
      that's the case, we'll have to reassign things.  But that's a separate
      problem.
      
      [bhelgaas: changelog, split into separate patch]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
      Reported-by: Marek Kordik <kordikmarek@gmail.com>
      Fixes: 5b285415 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.16+
      8505e729
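      A hedged sketch of the clipping step; this is a simplification, not
      the exact pci_claim_bridge_resource(), which also has to distinguish
      the I/O and memory windows:

        static int claim_or_clip(struct pci_dev *bridge, int i,
                                 struct resource *parent)
        {
                struct resource *res = &bridge->resource[i];

                if (pci_claim_resource(bridge, i) == 0)
                        return 0;       /* window already fits as-is */

                /* clip the window into the upstream resource and retry */
                res->start = max(res->start, parent->start);
                res->end   = min(res->end, parent->end);
                return pci_claim_resource(bridge, i);
        }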
    • PCI: Add flag for devices where we can't use bus reset · f331a859
      Committed by Alex Williamson
      Enable a mechanism for devices to be quirked as not behaving correctly
      across a PCI bus reset.  We require a modest level of spec-compliant
      behavior in order to do a reset: for instance, the device should come
      out of reset without throwing errors and PCI config space should be
      accessible after reset.  This is too much to ask for some devices.
      
      Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.org
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.14+
      f331a859
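      A hedged sketch of how such a device would be quirked; the flag name
      follows the patch's intent, while the vendor/device IDs below are
      placeholders, not a real quirk entry:

        static void quirk_no_bus_reset(struct pci_dev *dev)
        {
                dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
        }
        /* hypothetical IDs for illustration only */
        DECLARE_PCI_FIXUP_HEADER(0x1234, 0x5678, quirk_no_bus_reset);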
  11. 15 Jan 2015, 1 commit
  12. 14 Jan 2015, 2 commits
  13. 12 Jan 2015, 1 commit
    • mmc: sdhci: Disable re-tuning for HS400 · b5540ce1
      Committed by Adrian Hunter
      Re-tuning for HS400 mode must be done in HS200 mode. Currently there is
      no support for that. That needs to be reflected in the code.
      Specifically, if tuning is executed in HS400 mode then return an error,
      and do not start the tuning timer if HS200 tuning is being done prior
      to switching to HS400.
      
      Note that periodic re-tuning is not expected to be needed for HS400,
      but re-tuning is still needed after the host controller has lost power.
      In the case of suspend/resume that is not necessary because the card is
      fully re-initialised. That just leaves runtime suspend/resume with no
      support for HS400 re-tuning.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      b5540ce1
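      A hedged sketch of the two behaviours described above; only the timing
      constant is certain, the HS400-tuning flag name is an assumption:

        /* refuse tuning while already in HS400 mode */
        if (host->mmc->ios.timing == MMC_TIMING_MMC_HS400)
                return -EINVAL;

        /* HS200 tuning done only as a step towards HS400: don't arm the
         * periodic re-tuning timer (flag name assumed for illustration) */
        if (host->flags & SDHCI_HS400_TUNING)
                tuning_count = 0;       /* tuning_count drives the timer here */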
  14. 09 Jan 2015, 6 commits
  15. 08 Jan 2015, 7 commits
    • blk-mq: Allow requests to never expire · 5b3f25fc
      Committed by Keith Busch
      Some types of requests may be started that are not guaranteed to ever
      complete. This adds a request flag that a driver can use to mark the
      request as such.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      5b3f25fc
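      A hedged driver-side sketch; the flag name is an assumption based on
      the changelog, not confirmed here:

        /* tell the block timeout handler this request may legitimately
         * never complete (flag name assumed) */
        req->cmd_flags |= REQ_NO_TIMEOUT;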
    • blk-mq: Add helper to abort requeued requests · 1885b24d
      Committed by Jens Axboe
      Adds a helper function a driver can use to abort requeued requests in
      case any are pending when h/w queues are being removed.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      1885b24d
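      A hedged usage sketch; the calling context (a driver's hot-removal
      path) is an assumption, only the helper is from this change:

        blk_mq_abort_requeue_list(q);   /* end requests still parked for requeue */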
    • blk-mq: Let drivers cancel requeue_work · c68ed59f
      Committed by Keith Busch
      Kicking requeued requests will start h/w queues in a work_queue, which
      may alter the driver's requested state to temporarily stop them. This
      patch exports a method to cancel the q->requeue_work so a driver can be
      assured stopped h/w queues won't be started up before it is ready.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c68ed59f
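      A hedged usage sketch; the pairing with stopping the hardware queues
      is an assumption about a typical driver suspend path:

        blk_mq_cancel_requeue_work(q);  /* no requeue kick can restart queues now */
        blk_mq_stop_hw_queues(q);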
    • blk-mq: Export if requests were started · 973c0191
      Committed by Keith Busch
      Drivers can iterate over all allocated request tags, but their callback
      needs a way to know if the driver started the request in the first place.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      973c0191
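      A hedged sketch of a tag-iterator callback using the new export; the
      callback shape and name are illustrative, only blk_mq_request_started()
      is the interface described above:

        static void my_cancel_rq(struct request *rq, void *data)
        {
                if (!blk_mq_request_started(rq))
                        return;         /* tag allocated but never handed to the driver */
                /* ... fail or requeue the in-flight request ... */
        }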
    • libata: Whitelist SSDs that are known to properly return zeroes after TRIM · e61f7d1c
      Committed by Martin K. Petersen
      As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return
      Zero After Trim) flags in the ATA Command Set are unreliable in the
      sense that they only define what happens if the device successfully
      executed the DSM TRIM command. TRIM is only advisory, however, and the
      device is free to silently ignore all or parts of the request.
      
      In practice this renders the DRAT and RZAT flags completely useless, and
      because the results are unpredictable we decided to disable discard in
      MD for 3.18 to avoid the risk of data corruption.
      
      Hardware vendors in the real world obviously need better guarantees than
      what the standards bodies provide. Unfortunately those guarantees are
      encoded in product requirements documents rather than somewhere we can
      key off of them programmatically. So we are compelled to disable
      discard_zeroes_data for all devices unless we explicitly have data to
      support whitelisting them.
      
      This patch whitelists SSDs from a few of the main vendors. None of the
      whitelists are based on written guarantees. They are purely based on
      empirical evidence collected from internal and external users that have
      tested or qualified these drives in RAID deployments.
      
      The whitelist is only meant as a starting point and is by no means
      comprehensive:
      
         - All Intel SSD models except for 510
         - Micron M5?0/M600
         - Samsung SSDs
         - Seagate SSDs
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      e61f7d1c
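      A hedged sketch of what a whitelist entry looks like in libata's device
      table; the model globs and the horkage flag name here are illustrative,
      not copied from the patch:

        /* illustrative entries only */
        { "INTEL*SSDSC2*",   NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, },
        { "Samsung SSD 8*",  NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, },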
    • time: settimeofday: Validate the values of tv from user · 6ada1fc0
      Committed by Sasha Levin
      An unvalidated user input is multiplied by a constant, which can result
      in undefined behaviour for large values. While this is validated later,
      we should avoid triggering undefined behaviour.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      [jstultz: include trivial millisecond->microsecond correction noticed
      by Andy]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      6ada1fc0
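      A hedged sketch of the kind of early check described; the bounds are
      written out explicitly here, while the actual patch may use an existing
      helper such as timeval_valid():

        if (tv) {
                if (tv->tv_usec < 0 || tv->tv_usec >= USEC_PER_SEC)
                        return -EINVAL;         /* reject before any multiply */
        }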
    • blk-mq: get rid of ->cmd_size in the hardware queue · 17ded320
      Committed by Jens Axboe
      We store it in the tag set, we don't need it in the hardware queue.
      While removing cmd_size, place ->queue_num further down to avoid
      a hole on 64-bit archs. It's not used in any fast paths, so we
      can safely move it.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      17ded320
  16. 07 Jan 2015, 1 commit
    • mm: propagate error from stack expansion even for guard page · fee7e49d
      Committed by Linus Torvalds
      Jay Foad reports that the address sanitizer test (asan) sometimes gets
      confused by a stack pointer that ends up being outside the stack vma
      that is reported by /proc/maps.
      
      This happens due to an interaction between RLIMIT_STACK and the guard
      page: when we do the guard page check, we ignore the potential error
      from the stack expansion, which effectively results in a missing guard
      page, since the expected stack expansion won't have been done.
      
      And since /proc/maps explicitly ignores the guard page (commit
      d7824370: "mm: fix up some user-visible effects of the stack guard
      page"), the stack pointer ends up being outside the reported stack area.
      
      This is the minimal patch: it just propagates the error.  It also
      effectively makes the guard page part of the stack limit, which in turn
      means that the actual real stack is one page less than the stack limit.
      
      Let's see if anybody notices.  We could teach acct_stack_growth() to
      allow an extra page for a grow-up/grow-down stack in the rlimit test,
      but I don't want to add more complexity if it isn't needed.
      Reported-and-tested-by: Jay Foad <jay.foad@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fee7e49d
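      A hedged sketch of the propagation, simplified to the grows-down case
      and with the surrounding fault-handling context omitted:

        /* previously the expansion result was ignored here, silently losing
         * the guard page when expansion failed (e.g. RLIMIT_STACK hit) */
        if ((vma->vm_flags & VM_GROWSDOWN) && address == vma->vm_start)
                return expand_downwards(vma, address - PAGE_SIZE);
        return 0;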
  17. 06 Jan 2015, 2 commits
  18. 30 Dec 2014, 1 commit
    • mm: get rid of radix tree gfp mask for pagecache_get_page · 45f87de5
      Committed by Michal Hocko
      Commit 2457aec6 ("mm: non-atomically mark page accessed during page
      cache allocation where possible") has added a separate parameter for
      specifying gfp mask for radix tree allocations.
      
      Not only is this less than optimal from the API point of view because it
      is error prone, it is also currently buggy because
      grab_cache_page_write_begin is using GFP_KERNEL for the radix tree, and
      if fgp_flags doesn't contain FGP_NOFS (mostly controlled by the fs via
      the AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared,
      then the radix tree allocation wouldn't obey the restriction and might
      recurse into the filesystem and cause deadlocks.  This is the case for
      most filesystems unfortunately because only ext4 and gfs2 are using
      AOP_FLAG_NOFS.
      
      Let's simply remove the radix_gfp_mask parameter because the allocation
      context is the same for both the page cache and the radix tree.  Just
      make sure that the radix tree gets only the sane subset of the mask
      (e.g. do not pass __GFP_WRITE).
      
      Long term it is more preferable to convert remaining users of
      AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
      interface even further.
      Reported-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      45f87de5
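      A hedged sketch of the resulting call pattern inside
      pagecache_get_page(), simplified; GFP_RECLAIM_MASK stands in for the
      "sane subset" mentioned above:

        page = __page_cache_alloc(gfp_mask);
        if (!page)
                return NULL;
        /* same mask for the radix tree insertion, minus bits that only make
         * sense for the page itself (e.g. __GFP_WRITE) */
        err = add_to_page_cache_lru(page, mapping, offset,
                                    gfp_mask & GFP_RECLAIM_MASK);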
  19. 27 Dec 2014, 1 commit
    • netlink/genetlink: pass network namespace to bind/unbind · 023e2cfa
      Committed by Johannes Berg
      Netlink families can exist in multiple namespaces, and for the most
      part multicast subscriptions are per network namespace. Thus it only
      makes sense to have bind/unbind notifications per network namespace.
      
      To achieve this, pass the network namespace of a given client socket
      to the bind/unbind functions.
      
      Also do this in generic netlink, and there also make sure that any
      bind for multicast groups that only exist in init_net is rejected.
      This isn't really a problem if it is accepted, since a client in a
      different namespace will never receive any notifications from such
      a group, but it can confuse the family if not rejected.  (It would
      also be possible to silently, i.e. without telling the family,
      accept it, but then it would also have to be ignored on unbind so
      that families which take any kind of action on bind/unbind don't do
      unnecessary work for invalid clients like that.)
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      023e2cfa
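      A hedged sketch of the widened callback signatures in
      struct netlink_kernel_cfg (previously these took only the group):

        /* fields unrelated to bind/unbind omitted for brevity */
        struct netlink_kernel_cfg {
                int  (*bind)(struct net *net, int group);
                void (*unbind)(struct net *net, int group);
        };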