提交 · 9aabf810a67cd97e2d1a48f0bab338b7680f1929 · openeuler / Kernel

11 2月, 2015 1 次提交

fsnotify: fix handling of renames in audit · 6ee8e25f

由 Jan Kara 提交于 2月 10, 2015

Commit e9fd702a ("audit: convert audit watches to use fsnotify
instead of inotify") broke handling of renames in audit.  Audit code
wants to update inode number of an inode corresponding to watched name
in a directory.  When something gets renamed into a directory to a
watched name, inotify previously passed moved inode to audit code
however new fsnotify code passes directory inode where the change
happened.  That confuses audit and it starts watching parent directory
instead of a file in a directory.

This can be observed for example by doing:

  cd /tmp
  touch foo bar
  auditctl -w /tmp/foo
  touch foo
  mv bar foo
  touch foo

In audit log we see events like:

  type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
  ...
  type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
  type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
  type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
  ...

and that's it - we see event for the first touch after creating the
audit rule, we see events for rename but we don't see any event for the
last touch.  However we start seeing events for unrelated stuff
happening in /tmp.

Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
FS_MOVED_TO events instead of the directory where the change happens.
This doesn't introduce any new problems because noone besides
audit_watch.c cares about the passed value:

  fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
  fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
  fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
  kernel/audit_tree.c doesn't care about passed 'data' at all.
  kernel/audit_watch.c expects moved inode as 'data'.

Fixes: e9fd702a ("audit: convert audit watches to use fsnotify instead of inotify")
Signed-off-by: NJan Kara <jack@suse.cz>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ee8e25f

08 2月, 2015 1 次提交

tracing: Add condition check to RCU lockdep checks · a05d59a5

由 Steven Rostedt (Red Hat) 提交于 2月 06, 2015

The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
When a CPU is offline, RCU is no longer watching that CPU and since the
tracepoint is protected by RCU, it must not be called. To prevent the
tlb_flush tracepoint from being called when the CPU is offline, it was
converted to a TRACE_EVENT_CONDITION where the condition checks if the
CPU is online before calling the tracepoint.

Unfortunately, this was not enough to stop lockdep from complaining about
it. Even though the RCU protected code of the tracepoint will never be
called, the condition is hidden within the tracepoint, and even though the
condition prevents RCU code from being called, the lockdep checks are
outside the tracepoint (this is to test tracepoints even when they are not
enabled).

Even though tracepoints should be checked to be RCU safe when they are not
enabled, the condition should still be considered when checking RCU.

Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com

Fixes: 3a630178 "tracing: generate RCU warnings even when tracepoints are disabled"
Cc: stable@vger.kernel.org # 3.18+
Acked-by: NDave Hansen <dave@sr71.net>
Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a05d59a5

05 2月, 2015 1 次提交

spi: spi-pxa2xx: only include mach/dma.h for legacy DMA · 9064bf3c

由 Rob Herring 提交于 2月 03, 2015

Move the include of mach/dma.h to the legacy PXA DMA code where it is used.
This enables building spi-pxa2xx on ARM64.
Signed-off-by: NRob Herring <robh@kernel.org>
Signed-off-by: NMark Brown <broonie@kernel.org>

9064bf3c

04 2月, 2015 3 次提交

regulator: Fix build breakage on !REGULATOR · 50910276

由 Krzysztof Kozlowski 提交于 2月 04, 2015

Add missing stubs for regulator_suspend_prepare() and
regulator_suspend_finish() to fix exynos_defconfig build without
REGULATOR:

arch/arm/mach-exynos/built-in.o: In function `exynos_suspend_finish':
arch/arm/mach-exynos/suspend.c:537: undefined reference to `regulator_suspend_finish'
arch/arm/mach-exynos/built-in.o: In function `exynos_suspend_prepare':
arch/arm/mach-exynos/suspend.c:520: undefined reference to `regulator_suspend_prepare'
make: *** [vmlinux] Error 1
Signed-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
Reported-by: NJoerg Roedel <joro@8bytes.org>
Reported-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

50910276

perf: Decouple unthrottling and rotating · 2fde4f94

由 Mark Rutland 提交于 1月 07, 2015

Currently the adjusments made as part of perf_event_task_tick() use the
percpu rotation lists to iterate over any active PMU contexts, but these
are not used by the context rotation code, having been replaced by
separate (per-context) hrtimer callbacks. However, some manipulation of
the rotation lists (i.e. removal of contexts) has remained in
perf_rotate_context(). This leads to the following issues:

* Contexts are not always removed from the rotation lists. Removal of
  PMUs which have been placed in rotation lists, but have not been
  removed by a hrtimer callback can result in corruption of the rotation
  lists (when memory backing the context is freed).

  This has been observed to result in hangs when PMU drivers built as
  modules are inserted and removed around the creation of events for
  said PMUs.

* Contexts which do not require rotation may be removed from the
  rotation lists as a result of a hrtimer, and will not be considered by
  the unthrottling code in perf_event_task_tick.

This patch fixes the issue by updating the rotation ist when events are
scheduled in/out, ensuring that each rotation list stays in sync with
the HW state. As each event holds a refcount on the module of its PMU,
this ensures that when a PMU module is unloaded none of its CPU contexts
can be in a rotation list. By maintaining a list of perf_event_contexts
rather than perf_event_cpu_contexts, we don't need separate paths to
handle the cpu and task contexts, which also makes the code a little
simpler.

As the rotation_list variables are not used for rotation, these are
renamed to active_ctx_list, which better matches their current function.
perf_pmu_rotate_{start,stop} are renamed to
perf_pmu_ctx_{activate,deactivate}.
Reported-by: NJohannes Jensen <johannes.jensen@arm.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <Will.Deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostejSigned-off-by: NIngo Molnar <mingo@kernel.org>

2fde4f94

sched/wait: Introduce wait_on_bit_timeout() · 44fc0e5e

由 Johan Hedberg 提交于 1月 30, 2015

Add a new wait_on_bit_timeout() helper, basically the same as
wait_on_bit() except that it also takes a 'timeout' parameter.

All the building blocks like bit_wait_timeout() and
out_of_line_wait_on_bit_timeout() are already in place so the
addition is rather simple.
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: davem@davemloft.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1422616476-2917-2-git-send-email-johan.hedberg@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

44fc0e5e

03 2月, 2015 2 次提交

sched/wait: Remove might_sleep() from wait_event_cmd() · 3e875238

由 Mikulas Patocka 提交于 2月 02, 2015

The patch e22b886a ("sched/wait: Add might_sleep() checks")
introduced a bug in the raid5 subsystem.

The function raid5_quiesce() (and resize_stripes()) uses the 'cmd'
part to release and acquire a spinlock (so we call the sleep
primitives in atomic context), and therefore we cannot do the
might_sleep() check.

Remove it.

Fixes: e22b886a ("sched/wait: Add might_sleep() checks")
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1502020935580.13510@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

3e875238

net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs · 5a2e87b1

由 Jack Morgenstein 提交于 2月 02, 2015

Commit de966c59 (net/mlx4_core: Support more than 64 VFs) was meant to
allow up to 126 VFs.  However, due to leaving MLX4_MFUNC_MAX too low, using
more than 80 VFs resulted in memory corruptions (and Oopses) when more than
80 VFs were requested. In addition, the number of slaves was left too high.

This commit fixes these issues.

Fixes: de966c59 ("net/mlx4_core: Support more than 64 VFs")
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a2e87b1

02 2月, 2015 1 次提交

sched: don't cause task state changes in nested sleep debugging · 00845eb9

由 Linus Torvalds 提交于 2月 01, 2015

Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
on nested sleep conditions, which we generally want to avoid because the
inner sleeping operation can re-set the thread state to TASK_RUNNING,
but that will then cause the outer sleep loop not actually sleep when it
calls schedule.

However, that's actually valid traditional behavior, with the inner
sleep being some fairly rare case (like taking a sleeping lock that
normally doesn't actually need to sleep).

And the debug code would actually change the state of the task to
TASK_RUNNING internally, which makes that kind of traditional and
working code not work at all, because now the nested sleep doesn't just
sometimes cause the outer one to not block, but will cause it to happen
every time.

In particular, it will cause the cardbus kernel daemon (pccardd) to
basically busy-loop doing scheduling, converting a laptop into a heater,
as reported by Bruno Prémont.  But there may be other legacy uses of
that nested sleep model in other drivers that are also likely to never
get converted to the new model.

This fixes both cases:

 - don't set TASK_RUNNING when the nested condition happens (note: even
   if WARN_ONCE() only _warns_ once, the return value isn't whether the
   warning happened, but whether the condition for the warning was true.
   So despite the warning only happening once, the "if (WARN_ON(..))"
   would trigger for every nested sleep.

 - in the cases where we knowingly disable the warning by using
   "sched_annotate_sleep()", don't change the task state (that is used
   for all core scheduling decisions), instead use '->task_state_change'
   that is used for the debugging decision itself.

(Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
suggested change to use 'task_state_change' as part of the test)
Reported-and-bisected-by: NBruno Prémont <bonbons@linux-vserver.org>
Tested-by: NRafael J Wysocki <rjw@rjwysocki.net>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>,
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00845eb9

31 1月, 2015 1 次提交

net: Fix vlan_get_protocol for stacked vlan · d4bcef3f

由 Toshiaki Makita 提交于 1月 29, 2015

vlan_get_protocol() could not get network protocol if a skb has a 802.1ad
vlan tag or multiple vlans, which caused incorrect checksum calculation
in several drivers.

Fix vlan_get_protocol() to retrieve network protocol instead of incorrect
vlan protocol.

As the logic is the same as skb_network_protocol(), create a common helper
function __vlan_get_protocol() and call it from existing functions.
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4bcef3f

30 1月, 2015 1 次提交

vm: add VM_FAULT_SIGSEGV handling support · 33692f27

由 Linus Torvalds 提交于 1月 29, 2015

The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.

That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works.  However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV.  And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.

However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space.  And user space really
expected SIGSEGV, not SIGBUS.

To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it.  They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.

This is the mindless minimal patch to do this.  A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.

Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
Tested-by: NJan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

33692f27

29 1月, 2015 3 次提交

ata: pata_platform: fix owner module reference mismatch for scsi host · 17263905

由 Akinobu Mita 提交于 1月 29, 2015

The owner module reference of the pata_of_platform's scsi_host is
initialized to pata_platform's one, because pata_of_platform driver
use a scsi_host_template defined in pata_platform.  So this drivers
can be unloaded even if the scsi device is being accessed.

This fixes it by propagating the scsi_host_template to pata_of_platform
driver.  The scsi_host_template is passed through a new
argument of __pata_platform_probe().
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-ide@vger.kernel.org
Cc: linux-scsi@vger.kernel.org

17263905

ata: ahci_platform: fix owner module reference mismatch for scsi host · 018d5ef2

由 Akinobu Mita 提交于 1月 29, 2015

The owner module reference of the ahci platform's scsi_host is
initialized to libahci_platform's one, because these drivers use a
scsi_host_template defined in libahci_platform.  So these drivers can
be unloaded even if the scsi device is being accessed.

This fixes it by pushing the scsi_host_template from libahci_platform
to all leaf drivers.  The scsi_host_template is passed through a new
argument of ahci_platform_init_host().
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-ide@vger.kernel.org
Cc: linux-scsi@vger.kernel.org

018d5ef2

regulator: da9211: Add gpio control for enable/disable of buck · 8c7dd8bc

由 James Ban 提交于 1月 28, 2015

This is a patch for adding gpio control about enable/disable of buck.
Signed-off-by: NJames Ban <james.ban.opensource@diasemi.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

8c7dd8bc

28 1月, 2015 2 次提交

perf: Tighten (and fix) the grouping condition · c3c87e77

由 Peter Zijlstra 提交于 1月 23, 2015

The fix from 9fc81d87 ("perf: Fix events installation during
moving group") was incomplete in that it failed to recognise that
creating a group with events for different CPUs is semantically
broken -- they cannot be co-scheduled.

Furthermore, it leads to real breakage where, when we create an event
for CPU Y and then migrate it to form a group on CPU X, the code gets
confused where the counter is programmed -- triggered in practice
as well by me via the perf fuzzer.

Fix this by tightening the rules for creating groups. Only allow
grouping of counters that can be co-scheduled in the same context.
This means for the same task and/or the same cpu.

Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c3c87e77

quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff

由 Jan Kara 提交于 10月 09, 2014

Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
tracks space limits and usage in 512-byte blocks. However VFS quotas
track usage in bytes (as some filesystems require that) and we need to
somehow pass this information. Upto now it wasn't a problem because we
didn't do any unit conversion (thus VFS quota routines happily stuck
number of bytes into d_bcount field of struct fd_disk_quota). Only if
you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
/ Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
tried this but reportedly some Samba users hit the problem in practice.
So when we want interfaces compatible we need to fix this.

We bite the bullet and define another quota structure used for passing
information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
to have more conversion routines in fs/quota/quota.c and another copying
of quota structure slows down getting of quota information by about 2%
but it seems cleaner than overloading e.g. units of d_bcount to bytes.

CC: stable@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

14bf61ff

27 1月, 2015 4 次提交

regmap: correct the description of structure element in reg_field · f27b37f5

由 Bintian Wang 提交于 1月 27, 2015

Fix incorrect description of structure element "msb", which is
described as "reg".
Signed-off-by: NBintian Wang <bintian.wang@huawei.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

f27b37f5

printk: add dummy routine for when CONFIG_PRINTK=n · 07261edb

由 Pranith Kumar 提交于 1月 26, 2015

There are missing dummy routines for log_buf_addr_get() and
log_buf_len_get() for when CONFIG_PRINTK is not set causing build
failures.

This patch adds these dummy routines at the appropriate location.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NPetr Mladek <pmladek@suse.cz>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

07261edb

mm: page_alloc: embed OOM killing naturally into allocation slowpath · 9879de73

由 Johannes Weiner 提交于 1月 26, 2015

The OOM killing invocation does a lot of duplicative checks against the
task's allocation context.  Rework it to take advantage of the existing
checks in the allocator slowpath.

The OOM killer is invoked when the allocator is unable to reclaim any
pages but the allocation has to keep looping.  Instead of having a check
for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM
invocation to the true branch of should_alloc_retry().  The __GFP_FS
check from oom_gfp_allowed() can then be moved into the OOM avoidance
branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test.

__alloc_pages_may_oom() can then signal to the caller whether the OOM
killer was invoked, instead of requiring it to duplicate the order and
high_zoneidx checks to guess this when deciding whether to continue.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9879de73

i2c: Only include slave support if selected · d5fd120e

由 Jean Delvare 提交于 1月 26, 2015

Make the slave support depend on CONFIG_I2C_SLAVE. Otherwise it gets
included unconditionally, even when it is not needed.

I2C bus drivers which implement slave support must select
I2C_SLAVE.
Signed-off-by: NJean Delvare <jdelvare@suse.de>
Signed-off-by: NWolfram Sang <wsa@the-dreams.de>

d5fd120e

24 1月, 2015 4 次提交

rtc: Convert rtc_set_ntp_time() to use timespec64 · 9a4a445e

由 Xunlei Pang 提交于 1月 22, 2015

rtc_set_ntp_time() uses timespec which is y2038-unsafe,
so modify to use timespec64 which is y2038-safe, then
replace rtc_time_to_tm() with rtc_time64_to_tm().

Also adjust all its call sites(only NTP uses it) accordingly.

Cc: pang.xunlei <pang.xunlei@linaro.org>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

9a4a445e

time: Expose get_monotonic_boottime64 for in-kernel use · 2e0c78ee

由 John Stultz 提交于 12月 18, 2014

As part of the 2038 conversion process, add a
get_monotonic_boottime64 accessor so we can depracate
get_monotonic_boottime.

Cc: pang.xunlei <pang.xunlei@linaro.org>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

2e0c78ee

time: Expose getboottime64 for in-kernel uses · d08c0cdd

由 John Stultz 提交于 12月 08, 2014

Adds a timespec64 based getboottime64() implementation
that can be used as we convert internal users of
getboottime away from using timespecs.

Cc: pang.xunlei <pang.xunlei@linaro.org>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

d08c0cdd

ktime: Optimize ktime_divns for constant divisors · 8b618628

由 Nicolas Pitre 提交于 12月 03, 2014

At least on ARM, do_div() is optimized to turn constant divisors into
an inline multiplication by the reciprocal value at compile time.
However this optimization is missed entirely whenever ktime_divns() is
used and the slow out-of-line division code is used all the time.

Let ktime_divns() use do_div() inline whenever the divisor is constant
and small enough.  This will make things like ktime_to_us() and
ktime_to_ms() much faster.

Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

8b618628

23 1月, 2015 2 次提交

hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919

由 Thomas Gleixner 提交于 1月 20, 2015

hrtimer_interrupt() has the following subtle issue:

hrtimer_interrupt()
  lock(cpu_base);
  expires_next = KTIME_MAX;

  expire_timers(CLOCK_MONOTONIC);
  expires = get_next_timer(CLOCK_MONOTONIC);
  if (expires < expires_next)
    expires_next = expires;

  expire_timers(CLOCK_REALTIME);
    unlock(cpu_base);
    wakeup()
    hrtimer_start(CLOCK_MONOTONIC, newtimer);
    lock(cpu_base();  
  expires = get_next_timer(CLOCK_REALTIME);
  if (expires < expires_next)
    expires_next = expires;

So because we already evaluated the next expiring timer of
CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
earlier than the overall next expiry time in hrtimer_interrupt().

To solve this, remove the caching of the next expiry value from
hrtimer_interrupt() and reevaluate all active clock bases for the next
expiry value. To avoid another code duplication, create a shared
evaluation function and use it for hrtimer_get_next_event(),
hrtimer_force_reprogram() and hrtimer_interrupt().

There is another subtlety in this mechanism:

While hrtimer_interrupt() is running, we want to avoid to touch the
hardware device because we will reprogram it anyway at the end of
hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
via the HRTIMER_RESTART mechanism, because we drop out when the
callback on that CPU is running. But that fails, if a new timer gets
enqueued like in the example above.

This has another implication: While hrtimer_interrupt() is running we
refuse remote enqueueing of timers - see hrtimer_interrupt() and
hrtimer_check_target().

hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
to KTIME_MAX, but that fails if a new timer gets queued.

Prevent both the hardware access and the remote enqueue
explicitely. We can loosen the restriction on the remote enqueue now
due to reevaluation of the next expiry value, but that needs a
seperate patch.

Folded in a fix from Vignesh Radhakrishnan.
Reported-and-tested-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
Based-on-patch-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: vigneshr@codeaurora.org
Cc: john.stultz@linaro.org
Cc: viresh.kumar@linaro.org
Cc: fweisbec@gmail.com
Cc: cl@linux.com
Cc: stuart.w.hayes@gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

9bc74919

ktime.h: Introduce ktime_ms_delta · 41fbf3b3

由 Chunyan Zhang 提交于 12月 17, 2014

This patch adds a reusable time difference function which returns the
difference in millisecond, as often used in some driver code, e.g.
mtd/test, media/rc, etc.
Signed-off-by: NChunyan Zhang <zhang.chunyan@linaro.org>
Acked-by: NArnd Bergmann <arnd@linaro.org>
Cc: zhang.lyra@gmail.com
Cc: davem@davemloft.net
Cc: john.stultz@linaro.org
Cc: dborkman@redhat.com
Link: http://lkml.kernel.org/r/1418793095-18780-1-git-send-email-zhang.chunyan@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

41fbf3b3

22 1月, 2015 2 次提交

init: Get rid of x86isms · 30b8b006

由 Thomas Gleixner 提交于 1月 15, 2015

The UP local API support can be set up from an early initcall. No need
for horrible hackery in the init code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

30b8b006

module: make module_refcount() a signed integer. · d5db139a

由 Rusty Russell 提交于 1月 22, 2015

James Bottomley points out that it will be -1 during unload.  It's
only used for diagnostics, so let's not hide that as it could be a
clue as to what's gone wrong.

Cc: Jason Wessel <jason.wessel@windriver.com>
Acked-and-documention-added-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: NMasami Hiramatsu <maasami.hiramatsu.pt@hitachi.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

d5db139a

20 1月, 2015 2 次提交

module: remove mod arg from module_free, rename module_memfree(). · be1f221c

由 Rusty Russell 提交于 1月 20, 2015

Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.

This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org

be1f221c

module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded

由 Rusty Russell 提交于 1月 20, 2015

Archs have been abusing module_free() to clean up their arch-specific
allocations.  Since module_free() is also (ab)used by BPF and trace code,
let's keep it to simple allocations, and provide a hook called before
that.

This means that avr32, ia64, parisc and s390 no longer need to implement
their own module_free() at all.  avr32 doesn't need module_finalize()
either.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org

d453cded

19 1月, 2015 2 次提交

ata: libahci: Allow using multiple regulators · c7d7ddee

由 Gregory CLEMENT 提交于 1月 15, 2015

The current implementation of the libahci allows using multiple PHYs
but not multiple regulators. This patch adds the support of multiple
regulators. Until now it was mandatory to have a PHY under a subnode,
now a port subnode can contain either a regulator or a PHY (or both).

In order to be able to asociate a port with a regulator the port are
now a platform device in the device tree case.

There was only one driver which used directly the regulator field of
the ahci_host_priv structure. To preserve the bisectability the change
in the ahci_imx driver was done in the same patch.
Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

c7d7ddee

libata: allow sata_sil24 to opt-out of tag ordered submission · 72dd299d

由 Dan Williams 提交于 1月 16, 2015

Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
    "Since commit 8a4aeec8 "libata/ahci: accommodate tag ordered
    controllers" the access to the harddisk on the first SATA-port is
    failing on its first access. The access to the harddisk on the
    second port is working normal.

    When reverting the above commit, access to both harddisks is working
    fine again."

Maintain tag ordered submission as the default, but allow sata_sil24 to
continue with the old behavior.

Cc: <stable@vger.kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: NRonny Hegewald <Ronny.Hegewald@online.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

72dd299d

17 1月, 2015 3 次提交

genetlink: synchronize socket closing and family removal · ee1c2442

由 Johannes Berg 提交于 1月 16, 2015

In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee1c2442

PCI: Add pci_claim_bridge_resource() to clip window if necessary · 8505e729

由 Yinghai Lu 提交于 1月 15, 2015

Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window. This is
like regular pci_claim_resource(), except that if we fail to claim the
window, we check to see if we can reduce the size of the window and try
again.

This is for scenarios like this:

pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]

The 00:01.0 window is illegal: it starts before the host bridge window, so
we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible. We
can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref].

Previously we discarded the 00:01.0 window and tried to reassign that part
of the hierarchy from scratch. That is a problem because Linux doesn't
always assign things optimally. For example, in this case, BIOS put the
01:00.0 device in a prefetchable window below 4GB, but after 5b285415,
Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0
device can't use it.

Clipping the 00:01.0 window is less intrusive than completely reassigning
things and is sufficient to let us use most of the BIOS configuration. Of
course, it's possible that devices below 00:01.0 will no longer fit. If
that's the case, we'll have to reassign things. But that's a separate
problem.

[bhelgaas: changelog, split into separate patch]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491Reported-by: NMarek Kordik <kordikmarek@gmail.com>
Fixes: 5b285415 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.16+

8505e729

PCI: Add flag for devices where we can't use bus reset · f331a859

由 Alex Williamson 提交于 1月 15, 2015

Enable a mechanism for devices to quirk that they do not behave when
doing a PCI bus reset. We require a modest level of spec compliant
behavior in order to do a reset, for instance the device should come
out of reset without throwing errors and PCI config space should be
accessible after reset. This is too much to ask for some devices.

Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.orgSigned-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.14+

f331a859

16 1月, 2015 2 次提交

regulator: da9211: fix unmatched of_node · 076c3b8e

由 James Ban 提交于 1月 16, 2015

This is a patch for fixing unmatched of_node.
Signed-off-by: NJames Ban <james.ban.opensource@diasemi.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

076c3b8e

rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193

由 Paul E. McKenney 提交于 12月 13, 2014

Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)
In addition, if one of the RCU flavor's grace period has stalled, this
will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
quiescent state visible from other CPUs.
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
  was used in preemptible code. ]

5cd37193

15 1月, 2015 1 次提交

netdevice: Add missing parentheses in macro · 4ccce02e

由 Benjamin Poirier 提交于 1月 14, 2015

For example, one could conceivably call
	for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
and get an unexpected result.
Signed-off-by: NBenjamin Poirier <bpoirier@suse.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ccce02e

14 1月, 2015 2 次提交

perf: Avoid horrible stack usage · 86038c5e

由 Peter Zijlstra (Intel) 提交于 12月 16, 2014

Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.

The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.

Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.

This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.

Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.

We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to assertain which __perf_regs
element to use).

One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reported-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

86038c5e

locking/mcs: Better differentiate between MCS variants · d84b6728

由 Davidlohr Bueso 提交于 1月 06, 2015

We have two flavors of the MCS spinlock: standard and cancelable (OSQ).
While each one is independent of the other, we currently mix and match
them. This patch:

  - Moves the OSQ code out of mcs_spinlock.h (which only deals with the traditional
    version) into include/linux/osq_lock.h. No unnecessary code is added to the
    more global header file, anything locks that make use of OSQ must include
    it anyway.

  - Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.

  - Introduces a CONFIG_LOCK_SPIN_ON_OWNER in order to only build osq_lock
    if there is support for it.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1420573509-24774-5-git-send-email-dave@stgolabs.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

d84b6728

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功