1. 11 2月, 2015 1 次提交
    • J
      fsnotify: fix handling of renames in audit · 6ee8e25f
      Jan Kara 提交于
      Commit e9fd702a ("audit: convert audit watches to use fsnotify
      instead of inotify") broke handling of renames in audit.  Audit code
      wants to update inode number of an inode corresponding to watched name
      in a directory.  When something gets renamed into a directory to a
      watched name, inotify previously passed moved inode to audit code
      however new fsnotify code passes directory inode where the change
      happened.  That confuses audit and it starts watching parent directory
      instead of a file in a directory.
      
      This can be observed for example by doing:
      
        cd /tmp
        touch foo bar
        auditctl -w /tmp/foo
        touch foo
        mv bar foo
        touch foo
      
      In audit log we see events like:
      
        type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
        ...
        type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
        type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
        type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
        ...
      
      and that's it - we see event for the first touch after creating the
      audit rule, we see events for rename but we don't see any event for the
      last touch.  However we start seeing events for unrelated stuff
      happening in /tmp.
      
      Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
      FS_MOVED_TO events instead of the directory where the change happens.
      This doesn't introduce any new problems because noone besides
      audit_watch.c cares about the passed value:
      
        fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
        fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
        fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
        kernel/audit_tree.c doesn't care about passed 'data' at all.
        kernel/audit_watch.c expects moved inode as 'data'.
      
      Fixes: e9fd702a ("audit: convert audit watches to use fsnotify instead of inotify")
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ee8e25f
  2. 08 2月, 2015 1 次提交
  3. 05 2月, 2015 1 次提交
  4. 04 2月, 2015 3 次提交
    • K
      regulator: Fix build breakage on !REGULATOR · 50910276
      Krzysztof Kozlowski 提交于
      Add missing stubs for regulator_suspend_prepare() and
      regulator_suspend_finish() to fix exynos_defconfig build without
      REGULATOR:
      
      arch/arm/mach-exynos/built-in.o: In function `exynos_suspend_finish':
      arch/arm/mach-exynos/suspend.c:537: undefined reference to `regulator_suspend_finish'
      arch/arm/mach-exynos/built-in.o: In function `exynos_suspend_prepare':
      arch/arm/mach-exynos/suspend.c:520: undefined reference to `regulator_suspend_prepare'
      make: *** [vmlinux] Error 1
      Signed-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Reported-by: NJoerg Roedel <joro@8bytes.org>
      Reported-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      50910276
    • M
      perf: Decouple unthrottling and rotating · 2fde4f94
      Mark Rutland 提交于
      Currently the adjusments made as part of perf_event_task_tick() use the
      percpu rotation lists to iterate over any active PMU contexts, but these
      are not used by the context rotation code, having been replaced by
      separate (per-context) hrtimer callbacks. However, some manipulation of
      the rotation lists (i.e. removal of contexts) has remained in
      perf_rotate_context(). This leads to the following issues:
      
      * Contexts are not always removed from the rotation lists. Removal of
        PMUs which have been placed in rotation lists, but have not been
        removed by a hrtimer callback can result in corruption of the rotation
        lists (when memory backing the context is freed).
      
        This has been observed to result in hangs when PMU drivers built as
        modules are inserted and removed around the creation of events for
        said PMUs.
      
      * Contexts which do not require rotation may be removed from the
        rotation lists as a result of a hrtimer, and will not be considered by
        the unthrottling code in perf_event_task_tick.
      
      This patch fixes the issue by updating the rotation ist when events are
      scheduled in/out, ensuring that each rotation list stays in sync with
      the HW state. As each event holds a refcount on the module of its PMU,
      this ensures that when a PMU module is unloaded none of its CPU contexts
      can be in a rotation list. By maintaining a list of perf_event_contexts
      rather than perf_event_cpu_contexts, we don't need separate paths to
      handle the cpu and task contexts, which also makes the code a little
      simpler.
      
      As the rotation_list variables are not used for rotation, these are
      renamed to active_ctx_list, which better matches their current function.
      perf_pmu_rotate_{start,stop} are renamed to
      perf_pmu_ctx_{activate,deactivate}.
      Reported-by: NJohannes Jensen <johannes.jensen@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostejSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2fde4f94
    • J
      sched/wait: Introduce wait_on_bit_timeout() · 44fc0e5e
      Johan Hedberg 提交于
      Add a new wait_on_bit_timeout() helper, basically the same as
      wait_on_bit() except that it also takes a 'timeout' parameter.
      
      All the building blocks like bit_wait_timeout() and
      out_of_line_wait_on_bit_timeout() are already in place so the
      addition is rather simple.
      Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: davem@davemloft.net
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1422616476-2917-2-git-send-email-johan.hedberg@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      44fc0e5e
  5. 03 2月, 2015 2 次提交
  6. 02 2月, 2015 1 次提交
    • L
      sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Linus Torvalds 提交于
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop not actually sleep when it
      calls schedule.
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all, because now the nested sleep doesn't just
      sometimes cause the outer one to not block, but will cause it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep.
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
      Reported-and-bisected-by: NBruno Prémont <bonbons@linux-vserver.org>
      Tested-by: NRafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>,
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>,
      Cc: Davidlohr Bueso <dave@stgolabs.net>,
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  7. 31 1月, 2015 1 次提交
  8. 30 1月, 2015 1 次提交
    • L
      vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Linus Torvalds 提交于
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
      Tested-by: NJan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33692f27
  9. 29 1月, 2015 3 次提交
  10. 28 1月, 2015 2 次提交
    • P
      perf: Tighten (and fix) the grouping condition · c3c87e77
      Peter Zijlstra 提交于
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c3c87e77
    • J
      quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara 提交于
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Upto now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck
      number of bytes into d_bcount field of struct fd_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
      tried this but reportedly some Samba users hit the problem in practice.
      So when we want interfaces compatible we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
      to have more conversion routines in fs/quota/quota.c and another copying
      of quota structure slows down getting of quota information by about 2%
      but it seems cleaner than overloading e.g. units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      14bf61ff
  11. 27 1月, 2015 4 次提交
  12. 24 1月, 2015 4 次提交
    • X
      rtc: Convert rtc_set_ntp_time() to use timespec64 · 9a4a445e
      Xunlei Pang 提交于
      rtc_set_ntp_time() uses timespec which is y2038-unsafe,
      so modify to use timespec64 which is y2038-safe, then
      replace rtc_time_to_tm() with rtc_time64_to_tm().
      
      Also adjust all its call sites(only NTP uses it) accordingly.
      
      Cc: pang.xunlei <pang.xunlei@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      9a4a445e
    • J
      time: Expose get_monotonic_boottime64 for in-kernel use · 2e0c78ee
      John Stultz 提交于
      As part of the 2038 conversion process, add a
      get_monotonic_boottime64 accessor so we can depracate
      get_monotonic_boottime.
      
      Cc: pang.xunlei <pang.xunlei@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      2e0c78ee
    • J
      time: Expose getboottime64 for in-kernel uses · d08c0cdd
      John Stultz 提交于
      Adds a timespec64 based getboottime64() implementation
      that can be used as we convert internal users of
      getboottime away from using timespecs.
      
      Cc: pang.xunlei <pang.xunlei@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      d08c0cdd
    • N
      ktime: Optimize ktime_divns for constant divisors · 8b618628
      Nicolas Pitre 提交于
      At least on ARM, do_div() is optimized to turn constant divisors into
      an inline multiplication by the reciprocal value at compile time.
      However this optimization is missed entirely whenever ktime_divns() is
      used and the slow out-of-line division code is used all the time.
      
      Let ktime_divns() use do_div() inline whenever the divisor is constant
      and small enough.  This will make things like ktime_to_us() and
      ktime_to_ms() much faster.
      
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      8b618628
  13. 23 1月, 2015 2 次提交
    • T
      hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919
      Thomas Gleixner 提交于
      hrtimer_interrupt() has the following subtle issue:
      
      hrtimer_interrupt()
        lock(cpu_base);
        expires_next = KTIME_MAX;
      
        expire_timers(CLOCK_MONOTONIC);
        expires = get_next_timer(CLOCK_MONOTONIC);
        if (expires < expires_next)
          expires_next = expires;
      
        expire_timers(CLOCK_REALTIME);
          unlock(cpu_base);
          wakeup()
          hrtimer_start(CLOCK_MONOTONIC, newtimer);
          lock(cpu_base();  
        expires = get_next_timer(CLOCK_REALTIME);
        if (expires < expires_next)
          expires_next = expires;
      
      So because we already evaluated the next expiring timer of
      CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
      earlier than the overall next expiry time in hrtimer_interrupt().
      
      To solve this, remove the caching of the next expiry value from
      hrtimer_interrupt() and reevaluate all active clock bases for the next
      expiry value. To avoid another code duplication, create a shared
      evaluation function and use it for hrtimer_get_next_event(),
      hrtimer_force_reprogram() and hrtimer_interrupt().
      
      There is another subtlety in this mechanism:
      
      While hrtimer_interrupt() is running, we want to avoid to touch the
      hardware device because we will reprogram it anyway at the end of
      hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
      via the HRTIMER_RESTART mechanism, because we drop out when the
      callback on that CPU is running. But that fails, if a new timer gets
      enqueued like in the example above.
      
      This has another implication: While hrtimer_interrupt() is running we
      refuse remote enqueueing of timers - see hrtimer_interrupt() and
      hrtimer_check_target().
      
      hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
      to KTIME_MAX, but that fails if a new timer gets queued.
      
      Prevent both the hardware access and the remote enqueue
      explicitely. We can loosen the restriction on the remote enqueue now
      due to reevaluation of the next expiry value, but that needs a
      seperate patch.
      
      Folded in a fix from Vignesh Radhakrishnan.
      Reported-and-tested-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Based-on-patch-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: vigneshr@codeaurora.org
      Cc: john.stultz@linaro.org
      Cc: viresh.kumar@linaro.org
      Cc: fweisbec@gmail.com
      Cc: cl@linux.com
      Cc: stuart.w.hayes@gmail.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9bc74919
    • C
      ktime.h: Introduce ktime_ms_delta · 41fbf3b3
      Chunyan Zhang 提交于
      This patch adds a reusable time difference function which returns the
      difference in millisecond, as often used in some driver code, e.g.
      mtd/test, media/rc, etc.
      Signed-off-by: NChunyan Zhang <zhang.chunyan@linaro.org>
      Acked-by: NArnd Bergmann <arnd@linaro.org>
      Cc: zhang.lyra@gmail.com
      Cc: davem@davemloft.net
      Cc: john.stultz@linaro.org
      Cc: dborkman@redhat.com
      Link: http://lkml.kernel.org/r/1418793095-18780-1-git-send-email-zhang.chunyan@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      41fbf3b3
  14. 22 1月, 2015 2 次提交
  15. 20 1月, 2015 2 次提交
    • R
      module: remove mod arg from module_free, rename module_memfree(). · be1f221c
      Rusty Russell 提交于
      Nothing needs the module pointer any more, and the next patch will
      call it from RCU, where the module itself might no longer exist.
      Removing the arg is the safest approach.
      
      This just codifies the use of the module_alloc/module_free pattern
      which ftrace and bpf use.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Cc: netdev@vger.kernel.org
      be1f221c
    • R
      module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded
      Rusty Russell 提交于
      Archs have been abusing module_free() to clean up their arch-specific
      allocations.  Since module_free() is also (ab)used by BPF and trace code,
      let's keep it to simple allocations, and provide a hook called before
      that.
      
      This means that avr32, ia64, parisc and s390 no longer need to implement
      their own module_free() at all.  avr32 doesn't need module_finalize()
      either.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      d453cded
  16. 19 1月, 2015 2 次提交
  17. 17 1月, 2015 3 次提交
    • J
      genetlink: synchronize socket closing and family removal · ee1c2442
      Johannes Berg 提交于
      In addition to the problem Jeff Layton reported, I looked at the code
      and reproduced the same warning by subscribing and removing the genl
      family with a socket still open. This is a fairly tricky race which
      originates in the fact that generic netlink allows the family to go
      away while sockets are still open - unlike regular netlink which has
      a module refcount for every open socket so in general this cannot be
      triggered.
      
      Trying to resolve this issue by the obvious locking isn't possible as
      it will result in deadlocks between unregistration and group unbind
      notification (which incidentally lockdep doesn't find due to the home
      grown locking in the netlink table.)
      
      To really resolve this, introduce a "closing socket" reference counter
      (for generic netlink only, as it's the only affected family) in the
      core netlink code and use that in generic netlink to wait for all the
      sockets that are being closed at the same time as a generic netlink
      family is removed.
      
      This fixes the race that when a socket is closed, it will should call
      the unbind, but if the family is removed at the same time the unbind
      will not find it, leading to the warning. The real problem though is
      that in this case the unbind could actually find a new family that is
      registered to have a multicast group with the same ID, and call its
      mcast_unbind() leading to confusing.
      
      Also remove the warning since it would still trigger, but is now no
      longer a problem.
      
      This also moves the code in af_netlink.c to before unreferencing the
      module to avoid having the same problem in the normal non-genl case.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1c2442
    • Y
      PCI: Add pci_claim_bridge_resource() to clip window if necessary · 8505e729
      Yinghai Lu 提交于
      Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window.  This is
      like regular pci_claim_resource(), except that if we fail to claim the
      window, we check to see if we can reduce the size of the window and try
      again.
      
      This is for scenarios like this:
      
        pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
        pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
        pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
      
      The 00:01.0 window is illegal: it starts before the host bridge window, so
      we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible.  We
      can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref].
      
      Previously we discarded the 00:01.0 window and tried to reassign that part
      of the hierarchy from scratch.  That is a problem because Linux doesn't
      always assign things optimally.  For example, in this case, BIOS put the
      01:00.0 device in a prefetchable window below 4GB, but after 5b285415,
      Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0
      device can't use it.
      
      Clipping the 00:01.0 window is less intrusive than completely reassigning
      things and is sufficient to let us use most of the BIOS configuration.  Of
      course, it's possible that devices below 00:01.0 will no longer fit.  If
      that's the case, we'll have to reassign things.  But that's a separate
      problem.
      
      [bhelgaas: changelog, split into separate patch]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491Reported-by: NMarek Kordik <kordikmarek@gmail.com>
      Fixes: 5b285415 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.16+
      8505e729
    • A
      PCI: Add flag for devices where we can't use bus reset · f331a859
      Alex Williamson 提交于
      Enable a mechanism for devices to quirk that they do not behave when
      doing a PCI bus reset.  We require a modest level of spec compliant
      behavior in order to do a reset, for instance the device should come
      out of reset without throwing errors and PCI config space should be
      accessible after reset.  This is too much to ask for some devices.
      
      Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.orgSigned-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.14+
      f331a859
  18. 16 1月, 2015 2 次提交
    • J
      regulator: da9211: fix unmatched of_node · 076c3b8e
      James Ban 提交于
      This is a patch for fixing unmatched of_node.
      Signed-off-by: NJames Ban <james.ban.opensource@diasemi.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      076c3b8e
    • P
      rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Paul E. McKenney 提交于
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
      5cd37193
  19. 15 1月, 2015 1 次提交
  20. 14 1月, 2015 2 次提交
    • P
      perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) 提交于
      Both Linus (most recent) and Steve (a while ago) reported that perf
      related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind stack. And because we
      could not assume one was present we allocated one on stack and filled
      it with minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to assertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86038c5e
    • D
      locking/mcs: Better differentiate between MCS variants · d84b6728
      Davidlohr Bueso 提交于
      We have two flavors of the MCS spinlock: standard and cancelable (OSQ).
      While each one is independent of the other, we currently mix and match
      them. This patch:
      
        - Moves the OSQ code out of mcs_spinlock.h (which only deals with the traditional
          version) into include/linux/osq_lock.h. No unnecessary code is added to the
          more global header file, anything locks that make use of OSQ must include
          it anyway.
      
        - Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.
      
        - Introduces a CONFIG_LOCK_SPIN_ON_OWNER in order to only build osq_lock
          if there is support for it.
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Link: http://lkml.kernel.org/r/1420573509-24774-5-git-send-email-dave@stgolabs.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d84b6728