1. 04 2月, 2015 2 次提交
  2. 02 2月, 2015 1 次提交
    • L
      sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Linus Torvalds 提交于
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop not actually sleep when it
      calls schedule.
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all, because now the nested sleep doesn't just
      sometimes cause the outer one to not block, but will cause it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep.
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
      Reported-and-bisected-by: NBruno Prémont <bonbons@linux-vserver.org>
      Tested-by: NRafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>,
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>,
      Cc: Davidlohr Bueso <dave@stgolabs.net>,
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  3. 28 1月, 2015 1 次提交
    • P
      perf: Tighten (and fix) the grouping condition · c3c87e77
      Peter Zijlstra 提交于
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c3c87e77
  4. 27 1月, 2015 1 次提交
  5. 23 1月, 2015 1 次提交
  6. 22 1月, 2015 2 次提交
  7. 20 1月, 2015 4 次提交
    • R
      module: fix race in kallsyms resolution during module load success. · c7496379
      Rusty Russell 提交于
      The kallsyms routines (module_symbol_name, lookup_module_* etc) disable
      preemption to walk the modules rather than taking the module_mutex:
      this is because they are used for symbol resolution during oopses.
      
      This works because there are synchronize_sched() and synchronize_rcu()
      in the unload and failure paths.  However, there's one case which doesn't
      have that: the normal case where module loading succeeds, and we free
      the init section.
      
      We don't want a synchronize_rcu() there, because it would slow down
      module loading: this bug was introduced in 2009 to speed module
      loading in the first place.
      
      Thus, we want to do the free in an RCU callback.  We do this in the
      simplest possible way by allocating a new rcu_head: if we put it in
      the module structure we'd have to worry about that getting freed.
      Reported-by: NRui Xiang <rui.xiang@huawei.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      c7496379
    • R
      module: remove mod arg from module_free, rename module_memfree(). · be1f221c
      Rusty Russell 提交于
      Nothing needs the module pointer any more, and the next patch will
      call it from RCU, where the module itself might no longer exist.
      Removing the arg is the safest approach.
      
      This just codifies the use of the module_alloc/module_free pattern
      which ftrace and bpf use.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Cc: netdev@vger.kernel.org
      be1f221c
    • R
      module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded
      Rusty Russell 提交于
      Archs have been abusing module_free() to clean up their arch-specific
      allocations.  Since module_free() is also (ab)used by BPF and trace code,
      let's keep it to simple allocations, and provide a hook called before
      that.
      
      This means that avr32, ia64, parisc and s390 no longer need to implement
      their own module_free() at all.  avr32 doesn't need module_finalize()
      either.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      d453cded
    • R
      param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC · c772be52
      Rusty Russell 提交于
      ignore_lockdep is uninitialized, and sysfs_attr_init() doesn't initialize
      it, so memset to 0.
      Reported-by: NHuang Ying <ying.huang@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      c772be52
  8. 17 1月, 2015 2 次提交
    • L
      kernel: avoid overflow in cmp_range · fc7f0dd3
      Louis Langholtz 提交于
      Avoid overflow possibility.
      
      [ The overflow is purely theoretical, since this is used for memory
        ranges that aren't even close to using the full 64 bits, but this is
        the right thing to do regardless.  - Linus ]
      Signed-off-by: NLouis Langholtz <lou_langholtz@me.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Peter Anvin <hpa@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc7f0dd3
    • T
      workqueue: fix subtle pool management issue which can stall whole worker_pool · 29187a9e
      Tejun Heo 提交于
      A worker_pool's forward progress is guaranteed by the fact that the
      last idle worker assumes the manager role to create more workers and
      summon the rescuers if creating workers doesn't succeed in timely
      manner before proceeding to execute work items.
      
      This manager role is implemented in manage_workers(), which indicates
      whether the worker may proceed to work item execution with its return
      value.  This is necessary because multiple workers may contend for the
      manager role, and, if there already is a manager, others should
      proceed to work item execution.
      
      Unfortunately, the function also indicates that the worker may proceed
      to work item execution if need_to_create_worker() is false at the head
      of the function.  need_to_create_worker() tests the following
      conditions.
      
      	pending work items && !nr_running && !nr_idle
      
      The first and third conditions are protected by pool->lock and thus
      won't change while holding pool->lock; however, nr_running can change
      asynchronously as other workers block and resume and while it's likely
      to be zero, as someone woke this worker up in the first place, some
      other workers could have become runnable inbetween making it non-zero.
      
      If this happens, manage_worker() could return false even with zero
      nr_idle making the worker, the last idle one, proceed to execute work
      items.  If then all workers of the pool end up blocking on a resource
      which can only be released by a work item which is pending on that
      pool, the whole pool can deadlock as there's no one to create more
      workers or summon the rescuers.
      
      This patch fixes the problem by removing the early exit condition from
      maybe_create_worker() and making manage_workers() return false iff
      there's already another manager, which ensures that the last worker
      doesn't start executing work items.
      
      We can leave the early exit condition alone and just ignore the return
      value but the only reason it was put there is because the
      manage_workers() used to perform both creations and destructions of
      workers and thus the function may be invoked while the pool is trying
      to reduce the number of workers.  Now that manage_workers() is called
      only when more workers are needed, the only case this early exit
      condition is triggered is rare race conditions rendering it pointless.
      
      Tested with simulated workload and modified workqueue code which
      trigger the pool deadlock reliably without this patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NEric Sandeen <sandeen@sandeen.net>
      Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: stable@vger.kernel.org
      29187a9e
  9. 15 1月, 2015 4 次提交
  10. 14 1月, 2015 1 次提交
    • P
      perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) 提交于
      Both Linus (most recent) and Steve (a while ago) reported that perf
      related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind stack. And because we
      could not assume one was present we allocated one on stack and filled
      it with minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to assertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86038c5e
  11. 09 1月, 2015 7 次提交
  12. 08 1月, 2015 2 次提交
  13. 30 12月, 2014 1 次提交
    • P
      audit: create private file name copies when auditing inodes · fcf22d82
      Paul Moore 提交于
      Unfortunately, while commit 4a928436 ("audit: correctly record file
      names with different path name types") fixed a problem where we were
      not recording filenames, it created a new problem by attempting to use
      these file names after they had been freed.  This patch resolves the
      issue by creating a copy of the filename which the audit subsystem
      frees after it is done with the string.
      
      At some point it would be nice to resolve this issue with refcounts,
      or something similar, instead of having to allocate/copy strings, but
      that is almost surely beyond the scope of a -rcX patch so we'll defer
      that for later.  On the plus side, only audit users should be impacted
      by the string copying.
      Reported-by: NToralf Foerster <toralf.foerster@gmx.de>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      fcf22d82
  14. 27 12月, 2014 1 次提交
    • J
      netlink/genetlink: pass network namespace to bind/unbind · 023e2cfa
      Johannes Berg 提交于
      Netlink families can exist in multiple namespaces, and for the most
      part multicast subscriptions are per network namespace. Thus it only
      makes sense to have bind/unbind notifications per network namespace.
      
      To achieve this, pass the network namespace of a given client socket
      to the bind/unbind functions.
      
      Also do this in generic netlink, and there also make sure that any
      bind for multicast groups that only exist in init_net is rejected.
      This isn't really a problem if it is accepted since a client in a
      different namespace will never receive any notifications from such
      a group, but it can confuse the family if not rejected (it's also
      possible to silently (without telling the family) accept it, but it
      would also have to be ignored on unbind so families that take any
      kind of action on bind/unbind won't do unnecessary work for invalid
      clients like that.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      023e2cfa
  15. 24 12月, 2014 1 次提交
    • R
      audit: restore AUDIT_LOGINUID unset ABI · 041d7b98
      Richard Guy Briggs 提交于
      A regression was caused by commit 780a7654:
      	 audit: Make testing for a valid loginuid explicit.
      (which in turn attempted to fix a regression caused by e1760bd5)
      
      When audit_krule_to_data() fills in the rules to get a listing, there was a
      missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.
      
      This broke userspace by not returning the same information that was sent and
      expected.
      
      The rule:
      	auditctl -a exit,never -F auid=-1
      gives:
      	auditctl -l
      		LIST_RULES: exit,never f24=0 syscall=all
      when it should give:
      		LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all
      
      Tag it so that it is reported the same way it was set.  Create a new
      private flags audit_krule field (pflags) to store it that won't interact with
      the public one from the API.
      
      Cc: stable@vger.kernel.org # v3.10-rc1+
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      041d7b98
  16. 23 12月, 2014 3 次提交
    • A
      sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation · b74e6278
      Alex Thorlton 提交于
      When allocating space for load_balance_mask, in sched_init, when
      CPUMASK_OFFSTACK is set, we've managed to spill over
      KMALLOC_MAX_SIZE on our 6144 core machine.  The patch below
      breaks up the allocations so that they don't overflow the max
      alloc size.  It also allocates the masks on the the node from
      which they'll most commonly be accessed, to minimize remote
      accesses on NUMA machines.
      Suggested-by: NGeorge Beshers <gbeshers@sgi.com>
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: George Beshers <gbeshers@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418928270-148543-1-git-send-email-athorlton@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b74e6278
    • R
      param: initialize store function to NULL if not available. · 574732c7
      Rusty Russell 提交于
      I rebased Kees' 'param: do not set store func without write perm'
      on top of my 'params: cleanup sysfs allocation'.  However, my patch
      uses krealloc which doesn't zero memory, leaving .store unset.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      574732c7
    • P
      audit: correctly record file names with different path name types · 4a928436
      Paul Moore 提交于
      There is a problem with the audit system when multiple audit records
      are created for the same path, each with a different path name type.
      The root cause of the problem is in __audit_inode() when an exact
      match (both the path name and path name type) is not found for a
      path name record; the existing code creates a new path name record,
      but it never sets the path name in this record, leaving it NULL.
      This patch corrects this problem by assigning the path name to these
      newly created records.
      
      There are many ways to reproduce this problem, but one of the
      easiest is the following (assuming auditd is running):
      
        # mkdir /root/tmp/test
        # touch /root/tmp/test/567
        # auditctl -a always,exit -F dir=/root/tmp/test
        # touch /root/tmp/test/567
      
      Afterwards, or while the commands above are running, check the audit
      log and pay special attention to the PATH records.  A faulty kernel
      will display something like the following for the file creation:
      
        type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2
          success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
        type=CWD msg=audit(1416957442.025:93):  cwd="/root/tmp"
        type=PATH msg=audit(1416957442.025:93): item=0 name="test/"
          inode=401409 ... nametype=PARENT
        type=PATH msg=audit(1416957442.025:93): item=1 name=(null)
          inode=393804 ... nametype=NORMAL
        type=PATH msg=audit(1416957442.025:93): item=2 name=(null)
          inode=393804 ... nametype=NORMAL
      
      While a patched kernel will show the following:
      
        type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2
          success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
        type=CWD msg=audit(1416955786.566:89):  cwd="/root/tmp"
        type=PATH msg=audit(1416955786.566:89): item=0 name="test/"
          inode=401409 ... nametype=PARENT
        type=PATH msg=audit(1416955786.566:89): item=1 name="test/567"
          inode=393804 ... nametype=NORMAL
      
      This issue was brought up by a number of people, but special credit
      should go to hujianyang@huawei.com for reporting the problem along
      with an explanation of the problem and a patch.  While the original
      patch did have some problems (see the archive link below), it did
      demonstrate the problem and helped kickstart the fix presented here.
      
        * https://lkml.org/lkml/2014/9/5/66Reported-by: Nhujianyang <hujianyang@huawei.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NRichard Guy Briggs <rgb@redhat.com>
      4a928436
  17. 20 12月, 2014 3 次提交
    • R
      audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb · 54dc77d9
      Richard Guy Briggs 提交于
      Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
      audit_log_end(), which can come from any context (aka even a sleeping context)
      GFP_KERNEL can't be used.  Since the audit_buffer knows what context it should
      use, pass that down and use that.
      
      See: https://lkml.org/lkml/2014/12/16/542
      
      BUG: sleeping function called from invalid context at mm/slab.c:2849
      in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
      2 locks held by sulogin/885:
        #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
        #1:  (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
      CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
      Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
        ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
        ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
        0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
      Call Trace:
        [<ffffffff916ba529>] dump_stack+0x50/0xa8
        [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
        [<ffffffff910632a6>] __might_sleep+0x119/0x128
        [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
        [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
        [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
        [<ffffffff914e2b62>] skb_copy+0x3e/0xa3
        [<ffffffff910c263e>] audit_log_end+0x83/0x100
        [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
        [<ffffffff91252a73>] common_lsm_audit+0x441/0x450
        [<ffffffff9123c163>] slow_avc_audit+0x63/0x67
        [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
        [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
        [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
        [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
        [<ffffffff911515e6>] install_exec_creds+0xe/0x79
        [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
        [<ffffffff9115198e>] search_binary_handler+0x81/0x18c
        [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
        [<ffffffff91153669>] do_execve+0x1f/0x21
        [<ffffffff91153967>] SyS_execve+0x25/0x29
        [<ffffffff916c61a9>] stub_execve+0x69/0xa0
      
      Cc: stable@vger.kernel.org #v3.16-rc1
      Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Tested-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      54dc77d9
    • P
      audit: don't attempt to lookup PIDs when changing PID filtering audit rules · 3640dcfa
      Paul Moore 提交于
      Commit f1dc4867 ("audit: anchor all pid references in the initial pid
      namespace") introduced a find_vpid() call when adding/removing audit
      rules with PID/PPID filters; unfortunately this is problematic as
      find_vpid() only works if there is a task with the associated PID
      alive on the system.  The following commands demonstrate a simple
      reproducer.
      
      	# auditctl -D
      	# auditctl -l
      	# autrace /bin/true
      	# auditctl -l
      
      This patch resolves the problem by simply using the PID provided by
      the user without any additional validation, e.g. no calls to check to
      see if the task/PID exists.
      
      Cc: stable@vger.kernel.org # 3.15
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NRichard Guy Briggs <rgb@redhat.com>
      3640dcfa
    • R
      PM: Eliminate CONFIG_PM_RUNTIME · 464ed18e
      Rafael J. Wysocki 提交于
      Having switched over all of the users of CONFIG_PM_RUNTIME to use
      CONFIG_PM directly, turn the latter into a user-selectable option
      and drop the former entirely from the tree.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      Acked-by: NKevin Hilman <khilman@linaro.org>
      464ed18e
  18. 19 12月, 2014 1 次提交
    • T
      tick/powerclamp: Remove tick_nohz_idle abuse · a5fd9733
      Thomas Gleixner 提交于
      commit 4dbd2771 "tick: export nohz tick idle symbols for module
      use" was merged via the thermal tree without an explicit ack from the
      relevant maintainers.
      
      The exports are abused by the intel powerclamp driver which implements
      a fake idle state from a sched FIFO task. This causes all kinds of
      wreckage in the NOHZ core code which rightfully assumes that
      tick_nohz_idle_enter/exit() are only called from the idle task itself.
      
      Recent changes in the NOHZ core lead to a failure of the powerclamp
      driver and now people try to hack completely broken and backwards
      workarounds into the NOHZ core code. This is completely unacceptable
      and just papers over the real problem. There are way more subtle
      issues lurking around the corner.
      
      The real solution is to fix the powerclamp driver by rewriting it with
      a sane concept, but that's beyond the scope of this.
      
      So the only solution for now is to remove the calls into the core NOHZ
      code from the powerclamp trainwreck along with the exports. 
      
      Fixes: d6d71ee4 "PM: Introduce Intel PowerClamp Driver"
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
      Cc: LKP <lkp@01.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a5fd9733
  19. 18 12月, 2014 1 次提交
    • K
      param: do not set store func without write perm · b0a65b0c
      Kees Cook 提交于
      When a module_param is defined without DAC write permissions, it can
      still be changed at runtime and updated. Drivers using a 0444 permission
      may be surprised that these values can still be changed.
      
      For drivers that want to allow updates, any S_IW* flag will set the
      "store" function as before. Drivers without S_IW* flags will have the
      "store" function unset, unforcing a read-only value. Drivers that wish
      neither "store" nor "get" can continue to use "0" for perms to stay out
      of sysfs entirely.
      
      Old behavior:
        # cd /sys/module/snd/parameters
        # ls -l
        total 0
        -r--r--r-- 1 root root 4096 Dec 11 13:55 cards_limit
        -r--r--r-- 1 root root 4096 Dec 11 13:55 major
        -r--r--r-- 1 root root 4096 Dec 11 13:55 slots
        # cat major
        116
        # echo -1 > major
        -bash: major: Permission denied
        # chmod u+w major
        # echo -1 > major
        # cat major
        -1
      
      New behavior:
        ...
        # chmod u+w major
        # echo -1 > major
        -bash: echo: write error: Input/output error
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      b0a65b0c
  20. 15 12月, 2014 1 次提交