1. 18 2月, 2015 1 次提交
    • V
      clockevents: Introduce mode specific callbacks · bd624d75
      Viresh Kumar 提交于
      It is not possible for the clockevents core to know which modes (other than
      those with a corresponding feature flag) are supported by a particular
      implementation. And drivers are expected to handle transition to all modes
      elegantly, as ->set_mode() would be issued for them unconditionally.
      
      Now, adding support for a new mode complicates things a bit if we want to use
      the legacy ->set_mode() callback. We need to closely review all clockevents
      drivers to see if they would break on addition of a new mode. And after such
      reviews, it is found that we have to do non-trivial changes to most of the
      drivers [1].
      
      Introduce mode-specific set_mode_*() callbacks, some of which the drivers may or
      may not implement. A missing callback would clearly convey the message that the
      corresponding mode isn't supported.
      
      A driver may still choose to keep supporting the legacy ->set_mode() callback,
      but ->set_mode() wouldn't be supporting any new modes beyond RESUME. If a driver
      wants to benefit from using a new mode, it would be required to migrate to
      the mode specific callbacks.
      
      The legacy ->set_mode() callback and the newly introduced mode-specific
      callbacks are mutually exclusive. Only one of them should be supported by the
      driver.
      
      Sanity check is done at the time of registration to distinguish between optional
      and required callbacks and to make error recovery and handling simpler. If the
      legacy ->set_mode() callback is provided, all mode specific ones would be
      ignored by the core but a warning is thrown if they are present.
      
      Call sites calling ->set_mode() directly are also updated to use
      __clockevents_set_mode() instead, as ->set_mode() may not be available anymore
      for few drivers.
      
       [1] https://lkml.org/lkml/2014/12/9/605
       [2] https://lkml.org/lkml/2015/1/23/255
      
      Suggested-by: Thomas Gleixner <tglx@linutronix.de> [2]
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: linaro-kernel@lists.linaro.org
      Cc: linaro-networking@linaro.org
      Link: http://lkml.kernel.org/r/792d59a40423f0acffc9bb0bec9de1341a06fa02.1423788565.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bd624d75
  2. 24 1月, 2015 4 次提交
  3. 23 1月, 2015 1 次提交
    • T
      hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919
      Thomas Gleixner 提交于
      hrtimer_interrupt() has the following subtle issue:
      
      hrtimer_interrupt()
        lock(cpu_base);
        expires_next = KTIME_MAX;
      
        expire_timers(CLOCK_MONOTONIC);
        expires = get_next_timer(CLOCK_MONOTONIC);
        if (expires < expires_next)
          expires_next = expires;
      
        expire_timers(CLOCK_REALTIME);
          unlock(cpu_base);
          wakeup()
          hrtimer_start(CLOCK_MONOTONIC, newtimer);
          lock(cpu_base();  
        expires = get_next_timer(CLOCK_REALTIME);
        if (expires < expires_next)
          expires_next = expires;
      
      So because we already evaluated the next expiring timer of
      CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
      earlier than the overall next expiry time in hrtimer_interrupt().
      
      To solve this, remove the caching of the next expiry value from
      hrtimer_interrupt() and reevaluate all active clock bases for the next
      expiry value. To avoid another code duplication, create a shared
      evaluation function and use it for hrtimer_get_next_event(),
      hrtimer_force_reprogram() and hrtimer_interrupt().
      
      There is another subtlety in this mechanism:
      
      While hrtimer_interrupt() is running, we want to avoid to touch the
      hardware device because we will reprogram it anyway at the end of
      hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
      via the HRTIMER_RESTART mechanism, because we drop out when the
      callback on that CPU is running. But that fails, if a new timer gets
      enqueued like in the example above.
      
      This has another implication: While hrtimer_interrupt() is running we
      refuse remote enqueueing of timers - see hrtimer_interrupt() and
      hrtimer_check_target().
      
      hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
      to KTIME_MAX, but that fails if a new timer gets queued.
      
      Prevent both the hardware access and the remote enqueue
      explicitely. We can loosen the restriction on the remote enqueue now
      due to reevaluation of the next expiry value, but that needs a
      seperate patch.
      
      Folded in a fix from Vignesh Radhakrishnan.
      Reported-and-tested-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Based-on-patch-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: vigneshr@codeaurora.org
      Cc: john.stultz@linaro.org
      Cc: viresh.kumar@linaro.org
      Cc: fweisbec@gmail.com
      Cc: cl@linux.com
      Cc: stuart.w.hayes@gmail.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9bc74919
  4. 17 1月, 2015 1 次提交
  5. 15 1月, 2015 4 次提交
  6. 09 1月, 2015 7 次提交
  7. 30 12月, 2014 1 次提交
    • P
      audit: create private file name copies when auditing inodes · fcf22d82
      Paul Moore 提交于
      Unfortunately, while commit 4a928436 ("audit: correctly record file
      names with different path name types") fixed a problem where we were
      not recording filenames, it created a new problem by attempting to use
      these file names after they had been freed.  This patch resolves the
      issue by creating a copy of the filename which the audit subsystem
      frees after it is done with the string.
      
      At some point it would be nice to resolve this issue with refcounts,
      or something similar, instead of having to allocate/copy strings, but
      that is almost surely beyond the scope of a -rcX patch so we'll defer
      that for later.  On the plus side, only audit users should be impacted
      by the string copying.
      Reported-by: NToralf Foerster <toralf.foerster@gmx.de>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      fcf22d82
  8. 27 12月, 2014 1 次提交
    • J
      netlink/genetlink: pass network namespace to bind/unbind · 023e2cfa
      Johannes Berg 提交于
      Netlink families can exist in multiple namespaces, and for the most
      part multicast subscriptions are per network namespace. Thus it only
      makes sense to have bind/unbind notifications per network namespace.
      
      To achieve this, pass the network namespace of a given client socket
      to the bind/unbind functions.
      
      Also do this in generic netlink, and there also make sure that any
      bind for multicast groups that only exist in init_net is rejected.
      This isn't really a problem if it is accepted since a client in a
      different namespace will never receive any notifications from such
      a group, but it can confuse the family if not rejected (it's also
      possible to silently (without telling the family) accept it, but it
      would also have to be ignored on unbind so families that take any
      kind of action on bind/unbind won't do unnecessary work for invalid
      clients like that.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      023e2cfa
  9. 24 12月, 2014 1 次提交
    • R
      audit: restore AUDIT_LOGINUID unset ABI · 041d7b98
      Richard Guy Briggs 提交于
      A regression was caused by commit 780a7654:
      	 audit: Make testing for a valid loginuid explicit.
      (which in turn attempted to fix a regression caused by e1760bd5)
      
      When audit_krule_to_data() fills in the rules to get a listing, there was a
      missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.
      
      This broke userspace by not returning the same information that was sent and
      expected.
      
      The rule:
      	auditctl -a exit,never -F auid=-1
      gives:
      	auditctl -l
      		LIST_RULES: exit,never f24=0 syscall=all
      when it should give:
      		LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all
      
      Tag it so that it is reported the same way it was set.  Create a new
      private flags audit_krule field (pflags) to store it that won't interact with
      the public one from the API.
      
      Cc: stable@vger.kernel.org # v3.10-rc1+
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      041d7b98
  10. 23 12月, 2014 2 次提交
    • A
      sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation · b74e6278
      Alex Thorlton 提交于
      When allocating space for load_balance_mask, in sched_init, when
      CPUMASK_OFFSTACK is set, we've managed to spill over
      KMALLOC_MAX_SIZE on our 6144 core machine.  The patch below
      breaks up the allocations so that they don't overflow the max
      alloc size.  It also allocates the masks on the the node from
      which they'll most commonly be accessed, to minimize remote
      accesses on NUMA machines.
      Suggested-by: NGeorge Beshers <gbeshers@sgi.com>
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: George Beshers <gbeshers@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418928270-148543-1-git-send-email-athorlton@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b74e6278
    • P
      audit: correctly record file names with different path name types · 4a928436
      Paul Moore 提交于
      There is a problem with the audit system when multiple audit records
      are created for the same path, each with a different path name type.
      The root cause of the problem is in __audit_inode() when an exact
      match (both the path name and path name type) is not found for a
      path name record; the existing code creates a new path name record,
      but it never sets the path name in this record, leaving it NULL.
      This patch corrects this problem by assigning the path name to these
      newly created records.
      
      There are many ways to reproduce this problem, but one of the
      easiest is the following (assuming auditd is running):
      
        # mkdir /root/tmp/test
        # touch /root/tmp/test/567
        # auditctl -a always,exit -F dir=/root/tmp/test
        # touch /root/tmp/test/567
      
      Afterwards, or while the commands above are running, check the audit
      log and pay special attention to the PATH records.  A faulty kernel
      will display something like the following for the file creation:
      
        type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2
          success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
        type=CWD msg=audit(1416957442.025:93):  cwd="/root/tmp"
        type=PATH msg=audit(1416957442.025:93): item=0 name="test/"
          inode=401409 ... nametype=PARENT
        type=PATH msg=audit(1416957442.025:93): item=1 name=(null)
          inode=393804 ... nametype=NORMAL
        type=PATH msg=audit(1416957442.025:93): item=2 name=(null)
          inode=393804 ... nametype=NORMAL
      
      While a patched kernel will show the following:
      
        type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2
          success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
        type=CWD msg=audit(1416955786.566:89):  cwd="/root/tmp"
        type=PATH msg=audit(1416955786.566:89): item=0 name="test/"
          inode=401409 ... nametype=PARENT
        type=PATH msg=audit(1416955786.566:89): item=1 name="test/567"
          inode=393804 ... nametype=NORMAL
      
      This issue was brought up by a number of people, but special credit
      should go to hujianyang@huawei.com for reporting the problem along
      with an explanation of the problem and a patch.  While the original
      patch did have some problems (see the archive link below), it did
      demonstrate the problem and helped kickstart the fix presented here.
      
        * https://lkml.org/lkml/2014/9/5/66Reported-by: Nhujianyang <hujianyang@huawei.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NRichard Guy Briggs <rgb@redhat.com>
      4a928436
  11. 20 12月, 2014 3 次提交
    • R
      audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb · 54dc77d9
      Richard Guy Briggs 提交于
      Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
      audit_log_end(), which can come from any context (aka even a sleeping context)
      GFP_KERNEL can't be used.  Since the audit_buffer knows what context it should
      use, pass that down and use that.
      
      See: https://lkml.org/lkml/2014/12/16/542
      
      BUG: sleeping function called from invalid context at mm/slab.c:2849
      in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
      2 locks held by sulogin/885:
        #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
        #1:  (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
      CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
      Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
        ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
        ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
        0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
      Call Trace:
        [<ffffffff916ba529>] dump_stack+0x50/0xa8
        [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
        [<ffffffff910632a6>] __might_sleep+0x119/0x128
        [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
        [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
        [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
        [<ffffffff914e2b62>] skb_copy+0x3e/0xa3
        [<ffffffff910c263e>] audit_log_end+0x83/0x100
        [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
        [<ffffffff91252a73>] common_lsm_audit+0x441/0x450
        [<ffffffff9123c163>] slow_avc_audit+0x63/0x67
        [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
        [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
        [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
        [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
        [<ffffffff911515e6>] install_exec_creds+0xe/0x79
        [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
        [<ffffffff9115198e>] search_binary_handler+0x81/0x18c
        [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
        [<ffffffff91153669>] do_execve+0x1f/0x21
        [<ffffffff91153967>] SyS_execve+0x25/0x29
        [<ffffffff916c61a9>] stub_execve+0x69/0xa0
      
      Cc: stable@vger.kernel.org #v3.16-rc1
      Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Tested-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      54dc77d9
    • P
      audit: don't attempt to lookup PIDs when changing PID filtering audit rules · 3640dcfa
      Paul Moore 提交于
      Commit f1dc4867 ("audit: anchor all pid references in the initial pid
      namespace") introduced a find_vpid() call when adding/removing audit
      rules with PID/PPID filters; unfortunately this is problematic as
      find_vpid() only works if there is a task with the associated PID
      alive on the system.  The following commands demonstrate a simple
      reproducer.
      
      	# auditctl -D
      	# auditctl -l
      	# autrace /bin/true
      	# auditctl -l
      
      This patch resolves the problem by simply using the PID provided by
      the user without any additional validation, e.g. no calls to check to
      see if the task/PID exists.
      
      Cc: stable@vger.kernel.org # 3.15
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NRichard Guy Briggs <rgb@redhat.com>
      3640dcfa
    • R
      PM: Eliminate CONFIG_PM_RUNTIME · 464ed18e
      Rafael J. Wysocki 提交于
      Having switched over all of the users of CONFIG_PM_RUNTIME to use
      CONFIG_PM directly, turn the latter into a user-selectable option
      and drop the former entirely from the tree.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      Acked-by: NKevin Hilman <khilman@linaro.org>
      464ed18e
  12. 19 12月, 2014 1 次提交
    • T
      tick/powerclamp: Remove tick_nohz_idle abuse · a5fd9733
      Thomas Gleixner 提交于
      commit 4dbd2771 "tick: export nohz tick idle symbols for module
      use" was merged via the thermal tree without an explicit ack from the
      relevant maintainers.
      
      The exports are abused by the intel powerclamp driver which implements
      a fake idle state from a sched FIFO task. This causes all kinds of
      wreckage in the NOHZ core code which rightfully assumes that
      tick_nohz_idle_enter/exit() are only called from the idle task itself.
      
      Recent changes in the NOHZ core lead to a failure of the powerclamp
      driver and now people try to hack completely broken and backwards
      workarounds into the NOHZ core code. This is completely unacceptable
      and just papers over the real problem. There are way more subtle
      issues lurking around the corner.
      
      The real solution is to fix the powerclamp driver by rewriting it with
      a sane concept, but that's beyond the scope of this.
      
      So the only solution for now is to remove the calls into the core NOHZ
      code from the powerclamp trainwreck along with the exports. 
      
      Fixes: d6d71ee4 "PM: Introduce Intel PowerClamp Driver"
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
      Cc: LKP <lkp@01.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a5fd9733
  13. 18 12月, 2014 1 次提交
    • K
      param: do not set store func without write perm · b0a65b0c
      Kees Cook 提交于
      When a module_param is defined without DAC write permissions, it can
      still be changed at runtime and updated. Drivers using a 0444 permission
      may be surprised that these values can still be changed.
      
      For drivers that want to allow updates, any S_IW* flag will set the
      "store" function as before. Drivers without S_IW* flags will have the
      "store" function unset, unforcing a read-only value. Drivers that wish
      neither "store" nor "get" can continue to use "0" for perms to stay out
      of sysfs entirely.
      
      Old behavior:
        # cd /sys/module/snd/parameters
        # ls -l
        total 0
        -r--r--r-- 1 root root 4096 Dec 11 13:55 cards_limit
        -r--r--r-- 1 root root 4096 Dec 11 13:55 major
        -r--r--r-- 1 root root 4096 Dec 11 13:55 slots
        # cat major
        116
        # echo -1 > major
        -bash: major: Permission denied
        # chmod u+w major
        # echo -1 > major
        # cat major
        -1
      
      New behavior:
        ...
        # chmod u+w major
        # echo -1 > major
        -bash: echo: write error: Input/output error
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      b0a65b0c
  14. 15 12月, 2014 2 次提交
  15. 14 12月, 2014 8 次提交
  16. 13 12月, 2014 2 次提交
    • T
      genirq: Prevent proc race against freeing of irq descriptors · c291ee62
      Thomas Gleixner 提交于
      Since the rework of the sparse interrupt code to actually free the
      unused interrupt descriptors there exists a race between the /proc
      interfaces to the irq subsystem and the code which frees the interrupt
      descriptor.
      
      CPU0				CPU1
      				show_interrupts()
      				  desc = irq_to_desc(X);
      free_desc(desc)
        remove_from_radix_tree();
        kfree(desc);
      				  raw_spinlock_irq(&desc->lock);
      
      /proc/interrupts is the only interface which can actively corrupt
      kernel memory via the lock access. /proc/stat can only read from freed
      memory. Extremly hard to trigger, but possible.
      
      The interfaces in /proc/irq/N/ are not affected by this because the
      removal of the proc file is serialized in procfs against concurrent
      readers/writers. The removal happens before the descriptor is freed.
      
      For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
      as the descriptor is never freed. It's merely cleared out with the irq
      descriptor lock held. So any concurrent proc access will either see
      the old correct value or the cleared out ones.
      
      Protect the lookup and access to the irq descriptor in
      show_interrupts() with the sparse_irq_lock.
      
      Provide kstat_irqs_usr() which is protecting the lookup and access
      with sparse_irq_lock and switch /proc/stat to use it.
      
      Document the existing kstat_irqs interfaces so it's clear that the
      caller needs to take care about protection. The users of these
      interfaces are either not affected due to SPARSE_IRQ=n or already
      protected against removal.
      
      Fixes: 1f5a5b87 "genirq: Implement a sane sparse_irq allocator"
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      c291ee62
    • R
      tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM · 798bc6d8
      Rafael J. Wysocki 提交于
      After commit b2b49ccb (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
      selected) PM_RUNTIME is always set if PM is set, so files that are
      build conditionally if CONFIG_PM_RUNTIME is set may now be build
      if CONFIG_PM is set.
      
      Replace CONFIG_PM_RUNTIME with CONFIG_PM in kernel/trace/Makefile
      for this reason.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org.
      798bc6d8