1. 04 2月, 2015 3 次提交
    • S
      tracing: Convert the tracing facility over to use tracefs · 8434dc93
      Steven Rostedt (Red Hat) 提交于
      debugfs was fine for the tracing facility as a quick way to get
      an interface. Now that tracing has matured, it should separate itself
      from debugfs such that it can be mounted separately without needing
      to mount all of debugfs with it. That is, users resist using tracing
      because it requires mounting debugfs. Having tracing have its own file
      system lets users get the features of tracing without needing to bring
      in the rest of the kernel's debug infrastructure.
      
      Another reason for tracefs is that debubfs does not support mkdir.
      Currently, to create instances, one does a mkdir in the tracing/instance
      directory. This is implemented via a hack that forces debugfs to do
      something it is not intended on doing. By converting over to tracefs, this
      hack can be removed and mkdir can be properly implemented. This patch does
      not address this yet, but it lays the ground work for that to be done.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8434dc93
    • S
      tracing: Create cmdline tracer options on tracing fs init · 09d23a1d
      Steven Rostedt (Red Hat) 提交于
      The options for cmdline tracers are not created if the debugfs system
      is not ready yet. If tracing has started before debugfs is up, then the
      option files for the tracer are not created. Create them when creating
      the tracing directory if the current tracer requires option files.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      09d23a1d
    • S
      tracing: Only create tracer options files if directory exists · 0f67f04f
      Steven Rostedt (Red Hat) 提交于
      Do not bother creating tracer options if no tracing directory
      exists. If a tracer is enabled via the command line, and is
      started before the tracing directory is created, then it wont have
      its tracer specific options created.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0f67f04f
  2. 02 2月, 2015 2 次提交
  3. 29 1月, 2015 1 次提交
  4. 28 1月, 2015 2 次提交
  5. 23 1月, 2015 2 次提交
  6. 17 1月, 2015 1 次提交
  7. 15 1月, 2015 4 次提交
  8. 09 1月, 2015 7 次提交
  9. 30 12月, 2014 1 次提交
    • P
      audit: create private file name copies when auditing inodes · fcf22d82
      Paul Moore 提交于
      Unfortunately, while commit 4a928436 ("audit: correctly record file
      names with different path name types") fixed a problem where we were
      not recording filenames, it created a new problem by attempting to use
      these file names after they had been freed.  This patch resolves the
      issue by creating a copy of the filename which the audit subsystem
      frees after it is done with the string.
      
      At some point it would be nice to resolve this issue with refcounts,
      or something similar, instead of having to allocate/copy strings, but
      that is almost surely beyond the scope of a -rcX patch so we'll defer
      that for later.  On the plus side, only audit users should be impacted
      by the string copying.
      Reported-by: NToralf Foerster <toralf.foerster@gmx.de>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      fcf22d82
  10. 27 12月, 2014 1 次提交
    • J
      netlink/genetlink: pass network namespace to bind/unbind · 023e2cfa
      Johannes Berg 提交于
      Netlink families can exist in multiple namespaces, and for the most
      part multicast subscriptions are per network namespace. Thus it only
      makes sense to have bind/unbind notifications per network namespace.
      
      To achieve this, pass the network namespace of a given client socket
      to the bind/unbind functions.
      
      Also do this in generic netlink, and there also make sure that any
      bind for multicast groups that only exist in init_net is rejected.
      This isn't really a problem if it is accepted since a client in a
      different namespace will never receive any notifications from such
      a group, but it can confuse the family if not rejected (it's also
      possible to silently (without telling the family) accept it, but it
      would also have to be ignored on unbind so families that take any
      kind of action on bind/unbind won't do unnecessary work for invalid
      clients like that.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      023e2cfa
  11. 24 12月, 2014 1 次提交
    • R
      audit: restore AUDIT_LOGINUID unset ABI · 041d7b98
      Richard Guy Briggs 提交于
      A regression was caused by commit 780a7654:
      	 audit: Make testing for a valid loginuid explicit.
      (which in turn attempted to fix a regression caused by e1760bd5)
      
      When audit_krule_to_data() fills in the rules to get a listing, there was a
      missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.
      
      This broke userspace by not returning the same information that was sent and
      expected.
      
      The rule:
      	auditctl -a exit,never -F auid=-1
      gives:
      	auditctl -l
      		LIST_RULES: exit,never f24=0 syscall=all
      when it should give:
      		LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all
      
      Tag it so that it is reported the same way it was set.  Create a new
      private flags audit_krule field (pflags) to store it that won't interact with
      the public one from the API.
      
      Cc: stable@vger.kernel.org # v3.10-rc1+
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      041d7b98
  12. 23 12月, 2014 4 次提交
    • A
      sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation · b74e6278
      Alex Thorlton 提交于
      When allocating space for load_balance_mask, in sched_init, when
      CPUMASK_OFFSTACK is set, we've managed to spill over
      KMALLOC_MAX_SIZE on our 6144 core machine.  The patch below
      breaks up the allocations so that they don't overflow the max
      alloc size.  It also allocates the masks on the the node from
      which they'll most commonly be accessed, to minimize remote
      accesses on NUMA machines.
      Suggested-by: NGeorge Beshers <gbeshers@sgi.com>
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: George Beshers <gbeshers@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418928270-148543-1-git-send-email-athorlton@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b74e6278
    • S
      tracing: Remove taking of trace_types_lock in pipe files · d716ff71
      Steven Rostedt (Red Hat) 提交于
      Taking the global mutex "trace_types_lock" in the trace_pipe files
      causes a bottle neck as most the pipe files can be read per cpu
      and there's no reason to serialize them.
      
      The current_trace variable was given a ref count and it can not
      change when the ref count is not zero. Opening the trace_pipe
      files will up the ref count (and decremented on close), so that
      the lock no longer needs to be taken when accessing the
      current_trace variable.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d716ff71
    • S
      tracing: Add ref count to tracer for when they are being read by pipe · cf6ab6d9
      Steven Rostedt (Red Hat) 提交于
      When one of the trace pipe files are being read (by either the trace_pipe
      or trace_pipe_raw), do not allow the current_trace to change. By adding
      a ref count that is incremented when the pipe files are opened, will
      prevent the current_trace from being changed.
      
      This will allow for the removal of the global trace_types_lock from
      reading the pipe buffers (which is currently a bottle neck).
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      cf6ab6d9
    • P
      audit: correctly record file names with different path name types · 4a928436
      Paul Moore 提交于
      There is a problem with the audit system when multiple audit records
      are created for the same path, each with a different path name type.
      The root cause of the problem is in __audit_inode() when an exact
      match (both the path name and path name type) is not found for a
      path name record; the existing code creates a new path name record,
      but it never sets the path name in this record, leaving it NULL.
      This patch corrects this problem by assigning the path name to these
      newly created records.
      
      There are many ways to reproduce this problem, but one of the
      easiest is the following (assuming auditd is running):
      
        # mkdir /root/tmp/test
        # touch /root/tmp/test/567
        # auditctl -a always,exit -F dir=/root/tmp/test
        # touch /root/tmp/test/567
      
      Afterwards, or while the commands above are running, check the audit
      log and pay special attention to the PATH records.  A faulty kernel
      will display something like the following for the file creation:
      
        type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2
          success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
        type=CWD msg=audit(1416957442.025:93):  cwd="/root/tmp"
        type=PATH msg=audit(1416957442.025:93): item=0 name="test/"
          inode=401409 ... nametype=PARENT
        type=PATH msg=audit(1416957442.025:93): item=1 name=(null)
          inode=393804 ... nametype=NORMAL
        type=PATH msg=audit(1416957442.025:93): item=2 name=(null)
          inode=393804 ... nametype=NORMAL
      
      While a patched kernel will show the following:
      
        type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2
          success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
        type=CWD msg=audit(1416955786.566:89):  cwd="/root/tmp"
        type=PATH msg=audit(1416955786.566:89): item=0 name="test/"
          inode=401409 ... nametype=PARENT
        type=PATH msg=audit(1416955786.566:89): item=1 name="test/567"
          inode=393804 ... nametype=NORMAL
      
      This issue was brought up by a number of people, but special credit
      should go to hujianyang@huawei.com for reporting the problem along
      with an explanation of the problem and a patch.  While the original
      patch did have some problems (see the archive link below), it did
      demonstrate the problem and helped kickstart the fix presented here.
      
        * https://lkml.org/lkml/2014/9/5/66Reported-by: Nhujianyang <hujianyang@huawei.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NRichard Guy Briggs <rgb@redhat.com>
      4a928436
  13. 20 12月, 2014 3 次提交
    • R
      audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb · 54dc77d9
      Richard Guy Briggs 提交于
      Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
      audit_log_end(), which can come from any context (aka even a sleeping context)
      GFP_KERNEL can't be used.  Since the audit_buffer knows what context it should
      use, pass that down and use that.
      
      See: https://lkml.org/lkml/2014/12/16/542
      
      BUG: sleeping function called from invalid context at mm/slab.c:2849
      in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
      2 locks held by sulogin/885:
        #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
        #1:  (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
      CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
      Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
        ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
        ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
        0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
      Call Trace:
        [<ffffffff916ba529>] dump_stack+0x50/0xa8
        [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
        [<ffffffff910632a6>] __might_sleep+0x119/0x128
        [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
        [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
        [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
        [<ffffffff914e2b62>] skb_copy+0x3e/0xa3
        [<ffffffff910c263e>] audit_log_end+0x83/0x100
        [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
        [<ffffffff91252a73>] common_lsm_audit+0x441/0x450
        [<ffffffff9123c163>] slow_avc_audit+0x63/0x67
        [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
        [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
        [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
        [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
        [<ffffffff911515e6>] install_exec_creds+0xe/0x79
        [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
        [<ffffffff9115198e>] search_binary_handler+0x81/0x18c
        [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
        [<ffffffff91153669>] do_execve+0x1f/0x21
        [<ffffffff91153967>] SyS_execve+0x25/0x29
        [<ffffffff916c61a9>] stub_execve+0x69/0xa0
      
      Cc: stable@vger.kernel.org #v3.16-rc1
      Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Tested-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      54dc77d9
    • P
      audit: don't attempt to lookup PIDs when changing PID filtering audit rules · 3640dcfa
      Paul Moore 提交于
      Commit f1dc4867 ("audit: anchor all pid references in the initial pid
      namespace") introduced a find_vpid() call when adding/removing audit
      rules with PID/PPID filters; unfortunately this is problematic as
      find_vpid() only works if there is a task with the associated PID
      alive on the system.  The following commands demonstrate a simple
      reproducer.
      
      	# auditctl -D
      	# auditctl -l
      	# autrace /bin/true
      	# auditctl -l
      
      This patch resolves the problem by simply using the PID provided by
      the user without any additional validation, e.g. no calls to check to
      see if the task/PID exists.
      
      Cc: stable@vger.kernel.org # 3.15
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NRichard Guy Briggs <rgb@redhat.com>
      3640dcfa
    • R
      PM: Eliminate CONFIG_PM_RUNTIME · 464ed18e
      Rafael J. Wysocki 提交于
      Having switched over all of the users of CONFIG_PM_RUNTIME to use
      CONFIG_PM directly, turn the latter into a user-selectable option
      and drop the former entirely from the tree.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      Acked-by: NKevin Hilman <khilman@linaro.org>
      464ed18e
  14. 19 12月, 2014 1 次提交
    • T
      tick/powerclamp: Remove tick_nohz_idle abuse · a5fd9733
      Thomas Gleixner 提交于
      commit 4dbd2771 "tick: export nohz tick idle symbols for module
      use" was merged via the thermal tree without an explicit ack from the
      relevant maintainers.
      
      The exports are abused by the intel powerclamp driver which implements
      a fake idle state from a sched FIFO task. This causes all kinds of
      wreckage in the NOHZ core code which rightfully assumes that
      tick_nohz_idle_enter/exit() are only called from the idle task itself.
      
      Recent changes in the NOHZ core lead to a failure of the powerclamp
      driver and now people try to hack completely broken and backwards
      workarounds into the NOHZ core code. This is completely unacceptable
      and just papers over the real problem. There are way more subtle
      issues lurking around the corner.
      
      The real solution is to fix the powerclamp driver by rewriting it with
      a sane concept, but that's beyond the scope of this.
      
      So the only solution for now is to remove the calls into the core NOHZ
      code from the powerclamp trainwreck along with the exports. 
      
      Fixes: d6d71ee4 "PM: Introduce Intel PowerClamp Driver"
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
      Cc: LKP <lkp@01.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a5fd9733
  15. 18 12月, 2014 1 次提交
    • K
      param: do not set store func without write perm · b0a65b0c
      Kees Cook 提交于
      When a module_param is defined without DAC write permissions, it can
      still be changed at runtime and updated. Drivers using a 0444 permission
      may be surprised that these values can still be changed.
      
      For drivers that want to allow updates, any S_IW* flag will set the
      "store" function as before. Drivers without S_IW* flags will have the
      "store" function unset, unforcing a read-only value. Drivers that wish
      neither "store" nor "get" can continue to use "0" for perms to stay out
      of sysfs entirely.
      
      Old behavior:
        # cd /sys/module/snd/parameters
        # ls -l
        total 0
        -r--r--r-- 1 root root 4096 Dec 11 13:55 cards_limit
        -r--r--r-- 1 root root 4096 Dec 11 13:55 major
        -r--r--r-- 1 root root 4096 Dec 11 13:55 slots
        # cat major
        116
        # echo -1 > major
        -bash: major: Permission denied
        # chmod u+w major
        # echo -1 > major
        # cat major
        -1
      
      New behavior:
        ...
        # chmod u+w major
        # echo -1 > major
        -bash: echo: write error: Input/output error
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      b0a65b0c
  16. 15 12月, 2014 2 次提交
  17. 14 12月, 2014 4 次提交
    • J
      fsnotify: unify inode and mount marks handling · 0809ab69
      Jan Kara 提交于
      There's a lot of common code in inode and mount marks handling.  Factor it
      out to a common helper function.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0809ab69
    • R
      gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs · 957e3fac
      Riku Voipio 提交于
      Following the suggestions from Andrew Morton and Stephen Rothwell,
      Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
      define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
      can enable.
      
      set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
      previously allowed + ARM64 which I tested.
      Signed-off-by: NRiku Voipio <riku.voipio@linaro.org>
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      957e3fac
    • M
      kexec: remove unnecessary KERN_ERR from kexec.c · d5393955
      Masanari Iida 提交于
      Remove unnecessary KERN_ERR from pr_err() within kexec.c.
      Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5393955
    • D
      syscalls: implement execveat() system call · 51f39a1f
      David Drysdale 提交于
      This patchset adds execveat(2) for x86, and is derived from Meredydd
      Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
      
      The primary aim of adding an execveat syscall is to allow an
      implementation of fexecve(3) that does not rely on the /proc filesystem,
      at least for executables (rather than scripts).  The current glibc version
      of fexecve(3) is implemented via /proc, which causes problems in sandboxed
      or otherwise restricted environments.
      
      Given the desire for a /proc-free fexecve() implementation, HPA suggested
      (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
      an appropriate generalization.
      
      Also, having a new syscall means that it can take a flags argument without
      back-compatibility concerns.  The current implementation just defines the
      AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
      added in future -- for example, flags for new namespaces (as suggested at
      https://lkml.org/lkml/2006/7/11/474).
      
      Related history:
       - https://lkml.org/lkml/2006/12/27/123 is an example of someone
         realizing that fexecve() is likely to fail in a chroot environment.
       - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
         documenting the /proc requirement of fexecve(3) in its manpage, to
         "prevent other people from wasting their time".
       - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
         problem where a process that did setuid() could not fexecve()
         because it no longer had access to /proc/self/fd; this has since
         been fixed.
      
      This patch (of 4):
      
      Add a new execveat(2) system call.  execveat() is to execve() as openat()
      is to open(): it takes a file descriptor that refers to a directory, and
      resolves the filename relative to that.
      
      In addition, if the filename is empty and AT_EMPTY_PATH is specified,
      execveat() executes the file to which the file descriptor refers.  This
      replicates the functionality of fexecve(), which is a system call in other
      UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
      so relies on /proc being mounted).
      
      The filename fed to the executed program as argv[0] (or the name of the
      script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
      (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
      reflecting how the executable was found.  This does however mean that
      execution of a script in a /proc-less environment won't work; also, script
      execution via an O_CLOEXEC file descriptor fails (as the file will not be
      accessible after exec).
      
      Based on patches by Meredydd Luff.
      Signed-off-by: NDavid Drysdale <drysdale@google.com>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Rich Felker <dalias@aerifal.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51f39a1f