1. 18 Feb, 2012 3 commits
  2. 13 Feb, 2012 2 commits
  3. 10 Feb, 2012 3 commits
  4. 05 Feb, 2012 1 commit
    • S
      PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails · 379e0be8
      Srivatsa S. Bhat authored
      If freezing of kernel threads fails, we are expected to automatically
      thaw tasks in the error recovery path. However, at times, we encounter
      situations in which we would like the automatic error recovery path
      to thaw only the kernel threads, because we want to be able to do
      some more cleanup before we thaw userspace. Something like:
      
      error = freeze_kernel_threads();
      if (error) {
      	/* Do some cleanup */
      
      	/* Only then thaw userspace tasks*/
      	thaw_processes();
      }
      
      An example of such a situation is where we freeze/thaw filesystems
      during suspend/hibernation. There, if freezing of kernel threads
      fails, we would like to thaw the frozen filesystems before thawing
      the userspace tasks.
      
      So, modify freeze_kernel_threads() to thaw only kernel threads in
      case of freezing failure. And change suspend_freeze_processes()
      accordingly. (At the same time, let us also get rid of the rather
      cryptic usage of the conditional operator (?:) in that function.)
      
      [rjw: In fact, this patch fixes a regression introduced during the
       3.3 merge window, because without it thaw_processes() may be called
       before swsusp_free() in some situations and that may lead to massive
       memory allocation failures.]
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      379e0be8
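The ordering this commit is after can be illustrated with a small user-space simulation. This is only a sketch: the `_sketch` functions, the boolean state flags, and the `fail_kthread_freeze` knob are hypothetical stand-ins for the kernel's freezer, not its real implementation. It shows that when freezing kernel threads fails, only kernel threads are thawed automatically, so the caller can clean up (here, "thaw filesystems") while userspace is still frozen.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state flags standing in for real kernel state. */
static bool user_frozen, kthreads_frozen, fs_frozen;
static bool fail_kthread_freeze;      /* illustrative failure knob */
static bool cleanup_saw_user_frozen;  /* records the ordering we care about */

static int freeze_processes_sketch(void) { user_frozen = true; return 0; }
static void thaw_kernel_threads_sketch(void) { kthreads_frozen = false; }

static void thaw_processes_sketch(void)
{
	kthreads_frozen = false;
	user_frozen = false;
}

/* On failure, undo only the kernel-thread part and leave userspace frozen. */
static int freeze_kernel_threads_sketch(void)
{
	kthreads_frozen = true; /* pretend some threads froze before the failure */
	if (fail_kthread_freeze) {
		thaw_kernel_threads_sketch();
		return -EBUSY;
	}
	return 0;
}

/* Caller: cleanup runs first, userspace is thawed last. */
static int suspend_prepare_sketch(void)
{
	int error = freeze_processes_sketch();

	if (error)
		return error;
	fs_frozen = true; /* stand-in for freezing filesystems */

	error = freeze_kernel_threads_sketch();
	if (error) {
		cleanup_saw_user_frozen = user_frozen;
		fs_frozen = false;       /* do some cleanup */
		thaw_processes_sketch(); /* only then thaw userspace tasks */
	}
	return error;
}
```

With `fail_kthread_freeze` set, `suspend_prepare_sketch()` returns `-EBUSY` and `cleanup_saw_user_frozen` ends up true, which is the property the commit message asks for.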
  5. 02 Feb, 2012 1 commit
  6. 30 Jan, 2012 2 commits
    • R
      PM / Sleep: Introduce "late suspend" and "early resume" of devices · cf579dfb
      Rafael J. Wysocki authored
      The current device suspend/resume phases during system-wide power
      transitions appear to be insufficient for some platforms that want
      to use the same callback routines for saving device states and
      related operations during runtime suspend/resume as well as during
      system suspend/resume.  In principle, they could point their
      .suspend_noirq() and .resume_noirq() to the same callback routines
      as their .runtime_suspend() and .runtime_resume(), respectively,
      but at least some of them require device interrupts to be enabled
      while the code in those routines is running.
      
      It also makes sense to have device suspend-resume callbacks that will
      be executed with runtime PM disabled and with device interrupts
      enabled in case someone needs to run some special code in that
      context during system-wide power transitions.
      
      Apart from this, .suspend_noirq() and .resume_noirq() were introduced
      as a workaround for drivers using shared interrupts and failing to
      prevent their interrupt handlers from accessing suspended hardware.
      It appears to be better not to use them for other purposes, or we may
      have to deal with some serious confusion (which seems to be happening
      already).
      
      For the above reasons, introduce new device suspend/resume phases,
      "late suspend" and "early resume" (and analogously for hibernation)
      whose callback will be executed with runtime PM disabled and with
      device interrupts enabled and whose callback pointers generally may
      point to runtime suspend/resume routines.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      cf579dfb
    • R
      PM / Hibernate: Fix s2disk regression related to freezing workqueues · 181e9bde
      Rafael J. Wysocki authored
      Commit 2aede851
      
        PM / Hibernate: Freeze kernel threads after preallocating memory
      
      introduced a mechanism by which kernel threads were frozen after
      the preallocation of hibernate image memory to avoid problems with
      frozen kernel threads not responding to memory freeing requests.
      However, it overlooked the s2disk code path in which the
      SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
      which caused freeze_workqueues_begin() to BUG(), because it saw
      that workqueues had already been frozen.
      
      Although in principle this issue might be addressed by removing
      the relevant BUG_ON() from freeze_workqueues_begin(), that would
      reintroduce the very problem that commit 2aede851
      attempted to avoid into that particular code path.  For this reason,
      to fix the issue at hand, introduce thaw_kernel_threads() and make
      the SNAPSHOT_FREE ioctl execute it.
      
      Special thanks to Srivatsa S. Bhat for detailed analysis of the
      problem.
      Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: stable@kernel.org
      181e9bde
  7. 24 Jan, 2012 3 commits
  8. 23 Jan, 2012 1 commit
    • G
      net: introduce res_counter_charge_nofail() for socket allocations · 0e90b31f
      Glauber Costa authored
      There is a case in __sk_mem_schedule(), where an allocation
      is beyond the maximum, but yet we are allowed to proceed.
      It happens under the following condition:
      
      	sk->sk_wmem_queued + size >= sk->sk_sndbuf
      
      The network code won't revert the allocation in this case,
      meaning that at some point later it'll try to do it. Since
      this is never communicated to the underlying res_counter
      code, there is an imbalance in res_counter uncharge operation.
      
      I see two ways of fixing this:
      
      1) storing the information about those allocations somewhere
         in memcg, and then deducting from that first, before
         we start draining the res_counter,
      2) providing a slightly different allocation function for
         the res_counter, that matches the original behavior of
         the network code more closely.
      
      I decided to go for #2 here, believing it to be more elegant,
      since #1 would require us to do basically that, but in a more
      obscure way.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Laurent Chavey <chavey@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e90b31f
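The difference between an ordinary charge and the "nofail" variant chosen as option 2 can be sketched as follows. This is a deliberately simplified, assumed model: `struct counter_sketch` and its functions are hypothetical, and the real res_counter also handles locking and a parent hierarchy, which are left out here. The point is that the nofail variant always records the charge, even past the limit, so a later uncharge by the network code stays balanced.

```c
#include <errno.h>

/* Hypothetical flat counter; no locking, no hierarchy. */
struct counter_sketch {
	unsigned long usage;
	unsigned long limit;
};

/* Ordinary charge: refuse and leave usage untouched when over the limit. */
static int counter_charge_sketch(struct counter_sketch *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -ENOMEM;
	c->usage += val;
	return 0;
}

/*
 * "nofail" charge: report that the limit was exceeded, but account for
 * the allocation anyway, mirroring a caller that proceeds regardless.
 */
static int counter_charge_nofail_sketch(struct counter_sketch *c,
					unsigned long val)
{
	int ret = (c->usage + val > c->limit) ? -ENOMEM : 0;

	c->usage += val;
	return ret;
}

/* Uncharge, clamped so the sketch never underflows. */
static void counter_uncharge_sketch(struct counter_sketch *c,
				    unsigned long val)
{
	c->usage -= (val > c->usage) ? c->usage : val;
}
```

Because the over-limit allocation is recorded, the matching uncharge brings `usage` back to its earlier value instead of draining charges that belong to other allocations.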
  9. 21 Jan, 2012 2 commits
  10. 20 Jan, 2012 1 commit
  11. 18 Jan, 2012 21 commits
    • K
      audit: no leading space in audit_log_d_path prefix · c158a35c
      Kees Cook authored
      audit_log_d_path() injects an additional space before the prefix,
      which serves no purpose and doesn't mix well with other audit_log*()
      functions that do not sneak extra characters into the log.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      c158a35c
    • X
      audit: fix signedness bug in audit_log_execve_info() · 5afb8a3f
      Xi Wang authored
      In the loop, a size_t "len" is used to hold the return value of
      audit_log_single_execve_arg(), which returns -1 on error.  In that
      case the error handling (len <= 0) will be bypassed since "len" is
      unsigned, and the loop continues with (p += len) being wrapped.
      Change the type of "len" to signed int to fix the error handling.
      
      	size_t len;
      	...
      	for (...) {
      		len = audit_log_single_execve_arg(...);
      		if (len <= 0)
      			break;
      		p += len;
      	}
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      5afb8a3f
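The bug class described in this commit can be reproduced in plain user-space C. In the sketch below, `fake_single_arg()` is a hypothetical stand-in for audit_log_single_execve_arg() (it returns a byte count, or -1 on error), and the other helper names are likewise made up for illustration. The first two functions show why the `len <= 0` check is bypassed with `size_t` and works with `int`; the loop uses the corrected signed type.

```c
#include <stddef.h>

/* Hypothetical helper: 4 bytes per argument, -1 at index fail_at. */
static int fake_single_arg(int index, int fail_at)
{
	return index == fail_at ? -1 : 4;
}

/* Buggy shape: -1 stored in a size_t becomes a huge positive value. */
static int error_detected_unsigned(int ret)
{
	size_t len = (size_t)ret;

	return len <= 0; /* only true for exactly 0 */
}

/* Fixed shape: a signed int keeps the -1, so the check fires. */
static int error_detected_signed(int ret)
{
	int len = ret;

	return len <= 0;
}

/* The loop with the corrected type: stops at the first error. */
static int walk_args(int nargs, int fail_at)
{
	int len, consumed = 0;

	for (int i = 0; i < nargs; i++) {
		len = fake_single_arg(i, fail_at);
		if (len <= 0)
			break;
		consumed += len;
	}
	return consumed;
}
```

With the unsigned variable the error return goes unnoticed, while the signed variant stops the loop before the pointer would be advanced by a wrapped length.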
    • P
      audit: comparison on interprocess fields · 10d68360
      Peter Moody authored
      This allows audit to specify rules in which we compare two fields of a
      process.  For example: is the running process's uid different from the
      running process's euid?
      Signed-off-by: Peter Moody <pmoody@google.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      10d68360
    • P
      audit: implement all object interfield comparisons · 4a6633ed
      Peter Moody authored
      This completes the matrix of interfield comparisons between uid/gid
      information for the current task and the uid/gid information for inodes.
      In other words, I can audit based on differences between the euid of the
      process and the uid of fs objects.
      Signed-off-by: Peter Moody <pmoody@google.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      4a6633ed
    • E
      audit: allow interfield comparison between gid and ogid · c9fe685f
      Eric Paris authored
      Allow audit rules to compare the gid of the running task to the gid of the
      inode in question.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      c9fe685f
    • E
      audit: complex interfield comparison helper · b34b0393
      Eric Paris authored
      Rather than code the same loop over and over, implement a helper function which
      uses some pointer magic to make it generic enough to be used in numerous places
      as we implement more audit interfield comparisons.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      b34b0393
    • E
      audit: allow interfield comparison in audit rules · 02d86a56
      Eric Paris authored
      We wish to be able to audit when a uid=500 task accesses a file which is
      uid=0.  Or vice versa.  This patch introduces a new audit filter type
      AUDIT_FIELD_COMPARE which takes an 'enum' indicating which fields
      should be compared.  At this point we only define the task->uid vs
      inode->uid, but other comparisons can be added.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      02d86a56
    • E
      audit: do not call audit_getname on error · 4043cde8
      Eric Paris authored
      Just a code cleanup really.  We don't need to make a function call just for
      it to return on error.  This also makes the VFS function even easier to follow
      and removes a conditional on a hot path.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      4043cde8
    • E
      audit: only allow tasks to set their loginuid if it is -1 · 633b4545
      Eric Paris authored
      At the moment we allow tasks to set their loginuid if they have
      CAP_AUDIT_CONTROL.  In reality we want tasks to set the loginuid when they
      log in and for it to be impossible to ever reset.  We had to make it mutable
      even after it was once set (with the CAP) because on update an admin might
      have to restart sshd.  Now sshd would get his loginuid and the next user who
      logged in using ssh would not be able to set his loginuid.
      
      Systemd has changed how userspace works and allowed us to make the kernel
      work the way it should.  With systemd, users (even admins) are not supposed
      to restart services directly.  The system will restart the service for
      them.  Thus since systemd is going to have loginuid==-1, sshd would get -1,
      and sshd would be allowed to set a new loginuid without special permissions.
      
      If an admin in this system were to manually start an sshd he is inserting
      himself into the system chain of trust and thus, logically, it's his
      loginuid that should be used!  Since we have old systems I make this a
      Kconfig option.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      633b4545
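The two policies described in this commit can be contrasted in a short user-space sketch. Everything here is an assumption made for illustration: `struct task_sketch`, `set_loginuid_sketch()`, and the `immutable` flag (standing in for the Kconfig option) are hypothetical names, and a plain integer field replaces the real capability check.

```c
#include <errno.h>

#define LOGINUID_UNSET ((unsigned int)-1)

/* Hypothetical task model: a loginuid plus a capability flag. */
struct task_sketch {
	unsigned int loginuid;
	int has_cap_audit_control;
};

/*
 * immutable != 0 models the new Kconfig-selected behavior: anyone may
 * set the loginuid, but only while it is still -1.
 * immutable == 0 models the old behavior: setting requires the capability.
 */
static int set_loginuid_sketch(struct task_sketch *t, unsigned int uid,
			       int immutable)
{
	if (immutable) {
		if (t->loginuid != LOGINUID_UNSET)
			return -EPERM;
	} else {
		if (!t->has_cap_audit_control)
			return -EPERM;
	}
	t->loginuid = uid;
	return 0;
}
```

Under the immutable policy a freshly started service with loginuid -1 can set it once without any privilege, and every later attempt fails with `-EPERM`.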
    • E
      audit: remove task argument to audit_set_loginuid · 0a300be6
      Eric Paris authored
      The function always deals with current.  Don't expose an option
      pretending one can use it for something.  You can't.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      0a300be6
    • E
      audit: allow audit matching on inode gid · 54d3218b
      Eric Paris authored
      Much like the ability to filter audit on the uid of an inode collected, we
      should be able to filter on the gid of the inode.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      54d3218b
    • E
      audit: allow matching on obj_uid · efaffd6e
      Eric Paris authored
      Allow syscall exit filter matching based on the uid of the owner of an
      inode used in a syscall.  For example:
      
      auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa
      Signed-off-by: Eric Paris <eparis@redhat.com>
      efaffd6e
    • E
      audit: remove audit_finish_fork as it can't be called · 6422e78d
      Eric Paris authored
      Audit entry,always rules are not allowed and are automatically changed into
      exit,always rules in userspace.  The kernel refuses to load such rules.
      
      Thus a task in the middle of a syscall (and thus in audit_finish_fork())
      can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
      Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't ever
      going to actually use the code in audit_finish_fork() since it will
      return without doing anything.  Thus drop the code.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      6422e78d
    • E
      audit: reject entry,always rules · 7ff68e53
      Eric Paris authored
      We deprecated entry,always rules a long time ago.  Reject those rules as
      invalid.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      7ff68e53
    • E
      audit: inline audit_free to simplify the look of generic code · a4ff8dba
      Eric Paris authored
      Make the conditional a static inline instead of doing it in generic code.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      a4ff8dba
    • E
      audit: inline checks for not needing to collect aux records · 07c49417
      Eric Paris authored
      A number of audit hooks make function calls before they determine that
      auxiliary records do not need to be collected.  Do those checks as static
      inlines since the most common case is going to be that records are not
      needed and we can skip the function call overhead.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      07c49417
    • E
      audit: drop some potentially inadvisable likely notations · 56179a6e
      Eric Paris authored
      The audit code makes heavy use of likely() and unlikely() macros, but they
      don't always make sense.  Drop any that seem questionable and let the
      computer do its thing.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      56179a6e
    • E
      audit: remove AUDIT_SETUP_CONTEXT as it isn't used · 997f5b64
      Eric Paris authored
      Audit contexts have 3 states.  Disabled, which doesn't collect anything,
      build, which collects info but might not emit it, and record, which
      collects and emits.  There is a 4th state, setup, which isn't used.  Get
      rid of it.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      997f5b64
    • E
      audit: inline audit_syscall_entry to reduce burden on archs · b05d8447
      Eric Paris authored
      Every arch calls:
      
      if (unlikely(current->audit_context))
      	audit_syscall_entry()
      
      which requires knowledge about audit (the existence of audit_context) in
      the arch code.  Just do it all in a static inline in audit.h so that arches
      can remain blissfully ignorant.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      b05d8447
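The pattern of hiding the context check behind a static inline wrapper can be sketched in user space as below. The names are hypothetical stand-ins (`current_task_sketch` for `current`, `do_audit_syscall_entry_sketch()` for the out-of-line audit function), and the counter exists only so the effect can be observed; the real kernel signatures take syscall arguments that are omitted here.

```c
/* Hypothetical task model with an optional audit context. */
struct task_sketch {
	void *audit_context;
};

static struct task_sketch *current_task_sketch; /* stand-in for `current` */
static int slow_path_calls;                     /* observation aid only */

/* Out-of-line slow path: the part that actually does audit work. */
static void do_audit_syscall_entry_sketch(void)
{
	slow_path_calls++;
}

/*
 * Header-style wrapper: arch code calls this unconditionally and never
 * needs to know that an audit_context field exists.
 */
static inline void audit_syscall_entry_sketch(void)
{
	if (current_task_sketch->audit_context)
		do_audit_syscall_entry_sketch();
}
```

The design choice is that the cheap test is inlined at every call site, so tasks without an audit context never pay for a function call, and the arch code stays free of audit internals.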
    • E
      Audit: push audit success and retcode into arch ptrace.h · d7e7528b
      Eric Paris authored
      The audit system previously expected arches calling audit_syscall_exit to
      supply as arguments whether the syscall was a success and what the return code was.
      Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
      by converting from negative retcodes to an audit internal magic value stating
      success or failure.  This helper was wrong and could indicate that a valid
      pointer returned to userspace was a failed syscall.  The fix is to fix the
      layering foolishness.  We now pass audit_syscall_exit a struct pt_regs and it
      in turn calls back into arch code to collect the return value and to
      determine if the syscall was a success or failure.  We also define a generic
      is_syscall_success() macro which determines success/failure based on if the
      value is < -MAX_ERRNO.  This works for arches like x86 which do not use a
      separate mechanism to indicate syscall failure.
      
      We make both the is_syscall_success() and regs_return_value() static inlines
      instead of macros.  The reason is because the audit function must take a void*
      for the regs.  (uml calls theirs struct uml_pt_regs instead of just struct
      pt_regs so audit_syscall_exit can't take a struct pt_regs).  Since the audit
      function takes a void* we need to use static inlines to cast it back to the
      arch correct structure to dereference it.
      
      The other major change is that on some arches, like ia64, MIPS and ppc, we
      change regs_return_value() to give us the negative value on syscall failure.
      The only other user of this macro, kretprobe_example.c, won't notice and it
      makes the value signed consistently for the audit functions across all archs.
      
      In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
      audit code as the return value.  But the ptrace_64.h code defined the macro
      regs_return_value() as regs[3].  I have no idea which one is correct, but this
      patch now uses the regs_return_value() function, so it now uses regs[3].
      
      For powerpc we previously used regs->result but now use the
      regs_return_value() function which uses regs->gprs[3].  regs->gprs[3] is
      always positive, so regs_return_value(), much like on ia64, makes it negative
      before calling the audit code when appropriate.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
      Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
      Acked-by: Richard Weinberger <richard@nod.at> [for uml]
      Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
      Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
      d7e7528b
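The generic success test described above can be illustrated in user-space C. This is a sketch under stated assumptions: the function names are hypothetical, `MAX_ERRNO` is taken to be 4095 as in the kernel's err.h, and the comparison is done on the value reinterpreted as unsigned, which is how a "< -MAX_ERRNO" test separates the small band of error codes from everything else. The naive signed check is included to show how a large pointer-like return value gets misclassified.

```c
#define MAX_ERRNO 4095

/* Naive check: any negative value is treated as a failure. */
static inline int naive_success_sketch(long ret)
{
	return ret >= 0;
}

/*
 * Range check: only values in [-MAX_ERRNO, -1] are errors. Viewed as
 * unsigned, those are the top MAX_ERRNO values of the range.
 */
static inline int is_syscall_success_sketch(long ret)
{
	return (unsigned long)ret < (unsigned long)-MAX_ERRNO;
}

/* A pointer-like return value with the top bit set (illustrative). */
static inline long high_address_sketch(void)
{
	return (long)(~0UL - 0x10000UL);
}
```

A value such as the one from `high_address_sketch()` is negative as a `long`, so the naive check calls it a failure, while the range check accepts it because it lies below the error band.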
    • E
      seccomp: audit abnormal end to a process due to seccomp · 85e7bac3
      Eric Paris authored
      The audit system likes to collect information about processes that end
      abnormally (SIGSEGV) as this may be useful intrusion detection information.
      This patch adds audit support to collect information when seccomp forces a
      task to exit because of misbehavior in a similar way.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      85e7bac3