1. 31 Mar, 2014 1 commit
    • E
      AUDIT: Allow login in non-init namespaces · aa4af831
      Eric Paris committed
      It is possible to configure your PAM stack to refuse login if audit
      messages (about the login) were unable to be sent.  This is common in
      many distros and thus normal configuration of many containers.  The PAM
      modules determine if audit is enabled/disabled in the kernel based on
      the return value from sending an audit message on the netlink socket.
      If userspace gets back ECONNREFUSED it believes audit is disabled in the
      kernel.  If it gets any other error it refuses to let the login
      proceed.
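The userspace decision described above can be sketched as a small C function. This is only an illustration of the three outcomes, not the code of any real PAM module; the enum and the function name `decide_login` are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcomes for a login attempt; names are illustrative. */
enum login_decision {
	LOGIN_ALLOW_AUDITED,   /* the audit message was accepted */
	LOGIN_ALLOW_NO_AUDIT,  /* kernel looks like it has audit disabled */
	LOGIN_REFUSE           /* audit is required but the message failed */
};

/*
 * send_rc models the result of sending the audit message on the
 * netlink socket: 0 on success, or a negative errno value on failure.
 */
static enum login_decision decide_login(int send_rc)
{
	if (send_rc >= 0)
		return LOGIN_ALLOW_AUDITED;
	if (send_rc == -ECONNREFUSED)
		return LOGIN_ALLOW_NO_AUDIT;  /* treated as "audit disabled" */
	return LOGIN_REFUSE;                  /* any other error, e.g. -EPERM */
}
```

With this logic, an `-EPERM` from the audit subsystem refuses the login, while an `-ECONNREFUSED` lets it through.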
      
      Just about ever since the introduction of namespaces the kernel audit
      subsystem has returned EPERM if the task sending a message was not in
      the init user or pid namespace.  So many forms of containers have never
      worked if audit was enabled in the kernel.
      
      BUT if the container was not in init_net then the kernel network code
      would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus
      by pure accident/dumb luck/bug if an admin configured the PAM stack to
      reject all logins that didn't talk to audit, but then ran the login
      utility in the non-init_net namespace, it would work!! Clearly this was
      a bug, but it is a bug some people expected.
      
      With the introduction of network namespace support in 3.14-rc1 the two
      bugs stopped cancelling each other out.  Now, containers in the
      non-init_net namespace refused to let users log in (just like PAM was
      configured!) Obviously some people were not happy that what used to let
      users log in, now didn't!
      
      This fix is kinda hacky.  We return ECONNREFUSED for all non-init
      relevant namespaces.  That means that not only will the old broken
      non-init_net setups continue to work, now the broken non-init_pid or
      non-init_user setups will 'work'.  They don't really work, since audit
      isn't logging things.  But it's what most users want.
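A minimal userspace model of the change in return value, assuming the sender's namespace membership can be reduced to two flags. The struct and function names are hypothetical and this is not the kernel's actual code; it only contrasts the error returned before and after the fix.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-in for the sending task's namespace membership. */
struct sender_ns {
	bool in_init_user_ns;
	bool in_init_pid_ns;
};

/* Modelled behaviour before the fix: the audit code answers -EPERM. */
static int audit_check_before(const struct sender_ns *s)
{
	if (!s->in_init_user_ns || !s->in_init_pid_ns)
		return -EPERM;
	return 0;
}

/*
 * Modelled behaviour after the fix: answer -ECONNREFUSED instead, so
 * userspace concludes that audit is disabled and lets the login proceed.
 */
static int audit_check_after(const struct sender_ns *s)
{
	if (!s->in_init_user_ns || !s->in_init_pid_ns)
		return -ECONNREFUSED;
	return 0;
}
```

A task in the init namespaces gets 0 from both variants; only the error seen by containerized senders changes.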
      
      In 3.15 we should have patches to support not only the non-init_net
      (3.14) namespace but also the non-init_pid and non-init_user namespace.
      So all will be right in the world.  This just opens the doors wide open
      on 3.14 and hopefully makes users happy, if not the audit system...
      Reported-by: Andre Tomt <andre@tomt.net>
      Reported-by: Adam Richter <adam_richter2004@yahoo.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 Mar, 2014 1 commit
  3. 26 Mar, 2014 4 commits
  4. 21 Mar, 2014 4 commits
    • L
      futex: revert back to the explicit waiter counting code · 11d4616b
      Linus Torvalds committed
      Srikar Dronamraju reports that commit b0c29f79 ("futexes: Avoid
      taking the hb->lock if there's nothing to wake up") causes Java threads
      to get stuck on futexes when running specjbb on a power7 NUMA box.
      
      The cause appears to be that the powerpc spinlocks aren't using the same
      ticket lock model that we use on x86 (and other) architectures, which in
      turn results in the "spin_is_locked()" test in hb_waiters_pending()
      occasionally reporting an unlocked spinlock even when there are pending
      waiters.
      
      So this reinstates Davidlohr Bueso's original explicit waiter counting
      code, which I had convinced Davidlohr to drop in favor of figuring out
      the pending waiters by just using the existing state of the spinlock and
      the wait queue.
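The idea of explicit waiter counting can be sketched with C11 atomics. This is a toy model, not the kernel's futex code: `struct toy_bucket` and the `toy_` functions are hypothetical names, and the real hash bucket also carries a lock and a wait queue that are omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a futex hash bucket; only the counter is modelled. */
struct toy_bucket {
	atomic_int waiters;
};

/* Waiter side: announce ourselves before queueing on the bucket. */
static void toy_waiters_inc(struct toy_bucket *hb)
{
	/* Sequentially consistent read-modify-write, so it also orders
	 * the increment against the waiter's later accesses. */
	atomic_fetch_add(&hb->waiters, 1);
}

/* Waiter side: called once the waiter has been woken or gives up. */
static void toy_waiters_dec(struct toy_bucket *hb)
{
	atomic_fetch_sub(&hb->waiters, 1);
}

/* Waker side: skip taking the bucket lock when nobody is waiting. */
static bool toy_waiters_pending(struct toy_bucket *hb)
{
	return atomic_load(&hb->waiters) != 0;
}
```

Unlike a check based on the observed state of a spinlock, the counter does not depend on how a particular architecture implements its locks.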
      Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      rcu: Provide grace-period piggybacking API · 765a3f4f
      Paul E. McKenney committed
      The following pattern is currently not well supported by RCU:
      
      1.	Make data element inaccessible to RCU readers.
      
      2.	Do work that probably lasts for more than one grace period.
      
      3.	Do something to make sure RCU readers in flight before #1 above
      	have completed.
      
      Here are some things that could currently be done:
      
      a.	Do a synchronize_rcu() unconditionally at either #1 or #3 above.
      	This works, but imposes needless work and latency.
      
      b.	Post an RCU callback at #1 above that does a wakeup, then
      	wait for the wakeup at #3.  This works well, but likely results
      	in an extra unneeded grace period.  Open-coding this also
      	requires trickier code than is desirable.
      
      This commit therefore adds get_state_synchronize_rcu() and
      cond_synchronize_rcu() APIs.  Call get_state_synchronize_rcu() at #1
      above and pass its return value to cond_synchronize_rcu() at #3 above.
      This results in a call to synchronize_rcu() if no grace period has
      elapsed between #1 and #3, but requires only a load, comparison, and
      memory barrier if a full grace period did elapse.
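The snapshot-then-check pattern can be illustrated with a toy userspace model in which a plain counter stands in for completed grace periods. All `toy_` names are hypothetical; the real get_state_synchronize_rcu() and cond_synchronize_rcu() live in the kernel, need memory barriers, and must also handle a grace period that was already in progress when the snapshot was taken, which this sketch ignores.

```c
/* Toy model: a counter that advances once per completed grace period. */
static unsigned long toy_gp_completed;
static unsigned long toy_forced_syncs;  /* how often we had to wait */

/* Stand-in for synchronize_rcu(): wait for one full grace period. */
static void toy_synchronize(void)
{
	toy_gp_completed++;
	toy_forced_syncs++;
}

/* Stand-in for get_state_synchronize_rcu(): snapshot taken at step 1. */
static unsigned long toy_get_state(void)
{
	return toy_gp_completed;
}

/*
 * Stand-in for cond_synchronize_rcu(): called at step 3. Only waits if
 * no grace period has completed since the snapshot was taken.
 */
static void toy_cond_synchronize(unsigned long oldstate)
{
	if (toy_gp_completed == oldstate)
		toy_synchronize();
}
```

If the work at step 2 outlasts a grace period, the call at step 3 reduces to a comparison and returns without waiting.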
      Requested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
    • D
      Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC · 8c90487c
      Dave Jones committed
      Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
      the flag to encompass a wider range of pushing the CPU beyond its
      warranty.
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • V
      tracing: Fix array size mismatch in format string · 87291347
      Vaibhav Nagarnaik committed
      In event format strings, the array size is reported in two locations:
      once in the array subscript and once via the "size:" attribute. The
      two values do not match.
      
      For example, in sched:sched_switch the prev_comm and next_comm character
      arrays have a subscript value of [32], whereas the actual field size is
      16.
      
      name: sched_switch
      ID: 301
      format:
              field:unsigned short common_type;       offset:0;       size:2; signed:0;
              field:unsigned char common_flags;       offset:2;       size:1; signed:0;
              field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
              field:int common_pid;   offset:4;       size:4; signed:1;
      
              field:char prev_comm[32];       offset:8;       size:16;        signed:1;
              field:pid_t prev_pid;   offset:24;      size:4; signed:1;
              field:int prev_prio;    offset:28;      size:4; signed:1;
              field:long prev_state;  offset:32;      size:8; signed:1;
              field:char next_comm[32];       offset:40;      size:16;        signed:1;
              field:pid_t next_pid;   offset:56;      size:4; signed:1;
              field:int next_prio;    offset:60;      size:4; signed:1;
      
      After bisection, the following commit was blamed:
      92edca07 tracing: Use direct field, type and system names
      
      This commit removes the duplication of strings for field->name and
      field->type assuming that all the strings passed in
      __trace_define_field() are immutable. This is not true for arrays, where
      the type string is created in event_storage variable and field->type for
      all array fields points to event_storage.
      
      Use __stringify() to create a string constant for the type string.
      
      Also, get rid of event_storage and event_storage_mutex that are not
      needed anymore.
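Building the type string as a compile-time constant can be sketched as follows. The `TOY_` macros are hypothetical userspace stand-ins written in the style of the kernel's __stringify(); the length of 16 is assumed from the "size:16" value in the listing above.

```c
/* Two-level stringification: the extra level lets the argument be
 * macro-expanded before it is turned into a string literal. */
#define TOY_STRINGIFY_1(x) #x
#define TOY_STRINGIFY(x)   TOY_STRINGIFY_1(x)

#define TOY_COMM_LEN 16  /* assumed length, matching size:16 above */

/* Adjacent string literals are concatenated by the compiler, so the
 * result is one immutable constant per array field. */
#define TOY_ARRAY_TYPE(type, len) #type "[" TOY_STRINGIFY(len) "]"

/* Expands to the literal "char[16]". */
static const char *const toy_comm_type = TOY_ARRAY_TYPE(char, TOY_COMM_LEN);
```

Because each field gets its own string constant, no shared scratch buffer is needed, and a later field cannot overwrite the type string of an earlier one.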
      
      Also, an added benefit is that this reduces the overhead of events a bit more:
      
         text    data     bss     dec     hex filename
      8424787 2036472 1302528 11763787         b3804b vmlinux
      8420814 2036408 1302528 11759750         b37086 vmlinux.patched
      
      Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com
      
      Cc: Laurent Chavey <chavey@google.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 20 Mar, 2014 2 commits
  6. 19 Mar, 2014 1 commit
  7. 13 Mar, 2014 2 commits
    • F
      sched: Remove needless round trip nsecs <-> tick conversion of steal time · 300a9d88
      Frederic Weisbecker committed
      When update_rq_clock_task() accounts the pending steal time for a task,
      it converts the steal delta from nsecs to tick then from tick to nsecs.
      
      There is no apparent good reason for doing that though because both
      the task clock and the prev steal delta are u64 and store values
      in nsecs.
      
      So let's remove the needless conversion.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • F
      cputime: Fix jiffies based cputime assumption on steal accounting · dee08a72
      Frederic Weisbecker committed
      The steal guest time accounting code assumes that cputime_t is based on
      jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t
      is based on nsecs, steal_account_process_tick() passes the delta in
      jiffies to account_steal_time() which then accounts it as if it's a
      value in nsecs.
      
      As a result, accounting 1 second of steal time (with HZ=100 that would
      be 100 jiffies) is spuriously accounted as 100 nsecs.
      
      As such /proc/stat may report 0 values of steal time even when two
      guests have run concurrently for a few seconds on the same host and
      same CPU.
      
      In order to fix this, let's convert the nsecs based steal delta to
      cputime instead of jiffies by using the right conversion API.
      
      Given that the steal time is stored in cputime_t and this type can have
      a smaller granularity than nsecs, we only account the rounded converted
      value and leave the remaining nsecs for the next deltas.
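The "account the rounded value, carry the remainder" approach can be sketched in userspace C. The struct, the function `toy_account_steal`, and the `unit_ns` parameter are hypothetical; `unit_ns` models the cputime granularity in nanoseconds (for example 10000000 for jiffies at HZ=100, or 1 when cputime is nsecs based).

```c
#include <stdint.h>

struct toy_steal {
	uint64_t prev_steal_ns;    /* steal time already accounted, in ns */
	uint64_t accounted_units;  /* running total, in cputime units */
};

/*
 * Account the steal time observed so far. steal_clock_ns is the
 * cumulative steal clock in ns; unit_ns is the assumed cputime
 * granularity in ns. Returns the number of units accounted this round.
 */
static uint64_t toy_account_steal(struct toy_steal *s,
				  uint64_t steal_clock_ns, uint64_t unit_ns)
{
	uint64_t delta_ns = steal_clock_ns - s->prev_steal_ns;
	uint64_t units = delta_ns / unit_ns;  /* round down to whole units */

	/* Advance only by what was accounted; the remainder stays pending
	 * and is picked up by a later call. */
	s->prev_steal_ns += units * unit_ns;
	s->accounted_units += units;
	return units;
}
```

With a 10 ms unit, a 25 ms delta accounts 2 units and leaves 5 ms pending; with a 1 ns unit, one second of steal time is accounted as 1000000000 units rather than 100.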
      Reported-by: Huiqingding <huding@redhat.com>
      Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  8. 12 Mar, 2014 4 commits
  9. 11 Mar, 2014 17 commits
  10. 09 Mar, 2014 1 commit
  11. 06 Mar, 2014 3 commits