1. 05 Jul, 2013 2 commits
  2. 03 Jul, 2013 6 commits
    • F
      posix_timers: fix racy timer delta caching on task exit · a0b2062b
      Frederic Weisbecker committed
      When a task exits, we cache the remaining cputime delta before its
      timers expire.
      
      This is done from the following places:
      
      * When the task is reaped. We iterate through its list of
        posix cpu timers and store the remaining timer delta to
        the timer struct instead of the absolute value.
        (See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )
      
      * When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
        If the timer's task is considered dying when watched from these
        places, the same conversion from absolute to relative expiry time
        is performed. Then the given task's reference is released.
        (See clear_dead_task() ).
      
      The relevance of this caching is questionable but this is another
      and deeper debate.
      
      The big issue here is that these two sources of caching don't mix
      well together.

      More specifically, the caching can easily be done twice, resulting
      in a wrong delta because the elapsed clock gets spuriously subtracted
      from it a second time. This can happen in the following scenario:
      
      1) The task exits and gets reaped: we call posix_cpu_timers_exit()
         and the absolute timer expiry values are converted to a relative
         delta.
      
      2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
         clear_dead_task() because tsk->exit_state == EXIT_DEAD.
         The elapsed clock gets subtracted from the delta again and we return
         a wrong result.
      
      To fix this, just remove the caching done on task reaping time.  It
      doesn't bring much value on its own.  The caching done from
      posix_cpu_timer_get/schedule is enough.
      
      And it would also be hard to get it really right: we could make it put and
      clear the target task in the timer struct so that readers know if they are
      dealing with a relative cached or absolute value.  But it would be racy.
      The only safe way to do it would be to lock the itimer->it_lock so that we
      know nobody reads the cputime expiry value while we modify it and its
      target task reference.  Doing so would involve some funny workarounds to
      avoid circular lock against the sighand lock.  There is just no reason to
      maintain this.
      
      The user visible effect of this patch can be observed by running the
      following code: it creates a subthread that launches a posix cputimer
      which expires after 10 seconds. But then the subthread only busy loops for 2
      seconds and exits. The parent reaps the subthread and reads the timer value.
      Its expected value should be the initial timer's expiration value
      minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:
      
      	#include <sys/time.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <time.h>
      	#include <pthread.h>
      
      	static timer_t id;
      	static struct itimerspec val = { .it_value.tv_sec = 10, }, new;
      
      	static void *thread(void *unused)
      	{
      		int err;
      		struct timeval start, end, diff;
      
      		err = timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
      		if (err < 0) {
      			perror("Can't create timer\n");
      			return NULL;
      		}
      
      		/* Arm 10 sec timer */
      		err = timer_settime(id, 0, &val, NULL);
      		if (err < 0) {
      			perror("Can't set timer\n");
      			return NULL;
      		}
      
      		/* Exit after 2 seconds of execution */
      		gettimeofday(&start, NULL);
      		do {
      			gettimeofday(&end, NULL);
      			timersub(&end, &start, &diff);
      		} while (diff.tv_sec < 2);
      
      		return NULL;
      	}
      
      	int main(int argc, char **argv)
      	{
      		pthread_t pthread;
      		int err;
      
      		err = pthread_create(&pthread, NULL, thread, NULL);
      		if (err) {
      			perror("Can't create thread\n");
      			return -1;
      		}
      		pthread_join(pthread, NULL);
      		/* Just wait a little bit to make sure the child got reaped */
      		sleep(1);
      		err = timer_gettime(id, &new);
      		if (err)
      			perror("Can't get timer value\n");
      		printf("%ld %ld\n", (long)new.it_value.tv_sec, new.it_value.tv_nsec);
      
      		return 0;
      	}
      
      Before the patch:
      
             $ ./posix_cpu_timers
             6 2278074
      
      After the patch:
      
            $ ./posix_cpu_timers
            8 1158766
      
      Before the patch, the two seconds of elapsed cputime were spuriously
      subtracted a second time.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • F
      posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule() · 76cdcdd9
      Frederic Weisbecker committed
      In order to re-arm a timer after it fired, we take a sample of the current
      process or thread cputime.
      
      If the task is dying though, we don't arm anything but we cache the
      remaining timer expiration delta for further reads.
      
      Something similar is performed in posix_cpu_timer_get(), but here we forget
      to take the process-wide cputime sample before caching it.

      As a result we store random stack content, causing every further
      read of that timer to return junk values.
      
      Fix this by taking the appropriate sample in the case of process wide
      timers.
      
      This probably doesn't matter much in practice because, at this stage, the
      thread is the last one in the group and we reached exit_notify().  This
      implies that we called exit_itimers() and there should be no more timers
      to handle for that task.
      
      So this is likely dead code anyway but let's fix the current logic
      and the warning that came along:
      
          kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
          kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function
      
      Then we can start to think further about cleaning up that code.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Chen Gang <gang.chen@asianux.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • F
      selftests: add basic posix timers selftests · 0bc4b0cf
      Frederic Weisbecker committed
      Add some initial basic tests for a few posix timer interfaces such as
      setitimer() and timer_settime().
      
      These simply check that expiration happens in a reasonable timeframe after
      expected elapsed clock time (user time, user + system time, real time,
      ...).
      
      This is helpful for finding basic breakages while hacking
      on this subsystem.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • F
      posix_cpu_timers: consolidate expired timers check · 2473f3e7
      Frederic Weisbecker committed
      Consolidate the code shared by the per-thread and per-process timer lists
      at tick time.
      
      List traversal, expiry check and subsequent updates can be shared in a
      common helper.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • F
      posix_cpu_timers: consolidate timer list cleanups · 1a7fa510
      Frederic Weisbecker committed
      Cleaning up the posix cpu timers on task exit shares some common code
      among timer list types, most notably the list traversal and expiry time
      update.
      
      Unify this in a common helper.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • F
      posix_cpu_timer: consolidate expiry time type · 55ccb616
      Frederic Weisbecker committed
      The posix cpu timer expiry time is stored in a union of two types: a
      64-bit field if we rely on the scheduler's precise accounting, or a
      cputime_t if we rely on jiffies.
      
      This results in quite some duplicate code and special cases to handle the
      two types.
      
      Just unify this into a single 64-bit field.  cputime_t can always fit
      into it.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. 02 Jul, 2013 3 commits
    • T
      tick: Sanitize broadcast control logic · 07bd1172
      Thomas Gleixner committed
      The recent implementation of a generic dummy timer resulted in a
      different registration order of per cpu local timers which made the
      broadcast control logic go belly up.
      
      If the dummy timer is the first clock event device which is registered
      for a CPU, then it is installed, the broadcast timer is initialized
      and the CPU is marked as broadcast target.
      
      If a real clock event device is installed after that, we can fail to
      take the CPU out of the broadcast mask. In the worst case we end up
      with two periodic timer events firing for the same CPU. One from the
      per cpu hardware device and one from the broadcast.
      
      Now the problem is that we have no way to distinguish whether the
      system is in a state which makes broadcasting necessary or the
      broadcast bit was set due to the nonfunctional dummy timer
      installment.
      
      To solve this we need to keep track of the system state separately and
      provide more detailed decision logic for whether we keep the CPU in
      broadcast mode or not.
      
      The old decision logic only clears the broadcast mode, if the newly
      installed clock event device is not affected by power states.
      
      The new logic clears the broadcast mode if one of the following is
      true:
      
        - The new device is not affected by power states.
      
        - The system is not in a power state affected mode.
      
        - The system has switched to oneshot mode. The oneshot broadcast is
          controlled from the deep idle state. The CPU is not in idle at
          this point, so it's safe to remove it from the mask.
      
      If we clear the broadcast bit for the CPU when a new device is
      installed, we also shut down the broadcast device when this was the
      last CPU in the broadcast mask.

      If the broadcast bit is kept, then we leave the new device in shutdown
      state and rely on the broadcast to deliver the timer interrupts via
      the broadcast IPIs.
      Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • T
      tick: Prevent uncontrolled switch to oneshot mode · 1f73a980
      Thomas Gleixner committed
      When the system switches from periodic to oneshot mode, the broadcast
      logic makes it possible for a CPU which has not yet switched to
      oneshot mode to put its own clock event device into oneshot mode without
      updating the state and the timer handler.
      
      CPU0				CPU1
      				per cpu tickdev is in periodic mode
      				and switched to broadcast
      
      Switch to oneshot mode
       tick_broadcast_switch_to_oneshot()
        cpumask_copy(tick_broadcast_oneshot_mask,
      	       tick_broadcast_mask);
      
        broadcast device mode = oneshot
      
      				Timer interrupt
      						
      				irq_enter()
      				 tick_check_oneshot_broadcast()
      				  dev->set_mode(ONESHOT);
      
      				tick_handle_periodic()
      				 if (dev->mode == ONESHOT)
      				   dev->next_event += period;
      				   FAIL.
      
      We fail, because dev->next_event contains KTIME_MAX, if the device was
      in periodic mode before the uncontrolled switch to oneshot happened.
      
      We must copy the broadcast bits over to the oneshot mask, because
      otherwise a CPU which relies on the broadcast would not be woken up
      anymore after the broadcast device switched to oneshot mode.

      So we need to verify in tick_check_oneshot_broadcast() whether the CPU
      has already switched to oneshot mode. If not, leave the device
      untouched and let the CPU switch to oneshot mode in a controlled way.

      This is a long-standing bug which was never noticed, because the main
      user of the broadcast, x86, cannot run into that scenario, AFAICT. The
      nonarchitected timer mess of ARM creates a gazillion of differently
      broken abominations which trigger the shortcomings of that broadcast
      code, which better had never been necessary in the first place.
      Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • T
      tick: Make oneshot broadcast robust vs. CPU offlining · c9b5a266
      Thomas Gleixner committed
      In periodic mode we remove offline cpus from the broadcast propagation
      mask. In oneshot mode we fail to do so. This was not a problem so far,
      but the recent changes to the broadcast propagation introduced a
      constellation which can result in a NULL pointer dereference.
      
      What happens is:
      
      CPU0			CPU1
      			idle()
      			  arch_idle()
      			    tick_broadcast_oneshot_control(OFF);
      			      set cpu1 in tick_broadcast_force_mask
      			  if (cpu_offline())
      			     arch_cpu_dead()
      
      cpu_dead_cleanup(cpu1)
       cpu1 tickdevice pointer = NULL
      
      broadcast interrupt
        dereference cpu1 tickdevice pointer -> OOPS
      
      We dereference the pointer because cpu1 is still set in
      tick_broadcast_force_mask and tick_do_broadcast() expects a valid
      cpumask and therefore lacks any further checks.
      
      Remove the cpu from the tick_broadcast_force_mask before we set the
      tick device pointer to NULL. Also add a sanity check to the oneshot
      broadcast function, so we can detect such issues w/o crashing the
      machine.
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Cc: athorlton@sgi.com
      Cc: CAI Qian <caiqian@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 01 Jul, 2013 3 commits
    • L
      Linux 3.10 · 8bb495e3
      Linus Torvalds committed
    • L
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · f0277dce
      Linus Torvalds committed
      Pull another powerpc fix from Benjamin Herrenschmidt:
       "I mentioned that while we had fixed the kernel crashes, EEH error
        recovery didn't always recover...  It appears that I had a fix for
        that already in powerpc-next (with a stable CC).
      
        I cherry-picked it today and did a few tests and it seems that things
        now work quite well.  The patch is also pretty simple, so I see no
        reason to wait before merging it."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc/eeh: Fix fetching bus for single-dev-PE
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4b483802
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "This is a set of seven bug fixes.  Several fcoe fixes for locking
        problems, initiator issues and a VLAN API change, all of which could
        eventually lead to data corruption, one fix for a qla2xxx locking
        problem which could lead to multiple completions of the same request
        (and subsequent data corruption) and a use after free in the ipr
        driver.  Plus one minor MAINTAINERS file update"
      
      (only six bugfixes in this pull, since I had already pulled the fcoe API
      fix directly from Robert Love)
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] ipr: Avoid target_destroy accessing memory after it was freed
        [SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines
        MAINTAINERS: Fix fcoe mailing list
        libfc: extend ex_lock to protect all of fc_seq_send
        libfc: Correct check for initiator role
        libfcoe: Fix Conflicting FCFs issue in the fabric
  5. 30 Jun, 2013 12 commits
  6. 29 Jun, 2013 8 commits
  7. 28 Jun, 2013 5 commits
    • A
      mn10300: Use early_param() to parse "mem=" parameter · e3f12a53
      Akira Takeuchi committed
      This fixes the problem that "init=" options may not be passed to the
      kernel correctly.
      
      parse_mem_cmdline() of mn10300 arch gets rid of "mem=" string from
      redboot_command_line. Then init_setup() parses the "init=" options from
      static_command_line, which is a copy of redboot_command_line, and keeps
      the pointer to the init options in execute_command variable.
      
      Since upstream commit 026cee00 (params: <level>_initcall-like kernel
      parameters), static_command_line is overwritten by saved_command_line in
      do_initcall_level(). Note that saved_command_line is the command line
      that still includes the "mem=" string.

      As a result, execute_command may point to the wrong string, shifted by
      the length of the "mem=" parameter.
      I noticed this problem when using a command line like this:
      
          mem=128M console=ttyS0,115200 init=/bin/sh
      
      Here is the processing flow of command line parameters.
          start_kernel()
            setup_arch(&command_line)
               parse_mem_cmdline(cmdline_p)
                 * strcpy(boot_command_line, redboot_command_line);
                 * Remove "mem=xxx" from redboot_command_line.
                 * *cmdline_p = redboot_command_line;
            setup_command_line(command_line) <-- command_line is redboot_command_line
              * strcpy(saved_command_line, boot_command_line)
              * strcpy(static_command_line, command_line)
            parse_early_param()
              strlcpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
              parse_early_options(tmp_cmdline);
                parse_args("early options", cmdline, NULL, 0, 0, 0, do_early_param);
            parse_args("Booting ..", static_command_line, ...);
              init_setup() <-- save the pointer in execute_command
            rest_init()
              kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
      
      At this point, execute_command points to "/bin/sh" string.
      
          kernel_init()
            kernel_init_freeable()
              do_basic_setup()
                do_initcalls()
                  do_initcall_level()
                    (*) strcpy(static_command_line, saved_command_line);
      
      Here, execute_command ends up pointing to the "200" string!
      Signed-off-by: David Howells <dhowells@redhat.com>
    • A
      mn10300: Allow to pass array name to get_user() · c6dc9f0a
      Akira Takeuchi committed
      This fixes the following compile error:
      
      CC block/scsi_ioctl.o
      block/scsi_ioctl.c: In function 'sg_scsi_ioctl':
      block/scsi_ioctl.c:449: error: invalid initializer
      Signed-off-by: David Howells <dhowells@redhat.com>
    • B
      timer: Fix jiffies wrap behavior of round_jiffies_common() · 9e04d380
      Bart Van Assche committed
      Directly comparing jiffies-related values does not work in the
      wraparound case. Replace it with time_is_after_jiffies().
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/519BC066.5080600@acm.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • D
    • T
      powerpc/eeh: Add eeh_dev to the cache during boot · 1abd6018
      Thadeu Lima de Souza Cascardo committed
      Commit f8f7d63f ("powerpc/eeh: Trace eeh
      device from I/O cache") broke EEH on pseries for devices that were
      present during boot and have not been hotplugged/DLPARed.
      
      eeh_check_failure will get the eeh_dev from the cache, and will get
      NULL. eeh_addr_cache_build adds the addresses to the cache, but eeh_dev
      for the given pci_device is not set yet. Just reordering the call to
      eeh_addr_cache_insert_dev works fine. The ordering is similar to the one
      in eeh_add_device_late.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 27 Jun, 2013 1 commit