1. 30 3月, 2009 3 次提交
  2. 18 3月, 2009 1 次提交
  3. 17 3月, 2009 2 次提交
    • L
      Fast TSC calibration: calculate proper frequency error bounds · 9e8912e0
      Linus Torvalds 提交于
      In order for ntpd to correctly synchronize the clocks, the frequency of
      the system clock must not be off by more than 500 ppm (or, put another
      way, 1:2000), or ntpd will end up giving up on trying to synchronize
      properly, and ends up reseting the clock in jumps instead.
      
      The fast TSC PIT calibration sometimes failed this test - it was
      assuming that the PIT reads always took about one microsecond each (2us
      for the two reads to get a 16-bit timer), and that calibrating TSC to
      the PIT over 15ms should thus be sufficient to get much closer than
      500ppm (max 2us error on both sides giving 4us over 15ms: a 270 ppm
      error value).
      
      However, that assumption does not always hold: apparently some hardware
      is either very much slower at reading the PIT registers, or there was
      other noise causing at least one machine to get 700+ ppm errors.
      
      So instead of using a fixed 15ms timing loop, this changes the fast PIT
      calibration to read the TSC delta over the individual PIT timer reads,
      and use the result to calculate the error bars on the PIT read timing
      properly.  We then successfully calibrate the TSC only if the maximum
      error bars fall below 500ppm.
      
      In the process, we also relax the timing to allow up to 25ms for the
      calibration, although it can happen much faster depending on hardware.
      Reported-and-tested-by: NJesper Krogh <jesper@krogh.cc>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e8912e0
    • L
      Fix potential fast PIT TSC calibration startup glitch · a6a80e1d
      Linus Torvalds 提交于
      During bootup, when we reprogram the PIT (programmable interval timer)
      to start counting down from 0xffff in order to use it for the fast TSC
      calibration, we should also make sure to delay a bit afterwards to allow
      the PIT hardware to actually start counting with the new value.
      
      That will happens at the next CLK pulse (1.193182 MHz), so the easiest
      way to do that is to just wait at least one microsecond after
      programming the new PIT counter value.  We do that by just reading the
      counter value back once - which will take about 2us on PC hardware.
      Reported-and-tested-by: Njohn stultz <johnstul@us.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6a80e1d
  4. 11 3月, 2009 1 次提交
    • I
      x86, sched_clock(): mark variables read-mostly · f24ade3a
      Ingo Molnar 提交于
      Impact: micro-optimization
      
      There's a number of variables in the sched_clock() path that are
      in .data/.bss - but not marked __read_mostly. This creates the
      danger of accidental false cacheline sharing with some other,
      write-often variable.
      
      So mark them __read_mostly.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f24ade3a
  5. 10 3月, 2009 3 次提交
    • T
      percpu: generalize embedding first chunk setup helper · 66c3a757
      Tejun Heo 提交于
      Impact: code reorganization
      
      Separate out embedding first chunk setup helper from x86 embedding
      first chunk allocator and put it in mm/percpu.c.  This will be used by
      the default percpu first chunk allocator and possibly by other archs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      66c3a757
    • T
      percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk() · 6074d5b0
      Tejun Heo 提交于
      Impact: cleanup, more flexibility for first chunk init
      
      Non-negative @dyn_size used to be allowed iff @unit_size wasn't auto.
      This restriction stemmed from implementation detail and made things a
      bit less intuitive.  This patch allows @dyn_size to be specified
      regardless of @unit_size and swaps the positions of @dyn_size and
      @unit_size so that the parameter order makes more sense (static,
      reserved and dyn sizes followed by enclosing unit_size).
      
      While at it, add @unit_size >= PCPU_MIN_UNIT_SIZE sanity check.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6074d5b0
    • D
      Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod." · 129f8ae9
      Dave Jones 提交于
      This reverts commit e088e4c9.
      
      Removing the sysfs interface for p4-clockmod was flagged as a
      regression in bug 12826.
      
      Course of action:
       - Find out the remaining causes of overheating, and fix them
         if possible. ACPI should be doing the right thing automatically.
         If it isn't, we need to fix that.
       - mark p4-clockmod ui as deprecated
       - try again with the removal in six months.
      
      It's not really feasible to printk about the deprecation, because
      it needs to happen at all the sysfs entry points, which means adding
      a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.
      Signed-off-by: NDave Jones <davej@redhat.com>
      129f8ae9
  6. 08 3月, 2009 1 次提交
    • C
      x86: UV: remove uv_flush_tlb_others() WARN_ON · 3a450de1
      Cliff Wickman 提交于
      In uv_flush_tlb_others() (arch/x86/kernel/tlb_uv.c),
      the "WARN_ON(!in_atomic())" fails if CONFIG_PREEMPT is not enabled.
      
      And CONFIG_PREEMPT is not enabled by default in the distribution that
      most UV owners will use.
      
      We could #ifdef CONFIG_PREEMPT the warning, but that is not good form.
      And there seems to be no suitable fix to in_atomic() when CONFIG_PREMPT
      is not on.
      
      As Ingo commented:
      
        > and we have no proper primitive to test for atomicity. (mainly
        > because we dont know about atomicity on a non-preempt kernel)
      
      So we drop the WARN_ON.
      Signed-off-by: NCliff Wickman <cpw@sgi.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a450de1
  7. 06 3月, 2009 6 次提交
    • M
      x86, pebs: correct qualifier passed to ds_write_config() from ds_request_pebs() · 73bf1b62
      Markus Metzger 提交于
      ds_write_config() can write the BTS as well as the PEBS part of
      the DS config. ds_request_pebs() passes the wrong qualifier, which
      results in the wrong configuration to be written.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      LKML-Reference: <20090305085721.A22550@sedona.ch.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      73bf1b62
    • M
      x86, bts: remove bad warning · 9ca0791d
      Markus Metzger 提交于
      In case a ptraced task is reaped (while the tracer is still attached),
      ds_exit_thread() is called before ptrace_exit(). The latter will
      release the bts_tracer and remove the thread's ds_ctx.
      The former will WARN() if the context is not NULL.
      
      Oleg Nesterov submitted patches that move ptrace_exit() before
      exit_thread() and thus reverse the order of the above calls.
      
      Remove the bad warning. I will add it again when Oleg's changes are in.
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      LKML-Reference: <20090305084954.A22000@sedona.ch.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9ca0791d
    • T
      x86, percpu: setup reserved percpu area for x86_64 · 6b19b0c2
      Tejun Heo 提交于
      Impact: fix relocation overflow during module load
      
      x86_64 uses 32bit relocations for symbol access and static percpu
      symbols whether in core or modules must be inside 2GB of the percpu
      segement base which the dynamic percpu allocator doesn't guarantee.
      This patch makes x86_64 reserve PERCPU_MODULE_RESERVE bytes in the
      first chunk so that module percpu areas are always allocated from the
      first chunk which is always inside the relocatable range.
      
      This problem exists for any percpu allocator but is easily triggered
      when using the embedding allocator because the second chunk is located
      beyond 2GB on it.
      
      This patch also changes the meaning of PERCPU_DYNAMIC_RESERVE such
      that it only indicates the size of the area to reserve for dynamic
      allocation as static and dynamic areas can be separate.  New
      PERCPU_DYNAMIC_RESERVED is increased by 4k for both 32 and 64bits as
      the reserved area separation eats away some allocatable space and
      having slightly more headroom (currently between 4 and 8k after
      minimal boot sans module area) makes sense for common case
      performance.
      
      x86_32 can address anywhere from anywhere and doesn't need reserving.
      
      Mike Galbraith first reported the problem first and bisected it to the
      embedding percpu allocator commit.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NMike Galbraith <efault@gmx.de>
      Reported-by: NJaswinder Singh Rajput <jaswinder@kernel.org>
      6b19b0c2
    • T
      percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Tejun Heo 提交于
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, arch can ask to set aside certain number
      of bytes right after the core static area which is available only
      through a separate reserved allocator.  This will be used primarily
      for module static percpu variables on architectures with limited
      relocation range to ensure that the module perpcu symbols are inside
      the relocatable range.
      
      If reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes piggy-back dynamic allocation area, a separate chunk mapping
      the same region is created to serve dynamic allocation.  The first one
      is called static first chunk and the second dynamic first chunk.
      Although they share the page map, their different area map
      initializations guarantee they serve disjoint areas according to their
      purposes.
      
      If arch doesn't setup reserved area, reserved allocation is handled
      like any other allocation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      edcb4639
    • T
      x86: make embedding percpu allocator return excessive free space · 9a4f8a87
      Tejun Heo 提交于
      Impact: reduce unnecessary memory usage on certain configurations
      
      Embedding percpu allocator allocates unit_size *
      smp_num_possible_cpus() bytes consecutively and use it for the first
      chunk.  However, if the static area is small, this can result in
      excessive prellocated free space in the first chunk due to
      PCPU_MIN_UNIT_SIZE restriction.
      
      This patch makes embedding percpu allocator preallocate only what's
      necessary as described by PERPCU_DYNAMIC_RESERVE and return the
      leftover to the bootmem allocator.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9a4f8a87
    • T
      percpu: use negative for auto for pcpu_setup_first_chunk() arguments · cafe8816
      Tejun Heo 提交于
      Impact: argument semantic cleanup
      
      In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
      auto-sizing.  It's okay for @unit_size as 0 doesn't make sense but 0
      dynamic reserve size is valid.  Alos, if arch @dyn_size is calculated
      from other parameters, it might end up passing in 0 @dyn_size and
      malfunction when the size is automatically adjusted.
      
      This patch makes both @unit_size and @dyn_size ssize_t and use -1 for
      auto sizing.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      cafe8816
  8. 05 3月, 2009 5 次提交
    • D
      [CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor. · 36e8abf3
      Dave Jones 提交于
      The latency of p4-clockmod sucks so hard that scaling on a regular
      basis with ondemand is a really bad idea.
      Signed-off-by: NMatthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: NDave Jones <davej@redhat.com>
      36e8abf3
    • L
      x86: add Dell XPS710 reboot quirk · dd4124a8
      Leann Ogasawara 提交于
      Dell XPS710 will hang on reboot.  This is resolved by adding a quirk to
      set bios reboot.
      Signed-off-by: NLeann Ogasawara <leann.ogasawara@canonical.com>
      Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
      Cc: "manoj.iyer" <manoj.iyer@canonical.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <1236196380.3231.89.camel@emiko>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dd4124a8
    • D
      x86, math-emu: fix init_fpu for task != current · ab9e1858
      Daniel Glöckner 提交于
      Impact: fix math-emu related crash while using GDB/ptrace
      
      init_fpu() calls finit to initialize a task's xstate, while finit always
      works on the current task. If we use PTRACE_GETFPREGS on another
      process and both processes did not already use floating point, we get
      a null pointer exception in finit.
      
      This patch creates a new function finit_task that takes a task_struct
      parameter. finit becomes a wrapper that simply calls finit_task with
      current. On the plus side this avoids many calls to get_current which
      would each resolve to an inline assembler mov instruction.
      
      An empty finit_task has been added to i387.h to avoid linker errors in
      case the compiler still emits the call in init_fpu when
      CONFIG_MATH_EMULATION is not defined.
      
      The declaration of finit in i387.h has been removed as the remaining
      code using this function gets its prototype from fpu_proto.h.
      Signed-off-by: NDaniel Glöckner <dg@emlix.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Bill Metzenthen <billm@melbpc.org.au>
      LKML-Reference: <E1Lew31-0004il-Fg@mailer.emlix.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ab9e1858
    • H
      x86: EFI: Back efi_ioremap with init_memory_mapping instead of FIX_MAP · dd39ecf5
      Huang Ying 提交于
      Impact: Fix boot failure on EFI system with large runtime memory range
      
      Brian Maly reported that some EFI system with large runtime memory
      range can not boot. Because the FIX_MAP used to map runtime memory
      range is smaller than run time memory range.
      
      This patch fixes this issue by re-implement efi_ioremap() with
      init_memory_mapping().
      Reported-and-tested-by: NBrian Maly <bmaly@redhat.com>
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Cc: Brian Maly <bmaly@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1236135513.6204.306.camel@yhuang-dev.sh.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dd39ecf5
    • B
      x86: fix DMI on EFI · ff0c0874
      Brian Maly 提交于
      Impact: reactivate DMI quirks on EFI hardware
      
      DMI tables are loaded by EFI, so the dmi calls must happen after
      efi_init() and not before.
      
      Currently Apple hardware uses DMI to determine the framebuffer mappings
      for efifb. Without DMI working you also have no video on MacBook Pro.
      
      This patch resolves the DMI issue for EFI hardware (DMI is now properly
      detected at boot), and additionally efifb now loads on Apple hardware
      (i.e. video works).
      Signed-off-by: NBrian Maly <bmaly@redhat>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Cc: ying.huang@intel.com
      LKML-Reference: <49ADEDA3.1030406@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      
       arch/x86/kernel/setup.c |    5 +++--
       1 file changed, 3 insertions(+), 2 deletions(-)
      ff0c0874
  9. 03 3月, 2009 2 次提交
  10. 02 3月, 2009 9 次提交
  11. 01 3月, 2009 1 次提交
  12. 28 2月, 2009 5 次提交
  13. 27 2月, 2009 1 次提交
    • I
      x86: set X86_FEATURE_TSC_RELIABLE · 83ce4009
      Ingo Molnar 提交于
      If the TSC is constant and non-stop, also set it reliable.
      
      (We will turn this off in DMI quirks for multi-chassis systems)
      
      The performance number on a 16-way Nehalem system running
      32 tasks that context-switch between each other is significant:
      
         sched_clock_stable=0		sched_clock_stable=1
         ....................         ....................
         22.456925 million/sec        24.306972 million/sec   [+8.2%]
      
      lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
      0.59 microseconds - a 6.7% increase in context-switching
      performance.
      
      Perfstat of 1 million pipe context switches between two tasks:
      
       Performance counter stats for './pipe-test-1m':
      
             [before]           [after]
         ............      ............
         37621.421089      36436.848378    task clock ticks     (msecs)
      
                    0                 0    CPU migrations       (events)
              2000274           2000189    context switches     (events)
                  194               193    pagefaults           (events)
           8433799643        8171016416    CPU cycles           (events) -3.21%
           8370133368        8180999694    instructions         (events) -2.31%
              4158565           3895941    cache references     (events) -6.74%
                44312             46264    cache misses         (events)
      
          2349.287976       2279.362465    wall-time            (msecs)  -3.06%
      
      The speedup comes straight from the reduction in the instruction
      count. sched_clock_cpu() got simpler and the whole workload thus
      executes faster.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      83ce4009