1. 27 7月, 2010 6 次提交
    • J
      timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset · 7615856e
      John Stultz 提交于
      update_vsyscall() did not provide the wall_to_monotoinc offset,
      so arch specific implementations tend to reference wall_to_monotonic
      directly. This limits future cleanups in the timekeeping core, so
      this patch fixes the update_vsyscall interface to provide
      wall_to_monotonic, allowing wall_to_monotonic to be made static
      as planned in Documentation/feature-removal-schedule.txt
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7615856e
    • J
      x86: Fix vtime/file timestamp inconsistencies · 8c73626a
      John Stultz 提交于
      Due to vtime calling vgettimeofday(), its possible that an application
      could call  time();create("stuff",O_RDRW);  only to see the file's
      creation timestamp to be before the value returned by time.
      
      A similar way to reproduce the issue is to compare the vsyscall time()
      with the syscall time(), and observe ordering issues.
      
      The modified test case from Oleg Nesterov below can illustrate this:
      
      int main(void)
      {
      	time_t sec1,sec2;
      	do {
      		sec1 = time(&sec2);
      		sec2 = syscall(__NR_time, NULL);
      	} while (sec1 <= sec2);
      
      	printf("vtime: %d.000000\n", sec1);
      	printf("time: %d.000000\n", sec2);
      	return 0;
      }
      
      The proper fix is to make vtime use the same time value as
      current_kernel_time() (which is exported via update_vsyscall) instead of
      vgettime().
      
      Thanks to Jiri Olsa for bringing up the issue and catching bugs in
      earlier verisons of this fix.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      LKML-Reference: <1279068988-21864-2-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      8c73626a
    • B
      [CPUFREQ] powernow-k8: Limit Pstate transition latency check · 3581ced3
      Borislav Petkov 提交于
      The Pstate transition latency check was added for broken F10h BIOSen
      which wrongly contain a value of 0 for transition and bus master
      latency. Fam11h and later, however, (will) have similar transition
      latency so extend that behavior for them too.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      3581ced3
    • M
      [CPUFREQ] Fix PCC driver error path · 179ee434
      Matthew Garrett 提交于
      The PCC cpufreq driver unmaps the mailbox address range if any CPUs fail to
      initialise, but doesn't do anything to remove the registered CPUs from the
      cpufreq core resulting in failures further down the line. We're better off
      simply returning a failure - the cpufreq core will unregister us cleanly if
      we end up with no successfully registered CPUs. Tidy up the failure path
      and also add a sanity check to ensure that the firmware gives us a realistic
      frequency - the core deals badly with that being set to 0.
      Signed-off-by: NMatthew Garrett <mjg@redhat.com>
      Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      179ee434
    • D
      [CPUFREQ] fix double freeing in error path of pcc-cpufreq · 3847d223
      Daniel J Blueman 提交于
      Prevent double freeing on error path.
      Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      3847d223
    • M
      [CPUFREQ] pcc driver should check for pcch method before calling _OSC · 47f8bcf3
      Matthew Garrett 提交于
      The pcc specification documents an _OSC method that's incompatible with the
      one defined as part of the ACPI spec. This shouldn't be a problem as both
      are supposed to be guarded with a UUID. Unfortunately approximately nobody
      (including HP, who wrote this spec) properly check the UUID on entry to the
      _OSC call. Right now this could result in surprising behaviour if the pcc
      driver performs an _OSC call on a machine that doesn't implement the pcc
      specification. Check whether the PCCH method exists first in order to reduce
      this probability.
      Signed-off-by: NMatthew Garrett <mjg@redhat.com>
      Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      47f8bcf3
  2. 25 7月, 2010 1 次提交
  3. 23 7月, 2010 2 次提交
  4. 22 7月, 2010 2 次提交
  5. 21 7月, 2010 2 次提交
    • Y
      x86, numa: fix boot without RAM on node0 again · 9aebbdb6
      Yinghai Lu 提交于
      Commit e534c7c5 ("numa: x86_64: use generic percpu var
      numa_node_id() implementation") broke numa systems that don't have ram
      on node0 when MEMORY_HOTPLUG is enabled, because cpu_up() will call
      cpu_to_node() before per_cpu(numa_node) is setup for APs.
      
      When Node0 doesn't have RAM, on x86, cpus already round it to nearest
      node with RAM in x86_cpu_to_node_map.  and per_cpu(numa_node) is not set
      up until in c_init for APs.
      
      When later cpu_up() calling cpu_to_node() will get 0 again, and make it
      online even there is no RAM on node0.  so later all APs can not booted up,
      and later will have panic.
      
      [    1.611101] On node 0 totalpages: 0
      .........
      [    2.608558] On node 0 totalpages: 0
      [    2.612065] Brought up 1 CPUs
      [    2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
      ...
         93.225341] calling  loop_init+0x0/0x1a4 @ 1
      [   93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
      [   93.246539] Pid: 1, comm: swapper Tainted: G        W   2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
      [   93.264621] Call Trace:
      [   93.266533]  [<ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
      [   93.270710]  [<ffffffff81125f15>] __alloc_percpu+0x10/0x12
      [   93.285849]  [<ffffffff8140786c>] alloc_disk_node+0x94/0x16d
      [   93.291811]  [<ffffffff81407956>] alloc_disk+0x11/0x13
      [   93.306157]  [<ffffffff81503e51>] loop_alloc+0xa7/0x180
      [   93.310538]  [<ffffffff8277ef48>] loop_init+0x9b/0x1a4
      [   93.324909]  [<ffffffff8277eead>] ? loop_init+0x0/0x1a4
      [   93.329650]  [<ffffffff810001f2>] do_one_initcall+0x57/0x136
      [   93.345197]  [<ffffffff827486d0>] kernel_init+0x184/0x20e
      [   93.348146]  [<ffffffff81034954>] kernel_thread_helper+0x4/0x10
      [   93.365194]  [<ffffffff81c7cc3c>] ? restore_args+0x0/0x30
      [   93.369305]  [<ffffffff8274854c>] ? kernel_init+0x0/0x20e
      [   93.386011]  [<ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
      [   93.392047] loop: out of memory
      ...
      
      Try to assign per_cpu(numa_node) early
      
      [akpm@linux-foundation.org: tidy up code comment]
      Signed-off-by: NYinghai <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9aebbdb6
    • A
      x86, i8259: Only register sysdev if we have a real 8259 PIC · 087b255a
      Adam Lackorzynski 提交于
      My platform makes use of the null_legacy_pic choice and oopses when doing
      a shutdown as the shutdown code goes through all the registered sysdevs
      and calls their shutdown method which in my case poke on a non-existing
      i8259.  Imho the i8259 specific sysdev should only be registered if the
      i8259 is actually there.
      
      Do not register the sysdev function when the null_legacy_pic is used so
      that the i8259 resume, suspend and shutdown functions are not called.
      Signed-off-by: NAdam Lackorzynski <adam@os.inf.tu-dresden.de>
      LKML-Reference: <201007202218.o6KMIJ3m020955@imap1.linux-foundation.org>
      Cc: Jacob Pan <jacob.jun.pan@intel.com>
      Cc: <stable@kernel.org> 2.6.34
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      087b255a
  6. 19 7月, 2010 1 次提交
  7. 17 7月, 2010 1 次提交
    • Y
      x86: Fix x2apic preenabled system with kexec · fd19dce7
      Yinghai Lu 提交于
      Found one x2apic system kexec loop test failed
      when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip)
      
      first kernel can kexec second kernel, but second kernel can not kexec third one.
      
      it can be duplicated on another system with BIOS preenabled x2apic.
      First kernel can not kexec second kernel.
      
      It turns out, when kernel boot with pre-enabled x2apic, it will not execute
      disable_local_APIC on shutdown path.
      
      when init_apic_mappings() is called in setup_arch, it will skip setting of
      apic_phys when x2apic_mode is set. ( x2apic_mode is much early check_x2apic())
      Then later, disable_local_APIC() will bail out early because !apic_phys.
      
      So check !x2apic_mode in x2apic_mode in disable_local_APIC with !apic_phys.
      
      another solution could be updating init_apic_mappings() to set apic_phys even
      for preenabled x2apic system. Actually even for x2apic system, that lapic
      address is mapped already in early stage.
      
      BTW: is there any x2apic preenabled system with apicid of boot cpu > 255?
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4C3EB22B.3000701@kernel.org>
      Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      fd19dce7
  8. 15 7月, 2010 1 次提交
  9. 08 7月, 2010 3 次提交
  10. 03 7月, 2010 1 次提交
  11. 01 7月, 2010 1 次提交
  12. 30 6月, 2010 1 次提交
    • F
      x86: Send a SIGTRAP for user icebp traps · a1e80faf
      Frederic Weisbecker 提交于
      Before we had a generic breakpoint layer, x86 used to send a
      sigtrap for any debug event that happened in userspace,
      except if it was caused by lazy dr7 switches.
      
      Currently we only send such signal for single step or breakpoint
      events.
      
      However, there are three other kind of debug exceptions:
      
      - debug register access detected: trigger an exception if the
        next instruction touches the debug registers. We don't use
        it.
      - task switch, but we don't use tss.
      - icebp/int01 trap. This instruction (0xf1) is undocumented and
        generates an int 1 exception. Unlike single step through TF
        flag, it doesn't set the single step origin of the exception
        in dr6.
      
      icebp then used to be reported in userspace using trap signals
      but this have been incidentally broken with the new breakpoint
      code. Reenable this. Since this is the only debug event that
      doesn't set anything in dr6, this is all we have to check.
      
      This fixes a regression in Wine where World Of Warcraft got broken
      as it uses this for software protection checks purposes. And
      probably other apps do.
      Reported-and-tested-by: NAlexandre Julliard <julliard@winehq.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: 2.6.33.x 2.6.34.x <stable@kernel.org>
      a1e80faf
  13. 25 6月, 2010 1 次提交
  14. 20 6月, 2010 1 次提交
  15. 10 6月, 2010 3 次提交
  16. 02 6月, 2010 1 次提交
  17. 01 6月, 2010 2 次提交
  18. 31 5月, 2010 2 次提交
    • A
      x86/mm: Remove unused DBG() macro · e565813a
      Akinobu Mita 提交于
      DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e565813a
    • S
      perf_events: Fix event scheduling issues introduced by transactional API · 90151c35
      Stephane Eranian 提交于
      The transactional API patch between the generic and model-specific
      code introduced several important bugs with event scheduling, at
      least on X86. If you had pinned events, e.g., watchdog,  and were
      over-committing the PMU, you would get bogus counts. The bug was
      showing up on Intel CPU because events would move around more
      often that on AMD. But the problem also existed on AMD, though
      harder to expose.
      
      The issues were:
      
       - group_sched_in() was missing a cancel_txn() in the error path
      
       - cpuc->n_added was not properly maintained, leading to missing
         actions in hw_perf_enable(), i.e., n_running being 0. You cannot
         update n_added until you know the transaction has succeeded. In
         case of failed transaction n_added was not adjusted back.
      
       - in case of failed transactions, event_sched_out() was called
         and eventually invoked x86_disable_event() to touch the HW reg.
         But with transactions, on X86, event_sched_in() does not touch
         HW registers, it simply collects events into a list. Thus, you
         could end up calling x86_disable_event() on a counter which
         did not correspond to the current event when idx != -1.
      
      The patch modifies the generic and X86 code to avoid all those problems.
      
      First, we keep track of the number of events added last. In case the
      transaction fails, we substract them from n_added. This approach is
      necessary (as opposed to delaying updates to n_added) because not all
      event updates use the transaction API, e.g., single events.
      
      Second, we encapsulate the event_sched_in() and event_sched_out() in
      group_sched_in() inside the transaction. That makes the operations
      symmetrical and you can also detect that you are inside a transaction
      and skip the HW reg access by checking cpuc->group_flag.
      
      With this patch, you can now overcommit the PMU even with pinned
      system-wide events present and still get valid counts.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1274796225.5882.1389.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      90151c35
  19. 28 5月, 2010 3 次提交
  20. 27 5月, 2010 1 次提交
  21. 26 5月, 2010 2 次提交
    • B
      x86, k8: Fix section mismatch for powernowk8_exit() · fe501f1e
      Borislav Petkov 提交于
      Fix the following warning:
      
      "WARNING: arch/x86/kernel/built-in.o(.exit.text+0x72):
      Section mismatch in reference from the function powernowk8_exit() to the variable .cpuinit.data:cpb_nb
      
      The function __exit powernowk8_exit() references a variable
      __cpuinitdata cpb_nb. This is often seen when error handling in the exit
      function uses functionality in the init path. The fix is often to remove
      the __cpuinitdata annotation of cpb_nb so it may be used outside an init
      section."
      
      Cc: <stable@kernel.org>
      Reported-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100525152858.GA24836@aftab>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      fe501f1e
    • K
      driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Kay Sievers 提交于
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seems too
      much in love with their modularization that we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobes from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also useful for systems which still run
      a plain static /dev, which is completely unsafe to use with any dynamic minor
      numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single*instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless if they are ever used.
      
      This facility is to hide the mess distros are creating with too modualized
      kernels, and just to hide that these modules are not compiled-in, and not to
      paper-over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      578454ff
  22. 25 5月, 2010 2 次提交