1. 25 Feb 2009 (5 commits)
  2. 24 Feb 2009 (1 commit)
  3. 23 Feb 2009 (3 commits)
  4. 22 Feb 2009 (2 commits)
  5. 21 Feb 2009 (4 commits)
  6. 20 Feb 2009 (14 commits)
    • x86: use the right protections for split-up pagetables · 07a66d7c
      Committed by Ingo Molnar
      Steven Rostedt found a bug where, in his modified kernel, ftrace
      was unable to modify the kernel text because the PMD itself had
      also been marked read-only in split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes, but it is a confused and
      incorrect concept at the page table level, because the pagetable
      entry covers a whole set of ptes and hence cannot 'clone' any single
      protection attribute; the ptes can be any mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
      
      Also update the comment.
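
      For illustration, a minimal sketch of the fix described above
      (simplified; the exact upstream code may differ slightly):

          /* The ptes keep the huge page's original protections... */
          ref_prot = pte_pgprot(pte_clrhuge(*kpte));
          for (i = 0; i < PTRS_PER_PTE; i++)
                  set_pte(&pbase[i], pfn_pte(pfn + i, ref_prot));

          /*
           * ...but the page-table entry itself gets the standard,
           * permissive kernel protections; the ptes set above control
           * the primary protection behavior:
           */
          __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));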
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, vmi: TSC going backwards check in vmi clocksource · 48ffc70b
      Committed by Alok N Kataria
      Impact: fix time warps under vmware
      
      Similar to the check for the TSC going backwards in the TSC
      clocksource, we also need this check for the VMI clocksource.
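
      A hedged sketch of the kind of guard involved, mirroring the TSC
      clocksource check (identifiers such as clocksource_vmi are assumed
      from context):

          static cycle_t read_real_cycles(void)
          {
                  cycle_t ret = (cycle_t)vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL);

                  /* Never report a value behind the last cycle seen. */
                  return max(ret, clocksource_vmi.cycle_last);
          }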
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
    • x86, mce: use %ll instead of %L for 64-bit numbers · f6d1826d
      Committed by H. Peter Anvin
      Impact: Cleanup
      
      The standard spelling of a printf pattern for long long is "ll", not
      "L", which is for long double.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: separate correct machine check poller and fatal exception handler · b79109c3
      Committed by Andi Kleen
      Impact: cleanup, performance enhancement
      
      The machine check poller is diverging more and more from the fatal
      exception handler. Instead of adding more special cases, separate the
      code paths completely. The corrected-error poll path is actually quite
      simple, and this doesn't result in much code duplication.
      
      This makes both handlers much easier to read and results in
      cleaner code flow.  The exception handler now only needs to care
      about uncorrected errors, which also simplifies the handling of multiple
      errors. The corrected-error poller now always runs in standard interrupt
      context and does not need to do anything special to handle NMI context.
      
      Minor behaviour changes:
      - MCG status is now not cleared on polling.
      - Only the banks which had corrected errors get cleared on polling.
      - The exception handler now only clears banks with errors.
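
      A hedged sketch of the resulting corrected-error poll loop (heavily
      simplified; identifiers approximate the mce code of this era):

          static void machine_check_poll(void)
          {
                  struct mce m;
                  int i;

                  for (i = 0; i < banks; i++) {
                          rdmsrl(MSR_IA32_MC0_STATUS + i * 4, m.status);
                          if (!(m.status & MCI_STATUS_VAL))
                                  continue;
                          /* Uncorrected errors are left to the exception handler. */
                          if (m.status & MCI_STATUS_UC)
                                  continue;
                          mce_log(&m);
                          /* Clear only the banks we actually handled. */
                          wrmsrl(MSR_IA32_MC0_STATUS + i * 4, 0);
                  }
                  /* Note: MCG status is deliberately not cleared here. */
          }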
      
      v2: Forward port to new patch order. Add "uc" argument.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: factor out duplicated struct mce setup into one function · b5f2fa4e
      Committed by Andi Kleen
      Impact: cleanup
      
      This merely factors out the duplicated code that sets up the
      initial struct mce state into a single function.
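
      A plausible shape for such a helper (hedged; the exact fields set may
      differ):

          void mce_setup(struct mce *m)
          {
                  memset(m, 0, sizeof(struct mce));
                  m->cpu = smp_processor_id();
                  rdtscll(m->tsc);
          }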
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: implement dynamic machine check banks support · 0d7482e3
      Committed by Andi Kleen
      Impact: cleanup; makes the code future-proof; saves memory on small systems
      
      This patch replaces the hardcoded max number of machine check banks with 
      dynamic allocation depending on what the CPU reports. The sysfs
      data structures and the banks array are dynamically allocated.
      
      There is still a hard bank limit (128) because the mcelog protocol uses
      banks >= 128 as pseudo banks to escape other events. But we expect
      that 128 banks is beyond any reasonable CPU for now.
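
      A hedged sketch of the detection step (the low 8 bits of MCG_CAP
      report the bank count; MAX_NR_BANKS and the bank array name are
      assumptions):

          u64 cap;

          rdmsrl(MSR_IA32_MCG_CAP, cap);
          banks = cap & 0xff;                /* CPU-reported bank count */
          if (banks > MAX_NR_BANKS)          /* mcelog reserves banks >= 128 */
                  banks = MAX_NR_BANKS;
          bank = kmalloc(banks * sizeof(u64), GFP_KERNEL);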
      
      This supersedes an earlier patch by Venki, but it solves the problem
      more completely by making the limit fully dynamic (up to the 128
      boundary).
      
      This saves some memory on machines with fewer than 6 banks, because
      they won't need sysdevs for unused banks, and it also makes it
      possible to use sysfs to control these banks on possible future CPUs
      with more than 6 banks.
      
      This is an updated patch addressing Venki's comments. I also folded in
      another patch from Thomas which fixed the error allocation path (that
      patch was previously separate).
      
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: enable machine checks in 64-bit defconfig · e35849e9
      Committed by Andi Kleen
      Impact: Low priority fix
      
      The 32-bit defconfig already had it enabled, and it's a pretty
      fundamental feature, so enable it on 64-bit too.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • [IA64] xen_domu build fix · ec8148de
      Committed by Tony Luck
      arch/ia64/xen/xen_pv_ops.c:156: error: xen_init_ops causes a section type conflict
      arch/ia64/xen/xen_pv_ops.c:340: error: xen_iosapic_ops causes a section type conflict
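
      For context, this error typically appears when const and non-const
      objects end up forced into the same section; an illustrative (not the
      actual) before/after:

          /* Before: a const object annotated for the writable .init.data
           * section => "causes a section type conflict" */
          static const struct pv_init_ops xen_init_ops __initdata = { /* ... */ };

          /* After: const init data belongs in __initconst instead */
          static const struct pv_init_ops xen_init_ops __initconst = { /* ... */ };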
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [IA64] fixes configs and add default config for ia64 xen domU · 1d5b20f4
      Committed by Isaku Yamahata
      This patch fixes the xen-related Kconfigs and adds a default config
      file for ia64 xen domU.
      Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Remove redundant cpu_clear() in __cpu_disable path · c0acdea2
      Committed by Alex Chiang
      The second call to cpu_clear() is redundant, as we've already removed
      the CPU from cpu_online_map before calling migrate_platform_irqs().
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs" · 66db2e63
      Committed by Alex Chiang
      This reverts commit e7b14036.
      
      Commit e7b14036 removes the targeted disabled CPU from the
      cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
      
      Paul McKenney states that the reasoning behind the patch was to
      prevent irq handlers from running on CPUs marked offline because:
      
      	RCU happily ignores CPUs that don't have their bits set in
      	cpu_online_map, so if there are RCU read-side critical sections
      	in the irq handlers being run, RCU will ignore them.  If the
      	other CPUs were running, they might sequence through the RCU
      	state machine, which could result in data structures being
      	yanked out from under those irq handlers, which in turn could
      	result in oopses or worse.
      
      Unfortunately, both ia64 functions above look at cpu_online_map to find
      a new CPU to migrate interrupts onto. This means we can potentially
      migrate an interrupt off ourself back to... ourself. Uh oh.
      
      This causes an oops when we finally try to process pending interrupts on
      the CPU we want to disable. The oops results from calling __do_IRQ with
      a NULL pt_regs:
      
      Unable to handle kernel NULL pointer dereference (address 0000000000000040)
      Call Trace:
       [<a000000100016930>] show_stack+0x50/0xa0
                                      sp=e0000009c922fa00 bsp=e0000009c92214d0
       [<a0000001000171a0>] show_regs+0x820/0x860
                                      sp=e0000009c922fbd0 bsp=e0000009c9221478
       [<a00000010003c700>] die+0x1a0/0x2e0
                                      sp=e0000009c922fbd0 bsp=e0000009c9221438
       [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                      sp=e0000009c922fbd0 bsp=e0000009c92213d8
       [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                      sp=e0000009c922fc60 bsp=e0000009c92213d8
       [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                      sp=e0000009c922fe30 bsp=e0000009c9221398
       [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                      sp=e0000009c922fe30 bsp=e0000009c9221330
       [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                      sp=e0000009c922fe30 bsp=e0000009c92212f8
       [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                      sp=e0000009c922fe30 bsp=e0000009c9221290
       [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                      sp=e0000009c922fe30 bsp=e0000009c9221208
       [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                      sp=e0000009c922fe30 bsp=e0000009c92211a8
       [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                      sp=e0000009c922fe30 bsp=e0000009c9221168
       [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                      sp=e0000009c922fe30 bsp=e0000009c9221148
       [<a000000100122610>] stop_cpu+0xd0/0x200
                                      sp=e0000009c922fe30 bsp=e0000009c92210f0
       [<a0000001000e0440>] kthread+0xc0/0x140
                                      sp=e0000009c922fe30 bsp=e0000009c92210c8
       [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
       [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
      
      I don't like this revert because it is fragile. ia64 is getting lucky
      because we seem to only ever process timer interrupts in this path, but
      if we ever race with an IPI here, we definitely use RCU and have the
      potential of hitting an oops that Paul describes above.
      
      Patching ia64's timer_interrupt() to check for NULL pt_regs is
      insufficient though, as we still hit the above oops.
      
      As a short-term solution, I do think that this revert is the right
      answer. The revert held up under repeated testing (24+ hour test runs)
      with this setup:
      
      	- 8-way rx6600
      	- randomly toggling CPU online/offline state every 2 seconds
      	- running CPU exercisers, memory hog, disk exercisers, and
      	  network stressors
      	- average system load around ~160
      
      In the long term, we really need to figure out why we set pt_regs = NULL
      in ia64_process_pending_intr(). If it turns out that it is unnecessary
      to do so, then we could safely re-introduce e7b14036 (along with some
      other logic to be smarter about migrating interrupts).
      
      One final note: x86 also removes the disabled CPU from cpu_online_map
      and then re-enables interrupts for 1ms, presumably to handle any pending
      interrupts:
      
      arch/x86/kernel/irq_32.c (and irq_64.c):
      cpu_disable_common:
      	[remove cpu from cpu_online_map]
      
      	fixup_irqs():
      		for_each_irq:
      			[break CPU affinities]
      
      		local_irq_enable();
      		mdelay(1);
      		local_irq_disable();
      
      So they are doing implicitly what ia64 is doing explicitly.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON. · 39d481cb
      Committed by Robin Holt
      BTE_MAX_XFER is wrong.  It is one greater than the number of cache
      lines the BTE is actually able to transfer.  If you request a transfer
      of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
      should certainly be made clearer.
      
      This patch fixes that constant and also cleans up the BUG_ON()s in
      arch/ia64/sn/kernel/bte.c to test one condition per line.
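
      A hedged sketch of the shape of the fix (constants and macros
      approximate):

          /* Express the limit as the number of cache lines the BTE can
           * actually move, not one more than that: */
          #define BTE_MAX_XFER    (BTE_LEN_MASK << L1_CACHE_SHIFT)

          /* One condition per BUG_ON, so a trip points at the culprit: */
          BUG_ON(src & L1_CACHE_MASK);
          BUG_ON(dest & L1_CACHE_MASK);
          BUG_ON(len & L1_CACHE_MASK);
          BUG_ON(len > BTE_MAX_XFER);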
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Build fix for __early_pfn_to_nid() undefined link error · 334f85b6
      Committed by Tony Luck
      ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
      so the recent:
      
      	commit: f2dbcfa7
      	mm: clean up for early_pfn_to_nid()
      
      ends up with link errors for certain configurations.
      
      Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
      cases where we do provide this function.
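
      A hedged sketch of the Kconfig change (exact syntax may differ):

          config HAVE_ARCH_EARLY_PFN_TO_NID
                  def_bool NUMA && SPARSEMEM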
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header · 9dd446f6
      Committed by Hartley Sweeten
      Remove the gesbc9312.h header since it is unused.
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  7. 19 Feb 2009 (7 commits)
  8. 18 Feb 2009 (4 commits)
    • x86, mce: fix a race condition in mce_read() · ef41df43
      Committed by Huang Ying
      Impact: bugfix
      
      Consider the following situation:
      
      before: mcelog.next == 1, mcelog.entry[0].finished = 1
      
      +--------------------------------------------------------------------------
      R                   W1                  W2                  W3
      
      read mcelog.next (1)
                          mcelog.next++ (2)
                          (working on entry 1,
                          finished == 0)
      
      mcelog.next = 0
                                              mcelog.next++ (1)
                                              (working on entry 0)
                                                                 mcelog.next++ (2)
                                                                 (working on entry 1)
                              <----------------- race ---------------->
                          (done on entry 1,
                          finished = 1)
                                                                 (done on entry 1,
                                                                 finished = 1)
      
      To fix the race condition, a cmpxchg loop is added to mce_read() to
      ensure that no new MCE record can be added between the read of
      mcelog.next and the reset of mcelog.next to 0.
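
      A hedged sketch of the loop (simplified from the description above):

          unsigned int prev = 0, next = mcelog.next;

          do {
                  /* ... copy entries [prev, next) out to the reader ... */
                  prev = next;
                  /* Reset the log only if no new record was appended in
                   * the meantime; otherwise loop again to pick them up. */
                  next = cmpxchg(&mcelog.next, prev, 0);
          } while (next != prev);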
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: disable machine checks on offlined CPUs · d6b75584
      Committed by Andi Kleen
      Impact: Lower priority bug fix
      
      Offlined CPUs could still get machine checks, but the machine check handler
      cannot handle them properly, leading to an unconditional crash. Disable
      machine checks on CPUs that are going down.
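
      A plausible shape for the hotplug hook (hedged; the actual mechanism
      may differ):

          /* Runs on the dying CPU from the hotplug notifier path. */
          static void mce_disable_cpu(void *h)
          {
                  int i;

                  if (!mce_available(&current_cpu_data))
                          return;
                  for (i = 0; i < banks; i++)
                          wrmsrl(MSR_IA32_MC0_CTL + i * 4, 0);
          }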
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: don't set up mce sysdev devices with mce=off · 5b4408fd
      Committed by Andi Kleen
      Impact: bug fix; in this case the resume handler shouldn't run, which
      	avoids incorrectly re-enabling machine checks on resume
      
      When MCEs are completely disabled on the command line, don't set
      up the sysdev devices for them either.
      
      Includes a comment fix from Thomas Gleixner.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: switch machine check polling to per CPU timer · 52d168e2
      Committed by Andi Kleen
      Impact: Higher priority bug fix
      
      The machine check poller runs a single timer and then broadcasts an
      IPI to all CPUs to check them. This leads to unnecessary
      synchronization between CPUs. The original CPU running the timer has
      to wait potentially a long time for all other CPUs to answer. This is
      also real-time unfriendly and in general inefficient.
      
      This was especially a problem on systems with a lot of events, where
      the poller runs at a higher frequency after processing some events.
      More and more CPU time could be wasted on this, to the point of
      significantly slowing down machines.
      
      The machine check polling is actually fully independent per CPU, so
      there's no reason not to just do this all with per-CPU timers.  This
      patch implements that.
      
      Also switch the poller to use standard timers instead of work
      queues. It was using work queues to be able to execute a user program
      on an event, but mce_notify_user() now handles this case with a
      separate callback. So instead always run the poll code in a
      standard per-CPU timer, which means that in the common case of not
      having to execute a trigger there will be less overhead.
      
      This allows the initialization to be cleaned up significantly, because
      standard timers are already up when machine checks get initialized.  No
      multiple initialization functions are needed.
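
      A hedged sketch of the per-CPU timer (simplified; identifiers
      approximate, and machine_check_poll() stands in for the real poll
      routine):

          static DEFINE_PER_CPU(struct timer_list, mce_timer);

          static void mcheck_timer(unsigned long data)
          {
                  struct timer_list *t = &per_cpu(mce_timer, data);

                  WARN_ON(smp_processor_id() != data);
                  machine_check_poll();   /* poll only this CPU's banks */

                  /* Re-arm with the (possibly adapted) interval. */
                  t->expires = jiffies + check_interval * HZ;
                  add_timer(t);
          }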
      
      Thanks to Thomas Gleixner for some help.
      
      Cc: thockin@google.com
      v2: Use del_timer_sync() on cpu shutdown and don't try to handle
      migrated timers.
      v3: Add WARN_ON for timer running on unexpected CPU
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>