1. 03 July 2009 (6 commits)
    • x86: atomic64: Reduce size of functions · 3ac805d2
      Committed by Ingo Molnar
      cmpxchg8b is a huge instruction in terms of register footprint;
      we almost never want to inline it, not even within the same
      code module.
      
      GCC 4.3 still messes up for two functions, underestimating the
      true cost of this instruction - so annotate two key functions
      to reduce the bloat:
      
      arch/x86/lib/atomic64_32.o:
      
         text	   data	    bss	    dec	    hex	filename
         1763	      0	      0	   1763	    6e3	atomic64_32.o.before
          435	      0	      0	    435	    1b3	atomic64_32.o.after
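      
      As a rough sketch of the annotation involved (which two functions
      were marked is not shown here; atomic64_xchg() below is purely
      illustrative, built on the atomic64_cmpxchg() helper from this
      series):
      
        /*
         * cmpxchg8b consumes eax:edx (expected value) and ebx:ecx (new
         * value), so any function built around it has a large register
         * footprint. noinline keeps GCC from inlining it back into
         * every caller despite its small source size.
         */
        static noinline u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
        {
                u64 old_val;
        
                do {
                        old_val = atomic64_read(ptr);
                } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
        
                return old_val;
        }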
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve atomic64_add_return() · 824975ef
      Committed by Ingo Molnar
      Linus noted (based on Eric Dumazet's numbers) that we would
      probably be better off not trying an atomic_read() in
      atomic64_add_return(), but instead intentionally letting the
      first cmpxchg8b fail - to get a cache-friendly 'give me ownership
      of this cacheline' transaction. That can then be followed
      by the real cmpxchg8b which sets the value local to the CPU.
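      
      A sketch of the resulting loop (built on the atomic64_cmpxchg()
      helper from this series; details may differ):
      
        u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
        {
                /*
                 * Start from a deliberately bogus guess: the first
                 * cmpxchg8b will almost certainly fail, but it pulls
                 * the cacheline in exclusively, making the retry a
                 * cheap CPU-local operation.
                 */
                u64 old_val, new_val, real_val = 0;
        
                do {
                        old_val = real_val;
                        new_val = old_val + delta;
        
                        real_val = atomic64_cmpxchg(ptr, old_val, new_val);
                } while (real_val != old_val);
        
                return new_val;
        }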
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve cmpxchg8b() · 69237f94
      Committed by Eric Dumazet
      Rewrite cmpxchg8b() to not use the %edi register but a generic
      "+m" constraint, to give the compiler more freedom in code
      generation and possibly produce better code.
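      
      A sketch of what the constraint change amounts to (operand
      details follow the x86 cmpxchg8b ABI; the exact kernel helper
      may differ):
      
        /*
         * cmpxchg8b compares edx:eax against the 8-byte memory
         * operand; on a match it stores ecx:ebx, otherwise it loads
         * the current value into edx:eax. A "+m" operand lets the
         * compiler pick the addressing mode instead of forcing the
         * pointer into %edi.
         */
        static inline u64 cmpxchg8b(u64 *ptr, u64 old_val, u64 new_val)
        {
                asm volatile("lock; cmpxchg8b %[ptr]"
                             : "+A" (old_val), [ptr] "+m" (*ptr)
                             : "b" ((u32)new_val), "c" ((u32)(new_val >> 32))
                             : "memory");
                return old_val; /* value memory held before the exchange */
        }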
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Improve atomic64_read() · aacf682f
      Committed by Eric Dumazet
      Linus noticed that the 32-bit version of atomic64_read() was
      overly complex: it re-read the value and retried in a loop.
      
      Instead we can just rely on cmpxchg8b returning either the new
      value or the current value.
      
      We can use any 'old' value, preferably one that can be loaded
      via immediates, which is faster. If that value does not equal
      the real value in memory, the compare fails and the instruction
      simply returns the current value without storing anything.
      
      This also has the advantage that the CPU could avoid dirtying
      the cacheline.
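      
      A sketch, reusing a cmpxchg8b()-style helper as above:
      
        static inline u64 atomic64_read(atomic64_t *ptr)
        {
                /*
                 * Any 'old' guess works: on a mismatch, cmpxchg8b
                 * returns the current value without storing anything.
                 * 1ULL << 32 is cheap to build from immediates and
                 * unlikely to match, so the write (and the cacheline
                 * dirtying it would cause) is normally avoided.
                 */
                u64 old = 1ULL << 32;
        
                return cmpxchg8b(&ptr->counter, old, old);
        }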
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file · b7882b7c
      Committed by Ingo Molnar
      Linus noted that the atomic64_t primitives are all inlines
      currently, which is crazy because these functions have a large
      register footprint anyway.
      
      Move them to a separate file: arch/x86/lib/atomic64_32.c
      
      Also, while at it, rename all uses of 'unsigned long long' to
      the much shorter u64.
      
      This makes the appearance of the prototypes a lot nicer - and
      it also uncovered a few bugs where (yet unused) API variants
      had 'long' as their return type instead of u64.
      
      [ More intrusive changes are not yet done in this patch. ]
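      
      For illustration, the kind of prototype cleanup this implies
      (hypothetical declaration):
      
        /* before */
        extern unsigned long long atomic64_cmpxchg(atomic64_t *ptr,
                        unsigned long long old_val, unsigned long long new_val);
        
        /* after */
        extern u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val);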
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too · bbf2a330
      Committed by Eric Dumazet
      Locked instructions on two cache lines at once are painful. If
      atomic64_t uses two cache lines, my test program is 10x slower.
      
      The chance for that is significant: 4/32 or 12.5% (with 4-byte
      alignment, one in eight placements of an 8-byte value within a
      32-byte cache line straddles a line boundary).
      
      Make sure an atomic64_t is 8 bytes aligned.
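      
      Per the log, the fix boils down to the type definition; a sketch
      using the __aligned(8) form Andrew suggested:
      
        typedef struct {
                /*
                 * On 32-bit, a u64 in a struct may only be 4-byte
                 * aligned, so an atomic64_t can straddle a cache-line
                 * boundary. Forcing 8-byte alignment keeps the
                 * lock;cmpxchg8b within a single line.
                 */
                u64 __aligned(8) counter;
        } atomic64_t;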
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      [ changed it to __aligned(8) as per Andrew's suggestion ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 02 July 2009 (1 commit)
    • perf_counter: Ignore the nmi call frames in the x86-64 backtraces · 0406ca6d
      Committed by Frederic Weisbecker
      Almost every callchain recorded with perf record includes the
      internal perfcounter NMI frames:
      
       perf_callchain
       perf_counter_overflow
       intel_pmu_handle_irq
       perf_counter_nmi_handler
       notifier_call_chain
       atomic_notifier_call_chain
       notify_die
       do_nmi
       nmi
      
      We want to ignore these frames as they are not interesting for
      instrumentation. To solve this, we simply ignore every frame
      from NMI context.
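      
      A sketch of the x86-64 side (callback names follow the
      dump_trace() interface of that era; x86_is_stack_id() and the
      per-CPU flag are assumptions about the exact mechanism):
      
        static DEFINE_PER_CPU(int, in_nmi_frame);
        
        static int backtrace_stack(void *data, char *name)
        {
                /* remember whether the walker just entered the NMI stack */
                per_cpu(in_nmi_frame, smp_processor_id()) =
                        x86_is_stack_id(NMI_STACK, name);
                return 0;
        }
        
        static void backtrace_address(void *data, unsigned long addr, int reliable)
        {
                struct perf_callchain_entry *entry = data;
        
                /* drop every frame that belongs to the NMI handler itself */
                if (per_cpu(in_nmi_frame, smp_processor_id()))
                        return;
        
                if (reliable)
                        callchain_store(entry, addr);
        }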
      
      New example of "perf report -s sym -c" after this patch:
      
      9.59%  [k] search_by_key
                   4.88%
                      search_by_key
                      reiserfs_read_locked_inode
                      reiserfs_iget
                      reiserfs_lookup
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      vfs_fstatat
                      vfs_lstat
                      sys_newlstat
                      system_call_fastpath
                      __lxstat
                      0x406fb1
      
                   3.19%
                      search_by_key
                      search_by_entry_key
                      reiserfs_find_entry
                      reiserfs_lookup
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      vfs_fstatat
                      vfs_lstat
                      sys_newlstat
                      system_call_fastpath
                      __lxstat
                      0x406fb1
      [...]
      
      For now this patch only solves the problem on x86-64.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 01 July 2009 (8 commits)
  4. 29 June 2009 (2 commits)
  5. 28 June 2009 (8 commits)
  6. 27 June 2009 (1 commit)
  7. 26 June 2009 (14 commits)
    • powerpc/mm: Fix potential access to freed pages when using hugetlbfs · 6c16a74d
      Committed by Benjamin Herrenschmidt
      When using 64k page sizes, our PTE pages are split in two halves,
      the second half containing the "extension" used to keep track of
      individual 4k pages when not using HW 64k pages.
      
      However, our page tables used for hugetlb have a slightly different
      format and don't carry that "second half".
      
      Our code that batches PTEs to be invalidated unconditionally reads
      the "second half" (to put it into the batch), which means that when
      called to invalidate hugetlb PTEs, it will access unrelated memory.
      
      It breaks when CONFIG_DEBUG_PAGEALLOC is enabled.
      
      This fixes it by only accessing the second half when the _PAGE_COMBO
      bit is set in the first half, which indicates that we are dealing with
      a "combo" page which represents 16x4k subpages. Anything else shouldn't
      have this bit set and thus not require loading from the second half.
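      
      A sketch of the guarded access (modeled on the 64k-page
      __real_pte() helper; names may differ):
      
        static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
        {
                real_pte_t rpte;
        
                rpte.pte = pte;
                rpte.hidx = 0;
                /*
                 * Only "combo" pages (16 x 4k subpages) carry tracking
                 * data in the second half of the PTE page; hugetlb page
                 * tables have no second half, so reading it
                 * unconditionally walks into unrelated (possibly freed)
                 * memory.
                 */
                if (pte_val(pte) & _PAGE_COMBO)
                        rpte.hidx = pte_val(*((ptep) + PTRS_PER_PTE));
                return rpte;
        }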
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/440: Fix warning in early debug code · f694cda8
      Committed by Benjamin Herrenschmidt
      The function udbg_44x_as1_flush() has the wrong prototype, causing
      a warning when 440 early debug is enabled.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/of: Fix usage of dev_set_name() in of_device_alloc() · 03c01aa7
      Committed by Benjamin Herrenschmidt
      dev_set_name() takes a format string, so use it properly and avoid
      a warning with recent versions of gcc.
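      
      The classic fix for this class of warning (illustrative; the
      exact call site in of_device_alloc() may differ):
      
        /* before: a non-literal format string trips -Wformat-security */
        dev_set_name(&dev->dev, bus_id);
        
        /* after: pass a literal format and the string as an argument */
        dev_set_name(&dev->dev, "%s", bus_id);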
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pasemi: Use raw spinlock in SMP TB sync · 6893ce6c
      Committed by Benjamin Herrenschmidt
      spin_lock() can hang if called while the timebase is frozen,
      so use a raw lock instead; also disable interrupts while
      at it.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Use one common impl. of RTAS timebase sync and use raw spinlock · c4007a2f
      Committed by Benjamin Herrenschmidt
      Several platforms use their own copy of what is essentially the same
      code, using RTAS to synchronize the timebases when bringing up new
      CPUs. This moves it all into a single common implementation. It also
      turns the spinlock into a raw spinlock, since with spinlock debugging
      enabled an ordinary spinlock relies on the timebase running, which is
      not guaranteed here, and finally masks interrupts while the timebase
      is disabled.
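      
      A sketch of the common helpers (the 2.6.31-era API spells the raw
      lock operations __raw_spin_lock()/__raw_spin_unlock(); exact names
      and RTAS tokens are assumptions):
      
        static raw_spinlock_t timebase_lock;
        static u64 timebase;
        
        /* boot CPU: freeze the timebase, publish it, wait for the new CPU */
        void rtas_give_timebase(void)
        {
                unsigned long flags;
        
                local_irq_save(flags);  /* no interrupts while the TB is off */
                hard_irq_disable();
                __raw_spin_lock(&timebase_lock);
                rtas_call(rtas_token("freeze-time-base"), 0, 1, NULL);
                timebase = get_tb();
                __raw_spin_unlock(&timebase_lock);
        
                while (timebase)        /* wait for the secondary to take it */
                        barrier();
                rtas_call(rtas_token("thaw-time-base"), 0, 1, NULL);
                local_irq_restore(flags);
        }
        
        /* secondary CPU: pick up the frozen value and ack it */
        void rtas_take_timebase(void)
        {
                while (!timebase)
                        barrier();
                __raw_spin_lock(&timebase_lock);
                set_tb(timebase >> 32, timebase & 0xffffffff);
                timebase = 0;
                __raw_spin_unlock(&timebase_lock);
        }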
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/rtas: Turn rtas lock into a raw spinlock · f97bb36f
      Committed by Benjamin Herrenschmidt
      RTAS currently uses a normal spinlock. However it can be called from
      contexts where this is not necessarily a good idea. For example, it
      can be called while syncing timebases, with the core timebase being
      frozen. Unfortunately, that will deadlock in case of lock contention
      when spinlock debugging is enabled as the spin lock debugging code
      will try to use __delay() which ... relies on the timebase being
      enabled.
      
      Also, RTAS can be used in some low level IRQ handling code paths,
      so it may as well be a raw spinlock for -rt's sake.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Add irqtrace support for 32-bit powerpc · 5d38902c
      Committed by Benjamin Herrenschmidt
      Based on initial work from: Dale Farnsworth <dale@farnsworth.org>
      
      Add the low level irq tracing hooks for 32-bit powerpc needed
      to enable full lockdep functionality.
      
      The approach taken to deal with the code in entry_32.S is that
      we don't trace all the transitions of MSR:EE when we just turn
      it off to peek at TI_FLAGS without races. We only trace when we
      are calling into C code or returning from exceptions with a
      state that has changed from what lockdep thinks.
      
      There's a little bugger though: if we take an exception that
      keeps interrupts enabled (such as an alignment exception) while
      interrupts are enabled, we will spuriously call
      trace_hardirqs_on() on the way back. Not a big deal, but getting
      rid of it would require remembering in pt_regs that the
      exception was one of the types that kept interrupts enabled,
      which we don't know at this stage. (Well, we could test all
      cases for regs->trap but that sucks too much.)
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Kumar Gala <galak@kernel.crashing.org>
    • powerpc: Map more memory early on 601 processors · 4a5cbf17
      Committed by Benjamin Herrenschmidt
      The 32-bit kernel relies on some memory being mapped, covering at
      least the kernel text, data and bss, early during boot before
      the full MMU setup is done. On 32-bit "classic" processors, this
      is done using BAT registers.
      
      On 601, the size of BATs is limited to 8M and we use 2 of them
      for that initial mapping. This can become quite tight when enabling
      features like lockdep, so let's use a 3rd one to bump that mapping
      from 16M to 24M. We keep the 4th BAT free as it can be useful for
      debugging early boot code to map things like serial ports.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Make k(un)map_atomic out of line · 850f6ac3
      Committed by Benjamin Herrenschmidt
      Those functions are way too big to be inlined; besides,
      kmap_atomic() wants to call debug_kmap_atomic(), which isn't
      exported for modules and causes module link failures.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix mpic alloc warning · 85355bb2
      Committed by Kumar Gala
      Now that kmalloc can be used earlier, the mpic_alloc() code
      triggers the warning below because it calls alloc_bootmem().
      Move to kzalloc() to remove the warning; see the sketch after
      the trace.
      
      ------------[ cut here ]------------
      Badness at c0583248 [verbose debug info unavailable]
      NIP: c0583248 LR: c0583210 CTR: 00000004
      REGS: c0741de0 TRAP: 0700   Not tainted  (2.6.30-06736-g12a31df)
      MSR: 00021000 <ME,CE>  CR: 22024024  XER: 00000000
      TASK = c070d3b8[0] 'swapper' THREAD: c0740000 CPU: 0
      <6>GPR00: 00000001 c0741e90 c070d3b8 00000001 00000210 00000020 3fffffff 00000000
      <6>GPR08: 00000000 c0c85700 c04f8c40 0000002d 22044022 1004a388 7ffd9400 00000000
      <6>GPR16: 00000000 7ffcd100 7ffcd100 7ffcd100 c04f8c40 00000000 c059f62c c075a0c0
      <6>GPR24: c059f648 00000000 0000000f 00000210 00000020 00000000 3fffffff 00000210
      NIP [c0583248] alloc_arch_preferred_bootmem+0x50/0x80
      LR [c0583210] alloc_arch_preferred_bootmem+0x18/0x80
      Call Trace:
      [c0741e90] [c07343b0] devtree_lock+0x0/0x24 (unreliable)
      [c0741ea0] [c0583b14] ___alloc_bootmem_nopanic+0x54/0x108
      [c0741ee0] [c0583e18] ___alloc_bootmem+0x18/0x50
      [c0741ef0] [c057b9cc] mpic_alloc+0x48/0x710
      [c0741f40] [c057ecf4] mpc85xx_ds_pic_init+0x190/0x1b8
      [c0741f90] [c057633c] init_IRQ+0x24/0x34
      [c0741fa0] [c05738b8] start_kernel+0x260/0x3dc
      [c0741ff0] [c00003c8] skpinv+0x2e0/0x31c
      Instruction dump:
      409e001c 7c030378 80010014 83e1000c 38210010 7c0803a6 4e800020 3d20c0c8
      39295700 80090004 7c000034 5400d97e <0f000000> 2f800000 409e001c 38800000
      
      BenH: Changed to use GFP_KERNEL, the allocator will do the right thing
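      
      The change itself is small; a sketch (surrounding mpic_alloc()
      code elided):
      
        struct mpic *mpic;
        
        /*
         * Was: mpic = alloc_bootmem(sizeof(struct mpic)); which warns
         * once the slab allocator is already up. GFP_KERNEL per BenH's
         * note above.
         */
        mpic = kzalloc(sizeof(struct mpic), GFP_KERNEL);
        if (mpic == NULL)
                return NULL;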
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix output from show_regs · a2367194
      Committed by Kumar Gala
      For some reason we've had an explicit KERN_INFO for GPR dumps.  With
      recent changes we get output like:
      
      <6>GPR00: 00000000 ef855eb0 ef858000 00000001 000000d0 f1000000 ffbc8000 ffffffff
      
      The KERN_INFO is causing the <6>.  Don't see any reason to keep it
      around.
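      
      The fix is simply dropping the level prefix (illustrative call,
      not the exact show_regs() line):
      
        /* before: each GPR row carried its own log level, leaking "<6>" */
        printk(KERN_INFO "GPR%02d: %08lx ", i, regs->gpr[i]);
        
        /* after */
        printk("GPR%02d: %08lx ", i, regs->gpr[i]);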
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pmac: Fix issues with PowerMac "PowerSurge" SMP · 7ccbe504
      Committed by Benjamin Herrenschmidt
      The old PowerSurge SMP (ie, dual or quad 604 machines) code has
      numerous issues in the modern world.
      
      One is that cpu_possible_map is set too late (the device-tree is
      bogus), so we fail to allocate the interrupt stacks and crash.
      Another problem is that the timebase is frozen by the bringup of
      the second CPU, so the delays in the generic code will hang; we
      need to move some of the calling procedure into the powermac code.
      
      This makes it boot again for me.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/amigaone: Limit ISA I/O range to 4k in the device tree · 6bb2ae53
      Committed by Gerhard Pircher
      The kernel reserves the I/O address space from 0x0 to 0xfff for
      legacy ISA devices. Change the ranges property for the PCI2ISA
      bridge to match the kernel's behavior, even if the ranges property
      isn't used for now.
      Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/warp: Platform fix for i2c change · 3984114f
      Committed by Sean MacLennan
      A change to the i2c subsystem breaks the warp platform code. The
      patch is cleaner anyway; the old way was a bit crufty.
      
      For those with keen eyes, the gratuitous change in the string from
      PIKA to Warp is just so the logs look a bit nicer. The following two
      lines tend to be printed one after another.
      
        Warp POST OK
        Warp DTM thread running.
      
      Yeah, this will be the third patch to warp.c submitted in this
      release....
      
      Cheers,
         Sean
      
      The i2c_client struct changed, breaking the code that looked for the ad7414
      chip. Use the new of_find_i2c_device_by_node function added in 2.6.29.
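      
      A sketch of the lookup (of_find_i2c_device_by_node() is the
      2.6.29 helper named above; the surrounding function is a
      hypothetical stand-in for the warp DTM code):
      
        #include <linux/of_i2c.h>
        
        static struct i2c_client *ad7414_client;  /* hypothetical holder */
        
        static int warp_find_ad7414(struct device_node *np)
        {
                /* resolve the i2c_client registered for this DT node */
                ad7414_client = of_find_i2c_device_by_node(np);
                if (ad7414_client == NULL)
                        return -ENODEV;
                return 0;
        }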
      Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>