1. 29 October 2009 (2 commits)
    • percpu: remove per_cpu__ prefix. · dd17c8f7
      Committed by Rusty Russell
      Now that the return from alloc_percpu is compatible with the address
      of per-cpu vars, it makes sense to hand around the address of per-cpu
      variables.  To make this sane, we remove the per_cpu__ prefix we
      created to stop people accidentally using these vars directly.
      
      Now that we have sparse, we can use that (next patch).
      
      tj: * Updated to convert stuff which was missed by or added after the
            original patch.
      
          * Kill per_cpu_var() macro.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      dd17c8f7
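
      A minimal userspace sketch of the name-mangling scheme this patch
      removes (the macros below are illustrative stand-ins, not the real
      kernel definitions): the per_cpu__ prefix exists only to mangle the
      symbol so code cannot touch the variable except through the accessor.

        #include <stdio.h>

        /* Stand-ins for the old scheme: DEFINE_PER_CPU() creates the symbol
         * per_cpu__<name>, and per_cpu_var() is the only way to reach it. */
        #define DEFINE_PER_CPU(type, name)  type per_cpu__##name
        #define per_cpu_var(name)           per_cpu__##name

        DEFINE_PER_CPU(int, counter);

        int main(void)
        {
                /* 'counter = 1;' would not compile; only per_cpu__counter exists. */
                per_cpu_var(counter) = 1;
                printf("counter = %d\n", per_cpu_var(counter));
                return 0;
        }

      After this patch the prefix is dropped, the symbol is simply 'counter',
      and the per_cpu_var() wrapper is no longer needed; sparse-based checking
      (next patch) takes over the job of catching direct misuse.
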
    • percpu: make percpu symbols in ia64 unique · 877105cc
      Committed by Tejun Heo
      This patch updates percpu related symbols in ia64 such that percpu
      symbols are unique and don't clash with local symbols.  This serves
      two purposes of decreasing the possibility of global percpu symbol
      collision and allowing dropping per_cpu__ prefix from percpu symbols.
      
      * arch/ia64/kernel/setup.c: s/cpu_info/ia64_cpu_info/
      
      Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
      which cause name clashes" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: linux-ia64@vger.kernel.org
      877105cc
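
      A toy example of the clash this rename guards against (illustrative
      only; the macro below stands in for a percpu accessor and the values
      are made up): once the per_cpu__ prefix is gone, the accessor simply
      names the symbol, so a local variable with the same generic name
      silently shadows the global percpu one.

        #include <stdio.h>

        static int cpu_info = 100;              /* stand-in for the global percpu symbol */
        #define this_cpu_read_sketch(var) (var) /* stand-in for a percpu accessor macro */

        static int read_info(void)
        {
                int cpu_info = 5;                       /* innocent-looking local variable */
                return this_cpu_read_sketch(cpu_info);  /* refers to the local, not the global */
        }

        int main(void)
        {
                printf("%d vs %d\n", read_info(), cpu_info);    /* prints "5 vs 100" */
                return 0;
        }

      Renaming the global to ia64_cpu_info keeps the two namespaces apart.
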
  2. 02 October 2009 (1 commit)
    • ia64: don't alias VMALLOC_END to vmalloc_end · 126b3fcd
      Committed by Tejun Heo
      If CONFIG_VIRTUAL_MEM_MAP is enabled, ia64 defines the macro VMALLOC_END
      as the unsigned long variable vmalloc_end, which is adjusted to make
      room for the vmemmap.  This becomes problematic if a local variable
      vmalloc_end is defined in some function (not very unlikely) and
      VMALLOC_END is used in that function - the function thinks it is
      referencing the global VMALLOC_END value but would actually be
      referencing its own local vmalloc_end variable.
      
      There's no reason VMALLOC_END should be a macro.  Just define it as an
      unsigned long variable if CONFIG_VIRTUAL_MEM_MAP is set to avoid nasty
      surprises.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: linux-ia64 <linux-ia64@vger.kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      126b3fcd
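
      A small standalone illustration of the pitfall described above (the
      value is made up; this is not the ia64 code itself): when a macro is
      merely an alias for an identically spelled variable, any local variable
      named vmalloc_end hijacks every use of the macro in that function.

        #include <stdio.h>

        static unsigned long vmalloc_end = 0xdeadbeefUL;   /* made-up global value */
        #define VMALLOC_END vmalloc_end        /* old ia64 style: macro aliases the variable */

        static unsigned long current_limit(void)
        {
                unsigned long vmalloc_end = 0; /* plausible local variable in mm code */
                return VMALLOC_END;            /* expands to the *local* vmalloc_end */
        }

        int main(void)
        {
                printf("%#lx\n", current_limit());  /* prints 0, not the global value */
                return 0;
        }
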
  3. 27 September 2009 (1 commit)
  4. 25 September 2009 (1 commit)
    • [IA64] implement ticket locks for Itanium · 2c86963b
      Committed by Tony Luck
      Back in January 2008 Nick Piggin implemented "ticket" spinlocks
      for X86 (See commit 314cdbef).
      
      The IA64 implementation has a couple of differences because of the
      available atomic operations ... e.g. we have no fetchadd2 instruction
      that operates on a 16-bit quantity, so we make ticket locks use
      a 32-bit word for each of the current-ticket and now-serving values.
      
      Performance on uncontended locks is about 8% worse than the previous
      implementation, but this seems a good trade for determinism in the
      contended case. Performance impact on macro-level benchmarks is in
      the noise.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      2c86963b
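
      An illustrative C sketch of the ticket-lock idea (plain GCC atomic
      builtins, not the ia64 fetchadd implementation): each locker takes a
      ticket with an atomic fetch-and-add and then spins until the
      now-serving counter reaches that ticket, which is what buys the FIFO
      determinism under contention mentioned above.

        #include <stdint.h>
        #include <stdio.h>

        struct ticket_lock {
                uint32_t next_ticket;   /* next ticket to hand out */
                uint32_t now_serving;   /* ticket currently allowed to run */
        };

        static void ticket_lock(struct ticket_lock *l)
        {
                uint32_t mine = __atomic_fetch_add(&l->next_ticket, 1, __ATOMIC_RELAXED);
                while (__atomic_load_n(&l->now_serving, __ATOMIC_ACQUIRE) != mine)
                        ;       /* spin; real code would add a cpu_relax()-style pause */
        }

        static void ticket_unlock(struct ticket_lock *l)
        {
                __atomic_fetch_add(&l->now_serving, 1, __ATOMIC_RELEASE);
        }

        int main(void)
        {
                struct ticket_lock l = { 0, 0 };

                ticket_lock(&l);
                puts("ticket 0 acquired the lock");
                ticket_unlock(&l);
                return 0;
        }
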
  5. 24 September 2009 (3 commits)
  6. 22 September 2009 (2 commits)
  7. 16 September 2009 (1 commit)
    • sched: Disable wakeup balancing · 182a85f8
      Committed by Peter Zijlstra
      Sysbench thinks SD_BALANCE_WAKE is too aggressive and kbuild doesn't
      really mind too much, SD_BALANCE_NEWIDLE picks up most of the
      slack.
      
      On a dual socket, quad core, dual thread nehalem system:
      
      sysbench (--num_threads=16):
      
       SD_BALANCE_WAKE-: 13982 tx/s
       SD_BALANCE_WAKE+: 15688 tx/s
      
      kbuild (-j16):
      
       SD_BALANCE_WAKE-: 47.648295846  seconds time elapsed   ( +-   0.312% )
       SD_BALANCE_WAKE+: 47.608607360  seconds time elapsed   ( +-   0.026% )
      
      (same within noise)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      182a85f8
  8. 15 September 2009 (6 commits)
    • sched: Reduce forkexec_idx · b8a543ea
      Committed by Peter Zijlstra
      If we're looking to place a new task, we might as well find the
      idlest position _now_, not 1 tick ago.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b8a543ea
    • sched: Improve latencies and throughput · 0ec9fab3
      Committed by Mike Galbraith
      Make the idle balancer more aggressive, to improve an
      x264 encoding workload provided by Jason Garrett-Glaser:
      
       NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 252.82 fps, 22096.60 kb/s
       encoded 600 frames, 250.69 fps, 22096.60 kb/s
       encoded 600 frames, 245.76 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY LB_BIAS
       encoded 600 frames, 344.44 fps, 22096.60 kb/s
       encoded 600 frames, 346.66 fps, 22096.60 kb/s
       encoded 600 frames, 352.59 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 425.75 fps, 22096.60 kb/s
       encoded 600 frames, 425.45 fps, 22096.60 kb/s
       encoded 600 frames, 422.49 fps, 22096.60 kb/s
      
      Peter pointed out that this is better done via newidle_idx,
      not via LB_BIAS: newidle balancing should look for where
      there is load _now_, not where there was load 2 ticks ago.
      
      Worst-case latencies are improved as well, since no buddies
      means less vruntime spread (as per prior lkml discussions).
      
      This change improves kbuild-peak parallelism as well.
      Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0ec9fab3
    • sched: Tweak wake_idx · 78e7ed53
      Committed by Peter Zijlstra
      When merging select_task_rq_fair() and sched_balance_self() we lost
      the use of wake_idx; restore it and set it to 0 to make wake
      balancing more aggressive.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      78e7ed53
    • sched: Merge select_task_rq_fair() and sched_balance_self() · c88d5910
      Committed by Peter Zijlstra
      The problem with wake_idle() is that it doesn't respect things like
      cpu_power, which means it doesn't deal well with SMT or the recent
      RT interaction.
      
      To cure this, it needs to do what sched_balance_self() does, which
      leads to the possibility of merging select_task_rq_fair() and
      sched_balance_self().
      
      Modify sched_balance_self() to:
      
        - update_shares() when walking up the domain tree,
          (it only called it for the top domain, but it should
           have done this anyway), which allows us to remove
          this ugly bit from try_to_wake_up().
      
        - do wake_affine() on the smallest domain that contains
          both this (the waking) and the prev (the wakee) cpu for
          WAKE invocations.
      
      Then use the top-down balance steps it had to replace wake_idle().
      
      This leads to the disappearance of SD_WAKE_BALANCE and
      SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
      
      SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
      
      Touch all topology bits to replace the old SD flags with the new
      ones -- platforms might need re-tuning.  Enabling SD_BALANCE_WAKE
      conditionally on NUMA distance seems like a good additional feature;
      magny-core and small nehalem systems would want this enabled, while
      systems with slow interconnects would not.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c88d5910
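
      A toy, self-contained model of the selection walk described above (the
      struct and flag names are simplified stand-ins, not the kernel's
      sched_domain code): walking up from the waking cpu we remember the
      smallest domain that also spans the wakee's previous cpu (the
      wake_affine() candidate) and the highest domain carrying the balance
      flag (where the top-down idlest-group search starts).

        #include <stdio.h>
        #include <stdbool.h>

        #define SD_BALANCE_WAKE_SKETCH 0x1

        struct domain {
                const char *name;
                int flags;
                bool spans_prev_cpu;    /* does this level also contain the task's previous cpu? */
                struct domain *parent;
        };

        int main(void)
        {
                /* Bottom-up domain chain for the waking cpu: SMT -> CORE -> NODE. */
                struct domain node = { "NODE", 0,                      true,  NULL  };
                struct domain core = { "CORE", SD_BALANCE_WAKE_SKETCH, false, &node };
                struct domain smt  = { "SMT",  SD_BALANCE_WAKE_SKETCH, false, &core };
                struct domain *sd, *affine = NULL, *highest = NULL;

                for (sd = &smt; sd; sd = sd->parent) {
                        /* update_shares() would now run here at every level */
                        if (!affine && sd->spans_prev_cpu)
                                affine = sd;    /* smallest domain with waker and wakee */
                        if (sd->flags & SD_BALANCE_WAKE_SKETCH)
                                highest = sd;   /* highest domain allowing wake balancing */
                }

                printf("wake_affine candidate: %s\n", affine ? affine->name : "none");
                printf("top-down balance starts at: %s\n", highest ? highest->name : "none");
                return 0;
        }
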
    • [IA64] kexec: Make INIT safe while transition to kdump/kexec kernel · 07a6a4ae
      Committed by Hidetoshi Seto
      
      Summary:
      
        Asserting INIT at the beginning of the kdump/kexec kernel will result
        in unexpected behavior, because the INIT handler of the previous
        kernel is invoked on the new kernel.
      
      Description:
      
        In a panic situation, we can receive an INIT during the kernel
        transition, i.e. from the beginning of the panic to the bootstrap of
        the kdump kernel.  Since we initialize registers on leaving the
        current kernel, the monarch/slave handlers of the current kernel can
        no longer be called safely in virtual mode.  (In fact the system
        hangs, as far as I confirmed.)
      
      How to Reproduce:
      
        Start kdump
          # echo c > /proc/sysrq-trigger
        Then assert INIT while the kdump kernel is booting, before the new
        INIT handler for the kdump kernel is registered.

      Expected (desirable) result:

        The kdump kernel boots without any problem and the crashdump is
        retrieved.

      Actual result:

        The INIT handler of the previous kernel is invoked on the kdump
        kernel => panic, hang, etc. (unexpected)
      
      Proposed fix:
      
        We could unregister these init handlers from SAL before jumping into
        the new kernel; however, the INIT would then fall back to the default
        behavior, resulting in a warmboot by SAL (according to the SAL
        specification), and we could not retrieve the crashdump.

        Therefore this patch introduces a NOP init handler and registers it
        with SAL before leaving the current kernel, so that kdump starts
        safely: INITs are prevented from entering virtual mode and from
        causing a warmboot.

        On the other hand, a kexec that is not for kdump has the same problem
        with an INIT during the kernel transition.  This patch handles that
        case differently, because for kexec unregistering the handlers is
        preferable to registering a NOP handler, since "no handlers
        registered" is the usual state at kernel entry.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Haren Myneni <hbabu@us.ibm.com>
      Cc: kexec@lists.infradead.org
      Acked-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      07a6a4ae
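
      The trade-off above (keep a do-nothing handler registered versus
      unregister and fall back to the platform default) has a close userspace
      analogue in signal handling; the sketch below is only that analogy and
      has nothing to do with the actual ia64 SAL interface.

        #include <signal.h>
        #include <stdio.h>
        #include <string.h>

        static void nop_handler(int sig) { (void)sig; /* swallow the event */ }

        int main(void)
        {
                struct sigaction nop;

                memset(&nop, 0, sizeof(nop));
                nop.sa_handler = nop_handler;
                sigemptyset(&nop.sa_mask);

                /* kdump-style choice: keep a NOP handler registered so the event
                 * is absorbed instead of triggering the destructive default. */
                sigaction(SIGINT, &nop, NULL);
                raise(SIGINT);
                printf("survived SIGINT with the NOP handler installed\n");

                /* kexec-style choice: unregister and return to the default state. */
                signal(SIGINT, SIG_DFL);
                printf("default restored; the next SIGINT terminates the process\n");
                raise(SIGINT);
                return 0;
        }
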
    • [IA64] kdump: Mask MCA/INIT on frozen cpus · 4295ab34
      Committed by Hidetoshi Seto
      Summary:
      
        An INIT asserted on the kdump kernel invokes the INIT handler not
        only on the cpu running the kdump kernel, but also on the BSP of the
        panicked kernel, because the (badly) frozen BSP can be thawed by INIT.
      
      Description:
      
        The kdump_cpu_freeze() function is called on all cpus except the one
        that initiates the panic and/or kdump, to stop/offline them (on ia64
        this means we pass control of the cpus to SAL, or put them in a
        spinloop).  Note that CPU0 (the BSP) always goes to the spinloop, so
        if the panic happened on an AP there are at least 2 cpus (the AP and
        the BSP) which do not go back to SAL.

        On the spinning cpus, interrupts are disabled (rsm psr.i), but INIT
        can still interrupt them, because psr.mc, which would mask it, is
        only set when kdump_cpu_freeze() is called from MCA/INIT context.
      
        Therefore, assume that a panic happened on an AP, kdump was invoked,
        new INIT handlers for the kdump kernel were registered, and then an
        INIT is asserted.  From the viewpoint of SAL there are 2 online cpus,
        so the INIT will be delivered to both of them.  This likely means
        that not only the AP (the cpu executing kdump) enters the newly
        registered INIT handler, but also the BSP (the other cpu, spinning
        in the panicked kernel) enters the same INIT handler.  Of course the
        register settings on the BSP are still the old ones (for the panicked
        kernel), so what happens when the handler runs with the wrong
        settings is extremely unexpected.  I believe this is not desirable
        behavior.
      
      How to Reproduce:
      
        Start kdump on one of the APs (e.g. cpu1)
          # taskset 0x2 echo c > /proc/sysrq-trigger
        Then assert INIT after the kdump kernel has booted and the new INIT
        handler for the kdump kernel is registered.
      
      Expected results:
      
        An INIT handler is invoked only on the AP.
      
      Actual results:
      
        An INIT handler is invoked on the AP and BSP.
      
      Sample of results:
      
        I got the following console log by asserting INIT after the prompt
        "root:/>".  It seems that two monarchs appeared from one INIT, and
        that one of them panicked in the end.  It also seems that the
        panicked one assumed there were 4 online cpus and that none of them
        did rendezvous:
      
          :
          [  0 %]dropping to initramfs shell
          exiting this shell will reboot your system
          root:/> Entered OS INIT handler. PSP=fff301a0 cpu=0 monarch=0
          ia64_init_handler: Promoting cpu 0 to monarch.
          Delaying for 5 seconds...
          All OS INIT slaves have reached rendezvous
          Processes interrupted by INIT - 0 (cpu 0 task 0xa000000100af0000)
          :
          <<snip>>
          :
          Entered OS INIT handler. PSP=fff301a0 cpu=0 monarch=1
          Delaying for 5 seconds...
          mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
          OS INIT slave did not rendezvous on cpu 1 2 3
          INIT swapper 0[0]: bugcheck! 0 [1]
          :
          <<snip>>
          :
          Kernel panic - not syncing: Attempted to kill the idle task!
      
      Proposed fix:
      
        To avoid this problem, this patch inserts ia64_set_psr_mc() to mask
        INIT on cpus that are about to be frozen.  This masking has no effect
        if kdump_cpu_freeze() is called from the INIT handler when
        kdump_on_init == 1, because psr.mc is already set to 1 before
        entering OS_INIT.  I confirmed that the weird logs like the above
        disappear after applying this patch.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Haren Myneni <hbabu@us.ibm.com>
      Cc: kexec@lists.infradead.org
      Acked-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      4295ab34
  9. 10 September 2009 (4 commits)
  10. 12 August 2009 (2 commits)
  11. 10 August 2009 (1 commit)
  12. 06 August 2009 (2 commits)
    • net: implement a SO_DOMAIN getsockoption · 0d6038ee
      Committed by Jan Engelhardt
      This sockopt goes in line with SO_TYPE and SO_PROTOCOL. It makes it
      possible for userspace programs to pass around file descriptors — I
      am referring to arguments-to-functions, but it may even work for the
      fd passing over UNIX sockets — without needing to also pass the
      auxiliary information (PF_INET6/IPPROTO_TCP).
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d6038ee
    • net: implement a SO_PROTOCOL getsockoption · 49c794e9
      Committed by Jan Engelhardt
      Similar to SO_TYPE returning the socket type, SO_PROTOCOL allows
      retrieving the protocol used with a given socket.
      
      I am not quite sure why we have that-many copies of socket.h, and why
      the values are not the same on all arches either, but for where hex
      numbers dominate, I use 0x1029 for SO_PROTOCOL as that seems to be
      the next free unused number across a bunch of operating systems, or
      so Google results make me want to believe. SO_PROTOCOL for others
      just uses the next free Linux number, 38.
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49c794e9
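
      A minimal userspace usage sketch for the two options added above
      (assumes a kernel and libc that already define SO_DOMAIN and
      SO_PROTOCOL): code that is handed only a file descriptor can recover
      the domain, type and protocol by itself.

        #include <stdio.h>
        #include <sys/socket.h>
        #include <netinet/in.h>

        int main(void)
        {
                int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
                int domain = -1, type = -1, protocol = -1;
                socklen_t len;

                len = sizeof(domain);
                getsockopt(fd, SOL_SOCKET, SO_DOMAIN, &domain, &len);
                len = sizeof(type);
                getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len);
                len = sizeof(protocol);
                getsockopt(fd, SOL_SOCKET, SO_PROTOCOL, &protocol, &len);

                /* Should report AF_INET6 / SOCK_STREAM / IPPROTO_TCP numerically. */
                printf("domain=%d type=%d protocol=%d\n", domain, type, protocol);
                return 0;
        }
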
  13. 03 August 2009 (2 commits)
  14. 28 July 2009 (3 commits)
  15. 22 July 2009 (1 commit)
    • Driver Core: Add platform device arch data V3 · d7aacadd
      Committed by Magnus Damm
      Allow architecture specific data in struct platform_device V3.
      
      With this patch struct pdev_archdata is added to struct
      platform_device, similar to struct dev_archdata found in
      struct device. Useful for architecture code that needs to
      keep extra data associated with each platform device.
      
      Struct pdev_archdata is different from dev.platform_data: the
      convention is that dev.platform_data points to driver-specific
      data.  It may or may not be required by the driver.  Its format
      depends on the driver but is the same across architectures.
      
      The structure pdev_archdata is a place for architecture specific
      data. This data is handled by architecture specific code (for
      example runtime PM), and since it is architecture specific it
      should _never_ be touched by device driver code. Exactly like
      struct dev_archdata but for platform devices.
      
      [rjw: This change is for power management mostly and that's why it
       goes through the suspend tree.]
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      Acked-by: Kevin Hilman <khilman@deeprootsystems.com>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      d7aacadd
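
      A rough sketch of the split this patch introduces (field names here are
      trimmed stand-ins; the real struct platform_device has more members):
      dev.platform_data stays the driver-visible, portable payload, while the
      embedded archdata is owned by architecture code alone.

        #include <stdio.h>

        struct pdev_archdata_sketch {
                void *arch_pm_data;     /* example arch-owned field, e.g. runtime-PM state */
        };

        struct platform_device_sketch {
                const char *name;
                int id;
                void *platform_data;                    /* driver-specific data, portable format */
                struct pdev_archdata_sketch archdata;   /* arch-specific, off limits to drivers */
        };

        int main(void)
        {
                struct platform_device_sketch pdev = { .name = "demo-device", .id = 0 };

                pdev.archdata.arch_pm_data = NULL;      /* only architecture code touches this */
                printf("%s.%d (sketch)\n", pdev.name, pdev.id);
                return 0;
        }
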
  16. 17 July 2009 (2 commits)
  17. 11 July 2009 (1 commit)
  18. 19 June 2009 (2 commits)
  19. 18 June 2009 (2 commits)
  20. 17 June 2009 (1 commit)
    • kmap_types: make most arches use generic header file · e4c9dd0f
      Committed by Randy Dunlap
      Convert most arches to use asm-generic/kmap_types.h.
      
      Move the KM_FENCE_ macro additions into asm-generic/kmap_types.h,
      controlled by __WITH_KM_FENCE from each arch's kmap_types.h file.
      
      Would be nice to be able to add custom KM_types per arch, but I don't yet
      see a nice, clean way to do that.
      
      Built on x86_64, i386, mips, sparc, alpha(tonyb), powerpc(tonyb), and
      68k(tonyb).
      
      Note: avr32 should be able to remove KM_PTE2 (since it's not used) and
      then just use the generic kmap_types.h file.  Get avr32 maintainer
      approval.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: <linux-arch@vger.kernel.org>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: "Luck Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4c9dd0f
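
      Roughly what a converted arch header ends up looking like after this
      patch (pattern only; the guard name and config symbols differ per
      architecture): the arch decides whether to request the debug
      KM_FENCE_* entries and then pulls the shared KM_* list from
      asm-generic.

        #ifndef _ASM_ARCH_KMAP_TYPES_H
        #define _ASM_ARCH_KMAP_TYPES_H

        #ifdef CONFIG_DEBUG_HIGHMEM
        #define __WITH_KM_FENCE         /* ask the generic header for the KM_FENCE_ entries */
        #endif

        #include <asm-generic/kmap_types.h>

        #undef __WITH_KM_FENCE

        #endif /* _ASM_ARCH_KMAP_TYPES_H */
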