1. 09 5月, 2007 1 次提交
    • B
      [POWERPC] Introduce address space "slices" · d0f13e3c
      Benjamin Herrenschmidt 提交于
      The basic issue is to be able to do what hugetlbfs does but with
      different page sizes for some other special filesystems; more
      specifically, my need is:
      
       - Huge pages
      
       - SPE local store mappings using 64K pages on a 4K base page size
      kernel on Cell
      
       - Some special 4K segments in 64K-page kernels for mapping a dodgy
      type of powerpc-specific infiniband hardware that requires 4K MMU
      mappings for various reasons I won't explain here.
      
      The main issues are:
      
       - To maintain/keep track of the page size per "segment" (as we can
      only have one page size per segment on powerpc, which are 256MB
      divisions of the address space).
      
       - To make sure special mappings stay within their allotted
      "segments" (including MAP_FIXED crap)
      
       - To make sure everybody else doesn't mmap/brk/grow_stack into a
      "segment" that is used for a special mapping
      
      Some of the necessary mechanisms to handle that were present in the
      hugetlbfs code, but mostly in ways not suitable for anything else.
      
      The patch relies on some changes to the generic get_unmapped_area()
      that just got merged.  It still hijacks hugetlb callbacks here or
      there as the generic code hasn't been entirely cleaned up yet but
      that shouldn't be a problem.
      
      So what is a slice ?  Well, I re-used the mechanism used formerly by our
      hugetlbfs implementation which divides the address space in
      "meta-segments" which I called "slices".  The division is done using
      256MB slices below 4G, and 1T slices above.  Thus the address space is
      divided currently into 16 "low" slices and 16 "high" slices.  (Special
      case: high slice 0 is the area between 4G and 1T).
      
      Doing so simplifies significantly the tracking of segments and avoids
      having to keep track of all the 256MB segments in the address space.
      
      While I used the "concepts" of hugetlbfs, I mostly re-implemented
      everything in a more generic way and "ported" hugetlbfs to it.
      
      Slices can have an associated page size, which is encoded in the mmu
      context and used by the SLB miss handler to set the segment sizes.  The
      hash code currently doesn't care, it has a specific check for hugepages,
      though I might add a mechanism to provide per-slice hash mapping
      functions in the future.
      
      The slice code provide a pair of "generic" get_unmapped_area() (bottomup
      and topdown) functions that should work with any slice size.  There is
      some trickiness here so I would appreciate people to have a look at the
      implementation of these and let me know if I got something wrong.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d0f13e3c
  2. 24 4月, 2007 1 次提交
  3. 21 3月, 2007 1 次提交
  4. 16 10月, 2006 1 次提交
    • P
      [POWERPC] Lazy interrupt disabling for 64-bit machines · d04c56f7
      Paul Mackerras 提交于
      This implements a lazy strategy for disabling interrupts.  This means
      that local_irq_disable() et al. just clear the 'interrupts are
      enabled' flag in the paca.  If an interrupt comes along, the interrupt
      entry code notices that interrupts are supposed to be disabled, and
      clears the EE bit in SRR1, clears the 'interrupts are hard-enabled'
      flag in the paca, and returns.  This means that interrupts only
      actually get disabled in the processor when an interrupt comes along.
      
      When interrupts are enabled by local_irq_enable() et al., the code
      sets the interrupts-enabled flag in the paca, and then checks whether
      interrupts got hard-disabled.  If so, it also sets the EE bit in the
      MSR to hard-enable the interrupts.
      
      This has the potential to improve performance, and also makes it
      easier to make a kernel that can boot on iSeries and on other 64-bit
      machines, since this lazy-disable strategy is very similar to the
      soft-disable strategy that iSeries already uses.
      
      This version renames paca->proc_enabled to paca->soft_enabled, and
      changes a couple of soft-disables in the kexec code to hard-disables,
      which should fix the crash that Michael Ellerman saw.  This doesn't
      yet use a reserved CR field for the soft_enabled and hard_enabled
      flags.  This applies on top of Stephen Rothwell's patches to make it
      possible to build a combined iSeries/other kernel.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d04c56f7
  5. 13 9月, 2006 1 次提交
    • P
      [POWERPC] Fix MMIO ops to provide expected barrier behaviour · f007cacf
      Paul Mackerras 提交于
      This changes the writeX family of functions to have a sync instruction
      before the MMIO store rather than after, because the generally expected
      behaviour is that the device receiving the MMIO store can be guaranteed
      to see the effects of any preceding writes to normal memory.
      
      To preserve ordering between writeX and readX, and to preserve ordering
      between preceding stores and the readX, the readX family of functions
      have had an sync added before the load.
      
      Although writeX followed by spin_unlock is not officially guaranteed
      to keep the writeX inside the spin-locked region unless an mmiowb()
      is used, there are currently drivers that depend on the previous
      behaviour on powerpc, which was that the mmiowb wasn't actually required.
      Therefore we have a per-cpu flag that is set by writeX, cleared by
      __raw_spin_lock and mmiowb, and tested by __raw_spin_unlock.  If it is
      set, __raw_spin_unlock does a sync and clears it.
      
      This changes both 32-bit and 64-bit readX/writeX.  32-bit already has a
      sync in __raw_spin_unlock (since lwsync doesn't exist on 32-bit), and thus
      doesn't need the per-cpu flag.
      
      Tested on G5 (PPC970) and POWER5.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      f007cacf
  6. 08 8月, 2006 1 次提交
  7. 15 6月, 2006 1 次提交
    • P
      powerpc: Use 64k pages without needing cache-inhibited large pages · bf72aeba
      Paul Mackerras 提交于
      Some POWER5+ machines can do 64k hardware pages for normal memory but
      not for cache-inhibited pages.  This patch lets us use 64k hardware
      pages for most user processes on such machines (assuming the kernel
      has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
      start out using 64k pages and get switched to 4k pages if they use any
      non-cacheable mappings.
      
      With this, we use 64k pages for the vmalloc region and 4k pages for
      the imalloc region.  If anything creates a non-cacheable mapping in
      the vmalloc region, the vmalloc region will get switched to 4k pages.
      I don't know of any driver other than the DRM that would do this,
      though, and these machines don't have AGP.
      
      When a region gets switched from 64k pages to 4k pages, we do not have
      to clear out all the 64k HPTEs from the hash table immediately.  We
      use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
      was hashed in as a 64k page or a set of 4k pages.  If hash_page is
      trying to insert a 4k page for a Linux PTE and it sees that it has
      already been inserted as a 64k page, it first invalidates the 64k HPTE
      before inserting the 4k HPTE.  The hash invalidation routines also use
      the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
      set of 4k HPTEs to remove.  With those two changes, we can tolerate a
      mix of 4k and 64k HPTEs in the hash table, and they will all get
      removed when the address space is torn down.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      bf72aeba
  8. 12 6月, 2006 1 次提交
  9. 26 4月, 2006 1 次提交
  10. 27 3月, 2006 1 次提交
    • A
      [PATCH] powerpc: Allow non zero boot cpuids · 4df20460
      Anton Blanchard 提交于
      We currently have a hack to flip the boot cpu and its secondary thread
      to logical cpuid 0 and 1. This means the logical - physical mapping will
      differ depending on which cpu is boot cpu. This is most apparent on
      kexec, where we might kexec on any cpu and therefore change the mapping
      from boot to boot.
      
      The patch below does a first pass early on to work out the logical cpuid
      of the boot thread. We then fix up some paca structures to match.
      
      Ive also removed the boot_cpuid_phys variable for ppc64, to be
      consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      4df20460
  11. 24 2月, 2006 1 次提交
    • P
      powerpc: Implement accurate task and CPU time accounting · c6622f63
      Paul Mackerras 提交于
      This implements accurate task and cpu time accounting for 64-bit
      powerpc kernels.  Instead of accounting a whole jiffy of time to a
      task on a timer interrupt because that task happened to be running at
      the time, we now account time in units of timebase ticks according to
      the actual time spent by the task in user mode and kernel mode.  We
      also count the time spent processing hardware and software interrupts
      accurately.  This is conditional on CONFIG_VIRT_CPU_ACCOUNTING.  If
      that is not set, we do tick-based approximate accounting as before.
      
      To get this accurate information, we read either the PURR (processor
      utilization of resources register) on POWER5 machines, or the timebase
      on other machines on
      
      * each entry to the kernel from usermode
      * each exit to usermode
      * transitions between process context, hard irq context and soft irq
        context in kernel mode
      * context switches.
      
      On POWER5 systems with shared-processor logical partitioning we also
      read both the PURR and the timebase at each timer interrupt and
      context switch in order to determine how much time has been taken by
      the hypervisor to run other partitions ("steal" time).  Unfortunately,
      since we need values of the PURR on both threads at the same time to
      accurately calculate the steal time, and since we can only calculate
      steal time on a per-core basis, the apportioning of the steal time
      between idle time (time which we ceded to the hypervisor in the idle
      loop) and actual stolen time is somewhat approximate at the moment.
      
      This is all based quite heavily on what s390 does, and it uses the
      generic interfaces that were added by the s390 developers,
      i.e. account_system_time(), account_user_time(), etc.
      
      This patch doesn't add any new interfaces between the kernel and
      userspace, and doesn't change the units in which time is reported to
      userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(),
      times(), etc.  Internally the various task and cpu times are stored in
      timebase units, but they are converted to USER_HZ units (1/100th of a
      second) when reported to userspace.  Some precision is therefore lost
      but there should not be any accumulating error, since the internal
      accumulation is at full precision.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      c6622f63
  12. 10 2月, 2006 1 次提交
  13. 13 1月, 2006 1 次提交
    • D
      [PATCH] powerpc: Remove lppaca structure from the PACA · 3356bb9f
      David Gibson 提交于
      At present the lppaca - the structure shared with the iSeries
      hypervisor and phyp - is contained within the PACA, our own low-level
      per-cpu structure.  This doesn't have to be so, the patch below
      removes it, making a separate array of lppaca structures.
      
      This saves approximately 500*NR_CPUS bytes of image size and kernel
      memory, because we don't need aligning gap between the Linux and
      hypervisor portions of every PACA.  On the other hand it means an
      extra level of dereference in many accesses to the lppaca.
      
      The patch also gets rid of several places where we assign the paca
      address to a local variable for no particular reason.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      3356bb9f
  14. 11 1月, 2006 1 次提交
    • A
      [PATCH] powerpc/64: per cpu data optimisations · 7a0268fa
      Anton Blanchard 提交于
      The current ppc64 per cpu data implementation is quite slow. eg:
      
              lhz 11,18(13)           /* smp_processor_id() */
              ld 9,.LC63-.LCTOC1(30)  /* per_cpu__variable_name */
              ld 8,.LC61-.LCTOC1(30)  /* __per_cpu_offset */
              sldi 11,11,3            /* form index into __per_cpu_offset */
              mr 10,9
              ldx 9,11,8              /* __per_cpu_offset[smp_processor_id()] */
              ldx 0,10,9              /* load per cpu data */
      
      5 loads for something that is supposed to be fast, pretty awful. One
      reason for the large number of loads is that we have to synthesize 2
      64bit constants (per_cpu__variable_name and __per_cpu_offset).
      
      By putting __per_cpu_offset into the paca we can avoid the 2 loads
      associated with it:
      
              ld 11,56(13)            /* paca->data_offset */
              ld 9,.LC59-.LCTOC1(30)  /* per_cpu__variable_name */
              ldx 0,9,11              /* load per cpu data
      
      Longer term we can should be able to do even better than 3 loads.
      If per_cpu__variable_name wasnt a 64bit constant and paca->data_offset
      was in a register we could cut it down to one load. A suggestion from
      Rusty is to use gcc's __thread extension here. In order to do this we
      would need to free up r13 (the __thread register and where the paca
      currently is). So far Ive had a few unsuccessful attempts at doing that :)
      
      The patch also allocates per cpu memory node local on NUMA machines.
      This patch from Rusty has been sitting in my queue _forever_ but stalled
      when I hit the compiler bug. Sorry about that.
      
      Finally I also only allocate per cpu data for possible cpus, which comes
      straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
      and 4 possible cpus we see some nice gains:
      
                   total       used       free     shared    buffers cached
      Mem:       4012228     212860    3799368          0          0 162424
      
                   total       used       free     shared    buffers cached
      Mem:       4016200     212984    3803216          0          0 162424
      
      A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
      to be careful of per cpu users that touch data for !possible cpus.
      
      At this stage it might be worth making the NUMA and possible cpu
      optimisations generic, but per cpu init is done so early we have to be
      careful that all architectures have their possible map setup correctly.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7a0268fa
  15. 09 1月, 2006 3 次提交
    • A
      [PATCH] powerpc: sanitize header files for user space includes · 88ced031
      Arnd Bergmann 提交于
      include/asm-ppc/ had #ifdef __KERNEL__ in all header files that
      are not meant for use by user space, include/asm-powerpc does
      not have this yet.
      
      This patch gets us a lot closer there. There are a few cases
      where I was not sure, so I left them out. I have verified
      that no CONFIG_* symbols are used outside of __KERNEL__
      any more and that there are no obvious compile errors when
      including any of the headers in user space libraries.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      88ced031
    • D
      [PATCH] powerpc: Remove some unneeded fields from the paca · 404849bb
      David Gibson 提交于
      This patch removes several unnecessary fields from the paca:
      
      - next_jiffy_update_tb was simply unused.  Remove trivially.
      
      - The exdsi exception save area was not used.  There were plans to use
        it, but they never seem to have gone anywhere.  If they ever do, we
        can put it back.  Remove from the paca, and from asm-offsets.c
      
      - The default_decr field was used from asm, but was only ever assigned
        the value of tb_ticks_per_jiffy.  Just access tb_ticks_per_jiffy from
        asm directly instead.
      
      Built and booted on POWER5 LPAR and iSeries RS64.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      404849bb
    • D
      [PATCH] powerpc: Remove ItLpRegSave area from the paca · 1888e7b5
      David Gibson 提交于
      On iSeries, the paca contains, amongst other things an ItLpRegSave
      structure used by the hypervisor to save registers.  The hypervisor
      locates this area through a pointer at the beginning of the paca, so
      the structure itself can be located elsewhere.  This patch moves the
      reg_save area out into its own array.  This reduces the amount of
      iSeries specific gunk which is visible to general powerpc code via
      paca.h
      
      Built and booted on POWER5 LPAR and iSeries RS64.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      1888e7b5
  16. 10 11月, 2005 1 次提交
    • D
      [PATCH] powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc · 8882a4da
      David Gibson 提交于
      This patch moves a bunch of files from arch/ppc64 and
      include/asm-ppc64 which have no equivalents in ppc32 code into
      arch/powerpc and include/asm-powerpc.  The file affected are:
      	abs_addr.h
      	compat.h
      	lppaca.h
      	paca.h
      	tce.h
      	cpu_setup_power4.S
      	ioctl32.c
      	firmware.c
      	pacaData.c
      
      The only changes apart from the move and corresponding Makefile
      changes are:
      	- #ifndef/#define in includes updated to _ASM_POWERPC_ form
      	- trailing whitespace removed
      	- comments giving full paths removed
      	- pacaData.c renamed paca.c to remove studlyCaps
      	- Misplaced { moved in lppaca.h
      
      Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64), built
      for 32-bit powermac (ARCH=powerpc).
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      8882a4da
  17. 07 11月, 2005 1 次提交
    • B
      [PATCH] ppc64: support 64k pages · 3c726f8d
      Benjamin Herrenschmidt 提交于
      Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
      base page size to 64K.  The resulting kernel still boots on any
      hardware.  On current machines with 4K pages support only, the kernel
      will maintain 16 "subpages" for each 64K page transparently.
      
      Note that while real 64K capable HW has been tested, the current patch
      will not enable it yet as such hardware is not released yet, and I'm
      still verifying with the firmware architects the proper to get the
      information from the newer hypervisors.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3c726f8d
  18. 02 11月, 2005 1 次提交
  19. 30 6月, 2005 2 次提交
  20. 22 6月, 2005 1 次提交
  21. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4