1. 26 Apr, 2014 1 commit
    • L
      mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts · 1cf35d47
      Linus Torvalds committed
      The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
      actual tlb flush operation, and the batched freeing of the pages that
      the TLB entries pointed at.
      
      This splits the operation into separate phases, so that the forced
      batched flushing done by zap_pte_range() can now do the actual TLB flush
      while still holding the page table lock, but delay the batched freeing
      of all the pages to after the lock has been dropped.
      
      This in turn allows us to avoid a race condition between
      set_page_dirty() (as called by zap_pte_range() when it finds a dirty
      shared memory pte) and page_mkclean(): because we now flush all the
      dirty page data from the TLB's while holding the pte lock,
      page_mkclean() will be held up walking the (recently cleaned) page
      tables until after the TLB entries have been flushed from all CPU's.
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
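      The two-phase idea described in this commit can be sketched in plain user-space C. This is a minimal model, not the kernel's mmu-gather code: `struct gather`, `gather_flush_tlb_only()`, `gather_free_batch()` and `zap_range()` are hypothetical names, and the page table lock is modeled by a boolean. It only illustrates the ordering: the flush happens while the lock is held, and the batch is freed after the lock is dropped.

```c
#include <stdbool.h>
#include <stddef.h>

#define GATHER_MAX 8   /* illustrative batch size, not a kernel constant */

/* Hypothetical stand-in for the mmu-gather state; not the kernel's struct. */
struct gather {
    int pages[GATHER_MAX];   /* ids of pages queued for freeing */
    size_t nr;               /* number of queued pages */
    bool need_flush;         /* TLB entries still waiting for a flush */
    int flushes;             /* how many TLB flushes were issued */
    int freed;               /* how many pages were freed in total */
    int freed_under_lock;    /* pages freed while the lock was held */
    bool pte_lock_held;      /* models the page table lock */
};

/* Queue a page; returns false when the batch is full and must be drained. */
static bool gather_add(struct gather *g, int page)
{
    g->pages[g->nr++] = page;
    g->need_flush = true;
    return g->nr < GATHER_MAX;
}

/* Phase 1: TLB flush only. Intended to run with the lock held. */
static void gather_flush_tlb_only(struct gather *g)
{
    if (!g->need_flush)
        return;
    g->need_flush = false;
    g->flushes++;
}

/* Phase 2: free the batched pages. Intended to run after unlocking. */
static void gather_free_batch(struct gather *g)
{
    if (g->pte_lock_held)
        g->freed_under_lock += (int)g->nr;
    g->freed += (int)g->nr;
    g->nr = 0;
}

/* Models a zap loop that is forced to drain whenever the batch fills up. */
static void zap_range(struct gather *g, int first_page, int count)
{
    int i = 0;

    while (i < count) {
        bool force_flush = false;

        g->pte_lock_held = true;
        while (i < count) {
            if (!gather_add(g, first_page + i++)) {
                force_flush = true;
                break;
            }
        }
        if (force_flush)
            gather_flush_tlb_only(g);   /* still under the lock */
        g->pte_lock_held = false;

        if (force_flush)
            gather_free_batch(g);       /* lock already dropped */
    }
}
```

      Pages left in a partially filled batch stay queued after `zap_range()` returns; in this sketch the caller finishes with one more `gather_flush_tlb_only()` followed by `gather_free_batch()`.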
  2. 25 Apr, 2014 1 commit
  3. 23 Apr, 2014 2 commits
    • N
      ARM: 8032/1: bL_switcher: fix validation check before its activation · 4530e4b6
      Nicolas Pitre committed
      The switcher should not depend on MAX_CLUSTER to determine if it should
      be activated or not. In a multiplatform kernel binary it is possible to
      have dual-cluster and quad-cluster platforms configured in. In that case
      MAX_CLUSTER which is a build time limit should be 4 and that shouldn't
      prevent the switcher from working if the kernel is booted on a b.L
      dual-cluster system.
      
      In bL_switcher_halve_cpus() we already have a runtime validation check
      to make sure we're dealing with only two clusters, so booting on a quad
      cluster system will be caught and switcher activation aborted.
      
      However, the b.L switcher must ensure the MCPM layer is initialized on
      the booted hardware before doing anything.  The mcpm_is_available()
      function is added to that effect.
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Tested-by: Abhilash Kesavan <kesavan.abhilash@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • X
      ARM: 8027/1: fix do_div() bug in big-endian systems · 80bb3ef1
      Xiangyu Lu committed
      In big-endian systems, "%1" gets the most significant part of the value, causing the instruction to produce the wrong result.
      
      When viewing ftrace records on big-endian ARM systems, we found
      timestamp errors:
      
      swapper-0   [001] 1325.970000:   0:120:R ==> [001]    16:120:R events/1
      events/1-16 [001] 1325.970000:   16:120:S ==> [001]    0:120:R swapper
      swapper-0   [000] 1325.1000000:  0:120:R   + [000]    15:120:R events/0
      swapper-0   [000] 1325.1000000:  0:120:R ==> [000]    15:120:R events/0
      swapper-0   [000] 1326.030000:   0:120:R   + [000]  1150:120:R sshd
      swapper-0   [000] 1326.030000:   0:120:R ==> [000]  1150:120:R sshd
      
      Viewing ftrace records calls the do_div(n, base) function, which is implemented in arch/arm/include/asm/div64.h. When n = 10000000 and base = 1000000, do_div(n, base) executes "umull %Q0, %R0, %1, %Q2".
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Cc: <stable@vger.kernel.org> # 2.6.20+
      Signed-off-by: Alex Wu <wuquanming@huawei.com>
      Signed-off-by: Xiangyu Lu <luxiangyu@huawei.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
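      As a rough illustration of why operand order matters here, the following C sketch shows the semantics of do_div() (divide in place, return the remainder) and how "the first 32-bit half" of a 64-bit value depends on byte order. It is an analogy using memory layout, not the register-pair inline assembly from the commit; all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of do_div() semantics: divide *n in place, return the remainder. */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Low 32 bits selected by value: correct on any byte order. */
static uint32_t low_word_by_value(uint64_t v)
{
    return (uint32_t)v;
}

/* High 32 bits selected by value: correct on any byte order. */
static uint32_t high_word_by_value(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/*
 * First 32-bit word as laid out in memory: the low half on a
 * little-endian host, but the high half on a big-endian host.
 */
static uint32_t first_word_in_memory(uint64_t v)
{
    uint32_t w;

    memcpy(&w, &v, sizeof(w));
    return w;
}

/* Runtime byte-order probe for the host running this sketch. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}
```

      Code that picks a half by position rather than by significance gets the low word on one byte order and the high word on the other, which is the same class of mistake the commit describes for "%1".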
  4. 09 Apr, 2014 3 commits
  5. 04 Apr, 2014 1 commit
    • R
      ARM: Better virt_to_page() handling · e26a9e00
      Russell King committed
      virt_to_page() is incredibly inefficient when virt-to-phys patching is
      enabled.  This is because we end up with this calculation:
      
        page = &mem_map[asm virt_to_phys(addr) >> 12 - __pv_phys_offset >> 12]
      
      in assembly.  The asm virt_to_phys() is equivalent to this operation:
      
        addr - PAGE_OFFSET + __pv_phys_offset
      
      and we can see that because this is assembly, the compiler has no chance
      to optimise some of that away.  This should reduce down to:
      
        page = &mem_map[(addr - PAGE_OFFSET) >> 12]
      
      for the common cases.  Permit the compiler to make this optimisation by
      giving it more of the information it needs - do this by providing a
      virt_to_pfn() macro.
      
      Another issue which makes this more complex is that __pv_phys_offset is
      a 64-bit type on all platforms.  This is needlessly wasteful - if we
      store the physical offset as a PFN, we can save a lot of work having
      to deal with 64-bit values, which sometimes ends up producing incredibly
      horrid code:
      
           a4c:       e3009000        movw    r9, #0
                              a4c: R_ARM_MOVW_ABS_NC  __pv_phys_offset
           a50:       e3409000        movt    r9, #0          ; r9 = &__pv_phys_offset
                              a50: R_ARM_MOVT_ABS     __pv_phys_offset
           a54:       e3002000        movw    r2, #0
                              a54: R_ARM_MOVW_ABS_NC  __pv_phys_offset
           a58:       e3402000        movt    r2, #0          ; r2 = &__pv_phys_offset
                              a58: R_ARM_MOVT_ABS     __pv_phys_offset
           a5c:       e5999004        ldr     r9, [r9, #4]    ; r9 = high word of __pv_phys_offset
           a60:       e3001000        movw    r1, #0
                              a60: R_ARM_MOVW_ABS_NC  mem_map
           a64:       e592c000        ldr     ip, [r2]        ; ip = low word of __pv_phys_offset
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
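      The simplification argued for above can be checked with a small C sketch. The constants and function names are assumptions for illustration only (a 3G/1G split with PAGE_OFFSET at 0xC0000000 and 4 KiB pages), and the code is not the kernel's macro. It compares the long form, which goes through a 64-bit physical address, with the direct form, and shows a PFN computed from an offset stored as a 32-bit PFN.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12            /* assumed 4 KiB pages */
#define SKETCH_PAGE_OFFSET  0xC0000000u   /* assumed kernel virtual base */

/* Long form: translate to a physical address, then subtract the offset PFN. */
static uint32_t index_via_phys(uint32_t addr, uint64_t phys_offset)
{
    uint64_t phys = (uint64_t)(addr - SKETCH_PAGE_OFFSET) + phys_offset;

    return (uint32_t)((phys >> SKETCH_PAGE_SHIFT) -
                      (phys_offset >> SKETCH_PAGE_SHIFT));
}

/* Direct form: the physical offset cancels out when it is page aligned. */
static uint32_t index_direct(uint32_t addr)
{
    return (addr - SKETCH_PAGE_OFFSET) >> SKETCH_PAGE_SHIFT;
}

/* PFN computed from an offset that is stored as a 32-bit PFN. */
static uint32_t sketch_virt_to_pfn(uint32_t addr, uint32_t phys_pfn_offset)
{
    return ((addr - SKETCH_PAGE_OFFSET) >> SKETCH_PAGE_SHIFT) + phys_pfn_offset;
}
```

      The two index functions agree only when the physical offset is page aligned, which is the case this sketch assumes. Keeping the offset as a PFN lets all the arithmetic stay in 32 bits even when the physical address itself needs more.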
  6. 22 Mar, 2014 1 commit
    • A
      ARM: sunxi: fix build for THUMB2_KERNEL · 1146b600
      Arnd Bergmann committed
      Building an SMP kernel for the sunxi platform with THUMB2 instructions
      fails with this error at the moment:
      
      headsmp.S:7: Error: Thumb encoding does not support an immediate here -- `msr cpsr_fsxc,#0xd3'
      
      Since the generic secondary_startup function already does
      the same thing in a safe way, we can just drop the private
      sunxi implementation and jump straight to secondary_startup.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
  7. 20 Mar, 2014 3 commits
  8. 19 Mar, 2014 8 commits
  9. 18 Mar, 2014 1 commit
  10. 12 Mar, 2014 1 commit
  11. 11 Mar, 2014 1 commit
  12. 08 Mar, 2014 1 commit
    • R
      ARM: fix noMMU kallsyms symbol filtering · 006fa259
      Russell King committed
      With noMMU, CONFIG_PAGE_OFFSET was not being set correctly.  As there's
      no MMU, PAGE_OFFSET should be equal to PHYS_OFFSET in all cases.  This
      commit makes that explicit.
      
      Since we do this, we don't need to mess around in asm/memory.h with
      ifdefs to sort this out, so let's get rid of that, and there's no point
      offering the "Memory split" option for noMMU as that's meaningless
      there.
      
      Fixes: b9b32bf7 ("ARM: use linker magic for vectors and vector stubs")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  13. 03 Mar, 2014 7 commits
  14. 28 Feb, 2014 2 commits
    • M
      arm: dma-mapping: remove order parameter from arm_iommu_create_mapping() · 68efd7d2
      Marek Szyprowski committed
      The 'order' parameter for IOMMU-aware dma-mapping implementation was
      introduced mainly as a hack to reduce size of the bitmap used for
      tracking IO virtual address space. Since now it is possible to dynamically
      resize the bitmap, this hack is not needed and can be removed without any
      impact on the client devices. This way the parameters for
      arm_iommu_create_mapping() become much easier to understand. The 'size'
      parameter now means the maximum supported IO address space size.
      
      The code will allocate (resize) bitmap in chunks, ensuring that a single
      chunk is not larger than a single memory page to avoid unreliable
      allocations of size larger than PAGE_SIZE in atomic context.
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
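      The chunking rule in the last paragraph (one bit per page of IO address space, with no single bitmap chunk larger than a page) comes down to a little arithmetic. The sketch below is an illustration under assumed 4 KiB pages; `struct bitmap_plan` and `plan_bitmaps()` are hypothetical names, not functions from the dma-mapping code.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                              /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical result type: how the tracking bitmap would be split up. */
struct bitmap_plan {
    size_t chunk_bytes;   /* size of each bitmap chunk, at most one page */
    size_t nr_chunks;     /* chunks needed to cover the whole IO space */
};

/* Plan the bitmap layout for an IO address space of the given size. */
static struct bitmap_plan plan_bitmaps(unsigned long long io_space_bytes)
{
    struct bitmap_plan plan;
    unsigned long long bits = io_space_bytes >> SKETCH_PAGE_SHIFT; /* one bit per page */
    unsigned long long bytes = (bits + 7) / 8;

    if (bytes > SKETCH_PAGE_SIZE) {
        /* Too big for one page: cap the chunk and use several of them. */
        plan.chunk_bytes = SKETCH_PAGE_SIZE;
        plan.nr_chunks = (size_t)((bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
    } else {
        plan.chunk_bytes = (size_t)bytes;
        plan.nr_chunks = 1;
    }
    return plan;
}
```

      Under these assumptions a single page-sized chunk tracks 32768 pages, so a 128 MiB IO space fits in one chunk and a 1 GiB space needs eight.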
    • A
      arm: dma-mapping: Add support to extend DMA IOMMU mappings · 4d852ef8
      Andreas Herrmann committed
      Instead of using just one bitmap to keep track of IO virtual addresses
      (handed out for IOMMU use) introduce an array of bitmaps. This allows
      us to extend existing mappings when running out of iova space in the
      initial mapping etc.
      
      If there is not enough space in the mapping to service an IO virtual
      address allocation request, __alloc_iova() tries to extend the mapping
      -- by allocating another bitmap -- and makes another allocation
      attempt using the freshly allocated bitmap.
      
      This allows arm iommu drivers to start with a decent initial size when
      a dma_iommu_mapping is created and still to avoid running out of IO
      virtual addresses for the mapping.
      Signed-off-by: Andreas Herrmann <andreas.herrmann@calxeda.com>
      [mszyprow: removed extensions parameter to arm_iommu_create_mapping()
       function, which will be modified in the next patch anyway, also some
       debug messages about extending bitmap]
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
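      The extend-on-demand behaviour described for __alloc_iova() can be modeled with a small C sketch. This is a simplified illustration, not the kernel code: it hands out single page slots rather than contiguous ranges, uses tiny 8-bit bitmaps, and every name (`struct sketch_mapping`, `sketch_extend()`, `sketch_alloc_iova()`) is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_BITMAP 8u   /* tiny on purpose, for illustration only */
#define MAX_BITMAPS     4u   /* assumed upper bound on extensions */

/* Hypothetical mapping: an array of bitmaps, grown one bitmap at a time. */
struct sketch_mapping {
    uint8_t *bitmaps[MAX_BITMAPS];   /* one byte each, tracking 8 page slots */
    unsigned int nr_bitmaps;         /* bitmaps allocated so far */
};

/* Add one more bitmap; returns 0 on success, -1 if the limit is reached. */
static int sketch_extend(struct sketch_mapping *m)
{
    if (m->nr_bitmaps >= MAX_BITMAPS)
        return -1;
    m->bitmaps[m->nr_bitmaps] = calloc(1, BITS_PER_BITMAP / 8);
    if (!m->bitmaps[m->nr_bitmaps])
        return -1;
    m->nr_bitmaps++;
    return 0;
}

/* Claim the first free bit in one bitmap; returns its index or -1 if full. */
static int sketch_claim_bit(uint8_t *bitmap)
{
    for (unsigned int bit = 0; bit < BITS_PER_BITMAP; bit++) {
        if (!(*bitmap & (1u << bit))) {
            *bitmap |= (uint8_t)(1u << bit);
            return (int)bit;
        }
    }
    return -1;
}

/* Allocate one page slot; returns a global slot index, or -1 when exhausted. */
static int sketch_alloc_iova(struct sketch_mapping *m)
{
    /* First try every bitmap that already exists. */
    for (unsigned int i = 0; i < m->nr_bitmaps; i++) {
        int bit = sketch_claim_bit(m->bitmaps[i]);

        if (bit >= 0)
            return (int)(i * BITS_PER_BITMAP) + bit;
    }
    /* No space left: extend the mapping and retry in the fresh bitmap. */
    if (sketch_extend(m) != 0)
        return -1;
    return (int)((m->nr_bitmaps - 1) * BITS_PER_BITMAP) +
           sketch_claim_bit(m->bitmaps[m->nr_bitmaps - 1]);
}
```

      Starting from an empty mapping, the first allocation creates the first bitmap, and every ninth allocation adds another until the limit is reached. The sketch never frees its bitmaps, which a real implementation would need to do when the mapping is released.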
  15. 25 Feb, 2014 6 commits
  16. 23 Feb, 2014 1 commit