1. 08 Feb, 2017 1 commit
  2. 25 Dec, 2016 1 commit
  3. 30 Nov, 2016 2 commits
  4. 15 Nov, 2016 3 commits
    • ARM: 8623/1: mm: add ARM_L1_CACHE_SHIFT_7 for UniPhier outer cache · 01bf9278
      Committed by Masahiro Yamada
      The UniPhier outer cache (arch/arm/mm/cache-uniphier.c) has a 128-byte
      line length, and its tags are also managed per 128-byte line.  This
      is very unfortunate, but the current 64-byte alignment for kmalloc()
      causes sharing problems on DMA if used with this outer cache.
      
      This commit adds ARM_L1_CACHE_SHIFT_7 to increase the DMA minimum
      alignment to 128 bytes if CACHE_UNIPHIER is enabled.  Several drivers
      assume that aligning to L1_CACHE_BYTES is DMA safe, so this commit
      also changes L1_CACHE_BYTES for safety.
      
      Having said that, I hesitate to align all the other SoCs in the
      multi-platform build to UniPhier's requirement.  So, I am disabling
      CONFIG_CACHE_UNIPHIER by default, so that multi_v7_defconfig will
      still stay with CONFIG_ARM_L1_CACHE_SHIFT=6.  With this commit,
      UniPhier SoCs will become slower, but that is much better than a
      system crash.  If desired, the outer cache can be enabled by
      merge_config or something similar.
      
      Note:
      The UniPhier PH1-Pro5 SoC is also equipped with an L3 cache with a
      256-byte line size, but its tags are managed per 128-byte sub-line.
      So, ARM_L1_CACHE_SHIFT_7 should be fine for all the UniPhier SoCs.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8628/1: dma-mapping: preallocate DMA-debug hash tables in core_initcall · 256ff1cf
      Committed by Marek Szyprowski
      fs_initcall is definitely too late to initialize the DMA-debug hash tables,
      because some drivers might get probed and use the DMA mapping framework
      already in core_initcall. Late initialization of DMA-debug results in
      false warnings about freeing memory that was not allocated, like this
      one:
      ------------[ cut here ]------------
      WARNING: CPU: 5 PID: 1 at lib/dma-debug.c:1104 check_unmap+0xa1c/0xe50
      exynos-sysmmu 10a60000.sysmmu: DMA-API: device driver tries to free DMA memory it has not allocated [device
      address=0x000000006ebd0000] [size=16384 bytes]
      Modules linked in:
      CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc5-00028-g39dde3d-dirty #44
      Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      [<c0119dd4>] (unwind_backtrace) from [<c01122bc>] (show_stack+0x20/0x24)
      [<c01122bc>] (show_stack) from [<c062714c>] (dump_stack+0x84/0xa0)
      [<c062714c>] (dump_stack) from [<c0132560>] (__warn+0x14c/0x180)
      [<c0132560>] (__warn) from [<c01325dc>] (warn_slowpath_fmt+0x48/0x50)
      [<c01325dc>] (warn_slowpath_fmt) from [<c06814f8>] (check_unmap+0xa1c/0xe50)
      [<c06814f8>] (check_unmap) from [<c06819c4>] (debug_dma_unmap_page+0x98/0xc8)
      [<c06819c4>] (debug_dma_unmap_page) from [<c076c3e8>] (exynos_iommu_domain_free+0x158/0x380)
      [<c076c3e8>] (exynos_iommu_domain_free) from [<c0764a30>] (iommu_domain_free+0x34/0x60)
      [<c0764a30>] (iommu_domain_free) from [<c011f168>] (release_iommu_mapping+0x30/0xb8)
      [<c011f168>] (release_iommu_mapping) from [<c011f23c>] (arm_iommu_release_mapping+0x4c/0x50)
      [<c011f23c>] (arm_iommu_release_mapping) from [<c0b061ac>] (s5p_mfc_probe+0x640/0x80c)
      [<c0b061ac>] (s5p_mfc_probe) from [<c07e6750>] (platform_drv_probe+0x70/0x148)
      [<c07e6750>] (platform_drv_probe) from [<c07e25c0>] (driver_probe_device+0x12c/0x6b0)
      [<c07e25c0>] (driver_probe_device) from [<c07e2c6c>] (__driver_attach+0x128/0x17c)
      [<c07e2c6c>] (__driver_attach) from [<c07df74c>] (bus_for_each_dev+0x88/0xc8)
      [<c07df74c>] (bus_for_each_dev) from [<c07e1b6c>] (driver_attach+0x34/0x58)
      [<c07e1b6c>] (driver_attach) from [<c07e1350>] (bus_add_driver+0x18c/0x32c)
      [<c07e1350>] (bus_add_driver) from [<c07e4198>] (driver_register+0x98/0x148)
      [<c07e4198>] (driver_register) from [<c07e5cb0>] (__platform_driver_register+0x58/0x74)
      [<c07e5cb0>] (__platform_driver_register) from [<c174cb30>] (s5p_mfc_driver_init+0x1c/0x20)
      [<c174cb30>] (s5p_mfc_driver_init) from [<c0102690>] (do_one_initcall+0x64/0x258)
      [<c0102690>] (do_one_initcall) from [<c17014c0>] (kernel_init_freeable+0x3d0/0x4d0)
      [<c17014c0>] (kernel_init_freeable) from [<c116eeb4>] (kernel_init+0x18/0x134)
      [<c116eeb4>] (kernel_init) from [<c010bbd8>] (ret_from_fork+0x14/0x3c)
      ---[ end trace dc54c54bd3581296 ]---
      
      This patch moves initialization of DMA-debug to core_initcall. This is
      safe from the initialization perspective. dma_debug_do_init() internally calls
      debugfs functions and debugfs also gets initialised at core_initcall(), and
      that is earlier than arch code in the link order, so it will get initialized
      just before the DMA-debug.
      Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8624/1: proc-v7m.S: fix init section name · 544457fa
      Committed by Nicolas Pitre
      There is no .text.init section.
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  5. 19 Oct, 2016 1 commit
    • ARM: fix oops when using older ARMv4T CPUs · 04946fb6
      Committed by Russell King
      Alexander Shiyan reports that CLPS711x fails at boot time in the data
      exception handler due to a NULL pointer dereference.  This is caused by
      the late-v4t abort handler overwriting R9 (which becomes zero).  Fix
      this by making the abort handler save and restore R9.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000008
      pgd = c3b58000
      [00000008] *pgd=800000000, *pte=00000000, *ppte=feff4140
      Internal error: Oops: 63c11817 [#1] PREEMPT ARM
      CPU: 0 PID: 448 Comm: ash Not tainted 4.8.1+ #1
      Hardware name: Cirrus Logic CLPS711X (Device Tree Support)
      task: c39e03a0 ti: c3b4e000 task.ti: c3b4e000
      PC is at __dabt_svc+0x4c/0x60
      LR is at do_page_fault+0x144/0x2ac
      pc : [<c000d3ac>]    lr : [<c000fcec>]    psr: 60000093
      sp : c3b4fe6c  ip : 00000001  fp : b6f1bf88
      r10: c387a5a0  r9 : 00000000  r8 : e4e0e001
      r7 : bee3ef83  r6 : 00100000  r5 : 80000013  r4 : c022fcf8
      r3 : 00000000  r2 : 00000008  r1 : bf000000  r0 : 00000000
      Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0000217f  Table: c3b58055  DAC: 00000055
      Process ash (pid: 448, stack limit = 0xc3b4e190)
      Stack: (0xc3b4fe6c to 0xc3b50000)
      fe60:                            bee3ef83 c05168d1 ffffffff 00000000 c3adfe80
      fe80: c3a03300 00000000 c3b4fed0 c3a03400 bee3ef83 c387a5a0 b6f1bf88 00000001
      fea0: c3b4febc 00000076 c022fcf8 80000013 ffffffff 0000003f bf000000 bee3ef83
      fec0: 00000004 00000000 c3adfe80 c00e432c 00000812 00000005 00000001 00000006
      fee0: b6f1b000 00000000 00010000 0003c944 0004d000 0004d439 00010000 b6f1b000
      ff00: 00000005 00000000 00015ecc c3b4fed0 0000000a 00000000 00000000 c00a1dc0
      ff20: befff000 c3a03300 c3b4e000 c0507cd8 c0508024 fffffff8 c3a03300 00000000
      ff40: c0516a58 c00a35bc c39e03a0 000001c0 bea84ce8 0004e008 c3b3a000 c00a3ac0
      ff60: c3b40374 c3b3a000 bea84d11 00000000 c0500188 bea84d11 bea84ce8 00000001
      ff80: 0000000b c000a304 c3b4e000 00000000 bea84ce4 c00a3cd0 00000000 bea84d11
      ffa0: bea84ce8 c000a160 bea84d11 bea84ce8 bea84d11 bea84ce8 0004e008 0004d450
      ffc0: bea84d11 bea84ce8 00000001 0000000b b6f45ee4 00000000 b6f5ff70 bea84ce4
      ffe0: b6f2f130 bea84cb0 b6f2f194 b6ef29f4 a0000010 bea84d11 02c7cffa 02c7cffd
      [<c000d3ac>] (__dabt_svc) from [<c022fcf8>] (__copy_to_user_std+0xf8/0x330)
      [<c022fcf8>] (__copy_to_user_std) from [<c00e432c>] (load_elf_binary+0x920/0x107c)
      [<c00e432c>] (load_elf_binary) from [<c00a35bc>] (search_binary_handler+0x80/0x16c)
      [<c00a35bc>] (search_binary_handler) from [<c00a3ac0>] (do_execveat_common+0x418/0x600)
      [<c00a3ac0>] (do_execveat_common) from [<c00a3cd0>] (do_execve+0x28/0x30)
      [<c00a3cd0>] (do_execve) from [<c000a160>] (ret_fast_syscall+0x0/0x30)
      Code: e1a0200d eb00136b e321f093 e59d104c (e5891008)
      ---[ end trace 4b4f8086ebef98c5 ]---
      
      Fixes: e6978e4b ("ARM: save and reset the address limit when entering an exception")
      Reported-by: Alexander Shiyan <shc_work@mail.ru>
      Tested-by: Alexander Shiyan <shc_work@mail.ru>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
  6. 28 Sep, 2016 1 commit
  7. 27 Sep, 2016 1 commit
  8. 12 Sep, 2016 1 commit
    • ARM: 8612/1: LPAE: initialize cache policy correctly · 6b3142b2
      Committed by Stefan Agner
      The cachepolicy variable gets initialized using a masked pmd
      value. So far, the pmd has been masked with flags valid for the
      2-level page table format, but the 3-level page table format
      requires a different mask. On LPAE, this led to a wrong assumption
      about which initial cache policy was used. Later, a check forces the
      cache policy to writealloc and prints the following warning:
      Forcing write-allocate cache policy for SMP
      
      This patch introduces a new definition PMD_SECT_CACHE_MASK for
      both page table formats, which masks in all cache flags in both
      cases.
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  9. 06 Sep, 2016 6 commits
  10. 29 Aug, 2016 1 commit
  11. 23 Aug, 2016 1 commit
    • ARM: 8599/1: mm: pull asm/memory.h explicitly · f271b779
      Committed by Vladimir Murzin
      Commit d7811455 ("ARM: 8512/1: proc-v7.S: Adjust stack address when
      XIP_KERNEL") introduced a macro which lives under asm/memory.h.
      Unfortunately, for MMU-less systems (like R-class) it leads to a build failure:
      
      arch/arm/mm/proc-v7.S: Assembler messages:
      arch/arm/mm/proc-v7.S:538: Error: unrecognised relocation suffix
      make[1]: *** [arch/arm/mm/proc-v7.o] Error 1
      make: *** [arch/arm/mm] Error 2
      
      since it is implicitly pulled via asm/pgtable.h for MMU capable systems only.
      
      To fix it, include asm/memory.h explicitly.
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  12. 12 Aug, 2016 4 commits
  13. 10 Aug, 2016 2 commits
    • ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations · 61444cde
      Committed by Ard Biesheuvel
      The late_alloc() PTE allocation function used by create_mapping_late()
      does not call pgtable_page_ctor() on PTE pages it allocates, leaving
      the per-page spinlock uninitialized.
      
      Since generic page table manipulation code may assume that translation
      table pages that are not owned by init_mm are covered by fully
      constructed struct pages, the following crash may occur with the new
      UEFI memory attributes table code.
      
        efi: memattr: Processing EFI Memory Attributes table:
        efi: memattr:  0x0000ffa16000-0x0000ffa82fff [Runtime Code       |RUN|  |  |XP|  |  |  |   |  |  |  |  ]
        Unable to handle kernel NULL pointer dereference at virtual address 00000010
        pgd = c0204000
        [00000010] *pgd=00000000
        Internal error: Oops: 5 [#1] SMP ARM
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc4-00063-g3882aa7b340b #361
        Hardware name: Generic DT based system
        task: ed858000 ti: ed842000 task.ti: ed842000
        PC is at __lock_acquire+0xa0/0x19a8
        ...
        [<c038c830>] (__lock_acquire) from [<c038e4f8>] (lock_acquire+0x6c/0x88)
        [<c038e4f8>] (lock_acquire) from [<c0c06134>] (_raw_spin_lock+0x2c/0x3c)
        [<c0c06134>] (_raw_spin_lock) from [<c0410384>] (apply_to_page_range+0xe8/0x238)
        [<c0410384>] (apply_to_page_range) from [<c1205f34>] (efi_set_mapping_permissions+0x54/0x5c)
        [<c1205f34>] (efi_set_mapping_permissions) from [<c1247474>] (efi_memattr_apply_permissions+0x2b8/0x378)
        [<c1247474>] (efi_memattr_apply_permissions) from [<c1248258>] (arm_enable_runtime_services+0x1f0/0x22c)
        [<c1248258>] (arm_enable_runtime_services) from [<c0301f0c>] (do_one_initcall+0x44/0x174)
        [<c0301f0c>] (do_one_initcall) from [<c1200d10>] (kernel_init_freeable+0x90/0x1e8)
        [<c1200d10>] (kernel_init_freeable) from [<c0bff690>] (kernel_init+0x8/0x114)
        [<c0bff690>] (kernel_init) from [<c0307ed0>] (ret_from_fork+0x14/0x24)
      
      The crash is due to the fact that the UEFI page tables are not owned by
      init_mm, but are not covered by fully constructed struct pages.
      
      Given that the UEFI subsystem is currently the only user of
      create_mapping_late(), add an unconditional call to pgtable_page_ctor() to
      late_alloc().
      
      Fixes: 9fc68b71 ("ARM/efi: Apply strict permissions for UEFI Runtime Services regions")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8590/1: sanity_check_meminfo(): avoid overflow on vmalloc_limit · b9a01989
      Committed by Nicolas Pitre
      To limit the amount of mapped low memory, we determine a physical address
      boundary based on the start of the vmalloc area using __pa().
      Strictly speaking, the vmalloc area location is arbitrary and does not
      necessarily correspond to a valid physical address. For example, if
      
      	PAGE_OFFSET = 0x80000000
      	PHYS_OFFSET = 0x90000000
      	vmalloc_min = 0xf0000000
      
      then __pa(vmalloc_min) overflows and returns a wrapped 0 when phys_addr_t
      is a 32-bit type. Then the code that follows determines that the entire
      physical memory is above that boundary and no low memory gets mapped at
      all:
      
      |[...]
      |Machine model: Freescale i.MX51 NA04 Board
      |Ignoring RAM at 0x90000000-0xb0000000 (!CONFIG_HIGHMEM)
      |Consider using a HIGHMEM enabled kernel.
      
      To avoid this problem let's make vmalloc_limit a 64-bit value all the
      time and determine that boundary explicitly without using __pa().
      Reported-by: Emil Renner Berthing <kernel@esmil.dk>
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Tested-by: Emil Renner Berthing <kernel@esmil.dk>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  14. 04 Aug, 2016 1 commit
    • dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Committed by Krzysztof Kozlowski
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler, both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing a pointer to them to dma_set_attr(), just set the bits.
      
      2. It brings safety and const-correctness checking because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 27 Jul, 2016 2 commits
  16. 15 Jul, 2016 1 commit
  17. 14 Jul, 2016 5 commits
    • ARM: 8561/4: dma-mapping: Fix the coherent case when iommu is used · 56506822
      Committed by Gregory CLEMENT
      When doing DMA allocation with an IOMMU, __iommu_alloc_atomic() was
      used even when the system was coherent. However, this function
      allocates from a non-cacheable pool, which is fine when the device is
      not cache coherent but won't work as expected if the device is cache
      coherent. Indeed, the CPU and device must access the memory using the
      same cacheability attributes.
      
      Moreover when the devices are coherent, the mmap call must not change
      the pg_prot flags in the vma struct. The arm_coherent_iommu_mmap_attrs
      has been updated in the same way that it was done for the arm_dma_mmap
      in commit 55af8a91 ("ARM: 8387/1: arm/mm/dma-mapping.c: Add
      arm_coherent_dma_mmap").
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8561/3: dma-mapping: Don't use outer_flush_range when the L2C is coherent · f1270896
      Committed by Gregory CLEMENT
      When a L2 cache controller is used in a system that provides hardware
      coherency, the entire outer cache operations are useless, and can be
      skipped.  Moreover, on some systems, it is harmful as it causes
      deadlocks between the Marvell coherency mechanism, the Marvell PCIe
      controller and the Cortex-A9.
      
      In the current kernel implementation, the outer cache flush range
      operation is triggered by the dma_alloc function.
      This operation can take place at runtime and in some
      circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
      SoCs.
      
      This patch extends the __dma_clear_buffer() function to receive a
      boolean argument related to the coherency of the system. The same
      thing is done for the calling functions.
      Reported-by: Nadav Haklai <nadavh@marvell.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: <stable@vger.kernel.org> # v3.16+
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8560/1: errata: Workaround errata A12 825619 / A17 852421 · 9f6f9354
      Committed by Doug Anderson
      The workaround for both errata is to set bit 24 in the diagnostic
      register.  There are no known end-user bugs solved by fixing these
      errata, but the fix is trivial and it seems sane to apply it.
      
      The arguments for why this needs to be in the kernel are similar to the
      arguments made in the patch "Workaround errata A12 818325/852422 A17
      852423".
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8559/1: errata: Workaround erratum A12 821420 · 416bcf21
      Committed by Doug Anderson
      This erratum has a very simple workaround (set a bit in a register), so
      let's apply it.  Apparently the workaround's downside is a very slight
      power impact.
      
      Note that applying this erratum workaround fixes deadlocks that are
      easy to reproduce with real world applications.
      
      The arguments for why this needs to be in the kernel are similar to the
      arguments made in the patch "Workaround errata A12 818325/852422 A17
      852423".
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8558/1: errata: Workaround errata A12 818325/852422 A17 852423 · 62c0f4a5
      Committed by Doug Anderson
      There are several similar errata on Cortex A12 and A17 that all have
      the same workaround: setting bit[12] of the Feature Register.
      Technically the list of errata is:
      
      - A12 818325: Execution of an UNPREDICTABLE STR or STM instruction
        might deadlock.  Fixed in r0p1.
      - A12 852422: Execution of a sequence of instructions might lead to
        either a data corruption or a CPU deadlock.  Not fixed in any A12s
        yet.
      - A17 852423: Execution of a sequence of instructions might lead to
        either a data corruption or a CPU deadlock.  Not fixed in any A17s
        yet.
      
      Since A12 got renamed to A17 it seems likely that there won't be any
      future Cortex-A12 cores, so we'll enable for all Cortex-A12.
      
      For Cortex-A17 I believe that all known revisions are affected and
      that all known revisions means <= r1p2.  Presumably if a new A17 was
      released it would have this problem fixed.
      
      Note that in <https://patchwork.kernel.org/patch/4735341/> folks
      previously expressed opposition to this change because:
      A) It was thought to only apply to r0p0 and there were no known r0p0
         boards supported in mainline.
      B) It was argued that such a workaround belonged in firmware.
      
      Now that this same fix solves other errata on real boards (like
      rk3288) point A) is addressed.
      
      Point B) is impossible to address on boards like rk3288.  On rk3288
      the firmware doesn't stay resident in RAM and isn't involved at all in
      the suspend/resume process nor in the SMP bringup process.  That means
      that the most the firmware could do would be to set the bit on "core
      0" and this bit would be lost at suspend/resume time.  It is true that
      we could write a "generic" solution that saved the boot-time "core 0"
      value of this register and applied it at SMP bringup / resume time.
      However, since this register (described as the "Feature Register" in
      errata) appears to be undocumented (as far as I can tell) and is only
      modified for these errata, that "generic" solution seems questionably
      cleaner.  The generic solution also won't fix existing users that
      haven't happened to do a FW update.
      
      Note that in ARM64 presumably PSCI will be universal and fixes like
      this will end up in ATF.  Hopefully we are nearing the end of this
      style of errata workaround.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Huang Tao <huangtao@rock-chips.com>
      Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  18. 02 Jul, 2016 1 commit
  19. 21 May, 2016 1 commit
    • lib/GCD.c: use binary GCD algorithm instead of Euclidean · fff7fb0b
      Committed by Zhaoxiu Zeng
      The binary GCD algorithm is based on the following facts:
      	1. If a and b are both even, then gcd(a,b) = 2 * gcd(a/2, b/2)
      	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
      	3. If a and b are both odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
      
      Even on x86 machines with reasonable division hardware, the binary
      algorithm runs about 25% faster (80% of the execution time) than the
      division-based Euclidean algorithm.
      
      On platforms like Alpha and ARMv6 where division is a function call to
      emulation code, it's even more significant.
      
      There are two variants of the code here, depending on whether a fast
      __ffs (find least significant set bit) instruction is available.  This
      allows the unpredictable branches in the bit-at-a-time shifting loop to
      be eliminated.
      
      If fast __ffs is not available, the "even/odd" GCD variant is used.
      
      I use the following code to benchmark:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <stdint.h>
      	#include <string.h>
      	#include <time.h>
      	#include <unistd.h>
      
      	#define swap(a, b) \
      		do { \
      			a ^= b; \
      			b ^= a; \
      			a ^= b; \
      		} while (0)
      
      	unsigned long gcd0(unsigned long a, unsigned long b)
      	{
      		unsigned long r;
      
      		if (a < b) {
      			swap(a, b);
      		}
      
      		if (b == 0)
      			return a;
      
      		while ((r = a % b) != 0) {
      			a = b;
      			b = r;
      		}
      
      		return b;
      	}
      
      	unsigned long gcd1(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		b >>= __builtin_ctzl(b);
      
      		for (;;) {
      			a >>= __builtin_ctzl(a);
      			if (a == b)
      				return a << __builtin_ctzl(r);
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      		}
      	}
      
      	unsigned long gcd2(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		r &= -r;
      
      		while (!(b & r))
      			b >>= 1;
      
      		for (;;) {
      			while (!(a & r))
      				a >>= 1;
      			if (a == b)
      				return a;
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      			a >>= 1;
      			if (a & r)
      				a += b;
      			a >>= 1;
      		}
      	}
      
      	unsigned long gcd3(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		b >>= __builtin_ctzl(b);
      		if (b == 1)
      			return r & -r;
      
      		for (;;) {
      			a >>= __builtin_ctzl(a);
      			if (a == 1)
      				return r & -r;
      			if (a == b)
      				return a << __builtin_ctzl(r);
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      		}
      	}
      
      	unsigned long gcd4(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		r &= -r;
      
      		while (!(b & r))
      			b >>= 1;
      		if (b == r)
      			return r;
      
      		for (;;) {
      			while (!(a & r))
      				a >>= 1;
      			if (a == r)
      				return r;
      			if (a == b)
      				return a;
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      			a >>= 1;
      			if (a & r)
      				a += b;
      			a >>= 1;
      		}
      	}
      
      	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
      		gcd0, gcd1, gcd2, gcd3, gcd4,
      	};
      
      	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))
      
      	#if defined(__x86_64__)
      
      	#define rdtscll(val) do { \
      		unsigned long __a,__d; \
      		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
      		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
      	} while(0)
      
      	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
      								unsigned long a, unsigned long b, unsigned long *res)
      	{
      		unsigned long long start, end;
      		unsigned long long ret;
      		unsigned long gcd_res;
      
      		rdtscll(start);
      		gcd_res = gcd(a, b);
      		rdtscll(end);
      
      		if (end >= start)
      			ret = end - start;
      		else
      			ret = ~0ULL - start + 1 + end;
      
      		*res = gcd_res;
      		return ret;
      	}
      
      	#else
      
      	static inline struct timespec read_time(void)
      	{
      		struct timespec time;
      		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
      		return time;
      	}
      
      	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
      	{
      		struct timespec temp;
      
      		if ((end.tv_nsec - start.tv_nsec) < 0) {
      			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
      			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
      		} else {
      			temp.tv_sec = end.tv_sec - start.tv_sec;
      			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
      		}
      
      		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
      	}
      
      	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
      								unsigned long a, unsigned long b, unsigned long *res)
      	{
      		struct timespec start, end;
      		unsigned long gcd_res;
      
      		start = read_time();
      		gcd_res = gcd(a, b);
      		end = read_time();
      
      		*res = gcd_res;
      		return diff_time(start, end);
      	}
      
      	#endif
      
      	static inline unsigned long get_rand(void)
      	{
      		if (sizeof(long) == 8)
      			return (unsigned long)rand() << 32 | rand();
      		else
      			return rand();
      	}
      
      	int main(int argc, char **argv)
      	{
      		unsigned int seed = time(0);
      		int loops = 100;
      		int repeats = 1000;
      		unsigned long (*res)[TEST_ENTRIES];
      		unsigned long long elapsed[TEST_ENTRIES];
      		int i, j, k;
      
      		for (;;) {
      			int opt = getopt(argc, argv, "n:r:s:");
      			/* End condition always first */
      			if (opt == -1)
      				break;
      
      			switch (opt) {
      			case 'n':
      				loops = atoi(optarg);
      				break;
      			case 'r':
      				repeats = atoi(optarg);
      				break;
      			case 's':
      				seed = strtoul(optarg, NULL, 10);
      				break;
      			default:
      				/* You won't actually get here. */
      				break;
      			}
      		}
      
      		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
      		memset(elapsed, 0, sizeof(elapsed));
      
      		srand(seed);
      		for (j = 0; j < loops; j++) {
      			unsigned long a = get_rand();
      			/* Do we have args? */
      			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
      			unsigned long long min_elapsed[TEST_ENTRIES];
      			for (k = 0; k < repeats; k++) {
      				for (i = 0; i < TEST_ENTRIES; i++) {
      					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
      					if (k == 0 || min_elapsed[i] > tmp)
      						min_elapsed[i] = tmp;
      				}
      			}
      			for (i = 0; i < TEST_ENTRIES; i++)
      				elapsed[i] += min_elapsed[i];
      		}
      
      		for (i = 0; i < TEST_ENTRIES; i++)
      			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);
      
      		k = 0;
      		srand(seed);
      		for (j = 0; j < loops; j++) {
      			unsigned long a = get_rand();
      			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
      			for (i = 1; i < TEST_ENTRIES; i++) {
      				if (res[j][i] != res[j][0])
      					break;
      			}
      			if (i < TEST_ENTRIES) {
      				if (k == 0) {
      					k = 1;
      					fprintf(stderr, "Error:\n");
      				}
      				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
      				for (i = 0; i < TEST_ENTRIES; i++)
      					fprintf(stderr, "%lu%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
      			}
      		}
      
      		if (k == 0)
      			fprintf(stderr, "PASS\n");
      
      		free(res);
      
      		return 0;
      	}
      
      Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:
      
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 10174
        gcd1: elapsed 2120
        gcd2: elapsed 2902
        gcd3: elapsed 2039
        gcd4: elapsed 2812
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9309
        gcd1: elapsed 2280
        gcd2: elapsed 2822
        gcd3: elapsed 2217
        gcd4: elapsed 2710
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9589
        gcd1: elapsed 2098
        gcd2: elapsed 2815
        gcd3: elapsed 2030
        gcd4: elapsed 2718
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9914
        gcd1: elapsed 2309
        gcd2: elapsed 2779
        gcd3: elapsed 2228
        gcd4: elapsed 2709
        PASS
      
      [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
      Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Signed-off-by: George Spelvin <linux@horizon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 09 May, 2016 1 commit
  21. 06 May, 2016 3 commits