1. 09 9月, 2016 2 次提交
    • N
      kbuild: allow archs to select link dead code/data elimination · b67067f1
      Nicholas Piggin 提交于
      Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
      select to build with -ffunction-sections, -fdata-sections, and link
      with --gc-sections. It requires some work (documented) to ensure all
      unreferenced entrypoints are live, and requires toolchain and build
      verification, so it is made a per-arch option for now.
      
      On a random powerpc64le build, this yelds a significant size saving,
      it boots and runs fine, but there is a lot I haven't tested as yet, so
      these savings may be reduced if there are bugs in the link.
      
          text      data        bss        dec   filename
      11169741   1180744    1923176	14273661   vmlinux
      10445269   1004127    1919707	13369103   vmlinux.dce
      
      ~700K text, ~170K data, 6% removed from kernel image size.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichal Marek <mmarek@suse.com>
      b67067f1
    • S
      kbuild: allow architectures to use thin archives instead of ld -r · a5967db9
      Stephen Rothwell 提交于
      ld -r is an incremental link used to create built-in.o files in build
      subdirectories. It produces relocatable object files containing all
      its input files, and these are are then pulled together and relocated
      in the final link. Aside from the bloat, this constrains the final
      link relocations, which has bitten large powerpc builds with
      unresolvable relocations in the final link.
      
      Alan Modra has recommended the kernel use thin archives for linking.
      This is an alternative and means that the linker has more information
      available to it when it links the kernel.
      
      This patch enables a config option architectures can select, which
      causes all built-in.o files to be built as thin archives. built-in.o
      files in subdirectories do not get symbol table or index attached,
      which improves speed and size. The final link pass creates a
      built-in.o archive in the root output directory which includes the
      symbol table and index. The linker then uses takes this file to link.
      
      The --whole-archive linker option is required, because the linker now
      has visibility to every individual object file, and it will otherwise
      just completely avoid including those without external references
      (consider a file with EXPORT_SYMBOL or initcall or hardware exceptions
      as its only entry points). The traditional built works "by luck" as
      built-in.o files are large enough that they're going to get external
      references. However this optimisation is unpredictable for the kernel
      (due to above external references), ineffective at culling unused, and
      costly because the .o files have to be searched for references.
      Superior alternatives for link-time culling should be used instead.
      
      Build characteristics for inclink vs thinarc, on a small powerpc64le
      pseries VM with a modest .config:
      
                                        inclink       thinarc
      sizes
      vmlinux                        15 618 680    15 625 028
      sum of all built-in.o          56 091 808     1 054 334
      sum excluding root built-in.o                   151 430
      
      find -name built-in.o | xargs rm ; time make vmlinux
      real                              22.772s       21.143s
      user                              13.280s       13.430s
      sys                                4.310s        2.750s
      
      - Final kernel pulled in only about 6K more, which shows how
        ineffective the object file culling is.
      - Build performance looks improved due to less pagecache activity.
        On IO constrained systems it could be a bigger win.
      - Build size saving is significant.
      
      Side note, the toochain understands archives, so there's some tricks,
      $ ar t built-in.o          # list all files you linked with
      $ size built-in.o          # and their sizes
      $ objdump -d built-in.o    # disassembly (unrelocated) with filenames
      
      Implementation by sfr, minor tweaks by npiggin.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichal Marek <mmarek@suse.com>
      a5967db9
  2. 08 8月, 2016 11 次提交
  3. 05 8月, 2016 14 次提交
  4. 04 8月, 2016 13 次提交
    • J
      arm: jump label may reference text in __exit · ddb45306
      Jason Baron 提交于
      The jump table can reference text found in an __exit section.  Thus,
      instead of discarding it at build time, include EXIT_TEXT as part of
      __init and it will be released when the system boots.
      
      Link: http://lkml.kernel.org/r/60284113bb759121e8ae3e99af1535647e52123f.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddb45306
    • C
      tile: support static_key usage in non-module __exit sections · c14b4bcf
      Chris Metcalf 提交于
      Previously, all the __exit sections were just dropped by the link phase.
      However, if there are static_key (jump label) constructs in __exit
      sections that are not modules, the link fails with the message:
      
         `.exit.text' referenced in section `__jump_table' of xxx.o:
         defined in discarded section `.exit.text' of xxx.o
      
      Support this usage by keeping the .exit.text sections in the final image
      if JUMP_LABEL is defined, then discarding them once initialization is
      complete.
      
      Link: http://lkml.kernel.org/r/bfd7c107c610c30e992868ebfe2a5d796a097464.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c14b4bcf
    • J
      sparc: support static_key usage in non-module __exit sections · 10d7227b
      Jason Baron 提交于
      The jump table can reference text found in an __exit section.  Thus,
      instead of discarding it at build/link time, include EXIT_TEXT as part
      of __init and release it at system boot time.
      
      Without this patch the link fails with:
      
          `.exit.text' referenced in section `__jump_table' of xxx.o:
          defined in discarded section `.exit.text' of xxx.o
      
      Link: http://lkml.kernel.org/r/d822da427ab07a02a394602eca687104ff682f83.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      10d7227b
    • J
      powerpc: add explicit #include <asm/asm-compat.h> for jump label · 5411fd7f
      Jason Baron 提交于
      The stringify_in_c() macro may not be included. Make the dependency
      explicit.
      
      Link: http://lkml.kernel.org/r/564720c5328edd53c9d56db325be7215440eec3e.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Joe Perches <joe@perches.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5411fd7f
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski 提交于
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It brings safeness and checking for const correctness because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NRobin Murphy <robin.murphy@arm.com>
      Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00085f1e
    • M
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada 提交于
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
    • S
      arm64: Fix copy-on-write referencing in HugeTLB · 747a70e6
      Steve Capper 提交于
      set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
      writing the entry to the table.
      
      This can cause problems with the copy-on-write logic in hugetlb_cow:
       *) hugetlb_cow(.) called to handle a write fault on read only pte,
       *) Before the copy-on-write updates the new page table a call is
          made to pte_same(huge_ptep_get(ptep), pte)), to check for a race,
       *) Because set_pte_at(.) changed the pte, *ptep != pte, and the
          hugetlb_cow(.) code erroneously assumes that it lost the race,
       *) The new page is subsequently freed without being used.
      
      On arm64 this problem only becomes apparent when we apply:
      67961f9d mm/hugetlb: fix huge page reserve accounting for private
      mappings
      
      When one runs the libhugetlbfs test suite, there are allocation errors
      and hugetlbfs pages become erroneously locked in memory as reserved.
      (There is a high HugePages_Rsvd: count).
      
      In this patch we introduce pte_same which ignores the PTE_RDONLY bit,
      allowing for the libhugetlbfs test suite to pass as expected and
      without leaking any reserved HugeTLB pages.
      Reported-by: NHuang Shijie <shijie.huang@arm.com>
      Signed-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      747a70e6
    • B
      nvmx: mark ept single context invalidation as supported · 45e11817
      Bandan Das 提交于
      Commit 4b855078 ("KVM: nVMX: Don't advertise single
      context invalidation for invept") removed advertising
      single context invalidation since the spec does not mandate it.
      However, some hypervisors (such as ESX) require it to be present
      before willing to use ept in a nested environment. Advertise it
      and fallback to the global case.
      Signed-off-by: NBandan Das <bsd@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      45e11817
    • B
      nvmx: remove comment about missing nested vpid support · 03331b4b
      Bandan Das 提交于
      Nested vpid is already supported and both single/global
      modes are advertised to the guest
      Signed-off-by: NBandan Das <bsd@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      03331b4b
    • W
      KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off · 91005300
      Wanpeng Li 提交于
      BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
      IP: [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
      PGD 0
      Oops: 0000 [#1] SMP
      Call Trace:
       kvm_arch_vcpu_load+0x86/0x260 [kvm]
       vcpu_load+0x46/0x60 [kvm]
       kvm_vcpu_ioctl+0x79/0x7c0 [kvm]
       ? __lock_is_held+0x54/0x70
       do_vfs_ioctl+0x96/0x6a0
       ? __fget_light+0x2a/0x90
       SyS_ioctl+0x79/0x90
       do_syscall_64+0x7c/0x1e0
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP  [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
       RSP <ffff8800db1f3d70>
      CR2: 000000000000008c
      ---[ end trace a55fb79d2b3b4ee8 ]---
      
      This can be reproduced steadily by kernel_irqchip=off.
      
      We should not access preemption timer stuff if lapic is emulated in userspace.
      This patch fix it by avoiding access preemption timer stuff when kernel_irqchip=off.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      91005300
    • P
      x86: vdso: use __pvclock_read_cycles · abe9efa7
      Paolo Bonzini 提交于
      The new simplified __pvclock_read_cycles does the same computation
      as vread_pvclock, except that (because it takes the pvclock_vcpu_time_info
      pointer) it has to be moved inside the loop.  Since the loop is expected to
      never roll, this makes no difference.
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      abe9efa7
    • P
      pvclock: introduce seqcount-like API · 3aed64f6
      Paolo Bonzini 提交于
      The version field in struct pvclock_vcpu_time_info basically implements
      a seqcount.  Wrap it with the usual read_begin and read_retry functions,
      and use these APIs instead of peppering the code with smp_rmb()s.
      While at it, change it to the more pedantically correct virt_rmb().
      
      With this change, __pvclock_read_cycles can be simplified noticeably.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3aed64f6
    • M
      powerpc/mm: Move register_process_table() out of ppc_md · eea8148c
      Michael Ellerman 提交于
      We want to initialise register_process_table() before ppc_md is setup,
      so that it can be called as part of MMU init (at least on Radix ATM).
      
      That no longer works because probe_machine() requires that ppc_md be
      empty before it's called, and we now do probe_machine() much later.
      
      So make register_process_table a global for now. It will probably move
      into a mmu_radix_ops struct at some point in the future.
      
      This was broken by me when applying commit 7025776e "powerpc/mm:
      Move hash table ops to a separate structure" due to conflicts with other
      patches.
      
      Fixes: 7025776e ("powerpc/mm: Move hash table ops to a separate structure")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      eea8148c