1. 07 Mar 2017, 1 commit
    • x86/asm: Optimize clear_page() · f25d3847
      By Borislav Petkov
      Currently, we CALL clear_page() which then JMPs to the proper function
      chosen by the alternatives.
      
      What we should do instead is CALL the proper function directly
      (something Ingo suggested a while ago).  So let's do that.
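
      For illustration, here is a stand-alone user-space sketch of the
      difference (an assumption-laden toy, not the kernel patch: the
      function names and the boot-time pointer are simplified stand-ins
      for the x86 alternatives machinery that patches the call site):

        #include <stdio.h>
        #include <string.h>

        static void clear_page_orig(char *page) { memset(page, 0, 4096); }
        static void clear_page_rep(char *page)  { memset(page, 0, 4096); }

        /* Before: callers CALL a common stub, which then transfers control
         * a second time to the implementation chosen at boot. */
        static void (*chosen)(char *) = clear_page_orig;
        static void clear_page_stub(char *page) { chosen(page); }

        int main(void)
        {
                static char page[4096];

                clear_page_stub(page);  /* before: CALL stub, then JMP/CALL impl  */
                clear_page_rep(page);   /* after: the call site is bound directly
                                           to the chosen implementation           */
                printf("page[0] = %d\n", page[0]);
                return 0;
        }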
      
      Measuring our favourite kernel build workload shows that there are no
      significant changes in performance.
      
      AMD
      ===
        -- /tmp/before 2017-02-09 18:01:46.451961188 +0100
        ++ /tmp/after  2017-02-09 18:01:54.883961175 +0100
        @@ -1,15 +1,15 @@
          Performance counter stats for 'system wide' (5 runs):
      
        -    1028960.373643      cpu-clock (msec)          #    6.000 CPUs utilized            ( +-  1.41% )
        +    1023086.018961      cpu-clock (msec)          #    6.000 CPUs utilized            ( +-  1.20% )
        -           518,744      context-switches          #    0.504 K/sec                    ( +-  1.04% )
        +           518,254      context-switches          #    0.507 K/sec                    ( +-  1.01% )
        -            38,112      cpu-migrations            #    0.037 K/sec                    ( +-  1.95% )
        +            37,917      cpu-migrations            #    0.037 K/sec                    ( +-  1.02% )
        -        20,874,266      page-faults               #    0.020 M/sec                    ( +-  0.07% )
        +        20,918,897      page-faults               #    0.020 M/sec                    ( +-  0.18% )
        - 2,043,646,230,667      cycles                    #    1.986 GHz                      ( +-  0.14% )  (66.67%)
        + 2,045,305,584,032      cycles                    #    1.999 GHz                      ( +-  0.16% )  (66.67%)
        -   553,698,855,431      stalled-cycles-frontend   #   27.09% frontend cycles idle     ( +-  0.07% )  (66.67%)
        +   555,099,401,413      stalled-cycles-frontend   #   27.14% frontend cycles idle     ( +-  0.13% )  (66.67%)
        -   621,544,286,390      stalled-cycles-backend    #   30.41% backend cycles idle      ( +-  0.39% )  (66.67%)
        +   621,371,430,254      stalled-cycles-backend    #   30.38% backend cycles idle      ( +-  0.32% )  (66.67%)
        - 1,738,364,431,659      instructions              #    0.85  insn per cycle
        + 1,739,895,771,901      instructions              #    0.85  insn per cycle
        -                                                  #    0.36  stalled cycles per insn  ( +-  0.11% )  (66.67%)
        +                                                  #    0.36  stalled cycles per insn  ( +-  0.13% )  (66.67%)
        -   391,170,943,850      branches                  #  380.161 M/sec                    ( +-  0.13% )  (66.67%)
        +   391,398,551,757      branches                  #  382.567 M/sec                    ( +-  0.13% )  (66.67%)
        -    22,567,810,411      branch-misses             #    5.77% of all branches          ( +-  0.11% )  (66.67%)
        +    22,574,726,683      branch-misses             #    5.77% of all branches          ( +-  0.13% )  (66.67%)
      
        -     171.480741921 seconds time elapsed                                          ( +-  1.41% )
        +     170.509229451 seconds time elapsed                                          ( +-  1.20% )
      
      Intel
      =====
      
        -- /tmp/before 2017-02-09 20:36:19.851947473 +0100
        ++ /tmp/after  2017-02-09 20:36:30.151947458 +0100
        @@ -1,15 +1,15 @@
          Performance counter stats for 'system wide' (5 runs):
      
        -    2207248.598126      cpu-clock (msec)          #    8.000 CPUs utilized            ( +-  0.69% )
        +    2213300.106631      cpu-clock (msec)          #    8.000 CPUs utilized            ( +-  0.73% )
        -           899,342      context-switches          #    0.407 K/sec                    ( +-  0.68% )
        +           898,381      context-switches          #    0.406 K/sec                    ( +-  0.79% )
        -            80,553      cpu-migrations            #    0.036 K/sec                    ( +-  1.13% )
        +            80,979      cpu-migrations            #    0.037 K/sec                    ( +-  1.11% )
        -        36,171,148      page-faults               #    0.016 M/sec                    ( +-  0.02% )
        +        36,179,791      page-faults               #    0.016 M/sec                    ( +-  0.02% )
        - 6,665,288,826,484      cycles                    #    3.020 GHz                      ( +-  0.07% )  (83.33%)
        + 6,671,638,410,799      cycles                    #    3.014 GHz                      ( +-  0.06% )  (83.33%)
        - 5,065,975,115,197      stalled-cycles-frontend   #   76.01% frontend cycles idle     ( +-  0.11% )  (83.33%)
        + 5,076,835,183,223      stalled-cycles-frontend   #   76.10% frontend cycles idle     ( +-  0.11% )  (83.33%)
        - 3,841,556,350,614      stalled-cycles-backend    #   57.64% backend cycles idle      ( +-  0.13% )  (66.67%)
        + 3,852,823,974,333      stalled-cycles-backend    #   57.75% backend cycles idle      ( +-  0.12% )  (66.67%)
        - 4,148,398,171,079      instructions              #    0.62  insn per cycle
        + 4,148,997,156,059      instructions              #    0.62  insn per cycle
        -                                                  #    1.22  stalled cycles per insn  ( +-  0.10% )  (83.33%)
        +                                                  #    1.22  stalled cycles per insn  ( +-  0.11% )  (83.33%)
        -   887,187,118,591      branches                  #  401.943 M/sec                    ( +-  0.09% )  (83.33%)
        +   887,271,341,121      branches                  #  400.882 M/sec                    ( +-  0.11% )  (83.33%)
        -    30,139,439,034      branch-misses             #    3.40% of all branches          ( +-  0.09% )  (83.33%)
        +    30,134,864,997      branch-misses             #    3.40% of all branches          ( +-  0.06% )  (83.33%)
      
        -     275.904405540 seconds time elapsed                                          ( +-  0.69% )
        +     276.660352016 seconds time elapsed                                          ( +-  0.73% )
      
      The allmodconfig vmlinux size grows by ~1 KB, but that's fine - we
      optimize how we call the clear_page() variants.
      
            text      data       bss       dec      hex  filename
         9051979  23067670  27009024  59128673  3863b61  vmlinux
         9053000  23067670  27009024  59129694  3863f5e  vmlinux.clear_page
      Reported-by: kernel test robot <fengguang.wu@intel.com>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170215111927.emdgxf2pide3kwro@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 04 Nov 2014, 1 commit
  3. 09 Aug 2014, 1 commit
    • arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area · a6c19dfe
      By Andy Lutomirski
      The core mm code will provide a default gate area based on
      FIXADDR_USER_START and FIXADDR_USER_END if
      !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).
      
      This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
      UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
      and x86_64 have gate areas, but they have their own implementations.
      
      This gets rid of the default and moves the code into ia64.
      
      This should save some code on architectures without a gate area: it's now
      possible to inline the gate_area functions in the default case.
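
      A minimal user-space sketch of that pattern (hypothetical, simplified
      names; the kernel's actual gate_area helpers take an mm_struct and
      live in the generic mm headers):

        #include <stdbool.h>
        #include <stdio.h>

        /* When an architecture does not provide its own gate area, the
         * generic header can supply trivial inline defaults that the
         * compiler folds down to a constant at every call site. */
        #ifndef HAVE_ARCH_GATE_AREA
        static inline bool in_gate_area(unsigned long addr)
        {
                (void)addr;
                return false;   /* no gate area on this architecture */
        }
        #endif

        int main(void)
        {
                printf("0xffffe000 in gate area? %s\n",
                       in_gate_area(0xffffe000UL) ? "yes" : "no");
                return 0;
        }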
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
      Acked-by: Richard Weinberger <richard@nod.at> [for um]
      Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 17 Nov 2012, 3 commits
  5. 12 Feb 2009, 1 commit
  6. 18 Jan 2009, 1 commit
    • x86-64: Convert irqstacks to per-cpu · 26f80bd6
      By Brian Gerst
      Move the irqstackptr variable from the PDA to per-cpu.  Make the
      stacks themselves per-cpu, removing some specific allocation code.
      Add a separate flag (is_boot_cpu) to simplify the per-cpu boot
      adjustments.
      
      tj: * sprinkle some underbars around.
      
          * irq_stack_ptr is not used till traps_init(), no reason to
            initialize it early.  On SMP, just leaving it NULL till proper
            initialization in setup_per_cpu_areas() works.  Dropped
            is_boot_cpu and early irq_stack_ptr initialization.
      
          * do DECLARE/DEFINE_PER_CPU(char[IRQ_STACK_SIZE], irq_stack)
            instead of (char, irq_stack[IRQ_STACK_SIZE]).
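
      That last note is easier to see with a toy model of the macro (a
      user-space sketch relying on GNU C __typeof__; the real
      DEFINE_PER_CPU additionally places the variable in a per-cpu section):

        #include <stdio.h>

        #define IRQ_STACK_SIZE 16384

        /* Passing the array as the *type* keeps the declared name a bare
         * identifier, so every macro that later reuses "irq_stack" sees
         * the same plain token. */
        #define DEFINE_PER_CPU(type, name) __typeof__(type) name

        DEFINE_PER_CPU(char[IRQ_STACK_SIZE], irq_stack);

        int main(void)
        {
                printf("sizeof(irq_stack) = %zu\n", sizeof(irq_stack));
                return 0;
        }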
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 23 Oct 2008, 2 commits
  8. 14 Sep 2008, 1 commit
  9. 23 Jul 2008, 2 commits
  10. 09 Jul 2008, 1 commit
  11. 08 Jul 2008, 3 commits
  12. 17 Apr 2008, 4 commits
  13. 26 Feb 2008, 2 commits
    • x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZE · d4afe414
      By Ingo Molnar
      The KERNEL_TEXT_SIZE constant was mis-named, as we map not only the
      kernel text but also the data, bss and init sections.
      
      That name led me down the wrong path with the KERNEL_TEXT_SIZE
      regression: I knew how much _text_ my images have and I knew about the
      40 MB "text" limit, so I wrongly thought I was on the safe side of that
      limit with my 29 MB of text, while the total image size was slightly
      above 40 MB.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix spontaneous reboot with allyesconfig bzImage · 88f3aec7
      By Ingo Molnar
      Recently, the 64-bit allyesconfig bzImage kernel started spontaneously
      rebooting during early bootup.
      
      After a few fun hours spent on early init debugging, it turned out
      that we've got this rather annoying limit on the size of the kernel
      image:
      
            #define KERNEL_TEXT_SIZE  (40*1024*1024)
      
      which limit my vmlinux just happened to pass:
      
              text     data      bss       dec      hex  filename
          29703744  4222751  8646224  42572719  2899baf  vmlinux
      
      The 40 MB limit is 41943040 bytes, so my 42572719-byte vmlinux was just
      1.5% above it :-/
      
      So it happily crashed right in head_64.S, which - as we all know - is
      the most debuggable code in the whole architecture ;-)
      
      So increase the limit to allow kernel images of up to 128 MB to be
      mapped (should anyone be that crazy or lazy).
      
      We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
      mappings already, so there's no RAM overhead and the limit was rather
      pointless and arbitrary.
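
      A quick back-of-the-envelope check of that claim (a stand-alone
      sketch assuming the usual 2 MB kernel mappings and 512 eight-byte
      entries per 4K page-table page):

        #include <stdio.h>

        int main(void)
        {
                unsigned long entries   = 4096 / 8;              /* 512 PMD entries    */
                unsigned long mapped_mb = (entries << 21) >> 20; /* x 2 MB each        */
                unsigned long needed    = (128UL << 20) >> 21;   /* for a 128 MB image */

                printf("one 4K PMD page maps %lu MB\n", mapped_mb);   /* 1024 MB */
                printf("a 128 MB image needs %lu entries\n", needed); /* 64      */
                return 0;
        }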
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 09 Feb 2008, 1 commit
    • CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      By Martin Schwidefsky
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of which is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: The page size on s390 is 4K, but the page table size is 1K or
      2K.  That means the s390 version of pte_alloc_one cannot return a
      pointer to a struct page.  The trouble is that with the CONFIG_HIGHPTE
      feature on x86, pte_alloc_one cannot return a pointer to a pte either,
      since that would require more than 32 bits for the return value of
      pte_alloc_one (and the pte * would not be accessible since it is not
      kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
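
      A minimal user-space model of that interface (an assumption-laden
      simplification; the real pmd_populate() also takes an mm_struct and
      writes a hardware pmd entry rather than storing a C pointer):

        #include <stdio.h>

        struct page { int id; };

        #ifdef ARCH_S390
        typedef unsigned long *pgtable_t;   /* s390: the pte table itself     */
        #else
        typedef struct page *pgtable_t;     /* everyone else: a struct page * */
        #endif

        struct pmd { pgtable_t table; };

        /* Generic code only ever passes the opaque handle around... */
        static void pmd_populate(struct pmd *pmd, pgtable_t pte) { pmd->table = pte; }
        /* ...and gets it back with pmd_pgtable() when tearing down the range. */
        static pgtable_t pmd_pgtable(struct pmd *pmd) { return pmd->table; }

        int main(void)
        {
                static struct page pg = { .id = 42 };
                struct pmd pmd;

                pmd_populate(&pmd, &pg);
                printf("pgtable handle id: %d\n", pmd_pgtable(&pmd)->id);
                return 0;
        }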
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 04 Feb 2008, 1 commit
  16. 30 Jan 2008, 11 commits
  17. 17 Oct 2007, 1 commit
  18. 11 Oct 2007, 1 commit
  19. 18 Jul 2007, 1 commit
    • Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      By Mel Gorman
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using __GFP_MOVABLE can either be migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
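
      A toy model of what such a flag bit buys the allocator (a stand-alone
      sketch with made-up flag values, not the kernel's GFP definitions):

        #include <stdio.h>

        #define __GFP_HIGH     0x01u
        #define __GFP_ZERO     0x02u
        #define __GFP_MOVABLE  0x04u   /* page may be migrated or reclaimed */

        #define GFP_HIGH_MOVABLE (__GFP_HIGH | __GFP_MOVABLE)

        static void toy_alloc(unsigned int flags)
        {
                /* Placement policy only: a movable hint lets the allocator
                 * group such pages so they can later be migrated together. */
                if (flags & __GFP_MOVABLE)
                        puts("placing page in a movable region");
                else
                        puts("placing page in an unmovable region");
        }

        int main(void)
        {
                toy_alloc(GFP_HIGH_MOVABLE | __GFP_ZERO);
                toy_alloc(__GFP_HIGH);
                return 0;
        }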
      
      An API function called alloc_zeroed_user_highpage_movable(), very
      similar to alloc_zeroed_user_highpage(), is added for __GFP_MOVABLE
      allocations.  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickins.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickins for catching issues with shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 11 May 2007, 1 commit