1. 18 Nov 2017, 1 commit
    • pid: replace pid bitmap implementation with IDR API · 95846ecf
      Authored by Gargi Sharma
      Patch series "Replacing PID bitmap implementation with IDR API", v4.
      
      This series replaces the kernel's bitmap implementation of PID
      allocation with the IDR API.  These patches are written to simplify
      the kernel by replacing custom code with calls to generic code.
      
      The following are the stats for the pid and pid_namespace object
      files before and after the replacement.  There is a noteworthy
      change between the IDR and bitmap implementations.
      
      Before
         text       data        bss        dec        hex    filename
         8447       3894         64      12405       3075    kernel/pid.o
      After
         text       data        bss        dec        hex    filename
         3397        304          0       3701        e75    kernel/pid.o
      
      Before
         text       data        bss        dec        hex    filename
         5692       1842        192       7726       1e2e    kernel/pid_namespace.o
      After
         text       data        bss        dec        hex    filename
         2854        216         16       3086        c0e    kernel/pid_namespace.o
      
      The following are the stats for ps, pstree and calling readdir on /proc
      for 10,000 processes.
      
      ps:
              With IDR API    With bitmap
      real    0m1.479s        0m2.319s
      user    0m0.070s        0m0.060s
      sys     0m0.289s        0m0.516s
      
      pstree:
              With IDR API    With bitmap
      real    0m1.024s        0m1.794s
      user    0m0.348s        0m0.612s
      sys     0m0.184s        0m0.264s
      
      proc:
              With IDR API    With bitmap
      real    0m0.059s        0m0.074s
      user    0m0.000s        0m0.004s
      sys     0m0.016s        0m0.016s
      
      This patch (of 2):
      
      Replace the current bitmap implementation of Process ID allocation.
      Functions that are no longer required, such as free_pidmap() and
      alloc_pidmap(), are removed.  The remaining functions are modified
      to use the IDR API.  The change makes PID allocation less complex
      by replacing custom code with calls to a generic API.
      
      [gs051095@gmail.com: v6]
        Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
      [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
        Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
      Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
      Signed-off-by: Gargi Sharma <gs051095@gmail.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Julia Lawall <julia.lawall@lip6.fr>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Nov 2017, 1 commit
  3. 27 Oct 2017, 1 commit
  4. 27 Sep 2017, 1 commit
    • ACPI/init: Invoke early ACPI initialization earlier · 9c71206d
      Authored by Dou Liyang
      acpi_early_init() unmaps the temporary ACPI Table mappings which are used
      in the early startup code and prepares for permanent table mappings.
      
      Before the consolidation of the x86 APIC setup code the invocation of
      acpi_early_init() happened before the interrupt remapping unit was
      initialized. With the rework the remapping unit initialization moved in
      front of acpi_early_init() which causes an ACPI warning when the ACPI root
      tables get reallocated afterwards.
      
      Invoke acpi_early_init() before late_time_init(), which is before
      the DMAR tables are accessed.
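
      In ordering terms, the result is roughly the following excerpt of
      start_kernel() (heavily simplified; intervening calls omitted):

          acpi_early_init();          /* permanent ACPI table mappings ready */
          /* ... */
          if (late_time_init)
                  late_time_init();   /* first path that may touch DMAR tables */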
      
      Fixes: 935356ce ("x86/apic: Initialize interrupt mode after timer init")
      Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-ia64@vger.kernel.org
      Cc: bhe@redhat.com
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: bp@alien8.de
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: yinghai@kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lkml.kernel.org/r/1505294274-441-1-git-send-email-douly.fnst@cn.fujitsu.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  5. 09 Sep 2017, 2 commits
    • init/main.c: extract early boot entropy from the passed cmdline · 33d72f38
      Authored by Daniel Micay
      Feed the boot command line into the /dev/random entropy pool.
      
      Existing Android bootloaders usually pass data on the kernel
      command line which may not be known to an external attacker.  The
      same may be true of other embedded systems.  A sample command line
      from a Google Pixel running CopperheadOS:
      
          console=ttyHSL0,115200,n8 androidboot.console=ttyHSL0
          androidboot.hardware=sailfish user_debug=31 ehci-hcd.park=3
          lpm_levels.sleep_disabled=1 cma=32M@0-0xffffffff buildvariant=user
          veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab
          androidboot.bootdevice=624000.ufshc androidboot.verifiedbootstate=yellow
          androidboot.veritymode=enforcing androidboot.keymaster=1
          androidboot.serialno=FA6CE0305299 androidboot.baseband=msm
          mdss_mdp.panel=1:dsi:0:qcom,mdss_dsi_samsung_ea8064tg_1080p_cmd:1:none:cfg:single_dsi
          androidboot.slot_suffix=_b fpsimd.fpsimd_settings=0
          app_setting.use_app_setting=0 kernelflag=0x00000000 debugflag=0x00000000
          androidboot.hardware.revision=PVT radioflag=0x00000000
          radioflagex1=0x00000000 radioflagex2=0x00000000 cpumask=0x00000000
          androidboot.hardware.ddr=4096MB,Hynix,LPDDR4 androidboot.ddrinfo=00000006
          androidboot.ddrsize=4GB androidboot.hardware.color=GRA00
          androidboot.hardware.ufs=32GB,Samsung androidboot.msm.hw_ver_id=268824801
          androidboot.qf.st=2 androidboot.cid=11111111 androidboot.mid=G-2PW4100
          androidboot.bootloader=8996-012001-1704121145
          androidboot.oem_unlock_support=1 androidboot.fp_src=1
          androidboot.htc.hrdump=detected androidboot.ramdump.opt=mem@2g:2g,mem@4g:2g
          androidboot.bootreason=reboot androidboot.ramdump_enable=0 ro
          root=/dev/dm-0 dm="system none ro,0 1 android-verity /dev/sda34"
          rootwait skip_initramfs init=/init androidboot.wificountrycode=US
          androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136
      
      Among other things, it contains a value unique to the device
      (androidboot.serialno=FA6CE0305299), unique to the OS builds for the
      device variant (veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab)
      and timings from the bootloader stages in milliseconds
      (androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136).
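
      The mechanism itself is small; roughly (simplified from the patch,
      in init/main.c), the command line is mixed into the pool right
      after the architecture code has parsed it:

          setup_arch(&command_line);
          add_device_randomness(command_line, strlen(command_line));
          add_device_randomness(boot_command_line, strlen(boot_command_line));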
      
      [tytso@mit.edu: changelog tweak]
      [labbott@redhat.com: line-wrapped command line]
      Link: http://lkml.kernel.org/r/20170816231458.2299-3-labbott@redhat.com
      Signed-off-by: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Nick Kralevich <nnk@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • init: move stack canary initialization after setup_arch · 121388a3
      Authored by Laura Abbott
      Patch series "Command line randomness", v3.
      
      A series to add the kernel command line as a source of randomness.
      
      This patch (of 2):
      
      Stack canary initialization involves getting a random number.
      Getting this random number may involve accessing caches or other
      architecture-specific features which are not available until after
      the architecture is set up.  Move the stack canary initialization
      later to accommodate this.
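
      In start_kernel() terms the move looks roughly like this
      (simplified; most intervening calls omitted):

          setup_arch(&command_line);
          /* ... */
          boot_init_stack_canary();   /* previously ran before setup_arch() */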
      
      Link: http://lkml.kernel.org/r/20170816231458.2299-2-labbott@redhat.com
      Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Nick Kralevich <nnk@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 07 Sep 2017, 1 commit
  7. 14 Aug 2017, 1 commit
    • debugobjects: Make kmemleak ignore debug objects · caba4cbb
      Authored by Waiman Long
      The allocated debug objects are either on the free list or in the
      hashed bucket lists, so they won't get lost.  However, if both debug
      objects and kmemleak are enabled, and kmemleak scanning is done
      while some of the debug objects are transitioning from one list to
      the other, false-positive reports of memory leaks may happen for
      those objects.  For example:
      
      [38687.275678] kmemleak: 12 new suspected memory leaks (see
      /sys/kernel/debug/kmemleak)
      unreferenced object 0xffff92e98aabeb68 (size 40):
        comm "ksmtuned", pid 4344, jiffies 4298403600 (age 906.430s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 d0 bc db 92 e9 92 ff ff  ................
          01 00 00 00 00 00 00 00 38 36 8a 61 e9 92 ff ff  ........86.a....
        backtrace:
          [<ffffffff8fa5378a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffff8f47c019>] kmem_cache_alloc+0xe9/0x320
          [<ffffffff8f62ed96>] __debug_object_init+0x3e6/0x400
          [<ffffffff8f62ef01>] debug_object_activate+0x131/0x210
          [<ffffffff8f330d9f>] __call_rcu+0x3f/0x400
          [<ffffffff8f33117d>] call_rcu_sched+0x1d/0x20
          [<ffffffff8f4a183c>] put_object+0x2c/0x40
          [<ffffffff8f4a188c>] __delete_object+0x3c/0x50
          [<ffffffff8f4a18bd>] delete_object_full+0x1d/0x20
          [<ffffffff8fa535c2>] kmemleak_free+0x32/0x80
          [<ffffffff8f47af07>] kmem_cache_free+0x77/0x350
          [<ffffffff8f453912>] unlink_anon_vmas+0x82/0x1e0
          [<ffffffff8f440341>] free_pgtables+0xa1/0x110
          [<ffffffff8f44af91>] exit_mmap+0xc1/0x170
          [<ffffffff8f29db60>] mmput+0x80/0x150
          [<ffffffff8f2a7609>] do_exit+0x2a9/0xd20
      
      The references in the debug objects may also hide a real memory leak.
      
      As there is no point in having kmemleak track debug object
      allocations, kmemleak checking is now disabled for debug objects.
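
      The fix boils down to one call at allocation time; a simplified
      sketch (alloc_debug_obj() is an illustrative wrapper, abbreviated
      from the pool-filling path in lib/debugobjects.c):

          static struct debug_obj *alloc_debug_obj(gfp_t gfp)
          {
                  struct debug_obj *new = kmem_cache_zalloc(obj_cache, gfp);

                  if (new)
                          kmemleak_ignore(new);   /* exclude from leak scans */
                  return new;
          }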
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1502718733-8527-1-git-send-email-longman@redhat.com
  8. 10 Aug 2017, 1 commit
  9. 27 Jul 2017, 1 commit
    • percpu: replace area map allocator with bitmap · 40064aec
      Authored by Dennis Zhou (Facebook)
      The percpu memory allocator is experiencing scalability issues when
      allocating and freeing large numbers of counters as in BPF.
      Additionally, there is a corner case where iteration is triggered over
      all chunks if the contig_hint is the right size, but wrong alignment.
      
      This patch replaces the area map allocator with a basic bitmap allocator
      implementation. Each subsequent patch will introduce new features and
      replace full scanning functions with faster non-scanning options when
      possible.
      
      Implementation:
      This patchset removes the area map allocator in favor of a bitmap
      allocator backed by metadata blocks. The primary goal is to provide
      consistency in performance and memory footprint with a focus on small
      allocations (< 64 bytes). The bitmap removes the heavy memmove from the
      freeing critical path and provides a consistent memory footprint. The
      metadata blocks provide a bound on the amount of scanning required by
      maintaining a set of hints.
      
      In an effort to make freeing fast, the metadata is updated on the free
      path if the new free area makes a page free, a block free, or spans
      across blocks. This causes the chunk's contig hint to potentially be
      smaller than what it could allocate by up to the smaller of a page or a
      block. If the chunk's contig hint is contained within a block, a check
      occurs and the hint is kept accurate. Metadata is always kept accurate
      on allocation, so there will not be a situation where a chunk has a
      later contig hint than available.
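
      As a toy illustration of the underlying primitive (not the pcpu
      code itself, which layers metadata blocks and hints on top),
      bitmap-based allocation is a find-and-set over a region:

          #include <linux/bitmap.h>
          #include <linux/errno.h>

          /* Find a free run of nr_bits at the given alignment and mark it
           * allocated.  Returns the bit offset, or -ENOSPC if none fits. */
          static int toy_bitmap_alloc(unsigned long *map, unsigned long size,
                                      unsigned int nr_bits, unsigned long align)
          {
                  unsigned long off;

                  off = bitmap_find_next_zero_area(map, size, 0, nr_bits,
                                                   align - 1);
                  if (off >= size)
                          return -ENOSPC;
                  bitmap_set(map, off, nr_bits);
                  return off;
          }

          static void toy_bitmap_free(unsigned long *map, unsigned long off,
                                      unsigned int nr_bits)
          {
                  bitmap_clear(map, off, nr_bits);
          }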
      
      Evaluation:
      I have primarily done testing against a simple workload of
      allocating 1 million objects (2^20) of varying size.  Deallocation
      was done in order, alternating, and in reverse.  These numbers were
      collected after rebasing on top of a80099a1.  I present the
      worst-case numbers here:
      
        Area Map Allocator:
      
              Object Size | Alloc Time (ms) | Free Time (ms)
              ----------------------------------------------
                    4B    |        310      |     4770
                   16B    |        557      |     1325
                   64B    |        436      |      273
                  256B    |        776      |      131
                 1024B    |       3280      |      122
      
        Bitmap Allocator:
      
              Object Size | Alloc Time (ms) | Free Time (ms)
              ----------------------------------------------
                    4B    |        490      |       70
                   16B    |        515      |       75
                   64B    |        610      |       80
                  256B    |        950      |      100
                 1024B    |       3520      |      200
      
      This data demonstrates the inability of the area map allocator to
      handle less-than-ideal situations.  In the best case of reverse
      deallocation, the area map allocator was able to perform within
      range of the bitmap allocator.  In the worst case, freeing took
      nearly 5 seconds for 1 million 4-byte objects.  The bitmap
      allocator dramatically improves the consistency of the free path.
      The small allocations performed nearly identically regardless of
      the freeing pattern.
      
      While it does add to the allocation latency, the allocation
      scenario here is optimal for the area map allocator.  The area map
      allocator runs into trouble when it is allocating in chunks where
      the latter half is full.  This is difficult to replicate, so I
      present a variant where the second half of the pages is filled.
      Freeing was done sequentially.  Below are the numbers for this
      scenario:
      
        Area Map Allocator:
      
              Object Size | Alloc Time (ms) | Free Time (ms)
              ----------------------------------------------
                    4B    |       4118      |     4892
                   16B    |       1651      |     1163
                   64B    |        598      |      285
                  256B    |        771      |      158
                 1024B    |       3034      |      160
      
        Bitmap Allocator:
      
              Object Size | Alloc Time (ms) | Free Time (ms)
              ----------------------------------------------
                    4B    |        481      |       67
                   16B    |        506      |       69
                   64B    |        636      |       75
                  256B    |        892      |       90
                 1024B    |       3262      |      147
      
      The data shows a parabolic performance curve for the area map
      allocator.  At lower object sizes, the memmove operation is the
      dominant cost, as more objects are packed into a chunk; at higher
      object sizes, the traversal of the chunk slots is the dominating
      cost.  The bitmap allocator suffers from this as well.  Overall,
      the data shows that the area map allocator fails to scale on the
      allocation path, while the bitmap allocator demonstrates consistent
      performance in general.
      
      The second problem of additional scanning can result in the area map
      allocator completing in 52 minutes when trying to allocate 1 million
      4-byte objects with 8-byte alignment. The same workload takes
      approximately 16 seconds to complete for the bitmap allocator.
      
      V2:
      Fixed a bug in pcpu_alloc_first_chunk(): end_offset was setting the
      bitmap using bytes instead of bits.
      
      Added a comment to pcpu_cnt_pop_pages() to explain bitmap_weight.
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 18 Jul 2017, 1 commit
    • x86, swiotlb: Add memory encryption support · c7753208
      Authored by Tom Lendacky
      Since DMA addresses will effectively look like 48-bit addresses
      when the memory encryption mask is set, SWIOTLB is needed if the
      DMA mask of the device performing the DMA does not cover 48 bits.
      SWIOTLB will be initialized to create decrypted bounce buffers for
      use by these devices.
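
      The decision being described can be illustrated as follows
      (stand-in names and simplified logic, not the kernel's exact code):

          #include <linux/device.h>
          #include <linux/dma-mapping.h>

          /* Illustration only: with the encryption mask set, usable DMA
           * addresses are effectively 48 bits wide, so narrower devices
           * must bounce through SWIOTLB's decrypted buffers. */
          static bool device_needs_swiotlb(struct device *dev, bool encrypted)
          {
                  return encrypted && dev->dma_mask &&
                         *dev->dma_mask < DMA_BIT_MASK(48);
          }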
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/aa2d29b78ae7d508db8881e46a3215231b9327a7.1500319216.git.thomas.lendacky@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 13 Jul 2017, 1 commit
    • random: do not ignore early device randomness · ee7998c5
      Authored by Kees Cook
      The add_device_randomness() function would ignore incoming bytes if
      the crng wasn't ready.  This change additionally makes sure that
      add_latent_entropy() is called early enough to influence the
      initial stack canary, which is especially important on non-x86
      systems, where the canary stays the same through the life of the
      boot.
      
      Link: http://lkml.kernel.org/r/20170626233038.GA48751@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Lokesh Vutla <lokeshvutla@ti.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 23 May 2017, 2 commits
  13. 24 Apr 2017, 2 commits
  14. 19 Apr 2017, 1 commit
  15. 04 Apr 2017, 1 commit
    • ftrace: Have init/main.c call ftrace directly to free init memory · b80f0f6c
      Authored by Steven Rostedt (VMware)
      Relying on free_reserved_area() to call ftrace to free init memory
      proved not to be sufficient.  The issue is that on x86, when
      debug_pagealloc is enabled, the init memory is not freed, but simply
      set as not present.  Since ftrace was uninformed of this, starting
      function tracing still tries to update pages that are not present
      according to the page tables, causing ftrace to trigger a bug and
      kill the kernel itself.
      
      Instead of relying on free_reserved_area(), have init/main.c call
      ftrace directly just before it frees the init memory.  Ftrace then
      uses __init_begin and __init_end to know where the init memory is
      located.  Looking at all arches (and testing what I can), it appears
      that this should work for each of them.
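
      The resulting call sequence in the init path is roughly this
      (simplified):

          ftrace_free_init_mem();   /* drop records in __init_begin..__init_end */
          free_initmem();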
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  16. 01 Apr 2017, 1 commit
    • mm: move mm_percpu_wq initialization earlier · 597b7305
      Authored by Michal Hocko
      Yang Li has reported that drain_all_pages() triggers a WARN_ON,
      which means that this function is called before mm_percpu_wq is
      initialized, on arm64 with CMA configured:
      
        WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c
        Modules linked in:
        CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-next-20170310-00027-g64dfbc5 #127
        Hardware name: Freescale Layerscape 2088A RDB Board (DT)
        task: ffffffc07c4a6d00 task.stack: ffffffc07c4a8000
        PC is at drain_all_pages+0x244/0x25c
        LR is at start_isolate_page_range+0x14c/0x1f0
        [...]
         drain_all_pages+0x244/0x25c
         start_isolate_page_range+0x14c/0x1f0
         alloc_contig_range+0xec/0x354
         cma_alloc+0x100/0x1fc
         dma_alloc_from_contiguous+0x3c/0x44
         atomic_pool_init+0x7c/0x208
         arm64_dma_init+0x44/0x4c
         do_one_initcall+0x38/0x128
         kernel_init_freeable+0x1a0/0x240
         kernel_init+0x10/0xfc
         ret_from_fork+0x10/0x20
      
      Fix this by moving the whole of setup_vmstat(), which is an
      initcall right now, into init_mm_internals(), which is called right
      after the workqueue subsystem is initialized.
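
      Schematically (simplified, not the exact hunks), the former
      initcall body now runs from an explicit call made once workqueues
      are up:

          /* mm/vmstat.c, simplified */
          void __init init_mm_internals(void)
          {
                  mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
                                                 WQ_MEM_RECLAIM, 0);
                  /* ... rest of what used to be setup_vmstat() ... */
          }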
      
      Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Yang Li <pku.leo@gmail.com>
      Tested-by: Yang Li <pku.leo@gmail.com>
      Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 25 Mar 2017, 2 commits
  18. 02 Mar 2017, 5 commits
  19. 28 Feb 2017, 2 commits
  20. 14 Feb 2017, 1 commit
    • Reimplement IDR and IDA using the radix tree · 0a835c4f
      Authored by Matthew Wilcox
      The IDR is very similar to the radix tree.  It has some functionality that
      the radix tree did not have (alloc next free, cyclic allocation, a
      callback-based for_each, destroy tree), which is readily implementable on
      top of the radix tree.  A few small changes were needed in order to use a
      tag to represent nodes with free space below them.  More extensive
      changes were needed to support storing NULL as a valid entry in an IDR.
      Plain radix trees still interpret NULL as a not-present entry.
      
      The IDA is reimplemented as a client of the newly enhanced radix tree.  As
      in the current implementation, it uses a bitmap at the last level of the
      tree.
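
      Client code is unaffected by the reimplementation; for instance,
      typical IDA usage stays exactly as before (a minimal example, not
      taken from the patch):

          #include <linux/idr.h>

          static DEFINE_IDA(example_ida);

          static int example(void)
          {
                  /* allocate the lowest free ID >= 0 */
                  int id = ida_simple_get(&example_ida, 0, 0, GFP_KERNEL);

                  if (id < 0)
                          return id;
                  /* ... use id ... */
                  ida_simple_remove(&example_ida, id);
                  return 0;
          }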
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  21. 10 Feb 2017, 1 commit
    • core: migrate exception table users off module.h and onto extable.h · 8a293be0
      Authored by Paul Gortmaker
      These files were including module.h for exception-table-related
      functions.  We've now separated that content out into its own file,
      extable.h, so move over to that and, where possible, avoid all the
      extra header content in module.h that we don't really need to
      compile these non-modular files.
      
      Note:
         init/main.c still needs module.h for __init_or_module
         kernel/extable.c still needs module.h for is_module_text_address
      
      ...and so we don't get the benefit of removing module.h from the cpp
      feed for these two files, unlike the almost universal 1:1 exchange
      of module.h for extable.h we were able to do in the arch dirs.
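
      For a typical file in this series the change is just an include
      swap (schematic, not an exact hunk):

          /* was:  #include <linux/module.h> */
          #include <linux/extable.h>   /* only the exception table bits */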
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Jessica Yu <jeyu@redhat.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  22. 08 Feb 2017, 2 commits
  23. 01 Feb 2017, 1 commit
  24. 28 Jan 2017, 1 commit
    • random: use chacha20 for get_random_int/long · f5b98461
      Authored by Jason A. Donenfeld
      Now that our crng uses chacha20, we can rely on its speedy
      characteristics to replace MD5, while simultaneously achieving a
      higher security guarantee.  Previously, the idea was to use these
      functions if you wanted random integers that aren't stupidly
      insecure but aren't necessarily secure either: a vague gray zone
      that was hopefully "good enough" for its users.  With chacha20, we
      can strengthen this claim, since either we're using an rdrand-like
      instruction, or we're using the same crng as /dev/urandom.  And
      it's faster than what came before.
      
      We could have chosen to replace this with a SipHash-derived function,
      which might be slightly faster, but at the cost of having yet another
      RNG construction in the kernel. By moving to chacha20, we have a single
      RNG to analyze and verify, and we also already get good performance
      improvements on all platforms.
      
      Implementation-wise, rather than use a generic buffer for both
      get_random_int/long and memcpy based on the size needs, we use a
      specific buffer for 32-bit reads and for 64-bit reads. This way, we're
      guaranteed to always have aligned accesses on all platforms. While
      slightly more verbose in C, the assembly this generates is a lot
      simpler than otherwise.
      
      Finally, on 32-bit platforms where longs and ints are the same size,
      we simply alias get_random_int to get_random_long.
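
      The 32-bit aliasing mentioned above is, in sketch form
      (illustrative, not the exact hunk):

          #if BITS_PER_LONG == 32
          unsigned long get_random_long(void)
          {
                  return get_random_int();
          }
          #endif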
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Suggested-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  25. 14 Jan 2017, 1 commit
    • sched/clock: Delay switching sched_clock to stable · 9881b024
      Authored by Peter Zijlstra
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete.  This way we avoid switching during the window in which
      we detect the worst of the TSC offences.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  26. 26 Dec 2016, 1 commit
    • mm: add PageWaiters indicating tasks are waiting for a page bit · 62906027
      Authored by Nicholas Piggin
      Add a new page flag, PageWaiters, to indicate the page waitqueue has
      tasks waiting. This can be tested rather than testing waitqueue_active
      which requires another cacheline load.
      
      This bit is always set when the page has tasks on
      page_waitqueue(page), and is set and cleared under the waitqueue
      lock.  It may be set when there are no tasks on the waitqueue, which
      will cause a harmless extra wakeup check that will clear the bit.
      
      The generic bit-waitqueue infrastructure is no longer used for pages.
      Instead, waitqueues are used directly with a custom key type. The
      generic code was not flexible enough to have PageWaiters manipulation
      under the waitqueue lock (which simplifies concurrency).
      
      This improves the performance of page lock intensive microbenchmarks by
      2-3%.
      
      Putting two bits in the same word opens the opportunity to remove the
      memory barrier between clearing the lock bit and testing the waiters
      bit, after some work on the arch primitives (e.g., ensuring memory
      operand widths match and cover both bits).
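
      In sketch form, the unlock-side benefit looks like this (a toy
      simplification; the real code folds more of this into the atomic
      ops, and the barrier is what the last paragraph wants to remove):

          static void toy_unlock_page(struct page *page)
          {
                  clear_bit_unlock(PG_locked, &page->flags);
                  smp_mb__after_atomic();   /* order clear vs. waiter test */
                  if (PageWaiters(page))    /* same word as PG_locked */
                          wake_up_page(page, PG_locked);
          }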
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 10 Dec 2016, 1 commit
    • x86/amd: Check for the C1E bug post ACPI subsystem init · e7ff3a47
      Authored by Thomas Gleixner
      AMD CPUs affected by the E400 erratum suffer from the issue that the
      local APIC timer stops when the CPU goes into C1E. Unfortunately there
      is no way to detect the affected CPUs on early boot. It's only possible
      to determine the range of possibly affected CPUs from the family/model
      range.
      
      The actual decision whether to enter C1E and thus cause the bug is done
      by the firmware and we need to detect that case late, after ACPI has
      been initialized.
      
      The current solution is to check in the idle routine whether the CPU is
      affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
      K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
      affected and the system is switched into forced broadcast mode.
      
      This is inefficient, and on non-affected CPUs every entry to idle
      does the extra RDMSR.
      
      After doing some research it turns out that the bits are visible on the
      boot CPU right after the ACPI subsystem is initialized in the early
      boot process. So instead of polling for the bits in the idle loop, add
      a detection function after acpi_subsystem_init() and check for the MSR
      bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
      the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
      will stop in C1E state as well.
      
      The switch to broadcast mode cannot be done at this point because the
      boot CPU still uses HPET as a clockevent device and the local APIC timer
      is not yet calibrated and installed. The switch to broadcast mode on the
      affected CPUs needs to be done when the local APIC timer is actually set
      up.
      
      This allows the amd_e400_idle() function to be cleaned up in the
      next step.
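
      A sketch of the late detection described above (simplified;
      amd_e400_c1e_check() is an illustrative name, not necessarily the
      function the patch adds):

          static void amd_e400_c1e_check(void)
          {
                  u32 lo, hi;

                  rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
                  if (!(lo & K8_INTP_C1E_ACTIVE_MASK))
                          return;         /* firmware never enters C1E */

                  set_cpu_bug(&boot_cpu_data, X86_BUG_AMD_APIC_C1E);
                  if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
                          mark_tsc_unstable("TSC halts in C1E");
          }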
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  28. 29 Nov 2016, 1 commit
    • module: fix DEBUG_SET_MODULE_RONX typo · 4d217a5a
      Authored by Arnd Bergmann
      The newly added 'rodata_enabled' global variable is protected by
      the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX
      is turned on:
      
      kernel/module.o: In function `disable_ro_nx':
      module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled'
      kernel/module.o: In function `module_disable_ro':
      module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled'
      kernel/module.o: In function `module_enable_ro':
      module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled'
      
      CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead.
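
      Schematically, the guard around rodata_enabled changes like this
      (a sketch based on the changelog, not the exact hunk; the
      declaration is simplified):

          #if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
          bool rodata_enabled = true;   /* was misguarded by CONFIG_SET_MODULE_RONX */
          #endif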
      
      Fixes: 39290b38 ("module: extend 'rodata=off' boot cmdline parameter to module mappings")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jessica Yu <jeyu@redhat.com>
  29. 28 Nov 2016, 1 commit
  30. 24 Oct 2016, 1 commit