1. 09 7月, 2019 1 次提交
  2. 08 7月, 2019 2 次提交
    • M
      kbuild: compile-test exported headers to ensure they are self-contained · d6fc9fcb
      Masahiro Yamada 提交于
      Multiple people have suggested compile-testing UAPI headers to ensure
      they can be really included from user-space. "make headers_check" is
      obviously not enough to catch bugs, and we often leak unresolved
      references to user-space.
      
      Use the new header-test-y syntax to implement it. Please note exported
      headers are compile-tested with a completely different set of compiler
      flags. The header search path is set to $(objtree)/usr/include since
      exported headers should not include unexported ones.
      
      We use -std=gnu89 for the kernel space since the kernel code highly
      depends on GNU extensions. On the other hand, UAPI headers should be
      written in more standardized C, so they are compiled with -std=c90.
      This will emit errors if C++ style comments, the keyword 'inline', etc.
      are used. Please use C style comments (/* ... */), '__inline__', etc.
      in UAPI headers.
      
      There is additional compiler requirement to enable this test because
      many of UAPI headers include <stdlib.h>, <sys/ioctl.h>, <sys/time.h>,
      etc. directly or indirectly. You cannot use kernel.org pre-built
      toolchains [1] since they lack <stdlib.h>.
      
      I reused CONFIG_CC_CAN_LINK to check the system header availability.
      The intention is slightly different, but a compiler that can link
      userspace programs provide system headers.
      
      For now, a lot of headers need to be excluded because they cannot
      be compiled standalone, but this is a good start point.
      
      [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/index.htmlSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: NSam Ravnborg <sam@ravnborg.org>
      d6fc9fcb
    • M
      init/Kconfig: add CONFIG_CC_CAN_LINK · 1a927fd3
      Masahiro Yamada 提交于
      Currently, scripts/cc-can-link.sh is run just for BPFILTER_UMH, but
      defining CC_CAN_LINK will be useful in other places.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      1a927fd3
  3. 25 6月, 2019 1 次提交
    • P
      sched/uclamp: Add CPU's clamp buckets refcounting · 69842cba
      Patrick Bellasi 提交于
      Utilization clamping allows to clamp the CPU's utilization within a
      [util_min, util_max] range, depending on the set of RUNNABLE tasks on
      that CPU. Each task references two "clamp buckets" defining its minimum
      and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
      bucket is active if there is at least one RUNNABLE tasks enqueued on
      that CPU and refcounting that bucket.
      
      When a task is {en,de}queued {on,from} a rq, the set of active clamp
      buckets on that CPU can change. If the set of active clamp buckets
      changes for a CPU a new "aggregated" clamp value is computed for that
      CPU. This is because each clamp bucket enforces a different utilization
      clamp value.
      
      Clamp values are always MAX aggregated for both util_min and util_max.
      This ensures that no task can affect the performance of other
      co-scheduled tasks which are more boosted (i.e. with higher util_min
      clamp) or less capped (i.e. with higher util_max clamp).
      
      A task has:
         task_struct::uclamp[clamp_id]::bucket_id
      to track the "bucket index" of the CPU's clamp bucket it refcounts while
      enqueued, for each clamp index (clamp_id).
      
      A runqueue has:
         rq::uclamp[clamp_id]::bucket[bucket_id].tasks
      to track how many RUNNABLE tasks on that CPU refcount each
      clamp bucket (bucket_id) of a clamp index (clamp_id).
      It also has a:
         rq::uclamp[clamp_id]::bucket[bucket_id].value
      to track the clamp value of each clamp bucket (bucket_id) of a clamp
      index (clamp_id).
      
      The rq::uclamp::bucket[clamp_id][] array is scanned every time it's
      needed to find a new MAX aggregated clamp value for a clamp_id. This
      operation is required only when it's dequeued the last task of a clamp
      bucket tracking the current MAX aggregated clamp value. In this case,
      the CPU is either entering IDLE or going to schedule a less boosted or
      more clamped task.
      The expected number of different clamp values configured at build time
      is small enough to fit the full unordered array into a single cache
      line, for configurations of up to 7 buckets.
      
      Add to struct rq the basic data structures required to refcount the
      number of RUNNABLE tasks for each clamp bucket. Add also the max
      aggregation required to update the rq's clamp value at each
      enqueue/dequeue event.
      
      Use a simple linear mapping of clamp values into clamp buckets.
      Pre-compute and cache bucket_id to avoid integer divisions at
      enqueue/dequeue time.
      Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69842cba
  4. 21 6月, 2019 1 次提交
  5. 15 6月, 2019 3 次提交
  6. 25 5月, 2019 1 次提交
  7. 21 5月, 2019 1 次提交
  8. 15 5月, 2019 1 次提交
    • D
      mm: shuffle initial free memory to improve memory-side-cache utilization · e900a918
      Dan Williams 提交于
      Patch series "mm: Randomize free memory", v10.
      
      This patch (of 3):
      
      Randomization of the page allocator improves the average utilization of
      a direct-mapped memory-side-cache.  Memory side caching is a platform
      capability that Linux has been previously exposed to in HPC
      (high-performance computing) environments on specialty platforms.  In
      that instance it was a smaller pool of high-bandwidth-memory relative to
      higher-capacity / lower-bandwidth DRAM.  Now, this capability is going
      to be found on general purpose server platforms where DRAM is a cache in
      front of higher latency persistent memory [1].
      
      Robert offered an explanation of the state of the art of Linux
      interactions with memory-side-caches [2], and I copy it here:
      
          It's been a problem in the HPC space:
          http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
      
          A kernel module called zonesort is available to try to help:
          https://software.intel.com/en-us/articles/xeon-phi-software
      
          and this abandoned patch series proposed that for the kernel:
          https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com
      
          Dan's patch series doesn't attempt to ensure buffers won't conflict, but
          also reduces the chance that the buffers will. This will make performance
          more consistent, albeit slower than "optimal" (which is near impossible
          to attain in a general-purpose kernel).  That's better than forcing
          users to deploy remedies like:
              "To eliminate this gradual degradation, we have added a Stream
               measurement to the Node Health Check that follows each job;
               nodes are rebooted whenever their measured memory bandwidth
               falls below 300 GB/s."
      
      A replacement for zonesort was merged upstream in commit cc9aec03
      ("x86/numa_emulation: Introduce uniform split capability").  With this
      numa_emulation capability, memory can be split into cache sized
      ("near-memory" sized) numa nodes.  A bind operation to such a node, and
      disabling workloads on other nodes, enables full cache performance.
      However, once the workload exceeds the cache size then cache conflicts
      are unavoidable.  While HPC environments might be able to tolerate
      time-scheduling of cache sized workloads, for general purpose server
      platforms, the oversubscribed cache case will be the common case.
      
      The worst case scenario is that a server system owner benchmarks a
      workload at boot with an un-contended cache only to see that performance
      degrade over time, even below the average cache performance due to
      excessive conflicts.  Randomization clips the peaks and fills in the
      valleys of cache utilization to yield steady average performance.
      
      Here are some performance impact details of the patches:
      
      1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
         3X speedup in a contrived case that tries to force cache conflicts.
         The contrived cased used the numa_emulation capability to force an
         instance of the benchmark to be run in two of the near-memory sized
         numa nodes.  If both instances were placed on the same emulated they
         would fit and cause zero conflicts.  While on separate emulated nodes
         without randomization they underutilized the cache and conflicted
         unnecessarily due to the in-order allocation per node.
      
      2/ A well known Java server application benchmark was run with a heap
         size that exceeded cache size by 3X.  The cache conflict rate was 8%
         for the first run and degraded to 21% after page allocator aging.  With
         randomization enabled the rate levelled out at 11%.
      
      3/ A MongoDB workload did not observe measurable difference in
         cache-conflict rates, but the overall throughput dropped by 7% with
         randomization in one case.
      
      4/ Mel Gorman ran his suite of performance workloads with randomization
         enabled on platforms without a memory-side-cache and saw a mix of some
         improvements and some losses [3].
      
      While there is potentially significant improvement for applications that
      depend on low latency access across a wide working-set, the performance
      may be negligible to negative for other workloads.  For this reason the
      shuffle capability defaults to off unless a direct-mapped
      memory-side-cache is detected.  Even then, the page_alloc.shuffle=0
      parameter can be specified to disable the randomization on those systems.
      
      Outside of memory-side-cache utilization concerns there is potentially
      security benefit from randomization.  Some data exfiltration and
      return-oriented-programming attacks rely on the ability to infer the
      location of sensitive data objects.  The kernel page allocator, especially
      early in system boot, has predictable first-in-first out behavior for
      physical pages.  Pages are freed in physical address order when first
      onlined.
      
      Quoting Kees:
          "While we already have a base-address randomization
           (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
           memory layouts would certainly be using the predictability of
           allocation ordering (i.e. for attacks where the base address isn't
           important: only the relative positions between allocated memory).
           This is common in lots of heap-style attacks. They try to gain
           control over ordering by spraying allocations, etc.
      
           I'd really like to see this because it gives us something similar
           to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
      
      While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
      caches it leaves vast bulk of memory to be predictably in order allocated.
      However, it should be noted, the concrete security benefits are hard to
      quantify, and no known CVE is mitigated by this randomization.
      
      Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
      a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
      are initially populated with free memory at boot and at hotplug time.  Do
      this based on either the presence of a page_alloc.shuffle=Y command line
      parameter, or autodetection of a memory-side-cache (to be added in a
      follow-on patch).
      
      The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
      pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e.  10,
      4MB this trades off randomization granularity for time spent shuffling.
      MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
      while still showing memory-side cache behavior improvements, and the
      expectation that the security implications of finer granularity
      randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM.  The
      performance impact of the shuffling appears to be in the noise compared to
      other memory initialization work.
      
      This initial randomization can be undone over time so a follow-on patch is
      introduced to inject entropy on page free decisions.  It is reasonable to
      ask if the page free entropy is sufficient, but it is not enough due to
      the in-order initial freeing of pages.  At the start of that process
      putting page1 in front or behind page0 still keeps them close together,
      page2 is still near page1 and has a high chance of being adjacent.  As
      more pages are added ordering diversity improves, but there is still high
      page locality for the low address pages and this leads to no significant
      impact to the cache conflict rate.
      
      [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
      [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
      [3]: https://lkml.org/lkml/2018/10/12/309
      
      [dan.j.williams@intel.com: fix shuffle enable]
        Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
      [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
        Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
      Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e900a918
  9. 29 4月, 2019 2 次提交
    • J
      init/config: Do not select BUILD_BIN2C for IKCONFIG · bc0c6045
      Joel Fernandes (Google) 提交于
      Since commit 13610aa9 ("kernel/configs: use .incbin directive to
      embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
      it from being selected in Kconfig.
      Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc0c6045
    • J
      Provide in-kernel headers to make extending kernel easier · 43d8ce9d
      Joel Fernandes (Google) 提交于
      Introduce in-kernel headers which are made available as an archive
      through proc (/proc/kheaders.tar.xz file). This archive makes it
      possible to run eBPF and other tracing programs that need to extend the
      kernel for tracing purposes without any dependency on the file system
      having headers.
      
      A github PR is sent for the corresponding BCC patch at:
      https://github.com/iovisor/bcc/pull/2312
      
      On Android and embedded systems, it is common to switch kernels but not
      have kernel headers available on the file system. Further once a
      different kernel is booted, any headers stored on the file system will
      no longer be useful. This is an issue even well known to distros.
      By storing the headers as a compressed archive within the kernel, we can
      avoid these issues that have been a hindrance for a long time.
      
      The best way to use this feature is by building it in. Several users
      have a need for this, when they switch debug kernels, they do not want to
      update the filesystem or worry about it where to store the headers on
      it. However, the feature is also buildable as a module in case the user
      desires it not being part of the kernel image. This makes it possible to
      load and unload the headers from memory on demand. A tracing program can
      load the module, do its operations, and then unload the module to save
      kernel memory. The total memory needed is 3.3MB.
      
      By having the archive available at a fixed location independent of
      filesystem dependencies and conventions, all debugging tools can
      directly refer to the fixed location for the archive, without concerning
      with where the headers on a typical filesystem which significantly
      simplifies tooling that needs kernel headers.
      
      The code to read the headers is based on /proc/config.gz code and uses
      the same technique to embed the headers.
      
      Other approaches were discussed such as having an in-memory mountable
      filesystem, but that has drawbacks such as requiring an in-kernel xz
      decompressor which we don't have today, and requiring usage of 42 MB of
      kernel memory to host the decompressed headers at anytime. Also this
      approach is simpler than such approaches.
      Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43d8ce9d
  10. 19 4月, 2019 1 次提交
  11. 21 3月, 2019 1 次提交
  12. 07 3月, 2019 1 次提交
    • A
      time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS · 041a1574
      Arnd Bergmann 提交于
      Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added
      an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but
      this is not always enabled when it is possible to select
      VIRT_CPU_ACCOUNTING_GEN:
      
      WARNING: unmet direct dependencies detected for CONTEXT_TRACKING
        Depends on [n]: GENERIC_CLOCKEVENTS [=n]
        Selected by [y]:
        - VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]
      
      Platforms without GENERIC_CLOCKEVENTS are rare enough so that corner case
      can be just ignored. Make it a dependency for VIRT_CPU_ACCOUNTING_GEN to
      simplify the configuration.
      
      Fixes: a4cffdad ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: "Paul E . McKenney" <paulmck@linux.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de
      041a1574
  13. 04 3月, 2019 1 次提交
  14. 28 2月, 2019 1 次提交
    • J
      Add io_uring IO interface · 2b188cc1
      Jens Axboe 提交于
      The submission queue (SQ) and completion queue (CQ) rings are shared
      between the application and the kernel. This eliminates the need to
      copy data back and forth to submit and complete IO.
      
      IO submissions use the io_uring_sqe data structure, and completions
      are generated in the form of io_uring_cqe data structures. The SQ
      ring is an index into the io_uring_sqe array, which makes it possible
      to submit a batch of IOs without them being contiguous in the ring.
      The CQ ring is always contiguous, as completion events are inherently
      unordered, and hence any io_uring_cqe entry can point back to an
      arbitrary submission.
      
      Two new system calls are added for this:
      
      io_uring_setup(entries, params)
      	Sets up an io_uring instance for doing async IO. On success,
      	returns a file descriptor that the application can mmap to
      	gain access to the SQ ring, CQ ring, and io_uring_sqes.
      
      io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
      	Initiates IO against the rings mapped to this fd, or waits for
      	them to complete, or both. The behavior is controlled by the
      	parameters passed in. If 'to_submit' is non-zero, then we'll
      	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
      	kernel will wait for 'min_complete' events, if they aren't
      	already available. It's valid to set IORING_ENTER_GETEVENTS
      	and 'min_complete' == 0 at the same time, this allows the
      	kernel to return already completed events without waiting
      	for them. This is useful only for polling, as for IRQ
      	driven IO, the application can just check the CQ ring
      	without entering the kernel.
      
      With this setup, it's possible to do async IO with a single system
      call. Future developments will enable polled IO with this interface,
      and polled submission as well. The latter will enable an application
      to do IO without doing ANY system calls at all.
      
      For IRQ driven IO, an application only needs to enter the kernel for
      completions if it wants to wait for them to occur.
      
      Each io_uring is backed by a workqueue, to support buffered async IO
      as well. We will only punt to an async context if the command would
      need to wait for IO on the device side. Any data that can be accessed
      directly in the page cache is done inline. This avoids the slowness
      issue of usual threadpools, since cached data is accessed as quickly
      as a sync interface.
      
      Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.cReviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2b188cc1
  15. 27 2月, 2019 1 次提交
  16. 02 2月, 2019 2 次提交
  17. 14 1月, 2019 1 次提交
    • P
      kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7 · 16fd20aa
      Paul Burton 提交于
      When building using GCC 4.7 or older, -ffunction-sections & the -pg flag
      used by ftrace are incompatible. This causes warnings or build failures
      (where -Werror applies) such as the following:
      
        arch/mips/generic/init.c:
          error: -ffunction-sections disabled; it makes profiling impossible
      
      This used to be taken into account by the ordering of calls to cc-option
      from within the top-level Makefile, which was introduced by commit
      90ad4052 ("kbuild: avoid conflict between -ffunction-sections and
      -pg on gcc-4.7"). Unfortunately this was broken when the
      CONFIG_LD_DEAD_CODE_DATA_ELIMINATION cc-option check was moved to
      Kconfig in commit e85d1d65 ("kbuild: test dead code/data elimination
      support in Kconfig"), because the flags used by this check no longer
      include -pg.
      
      Fix this by not allowing CONFIG_LD_DEAD_CODE_DATA_ELIMINATION to be
      enabled at the same time as ftrace/CONFIG_FUNCTION_TRACER when building
      using GCC 4.7 or older.
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Fixes: e85d1d65 ("kbuild: test dead code/data elimination support in Kconfig")
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      16fd20aa
  18. 06 1月, 2019 1 次提交
    • M
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada 提交于
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  19. 15 12月, 2018 1 次提交
  20. 01 12月, 2018 1 次提交
    • J
      psi: make disabling/enabling easier for vendor kernels · e0c27447
      Johannes Weiner 提交于
      Mel Gorman reports a hackbench regression with psi that would prohibit
      shipping the suse kernel with it default-enabled, but he'd still like
      users to be able to opt in at little to no cost to others.
      
      With the current combination of CONFIG_PSI and the psi_disabled bool set
      from the commandline, this is a challenge.  Do the following things to
      make it easier:
      
      1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
         to enable CONFIG_PSI in their kernel but leave the feature disabled
         unless a user requests it at boot-time.
      
         To avoid double negatives, rename psi_disabled= to psi=.
      
      2. Make psi_disabled a static branch to eliminate any branch costs
         when the feature is disabled.
      
      In terms of numbers before and after this patch, Mel says:
      
      : The following is a comparision using CONFIG_PSI=n as a baseline against
      : your patch and a vanilla kernel
      :
      :                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
      :                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
      : Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
      : Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
      : Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
      : Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
      : Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
      : Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
      : Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
      : Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
      : Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
      :
      : It's showing that the vanilla kernel takes a hit (as the bisection
      : indicated it would) and that disabling PSI by default is reasonably
      : close in terms of performance for this particular workload on this
      : particular machine so;
      
      Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Tested-by: NMel Gorman <mgorman@techsingularity.net>
      Reported-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0c27447
  21. 20 11月, 2018 1 次提交
  22. 27 10月, 2018 2 次提交
  23. 02 10月, 2018 1 次提交
  24. 24 8月, 2018 1 次提交
  25. 23 8月, 2018 2 次提交
  26. 18 8月, 2018 1 次提交
  27. 09 8月, 2018 1 次提交
  28. 02 8月, 2018 3 次提交
  29. 18 7月, 2018 1 次提交
    • L
      kbuild: Add build salt to the kernel and modules · 9afb719e
      Laura Abbott 提交于
      In Fedora, the debug information is packaged separately (foo-debuginfo) and
      can be installed separately. There's been a long standing issue where only
      one version of a debuginfo info package can be installed at a time. There's
      been an effort for Fedora for parallel debuginfo to rectify this problem.
      
      Part of the requirement to allow parallel debuginfo to work is that build ids
      are unique between builds. The existing upstream rpm implementation ensures
      this by re-calculating the build-id using the version and release as a
      seed. This doesn't work 100% for the kernel because of the vDSO which is
      its own binary and doesn't get updated when embedded.
      
      Fix this by adding some data in an ELF note for both the kernel and modules.
      The data is controlled via a Kconfig option so distributions can set it
      to an appropriate value to ensure uniqueness between builds.
      Suggested-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NLaura Abbott <labbott@redhat.com>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      9afb719e
  30. 28 6月, 2018 1 次提交
  31. 25 6月, 2018 1 次提交