1. 12 5月, 2020 2 次提交
    • M
      kbuild: ensure full rebuild when the compiler is updated · 8b59cd81
      Masahiro Yamada 提交于
      Commit 21c54b77 ("kconfig: show compiler version text in the top
      comment") added the environment variable, CC_VERSION_TEXT in the comment
      of the top Kconfig file. It can detect the compiler update, and invoke
      the syncconfig because all environment variables referenced in Kconfig
      files are recorded in include/config/auto.conf.cmd
      
      This commit makes it a CONFIG option in order to ensure the full rebuild
      when the compiler is updated.
      
      This works like follows:
      
      include/config/kconfig.h contains "CONFIG_CC_VERSION_TEXT" in the comment
      block.
      
      The top Makefile specifies "-include $(srctree)/include/linux/kconfig.h"
      to guarantee it is included from all kernel source files.
      
      fixdep parses every source file and all headers included from it,
      searching for words prefixed with "CONFIG_". Then, fixdep finds
      CONFIG_CC_VERSION_TEXT in include/config/kconfig.h and adds
      include/config/cc/version/text.h into every .*.cmd file.
      
      When the compiler is updated, syncconfig is invoked because init/Kconfig
      contains the reference to the environment variable CC_VERTION_TEXT.
      CONFIG_CC_VERSION_TEXT is updated to the new version string, and
      include/config/cc/version/text.h is touched.
      
      In the next rebuild, Make will rebuild every files since the timestamp
      of include/config/cc/version/text.h is newer than that of target.
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      8b59cd81
    • M
      kbuild: use $(CC_VERSION_TEXT) to evaluate CC_IS_GCC and CC_IS_CLANG · e33ae3ed
      Masahiro Yamada 提交于
      The result of '$(CC) --version | head -n 1' has already been computed
      by the top Makefile, and stored in the environment variable,
      CC_VERSION_TEXT.
      
      'echo' is cheaper than the two commands $(CC) and 'head' although this
      optimization is not noticeable level.
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
      Tested-by: NNathan Chancellor <natechancellor@gmail.com>
      e33ae3ed
  2. 10 5月, 2020 1 次提交
    • L
      Stop the ad-hoc games with -Wno-maybe-initialized · 78a5255f
      Linus Torvalds 提交于
      We have some rather random rules about when we accept the
      "maybe-initialized" warnings, and when we don't.
      
      For example, we consider it unreliable for gcc versions < 4.9, but also
      if -O3 is enabled, or if optimizing for size.  And then various kernel
      config options disabled it, because they know that they trigger that
      warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).
      
      And now gcc-10 seems to be introducing a lot of those warnings too, so
      it falls under the same heading as 4.9 did.
      
      At the same time, we have a very straightforward way to _enable_ that
      warning when wanted: use "W=2" to enable more warnings.
      
      So stop playing these ad-hoc games, and just disable that warning by
      default, with the known and straight-forward "if you want to work on the
      extra compiler warnings, use W=123".
      
      Would it be great to have code that is always so obvious that it never
      confuses the compiler whether a variable is used initialized or not?
      Yes, it would.  In a perfect world, the compilers would be smarter, and
      our source code would be simpler.
      
      That's currently not the world we live in, though.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      78a5255f
  3. 08 4月, 2020 2 次提交
  4. 01 4月, 2020 1 次提交
  5. 31 3月, 2020 1 次提交
  6. 30 3月, 2020 1 次提交
  7. 27 3月, 2020 1 次提交
  8. 12 3月, 2020 1 次提交
    • M
      int128: fix __uint128_t compiler test in Kconfig · 3a7c7331
      Masahiro Yamada 提交于
      The support for __uint128_t is dependent on the target bit size.
      
      GCC that defaults to the 32-bit can still build the 64-bit kernel
      with -m64 flag passed.
      
      However, $(cc-option,-D__SIZEOF_INT128__=0) is evaluated against the
      default machine bit, which may not match to the kernel it is building.
      
      Theoretically, this could be evaluated separately for 64BIT/32BIT.
      
        config CC_HAS_INT128
                bool
                default !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0) if 64BIT
                default !$(cc-option,$(m32-flag) -D__SIZEOF_INT128__=0)
      
      I simplified it more because the 32-bit compiler is unlikely to support
      __uint128_t.
      
      Fixes: c12d3362 ("int128: move __uint128_t compiler test to Kconfig")
      Reported-by: NGeorge Spelvin <lkml@sdf.org>
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      Tested-by: NGeorge Spelvin <lkml@sdf.org>
      3a7c7331
  9. 06 3月, 2020 1 次提交
  10. 03 3月, 2020 2 次提交
    • Q
      kbuild: allow symbol whitelisting with TRIM_UNUSED_KSYMS · 1518c633
      Quentin Perret 提交于
      CONFIG_TRIM_UNUSED_KSYMS currently removes all unused exported symbols
      from ksymtab. This works really well when using in-tree drivers, but
      cannot be used in its current form if some of them are out-of-tree.
      
      Indeed, even if the list of symbols required by out-of-tree drivers is
      known at compile time, the only solution today to guarantee these don't
      get trimmed is to set CONFIG_TRIM_UNUSED_KSYMS=n. This not only wastes
      space, but also makes it difficult to control the ABI usable by vendor
      modules in distribution kernels such as Android. Being able to control
      the kernel ABI surface is particularly useful to ship a unique Generic
      Kernel Image (GKI) for all vendors, which is a first step in the
      direction of getting all vendors to contribute their code upstream.
      
      As such, attempt to improve the situation by enabling users to specify a
      symbol 'whitelist' at compile time. Any symbol specified in this
      whitelist will be kept exported when CONFIG_TRIM_UNUSED_KSYMS is set,
      even if it has no in-tree user. The whitelist is defined as a simple
      text file, listing symbols, one per line.
      Acked-by: NJessica Yu <jeyu@kernel.org>
      Acked-by: NNicolas Pitre <nico@fluxnic.net>
      Tested-by: NMatthias Maennich <maennich@google.com>
      Reviewed-by: NMatthias Maennich <maennich@google.com>
      Signed-off-by: NQuentin Perret <qperret@google.com>
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      1518c633
    • M
      kbuild: use KBUILD_DEFCONFIG as the fallback for DEFCONFIG_LIST · 2a86f661
      Masahiro Yamada 提交于
      Most of the Kconfig commands (except defconfig and all*config) read
      the .config file as a base set of CONFIG options.
      
      When it does not exist, the files in DEFCONFIG_LIST are searched in
      this order and loaded if found.
      
      I do not see much sense in the last two lines in DEFCONFIG_LIST.
      
      [1] ARCH_DEFCONFIG
      
      The entry for DEFCONFIG_LIST is guarded by 'depends on !UML'. So, the
      ARCH_DEFCONFIG definition in arch/x86/um/Kconfig is meaningless.
      
      arch/{sh,sparc,x86}/Kconfig define ARCH_DEFCONFIG depending on 32 or
      64 bit variant symbols. This is a little bit strange; ARCH_DEFCONFIG
      should be a fixed string because the base config file is loaded before
      the symbol evaluation stage.
      
      Using KBUILD_DEFCONFIG makes more sense because it is fixed before
      Kconfig is invoked. Fortunately, arch/{sh,sparc,x86}/Makefile define it
      in the same way, and it works as expected. Hence, replace ARCH_DEFCONFIG
      with "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)".
      
      [2] arch/$(ARCH)/defconfig
      
      This file path is no longer valid. The defconfig files are always located
      in the arch configs/ directories.
      
        $ find arch -name defconfig | sort
        arch/alpha/configs/defconfig
        arch/arm64/configs/defconfig
        arch/csky/configs/defconfig
        arch/nds32/configs/defconfig
        arch/riscv/configs/defconfig
        arch/s390/configs/defconfig
        arch/unicore32/configs/defconfig
      
      The path arch/*/configs/defconfig is already covered by
      "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". So, this file path is
      not necessary.
      
      I moved the default KBUILD_DEFCONFIG to the top Makefile. Otherwise,
      the 7 architectures listed above would end up with endless loop of
      syncconfig.
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      2a86f661
  11. 26 2月, 2020 1 次提交
  12. 21 2月, 2020 2 次提交
  13. 11 2月, 2020 1 次提交
  14. 22 1月, 2020 1 次提交
  15. 20 1月, 2020 1 次提交
    • J
      Revert "um: Enable CONFIG_CONSTRUCTORS" · 87c9366e
      Johannes Berg 提交于
      This reverts commit 786b2384 ("um: Enable CONFIG_CONSTRUCTORS").
      
      There are two issues with this commit, uncovered by Anton in tests
      on some (Debian) systems:
      
      1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
         isn't set. Don't recall now if it just wasn't needed on my system, or
         if I never tested this case.
      
      2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
         set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
         unexpected since whatever wanted to run is likely to have to run
         before the kernel init etc. that calls the constructors in this case.
      
      Basically, some constructors that gcc emits (libc has?) need to run
      very early during init; the failure mode otherwise was that the ptrace
      fork test already failed:
      
      ----------------------
      $ ./linux mem=512M
      Core dump limits :
      	soft - 0
      	hard - NONE
      Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
      Aborted
      ----------------------
      
      Thinking more about this, it's clear that we simply cannot support
      CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
      involve not use of the __attribute__((constructor)), but instead
      some constructor code/entry generated by gcc. Therefore, we cannot
      distinguish between kernel constructors and system constructors.
      
      Thus, revert this commit.
      
      Cc: stable@vger.kernel.org [5.4+]
      Fixes: 786b2384 ("um: Enable CONFIG_CONSTRUCTORS")
      Reported-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Acked-by: NAnton Ivanov <anton.ivanov@cambridgegreys.co.uk>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      87c9366e
  16. 14 1月, 2020 4 次提交
    • T
      lib/vdso: Prepare for time namespace support · 660fd04f
      Thomas Gleixner 提交于
      To support time namespaces in the vdso with a minimal impact on regular non
      time namespace affected tasks, the namespace handling needs to be hidden in
      a slow path.
      
      The most obvious place is vdso_seq_begin(). If a task belongs to a time
      namespace then the VVAR page which contains the system wide vdso data is
      replaced with a namespace specific page which has the same layout as the
      VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
      and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
      namespace handling path.
      
      The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
      update of the vdso data is in progress, is not really affecting regular
      tasks which are not part of a time namespace as the task is spin waiting
      for the update to finish and vdso_data->seq to become even again.
      
      If a time namespace task hits that code path, it invokes the corresponding
      time getter function which retrieves the real VVAR page, reads host time
      and then adds the offset for the requested clock which is stored in the
      special VVAR page.
      
      If VDSO time namespace support is disabled the whole magic is compiled out.
      
      Initial testing shows that the disabled case is almost identical to the
      host case which does not take the slow timens path. With the special timens
      page installed the performance hit is constant time and in the range of
      5-7%.
      
      For the vdso functions which are not using the sequence count an
      unconditional check for vdso_data->clock_mode is added which switches to
      the real vdso when the clock_mode is VCLOCK_TIMENS.
      
      [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
       pointer by CS_RAW offset.]
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrei Vagin <avagin@gmail.com>
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
      
      660fd04f
    • A
      ns: Introduce Time Namespace · 769071ac
      Andrei Vagin 提交于
      Time Namespace isolates clock values.
      
      The kernel provides access to several clocks CLOCK_REALTIME,
      CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.
      
      CLOCK_REALTIME
            System-wide clock that measures real (i.e., wall-clock) time.
      
      CLOCK_MONOTONIC
            Clock that cannot be set and represents monotonic time since
            some unspecified starting point.
      
      CLOCK_BOOTTIME
            Identical to CLOCK_MONOTONIC, except it also includes any time
            that the system is suspended.
      
      For many users, the time namespace means the ability to changes date and
      time in a container (CLOCK_REALTIME). Providing per namespace notions of
      CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
      value.
      
      But in the context of checkpoint/restore functionality, monotonic and
      boottime clocks become interesting. Both clocks are monotonic with
      unspecified starting points. These clocks are widely used to measure time
      slices and set timers. After restoring or migrating processes, it has to be
      guaranteed that they never go backward. In an ideal case, the behavior of
      these clocks should be the same as for a case when a whole system is
      suspended. All this means that it is required to set CLOCK_MONOTONIC and
      CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
      offsets for clocks.
      
      A time namespace is similar to a pid namespace in the way how it is
      created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
      but doesn't set it to the current process. Then all children of the process
      will be born in the new time namespace, or a process can use the setns()
      system call to join a namespace.
      
      This scheme allows setting clock offsets for a namespace, before any
      processes appear in it.
      
      All available clone flags have been used, so CLONE_NEWTIME uses the highest
      bit of CSIGNAL. It means that it can be used only with the unshare() and
      the clone3() system calls.
      
      [ tglx: Adjusted paragraph about clone3() to reality and massaged the
        	changelog a bit. ]
      Co-developed-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NAndrei Vagin <avagin@gmail.com>
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://criu.org/Time_namespace
      Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
      Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
      
      769071ac
    • M
      bootconfig: Load boot config from the tail of initrd · 7684b858
      Masami Hiramatsu 提交于
      Load the extended boot config data from the tail of initrd
      image. If there is an SKC data there, it has
      [(u32)size][(u32)checksum] header (in really, this is a
      footer) at the end of initrd. If the checksum (simple sum
      of bytes) is match, this starts parsing it from there.
      
      Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      7684b858
    • M
      bootconfig: Add Extra Boot Config support · 76db5a27
      Masami Hiramatsu 提交于
      Extra Boot Config (XBC) allows admin to pass a tree-structured
      boot configuration file when boot up the kernel. This extends
      the kernel command line in an efficient way.
      
      Boot config will contain some key-value commands, e.g.
      
      key.word = value1
      another.key.word = value2
      
      It can fold same keys with braces, also you can write array
      data. For example,
      
      key {
         word1 {
            setting1 = data
            setting2
         }
         word2.array = "val1", "val2"
      }
      
      User can access these key-value pair and tree structure via
      SKC APIs.
      
      Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      76db5a27
  17. 13 12月, 2019 1 次提交
  18. 12 12月, 2019 1 次提交
  19. 05 12月, 2019 1 次提交
  20. 27 11月, 2019 1 次提交
    • E
      sysctl: Remove the sysctl system call · 61a47c1a
      Eric W. Biederman 提交于
      This system call has been deprecated almost since it was introduced, and
      in a survey of the linux distributions I can no longer find any of them
      that enable CONFIG_SYSCTL_SYSCALL.  The only indication that I can find
      that anyone might care is that a few of the defconfigs in the kernel
      enable CONFIG_SYSCTL_SYSCALL.  However this appears in only 31 of 414
      defconfigs in the kernel, so I suspect this symbols presence is simply
      because it is harmless to include rather than because it is necessary.
      
      As there appear to be no users of the sysctl system call, remove the
      code.  As this removes one of the few uses of the internal kernel mount
      of proc I hope this allows for even more simplifications of the proc
      filesystem.
      
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Anders Berg <anders.berg@lsi.com>
      Cc: Apelete Seketeli <apelete@seketeli.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chee Nouk Phoon <cnphoon@altera.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Christian Ruppert <christian.ruppert@abilis.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Harvey Hunt <harvey.hunt@imgtec.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hongliang Tao <taohl@lemote.com>
      Cc: Hua Yan <yanh@lemote.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Jonas Jensen <jonas.jensen@gmail.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: Jun Nie <jun.nie@linaro.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Kevin Wells <kevin.wells@nxp.com>
      Cc: Kumar Gala <galak@codeaurora.org>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Noam Camus <noamc@ezchip.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Phil Edworthy <phil.edworthy@renesas.com>
      Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Roland Stigge <stigge@antcom.de>
      Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Scott Telford <stelford@cadence.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: Tanmay Inamdar <tinamdar@apm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Wolfram Sang <w.sang@pengutronix.de>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      61a47c1a
  21. 17 11月, 2019 1 次提交
    • A
      int128: move __uint128_t compiler test to Kconfig · c12d3362
      Ard Biesheuvel 提交于
      In order to use 128-bit integer arithmetic in C code, the architecture
      needs to have declared support for it by setting ARCH_SUPPORTS_INT128,
      and it requires a version of the toolchain that supports this at build
      time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test
      whether __SIZEOF_INT128__ is defined, since this is only the case for
      compilers that can support 128-bit integers.
      
      Let's fold this additional test into the Kconfig declaration of
      ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles,
      e.g., to decide whether a certain object needs to be included in the
      first place.
      
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c12d3362
  22. 14 11月, 2019 1 次提交
    • M
      kbuild: remove header compile test · fcbb8461
      Masahiro Yamada 提交于
      There are both positive and negative options about this feature.
      At first, I thought it was a good idea, but actually Linus stated a
      negative opinion (https://lkml.org/lkml/2019/9/29/227). I admit it
      is ugly and annoying.
      
      The baseline I'd like to keep is the compile-test of uapi headers.
      (Otherwise, kernel developers have no way to ensure the correctness
      of the exported headers.)
      
      I will maintain a small build rule in usr/include/Makefile.
      Remove the other header test functionality.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      fcbb8461
  23. 30 10月, 2019 1 次提交
    • J
      io_uring: replace workqueue usage with io-wq · 561fb04a
      Jens Axboe 提交于
      Drop various work-arounds we have for workqueues:
      
      - We no longer need the async_list for tracking sequential IO.
      
      - We don't have to maintain our own mm tracking/setting.
      
      - We don't need a separate workqueue for buffered writes. This didn't
        even work that well to begin with, as it was suboptimal for multiple
        buffered writers on multiple files.
      
      - We can properly cancel pending interruptible work. This fixes
        deadlocks with particularly socket IO, where we cannot cancel them
        when the io_uring is closed. Hence the ring will wait forever for
        these requests to complete, which may never happen. This is different
        from disk IO where we know requests will complete in a finite amount
        of time.
      
      - Due to being able to cancel work interruptible work that is already
        running, we can implement file table support for work. We need that
        for supporting system calls that add to a process file table.
      
      - It gets us one step closer to adding async support for any system
        call.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      561fb04a
  24. 16 9月, 2019 2 次提交
  25. 12 9月, 2019 2 次提交
  26. 10 9月, 2019 1 次提交
  27. 04 9月, 2019 1 次提交
    • M
      kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC · 15f5db60
      Masahiro Yamada 提交于
      arch/arc/Makefile overrides -O2 with -O3. This is the only user of
      ARCH_CFLAGS. There is no user of ARCH_CPPFLAGS or ARCH_AFLAGS.
      My plan is to remove ARCH_{CPP,A,C}FLAGS after refactoring the ARC
      Makefile.
      
      Currently, ARC has no way to enable -Wmaybe-uninitialized because both
      -O3 and -Os disable it. Enabling it will be useful for compile-testing.
      This commit allows allmodconfig (, which defaults to -O2) to enable it.
      
      Add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y to all the defconfig files
      in arch/arc/configs/ in order to keep the current config settings.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      15f5db60
  28. 03 9月, 2019 1 次提交
    • P
      sched/uclamp: Extend CPU's cgroup controller · 2480c093
      Patrick Bellasi 提交于
      The cgroup CPU bandwidth controller allows to assign a specified
      (maximum) bandwidth to the tasks of a group. However this bandwidth is
      defined and enforced only on a temporal base, without considering the
      actual frequency a CPU is running on. Thus, the amount of computation
      completed by a task within an allocated bandwidth can be very different
      depending on the actual frequency the CPU is running that task.
      The amount of computation can be affected also by the specific CPU a
      task is running on, especially when running on asymmetric capacity
      systems like Arm's big.LITTLE.
      
      With the availability of schedutil, the scheduler is now able
      to drive frequency selections based on actual task utilization.
      Moreover, the utilization clamping support provides a mechanism to
      bias the frequency selection operated by schedutil depending on
      constraints assigned to the tasks currently RUNNABLE on a CPU.
      
      Giving the mechanisms described above, it is now possible to extend the
      cpu controller to specify the minimum (or maximum) utilization which
      should be considered for tasks RUNNABLE on a cpu.
      This makes it possible to better defined the actual computational
      power assigned to task groups, thus improving the cgroup CPU bandwidth
      controller which is currently based just on time constraints.
      
      Extend the CPU controller with a couple of new attributes uclamp.{min,max}
      which allow to enforce utilization boosting and capping for all the
      tasks in a group.
      
      Specifically:
      
      - uclamp.min: defines the minimum utilization which should be considered
      	      i.e. the RUNNABLE tasks of this group will run at least at a
      	      minimum frequency which corresponds to the uclamp.min
      	      utilization
      
      - uclamp.max: defines the maximum utilization which should be considered
      	      i.e. the RUNNABLE tasks of this group will run up to a
      	      maximum frequency which corresponds to the uclamp.max
      	      utilization
      
      These attributes:
      
      a) are available only for non-root nodes, both on default and legacy
         hierarchies, while system wide clamps are defined by a generic
         interface which does not depends on cgroups. This system wide
         interface enforces constraints on tasks in the root node.
      
      b) enforce effective constraints at each level of the hierarchy which
         are a restriction of the group requests considering its parent's
         effective constraints. Root group effective constraints are defined
         by the system wide interface.
         This mechanism allows each (non-root) level of the hierarchy to:
         - request whatever clamp values it would like to get
         - effectively get only up to the maximum amount allowed by its parent
      
      c) have higher priority than task-specific clamps, defined via
         sched_setattr(), thus allowing to control and restrict task requests.
      
      Add two new attributes to the cpu controller to collect "requested"
      clamp values. Allow that at each non-root level of the hierarchy.
      Keep it simple by not caring now about "effective" values computation
      and propagation along the hierarchy.
      
      Update sysctl_sched_uclamp_handler() to use the newly introduced
      uclamp_mutex so that we serialize system default updates with cgroup
      relate updates.
      Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NMichal Koutny <mkoutny@suse.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2480c093
  29. 29 8月, 2019 1 次提交
  30. 22 8月, 2019 1 次提交
  31. 20 8月, 2019 1 次提交