1. 16 Dec, 2020 2 commits
    • init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set · ba8f3587
      Committed by Lin Feng
      In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
      we have following callchain:
      
      start_kernel
      ...
        mm_init
          mem_init
           memblock_free_all
             reset_all_zones_managed_pages
             free_low_memory_core_early
      ...
        buffer_init
          nr_free_buffer_pages
            zone->managed_pages
      ...
        rest_init
          kernel_init
            kernel_init_freeable
              page_alloc_init_late
                kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
                wait_for_completion(&pgdat_init_all_done_comp);
                ...
                files_maxfiles_init
      
      It's clear that buffer_init depends on zone->managed_pages. That counter
      is reset in reset_all_zones_managed_pages, after which pages are re-added
      to zone->managed_pages. When buffer_init runs, this process is only half
      done; most of the pages are not added until deferred_init_memmap
      finishes. On large-memory machines the result of nr_free_buffer_pages
      drifts too much, and it also varies from kernel to kernel on the same
      hardware.
      
      The fix is simple: delay buffer_init until deferred_init_memmap has
      finished.
      
      With this correction, however, max_buffer_heads becomes very large:
      roughly 4 times totalram_pages. The formula is:
      max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct
      buffer_head));
      
      Say a 64GB memory box has 16777216 pages; then max_buffer_heads
      turns out to be roughly 67,108,864.  In the common case, shouldn't one
      buffer_head map one page/block (4KB)?  If so, the number of buffer_heads
      in use never exceeds totalram_pages.  IMO this is likely to make the
      buffer_heads_over_limit bool always false, which in turn makes the
      'if (buffer_heads_over_limit)' test in vmscan unnecessary.
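
      The arithmetic above can be reproduced with a small user-space sketch. It is not the kernel's code: the 4 KiB page size and the 104-byte struct buffer_head are assumptions (the message does not state them), and the EXAMPLE_ names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, not taken from the commit message: 4 KiB pages and a
 * 104-byte struct buffer_head. */
#define EXAMPLE_PAGE_SIZE        4096u
#define EXAMPLE_BUFFER_HEAD_SIZE 104u

/* Follows the formula quoted in the message: 10% of the free buffer
 * pages, times the number of buffer_heads that fit in one page.
 * Integer division is used throughout, so results round down. */
static uint64_t example_max_buffer_heads(uint64_t nr_free_buffer_pages)
{
    uint64_t nrpages = nr_free_buffer_pages * 10 / 100;

    return nrpages * (EXAMPLE_PAGE_SIZE / EXAMPLE_BUFFER_HEAD_SIZE);
}
```

      Under these assumptions, example_max_buffer_heads(16777216) comes to 65,431,119, about 3.9 times the page count, which is in line with the "roughly 4 times" estimate in the message.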
      
      So this patch changes the existing behavior of buffer_heads_over_limit
      in vmscan, since we previously used a half-initialized value of
      zone->managed_pages. Alternatively, should we use a smaller factor
      (<10%) in the formula above?
      
      akpm: I think this is OK - the max_buffer_heads code is only needed on
      highmem machines, to prevent ZONE_NORMAL from being consumed by large
      amounts of buffer_heads attached to highmem pagecache.  This problem will
      not occur on 64-bit machines, so this feature's non-functionality on such
      machines is a feature, not a bug.
      
      Link: https://lkml.kernel.org/r/20201123110500.103523-1-linf@wangsu.com
      Signed-off-by: Lin Feng <linf@wangsu.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix page_owner initializing issue for arm32 · 7fb7ab6d
      Committed by Zhenhua Huang
      Page owner information for pages used by page owner itself is missing on
      arm32 targets.  The reason is that dummy_handle and failure_handle are not
      initialized correctly.  The buddy allocator is used to initialize these
      two handles, but it is not ready when page owner calls it.  This change
      fixes that by initializing page owner after buddy initialization.
      
      The workflow before and after this change is:
      original logic:
       1. allocate memory for page_ext (using memblock).
       2. invoke the init callback of page_ext_ops like page_owner (using the
          buddy allocator).
       3. initialize buddy.
      
      after this change:
       1. allocate memory for page_ext (using memblock).
       2. initialize buddy.
       3. invoke the init callback of page_ext_ops like page_owner (using the
          buddy allocator).
      
      With the change, failure_handle and dummy_handle get their correct values,
      and the page owner output now includes, for example, the entry for page
      owner itself:
      
        Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns
        PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0()
          init_page_owner+0x28/0x2f8
          invoke_init_callbacks_flatmem+0x24/0x34
          start_kernel+0x33c/0x5d8
      
      Link: https://lkml.kernel.org/r/1603104925-5888-1-git-send-email-zhenhuah@codeaurora.org
      Signed-off-by: Zhenhua Huang <zhenhuah@codeaurora.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 Dec, 2020 1 commit
  3. 13 Nov, 2020 1 commit
  4. 10 Oct, 2020 1 commit
  5. 19 Sep, 2020 2 commits
  6. 01 Sep, 2020 1 commit
  7. 08 Aug, 2020 1 commit
  8. 05 Aug, 2020 2 commits
  9. 31 Jul, 2020 4 commits
  10. 21 Jul, 2020 1 commit
  11. 16 Jun, 2020 1 commit
    • security: allow using Clang's zero initialization for stack variables · f0fe00d4
      Committed by glider@google.com
      In addition to -ftrivial-auto-var-init=pattern (used by
      CONFIG_INIT_STACK_ALL now) Clang also supports zero initialization for
      locals enabled by -ftrivial-auto-var-init=zero. The future of this flag
      is still being debated (see https://bugs.llvm.org/show_bug.cgi?id=45497).
      Right now it is guarded by another flag,
      -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang,
      which means it may not be supported by future Clang releases. Another
      possible resolution is that -ftrivial-auto-var-init=zero will persist
      (as certain users have already started depending on it), but the name
      of the guard flag will change.
      
      In the meantime, zero initialization has proven itself as a good
      production mitigation measure against uninitialized locals. Unlike pattern
      initialization, which has a higher chance of triggering existing bugs,
      zero initialization provides safe defaults for strings, pointers, indexes,
      and sizes. On the other hand, pattern initialization remains safer for
      return values. Chrome OS and Android are moving to using zero
      initialization for production builds.
      
      Performance-wise, the difference between pattern and zero initialization
      is usually negligible, although the generated code for zero
      initialization is more compact.
      
      This patch renames CONFIG_INIT_STACK_ALL to CONFIG_INIT_STACK_ALL_PATTERN
      and introduces another config option, CONFIG_INIT_STACK_ALL_ZERO, that
      enables zero initialization for locals if the corresponding flags are
      supported by Clang.
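
      The claim that zero initialization gives safer defaults for indexes can be illustrated with a user-space model. This is only a sketch: it applies the fill explicitly with memset instead of relying on the compiler flag, the 0xAA fill byte for pattern mode is an assumption, and all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_TABLE_LEN 4

/* Fill byte used to model each mode. 0xAA for pattern mode is an
 * assumption; 0x00 models zero initialization. */
enum example_fill_mode { EXAMPLE_FILL_ZERO = 0x00, EXAMPLE_FILL_PATTERN = 0xAA };

/* Models a buggy function whose local 'idx' is never assigned on the
 * path taken. Reading a truly uninitialized local is undefined behavior,
 * so the fill is applied explicitly here as a stand-in for what the
 * compiler's automatic initialization would do. */
static int example_index_in_bounds(enum example_fill_mode mode)
{
    size_t idx;

    memset(&idx, mode, sizeof(idx)); /* stand-in for compiler auto-init */
    /* ... bug: the code that should have assigned idx was skipped ... */
    return idx < EXAMPLE_TABLE_LEN;
}
```

      In this model the zero fill leaves idx at 0, which is still a valid index, while the pattern fill produces a huge out-of-range value that is more likely to expose the bug.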
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Link: https://lore.kernel.org/r/20200616083435.223038-1-glider@google.com
      Reviewed-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
  12. 09 Jun, 2020 1 commit
    • kernel/sysctl: support setting sysctl parameters from kernel command line · 3db978d4
      Committed by Vlastimil Babka
      Patch series "support setting sysctl parameters from kernel command line", v3.
      
      This series adds support for something that seems like many people
      always wanted but nobody added it yet, so here's the ability to set
      sysctl parameters via kernel command line options in the form of
      sysctl.vm.something=1
      
      The important part is Patch 1.  The second, not so important part is an
      attempt to clean up legacy one-off parameters that do the same thing as
      a sysctl.  I don't want to remove them completely for compatibility
      reasons, but with generic sysctl support the idea is to remove the
      one-off param handlers and treat the parameters as aliases for the
      sysctl variants.
      
      I have identified several parameters that mention sysctl counterparts in
      Documentation/admin-guide/kernel-parameters.txt but there might be more.
      The conversion also has varying levels of success:
      
       - numa_zonelist_order is converted in Patch 2 together with adding the
         necessary infrastructure. It's easy as it doesn't really do anything
         but warn on deprecated value these days.
      
       - hung_task_panic is converted in Patch 3, but there's a downside that
         now it only accepts 0 and 1, while previously it was any integer
         value
      
       - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic,
         so there's no straightforward conversion possible
      
       - traceoff_on_warning is a flag without value and it would be required
         to handle that somehow in the conversion infrastructure, which seems
         pointless for a single flag
      
      This patch (of 5):
      
      A recently proposed patch to add vm_swappiness command line parameter in
      addition to existing sysctl [1] made me wonder why we don't have a
      general support for passing sysctl parameters via command line.
      
      Googling found only somebody else wondering the same [2], but I haven't
      found any prior discussion with reasons why not to do this.
      
      Setting the vm_swappiness issue aside (the underlying issue might be
      solved in a different way), a quick search of kernel-parameters.txt shows
      there are already some that exist as both sysctl and kernel parameter -
      hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning.
      
      A general mechanism would remove the need to add more of those one-offs
      and might be handy in situations where configuration by e.g.
      /etc/sysctl.d/ is impractical.
      
      Hence, this patch adds a new parse_args() pass that looks for parameters
      prefixed by 'sysctl.' and tries to interpret them as writes to the
      corresponding sys/ files using a temporary in-kernel procfs mount.
      This mechanism was suggested by Eric W. Biederman [3], as it handles
      all dynamically registered sysctl tables, even though we don't handle
      modular sysctls.  Errors due to e.g. an invalid parameter name or value
      are reported in the kernel log.
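
      As a rough user-space illustration of the name mapping this implies, the sketch below turns a 'sysctl.'-prefixed parameter name into a /proc/sys/ path. It is an assumption about how the dotted name corresponds to a file, not the kernel's implementation, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Translate e.g. "sysctl.vm.swappiness" into "/proc/sys/vm/swappiness".
 * Returns 0 on success, or -1 if the "sysctl." prefix is missing, the
 * name after the prefix is empty, or the output buffer is too small. */
static int example_sysctl_param_to_path(const char *param, char *out,
                                        size_t outlen)
{
    static const char prefix[] = "sysctl.";
    static const char root[] = "/proc/sys/";
    const size_t prefix_len = sizeof(prefix) - 1;
    size_t i;
    int n;

    if (strncmp(param, prefix, prefix_len) != 0)
        return -1;
    param += prefix_len;
    if (*param == '\0')
        return -1;

    n = snprintf(out, outlen, "%s%s", root, param);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Dots after the root separate directory levels in this sketch. */
    for (i = sizeof(root) - 1; out[i] != '\0'; i++)
        if (out[i] == '.')
            out[i] = '/';
    return 0;
}
```

      The value part of an option such as sysctl.vm.something=1 would then be written to the resulting file; that step is left out here.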
      
      The processing is hooked right before the init process is loaded, as
      some handlers might be more complicated than simple setters and might
      need some subsystems to be initialized.  At the point where the init
      process can be started and eventually execute a process writing to
      /proc/sys/, it should also be fine to do that from the kernel.
      
      Sysctls registered later on module load time are not set by this
      mechanism - it's expected that in such scenarios, setting sysctl values
      from userspace is practical enough.
      
      [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
      [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
      [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz
      Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 05 Jun, 2020 1 commit
    • init: allow distribution configuration of default init · ada4ab7a
      Committed by Chris Down
      Some init systems (e.g. systemd) have init at their own paths, for
      example, /usr/lib/systemd/systemd.  A compatibility symlink to one of the
      hardcoded init paths is provided by another package, usually named
      something like systemd-sysvcompat.
      
      Currently distro maintainers who are hands-off on the bootloader are more
      or less required to include those compatibility links as part of their
      base distribution, because it's hard to migrate away from them since
      there's a risk some users will not get the message to set init= on the
      kernel command line appropriately.
      
      Moreover, for distributions where the init system is something the
      distribution itself is opinionated about (e.g. Arch, which has systemd in
      the required `base` package), we could usually reasonably configure this
      ahead of time when building the distribution kernel.  However, we
      currently simply don't have any way to configure the kernel to do this.
      Here's an example thread where distro maintainers discussed removing
      sysvcompat [0].
      
      This patch adds a new Kconfig tunable, CONFIG_DEFAULT_INIT, which if set
      is tried before the hardcoded fallback list.  So the order of precedence
      is now thus:
      
      1. init= on command line (on failure: panic)
      2. CONFIG_DEFAULT_INIT (on failure: try #3)
      3. Hardcoded fallback list (on failure: panic)
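
      The order of precedence listed above can be modelled in user space as follows. This is a sketch of the stated rules only, not the kernel's code; the function names, the callback type, and the stub that pretends only /bin/sh exists are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Callback that tries to start an init binary; returns 0 on success. */
typedef int (*example_try_exec_fn)(const char *path);

enum example_init_result { EXAMPLE_INIT_STARTED, EXAMPLE_INIT_PANIC };

/* Hypothetical stub for demonstration: only /bin/sh can be started. */
static int example_only_bin_sh(const char *path)
{
    return strcmp(path, "/bin/sh") == 0 ? 0 : -1;
}

static enum example_init_result
example_choose_init(const char *cmdline_init, const char *config_default_init,
                    const char *const *fallbacks, size_t n_fallbacks,
                    example_try_exec_fn try_exec, const char **chosen)
{
    size_t i;

    *chosen = NULL;

    /* 1. init= on the command line: a failure here is fatal. */
    if (cmdline_init != NULL) {
        if (try_exec(cmdline_init) != 0)
            return EXAMPLE_INIT_PANIC;
        *chosen = cmdline_init;
        return EXAMPLE_INIT_STARTED;
    }

    /* 2. Configured default (empty string means unset): on failure,
     *    fall through to the hardcoded list. */
    if (config_default_init != NULL && config_default_init[0] != '\0' &&
        try_exec(config_default_init) == 0) {
        *chosen = config_default_init;
        return EXAMPLE_INIT_STARTED;
    }

    /* 3. Hardcoded fallback list: panic only if every entry fails. */
    for (i = 0; i < n_fallbacks; i++) {
        if (try_exec(fallbacks[i]) == 0) {
            *chosen = fallbacks[i];
            return EXAMPLE_INIT_STARTED;
        }
    }
    return EXAMPLE_INIT_PANIC;
}
```

      With the stub above, a missing configured default falls through to the fallback list, whereas a missing init= path reports a panic without trying anything else.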
      
      This new config parameter will allow distribution maintainers to move away
      from these compatibility links safely, without having to worry that their
      users might not have the right init=.
      
      There are also two other benefits of this over having the distribution
      maintain a symlink:
      
      1. One of the value propositions over simply having distributions
         maintain a /sbin/init symlink via a package is that it also frees
         distributions which have a preferred default, but not mandatory, init
         system from having their package manager fight with their users for
         control of /{s,}bin/init.  Instead, the distribution simply makes
         their preference known in CONFIG_DEFAULT_INIT, and if the user
         installs another init system and uninstalls the default one they can
         still make use of /{s,}bin/init and friends for their own uses. This
         makes more cases Just Work(tm) without the user having to perform
         extra configuration via init=.
      
      2. Since before this we don't know which path the distribution actually
         _intends_ to serve init from, we don't pr_err if it is simply
         missing, and usually will just silently put the user in a /bin/sh
         shell. Now that the distribution can make a declaration of intent, we
         can be more vocal when this init system fails to launch for any
         reason, even if it's simply because no file exists at that location,
         speeding up the palaver of init/mount dependency/etc debugging a bit.
      
      [0]: https://lists.archlinux.org/pipermail/arch-dev-public/2019-January/029435.html
      
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Link: http://lkml.kernel.org/r/20200522160234.GA1487022@chrisdown.name
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 04 Jun, 2020 1 commit
    • padata: initialize earlier · f1b192b1
      Committed by Daniel Jordan
      padata will soon initialize the system's struct pages in parallel, so it
      needs to be ready by page_alloc_init_late().
      
      The error return from padata_driver_init() triggers an initcall warning,
      so add a warning to padata_init() to avoid silent failure.
      
      Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Josh Triplett <josh@joshtriplett.org>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Link: http://lkml.kernel.org/r/20200527173608.2885243-3-daniel.m.jordan@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 15 May, 2020 1 commit
    • x86: Fix early boot crash on gcc-10, third try · a9a3ed1e
      Committed by Borislav Petkov
      ... or the odyssey of trying to disable the stack protector for the
      function which generates the stack canary value.
      
      The whole story started with Sergei reporting a boot crash with a kernel
      built with gcc-10:
      
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5-00235-gfffb08b3 #139
        Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
        Call Trace:
          dump_stack
          panic
          ? start_secondary
          __stack_chk_fail
          start_secondary
          secondary_startup_64
        ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
      
      This happens because gcc-10 tail-call optimizes the last function call
      in start_secondary() - cpu_startup_entry() - and thus emits a stack
      canary check which fails because the canary value changes after the
      boot_init_stack_canary() call.
      
      To fix that, the initial attempt was to mark the one function which
      generates the stack canary with:
      
        __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)
      
      However, the optimize attribute doesn't work cumulatively: it does not
      add to, but rather replaces, previously supplied optimization options -
      roughly all -fxxx options.
      
      The key one among them is -fno-omit-frame-pointer; losing it leaves the
      function without a frame pointer, which the kernel needs.
      
      The next attempt to prevent compilers from tail-call optimizing
      the last function call cpu_startup_entry(), shy of carving out
      start_secondary() into a separate compilation unit and building it with
      -fno-stack-protector, was to add an empty asm("").
      
      This current solution was short and sweet, and reportedly, is supported
      by both compilers but we didn't get very far this time: future (LTO?)
      optimization passes could potentially eliminate this, which leads us
      to the third attempt: having an actual memory barrier there which the
      compiler cannot ignore or move around etc.
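
      A minimal user-space sketch of this third approach is shown below. It is not the kernel's code: the function names are hypothetical, the stand-in callee returns (the real cpu_startup_entry() does not), and __sync_synchronize() is used here as the full memory barrier, which assumes GCC or Clang.

```c
/* Hypothetical stand-in for cpu_startup_entry(); it returns so that the
 * sketch can be exercised. */
static int example_idle_entry(int cpu)
{
    return cpu + 1;
}

/* Hypothetical stand-in for start_secondary(). */
static int example_start_secondary(int cpu)
{
    int ret = example_idle_entry(cpu);

    /* Full memory barrier (GCC/Clang builtin). It has to execute after
     * the call returns, so the call above is no longer the last action
     * in the function and cannot be emitted as a tail call. */
    __sync_synchronize();
    return ret;
}
```

      Because the barrier is a real operation rather than an empty asm statement, the compiler has less freedom to drop or move it.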
      
      That should hold for a long time, but hey we said that about the other
      two solutions too so...
      
      Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Kalle Valo <kvalo@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
  16. 12 May, 2020 1 commit
  17. 06 May, 2020 1 commit
  18. 11 Apr, 2020 1 commit
  19. 04 Mar, 2020 1 commit
  20. 21 Feb, 2020 4 commits
  21. 11 Feb, 2020 1 commit
  22. 06 Feb, 2020 2 commits
  23. 05 Feb, 2020 1 commit
  24. 01 Feb, 2020 4 commits
  25. 14 Jan, 2020 3 commits