1. 08 7月, 2019 1 次提交
  2. 15 6月, 2019 1 次提交
  3. 31 5月, 2019 1 次提交
  4. 25 5月, 2019 1 次提交
  5. 21 5月, 2019 2 次提交
  6. 19 5月, 2019 1 次提交
  7. 15 5月, 2019 10 次提交
  8. 06 5月, 2019 1 次提交
  9. 30 4月, 2019 1 次提交
  10. 29 4月, 2019 2 次提交
    • J
      init/config: Do not select BUILD_BIN2C for IKCONFIG · bc0c6045
      Joel Fernandes (Google) 提交于
      Since commit 13610aa9 ("kernel/configs: use .incbin directive to
      embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
      it from being selected in Kconfig.
      Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc0c6045
    • J
      Provide in-kernel headers to make extending kernel easier · 43d8ce9d
      Joel Fernandes (Google) 提交于
      Introduce in-kernel headers which are made available as an archive
      through proc (/proc/kheaders.tar.xz file). This archive makes it
      possible to run eBPF and other tracing programs that need to extend the
      kernel for tracing purposes without any dependency on the file system
      having headers.
      
      A github PR is sent for the corresponding BCC patch at:
      https://github.com/iovisor/bcc/pull/2312
      
      On Android and embedded systems, it is common to switch kernels but not
      have kernel headers available on the file system. Further once a
      different kernel is booted, any headers stored on the file system will
      no longer be useful. This is an issue even well known to distros.
      By storing the headers as a compressed archive within the kernel, we can
      avoid these issues that have been a hindrance for a long time.
      
      The best way to use this feature is by building it in. Several users
      have a need for this, when they switch debug kernels, they do not want to
      update the filesystem or worry about it where to store the headers on
      it. However, the feature is also buildable as a module in case the user
      desires it not being part of the kernel image. This makes it possible to
      load and unload the headers from memory on demand. A tracing program can
      load the module, do its operations, and then unload the module to save
      kernel memory. The total memory needed is 3.3MB.
      
      By having the archive available at a fixed location independent of
      filesystem dependencies and conventions, all debugging tools can
      directly refer to the fixed location for the archive, without concerning
      with where the headers on a typical filesystem which significantly
      simplifies tooling that needs kernel headers.
      
      The code to read the headers is based on /proc/config.gz code and uses
      the same technique to embed the headers.
      
      Other approaches were discussed such as having an in-memory mountable
      filesystem, but that has drawbacks such as requiring an in-kernel xz
      decompressor which we don't have today, and requiring usage of 42 MB of
      kernel memory to host the decompressed headers at anytime. Also this
      approach is simpler than such approaches.
      Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43d8ce9d
  11. 20 4月, 2019 2 次提交
    • K
      random: move rand_initialize() earlier · d5553523
      Kees Cook 提交于
      Right now rand_initialize() is run as an early_initcall(), but it only
      depends on timekeeping_init() (for mixing ktime_get_real() into the
      pools). However, the call to boot_init_stack_canary() for stack canary
      initialization runs earlier, which triggers a warning at boot:
      
      random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0
      
      Instead, this moves rand_initialize() to after timekeeping_init(), and moves
      canary initialization here as well.
      
      Note that this warning may still remain for machines that do not have
      UEFI RNG support (which initializes the RNG pools during setup_arch()),
      or for x86 machines without RDRAND (or booting without "random.trust=on"
      or CONFIG_RANDOM_TRUST_CPU=y).
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d5553523
    • D
      init: initialize jump labels before command line option parsing · 6041186a
      Dan Williams 提交于
      When a module option, or core kernel argument, toggles a static-key it
      requires jump labels to be initialized early.  While x86, PowerPC, and
      ARM64 arrange for jump_label_init() to be called before parse_args(),
      ARM does not.
      
        Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
        page_alloc_shuffle+0x12c/0x1ac
        static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
        before call to jump_label_init()
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted
        5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
        Hardware name: ARM Integrator/CP (Device Tree)
        [<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
        [<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
        [<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
        [<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
        [<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>]
        (page_alloc_shuffle+0x12c/0x1ac)
        [<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
        [<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
        [<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
      
      Move the fallback call to jump_label_init() to occur before
      parse_args().
      
      The redundant calls to jump_label_init() in other archs are left intact
      in case they have static key toggling use cases that are even earlier
      than option parsing.
      
      Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NGuenter Roeck <groeck@google.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Russell King <rmk@armlinux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6041186a
  12. 19 4月, 2019 1 次提交
  13. 09 4月, 2019 1 次提交
    • S
      treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively · d75f773c
      Sakari Ailus 提交于
      %pF and %pf are functionally equivalent to %pS and %ps conversion
      specifiers. The former are deprecated, therefore switch the current users
      to use the preferred variant.
      
      The changes have been produced by the following command:
      
      	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
      	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
      
      And verifying the result.
      
      Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: xen-devel@lists.xenproject.org
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: linux-mm@kvack.org
      Cc: ceph-devel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: NSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
      Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      d75f773c
  14. 21 3月, 2019 1 次提交
  15. 13 3月, 2019 1 次提交
    • M
      init/main: add checks for the return value of memblock_alloc*() · f5c7310a
      Mike Rapoport 提交于
      Add panic() calls if memblock_alloc() returns NULL.
      
      The panic() format duplicates the one used by memblock itself and in
      order to avoid explosion with long parameters list replace open coded
      allocation size calculations with a local variable.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5c7310a
  16. 08 3月, 2019 1 次提交
  17. 07 3月, 2019 1 次提交
    • A
      time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS · 041a1574
      Arnd Bergmann 提交于
      Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added
      an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but
      this is not always enabled when it is possible to select
      VIRT_CPU_ACCOUNTING_GEN:
      
      WARNING: unmet direct dependencies detected for CONTEXT_TRACKING
        Depends on [n]: GENERIC_CLOCKEVENTS [=n]
        Selected by [y]:
        - VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]
      
      Platforms without GENERIC_CLOCKEVENTS are rare enough so that corner case
      can be just ignored. Make it a dependency for VIRT_CPU_ACCOUNTING_GEN to
      simplify the configuration.
      
      Fixes: a4cffdad ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: "Paul E . McKenney" <paulmck@linux.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de
      041a1574
  18. 06 3月, 2019 1 次提交
    • A
      mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Anshuman Khandual 提交于
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All these places for replacement were found by running the following
      grep patterns on the entire kernel code.  Please let me know if this
      might have missed some instances.  This might also have replaced some
      false positives.  I will appreciate suggestions, inputs and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where invalid node number is
      encoded as -1.  Even though implicitly understood it is always better to
      have macros in there.  Replace these open encodings for an invalid node
      number with the global macro NUMA_NO_NODE.  This helps remove NUMA
      related assumptions like 'invalid node' from various places redirecting
      them to a common definition.
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98fa15f3
  19. 04 3月, 2019 1 次提交
  20. 28 2月, 2019 1 次提交
    • J
      Add io_uring IO interface · 2b188cc1
      Jens Axboe 提交于
      The submission queue (SQ) and completion queue (CQ) rings are shared
      between the application and the kernel. This eliminates the need to
      copy data back and forth to submit and complete IO.
      
      IO submissions use the io_uring_sqe data structure, and completions
      are generated in the form of io_uring_cqe data structures. The SQ
      ring is an index into the io_uring_sqe array, which makes it possible
      to submit a batch of IOs without them being contiguous in the ring.
      The CQ ring is always contiguous, as completion events are inherently
      unordered, and hence any io_uring_cqe entry can point back to an
      arbitrary submission.
      
      Two new system calls are added for this:
      
      io_uring_setup(entries, params)
      	Sets up an io_uring instance for doing async IO. On success,
      	returns a file descriptor that the application can mmap to
      	gain access to the SQ ring, CQ ring, and io_uring_sqes.
      
      io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
      	Initiates IO against the rings mapped to this fd, or waits for
      	them to complete, or both. The behavior is controlled by the
      	parameters passed in. If 'to_submit' is non-zero, then we'll
      	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
      	kernel will wait for 'min_complete' events, if they aren't
      	already available. It's valid to set IORING_ENTER_GETEVENTS
      	and 'min_complete' == 0 at the same time, this allows the
      	kernel to return already completed events without waiting
      	for them. This is useful only for polling, as for IRQ
      	driven IO, the application can just check the CQ ring
      	without entering the kernel.
      
      With this setup, it's possible to do async IO with a single system
      call. Future developments will enable polled IO with this interface,
      and polled submission as well. The latter will enable an application
      to do IO without doing ANY system calls at all.
      
      For IRQ driven IO, an application only needs to enter the kernel for
      completions if it wants to wait for them to occur.
      
      Each io_uring is backed by a workqueue, to support buffered async IO
      as well. We will only punt to an async context if the command would
      need to wait for IO on the device side. Any data that can be accessed
      directly in the page cache is done inline. This avoids the slowness
      issue of usual threadpools, since cached data is accessed as quickly
      as a sync interface.
      
      Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.cReviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2b188cc1
  21. 27 2月, 2019 1 次提交
  22. 22 2月, 2019 1 次提交
  23. 13 2月, 2019 1 次提交
    • Q
      Revert "mm: use early_pfn_to_nid in page_ext_init" · 2f1ee091
      Qian Cai 提交于
      This reverts commit fe53ca54 ("mm: use early_pfn_to_nid in
      page_ext_init").
      
      When booting a system with "page_owner=on",
      
      start_kernel
        page_ext_init
          invoke_init_callbacks
            init_section_page_ext
              init_page_owner
                init_early_allocated_pages
                  init_zones_in_node
                    init_pages_in_zone
                      lookup_page_ext
                        page_to_nid
      
      The issue here is that page_to_nid() will not work since some page flags
      have no node information until later in page_alloc_init_late() due to
      DEFERRED_STRUCT_PAGE_INIT.  Hence, it could trigger an out-of-bounds
      access with an invalid nid.
      
        UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
        index 7 is out of range for type 'zone [5]'
      
      Also, kernel will panic since flags were poisoned earlier with,
      
      CONFIG_DEBUG_VM_PGFLAGS=y
      CONFIG_NODE_NOT_IN_PAGE_FLAGS=n
      
      start_kernel
        setup_arch
          pagetable_init
            paging_init
              sparse_init
                sparse_init_nid
                  memblock_alloc_try_nid_raw
      
      It did not handle it well in init_pages_in_zone() which ends up calling
      page_to_nid().
      
        page:ffffea0004200000 is uninitialized and poisoned
        raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
        raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
        page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
        page_owner info is not active (free page?)
        kernel BUG at include/linux/mm.h:990!
        RIP: 0010:init_page_owner+0x486/0x520
      
      This means that assumptions behind commit fe53ca54 ("mm: use
      early_pfn_to_nid in page_ext_init") are incomplete.  Therefore, revert
      the commit for now.  A proper way to move the page_owner initialization
      to sooner is to hook into memmap initialization.
      
      Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
      Acked-by: NMichal Hocko <mhocko@kernel.org>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f1ee091
  24. 04 2月, 2019 3 次提交
    • E
      sched/core: Convert task_struct.stack_refcount to refcount_t · f0b89d39
      Elena Reshetova 提交于
      atomic_t variables are currently used to implement reference
      counters with the following properties:
      
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable task_struct.stack_refcount is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      ** Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
      in state to be merged to the documentation tree.
      
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For the task_struct.stack_refcount it might make a difference
      in following places:
      
       - try_get_task_stack(): increment in refcount_inc_not_zero() only
         guarantees control dependency on success vs. fully ordered
         atomic counterpart
       - put_task_stack(): decrement in refcount_dec_and_test() only
         provides RELEASE ordering and control dependency on success
         vs. fully ordered atomic counterpart
      Suggested-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
      Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
      Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1547814450-18902-6-git-send-email-elena.reshetova@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f0b89d39
    • E
      sched/core: Convert task_struct.usage to refcount_t · ec1d2819
      Elena Reshetova 提交于
      atomic_t variables are currently used to implement reference
      counters with the following properties:
      
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable task_struct.usage is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      ** Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
      in state to be merged to the documentation tree.
      
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For the task_struct.usage it might make a difference
      in following places:
      
       - put_task_struct(): decrement in refcount_dec_and_test() only
         provides RELEASE ordering and control dependency on success
         vs. fully ordered atomic counterpart
      Suggested-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
      Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
      Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1547814450-18902-5-git-send-email-elena.reshetova@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ec1d2819
    • E
      sched/core: Convert signal_struct.sigcnt to refcount_t · 60d4de3f
      Elena Reshetova 提交于
      atomic_t variables are currently used to implement reference
      counters with the following properties:
      
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable signal_struct.sigcnt is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      ** Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
      in state to be merged to the documentation tree.
      
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For the signal_struct.sigcnt it might make a difference
      in following places:
      
       - put_signal_struct(): decrement in refcount_dec_and_test() only
         provides RELEASE ordering and control dependency on success
         vs. fully ordered atomic counterpart
      Suggested-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
      Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
      Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1547814450-18902-3-git-send-email-elena.reshetova@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      60d4de3f
  25. 02 2月, 2019 2 次提交