1. 22 Jan 2020 (1 commit)
    • genirq, sched/isolation: Isolate from handling managed interrupts · 11ea68f5
      Authored by Ming Lei
      The affinity of managed interrupts is completely handled in the kernel and
      cannot be changed via the /proc/irq/* interfaces from user space. As the
      kernel tries to spread out interrupts evenly across CPUs on x86 to prevent
      vector exhaustion, it can happen that a managed interrupt whose affinity
      mask contains both isolated and housekeeping CPUs is routed to an isolated
      CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
      on the isolated CPU.
      
      Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
      logic in the interrupt affinity selection code.
      
      The sub-parameter indicates to the interrupt affinity selection logic that
      it should try to avoid the above scenario.
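
      For illustration, a boot command line using this flag might look like
      the following (the CPU list is hypothetical; the flag syntax follows
      Documentation/admin-guide/kernel-parameters.txt):

          isolcpus=managed_irq,domain,2-17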
      
      This isolation is best effort and only effective if the automatically
      assigned interrupt mask of a device queue contains isolated and
      housekeeping CPUs. If housekeeping CPUs are online then such interrupts are
      directed to the housekeeping CPU so that IO submitted on the housekeeping
      CPU cannot disturb the isolated CPU.
      
      If a queue's affinity mask contains only isolated CPUs then this parameter
      has no effect on the interrupt routing decision, though interrupts only
      happen when tasks running on those isolated CPUs submit IO. IO submitted
      on housekeeping CPUs has no influence on those queues.
      
      If the affinity mask contains both housekeeping and isolated CPUs, but none
      of the contained housekeeping CPUs is online, then the interrupt is also
      routed to an isolated CPU. Interrupts are only delivered when one of the
      isolated CPUs in the affinity mask submits IO. If one of the contained
      housekeeping CPUs comes online, the CPU hotplug logic migrates the
      interrupt automatically back to the upcoming housekeeping CPU. Depending on
      the type of interrupt controller, this can require that at least one
      interrupt is delivered to the isolated CPU in order to complete the
      migration.
      
      [ tglx: Removed unused parameter, added and edited comments/documentation
        and rephrased the changelog so it contains more details. ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com
  2. 05 Jan 2020 (2 commits)
    • Documentation: riscv: add patch acceptance guidelines · 0e194d9d
      Authored by Paul Walmsley
      Formalize, in kernel documentation, the patch acceptance policy for
      arch/riscv.  In summary, it states that as maintainers, we plan to
      only accept patches for new modules or extensions that have been
      frozen or ratified by the RISC-V Foundation.
      
      We've been following these guidelines for the past few months.  In the
      meantime, we've received quite a bit of feedback that it would be
      helpful to have these guidelines formally documented.
      
      Based on a suggestion from Matthew Wilcox, we also add a link to this
      file to Documentation/process/index.rst, to make this document easier
      to find.  The format of this document has also been changed to align
      to the format outlined in the maintainer entry profiles, in accordance
      with comments from Jon Corbet and Dan Williams.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Krste Asanovic <krste@berkeley.edu>
      Cc: Andrew Waterman <waterman@eecs.berkeley.edu>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
    • kcov: fix struct layout for kcov_remote_arg · a69b83e1
      Authored by Andrey Konovalov
      Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code.
      This makes it more convenient to write userspace apps that can be
      compiled into 32-bit or 64-bit binaries and still work with the same
      64-bit kernel.
      
      Also use proper __u32 types in uapi headers instead of unsigned ints.
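
      For reference, a sketch of the resulting uapi layout (field names as
      described above and in include/uapi/linux/kcov.h; the point is that
      fixed-width __u32/__aligned_u64 members give an identical layout for
      32-bit and 64-bit userspace):

          struct kcov_remote_arg {
                  __u32           trace_mode;     /* KCOV_TRACE_PC or KCOV_TRACE_CMP */
                  __u32           area_size;      /* length of coverage buffer in words */
                  __u32           num_handles;    /* size of handles array */
                  __aligned_u64   common_handle;  /* handle for local background threads */
                  __aligned_u64   handles[0];     /* handles for global background threads */
          };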
      
      Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com
      Fixes: eec028c9 ("kcov: remote coverage support")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
      Cc: "Jacky . Cao @ sony . com" <Jacky.Cao@sony.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 03 Jan 2020 (1 commit)
  4. 31 Dec 2019 (1 commit)
  5. 24 Dec 2019 (2 commits)
  6. 22 Dec 2019 (1 commit)
  7. 21 Dec 2019 (2 commits)
    • kbuild: clarify the difference between obj-y and obj-m w.r.t. descending · 28f94a44
      Authored by Masahiro Yamada
      Kbuild descends into a directory by either 'y' or 'm', but there is an
      important difference.
      
      Kbuild combines the built-in objects into built-in.a in each directory.
      The built-in.a in the directory visited by obj-y is merged into the
      built-in.a in the parent directory. This merge happens recursively
      when Kbuild is ascending back towards the top directory, then built-in
      objects are linked into vmlinux eventually. This works properly only
      when the Makefile specifying obj-y is reachable by the chain of obj-y.
      
      On the other hand, Kbuild does not take built-in.a from the directory
      visited by obj-m. That is, all the objects in that directory are
      supposed to be modular. If Kbuild descends into a directory by obj-m,
      but the Makefile in the sub-directory specifies obj-y, those objects
      are just left orphan.
      
      The current statement "Kbuild only uses this information to decide that
      it needs to visit the directory" is misleading. Clarify the difference.
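
      As a hedged illustration of the pitfall (the directory name "foo/" is
      hypothetical): if a parent Makefile contains

          obj-m += foo/

      then a line such as

          obj-y += bar.o

      in foo/Makefile silently produces an orphan object: bar.o is neither
      linked into vmlinux nor built into any module.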
      Reported-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Johan Hovold <johan@kernel.org>
    • platform/mellanox: fix the mlx-bootctl sysfs · 77dcc95e
      Authored by Liming Sun
      This is a follow-up commit that changes the sysfs attributes from
      DRIVER_ATTR to DEVICE_ATTR, following initial review comments. With
      this change the sysfs path points to the device itself rather than
      to the driver. The ABI document is also updated.
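
      A minimal sketch of the DEVICE_ATTR pattern referred to above (the
      attribute name and value are hypothetical, not the driver's actual
      code):

          static ssize_t lifecycle_state_show(struct device *dev,
                                              struct device_attribute *attr,
                                              char *buf)
          {
                  /* Query the device and format its state for sysfs. */
                  return sprintf(buf, "%s\n", "example-state");
          }
          static DEVICE_ATTR_RO(lifecycle_state);

      An attribute registered this way shows up under the device node
      (/sys/devices/.../lifecycle_state) rather than under the driver
      (/sys/bus/.../drivers/<name>/).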
      
      Fixes: 79e29cb8 ("platform/mellanox: Add bootctl driver for Mellanox BlueField Soc")
      Signed-off-by: Liming Sun <lsun@mellanox.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
  8. 18 Dec 2019 (2 commits)
  9. 13 Dec 2019 (2 commits)
  10. 12 Dec 2019 (2 commits)
  11. 11 Dec 2019 (2 commits)
  12. 10 Dec 2019 (8 commits)
  13. 09 Dec 2019 (2 commits)
  14. 08 Dec 2019 (4 commits)
  15. 05 Dec 2019 (2 commits)
    • kcov: remote coverage support · eec028c9
      Authored by Andrey Konovalov
      Patch series "kcov: collect coverage from usb and vhost", v3.

      This patchset extends kcov to allow collecting coverage from background
      kernel threads.  This extension requires custom annotations for each of
      the places where coverage collection is desired.  This patchset
      implements this for hub events in the USB subsystem and for vhost
      workers.  See the first patch description for details about the kcov
      extension.  The other two patches apply this kcov extension to USB and
      vhost.
      
      Examples of other subsystems that might potentially benefit from this
      when custom annotations are added (the list is based on
      process_one_work() callers for bugs recently reported by syzbot):
      
      1. fs: writeback wb_workfn() worker,
      2. net: addrconf_dad_work()/addrconf_verify_work() workers,
      3. net: neigh_periodic_work() worker,
      4. net/p9: p9_write_work()/p9_read_work() workers,
      5. block: blk_mq_run_work_fn() worker.
      
      These patches have been used to enable coverage-guided USB fuzzing with
      syzkaller for the last few years, see the details here:
      
        https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md
      
      This patchset has been pushed to the public Linux kernel Gerrit
      instance:
      
        https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524
      
      This patch (of 3):
      
      Add background thread coverage collection ability to kcov.
      
      With KCOV_ENABLE coverage is collected only for syscalls that are issued
      from the current process.  With KCOV_REMOTE_ENABLE it's possible to
      collect coverage for arbitrary parts of the kernel code, provided that
      those parts are annotated with kcov_remote_start()/kcov_remote_stop().
      
      This makes it possible to collect coverage from two types of kernel
      background threads: global ones, which are spawned during kernel boot
      in a limited number of instances (e.g. one USB hub_event() worker
      thread is spawned per USB HCD); and local ones, which are spawned when
      a user interacts with some kernel interface (e.g. vhost workers).
      
      To enable collecting coverage from a global background thread, a unique
      global handle must be assigned and passed to the corresponding
      kcov_remote_start() call.  Then a userspace process can pass a list of
      such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field
      of the kcov_remote_arg struct.  This will attach the used kcov device to
      the code sections that are referenced by those handles.
      
      Since there might be many local background threads spawned from
      different userspace processes, we can't use a single global handle per
      annotation.  Instead, the userspace process passes a non-zero handle
      through the common_handle field of the kcov_remote_arg struct.  This
      common handle gets saved to the kcov_handle field in the current
      task_struct and needs to be passed to the newly spawned threads via
      custom annotations.  Those threads should in turn be annotated with
      kcov_remote_start()/kcov_remote_stop().
      
      Internally kcov stores handles as u64 integers.  The top byte of a
      handle is used to denote the id of a subsystem that this handle belongs
      to, and the lower 4 bytes are used to denote the id of a thread instance
      within that subsystem.  A reserved value 0 is used as a subsystem id for
      common handles as they don't belong to a particular subsystem.  The
      bytes 4-7 are currently reserved and must be zero.  In the future the
      number of bytes used for the subsystem or handle ids might be increased.
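
      A sketch of this encoding in C (mirroring the layout just described;
      macro and helper names follow include/linux/kcov.h):

          #define KCOV_SUBSYSTEM_COMMON   (0x00ull << 56)
          #define KCOV_SUBSYSTEM_USB      (0x01ull << 56)
          #define KCOV_SUBSYSTEM_MASK     (0xffull << 56)
          #define KCOV_INSTANCE_MASK      (0xffffffffull)

          static inline u64 kcov_remote_handle(u64 subsys, u64 inst)
          {
                  /* Bytes 4-7 are reserved and must stay zero. */
                  if (subsys & ~KCOV_SUBSYSTEM_MASK || inst & ~KCOV_INSTANCE_MASK)
                          return 0;
                  return subsys | inst;
          }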
      
      When a particular userspace process collects coverage via a common
      handle, kcov will collect coverage for each code section that is
      annotated to use the common handle obtained as kcov_handle from the
      current task_struct. Non-common handles, in contrast, allow coverage
      to be collected selectively from different subsystems.
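
      A hedged userspace sketch of enabling remote coverage (following the
      pattern documented in Documentation/dev-tools/kcov.rst; COVER_SIZE,
      the handle value and the lack of error handling are illustrative):

          #include <fcntl.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <linux/kcov.h>

          #define COVER_SIZE (64 << 10)

          int main(void)
          {
                  /* Hypothetical global handle; see the encoding above. */
                  uint64_t handle = (0x01ull << 56) | 1;
                  int fd = open("/sys/kernel/debug/kcov", O_RDWR);

                  ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
                  unsigned long *cover =
                          mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

                  struct kcov_remote_arg *arg =
                          calloc(1, sizeof(*arg) + sizeof(uint64_t));
                  arg->trace_mode = KCOV_TRACE_PC;
                  arg->area_size = COVER_SIZE;
                  arg->num_handles = 1;
                  arg->handles[0] = handle;
                  ioctl(fd, KCOV_REMOTE_ENABLE, arg);

                  /* ... trigger the annotated kernel code, then read the
                     number of collected PCs from cover[0] ... */
                  return 0;
          }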
      
      Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: David Windsor <dwindsor@gmail.com>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Anders Roxell <anders.roxell@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr · 964975ac
      Authored by Huang Shijie
      Following the kernel naming conventions, rename addr_in_gen_pool to
      gen_pool_has_addr.
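
      For reference, a sketch of the renamed helper's signature (based on
      lib/genalloc.c):

          /* Returns true if the entire [start, start + size) range lies
           * within one of the pool's chunks. */
          bool gen_pool_has_addr(struct gen_pool *pool, unsigned long start,
                                 size_t size);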
      
      [sjhuang@iluvatar.ai: fix Documentation/ too]
       Link: http://lkml.kernel.org/r/20181229015914.5573-1-sjhuang@iluvatar.ai
      Link: http://lkml.kernel.org/r/20181228083950.20398-1-sjhuang@iluvatar.ai
      Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 04 Dec 2019 (1 commit)
    • docs/core-api: Remove possibly confusing sub-headings from Bit Operations · 4f4afc2c
      Authored by Michael Ellerman
      The recent commit 81d2c6f8 ("kasan: support instrumented bitops
      combined with generic bitops") split the KASAN-instrumented bitops
      into separate headers for atomic, non-atomic and locking operations.
      
      This was done to allow arches to include just the instrumented bitops
      they need, while also using some of the generic bitops in
      asm-generic/bitops (which are automatically instrumented). The generic
      bitops are already split into atomic, non-atomic and locking headers.
      
      This split required an update to kernel-api.rst because it included
      include/asm-generic/bitops-instrumented.h, which no longer exists. So
      now kernel-api.rst includes all three instrumented headers to get the
      definitions for all the bitops.
      
      When adding the three headers it seemed sensible to add sub-headings
      for each, i.e. "Atomic", "Non-atomic" and "Locking".
      
      The confusion is that test_bit() is (and always has been) in
      non-atomic.h, but is documented elsewhere (atomic_bitops.txt) as being
      atomic. So having it appear under the "Non-atomic" heading is possibly
      confusing.
      
      Probably test_bit() should move from bitops/non-atomic.h to atomic.h,
      but that has flow-on effects. For now just remove the newly added
      sub-headings in the documentation, so we at least aren't adding to the
      confusion about whether test_bit() is atomic or not.
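
      For context, a hedged sketch of why test_bit() reads as atomic even
      though it lives in the non-atomic header: it is a plain load with no
      read-modify-write cycle, roughly what the generic header provides:

          static inline int test_bit(int nr, const volatile unsigned long *addr)
          {
                  return 1UL & (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG));
          }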
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  17. 02 Dec 2019 (1 commit)
    • kasan: support backing vmalloc space with real shadow memory · 3c5c3cfb
      Authored by Daniel Axtens
      Patch series "kasan: support backing vmalloc space with real shadow
      memory", v11.
      
      Currently, vmalloc space is backed by the early shadow page.  This means
      that kasan is incompatible with VMAP_STACK.
      
      This series provides a mechanism to back vmalloc space with real,
      dynamically allocated memory.  I have only wired up x86, because that's
      the only currently supported arch I can work with easily, but it's very
      easy to wire up other architectures, and it appears that there is some
      work-in-progress code to do this on arm64 and s390.
      
      This has been discussed before in the context of VMAP_STACK:
       - https://bugzilla.kernel.org/show_bug.cgi?id=202009
       - https://lkml.org/lkml/2018/7/22/198
       - https://lkml.org/lkml/2019/7/19/822
      
      In terms of implementation details:
      
      Most mappings in vmalloc space are small, requiring less than a full
      page of shadow space.  Allocating a full shadow page per mapping would
      therefore be wasteful.  Furthermore, to ensure that different mappings
      use different shadow pages, mappings would have to be aligned to
      KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
      
      Instead, share backing space across multiple mappings.  Allocate a
      backing page when a mapping in vmalloc space uses a particular page of
      the shadow region.  This page can be shared by other vmalloc mappings
      later on.
      
      We hook in to the vmap infrastructure to lazily clean up unused shadow
      memory.
      
      Testing with test_vmalloc.sh on an x86 VM with 2 vCPUs shows that:
      
       - Turning on KASAN, inline instrumentation, without vmalloc, introduces
         a 4.1x-4.2x slowdown in vmalloc operations.
      
       - Turning this on introduces the following slowdowns over KASAN:
           * ~1.76x slower single-threaded (test_vmalloc.sh performance)
           * ~2.18x slower when both cpus are performing operations
             simultaneously (test_vmalloc.sh sequential_test_order=1)
      
      This is unfortunate, but given that this is a debug-only feature, it is
      not the end of the world. The benchmarks are also a stress-test for the
      vmalloc subsystem: they're not indicative of an overall 2x slowdown!
      
      This patch (of 4):
      
      Hook into vmalloc and vmap, and dynamically allocate real shadow memory
      to back the mappings.
      
      Most mappings in vmalloc space are small, requiring less than a full
      page of shadow space.  Allocating a full shadow page per mapping would
      therefore be wasteful.  Furthermore, to ensure that different mappings
      use different shadow pages, mappings would have to be aligned to
      KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
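
      As a concrete check of the arithmetic (assuming the usual generic
      KASAN scale of one shadow byte per 8 bytes of memory): a single 4 KiB
      shadow page covers 8 * 4 KiB = 32 KiB of vmalloc address space, so
      many small mappings naturally fall within the same shadow page.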
      
      Instead, share backing space across multiple mappings.  Allocate a
      backing page when a mapping in vmalloc space uses a particular page of
      the shadow region.  This page can be shared by other vmalloc mappings
      later on.
      
      We hook in to the vmap infrastructure to lazily clean up unused shadow
      memory.
      
      To avoid the difficulties around swapping mappings around, this code
      expects that the part of the shadow region that covers the vmalloc space
      will not be covered by the early shadow page, but will be left unmapped.
      This will require changes in arch-specific code.
      
      This allows KASAN with VMAP_STACK, and may be helpful for architectures
      that do not have a separate module space (e.g.  powerpc64, which I am
      currently working on).  It also allows relaxing the module alignment
      back to PAGE_SIZE.
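
      A condensed sketch of the hooking approach (modelled on the patch's
      use of apply_to_page_range(); locking and error paths are elided):

          static int shadow_pte_populate(pte_t *ptep, unsigned long addr,
                                         void *unused)
          {
                  unsigned long page;

                  /* Another mapping may already share this shadow page. */
                  if (!pte_none(*ptep))
                          return 0;

                  page = __get_free_page(GFP_KERNEL);
                  if (!page)
                          return -ENOMEM;

                  /* Poison the fresh shadow page, then install it. */
                  memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
                  set_pte_at(&init_mm, addr, ptep,
                             pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL));
                  return 0;
          }

      applied over the shadow of a new mapping with something like
      apply_to_page_range(&init_mm, shadow_start, shadow_size,
      shadow_pte_populate, NULL).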
      
      Testing with test_vmalloc.sh on an x86 VM with 2 vCPUs shows that:
      
       - Turning on KASAN, inline instrumentation, without vmalloc, introduces
         a 4.1x-4.2x slowdown in vmalloc operations.
      
       - Turning this on introduces the following slowdowns over KASAN:
           * ~1.76x slower single-threaded (test_vmalloc.sh performance)
           * ~2.18x slower when both cpus are performing operations
              simultaneously (test_vmalloc.sh sequential_test_order=1)
      
      This is unfortunate, but given that this is a debug-only feature, it is
      not the end of the world.
      
      The full benchmark results are:
      
      Performance
      
                                    No KASAN      KASAN original x baseline  KASAN vmalloc x baseline    x KASAN
      
      fix_size_alloc_test             662004            11404956      17.23       19144610      28.92       1.68
      full_fit_alloc_test             710950            12029752      16.92       13184651      18.55       1.10
      long_busy_list_alloc_test      9431875            43990172       4.66       82970178       8.80       1.89
      random_size_alloc_test         5033626            23061762       4.58       47158834       9.37       2.04
      fix_align_alloc_test           1252514            15276910      12.20       31266116      24.96       2.05
      random_size_align_alloc_te     1648501            14578321       8.84       25560052      15.51       1.75
      align_shift_alloc_test             147                 830       5.65           5692      38.72       6.86
      pcpu_alloc_test                  80732              125520       1.55         140864       1.74       1.12
      Total Cycles              119240774314        763211341128       6.40  1390338696894      11.66       1.82
      
      Sequential, 2 cpus
      
                                    No KASAN      KASAN original x baseline  KASAN vmalloc x baseline    x KASAN
      
      fix_size_alloc_test            1423150            14276550      10.03       27733022      19.49       1.94
      full_fit_alloc_test            1754219            14722640       8.39       15030786       8.57       1.02
      long_busy_list_alloc_test     11451858            52154973       4.55      107016027       9.34       2.05
      random_size_alloc_test         5989020            26735276       4.46       68885923      11.50       2.58
      fix_align_alloc_test           2050976            20166900       9.83       50491675      24.62       2.50
      random_size_align_alloc_te     2858229            17971700       6.29       38730225      13.55       2.16
      align_shift_alloc_test             405                6428      15.87          26253      64.82       4.08
      pcpu_alloc_test                 127183              151464       1.19         216263       1.70       1.43
      Total Cycles               54181269392        308723699764       5.70   650772566394      12.01       2.11
      fix_size_alloc_test            1420404            14289308      10.06       27790035      19.56       1.94
      full_fit_alloc_test            1736145            14806234       8.53       15274301       8.80       1.03
      long_busy_list_alloc_test     11404638            52270785       4.58      107550254       9.43       2.06
      random_size_alloc_test         6017006            26650625       4.43       68696127      11.42       2.58
      fix_align_alloc_test           2045504            20280985       9.91       50414862      24.65       2.49
      random_size_align_alloc_te     2845338            17931018       6.30       38510276      13.53       2.15
      align_shift_alloc_test             472                3760       7.97           9656      20.46       2.57
      pcpu_alloc_test                 118643              132732       1.12         146504       1.23       1.10
      Total Cycles               54040011688        309102805492       5.72   651325675652      12.05       2.11
      
      [dja@axtens.net: fixups]
        Link: http://lkml.kernel.org/r/20191120052719.7201-1-dja@axtens.net
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
      Link: http://lkml.kernel.org/r/20191031093909.9228-2-dja@axtens.net
      Signed-off-by: Mark Rutland <mark.rutland@arm.com> [shadow rework]
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Co-developed-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Vasily Gorbik <gor@linux.ibm.com>
      Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 01 Dec 2019 (4 commits)