1. 13 11月, 2019 10 次提交
    • T
      cgroup: use cgrp->kn->id as the cgroup ID · 74321038
      Tejun Heo 提交于
      cgroup ID is currently allocated using a dedicated per-hierarchy idr
      and used internally and exposed through tracepoints and bpf.  This is
      confusing because there are tracepoints and other interfaces which use
      the cgroupfs ino as IDs.
      
      The preceding changes made kn->id exposed as ino as 64bit ino on
      supported archs or ino+gen (low 32bits as ino, high gen).  There's no
      reason for cgroup to use different IDs.  The kernfs IDs are unique and
      userland can easily discover them and map them back to paths using
      standard file operations.
      
      This patch replaces cgroup IDs with kernfs IDs.
      
      * cgroup_id() is added and all cgroup ID users are converted to use it.
      
      * kernfs_node creation is moved to earlier during cgroup init so that
        cgroup_id() is available during init.
      
      * While at it, s/cgroup/cgrp/ in psi helpers for consistency.
      
      * Fallback ID value is changed to 1 to be consistent with root cgroup
        ID.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      74321038
    • T
      kernfs: use 64bit inos if ino_t is 64bit · 40430452
      Tejun Heo 提交于
      Each kernfs_node is identified with a 64bit ID.  The low 32bit is
      exposed as ino and the high gen.  While this already allows using inos
      as keys by looking up with wildcard generation number of 0, it's
      adding unnecessary complications for 64bit ino archs which can
      directly use kernfs_node IDs as inos to uniquely identify each cgroup
      instance.
      
      This patch exposes IDs directly as inos on 64bit ino archs.  The
      conversion is mostly straight-forward.
      
      * 32bit ino archs behave the same as before.  64bit ino archs now use
        the whole 64bit ID as ino and the generation number is fixed at 1.
      
      * 64bit inos still use the same idr allocator which gurantees that the
        lower 32bits identify the current live instance uniquely and the
        high 32bits are incremented whenever the low bits wrap.  As the
        upper 32bits are no longer used as gen and we don't wanna start ino
        allocation with 33rd bit set, the initial value for highbits
        allocation is changed to 0 on 64bit ino archs.
      
      * blktrace exposes two 32bit numbers - (INO,GEN) pair - to identify
        the issuing cgroup.  Userland builds FILEID_INO32_GEN fids from
        these numbers to look up the cgroups.  To remain compatible with the
        behavior, always output (LOW32,HIGH32) which will be constructed
        back to the original 64bit ID by __kernfs_fh_to_dentry().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      40430452
    • T
      kernfs: implement custom exportfs ops and fid type · 33c5ac91
      Tejun Heo 提交于
      The current kernfs exportfs implementation uses the generic_fh_*()
      helpers and FILEID_INO32_GEN[_PARENT] which limits ino to 32bits.
      Let's implement custom exportfs operations and fid type to remove the
      restriction.
      
      * FILEID_KERNFS is a single u64 value whose content is
        kernfs_node->id.  This is the only native fid type.
      
      * For backward compatibility with blk_log_action() path which exposes
        (ino,gen) pairs which userland assembles into FILEID_INO32_GEN keys,
        combine the generic keys into 64bit IDs in the same order.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      33c5ac91
    • T
      kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id() · fe0f726c
      Tejun Heo 提交于
      kernfs_find_and_get_node_by_ino() looks the kernfs_node matching the
      specified ino.  On top of that, kernfs_get_node_by_id() and
      kernfs_fh_get_inode() implement full ID matching by testing the rest
      of ID.
      
      On surface, confusingly, the two are slightly different in that the
      latter uses 0 gen as wildcard while the former doesn't - does it mean
      that the latter can't uniquely identify inodes w/ 0 gen?  In practice,
      this is a distinction without a difference because generation number
      starts at 1.  There are no actual IDs with 0 gen, so it can always
      safely used as wildcard.
      
      Let's simplify the code by renaming kernfs_find_and_get_node_by_ino()
      to kernfs_find_and_get_node_by_id(), moving all lookup logics into it,
      and removing now unnecessary kernfs_get_node_by_id().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe0f726c
    • T
      kernfs: convert kernfs_node->id from union kernfs_node_id to u64 · 67c0496e
      Tejun Heo 提交于
      kernfs_node->id is currently a union kernfs_node_id which represents
      either a 32bit (ino, gen) pair or u64 value.  I can't see much value
      in the usage of the union - all that's needed is a 64bit ID which the
      current code is already limited to.  Using a union makes the code
      unnecessarily complicated and prevents using 64bit ino without adding
      practical benefits.
      
      This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
      ino is stored in the lower 32bits and gen upper.  Accessors -
      kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
      ino and gen.  This simplifies ID handling less cumbersome and will
      allow using 64bit inos on supported archs.
      
      This patch doesn't make any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      67c0496e
    • T
      kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes · 880df131
      Tejun Heo 提交于
      kernfs node can be created in two separate steps - allocation and
      activation.  This is used to make kernfs nodes visible only after the
      internal states attached to the node are fully initialized.
      kernfs_find_and_get_node_by_id() currently allows lookups of nodes
      which aren't activated yet and thus can expose nodes are which are
      still being prepped by kernfs users.
      
      Fix it by disallowing lookups of nodes which aren't activated yet.
      
      kernfs_find_and_get_node_by_ino()
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      880df131
    • T
      kernfs: use dumber locking for kernfs_find_and_get_node_by_ino() · b680b081
      Tejun Heo 提交于
      kernfs_find_and_get_node_by_ino() uses RCU protection.  It's currently
      a bit buggy because it can look up a node which hasn't been activated
      yet and thus may end up exposing a node that the kernfs user is still
      prepping.
      
      While it can be fixed by pushing it further in the current direction,
      it's already complicated and isn't clear whether the complexity is
      justified.  The main use of kernfs_find_and_get_node_by_ino() is for
      exportfs operations.  They aren't super hot and all the follow-up
      operations (e.g. mapping to path) use normal locking anyway.
      
      Let's switch to a dumber locking scheme and protect the lookup with
      kernfs_idr_lock.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      b680b081
    • T
      netprio: use css ID instead of cgroup ID · db53c73a
      Tejun Heo 提交于
      netprio uses cgroup ID to index the priority mapping table.  This is
      currently okay as cgroup IDs are allocated using idr and packed.
      However, cgroup IDs will be changed to use full 64bit range and won't
      be packed making this impractical.  netprio doesn't care what type of
      IDs it uses as long as they can identify the controller instances and
      are packed.  Let's switch to css IDs instead of cgroup IDs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Namhyung Kim <namhyung@kernel.org>
      db53c73a
    • T
      writeback: use ino_t for inodes in tracepoints · f05499a0
      Tejun Heo 提交于
      Writeback TPs currently use mix of 32 and 64bits for inos.  This isn't
      currently broken because only cgroup inos are using 32bits and they're
      limited to 32bits.  cgroup inos will make use of 64bits.  Let's
      uniformly use ino_t.
      
      While at it, switch the default cgroup ino value used when cgroup is
      disabled to 1 instead of -1U as root cgroup always uses ino 1.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Namhyung Kim <namhyung@kernel.org>
      f05499a0
    • T
      kernfs: fix ino wrap-around detection · e23f568a
      Tejun Heo 提交于
      When the 32bit ino wraps around, kernfs increments the generation
      number to distinguish reused ino instances.  The wrap-around detection
      tests whether the allocated ino is lower than what the cursor but the
      cursor is pointing to the next ino to allocate so the condition never
      triggers.
      
      Fix it by remembering the last ino and comparing against that.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Fixes: 4a3ef68a ("kernfs: implement i_generation")
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org # v4.14+
      e23f568a
  2. 12 11月, 2019 1 次提交
  3. 07 11月, 2019 2 次提交
    • H
      cgroup: freezer: don't change task and cgroups status unnecessarily · 742e8cd3
      Honglei Wang 提交于
      It's not necessary to adjust the task state and revisit the state
      of source and destination cgroups if the cgroups are not in freeze
      state and the task itself is not frozen.
      
      And in this scenario, it wakes up the task who's not supposed to be
      ready to run.
      
      Don't do the unnecessary task state adjustment can help stop waking
      up the task without a reason.
      Signed-off-by: NHonglei Wang <honglei.wang@oracle.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      742e8cd3
    • T
      cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency · 1bb5ec2e
      Tejun Heo 提交于
      cgroup->bstat_pending is used to determine the base stat delta to
      propagate to the parent.  While correct, this is different from how
      percpu delta is determined for no good reason and the inconsistency
      makes the code more difficult to understand.
      
      This patch makes parent propagation delta calculation use the same
      method as percpu to global propagation.
      
      * cgroup_base_stat_accumulate() is renamed to cgroup_base_stat_add()
        and cgroup_base_stat_sub() is added.
      
      * percpu propagation calculation is updated to use the above helpers.
      
      * cgroup->bstat_pending is replaced with cgroup->last_bstat and
        updated to use the same calculation as percpu propagation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      1bb5ec2e
  4. 25 10月, 2019 2 次提交
  5. 07 10月, 2019 9 次提交
    • M
      selftests: cgroup: Run test_core under interfering stress · 1a99fcc0
      Michal Koutný 提交于
      test_core tests various cgroup creation/removal and task migration
      paths. Run the tests repeatedly with interfering noise (for lockdep
      checks). Currently, forking noise and subsystem enabled/disabled
      switching are the implemented noises.
      Signed-off-by: NMichal Koutný <mkoutny@suse.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      1a99fcc0
    • M
      selftests: cgroup: Add task migration tests · 11318989
      Michal Koutný 提交于
      Add two new tests that verify that thread and threadgroup migrations
      work as expected.
      Signed-off-by: NMichal Koutný <mkoutny@suse.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      11318989
    • M
      selftests: cgroup: Simplify task self migration · 58c9f75b
      Michal Koutný 提交于
      Simplify task migration by being oblivious about its PID during
      migration. This allows to easily migrate individual threads as well.
      This change brings no functional change and prepares grounds for thread
      granularity migrating tests.
      Signed-off-by: NMichal Koutný <mkoutny@suse.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      58c9f75b
    • M
      cgroup: Optimize single thread migration · 9a3284fa
      Michal Koutný 提交于
      There are reports of users who use thread migrations between cgroups and
      they report performance drop after d59cfc09 ("sched, cgroup: replace
      signal_struct->group_rwsem with a global percpu_rwsem"). The effect is
      pronounced on machines with more CPUs.
      
      The migration is affected by forking noise happening in the background,
      after the mentioned commit a migrating thread must wait for all
      (forking) processes on the system, not only of its threadgroup.
      
      There are several places that need to synchronize with migration:
      	a) do_exit,
      	b) de_thread,
      	c) copy_process,
      	d) cgroup_update_dfl_csses,
      	e) parallel migration (cgroup_{proc,thread}s_write).
      
      In the case of self-migrating thread, we relax the synchronization on
      cgroup_threadgroup_rwsem to avoid the cost of waiting. d) and e) are
      excluded with cgroup_mutex, c) does not matter in case of single thread
      migration and the executing thread cannot exec(2) or exit(2) while it is
      writing into cgroup.threads. In case of do_exit because of signal
      delivery, we either exit before the migration or finish the migration
      (of not yet PF_EXITING thread) and die afterwards.
      
      This patch handles only the case of self-migration by writing "0" into
      cgroup.threads. For simplicity, we always take cgroup_threadgroup_rwsem
      with numeric PIDs.
      
      This change improves migration dependent workload performance similar
      to per-signal_struct state.
      Signed-off-by: NMichal Koutný <mkoutny@suse.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9a3284fa
    • M
      cgroup: Update comments about task exit path · e7c7b1d8
      Michal Koutný 提交于
      We no longer take cgroup_mutex in cgroup_exit and the exiting tasks are
      not moved to init_css_set, reflect that in several comments to prevent
      confusion.
      Signed-off-by: NMichal Koutný <mkoutny@suse.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      e7c7b1d8
    • M
      cgroup: short-circuit current_cgns_cgroup_from_root() on the default hierarchy · 61e867fd
      Miaohe Lin 提交于
      Like commit 13d82fb7 ("cgroup: short-circuit cset_cgroup_from_root() on
      the default hierarchy"), short-circuit current_cgns_cgroup_from_root() on
      the default hierarchy.
      Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      61e867fd
    • L
      Linux 5.4-rc2 · da0c9ea1
      Linus Torvalds 提交于
      da0c9ea1
    • L
      elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings · b212921b
      Linus Torvalds 提交于
      In commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") we
      changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
      executable mappings.
      
      Then, people reported that it broke some binaries that had overlapping
      segments from the same file, and commit ad55eac7 ("elf: enforce
      MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
      overlaying elf segment cases.  But only some - despite the summary line
      of that commit, it only did it when it also does a temporary brk vma for
      one obvious overlapping case.
      
      Now Russell King reports another overlapping case with old 32-bit x86
      binaries, which doesn't trigger that limited case.  End result: we had
      better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
      
      Yes, it's a sign of old binaries generated with old tool-chains, but we
      do pride ourselves on not breaking existing setups.
      
      This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
      and the old load_elf_library() use-cases, because nobody has reported
      breakage for those. Yet.
      
      Note that in all the cases seen so far, the overlapping elf sections
      seem to be just re-mapping of the same executable with different section
      attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
      flag or similar, which acts like NOREPLACE, but allows just remapping
      the same executable file using different protection flags.
      
      It's not clear that would make a huge difference to anything, but if
      people really hate that "elf remaps over previous maps" behavior, maybe
      at least a more limited form of remapping would alleviate some concerns.
      
      Alternatively, we should take a look at our elf_map() logic to see if we
      end up not mapping things properly the first time.
      
      In the meantime, this is the minimal "don't do that then" patch while
      people hopefully think about it more.
      Reported-by: NRussell King <linux@armlinux.org.uk>
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Fixes: ad55eac7 ("elf: enforce  MAP_FIXED on overlaying elf segments")
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b212921b
    • L
      Merge tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping · 7cdb85df
      Linus Torvalds 提交于
      Pull dma-mapping regression fix from Christoph Hellwig:
       "Revert an incorret hunk from a patch that caused problems on various
        arm boards (Andrey Smirnov)"
      
      * tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: fix false positive warnings in dma_common_free_remap()
      7cdb85df
  6. 06 10月, 2019 6 次提交
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 43b815c6
      Linus Torvalds 提交于
      Pull ARM SoC fixes from Olof Johansson:
       "A few fixes this time around:
      
         - Fixup of some clock specifications for DRA7 (device-tree fix)
      
         - Removal of some dead/legacy CPU OPP/PM code for OMAP that throws
           warnings at boot
      
         - A few more minor fixups for OMAPs, most around display
      
         - Enable STM32 QSPI as =y since their rootfs sometimes comes from
           there
      
         - Switch CONFIG_REMOTEPROC to =y since it went from tristate to bool
      
         - Fix of thermal zone definition for ux500 (5.4 regression)"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        ARM: multi_v7_defconfig: Fix SPI_STM32_QSPI support
        ARM: dts: ux500: Fix up the CPU thermal zone
        arm64/ARM: configs: Change CONFIG_REMOTEPROC from m to y
        ARM: dts: am4372: Set memory bandwidth limit for DISPC
        ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
        ARM: OMAP2+: Add missing LCDC midlemode for am335x
        ARM: OMAP2+: Fix missing reset done flag for am3 and am43
        ARM: dts: Fix gpio0 flags for am335x-icev2
        ARM: omap2plus_defconfig: Enable more droid4 devices as loadable modules
        ARM: omap2plus_defconfig: Enable DRM_TI_TFP410
        DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
        ARM: dts: Fix wrong clocks for dra7 mcasp
        clk: ti: dra7: Fix mcasp8 clock bits
      43b815c6
    • L
      Merge tag 'kbuild-fixes-v5.4' of... · 2d00aee2
      Linus Torvalds 提交于
      Merge tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - remove unneeded ar-option and KBUILD_ARFLAGS
      
       - remove long-deprecated SUBDIRS
      
       - fix modpost to suppress false-positive warnings for UML builds
      
       - fix namespace.pl to handle relative paths to ${objtree}, ${srctree}
      
       - make setlocalversion work for /bin/sh
      
       - make header archive reproducible
      
       - fix some Makefiles and documents
      
      * tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kheaders: make headers archive reproducible
        kbuild: update compile-test header list for v5.4-rc2
        kbuild: two minor updates for Documentation/kbuild/modules.rst
        scripts/setlocalversion: clear local variable to make it work for sh
        namespace: fix namespace.pl script to support relative paths
        video/logo: do not generate unneeded logo C files
        video/logo: remove unneeded *.o pattern from clean-files
        integrity: remove pointless subdir-$(CONFIG_...)
        integrity: remove unneeded, broken attempt to add -fshort-wchar
        modpost: fix static EXPORT_SYMBOL warnings for UML build
        kbuild: correct formatting of header in kbuild module docs
        kbuild: remove SUBDIRS support
        kbuild: remove ar-option and KBUILD_ARFLAGS
      2d00aee2
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 126195c9
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "Twelve patches mostly small but obvious fixes or cosmetic but small
        updates"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix Nport ID display value
        scsi: qla2xxx: Fix N2N link up fail
        scsi: qla2xxx: Fix N2N link reset
        scsi: qla2xxx: Optimize NPIV tear down process
        scsi: qla2xxx: Fix stale mem access on driver unload
        scsi: qla2xxx: Fix unbound sleep in fcport delete path.
        scsi: qla2xxx: Silence fwdump template message
        scsi: hisi_sas: Make three functions static
        scsi: megaraid: disable device when probe failed after enabled device
        scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
        scsi: qedf: Remove always false 'tmp_prio < 0' statement
        scsi: ufs: skip shutdown if hba is not powered
        scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF
      126195c9
    • L
      Merge branch 'readdir' (readdir speedup and sanity checking) · 4f11918a
      Linus Torvalds 提交于
      This makes getdents() and getdents64() do sanity checking on the
      pathname that it gives to user space.  And to mitigate the performance
      impact of that, it first cleans up the way it does the user copying, so
      that the code avoids doing the SMAP/PAN updates between each part of the
      dirent structure write.
      
      I really wanted to do this during the merge window, but didn't have
      time.  The conversion of filldir to unsafe_put_user() is something I've
      had around for years now in a private branch, but the extra pathname
      checking finally made me clean it up to the point where it is mergable.
      
      It's worth noting that the filename validity checking really should be a
      bit smarter: it would be much better to delay the error reporting until
      the end of the readdir, so that non-corrupted filenames are still
      returned.  But that involves bigger changes, so let's see if anybody
      actually hits the corrupt directory entry case before worrying about it
      further.
      
      * branch 'readdir':
        Make filldir[64]() verify the directory entry filename is valid
        Convert filldir[64]() from __put_user() to unsafe_put_user()
      4f11918a
    • L
      Make filldir[64]() verify the directory entry filename is valid · 8a23eb80
      Linus Torvalds 提交于
      This has been discussed several times, and now filesystem people are
      talking about doing it individually at the filesystem layer, so head
      that off at the pass and just do it in getdents{64}().
      
      This is partially based on a patch by Jann Horn, but checks for NUL
      bytes as well, and somewhat simplified.
      
      There's also commentary about how it might be better if invalid names
      due to filesystem corruption don't cause an immediate failure, but only
      an error at the end of the readdir(), so that people can still see the
      filenames that are ok.
      
      There's also been discussion about just how much POSIX strictly speaking
      requires this since it's about filesystem corruption.  It's really more
      "protect user space from bad behavior" as pointed out by Jann.  But
      since Eric Biederman looked up the POSIX wording, here it is for context:
      
       "From readdir:
      
         The readdir() function shall return a pointer to a structure
         representing the directory entry at the current position in the
         directory stream specified by the argument dirp, and position the
         directory stream at the next entry. It shall return a null pointer
         upon reaching the end of the directory stream. The structure dirent
         defined in the <dirent.h> header describes a directory entry.
      
        From definitions:
      
         3.129 Directory Entry (or Link)
      
         An object that associates a filename with a file. Several directory
         entries can associate names with the same file.
      
        ...
      
         3.169 Filename
      
         A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
         characters composing the name may be selected from the set of all
         character values excluding the slash character and the null byte. The
         filenames dot and dot-dot have special meaning. A filename is
         sometimes referred to as a 'pathname component'."
      
      Note that I didn't bother adding the checks to any legacy interfaces
      that nobody uses.
      
      Also note that if this ends up being noticeable as a performance
      regression, we can fix that to do a much more optimized model that
      checks for both NUL and '/' at the same time one word at a time.
      
      We haven't really tended to optimize 'memchr()', and it only checks for
      one pattern at a time anyway, and we really _should_ check for NUL too
      (but see the comment about "soft errors" in the code about why it
      currently only checks for '/')
      
      See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
      lookup code looks for pathname terminating characters in parallel.
      
      Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a23eb80
    • L
      Convert filldir[64]() from __put_user() to unsafe_put_user() · 9f79b78e
      Linus Torvalds 提交于
      We really should avoid the "__{get,put}_user()" functions entirely,
      because they can easily be mis-used and the original intent of being
      used for simple direct user accesses no longer holds in a post-SMAP/PAN
      world.
      
      Manually optimizing away the user access range check makes no sense any
      more, when the range check is generally much cheaper than the "enable
      user accesses" code that the __{get,put}_user() functions still need.
      
      So instead of __put_user(), use the unsafe_put_user() interface with
      user_access_{begin,end}() that really does generate better code these
      days, and which is generally a nicer interface.  Under some loads, the
      multiple user writes that filldir() does are actually quite noticeable.
      
      This also makes the dirent name copy use unsafe_put_user() with a couple
      of macros.  We do not want to make function calls with SMAP/PAN
      disabled, and the code this generates is quite good when the
      architecture uses "asm goto" for unsafe_put_user() like x86 does.
      
      Note that this doesn't bother with the legacy cases.  Nobody should use
      them anyway, so performance doesn't really matter there.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9f79b78e
  7. 05 10月, 2019 10 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9819a30c
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix ieeeu02154 atusb driver use-after-free, from Johan Hovold.
      
       2) Need to validate TCA_CBQ_WRROPT netlink attributes, from Eric
          Dumazet.
      
       3) txq null deref in mac80211, from Miaoqing Pan.
      
       4) ionic driver needs to select NET_DEVLINK, from Arnd Bergmann.
      
       5) Need to disable bh during nft_connlimit GC, from Pablo Neira Ayuso.
      
       6) Avoid division by zero in taprio scheduler, from Vladimir Oltean.
      
       7) Various xgmac fixes in stmmac driver from Jose Abreu.
      
       8) Avoid 64-bit division in mlx5 leading to link errors on 32-bit from
          Michal Kubecek.
      
       9) Fix bad VLAN check in rtl8366 DSA driver, from Linus Walleij.
      
      10) Fix sleep while atomic in sja1105, from Vladimir Oltean.
      
      11) Suspend/resume deadlock in stmmac, from Thierry Reding.
      
      12) Various UDP GSO fixes from Josh Hunt.
      
      13) Fix slab out of bounds access in tcp_zerocopy_receive(), from Eric
          Dumazet.
      
      14) Fix OOPS in __ipv6_ifa_notify(), from David Ahern.
      
      15) Memory leak in NFC's llcp_sock_bind, from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
        selftests/net: add nettest to .gitignore
        net: qlogic: Fix memory leak in ql_alloc_large_buffers
        nfc: fix memory leak in llcp_sock_bind()
        sch_dsmark: fix potential NULL deref in dsmark_init()
        net: phy: at803x: use operating parameters from PHY-specific status
        net: phy: extract pause mode
        net: phy: extract link partner advertisement reading
        net: phy: fix write to mii-ctrl1000 register
        ipv6: Handle missing host route in __ipv6_ifa_notify
        net: phy: allow for reset line to be tied to a sleepy GPIO controller
        net: ipv4: avoid mixed n_redirects and rate_tokens usage
        r8152: Set macpassthru in reset_resume callback
        cxgb4:Fix out-of-bounds MSI-X info array access
        Revert "ipv6: Handle race in addrconf_dad_work"
        net: make sock_prot_memory_pressure() return "const char *"
        rxrpc: Fix rxrpc_recvmsg tracepoint
        qmi_wwan: add support for Cinterion CLS8 devices
        tcp: fix slab-out-of-bounds in tcp_zerocopy_receive()
        lib: textsearch: fix escapes in example code
        udp: only do GSO if # of segs > 1
        ...
      9819a30c
    • L
      Merge tag 's390-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 6fe137cb
      Linus Torvalds 提交于
      Pull s390 fixes from Vasily Gorbik:
      
       - defconfig updates
      
       - Fix build errors with CC_OPTIMIZE_FOR_SIZE due to usage of "i"
         constraint for function arguments. Two kvm changes acked-by Christian
         Borntraeger.
      
       - Fix -Wunused-but-set-variable warnings in mm code.
      
       - Avoid a constant misuse in qdio.
      
       - Handle a case when cpumf is temporarily unavailable.
      
      * tag 's390-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        KVM: s390: mark __insn32_query() as __always_inline
        KVM: s390: fix __insn32_query() inline assembly
        s390: update defconfigs
        s390/pci: mark function(s) __always_inline
        s390/mm: mark function(s) __always_inline
        s390/jump_label: mark function(s) __always_inline
        s390/cpu_mf: mark function(s) __always_inline
        s390/atomic,bitops: mark function(s) __always_inline
        s390/mm: fix -Wunused-but-set-variable warnings
        s390: mark __cpacf_query() as __always_inline
        s390/qdio: clarify size of the QIB parm area
        s390/cpumf: Fix indentation in sampling device driver
        s390/cpumsf: Check for CPU Measurement sampling
        s390/cpumf: Use consistant debug print format
      6fe137cb
    • H
      KVM: s390: mark __insn32_query() as __always_inline · d0dea733
      Heiko Carstens 提交于
      __insn32_query() will not compile if the compiler decides to not
      inline it, since it contains an inline assembly with an "i" constraint
      with variable contents.
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
      d0dea733
    • H
      KVM: s390: fix __insn32_query() inline assembly · b1c41ac3
      Heiko Carstens 提交于
      The inline assembly constraints of __insn32_query() tell the compiler
      that only the first byte of "query" is being written to. Intended was
      probably that 32 bytes are written to.
      
      Fix and simplify the code and just use a "memory" clobber.
      
      Fixes: d6681397 ("KVM: s390: provide query function for instructions returning 32 byte")
      Cc: stable@vger.kernel.org # v5.2+
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
      b1c41ac3
    • A
      dma-mapping: fix false positivse warnings in dma_common_free_remap() · 2cf2aa6a
      Andrey Smirnov 提交于
      Commit 5cf45379 ("dma-mapping: introduce a dma_common_find_pages
      helper") changed invalid input check in dma_common_free_remap() from:
      
          if (!area || !area->flags != VM_DMA_COHERENT)
      
      to
      
          if (!area || !area->flags != VM_DMA_COHERENT || !area->pages)
      
      which seem to produce false positives for memory obtained via
      dma_common_contiguous_remap()
      
      This triggers the following warning message when doing "reboot" on ZII
      VF610 Dev Board Rev B:
      
      WARNING: CPU: 0 PID: 1 at kernel/dma/remap.c:112 dma_common_free_remap+0x88/0x8c
      trying to free invalid coherent area: 9ef82980
      Modules linked in:
      CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.3.0-rc6-next-20190820 #119
      Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
      Backtrace:
      [<8010d1ec>] (dump_backtrace) from [<8010d588>] (show_stack+0x20/0x24)
       r7:8015ed78 r6:00000009 r5:00000000 r4:9f4d9b14
      [<8010d568>] (show_stack) from [<8077e3f0>] (dump_stack+0x24/0x28)
      [<8077e3cc>] (dump_stack) from [<801197a0>] (__warn.part.3+0xcc/0xe4)
      [<801196d4>] (__warn.part.3) from [<80119830>] (warn_slowpath_fmt+0x78/0x94)
       r6:00000070 r5:808e540c r4:81c03048
      [<801197bc>] (warn_slowpath_fmt) from [<8015ed78>] (dma_common_free_remap+0x88/0x8c)
       r3:9ef82980 r2:808e53e0
       r7:00001000 r6:a0b1e000 r5:a0b1e000 r4:00001000
      [<8015ecf0>] (dma_common_free_remap) from [<8010fa9c>] (remap_allocator_free+0x60/0x68)
       r5:81c03048 r4:9f4d9b78
      [<8010fa3c>] (remap_allocator_free) from [<801100d0>] (__arm_dma_free.constprop.3+0xf8/0x148)
       r5:81c03048 r4:9ef82900
      [<8010ffd8>] (__arm_dma_free.constprop.3) from [<80110144>] (arm_dma_free+0x24/0x2c)
       r5:9f563410 r4:80110120
      [<80110120>] (arm_dma_free) from [<8015d80c>] (dma_free_attrs+0xa0/0xdc)
      [<8015d76c>] (dma_free_attrs) from [<8020f3e4>] (dma_pool_destroy+0xc0/0x154)
       r8:9efa8860 r7:808f02f0 r6:808f02d0 r5:9ef82880 r4:9ef82780
      [<8020f324>] (dma_pool_destroy) from [<805525d0>] (ehci_mem_cleanup+0x6c/0x150)
       r7:9f563410 r6:9efa8810 r5:00000000 r4:9efd0148
      [<80552564>] (ehci_mem_cleanup) from [<80558e0c>] (ehci_stop+0xac/0xc0)
       r5:9efd0148 r4:9efd0000
      [<80558d60>] (ehci_stop) from [<8053c4bc>] (usb_remove_hcd+0xf4/0x1b0)
       r7:9f563410 r6:9efd0074 r5:81c03048 r4:9efd0000
      [<8053c3c8>] (usb_remove_hcd) from [<8056361c>] (host_stop+0x48/0xb8)
       r7:9f563410 r6:9efd0000 r5:9f5f4040 r4:9f5f5040
      [<805635d4>] (host_stop) from [<80563d0c>] (ci_hdrc_host_destroy+0x34/0x38)
       r7:9f563410 r6:9f5f5040 r5:9efa8800 r4:9f5f4040
      [<80563cd8>] (ci_hdrc_host_destroy) from [<8055ef18>] (ci_hdrc_remove+0x50/0x10c)
      [<8055eec8>] (ci_hdrc_remove) from [<804a2ed8>] (platform_drv_remove+0x34/0x4c)
       r7:9f563410 r6:81c4f99c r5:9efa8810 r4:9efa8810
      [<804a2ea4>] (platform_drv_remove) from [<804a18a8>] (device_release_driver_internal+0xec/0x19c)
       r5:00000000 r4:9efa8810
      [<804a17bc>] (device_release_driver_internal) from [<804a1978>] (device_release_driver+0x20/0x24)
       r7:9f563410 r6:81c41ed0 r5:9efa8810 r4:9f4a1dac
      [<804a1958>] (device_release_driver) from [<804a01b8>] (bus_remove_device+0xdc/0x108)
      [<804a00dc>] (bus_remove_device) from [<8049c204>] (device_del+0x150/0x36c)
       r7:9f563410 r6:81c03048 r5:9efa8854 r4:9efa8810
      [<8049c0b4>] (device_del) from [<804a3368>] (platform_device_del.part.2+0x20/0x84)
       r10:9f563414 r9:809177e0 r8:81cb07dc r7:81c78320 r6:9f563454 r5:9efa8800
       r4:9efa8800
      [<804a3348>] (platform_device_del.part.2) from [<804a3420>] (platform_device_unregister+0x28/0x34)
       r5:9f563400 r4:9efa8800
      [<804a33f8>] (platform_device_unregister) from [<8055dce0>] (ci_hdrc_remove_device+0x1c/0x30)
       r5:9f563400 r4:00000001
      [<8055dcc4>] (ci_hdrc_remove_device) from [<805652ac>] (ci_hdrc_imx_remove+0x38/0x118)
       r7:81c78320 r6:9f563454 r5:9f563410 r4:9f541010
      [<8056538c>] (ci_hdrc_imx_shutdown) from [<804a2970>] (platform_drv_shutdown+0x2c/0x30)
      [<804a2944>] (platform_drv_shutdown) from [<8049e4fc>] (device_shutdown+0x158/0x1f0)
      [<8049e3a4>] (device_shutdown) from [<8013ac80>] (kernel_restart_prepare+0x44/0x48)
       r10:00000058 r9:9f4d8000 r8:fee1dead r7:379ce700 r6:81c0b280 r5:81c03048
       r4:00000000
      [<8013ac3c>] (kernel_restart_prepare) from [<8013ad14>] (kernel_restart+0x1c/0x60)
      [<8013acf8>] (kernel_restart) from [<8013af84>] (__do_sys_reboot+0xe0/0x1d8)
       r5:81c03048 r4:00000000
      [<8013aea4>] (__do_sys_reboot) from [<8013b0ec>] (sys_reboot+0x18/0x1c)
       r8:80101204 r7:00000058 r6:00000000 r5:00000000 r4:00000000
      [<8013b0d4>] (sys_reboot) from [<80101000>] (ret_fast_syscall+0x0/0x54)
      Exception stack(0x9f4d9fa8 to 0x9f4d9ff0)
      9fa0:                   00000000 00000000 fee1dead 28121969 01234567 379ce700
      9fc0: 00000000 00000000 00000000 00000058 00000000 00000000 00000000 00016d04
      9fe0: 00028e0c 7ec87c64 000135ec 76c1f410
      
      Restore original invalid input check in dma_common_free_remap() to
      avoid this problem.
      
      Fixes: 5cf45379 ("dma-mapping: introduce a dma_common_find_pages helper")
      Signed-off-by: NAndrey Smirnov <andrew.smirnov@gmail.com>
      [hch: just revert the offending hunk instead of creating a new helper]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      2cf2aa6a
    • D
      kheaders: make headers archive reproducible · 86cdd2fd
      Dmitry Goldin 提交于
      In commit 43d8ce9d ("Provide in-kernel headers to make
      extending kernel easier") a new mechanism was introduced, for kernels
      >=5.2, which embeds the kernel headers in the kernel image or a module
      and exposes them in procfs for use by userland tools.
      
      The archive containing the header files has nondeterminism caused by
      header files metadata. This patch normalizes the metadata and utilizes
      KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
      default behaviour.
      
      In commit f7b101d3 ("kheaders: Move from proc to sysfs") it was
      modified to use sysfs and the script for generation of the archive was
      renamed to what is being patched.
      Signed-off-by: NDmitry Goldin <dgoldin+lkml@protonmail.ch>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      86cdd2fd
    • M
      kbuild: update compile-test header list for v5.4-rc2 · d188b8c9
      Masahiro Yamada 提交于
      Commit 6dc280eb ("coda: remove uapi/linux/coda_psdev.h") removed
      a header in question. Some more build errors were fixed. Add more
      headers into the test coverage.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      d188b8c9
    • M
      kbuild: two minor updates for Documentation/kbuild/modules.rst · 43496709
      Masahiro Yamada 提交于
      Capitalize the first word in the sentence.
      
      Use obj-m instead of obj-y. obj-y still works, but we have no built-in
      objects in external module builds. So, obj-m is better IMHO.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      43496709
    • M
      scripts/setlocalversion: clear local variable to make it work for sh · 7a82e3fa
      Masahiro Yamada 提交于
      Geert Uytterhoeven reports a strange side-effect of commit 858805b3
      ("kbuild: add $(BASH) to run scripts with bash-extension"), which
      inserts the contents of a localversion file in the build directory twice.
      
      [Steps to Reproduce]
        $ echo bar > localversion
        $ mkdir build
        $ cd build/
        $ echo foo > localversion
        $ make -s -f ../Makefile defconfig include/config/kernel.release
        $ cat include/config/kernel.release
        5.4.0-rc1foofoobar
      
      This comes down to the behavior change of local variables.
      
      The 'man sh' on my Ubuntu machine, where sh is an alias to dash,
      explains as follows:
        When a variable is made local, it inherits the initial value and
        exported and readonly flags from the variable with the same name
        in the surrounding scope, if there is one. Otherwise, the variable
        is initially unset.
      
      [Test Code]
      
        foo ()
        {
                local res
                echo "res: $res"
        }
      
        res=1
        foo
      
      [Result]
      
        $ sh test.sh
        res: 1
        $ bash test.sh
        res:
      
      So, scripts/setlocalversion correctly works only for bash in spite of
      its hashbang being #!/bin/sh. Nobody had noticed it before because
      CONFIG_SHELL was previously set to bash almost all the time.
      
      Now that CONFIG_SHELL is set to sh, we must write portable and correct
      code. I gave the Fixes tag to the commit that uncovered the issue.
      
      Clear the variable 'res' in collect_files() to make it work for sh
      (and it also works on distributions where sh is an alias to bash).
      
      Fixes: 858805b3 ("kbuild: add $(BASH) to run scripts with bash-extension")
      Reported-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      7a82e3fa
    • J
      namespace: fix namespace.pl script to support relative paths · 82fdd12b
      Jacob Keller 提交于
      The namespace.pl script does not work properly if objtree is not set to
      an absolute path. The do_nm function is run from within the find
      function, which changes directories.
      
      Because of this, appending objtree, $File::Find::dir, and $source, will
      return a path which is not valid from the current directory.
      
      This used to work when objtree was set to an absolute path when using
      "make namespacecheck". It appears to have not worked when calling
      ./scripts/namespace.pl directly.
      
      This behavior was changed in 7e1c0477 ("kbuild: Use relative path
      for $(objtree)", 2014-05-14)
      
      Rather than fixing the Makefile to set objtree to an absolute path, just
      fix namespace.pl to work when srctree and objtree are relative. Also fix
      the script to use an absolute path for these by default.
      
      Use the File::Spec module for this purpose. It's been part of perl
      5 since 5.005.
      
      The curdir() function is used to get the current directory when the
      objtree and srctree aren't set in the environment.
      
      rel2abs() is used to convert possibly relative objtree and srctree
      environment variables to absolute paths.
      
      Finally, the catfile() function is used instead of string appending
      paths together, since this is more robust when joining paths together.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Acked-by: NRandy Dunlap <rdunlap@infradead.org>
      Tested-by: NRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      82fdd12b