1. 02 9月, 2017 1 次提交
    • S
      Introduce v3 namespaced file capabilities · 8db6c34f
      Serge E. Hallyn 提交于
      Root in a non-initial user ns cannot be trusted to write a traditional
      security.capability xattr.  If it were allowed to do so, then any
      unprivileged user on the host could map his own uid to root in a private
      namespace, write the xattr, and execute the file with privilege on the
      host.
      
      However supporting file capabilities in a user namespace is very
      desirable.  Not doing so means that any programs designed to run with
      limited privilege must continue to support other methods of gaining and
      dropping privilege.  For instance a program installer must detect
      whether file capabilities can be assigned, and assign them if so but set
      setuid-root otherwise.  The program in turn must know how to drop
      partial capabilities, and do so only if setuid-root.
      
      This patch introduces v3 of the security.capability xattr.  It builds a
      vfs_ns_cap_data struct by appending a uid_t rootid to struct
      vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
      namespace which mounted the filesystem, usually init_user_ns) of the
      root id in whose namespaces the file capabilities may take effect.
      
      When a task asks to write a v2 security.capability xattr, if it is
      privileged with respect to the userns which mounted the filesystem, then
      nothing should change.  Otherwise, the kernel will transparently rewrite
      the xattr as a v3 with the appropriate rootid.  This is done during the
      execution of setxattr() to catch user-space-initiated capability writes.
      Subsequently, any task executing the file which has the noted kuid as
      its root uid, or which is in a descendent user_ns of such a user_ns,
      will run the file with capabilities.
      
      Similarly when asking to read file capabilities, a v3 capability will
      be presented as v2 if it applies to the caller's namespace.
      
      If a task writes a v3 security.capability, then it can provide a uid for
      the xattr so long as the uid is valid in its own user namespace, and it
      is privileged with CAP_SETFCAP over its namespace.  The kernel will
      translate that rootid to an absolute uid, and write that to disk.  After
      this, a task in the writer's namespace will not be able to use those
      capabilities (unless rootid was 0), but a task in a namespace where the
      given uid is root will.
      
      Only a single security.capability xattr may exist at a time for a given
      file.  A task may overwrite an existing xattr so long as it is
      privileged over the inode.  Note this is a departure from previous
      semantics, which required privilege to remove a security.capability
      xattr.  This check can be re-added if deemed useful.
      
      This allows a simple setxattr to work, allows tar/untar to work, and
      allows us to tar in one namespace and untar in another while preserving
      the capability, without risking leaking privilege into a parent
      namespace.
      
      Example using tar:
      
       $ cp /bin/sleep sleepx
       $ mkdir b1 b2
       $ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
       $ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
       $ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
       $ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
       $ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
         b2/sleepx = cap_sys_admin+ep
       # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
         v3 xattr, rootid is 100001
      
      A patch to linux-test-project adding a new set of tests for this
      functionality is in the nsfscaps branch at github.com/hallyn/ltp
      
      Changelog:
         Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
         Nov 07 2016: convert rootid from and to fs user_ns
         (From ebiederm: mar 28 2017)
           commoncap.c: fix typos - s/v4/v3
           get_vfs_caps_from_disk: clarify the fs_ns root access check
           nsfscaps: change the code split for cap_inode_setxattr()
         Apr 09 2017:
             don't return v3 cap for caps owned by current root.
            return a v2 cap for a true v2 cap in non-init ns
         Apr 18 2017:
            . Change the flow of fscap writing to support s_user_ns writing.
            . Remove refuse_fcap_overwrite().  The value of the previous
              xattr doesn't matter.
         Apr 24 2017:
            . incorporate Eric's incremental diff
            . move cap_convert_nscap to setxattr and simplify its usage
         May 8, 2017:
            . fix leaking dentry refcount in cap_inode_getsecurity
      Signed-off-by: NSerge Hallyn <serge@hallyn.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      8db6c34f
  2. 25 7月, 2017 3 次提交
    • E
      signal: Fix sending signals with siginfo · 64a76d0d
      Eric W. Biederman 提交于
      Today sending a signal with rt_sigqueueinfo and receving it on
      a signalfd does not work reliably.  The issue is that reading
      a signalfd instead of returning a siginfo returns a signalfd_siginfo and
      the kernel must convert from one to the other.
      
      The kernel does not currently have the code to deduce which union
      members of struct siginfo are in use.
      
      In this patchset I fix that by introducing a new function siginfo_layout
      that can look at a siginfo and report which union member of struct
      siginfo is in use.  Before that I clean up how we populate struct
      siginfo.
      
      The siginfo structure has two key members si_signo and si_code.  Some
      si_codes are signal specific and for those it takes si_signo and si_code
      to indicate the members of siginfo that are valid.  The rest of the
      si_code values are signal independent like SI_USER, SI_KERNEL, SI_QUEUE,
      and SI_TIMER and only si_code is needed to indicate which members of
      siginfo are valid.
      
      At least that is how POSIX documents them, and how common sense would
      indicate they should function.  In practice we have been rather sloppy
      about maintaining the ABI in linux and we have some exceptions.  We have
      a couple of buggy architectures that make SI_USER mean something
      different when combined with SIGFPE or SIGTRAP.  Worse we have
      fcntl(F_SETSIG) which results in the si_codes POLL_IN, POLL_OUT,
      POLL_MSG, POLL_ERR, POLL_PRI, POLL_HUP being sent with any arbitrary
      signal, while the values are in a range that overlaps the signal
      specific si_codes.
      
      Thankfully the ambiguous cases with the POLL_NNN si_codes are for
      things no sane persion would do that so we can rectify the situtation.
      AKA no one cares so we won't cause a regression fixing it.
      
      As part of fixing this I stop leaking the __SI_xxxx codes to userspace
      and stop storing them in the high 16bits of si_code.  Making the kernel
      code fundamentally simpler.  We have already confirmed that the one
      application that would see this difference in kernel behavior CRIU won't
      be affected by this change as it copies values verbatim from one kernel
      interface to another.
      
      v3:
         - Corrected the patches so they bisect properly
      v2:
         - Benchmarked the code to confirm no performance changes are visible.
         - Reworked the first couple of patches so that TRAP_FIXME and
           FPE_FIXME are not exported to userspace.
         - Rebased on top of the siginfo cleanup that came in v4.13-rc1
         - Updated alpha to use both TRAP_FIXME and FPE_FIXME
      
      Eric W. Biederman (7):
            signal/alpha: Document a conflict with SI_USER for SIGTRAP
            signal/ia64: Document a conflict with SI_USER with SIGFPE
            signal/sparc: Document a conflict with SI_USER with SIGFPE
            signal/mips: Document a conflict with SI_USER with SIGFPE
            signal/testing: Don't look for __SI_FAULT in userspace
            fcntl: Don't use ambiguous SIG_POLL si_codes
            signal: Remove kernel interal si_code magic
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      64a76d0d
    • E
      signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman 提交于
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user.  As si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Escept where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations has the
      interesting property that several of them perviously should never have
      worked as the __SI_ values they depended up where kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
    • E
      fcntl: Don't use ambiguous SIG_POLL si_codes · d08477aa
      Eric W. Biederman 提交于
      We have a weird and problematic intersection of features that when
      they all come together result in ambiguous siginfo values, that
      we can not support properly.
      
      - Supporting fcntl(F_SETSIG,...) with arbitrary valid signals.
      
      - Using positive values for POLL_IN, POLL_OUT, POLL_MSG, ..., etc
        that imply they are signal specific si_codes and using the
        aforementioned arbitrary signal to deliver them.
      
      - Supporting injection of arbitrary siginfo values for debugging and
        checkpoint/restore.
      
      The result is that just looking at siginfo si_codes of 1 to 6 are
      ambigious.  It could either be a signal specific si_code or it could
      be a generic si_code.
      
      For most of the kernel this is a non-issue but for sending signals
      with siginfo it is impossible to play back the kernel signals and
      get the same result.
      
      Strictly speaking when the si_code was changed from SI_SIGIO to
      POLL_IN and friends between 2.2 and 2.4 this functionality was not
      ambiguous, as only real time signals were supported.  Before 2.4 was
      released the kernel began supporting siginfo with non realtime signals
      so they could give details of why the signal was sent.
      
      The result is that if F_SETSIG is set to one of the signals with signal
      specific si_codes then user space can not know why the signal was sent.
      
      I grepped through a bunch of userspace programs using debian code
      search to get a feel for how often people choose a signal that results
      in an ambiguous si_code.  I only found one program doing so and it was
      using SIGCHLD to test the F_SETSIG functionality, and did not appear
      to be a real world usage.
      
      Therefore the ambiguity does not appears to be a real world problem in
      practice.  Remove the ambiguity while introducing the smallest chance
      of breakage by changing the si_code to SI_SIGIO when signals with
      signal specific si_codes are targeted.
      
      Fixes: v2.3.40 -- Added support for queueing non-rt signals
      Fixes: v2.3.21 -- Changed the si_code from SI_SIGIO
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d08477aa
  3. 20 7月, 2017 8 次提交
    • K
      prctl: Allow local CAP_SYS_ADMIN changing exe_file · 4d28df61
      Kirill Tkhai 提交于
      During checkpointing and restore of userspace tasks
      we bumped into the situation, that it's not possible
      to restore the tasks, which user namespace does not
      have uid 0 or gid 0 mapped.
      
      People create user namespace mappings like they want,
      and there is no a limitation on obligatory uid and gid
      "must be mapped". So, if there is no uid 0 or gid 0
      in the mapping, it's impossible to restore mm->exe_file
      of the processes belonging to this user namespace.
      
      Also, there is no a workaround. It's impossible
      to create a temporary uid/gid mapping, because
      only one write to /proc/[pid]/uid_map and gid_map
      is allowed during a namespace lifetime.
      If there is an entry, then no more mapings can't be
      written. If there isn't an entry, we can't write
      there too, otherwise user task won't be able
      to do that in the future.
      
      The patch changes the check, and looks for CAP_SYS_ADMIN
      instead of zero uid and gid. This allows to restore
      a task independently of its user namespace mappings.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Serge Hallyn <serge@hallyn.com>
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: Michal Hocko <mhocko@suse.com>
      CC: Andrei Vagin <avagin@openvz.org>
      CC: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
      CC: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      4d28df61
    • K
      security: Use user_namespace::level to avoid redundant iterations in cap_capable() · 64db4c7f
      Kirill Tkhai 提交于
      When ns->level is not larger then cred->user_ns->level,
      then ns can't be cred->user_ns's descendant, and
      there is no a sense to search in parents.
      
      So, break the cycle earlier and skip needless iterations.
      
      v2: Change comment on suggested by Andy Lutomirski.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      64db4c7f
    • E
      userns,pidns: Verify the userns for new pid namespaces · a2b42626
      Eric W. Biederman 提交于
      It is pointless and confusing to allow a pid namespace hierarchy and
      the user namespace hierarchy to get out of sync.  The owner of a child
      pid namespace should be the owner of the parent pid namespace or
      a descendant of the owner of the parent pid namespace.
      
      Otherwise it is possible to construct scenarios where a process has a
      capability over a parent pid namespace but does not have the
      capability over a child pid namespace.  Which confusingly makes
      permission checks non-transitive.
      
      It requires use of setns into a pid namespace (but not into a user
      namespace) to create such a scenario.
      
      Add the function in_userns to help in making this determination.
      
      v2: Optimized in_userns by using level as suggested
          by: Kirill Tkhai <ktkhai@virtuozzo.com>
      
      Ref: 49f4d8b9 ("pidns: Capture the user namespace and filter ns_last_pid")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      a2b42626
    • E
      signal/testing: Don't look for __SI_FAULT in userspace · d12fe87e
      Eric W. Biederman 提交于
      Fix the debug print statements in these tests where they reference
      si_codes and in particular __SI_FAULT.  __SI_FAULT is a kernel
      internal value and should never be seen by userspace.
      
      While I am in there also fix si_code_str.  si_codes are an enumeration
      there are not a bitmap so == and not & is the apropriate operation to
      test for an si_code.
      
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Fixes: 5f23f6d0 ("x86/pkeys: Add self-tests")
      Fixes: e754aedc ("x86/mpx, selftests: Add MPX self test")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d12fe87e
    • E
      signal/mips: Document a conflict with SI_USER with SIGFPE · ea1b75cf
      Eric W. Biederman 提交于
      Setting si_code to __SI_FAULT results in a userspace seeing
      an si_code of 0.  This is the same si_code as SI_USER.  Posix
      and common sense requires that SI_USER not be a signal specific
      si_code.  As such this use of 0 for the si_code is a pretty
      horribly broken ABI.
      
      This use of of __SI_FAULT is only a decade old.  Which compared
      to the other pieces of kernel code that has made this mistake
      is almost yesterday.
      
      This is probably worth fixing but I don't know mips well enough
      to know what si_code to would be the proper one to use.
      
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Ref: 948a34cf ("[MIPS] Maintain si_code field properly for FP exceptions")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      ea1b75cf
    • E
      signal/sparc: Document a conflict with SI_USER with SIGFPE · cc9f72e4
      Eric W. Biederman 提交于
      Setting si_code to __SI_FAULT results in a userspace seeing
      an si_code of 0.  This is the same si_code as SI_USER.  Posix
      and common sense requires that SI_USER not be a signal specific
      si_code.  As such this use of 0 for the si_code is a pretty
      horribly broken ABI.
      
      This was introduced in 2.3.41 so this mess has had a long time for
      people to be able to start depending on it.
      
      As this bug has existed for 17 years already I don't know if it is
      worth fixing.  It is definitely worth documenting what is going
      on so that no one decides to copy this bad decision.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc9f72e4
    • E
      signal/ia64: Document a conflict with SI_USER with SIGFPE · 80dce5e3
      Eric W. Biederman 提交于
      Setting si_code to __SI_FAULT results in a userspace seeing
      an si_code of 0.  This is the same si_code as SI_USER.  Posix
      and common sense requires that SI_USER not be a signal specific
      si_code.  As such this use of 0 for the si_code is a pretty
      horribly broken ABI.
      
      Given that ia64 is on it's last legs I don't know that it is worth
      fixing this, but it is worth documenting what is going on so that
      no one decides to copy this bad decision.
      
      This was introduced in 2.3.51 so this mess has had a long time for
      people to be able to start depending on it.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: linux-ia64@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      80dce5e3
    • E
      signal/alpha: Document a conflict with SI_USER for SIGTRAP · e2bd64d9
      Eric W. Biederman 提交于
      Setting si_code to __SI_FAULT results in a userspace seeing
      an si_code of 0.  This is the same si_code as SI_USER.  Posix
      and common sense requires that SI_USER not be a signal specific
      si_code.  As such this use of 0 for the si_code is a pretty
      horribly broken ABI.
      
      Given that alpha is on it's last legs I don't know that it is worth
      fixing this, but it is worth documenting what is going on so that
      no one decides to copy this bad decision.
      
      This was introduced during the 2.5 development cycle so this
      mess has had a long time for people to be able to depend upon it.
      
      v2: Added FPE_FIXME for alpha as Helge Deller <deller@gmx.de> pointed out
          with his alternate patch one of the cases is SIGFPE not SIGTRAP.
      
      Cc: Helge Deller <deller@gmx.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: linux-alpha@vger.kernel.org
      Acked-by: NRichard Henderson <rth@twiddle.net>
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Ref: 0a635c7a84cf ("Fill in siginfo_t.")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e2bd64d9
  4. 16 7月, 2017 13 次提交
    • L
      Linux v4.13-rc1 · 5771a8c0
      Linus Torvalds 提交于
      5771a8c0
    • L
      Merge tag 'standardize-docs' of git://git.lwn.net/linux · 486088bc
      Linus Torvalds 提交于
      Pull documentation format standardization from Jonathan Corbet:
       "This series converts a number of top-level documents to the RST format
        without incorporating them into the Sphinx tree. The hope is to bring
        some uniformity to kernel documentation and, perhaps more importantly,
        have our existing docs serve as an example of the desired formatting
        for those that will be added later.
      
        Mauro has gone through and fixed up a lot of top-level documentation
        files to make them conform to the RST format, but without moving or
        renaming them in any way. This will help when we incorporate the ones
        we want to keep into the Sphinx doctree, but the real purpose is to
        bring a bit of uniformity to our documentation and let the top-level
        docs serve as examples for those writing new ones"
      
      * tag 'standardize-docs' of git://git.lwn.net/linux: (84 commits)
        docs: kprobes.txt: Fix whitespacing
        tee.txt: standardize document format
        cgroup-v2.txt: standardize document format
        dell_rbu.txt: standardize document format
        zorro.txt: standardize document format
        xz.txt: standardize document format
        xillybus.txt: standardize document format
        vfio.txt: standardize document format
        vfio-mediated-device.txt: standardize document format
        unaligned-memory-access.txt: standardize document format
        this_cpu_ops.txt: standardize document format
        svga.txt: standardize document format
        static-keys.txt: standardize document format
        smsc_ece1099.txt: standardize document format
        SM501.txt: standardize document format
        siphash.txt: standardize document format
        sgi-ioc4.txt: standardize document format
        SAK.txt: standardize document format
        rpmsg.txt: standardize document format
        robust-futexes.txt: standardize document format
        ...
      486088bc
    • L
      Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 52f6c588
      Linus Torvalds 提交于
      Pull random updates from Ted Ts'o:
       "Add wait_for_random_bytes() and get_random_*_wait() functions so that
        callers can more safely get random bytes if they can block until the
        CRNG is initialized.
      
        Also print a warning if get_random_*() is called before the CRNG is
        initialized. By default, only one single-line warning will be printed
        per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
        warning will be printed for each function which tries to get random
        bytes before the CRNG is initialized. This can get spammy for certain
        architecture types, so it is not enabled by default"
      
      * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        random: reorder READ_ONCE() in get_random_uXX
        random: suppress spammy warnings about unseeded randomness
        random: warn when kernel uses unseeded randomness
        net/route: use get_random_int for random counter
        net/neighbor: use get_random_u32 for 32-bit hash random
        rhashtable: use get_random_u32 for hash_rnd
        ceph: ensure RNG is seeded before using
        iscsi: ensure RNG is seeded before use
        cifs: use get_random_u32 for 32-bit lock random
        random: add get_random_{bytes,u32,u64,int,long,once}_wait family
        random: add wait_for_random_bytes() API
      52f6c588
    • L
      Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 78dcf734
      Linus Torvalds 提交于
      Pull ->s_options removal from Al Viro:
       "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
        gets moved to explicit ->show_options(), killing ->s_options off +
        some cosmetic bits around fs/namespace.c and friends. Basically, the
        stuff needed to work with fsmount series with minimum of conflicts
        with other work.
      
        It's not strictly required for this merge window, but it would reduce
        the PITA during the coming cycle, so it would be nice to have those
        bits and pieces out of the way"
      
      * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        isofs: Fix isofs_show_options()
        VFS: Kill off s_options and helpers
        orangefs: Implement show_options
        9p: Implement show_options
        isofs: Implement show_options
        afs: Implement show_options
        affs: Implement show_options
        befs: Implement show_options
        spufs: Implement show_options
        bpf: Implement show_options
        ramfs: Implement show_options
        pstore: Implement show_options
        omfs: Implement show_options
        hugetlbfs: Implement show_options
        VFS: Don't use save/replace_mount_options if not using generic_show_options
        VFS: Provide empty name qstr
        VFS: Make get_filesystem() return the affected filesystem
        VFS: Clean up whitespace in fs/namespace.c and fs/super.c
        Provide a function to create a NUL-terminated string from unterminated data
      78dcf734
    • L
      Merge branch 'work.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 93ff8185
      Linus Torvalds 提交于
      Pull more __copy_.._user elimination from Al Viro.
      
      * 'work.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        drm_dp_aux_dev: switch to read_iter/write_iter
      93ff8185
    • L
      Merge branch 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 89cbec71
      Linus Torvalds 提交于
      Pull uacess-unaligned removal from Al Viro:
       "That stuff had just one user, and an exotic one, at that - binfmt_flat
        on arm and m68k"
      
      * 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        kill {__,}{get,put}_user_unaligned()
        binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
      89cbec71
    • L
      Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 2173bd06
      Linus Torvalds 提交于
      Pull network field-by-field copy-in updates from Al Viro:
       "This part of the misc compat queue was held back for review from
        networking folks and since davem has jus ACKed those..."
      
      * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        get_compat_bpf_fprog(): don't copyin field-by-field
        get_compat_msghdr(): get rid of field-by-field copyin
        copy_msghdr_from_user(): get rid of field-by-field copyin
      2173bd06
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 568d135d
      Linus Torvalds 提交于
      Pull MIPS updates from Ralf Baechle:
       "Boston platform support:
         - Document DT bindings
         - Add CLK driver for board clocks
      
        CM:
         - Avoid per-core locking with CM3 & higher
         - WARN on attempt to lock invalid VP, not BUG
      
        CPS:
         - Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6
         - Prevent multi-core with dcache aliasing
         - Handle cores not powering down more gracefully
         - Handle spurious VP starts more gracefully
      
        DSP:
         - Add lwx & lhx missaligned access support
      
        eBPF:
         - Add MIPS support along with many supporting change to add the
           required infrastructure
      
        Generic arch code:
         - Misc sysmips MIPS_ATOMIC_SET fixes
         - Drop duplicate HAVE_SYSCALL_TRACEPOINTS
         - Negate error syscall return in trace
         - Correct forced syscall errors
         - Traced negative syscalls should return -ENOSYS
         - Allow samples/bpf/tracex5 to access syscall arguments for sane
           traces
         - Cleanup from old Kconfig options in defconfigs
         - Fix PREF instruction usage by memcpy for MIPS R6
         - Fix various special cases in the FPU eulation
         - Fix some special cases in MIPS16e2 support
         - Fix MIPS I ISA /proc/cpuinfo reporting
         - Sort MIPS Kconfig alphabetically
         - Fix minimum alignment requirement of IRQ stack as required by
           ABI / GCC
         - Fix special cases in the module loader
         - Perform post-DMA cache flushes on systems with MAARs
         - Probe the I6500 CPU
         - Cleanup cmpxchg and add support for 1 and 2 byte operations
         - Use queued read/write locks (qrwlock)
         - Use queued spinlocks (qspinlock)
         - Add CPU shared FTLB feature detection
         - Handle tlbex-tlbp race condition
         - Allow storing pgd in C0_CONTEXT for MIPSr6
         - Use current_cpu_type() in m4kc_tlbp_war()
         - Support Boston in the generic kernel
      
        Generic platform:
         - yamon-dt: Pull YAMON DT shim code out of SEAD-3 board
         - yamon-dt: Support > 256MB of RAM
         - yamon-dt: Use serial* rather than uart* aliases
         - Abstract FDT fixup application
         - Set RTC_ALWAYS_BCD to 0
         - Add a MAINTAINERS entry
      
        core kernel:
         - qspinlock.c: include linux/prefetch.h
      
        Loongson 3:
         - Add support
      
        Perf:
         - Add I6500 support
      
        SEAD-3:
         - Remove GIC timer from DT
         - Set interrupt-parent per-device, not at root node
         - Fix GIC interrupt specifiers
      
        SMP:
         - Skip IPI setup if we only have a single CPU
      
        VDSO:
         - Make comment match reality
         - Improvements to time code in VDSO"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (86 commits)
        locking/qspinlock: Include linux/prefetch.h
        MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
        MIPS: Fix minimum alignment requirement of IRQ stack
        MIPS: generic: Support MIPS Boston development boards
        MIPS: DTS: img: Don't attempt to build-in all .dtb files
        clk: boston: Add a driver for MIPS Boston board clocks
        dt-bindings: Document img,boston-clock binding
        MIPS: Traced negative syscalls should return -ENOSYS
        MIPS: Correct forced syscall errors
        MIPS: Negate error syscall return in trace
        MIPS: Drop duplicate HAVE_SYSCALL_TRACEPOINTS select
        MIPS16e2: Provide feature overrides for non-MIPS16 systems
        MIPS: MIPS16e2: Report ASE presence in /proc/cpuinfo
        MIPS: MIPS16e2: Subdecode extended LWSP/SWSP instructions
        MIPS: MIPS16e2: Identify ASE presence
        MIPS: VDSO: Fix a mismatch between comment and preprocessor constant
        MIPS: VDSO: Add implementation of gettimeofday() fallback
        MIPS: VDSO: Add implementation of clock_gettime() fallback
        MIPS: VDSO: Fix conversions in do_monotonic()/do_monotonic_coarse()
        MIPS: Use current_cpu_type() in m4kc_tlbp_war()
        ...
      568d135d
    • L
      Merge branch 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 4ecd4ff5
      Linus Torvalds 提交于
      Pull UML updates from Richard Weinberger:
       "Mostly fixes for UML:
      
         - First round of fixes for PTRACE_GETRESET/SETREGSET
      
         - A printf vs printk cleanup
      
         - Minor improvements"
      
      * 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Correctly check for PTRACE_GETRESET/SETREGSET
        um: v2: Use generic NOTES macro
        um: Add kerneldoc for userspace_tramp() and start_userspace()
        um: Add kerneldoc for segv_handler
        um: stub-data.h: remove superfluous include
        um: userspace - be more verbose in ptrace set regs error
        um: add dummy ioremap and iounmap functions
        um: Allow building and running on older hosts
        um: Avoid longjmp/setjmp symbol clashes with libpthread.a
        um: console: Ignore console= option
        um: Use os_warn to print out pre-boot warning/error messages
        um: Add os_warn() for pre-boot warning/error messages
        um: Use os_info for the messages on normal path
        um: Add os_info() for pre-boot information messages
        um: Use printk instead of printf in make_uml_dir
      4ecd4ff5
    • L
      Merge tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs · 966859b9
      Linus Torvalds 提交于
      Pull UBIFS updates from Richard Weinberger:
      
       - Updates and fixes for the file encryption mode
      
       - Minor improvements
      
       - Random fixes
      
      * tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs:
        ubifs: Set double hash cookie also for RENAME_EXCHANGE
        ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs
        ubifs: Don't leak kernel memory to the MTD
        ubifs: Change gfp flags in page allocation for bulk read
        ubifs: Fix oops when remounting with no_bulk_read.
        ubifs: Fail commit if TNC is obviously inconsistent
        ubifs: allow userspace to map mounts to volumes
        ubifs: Wire-up statx() support
        ubifs: Remove dead code from ubifs_get_link()
        ubifs: Massage debug prints wrt. fscrypt
        ubifs: Add assert to dent_key_init()
        ubifs: Fix unlink code wrt. double hash lookups
        ubifs: Fix data node size for truncating uncompressed nodes
        ubifs: Don't encrypt special files on creation
        ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename
        ubifs: Fix inode data budget in ubifs_mknod
        ubifs: Correctly evict xattr inodes
        ubifs: Unexport ubifs_inode_slab
        ubifs: don't bother checking for encryption key in ->mmap()
        ubifs: require key for truncate(2) of encrypted file
      966859b9
    • L
      Merge tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm · e37a07e0
      Linus Torvalds 提交于
      Pull more KVM updates from Radim Krčmář:
       "Second batch of KVM updates for v4.13
      
        Common:
         - add uevents for VM creation/destruction
         - annotate and properly access RCU-protected objects
      
        s390:
         - rename IOCTL added in the first v4.13 merge
      
        x86:
         - emulate VMLOAD VMSAVE feature in SVM
         - support paravirtual asynchronous page fault while nested
         - add Hyper-V userspace interfaces for better migration
         - improve master clock corner cases
         - extend internal error reporting after EPT misconfig
         - correct single-stepping of emulated instructions in SVM
         - handle MCE during VM entry
         - fix nVMX VM entry checks and nVMX VMCS shadowing"
      
      * tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
        kvm: x86: hyperv: make VP_INDEX managed by userspace
        KVM: async_pf: Let guest support delivery of async_pf from guest mode
        KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
        KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
        KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
        kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
        KVM: x86: make backwards_tsc_observed a per-VM variable
        KVM: trigger uevents when creating or destroying a VM
        KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
        KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
        KVM: SVM: Rename lbr_ctl field in the vmcb control area
        KVM: SVM: Prepare for new bit definition in lbr_ctl
        KVM: SVM: handle singlestep exception when skipping emulated instructions
        KVM: x86: take slots_lock in kvm_free_pit
        KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
        kvm: vmx: Properly handle machine check during VM-entry
        KVM: x86: update master clock before computing kvmclock_offset
        kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
        kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
        kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
        ...
      e37a07e0
    • S
      random: reorder READ_ONCE() in get_random_uXX · 72e5c740
      Sebastian Andrzej Siewior 提交于
      Avoid the READ_ONCE in commit 4a072c71 ("random: silence compiler
      warnings and fix race") if we can leave the function after
      arch_get_random_XXX().
      
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      72e5c740
    • T
      random: suppress spammy warnings about unseeded randomness · eecabf56
      Theodore Ts'o 提交于
      Unfortunately, on some models of some architectures getting a fully
      seeded CRNG is extremely difficult, and so this can result in dmesg
      getting spammed for a surprisingly long time.  This is really bad from
      a security perspective, and so architecture maintainers really need to
      do what they can to get the CRNG seeded sooner after the system is
      booted.  However, users can't do anything actionble to address this,
      and spamming the kernel messages log will only just annoy people.
      
      For developers who want to work on improving this situation,
      CONFIG_WARN_UNSEEDED_RANDOM has been renamed to
      CONFIG_WARN_ALL_UNSEEDED_RANDOM.  By default the kernel will always
      print the first use of unseeded randomness.  This way, hopefully the
      security obsessed will be happy that there is _some_ indication when
      the kernel boots there may be a potential issue with that architecture
      or subarchitecture.  To see all uses of unseeded randomness,
      developers can enable CONFIG_WARN_ALL_UNSEEDED_RANDOM.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      eecabf56
  5. 15 7月, 2017 15 次提交
    • L
      Merge tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · a80099a1
      Linus Torvalds 提交于
      Pull XFS fixes from Darrick Wong:
       "Largely debugging and regression fixes.
      
         - Add some locking assertions for the _ilock helpers.
      
         - Revert the XFS_QMOPT_NOLOCK patch; after discussion with hch the
           online fsck patch that would have needed it has been redesigned and
           no longer needs it.
      
         - Fix behavioral regression of SEEK_HOLE/DATA with negative offsets
           to match 4.12-era XFS behavior"
      
      * tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        vfs: in iomap seek_{hole,data}, return -ENXIO for negative offsets
        Revert "xfs: grab dquots without taking the ilock"
        xfs: assert locking precondition in xfs_readlink_bmap_ilocked
        xfs: assert locking precondіtion in xfs_attr_list_int_ilocked
        xfs: fixup xfs_attr_get_ilocked
      a80099a1
    • L
      Merge branch 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · bc243704
      Linus Torvalds 提交于
      Pull btrfs fixes from David Sterba:
       "We've identified and fixed a silent corruption (introduced by code in
        the first pull), a fixup after the blk_status_t merge and two fixes to
        incremental send that Filipe has been hunting for some time"
      
      * 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix unexpected return value of bio_readpage_error
        btrfs: btrfs_create_repair_bio never fails, skip error handling
        btrfs: cloned bios must not be iterated by bio_for_each_segment_all
        Btrfs: fix write corruption due to bio cloning on raid5/6
        Btrfs: incremental send, fix invalid memory access
        Btrfs: incremental send, fix invalid path for link commands
      bc243704
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 0ffff118
      Linus Torvalds 提交于
      Pull a few more input updates from Dmitry Torokhov:
      
       - multi-touch handling for Xen
      
       - fix for long-standing bug causing crashes in i8042 on boot
      
       - change to gpio_keys to better handle key presses during system state
         transition
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: i8042 - fix crash at boot time
        Input: gpio_keys - handle the missing key press event in resume phase
        Input: xen-kbdfront - add multi-touch support
      0ffff118
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · dcf903d0
      Linus Torvalds 提交于
      Pull crypto fixes from Herbert Xu:
      
       - fix new compiler warnings in cavium
      
       - set post-op IV properly in caam (this fixes chaining)
      
       - fix potential use-after-free in atmel in case of EBUSY
      
       - fix sleeping in softirq path in chcr
      
       - disable buggy sha1-avx2 driver (may overread and page fault)
      
       - fix use-after-free on signals in caam
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: cavium - make several functions static
        crypto: chcr - Avoid algo allocation in softirq.
        crypto: caam - properly set IV after {en,de}crypt
        crypto: atmel - only treat EBUSY as transient if backlog
        crypto: af_alg - Avoid sock_graft call warning
        crypto: caam - fix signals handling
        crypto: sha1-ssse3 - Disable avx2
      dcf903d0
    • L
      Merge tag 'devprop-fix-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 96d0d831
      Linus Torvalds 提交于
      Pull device properties framework fix from Rafael Wysocki:
       "This fixes a problem with bool properties that could be seen as "true"
        when the property was not present at all by adding a special helper
        for bool properties with checks for all of the requisute conditions
        (Sakari Ailus)"
      
      * tag 'devprop-fix-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        device property: Introduce fwnode_call_bool_op() for ops that return bool
      96d0d831
    • L
      Merge tag 'acpi-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1ef27400
      Linus Torvalds 提交于
      Pull ACPI fixes from Rafael Wysocki:
       "These fix the return value of an IRQ mapping routine in the ACPI core,
        fix an EC driver issue causing abnormal fan behavior after system
        resume on some systems and add quirks for ACPI device objects that
        need to be treated as "always present" to work around bogus
        implementations of the _STA control method.
      
        Specifics:
      
         - Fix the return value of acpi_gsi_to_irq() to make the GSI to IRQ
           mapping work on the Mustang (ARM64) platform (Mark Salter).
      
         - Fix an EC driver issue that causes fans to behave abnormally after
           system resume on some systems which turns out to be related to
           switching over the EC into the polling mode during the noirq stages
           of system suspend and resume (Lv Zheng).
      
         - Add quirks for ACPI device objects that need to be treated as
           "always present", because their _STA methods are designed to work
           around Windows driver bugs and return garbage from our perspective
           (Hans de Goede)"
      
      * tag 'acpi-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / x86: Add KIOX000A accelerometer on GPD win to always_present_ids array
        ACPI / x86: Add Dell Venue 11 Pro 7130 touchscreen to always_present_ids
        ACPI / x86: Allow matching always_present_id array entries by DMI
        Revert "ACPI / EC: Enable event freeze mode..." to fix a regression
        ACPI / EC: Drop EC noirq hooks to fix a regression
        ACPI / irq: Fix return code of acpi_gsi_to_irq()
      1ef27400
    • L
      Merge tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e37720e2
      Linus Torvalds 提交于
      Pull power management fixes from Rafael Wysocki:
       "These fix a recently exposed issue in the PCI device wakeup code and
        one older problem related to PCI device wakeup that has been reported
        recently, modify one more piece of computations in intel_pstate to get
        rid of a rounding error, fix a possible race in the schedutil cpufreq
        governor, fix the device PM QoS sysfs interface to correctly handle
        invalid user input, fix return values of two probe routines in devfreq
        drivers and constify an attribute_group structure in devfreq.
      
        Specifics:
      
         - Avoid clearing the PCI PME Enable bit for devices as a result of
           config space restoration which confuses AML executed afterward and
           causes wakeup events to be lost on some systems (Rafael Wysocki).
      
         - Fix the native PCIe PME interrupts handling in the cases when the
           PME IRQ is set up as a system wakeup one so that runtime PM remote
           wakeup works as expected after system resume on systems where that
           happens (Rafael Wysocki).
      
         - Fix the device PM QoS sysfs interface to handle invalid user input
           correctly instead of using an unititialized variable value as the
           latency tolerance for the device at hand (Dan Carpenter).
      
         - Get rid of one more rounding error from intel_pstate computations
           (Srinivas Pandruvada).
      
         - Fix the schedutil cpufreq governor to prevent it from possibly
           accessing unititialized data structures from governor callbacks in
           some cases on systems when multiple CPUs share a single cpufreq
           policy object (Vikram Mulukutla).
      
         - Fix the return values of probe routines in two devfreq drivers
           (Gustavo Silva).
      
         - Constify an attribute_group structure in devfreq (Arvind Yadav)"
      
      * tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PCI / PM: Fix native PME handling during system suspend/resume
        PCI / PM: Restore PME Enable after config space restoration
        cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race
        PM / QoS: return -EINVAL for bogus strings
        cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
        PM / devfreq: constify attribute_group structures.
        PM / devfreq: tegra: fix error return code in tegra_devfreq_probe()
        PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe()
      e37720e2
    • L
      Merge branch 'akpm' (patches from Andrew) · 867eacd7
      Linus Torvalds 提交于
      Merge even more updates from Andrew Morton:
      
       - a few leftovers
      
       - fault-injector rework
      
       - add a module loader test driver
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        kmod: throttle kmod thread limit
        kmod: add test driver to stress test the module loader
        MAINTAINERS: give kmod some maintainer love
        xtensa: use generic fb.h
        fault-inject: add /proc/<pid>/fail-nth
        fault-inject: simplify access check for fail-nth
        fault-inject: make fail-nth read/write interface symmetric
        fault-inject: parse as natural 1-based value for fail-nth write interface
        fault-inject: automatically detect the number base for fail-nth write interface
        kernel/watchdog.c: use better pr_fmt prefix
        MAINTAINERS: move the befs tree to kernel.org
        lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
        mm: fix overflow check in expand_upwards()
      867eacd7
    • D
      replace incorrect strscpy use in FORTIFY_SOURCE · 077d2ba5
      Daniel Micay 提交于
      Using strscpy was wrong because FORTIFY_SOURCE is passing the maximum
      possible size of the outermost object, but strscpy defines the count
      parameter as the exact buffer size, so this could copy past the end of
      the source.  This would still be wrong with the planned usage of
      __builtin_object_size(p, 1) for intra-object overflow checks since it's
      the maximum possible size of the specified object with no guarantee of
      it being that large.
      
      Reuse of the fortified functions like this currently makes the runtime
      error reporting less precise but that can be improved later on.
      
      Noticed by Dave Jones and KASAN.
      Signed-off-by: NDaniel Micay <danielmicay@gmail.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      077d2ba5
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · 01ea9177
      Linus Torvalds 提交于
      Pull arch/tile updates from Chris Metcalf:
       "This adds support for an <arch/intreg.h> to help with removing
        __need_xxx #defines from glibc, and removes some dead code in
        arch/tile/mm/init.c"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
        mm, tile: drop arch_{add,remove}_memory
        tile: prefer <arch/intreg.h> to __need_int_reg_t
      01ea9177
    • L
      Merge tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · deed9deb
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       "Nothing that really stands out, just a bunch of fixes that have come
        in in the last couple of weeks.
      
        None of these are actually fixes for code that is new in 4.13. It's
        roughly half older bugs, with fixes going to stable, and half
        fixes/updates for Power9.
      
        Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin
        Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin,
        Oliver O'Halloran"
      
      * tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64: Fix atomic64_inc_not_zero() to return an int
        powerpc: Fix emulation of mfocrf in emulate_step()
        powerpc: Fix emulation of mcrf in emulate_step()
        powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
        powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9
        powerpc/asm: Mark cr0 as clobbered in mftb()
        powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
        powerpc/mm/radix: Synchronize updates to the process table
        powerpc/mm/radix: Properly clear process table entry
        powerpc/powernv: Tell OPAL about our MMU mode on POWER9
        powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
      deed9deb
    • L
      kmod: throttle kmod thread limit · 6d7964a7
      Luis R. Rodriguez 提交于
      If we reach the limit of modprobe_limit threads running the next
      request_module() call will fail.  The original reason for adding a kill
      was to do away with possible issues with in old circumstances which would
      create a recursive series of request_module() calls.
      
      We can do better than just be super aggressive and reject calls once we've
      reached the limit by simply making pending callers wait until the
      threshold has been reduced, and then throttling them in, one by one.
      
      This throttling enables requests over the kmod concurrent limit to be
      processed once a pending request completes.  Only the first item queued up
      to wait is woken up.  The assumption here is once a task is woken it will
      have no other option to also kick the queue to check if there are more
      pending tasks -- regardless of whether or not it was successful.
      
      By throttling and processing only max kmod concurrent tasks we ensure we
      avoid unexpected fatal request_module() calls, and we keep memory
      consumption on module loading to a minimum.
      
      With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
      time to run both tests:
      
      time ./kmod.sh -t 0008
      real    0m16.366s
      user    0m0.883s
      sys     0m8.916s
      
      time ./kmod.sh -t 0009
      real    0m50.803s
      user    0m0.791s
      sys     0m9.852s
      
      Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d7964a7
    • L
      kmod: add test driver to stress test the module loader · d9c6a72d
      Luis R. Rodriguez 提交于
      This adds a new stress test driver for kmod: the kernel module loader.
      The new stress test driver, test_kmod, is only enabled as a module right
      now.  It should be possible to load this as built-in and load tests
      early (refer to the force_init_test module parameter), however since a
      lot of test can get a system out of memory fast we leave this disabled
      for now.
      
      Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
      fast with this test driver.
      
      The test_kmod driver exposes API knobs for us to fine tune simple
      request_module() and get_fs_type() calls.  Since these API calls only
      allow each one parameter a test driver for these is rather simple.
      Other factors that can help out test driver though are the number of
      calls we issue and knowing current limitations of each.  This exposes
      configuration as much as possible through userspace to be able to build
      tests directly from userspace.
      
      Since it allows multiple misc devices its will eventually (once we add a
      knob to let us create new devices at will) also be possible to perform
      more tests in parallel, provided you have enough memory.
      
      We only enable tests we know work as of right now.
      
      Demo screenshots:
      
       # tools/testing/selftests/kmod/kmod.sh
      kmod_test_0001_driver: OK! - loading kmod test
      kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0001_fs: OK! - loading kmod test
      kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      kmod_test_0002_driver: OK! - loading kmod test
      kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0002_fs: OK! - loading kmod test
      kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      kmod_test_0003: OK! - loading kmod test
      kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0004: OK! - loading kmod test
      kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0005: OK! - loading kmod test
      kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0006: OK! - loading kmod test
      kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0005: OK! - loading kmod test
      kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0006: OK! - loading kmod test
      kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      XXX: add test restult for 0007
      Test completed
      
      You can also request for specific tests:
      
       # tools/testing/selftests/kmod/kmod.sh -t 0001
      kmod_test_0001_driver: OK! - loading kmod test
      kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0001_fs: OK! - loading kmod test
      kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      Test completed
      
      Lastly, the current available number of tests:
      
       # tools/testing/selftests/kmod/kmod.sh --help
      Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
      Valid tests: 0001-0009
      
      0001 - Simple test - 1 thread  for empty string
      0002 - Simple test - 1 thread  for modules/filesystems that do not exist
      0003 - Simple test - 1 thread  for get_fs_type() only
      0004 - Simple test - 2 threads for get_fs_type() only
      0005 - multithreaded tests with default setup - request_module() only
      0006 - multithreaded tests with default setup - get_fs_type() only
      0007 - multithreaded tests with default setup test request_module() and get_fs_type()
      0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
      0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
      
      The following test cases currently fail, as such they are not currently
      enabled by default:
      
       # tools/testing/selftests/kmod/kmod.sh -t 0008
       # tools/testing/selftests/kmod/kmod.sh -t 0009
      
      To be sure to run them as intended please unload both of the modules:
      
        o test_module
        o xfs
      
      And ensure they are not loaded on your system prior to testing them.  If
      you use these paritions for your rootfs you can change the default test
      driver used for get_fs_type() by exporting it into your environment.  For
      example of other test defaults you can override refer to kmod.sh
      allow_user_defaults().
      
      Behind the scenes this is how we fine tune at a test case prior to
      hitting a trigger to run it:
      
      cat /sys/devices/virtual/misc/test_kmod0/config
      echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
      echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
      echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
      cat /sys/devices/virtual/misc/test_kmod0/config
      echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
      
      Finally to trigger:
      
      echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
      
      The kmod.sh script uses the above constructs to build different test cases.
      
      A bit of interpretation of the current failures follows, first two
      premises:
      
      a) When request_module() is used userspace figures out an optimized
         version of module order for us.  Once it finds the modules it needs, as
         per depmod symbol dep map, it will finit_module() the respective
         modules which are needed for the original request_module() request.
      
      b) We have an optimization in place whereby if a kernel uses
         request_module() on a module already loaded we never bother userspace
         as the module already is loaded.  This is all handled by kernel/kmod.c.
      
      A few things to consider to help identify root causes of issues:
      
      0) kmod 19 has a broken heuristic for modules being assumed to be
         built-in to your kernel and will return 0 even though request_module()
         failed.  Upgrade to a newer version of kmod.
      
      1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
         not for "xfs".  The optimization in kernel described in b) fails to
         catch if we have a lot of consecutive get_fs_type() calls.  The reason
         is the optimization in place does not look for aliases.  This means two
         consecutive get_fs_type() calls will bump kmod_concurrent, whereas
         request_module() will not.
      
      This one explanation why test case 0009 fails at least once for
      get_fs_type().
      
      2) If a module fails to load --- for whatever reason (kmod_concurrent
         limit reached, file not yet present due to rootfs switch, out of
         memory) we have a period of time during which module request for the
         same name either with request_module() or get_fs_type() will *also*
         fail to load even if the file for the module is ready.
      
      This explains why *multiple* NULLs are possible on test 0009.
      
      3) finit_module() consumes quite a bit of memory.
      
      4) Filesystems typically also have more dependent modules than other
         modules, its important to note though that even though a get_fs_type()
         call does not incur additional kmod_concurrent bumps, since userspace
         loads dependencies it finds it needs via finit_module_fd(), it *will*
         take much more memory to load a module with a lot of dependencies.
      
      Because of 3) and 4) we will easily run into out of memory failures with
      certain tests.  For instance test 0006 fails on qemu with 1024 MiB of RAM.
      It panics a box after reaping all userspace processes and still not
      having enough memory to reap.
      
      [arnd@arndb.de: add dependencies for test module]
        Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d9c6a72d
    • L
      MAINTAINERS: give kmod some maintainer love · 062b8740
      Luis R. Rodriguez 提交于
      As suggested by Jessica, I've been actively working on kmod, so might as
      well reflect its maintained status.
      
      Changes are expected to go through akpm's tree.
      
      Link: http://lkml.kernel.org/r/20170628223155.26472-2-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Suggested-by: NJessica Yu <jeyu@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      062b8740
    • T
      xtensa: use generic fb.h · 20cf0c54
      Tobias Klauser 提交于
      The arch uses a verbatim copy of the asm-generic version and does not
      add any own implementations to the header, so use asm-generic/fb.h
      instead of duplicating code.
      
      Link: http://lkml.kernel.org/r/20170517083545.2115-1-tklauser@distanz.chSigned-off-by: NTobias Klauser <tklauser@distanz.ch>
      Acked-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20cf0c54