1. 16 1月, 2018 4 次提交
  2. 13 1月, 2018 7 次提交
  3. 05 12月, 2017 1 次提交
    • H
      bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Hendrik Brueckner 提交于
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fail for s390 and arm64 which do not export pt_regs.  Programs
      using them, for example, the bpf selftest fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that concretes
      the type.  An asm-generic header file covers the architectures that
      export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      Reported-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c895f6f7
  4. 03 11月, 2017 2 次提交
    • J
      mm: Define MAP_SYNC and VM_SYNC flags · b6fb293f
      Jan Kara 提交于
      Define new MAP_SYNC flag and corresponding VMA VM_SYNC flag. As the
      MAP_SYNC flag is not part of LEGACY_MAP_MASK, currently it will be
      refused by all MAP_SHARED_VALIDATE map attempts and silently ignored for
      everything else.
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      b6fb293f
    • D
      mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags · 1c972597
      Dan Williams 提交于
      The mmap(2) syscall suffers from the ABI anti-pattern of not validating
      unknown flags. However, proposals like MAP_SYNC need a mechanism to
      define new behavior that is known to fail on older kernels without the
      support. Define a new MAP_SHARED_VALIDATE flag pattern that is
      guaranteed to fail on all legacy mmap implementations.
      
      It is worth noting that the original proposal was for a standalone
      MAP_VALIDATE flag. However, when that  could not be supported by all
      archs Linus observed:
      
          I see why you *think* you want a bitmap. You think you want
          a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC
          etc, so that people can do
      
          ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED
      		    | MAP_SYNC, fd, 0);
      
          and "know" that MAP_SYNC actually takes.
      
          And I'm saying that whole wish is bogus. You're fundamentally
          depending on special semantics, just make it explicit. It's already
          not portable, so don't try to make it so.
      
          Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value
          of 0x3, and make people do
      
          ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE
      		    | MAP_SYNC, fd, 0);
      
          and then the kernel side is easier too (none of that random garbage
          playing games with looking at the "MAP_VALIDATE bit", but just another
          case statement in that map type thing.
      
          Boom. Done.
      
      Similar to ->fallocate() we also want the ability to validate the
      support for new flags on a per ->mmap() 'struct file_operations'
      instance basis.  Towards that end arrange for flags to be generically
      validated against a mmap_supported_flags exported by 'struct
      file_operations'. By default all existing flags are implicitly
      supported, but new flags require MAP_SHARED_VALIDATE and
      per-instance-opt-in.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1c972597
  5. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman 提交于
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  6. 07 9月, 2017 3 次提交
    • R
      mm,fork: introduce MADV_WIPEONFORK · d2cd9ede
      Rik van Riel 提交于
      Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
      in the child process after fork.  This differs from MADV_DONTFORK in one
      important way.
      
      If a child process accesses memory that was MADV_WIPEONFORK, it will get
      zeroes.  The address ranges are still valid, they are just empty.
      
      If a child process accesses memory that was MADV_DONTFORK, it will get a
      segmentation fault, since those address ranges are no longer valid in
      the child after fork.
      
      Since MADV_DONTFORK also seems to be used to allow very large programs
      to fork in systems with strict memory overcommit restrictions, changing
      the semantics of MADV_DONTFORK might break existing programs.
      
      MADV_WIPEONFORK only works on private, anonymous VMAs.
      
      The use case is libraries that store or cache information, and want to
      know that they need to regenerate it in the child process after fork.
      
      Examples of this would be:
       - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
         check, which is too slow without a PID cache)
       - PKCS#11 API reinitialization check (mandated by specification)
       - glibc's upcoming PRNG (reseed after fork)
       - OpenSSL PRNG (reseed after fork)
      
      The security benefits of a forking server having a re-inialized PRNG in
      every child process are pretty obvious.  However, due to libraries
      having all kinds of internal state, and programs getting compiled with
      many different versions of each library, it is unreasonable to expect
      calling programs to re-initialize everything manually after fork.
      
      A further complication is the proliferation of clone flags, programs
      bypassing glibc's functions to call clone directly, and programs calling
      unshare, causing the glibc pthread_atfork hook to not get called.
      
      It would be better to have the kernel take care of this automatically.
      
      The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
      MADV_WIPEONFORK.
      
      This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
      
          https://man.openbsd.org/minherit.2
      
      [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
      Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com>
      Reported-by: NFlorian Weimer <fweimer@redhat.com>
      Reported-by: NColm MacCártaigh <colm@allcosts.net>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d2cd9ede
    • M
      mm: arch: consolidate mmap hugetlb size encodings · aafd4562
      Mike Kravetz 提交于
      A non-default huge page size can be encoded in the flags argument of the
      mmap system call.  The definitions for these encodings are in arch
      specific header files.  However, all architectures use the same values.
      
      Consolidate all the definitions in the primary user header file
      (uapi/linux/mman.h).  Include definitions for all known huge page sizes.
      Use the generic encoding definitions in hugetlb_encode.h as the basis
      for these definitions.
      
      Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aafd4562
    • M
      mm: hugetlb: define system call hugetlb size encodings in single file · e652f694
      Mike Kravetz 提交于
      Patch series "Consolidate system call hugetlb page size encodings".
      
      These patches are the result of discussions in
      https://lkml.org/lkml/2017/3/8/548.  The following changes are made in the
      patch set:
      
      1) Put all the log2 encoded huge page size definitions in a common
         header file.  The idea is have a set of definitions that can be use as
         the basis for system call specific definitions such as MAP_HUGE_* and
         SHM_HUGE_*.
      
      2) Remove MAP_HUGE_* definitions in arch specific files.  All these
         definitions are the same.  Consolidate all definitions in the primary
         user header file (uapi/linux/mman.h).
      
      3) Remove SHM_HUGE_* definitions intended for user space from kernel
         header file, and add to user (uapi/linux/shm.h) header file.  Add
         definitions for all known huge page size encodings as in mmap.
      
      This patch (of 3):
      
      If hugetlb pages are requested in mmap or shmget system calls, a huge
      page size other than default can be requested.  This is accomplished by
      encoding the log2 of the huge page size in the upper bits of the flag
      argument.  asm-generic and arch specific headers all define the same
      values for these encodings.
      
      Put common definitions in a single header file.  The primary uapi header
      files for mmap and shm will use these definitions as a basis for
      definitions specific to those system calls.
      
      Link: http://lkml.kernel.org/r/1501527386-10736-2-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e652f694
  7. 04 8月, 2017 1 次提交
  8. 25 7月, 2017 2 次提交
    • E
      signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman 提交于
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user.  As si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Escept where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations has the
      interesting property that several of them perviously should never have
      worked as the __SI_ values they depended up where kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
    • E
      fcntl: Don't use ambiguous SIG_POLL si_codes · d08477aa
      Eric W. Biederman 提交于
      We have a weird and problematic intersection of features that when
      they all come together result in ambiguous siginfo values, that
      we can not support properly.
      
      - Supporting fcntl(F_SETSIG,...) with arbitrary valid signals.
      
      - Using positive values for POLL_IN, POLL_OUT, POLL_MSG, ..., etc
        that imply they are signal specific si_codes and using the
        aforementioned arbitrary signal to deliver them.
      
      - Supporting injection of arbitrary siginfo values for debugging and
        checkpoint/restore.
      
      The result is that just looking at siginfo si_codes of 1 to 6 are
      ambigious.  It could either be a signal specific si_code or it could
      be a generic si_code.
      
      For most of the kernel this is a non-issue but for sending signals
      with siginfo it is impossible to play back the kernel signals and
      get the same result.
      
      Strictly speaking when the si_code was changed from SI_SIGIO to
      POLL_IN and friends between 2.2 and 2.4 this functionality was not
      ambiguous, as only real time signals were supported.  Before 2.4 was
      released the kernel began supporting siginfo with non realtime signals
      so they could give details of why the signal was sent.
      
      The result is that if F_SETSIG is set to one of the signals with signal
      specific si_codes then user space can not know why the signal was sent.
      
      I grepped through a bunch of userspace programs using debian code
      search to get a feel for how often people choose a signal that results
      in an ambiguous si_code.  I only found one program doing so and it was
      using SIGCHLD to test the F_SETSIG functionality, and did not appear
      to be a real world usage.
      
      Therefore the ambiguity does not appears to be a real world problem in
      practice.  Remove the ambiguity while introducing the smallest chance
      of breakage by changing the si_code to SI_SIGIO when signals with
      signal specific si_codes are targeted.
      
      Fixes: v2.3.40 -- Added support for queueing non-rt signals
      Fixes: v2.3.21 -- Changed the si_code from SI_SIGIO
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d08477aa
  9. 17 7月, 2017 1 次提交
  10. 21 6月, 2017 1 次提交
    • D
      net: introduce SO_PEERGROUPS getsockopt · 28b5ba2a
      David Herrmann 提交于
      This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
      retrieve the auxiliary groups of the remote peer. It is designed to
      naturally extend SO_PEERCRED. That is, the underlying data is from the
      same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
      is, if the provided buffer is too small, ERANGE is returned and @optlen
      is updated. Otherwise, the information is copied, @optlen is set to the
      actual size, and 0 is returned.
      
      While SO_PEERCRED (and thus `struct ucred') already returns the primary
      group, it lacks the auxiliary group vector. However, nearly all access
      controls (including kernel side VFS and SYSVIPC, but also user-space
      polkit, DBus, ...) consider the entire set of groups, rather than just
      the primary group. But this is currently not possible with pure
      SO_PEERCRED. Instead, user-space has to work around this and query the
      system database for the auxiliary groups of a UID retrieved via
      SO_PEERCRED.
      
      Unfortunately, there is no race-free way to query the auxiliary groups
      of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
      solution is to use getgrouplist(3p), which itself falls back to NSS and
      whatever is configured in nsswitch.conf(3). This effectively checks
      which groups we *would* assign to the user if it logged in *now*. On
      normal systems it is as easy as reading /etc/group, but with NSS it can
      resort to quering network databases (eg., LDAP), using IPC or network
      communication.
      
      Long story short: Whenever we want to use auxiliary groups for access
      checks on IPC, we need further IPC to talk to the user/group databases,
      rather than just relying on SO_PEERCRED and the incoming socket. This
      is unfortunate, and might even result in dead-locks if the database
      query uses the same IPC as the original request.
      
      So far, those recursions / dead-locks have been avoided by using
      primitive IPC for all crucial NSS modules. However, we want to avoid
      re-inventing the wheel for each NSS module that might be involved in
      user/group queries. Hence, we would preferably make DBus (and other IPC
      that supports access-management based on groups) work without resorting
      to the user/group database. This new SO_PEERGROUPS ioctl would allow us
      to make dbus-daemon work without ever calling into NSS.
      
      Cc: Michal Sekletar <msekleta@redhat.com>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Reviewed-by: NTom Gundersen <teg@jklm.no>
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28b5ba2a
  11. 09 6月, 2017 1 次提交
    • A
      tty: add TIOCGPTPEER ioctl · 54ebbfb1
      Aleksa Sarai 提交于
      When opening the slave end of a PTY, it is not possible for userspace to
      safely ensure that /dev/pts/$num is actually a slave (in cases where the
      mount namespace in which devpts was mounted is controlled by an
      untrusted process). In addition, there are several unresolvable
      race conditions if userspace were to attempt to detect attacks through
      stat(2) and other similar methods [in addition it is not clear how
      userspace could detect attacks involving FUSE].
      
      Resolve this by providing an interface for userpace to safely open the
      "peer" end of a PTY file descriptor by using the dentry cached by
      devpts. Since it is not possible to have an open master PTY without
      having its slave exposed in /dev/pts this interface is safe. This
      interface currently does not provide a way to get the master pty (since
      it is not clear whether such an interface is safe or even useful).
      
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Valentin Rothberg <vrothberg@suse.com>
      Signed-off-by: NAleksa Sarai <asarai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54ebbfb1
  12. 04 6月, 2017 1 次提交
  13. 22 5月, 2017 1 次提交
    • M
      net: add new control message for incoming HW-timestamped packets · aad9c8c4
      Miroslav Lichvar 提交于
      Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
      for incoming packets with hardware timestamps. It contains the index of
      the real interface which received the packet and the length of the
      packet at layer 2.
      
      The index is useful with bonding, bridges and other interfaces, where
      IP_PKTINFO doesn't allow applications to determine which PHC made the
      timestamp. With the L2 length (and link speed) it is possible to
      transpose preamble timestamps to trailer timestamps, which are used in
      the NTP protocol.
      
      While this information could be provided by two new socket options
      independently from timestamping, it doesn't look like they would be very
      useful. With this option any performance impact is limited to hardware
      timestamping.
      
      Use dev_get_by_napi_id() to get the device and its index. On kernels
      with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
      index will be returned in the control message.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aad9c8c4
  14. 10 5月, 2017 1 次提交
    • N
      uapi: export all headers under uapi directories · fcc8487d
      Nicolas Dichtel 提交于
      Regularly, when a new header is created in include/uapi/, the developer
      forgets to add it in the corresponding Kbuild file. This error is usually
      detected after the release is out.
      
      In fact, all headers under uapi directories should be exported, thus it's
      useless to have an exhaustive list.
      
      After this patch, the following files, which were not exported, are now
      exported (with make headers_install_all):
      asm-arc/kvm_para.h
      asm-arc/ucontext.h
      asm-blackfin/shmparam.h
      asm-blackfin/ucontext.h
      asm-c6x/shmparam.h
      asm-c6x/ucontext.h
      asm-cris/kvm_para.h
      asm-h8300/shmparam.h
      asm-h8300/ucontext.h
      asm-hexagon/shmparam.h
      asm-m32r/kvm_para.h
      asm-m68k/kvm_para.h
      asm-m68k/shmparam.h
      asm-metag/kvm_para.h
      asm-metag/shmparam.h
      asm-metag/ucontext.h
      asm-mips/hwcap.h
      asm-mips/reg.h
      asm-mips/ucontext.h
      asm-nios2/kvm_para.h
      asm-nios2/ucontext.h
      asm-openrisc/shmparam.h
      asm-parisc/kvm_para.h
      asm-powerpc/perf_regs.h
      asm-sh/kvm_para.h
      asm-sh/ucontext.h
      asm-tile/shmparam.h
      asm-unicore32/shmparam.h
      asm-unicore32/ucontext.h
      asm-x86/hwcap2.h
      asm-xtensa/kvm_para.h
      drm/armada_drm.h
      drm/etnaviv_drm.h
      drm/vgem_drm.h
      linux/aspeed-lpc-ctrl.h
      linux/auto_dev-ioctl.h
      linux/bcache.h
      linux/btrfs_tree.h
      linux/can/vxcan.h
      linux/cifs/cifs_mount.h
      linux/coresight-stm.h
      linux/cryptouser.h
      linux/fsmap.h
      linux/genwqe/genwqe_card.h
      linux/hash_info.h
      linux/kcm.h
      linux/kcov.h
      linux/kfd_ioctl.h
      linux/lightnvm.h
      linux/module.h
      linux/nbd-netlink.h
      linux/nilfs2_api.h
      linux/nilfs2_ondisk.h
      linux/nsfs.h
      linux/pr.h
      linux/qrtr.h
      linux/rpmsg.h
      linux/sched/types.h
      linux/sed-opal.h
      linux/smc.h
      linux/smc_diag.h
      linux/stm.h
      linux/switchtec_ioctl.h
      linux/vfio_ccw.h
      linux/wil6210_uapi.h
      rdma/bnxt_re-abi.h
      
      Note that I have removed from this list the files which are generated in every
      exported directories (like .install or .install.cmd).
      
      Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
      subdirs with a pure makefile command.
      
      For the record, note that exported files for asm directories are a mix of
      files listed by:
       - include/uapi/asm-generic/Kbuild.asm;
       - arch/<arch>/include/uapi/asm/Kbuild;
       - arch/<arch>/include/asm/Kbuild.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: NMark Salter <msalter@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      fcc8487d
  15. 18 4月, 2017 1 次提交
    • A
      Remove compat_sys_getdents64() · 2611dc19
      Al Viro 提交于
      Unlike normal compat syscall variants, it is needed only for
      biarch architectures that have different alignement requirements for
      u64 in 32bit and 64bit ABI *and* have __put_user() that won't handle
      a store of 64bit value at 32bit-aligned address.  We used to have one
      such (ia64), but its biarch support has been gone since 2010 (after
      being broken in 2008, which went unnoticed since nobody had been using
      it).
      
      It had escaped removal at the same time only because back in 2004
      a patch that switched several syscalls on amd64 from private wrappers to
      generic compat ones had switched to use of compat_sys_getdents64(), which
      hadn't needed (or used) a compat wrapper on amd64.
      
      Let's bury it - it's at least 7 years overdue.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2611dc19
  16. 08 4月, 2017 1 次提交
  17. 25 3月, 2017 1 次提交
  18. 23 3月, 2017 1 次提交
  19. 20 3月, 2017 1 次提交
    • S
      generic syscalls: Wire up statx syscall · fdfe4a39
      Stafford Horne 提交于
      The new syscall statx is implemented as generic code, so enable it
      for architectures like openrisc which use the generic syscall table.
      
      Fixes: a528d35e ("statx: Add a system call to make enhanced file info available")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: NStafford Horne <shorne@gmail.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      fdfe4a39
  20. 23 2月, 2017 1 次提交
    • A
      userfaultfd: document _IOR/_IOW · e067eba5
      Andrea Arcangeli 提交于
      Patch series "userfaultfd tmpfs/hugetlbfs/non-cooperative", v2
      
      These userfaultfd features are finished and are ready for larger
      exposure in -mm and upstream merging.
      
      1) tmpfs non present userfault
      2) hugetlbfs non present userfault
      3) non cooperative userfault for fork/madvise/mremap
      
      qemu development code is already exercising 2) and container postcopy
      live migration needs 3).
      
      1) is not currently used but there's a self test and we know some qemu
      user for various reasons uses tmpfs as backing for KVM so it'll need it
      too to use postcopy live migration with tmpfs memory.
      
      All review feedback from the previous submit has been handled and the
      fixes are included.  There's no outstanding issue AFIK.
      
      Upstream code just did a s/fe/vmf/ conversion in the page faults and
      this has been converted as well incrementally.
      
      In addition to the previous submits, this also wakes up stuck userfaults
      during UFFDIO_UNREGISTER.  The non cooperative testcase actually
      reproduced this problem by getting stuck instead of quitting clean in
      some rare case as it could call UFFDIO_UNREGISTER while some userfault
      could be still in flight.  The other option would have been to keep
      leaving it up to userland to serialize itself and to patch the testcase
      instead but the wakeup during unregister I think is preferable.
      
      David also asked the UFFD_FEATURE_MISSING_HUGETLBFS and
      UFFD_FEATURE_MISSING_SHMEM feature flags to be added so QEMU can avoid
      to probe if the hugetlbfs/shmem missing support is available by calling
      UFFDIO_REGISTER.  QEMU already checks HUGETLBFS_MAGIC with fstatfs so if
      UFFD_FEATURE_MISSING_HUGETLBFS is also set, it knows UFFDIO_REGISTER
      will succeed (or if it fails, it's for some other more concerning
      reason).  There's no reason to worry about adding too many feature
      flags.  There are 64 available and worst case we've to bump the API if
      someday we're really going to run out of them.
      
      The round-trip network latency of hugetlbfs userfaults during postcopy
      live migration is still of the order of dozen milliseconds on 10GBit if
      at 2MB hugepage granularity so it's working perfectly and it should
      provide for higher bandwidth or lower CPU usage (which makes it
      interesting to add an option in the future to support THP granularity
      too for anonymous memory, UFFDIO_COPY would then have to create THP if
      alignment/len allows for it).  1GB hugetlbfs granularity will require
      big changes in hugetlbfs to work so it's deferred for later.
      
      This patch (of 42):
      
      This adds proper documentation (inline) to avoid the risk of further
      misunderstandings about the semantics of _IOW/_IOR and it also reminds
      whoever will bump the UFFDIO_API in the future, to change the two ioctl
      to _IOW.
      
      This was found while implementing strace support for those ioctl,
      otherwise we could have never found it by just reviewing kernel code and
      testing it.
      
      _IOC_READ or _IOC_WRITE alters nothing but the ioctl number itself, so
      it's only worth fixing if the UFFDIO_API is bumped someday.
      
      Link: http://lkml.kernel.org/r/20161216144821.5183-2-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: N"Dmitry V. Levin" <ldv@altlinux.org>
      Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e067eba5
  21. 30 11月, 2016 1 次提交
    • F
      tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING · 1c885808
      Francis Yan 提交于
      This patch exports the sender chronograph stats via the socket
      SO_TIMESTAMPING channel. Currently we can instrument how long a
      particular application unit of data was queued in TCP by tracking
      SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
      these sender chronograph stats exported simultaneously along with
      these timestamps allow further breaking down the various sender
      limitation.  For example, a video server can tell if a particular
      chunk of video on a connection takes a long time to deliver because
      TCP was experiencing small receive window. It is not possible to
      tell before this patch without packet traces.
      
      To prepare these stats, the user needs to set
      SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
      while requesting other SOF_TIMESTAMPING TX timestamps. When the
      timestamps are available in the error queue, the stats are returned
      in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
      in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
      TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
      Signed-off-by: NFrancis Yan <francisyyan@gmail.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c885808
  22. 18 10月, 2016 1 次提交
    • D
      generic syscalls: kill cruft from removed pkey syscalls · 71757904
      Dave Hansen 提交于
      pkey_set() and pkey_get() were syscalls present in older versions
      of the protection keys patches.  They were fully excised from the
      x86 code, but some cruft was left in the generic syscall code.  The
      C++ comments were intended to help to make it more glaring to me to
      fix them before actually submitting them.  That technique worked,
      but later than I would have liked.
      
      I test-compiled this for arm64.
      
      Fixes: a60f7b69 ("generic syscalls: Wire up memory protection keys syscalls")
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: mgorman@techsingularity.net
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71757904
  23. 09 9月, 2016 2 次提交
    • D
      generic syscalls: Wire up memory protection keys syscalls · a60f7b69
      Dave Hansen 提交于
      These new syscalls are implemented as generic code, so enable them for
      architectures like arm64 which use the generic syscall table.
      
      According to Arnd:
      
        Even if the support is x86 specific for the forseeable future, it may be
        good to reserve the number just in case.  The other architecture specific
        syscall lists are usually left to the individual arch maintainers, most a
        lot of the newer architectures share this table.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: mgorman@techsingularity.net
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20160729163018.505A6875@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a60f7b69
    • D
      x86/pkeys: Allocation/free syscalls · e8c24d3a
      Dave Hansen 提交于
      This patch adds two new system calls:
      
      	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
      	int pkey_free(int pkey);
      
      These implement an "allocator" for the protection keys
      themselves, which can be thought of as analogous to the allocator
      that the kernel has for file descriptors.  The kernel tracks
      which numbers are in use, and only allows operations on keys that
      are valid.  A key which was not obtained by pkey_alloc() may not,
      for instance, be passed to pkey_mprotect().
      
      These system calls are also very important given the kernel's use
      of pkeys to implement execute-only support.  These help ensure
      that userspace can never assume that it has control of a key
      unless it first asks the kernel.  The kernel does not promise to
      preserve PKRU (right register) contents except for allocated
      pkeys.
      
      The 'init_access_rights' argument to pkey_alloc() specifies the
      rights that will be established for the returned pkey.  For
      instance:
      
      	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
      
      will allocate 'pkey', but also sets the bits in PKRU[1] such that
      writing to 'pkey' is already denied.
      
      The kernel does not prevent pkey_free() from successfully freeing
      in-use pkeys (those still assigned to a memory range by
      pkey_mprotect()).  It would be expensive to implement the checks
      for this, so we instead say, "Just don't do it" since sane
      software will never do it anyway.
      
      Any piece of userspace calling pkey_alloc() needs to be prepared
      for it to fail.  Why?  pkey_alloc() returns the same error code
      (ENOSPC) when there are no pkeys and when pkeys are unsupported.
      They can be unsupported for a whole host of reasons, so apps must
      be prepared for this.  Also, libraries or LD_PRELOADs might steal
      keys before an application gets access to them.
      
      This allocation mechanism could be implemented in userspace.
      Even if we did it in userspace, we would still need additional
      user/kernel interfaces to tell userspace which keys are being
      used by the kernel internally (such as for execute-only
      mappings).  Having the kernel provide this facility completely
      removes the need for these additional interfaces, or having an
      implementation of this in userspace at all.
      
      Note that we have to make changes to all of the architectures
      that do not use mman-common.h because we use the new
      PKEY_DENY_ACCESS/WRITE macros in arch-independent code.
      
      1. PKRU is the Protection Key Rights User register.  It is a
         usermode-accessible register that controls whether writes
         and/or access to each individual pkey is allowed or denied.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: arnd@arndb.de
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e8c24d3a
  24. 05 5月, 2016 2 次提交
    • J
      asm-generic: Drop renameat syscall from default list · b0da6d44
      James Hogan 提交于
      The newer renameat2 syscall provides all the functionality provided by
      the renameat syscall and adds flags, so future architectures won't need
      to include renameat.
      
      Therefore drop the renameat syscall from the generic syscall list unless
      __ARCH_WANT_RENAMEAT is defined by the architecture's unistd.h prior to
      including asm-generic/unistd.h, and adjust all architectures using the
      generic syscall list to define it so that no in-tree architectures are
      affected.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-metag@vger.kernel.org
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: linux@lists.openrisc.net
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: nios2-dev@lists.rocketboards.org
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      b0da6d44
    • Y
      asm-generic: use compat version for preadv2 and pwritev2 · 1f93e9f2
      Yury Norov 提交于
      Compat architectures that does not use generic unistd (mips, s390),
      declare compat version in their syscall tables for preadv2 and
      pwritev2. Generic unistd syscall table should do it as well.
      
      [arnd: this initially slipped through the review and an
       incorrect patch got merged. arch/tile/ is the only architecture
       that could be affected for their 32-bit compat mode, every
       other architecture we support today is fine.]
      Signed-off-by: NYury Norov <ynorov@caviumnetworks.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      1f93e9f2
  25. 24 4月, 2016 1 次提交