1. 29 Jan, 2020 3 commits
    • J
      io_uring: allow registering credentials · 071698e1
      Jens Axboe committed
      If an application wants to use a ring with different kinds of
      credentials, it can register them upfront. We don't look up credentials;
      the credentials of the task calling IORING_REGISTER_PERSONALITY are used.
      
      An 'id' is returned for the application to use in subsequent personality
      support.
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      071698e1
    • P
      io_uring: add io-wq workqueue sharing · 24369c2e
      Pavel Begunkov committed
      If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to
      be a valid io_uring fd, the io-wq of which will be shared with the newly
      created io_uring instance. If the flag is set but it can't share io-wq,
      it fails.
      
      This allows creation of "sibling" io_urings, where we prefer to keep the
      SQ/CQ private, but want to share the async backend to minimize the amount
      of overhead associated with having multiple rings that belong to the same
      backend.
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Reported-by: Daurnimator <quae@daurnimator.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      24369c2e
    • J
      io_uring/io-wq: don't use static creds/mm assignments · cccf0ee8
      Jens Axboe committed
      We currently set up the io_wq with a static set of mm and creds. Even for
      a single-use io-wq per io_uring, this is suboptimal as we may have
      multiple enters of the ring. For sharing the io-wq backend, it doesn't
      work at all.
      
      Switch to passing in the creds and mm when the work item is set up. This
      means that async work is no longer deferred to the io_uring mm and creds,
      it is done with the current mm and creds.
      
      Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
      they can rely on the current personality (mm and creds) being the same
      for direct issue and async issue.
      Reviewed-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cccf0ee8
  2. 21 Jan, 2020 17 commits
    • P
      io_uring: optimise sqe-to-req flags translation · 6b47ee6e
      Pavel Begunkov committed
      For each IOSQE_* flag there is a corresponding REQ_F_* flag. And there
      is a repetitive pattern of their translation:
      e.g. if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG*
      
      Use same numeric values/bits for them and copy instead of manual
      handling.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6b47ee6e
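The translation change described in the commit above can be illustrated with a minimal sketch. It assumes three made-up flags; every `SKETCH_*` name is hypothetical and stands in for the real IOSQE_*/REQ_F_* definitions, which are not reproduced here. It shows the per-flag `if` pattern next to the mask-and-copy variant that becomes possible once both flag sets share bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions shared by both flag namespaces. */
enum {
    SKETCH_FIXED_FILE_BIT,
    SKETCH_IO_DRAIN_BIT,
    SKETCH_IO_LINK_BIT,
    SKETCH_NR_SHARED_BITS
};

/* Submission-side flags (stand-ins for IOSQE_*). */
#define SKETCH_SQE_FIXED_FILE (1u << SKETCH_FIXED_FILE_BIT)
#define SKETCH_SQE_IO_DRAIN   (1u << SKETCH_IO_DRAIN_BIT)
#define SKETCH_SQE_IO_LINK    (1u << SKETCH_IO_LINK_BIT)

/* Request-side flags (stand-ins for REQ_F_*), same values by construction. */
#define SKETCH_REQ_F_FIXED_FILE (1u << SKETCH_FIXED_FILE_BIT)
#define SKETCH_REQ_F_IO_DRAIN   (1u << SKETCH_IO_DRAIN_BIT)
#define SKETCH_REQ_F_IO_LINK    (1u << SKETCH_IO_LINK_BIT)

/* Only the shared low bits may be copied from user-supplied flags. */
#define SKETCH_SHARED_MASK ((1u << SKETCH_NR_SHARED_BITS) - 1u)

static_assert(SKETCH_SQE_IO_LINK == SKETCH_REQ_F_IO_LINK,
              "copying only works if both namespaces use the same bits");

/* Old pattern: one branch per flag. */
static inline uint32_t translate_manual(uint8_t sqe_flags)
{
    uint32_t req_flags = 0;

    if (sqe_flags & SKETCH_SQE_FIXED_FILE)
        req_flags |= SKETCH_REQ_F_FIXED_FILE;
    if (sqe_flags & SKETCH_SQE_IO_DRAIN)
        req_flags |= SKETCH_REQ_F_IO_DRAIN;
    if (sqe_flags & SKETCH_SQE_IO_LINK)
        req_flags |= SKETCH_REQ_F_IO_LINK;
    return req_flags;
}

/* New pattern: mask off unknown bits and copy the rest in one step. */
static inline uint32_t translate_copy(uint8_t sqe_flags)
{
    return sqe_flags & SKETCH_SHARED_MASK;
}
```

Both helpers return the same value for every possible input byte; the second one simply avoids the branches.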
    • J
      io_uring: add support for probing opcodes · 66f4af93
      Jens Axboe committed
      The application currently has no way of knowing if a given opcode is
      supported or not without having to try and issue one and see if we get
      -EINVAL or not. And even this approach is fraught with peril, as maybe
      we're getting -EINVAL due to some fields being missing, or maybe it's
      just not that easy to issue that particular command without doing some
      other leg work in terms of setup first.
      
      This adds IORING_REGISTER_PROBE, which fills in a structure with info
      on what is supported or not. This will work even with sparse opcode
      fields, which may happen in the future or even today if someone
      backports specific features to older kernels.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      66f4af93
    • J
      io_uring: add support for IORING_OP_OPENAT2 · cebdb986
      Jens Axboe committed
      Add support for the new openat2(2) system call. It's trivial to do, as
      we can have openat(2) just be wrapped around it.
      Suggested-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cebdb986
    • J
      io_uring: enable option to only trigger eventfd for async completions · f2842ab5
      Jens Axboe committed
      If an application is using eventfd notifications with poll to know when
      new SQEs can be issued, it's expecting the following read/writes to
      complete inline. And with that, it knows that there are events available,
      and doesn't want spurious wakeups on the eventfd for those requests.
      
      This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like
      IORING_REGISTER_EVENTFD, except it only triggers notifications for events
      that happen from async completions (IRQ, or io-wq worker completions).
      Any completions inline from the submission itself will not trigger
      notifications.
      Suggested-by: Mark Papadakis <markuspapadakis@icloud.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f2842ab5
    • J
      io_uring: add support for send(2) and recv(2) · fddaface
      Jens Axboe committed
      This adds IORING_OP_SEND for send(2) support, and IORING_OP_RECV for
      recv(2) support.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fddaface
    • J
      io_uring: add support for IORING_SETUP_CLAMP · 8110c1a6
      Jens Axboe committed
      Some applications like to start small in terms of ring size, and then
      ramp up as needed. This is a bit tricky to do currently, since we don't
      advertise the max ring size.
      
      This adds IORING_SETUP_CLAMP. If set, and the values for SQ or CQ ring
      size exceed what we support, then clamp them at the max values instead
      of returning -EINVAL. Since we return the chosen ring sizes after setup,
      no further changes are needed on the application side. io_uring already
      changes the ring sizes if the application doesn't ask for power-of-two
      sizes, for example.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8110c1a6
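The sizing rule in the commit above can be sketched as a small userspace model. This is not the kernel's code: `SKETCH_MAX_ENTRIES` is an assumed limit chosen for illustration, `SKETCH_SETUP_CLAMP` is a hypothetical stand-in for IORING_SETUP_CLAMP, and the function names are invented. The sketch shows an over-large request either failing with -EINVAL or being clamped, followed by the power-of-two rounding the commit message mentions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 32768u /* assumed limit, for illustration only */
#define SKETCH_SETUP_CLAMP 0x1u   /* hypothetical stand-in for IORING_SETUP_CLAMP */

/* Round v up to the next power of two; v must be in [1, 2^31]. */
static inline uint32_t sketch_roundup_pow2(uint32_t v)
{
    uint32_t p = 1;

    assert(v >= 1 && v <= (1u << 31));
    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Pick the ring size for a request. Returns 0 and stores the chosen size
 * in *chosen, or returns -EINVAL if the request cannot be satisfied.
 */
static inline int sketch_ring_entries(uint32_t requested, uint32_t flags,
                                      uint32_t *chosen)
{
    if (requested == 0)
        return -EINVAL;
    if (requested > SKETCH_MAX_ENTRIES) {
        if (!(flags & SKETCH_SETUP_CLAMP))
            return -EINVAL;           /* old behaviour: reject */
        requested = SKETCH_MAX_ENTRIES; /* new behaviour: clamp */
    }
    *chosen = sketch_roundup_pow2(requested);
    return 0;
}
```

Because the chosen size is reported back, a caller that passes the clamp flag only needs to read the returned value to learn the effective ring size.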
    • J
      io_uring: add IORING_OP_MADVISE · c1ca757b
      Jens Axboe committed
      This adds support for doing madvise(2) through io_uring. We assume that
      any operation can block, and hence punt everything async. This could be
      improved, but hard to make bullet proof. The async punt ensures it's
      safe.
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c1ca757b
    • J
      io_uring: add IORING_OP_FADVISE · 4840e418
      Jens Axboe committed
      This adds support for doing fadvise through io_uring. We assume that
      WILLNEED doesn't block, but that DONTNEED may block.
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4840e418
    • J
      io_uring: allow use of offset == -1 to mean file position · ba04291e
      Jens Axboe committed
      This behaves like preadv2/pwritev2 with offset == -1, it'll use (and
      update) the current file position. This obviously comes with the caveat
      that if the application has multiple read/writes in flight, then the
      end result will not be as expected. This is similar to threads sharing
      a file descriptor and doing IO using the current file position.
      
      Since this feature isn't easily detectable by doing a read or write,
      add a feature flag, IORING_FEAT_RW_CUR_POS, to allow applications to
      detect presence of this feature.
      Reported-by: 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ba04291e
    • J
      io_uring: add non-vectored read/write commands · 3a6820f2
      Jens Axboe committed
      For use cases that don't already naturally have an iovec, it's easier
      (or more convenient) to just use a buffer address + length. This is
      particularly true if the use case is from languages that want to create
      a memory safe abstraction on top of io_uring, and where introducing
      the need for the iovec may impose an ownership issue. For those cases,
      they currently need an indirection buffer, which means allocating data
      just for this purpose.
      
      Add basic read/write that don't require the iovec.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3a6820f2
    • J
      io_uring: add IOSQE_ASYNC · ce35a47a
      Jens Axboe committed
      io_uring defaults to always doing inline submissions, if at all
      possible. But for larger copies, even if the data is fully cached, that
      can take a long time. Add an IOSQE_ASYNC flag that the application can
      set on the SQE - if set, it'll ensure that we always go async for those
      kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we
      get the concurrency we desire for this case.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ce35a47a
    • J
      io_uring: add support for IORING_OP_STATX · eddc7ef5
      Jens Axboe committed
      This provides support for async statx(2) through io_uring.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eddc7ef5
    • J
      io_uring: avoid ring quiesce for fixed file set unregister and update · 05f3fb3c
      Jens Axboe committed
      We currently fully quiesce the ring before an unregister or update of
      the fixed fileset. This is very expensive, and we can be a bit smarter
      about this.
      
      Add a percpu refcount for the file tables as a whole. Grab a percpu ref
      when we use a registered file, and put it on completion. This is cheap
      to do. Upon removal of a file from a set, switch the ref count to atomic
      mode. When we hit zero ref on the completion side, then we know we can
      drop the previously registered files. When the old files have been
      dropped, switch the ref back to percpu mode for normal operation.
      
      Since there's a period between doing the update and the kernel being
      done with it, add an IORING_OP_FILES_UPDATE opcode that can perform the
      same action. The application knows the update has completed when it gets
      the CQE for it. Between doing the update and receiving this completion,
      the application must continue to use the unregistered fd if submitting
      IO on this particular file.
      
      This takes the runtime of test/file-register from liburing from 14s to
      about 0.7s.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      05f3fb3c
    • J
      io_uring: add support for IORING_OP_CLOSE · b5dba59e
      Jens Axboe committed
      This works just like close(2), unsurprisingly. We remove the file
      descriptor and post the completion inline, then offload the actual
      (potential) last file put to async context.
      
      Mark the async part of this work as uncancellable, as we really must
      guarantee that the latter part of the close is run.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b5dba59e
    • J
      io_uring: add support for IORING_OP_OPENAT · 15b71abe
      Jens Axboe committed
      This works just like openat(2), except it can be performed async. For
      the normal case of a non-blocking path lookup this will complete
      inline. If we have to do IO to perform the open, it'll be done from
      async context.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      15b71abe
    • J
      io_uring: add support for fallocate() · d63d1b5e
      Jens Axboe committed
      This exposes fallocate(2) through io_uring.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d63d1b5e
    • E
      io_uring: fix compat for IORING_REGISTER_FILES_UPDATE · 1292e972
      Eugene Syromiatnikov committed
      fds field of struct io_uring_files_update is problematic with regards
      to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
      and 64-bit user space.  In order to avoid custom handling of compat in
      the syscall implementation, make fds __u64 and use u64_to_user_ptr in
      order to retrieve it.  Also, align the field naturally and check that
      no garbage is passed there.
      
      Fixes: c3a31e60 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
      Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1292e972
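The compat fix above can be sketched as follows. The struct is a hypothetical mirror of the layout the commit describes (a 64-bit `fds` field, naturally aligned, with a reserved field that must be zero); it is not copied from the uapi header, and all `sketch_*` names are invented. `u64_to_user_ptr` is a kernel helper, so the conversion functions here are userspace stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout described above, not the uapi header. */
struct sketch_files_update {
    uint32_t offset;
    uint32_t resv; /* must be zero, so it can carry meaning later */
    uint64_t fds;  /* user pointer to an int32_t array, carried as 64 bits */
};

/* The layout no longer depends on the pointer size of the caller. */
static_assert(sizeof(struct sketch_files_update) == 16,
              "same size for 32-bit and 64-bit user space");
static_assert(offsetof(struct sketch_files_update, fds) == 8,
              "fds is naturally aligned");

/* Userspace side: widen a pointer into the fixed-size field. */
static inline uint64_t sketch_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Receiving side: turn the 64-bit value back into a pointer. */
static inline void *sketch_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Reject garbage in the reserved field. */
static inline int sketch_check_update(const struct sketch_files_update *up)
{
    if (up->resv != 0)
        return -EINVAL;
    return 0;
}
```

Going through `uintptr_t` keeps the cast well defined on both 32-bit and 64-bit builds, which is the point of carrying the pointer in a fixed-width integer.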
  3. 18 Jan, 2020 1 commit
    • A
      open: introduce openat2(2) syscall · fddb5d43
      Aleksa Sarai committed
      /* Background. */
      For a very long time, extending openat(2) with new features has been
      incredibly frustrating. This stems from the fact that openat(2) is
      possibly the most famous counter-example to the mantra "don't silently
      accept garbage from userspace" -- it doesn't check whether unknown flags
      are present[1].
      
      This means that (generally) the addition of new flags to openat(2) has
      been fraught with backwards-compatibility issues (O_TMPFILE has to be
      defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
      kernels gave errors, since it's insecure to silently ignore the
      flag[2]). All new security-related flags therefore have a tough road to
      being added to openat(2).
      
      Userspace also has a hard time figuring out whether a particular flag is
      supported on a particular kernel. While it is now possible with
      contemporary kernels (thanks to [3]), older kernels will expose unknown
      flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
      openat(2) time matches modern syscall designs and is far more
      fool-proof.
      
      In addition, the newly-added path resolution restriction LOOKUP flags
      (which we would like to expose to user-space) don't feel related to the
      pre-existing O_* flag set -- they affect all components of path lookup.
      We'd therefore like to add a new flag argument.
      
      Adding a new syscall allows us to finally fix the flag-ignoring problem,
      and we can make it extensible enough so that we will hopefully never
      need an openat3(2).
      
      /* Syscall Prototype. */
        /*
         * open_how is an extensible structure (similar in interface to
         * clone3(2) or sched_setattr(2)). The size parameter must be set to
         * sizeof(struct open_how), to allow for future extensions. All future
         * extensions will be appended to open_how, with their zero value
         * acting as a no-op default.
         */
        struct open_how { /* ... */ };
      
        int openat2(int dfd, const char *pathname,
                    struct open_how *how, size_t size);
      
      /* Description. */
      The initial version of 'struct open_how' contains the following fields:
      
        flags
          Used to specify openat(2)-style flags. However, any unknown flag
          bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
          will result in -EINVAL. In addition, this field is 64-bits wide to
          allow for more O_ flags than currently permitted with openat(2).
      
        mode
          The file mode for O_CREAT or O_TMPFILE.
      
          Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
      
        resolve
          Restrict path resolution (in contrast to O_* flags they affect all
          path components). The current set of flags are as follows (at the
          moment, all of the RESOLVE_ flags are implemented as just passing
          the corresponding LOOKUP_ flag).
      
          RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
          RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
          RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
          RESOLVE_BENEATH       => LOOKUP_BENEATH
          RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT
      
      open_how does not contain an embedded size field, because it is of
      little benefit (userspace can figure out the kernel open_how size at
      runtime fairly easily without it). It also only contains u64s (even
      though ->mode arguably should be a u16) to avoid having padding fields
      which are never used in the future.
      
      Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
      is no longer permitted for openat(2). As far as I can tell, this has
      always been a bug and appears to not be used by userspace (and I've not
      seen any problems on my machines by disallowing it). If it turns out
      this breaks something, we can special-case it and only permit it for
      openat(2) but not openat2(2).
      
      After input from Florian Weimer, the new open_how and flag definitions
      are inside a separate header from uapi/linux/fcntl.h, to avoid problems
      that glibc has with importing that header.
      
      /* Testing. */
      In a follow-up patch there are over 200 selftests which ensure that this
      syscall has the correct semantics and will correctly handle several
      attack scenarios.
      
      In addition, I've written a userspace library[4] which provides
      convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
      because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
      must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
      syscalls). During the development of this patch, I've run numerous
      verification tests using libpathrs (showing that the API is reasonably
      usable by userspace).
      
      /* Future Work. */
      Additional RESOLVE_ flags have been suggested during the review period.
      These can be easily implemented separately (such as blocking auto-mount
      during resolution).
      
      Furthermore, there are some other proposed changes to the openat(2)
      interface (the most obvious example is magic-link hardening[5]) which
      would be a good opportunity to add a way for userspace to restrict how
      O_PATH file descriptors can be re-opened.
      
      Another possible avenue of future work would be some kind of
      CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
      which openat2(2) flags and fields are supported by the current kernel
      (to avoid userspace having to go through several guesses to figure it
      out).
      
      [1]: https://lwn.net/Articles/588444/
      [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
      [3]: commit 629e014b ("fs: completely ignore unknown open flags")
      [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
      [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
      [6]: https://youtu.be/ggD-eb3yPVs
      
      Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      fddb5d43
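The size-based extension scheme described for open_how above can be modelled in userspace roughly as below. This is a sketch under assumptions, not the kernel's implementation: `struct sketch_open_how` only mirrors the three fields listed in the description, `sketch_copy_struct` is an invented name, and returning -E2BIG for non-zero unknown trailing bytes is an assumption about how such extensible structs are commonly handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "version 0" layout: the three u64 fields listed above. */
struct sketch_open_how {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

static_assert(sizeof(struct sketch_open_how) == 24,
              "only u64 fields, so no padding");

/*
 * Copy a caller-provided struct of usize bytes into a struct of ksize
 * bytes known to the receiver.
 *  - Older caller (usize < ksize): missing fields read as zero.
 *  - Newer caller (usize > ksize): the unknown tail must be all zero,
 *    otherwise the caller is asking for something we cannot honour.
 */
static inline int sketch_copy_struct(void *dst, size_t ksize,
                                     const void *src, size_t usize)
{
    const unsigned char *s = src;
    size_t n = usize < ksize ? usize : ksize;

    for (size_t i = ksize; i < usize; i++)
        if (s[i] != 0)
            return -E2BIG;

    memset(dst, 0, ksize); /* zero value acts as the no-op default */
    memcpy(dst, src, n);
    return 0;
}
```

The same check works in both directions of version skew, which is why the struct needs no embedded size field: the size argument passed alongside it carries that information.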
  4. 05 Jan, 2020 1 commit
  5. 14 Dec, 2019 1 commit
  6. 13 Dec, 2019 1 commit
    • T
      mac80211: Turn AQL into an NL80211_EXT_FEATURE · 911bde0f
      Toke Høiland-Jørgensen committed
      Instead of just having an airtime flag in debugfs, turn AQL into a proper
      NL80211_EXT_FEATURE, so drivers can turn it on when they are ready, and so
      we also expose the presence of the feature to userspace.
      
      This also has the effect of flipping the default, so drivers have to opt in
      to using AQL instead of getting it by default with TXQs. To keep
      functionality the same as pre-patch, we set this feature for ath10k (which
      is where it is needed the most).
      
      While we're at it, split out the debugfs interface so AQL gets its own
      per-station debugfs file instead of using the 'airtime' file.
      
      [Johannes:]
      This effectively disables AQL for iwlwifi, where it fixes a number of
      issues:
       * TSO in iwlwifi is causing underflows and associated warnings in AQL
       * HE (802.11ax) rates aren't reported properly so at HE rates, AQL could
         never have a valid estimate (it'd use 6 Mbps instead of up to 2400!)
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20191212111437.224294-1-toke@redhat.com
      Fixes: 3ace10f5 ("mac80211: Implement Airtime-based Queue Limit (AQL)")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      911bde0f
  7. 12 Dec, 2019 1 commit
    • J
      io_uring: ensure we return -EINVAL on unknown opcode · 9e3aa61a
      Jens Axboe committed
      If we submit an unknown opcode and have fd == -1, io_op_needs_file()
      will return true as we default to needing a file. Then when we go and
      assign the file, we find the 'fd' invalid and return -EBADF. We really
      should be returning -EINVAL for that case, as we normally do for
      unsupported opcodes.
      
      Change io_op_needs_file() to have the following return values:
      
      0   - does not need a file
      1   - does need a file
      < 0 - error value
      
      and use this to pass back the right value for this invalid case.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9e3aa61a
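The three-way return convention in the commit above can be sketched like this. The opcode list and both function names are hypothetical stand-ins rather than the kernel's definitions; the sketch only shows how a negative return lets the caller report -EINVAL for an unknown opcode before the fd is ever looked at.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical opcode set, for illustration only. */
enum sketch_opcode {
    SKETCH_OP_NOP,
    SKETCH_OP_TIMEOUT,
    SKETCH_OP_READV,
    SKETCH_OP_WRITEV,
    SKETCH_OP_LAST /* first value that is not a valid opcode */
};

/* Returns 0 (no file needed), 1 (file needed) or a negative errno. */
static inline int sketch_op_needs_file(int opcode)
{
    switch (opcode) {
    case SKETCH_OP_NOP:
    case SKETCH_OP_TIMEOUT:
        return 0;
    case SKETCH_OP_READV:
    case SKETCH_OP_WRITEV:
        return 1;
    default:
        return -EINVAL; /* unknown opcode: do not default to "needs a file" */
    }
}

/* Caller side: the error is propagated before the fd is validated. */
static inline int sketch_req_set_file(int opcode, int fd)
{
    int ret = sketch_op_needs_file(opcode);

    if (ret <= 0)
        return ret; /* 0: nothing to assign, < 0: pass the error back */
    assert(ret == 1);
    if (fd < 0)
        return -EBADF;
    return 0;
}
```

With the old boolean convention an unknown opcode with `fd == -1` would have fallen through to the -EBADF branch; here it returns -EINVAL first.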
  8. 11 Dec, 2019 1 commit
    • J
      io_uring: allow unbreakable links · 4e88d6e7
      Jens Axboe committed
      Some commands will invariably end in a failure in the sense that the
      completion result will be less than zero. One such example is timeouts
      that don't have a completion count set, they will always complete with
      -ETIME unless cancelled.
      
      For linked commands, we sever links and fail the rest of the chain if
      the result is less than zero. Since we have commands where we know that
      will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
      regardless of the completion result. Note that the link will still sever
      if we fail submitting the parent request, hard links are only resilient
      in the presence of completion results for requests that did submit
      correctly.
      
      Cc: stable@vger.kernel.org # v5.4
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Reported-by: 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4e88d6e7
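The difference between a normal link and a hard link, as described in the commit above, can be modelled with a toy chain runner. Everything here is hypothetical: the `SKETCH_*` flags stand in for IOSQE_IO_LINK and IOSQE_IO_HARDLINK, each request carries a predetermined completion result, and cancelled requests are assumed to complete with -ECANCELED. Submission failures, which still sever hard links, are deliberately not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_LINK     0x1u /* stand-in for IOSQE_IO_LINK */
#define SKETCH_HARDLINK 0x2u /* stand-in for IOSQE_IO_HARDLINK */

struct sketch_req {
    unsigned flags; /* how this request is linked to the next one */
    int res;        /* result the request completes with if it runs */
};

/*
 * Run one chain of n requests in order and store each completion result
 * in out[i]. A negative result severs the chain unless the failing
 * request is hard-linked to its successor.
 */
static inline void sketch_run_chain(const struct sketch_req *reqs, size_t n,
                                    int *out)
{
    int severed = 0;

    assert(reqs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (severed) {
            out[i] = -ECANCELED; /* rest of the chain is failed */
            continue;
        }
        out[i] = reqs[i].res;
        if (out[i] < 0 && !(reqs[i].flags & SKETCH_HARDLINK))
            severed = 1;
    }
}
```

A timeout that always completes with -ETIME therefore cancels everything after it when linked normally, while the same timeout with a hard link lets the following requests run.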
  9. 09 Dec, 2019 1 commit
  10. 05 Dec, 2019 6 commits
    • M
      arch: sembuf.h: make uapi asm/sembuf.h self-contained · 0fb9dc28
      Masahiro Yamada committed
      Userspace cannot compile <asm/sembuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/sembuf.h.s
        In file included from <command-line>:32:0:
        usr/include/asm/sembuf.h:17:20: error: field `sem_perm' has incomplete type
          struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
                            ^~~~~~~~
        usr/include/asm/sembuf.h:24:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t sem_otime; /* last semop time */
          ^~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:25:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused1;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:26:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t sem_ctime; /* last change time */
          ^~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:27:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused2;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:29:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:30:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused3;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:31:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused4;
          ^~~~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <asm/ipcbuf.h> to make it self-contained, and add it to
      the compile-test coverage.
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-3-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0fb9dc28
    • M
      arch: msgbuf.h: make uapi asm/msgbuf.h self-contained · 9ef0e004
      Masahiro Yamada committed
      Userspace cannot compile <asm/msgbuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/msgbuf.h.s
        In file included from usr/include/asm/msgbuf.h:6:0,
                         from <command-line>:32:
        usr/include/asm-generic/msgbuf.h:25:20: error: field `msg_perm' has incomplete type
          struct ipc64_perm msg_perm;
                            ^~~~~~~~
        usr/include/asm-generic/msgbuf.h:27:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_stime; /* last msgsnd time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:28:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_rtime; /* last msgrcv time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:29:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_ctime; /* last change time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:41:2: error: unknown type name `__kernel_pid_t'
          __kernel_pid_t msg_lspid; /* pid of last msgsnd */
          ^~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:42:2: error: unknown type name `__kernel_pid_t'
          __kernel_pid_t msg_lrpid; /* last receive pid */
          ^~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <asm/ipcbuf.h> to make it self-contained, and add it to
      the compile-test coverage.
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-2-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9ef0e004
    • M
      arch: ipcbuf.h: make uapi asm/ipcbuf.h self-contained · 5b009673
      Masahiro Yamada committed
      Userspace cannot compile <asm/ipcbuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/ipcbuf.h.s
        In file included from usr/include/asm/ipcbuf.h:1:0,
                         from <command-line>:32:
        usr/include/asm-generic/ipcbuf.h:21:2: error: unknown type name `__kernel_key_t'
          __kernel_key_t  key;
          ^~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:22:2: error: unknown type name `__kernel_uid32_t'
          __kernel_uid32_t uid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:23:2: error: unknown type name `__kernel_gid32_t'
          __kernel_gid32_t gid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:24:2: error: unknown type name `__kernel_uid32_t'
          __kernel_uid32_t cuid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:25:2: error: unknown type name `__kernel_gid32_t'
          __kernel_gid32_t cgid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:26:2: error: unknown type name `__kernel_mode_t'
          __kernel_mode_t  mode;
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:28:35: error: `__kernel_mode_t' undeclared here (not in a function)
          unsigned char  __pad1[4 - sizeof(__kernel_mode_t)];
                                           ^~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:31:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused1;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:32:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused2;
          ^~~~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <linux/posix_types.h> to make it self-contained, and add it to
      the compile-test coverage.
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-1-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b009673
    • A
      kcov: remote coverage support · eec028c9
      Andrey Konovalov committed
      Patch series "kcov: collect coverage from usb and vhost", v3.
      
      This patchset extends kcov to allow collecting coverage from background
      kernel threads.  This extension requires custom annotations for each of
      the places where coverage collection is desired.  This patchset
      implements this for hub events in the USB subsystem and for vhost
      workers.  See the first patch description for details about the kcov
      extension.  The other two patches apply this kcov extension to USB and
      vhost.
      
      Examples of other subsystems that might potentially benefit from this
      when custom annotations are added (the list is based on
      process_one_work() callers for bugs recently reported by syzbot):
      
      1. fs: writeback wb_workfn() worker,
      2. net: addrconf_dad_work()/addrconf_verify_work() workers,
      3. net: neigh_periodic_work() worker,
      4. net/p9: p9_write_work()/p9_read_work() workers,
      5. block: blk_mq_run_work_fn() worker.
      
      These patches have been used to enable coverage-guided USB fuzzing with
      syzkaller for the last few years, see the details here:
      
        https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md
      
      This patchset has been pushed to the public Linux kernel Gerrit
      instance:
      
        https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524
      
      This patch (of 3):
      
      Add background thread coverage collection ability to kcov.
      
      With KCOV_ENABLE coverage is collected only for syscalls that are issued
      from the current process.  With KCOV_REMOTE_ENABLE it's possible to
      collect coverage for arbitrary parts of the kernel code, provided that
      those parts are annotated with kcov_remote_start()/kcov_remote_stop().
      
      This allows collecting coverage from two types of kernel background
      threads: the global ones, that are spawned during kernel boot in a
      limited number of instances (e.g.  one USB hub_event() worker thread is
      spawned per USB HCD); and the local ones, that are spawned when a user
      interacts with some kernel interface (e.g.  vhost workers).
      
      To enable collecting coverage from a global background thread, a unique
      global handle must be assigned and passed to the corresponding
      kcov_remote_start() call.  Then a userspace process can pass a list of
      such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field
      of the kcov_remote_arg struct.  This will attach the used kcov device to
      the code sections, that are referenced by those handles.
      
      Since there might be many local background threads spawned from
      different userspace processes, we can't use a single global handle per
      annotation.  Instead, the userspace process passes a non-zero handle
      through the common_handle field of the kcov_remote_arg struct.  This
      common handle gets saved to the kcov_handle field in the current
      task_struct and needs to be passed to the newly spawned threads via
      custom annotations.  Those threads should in turn be annotated with
      kcov_remote_start()/kcov_remote_stop().
      
      Internally kcov stores handles as u64 integers.  The top byte of a
      handle is used to denote the id of a subsystem that this handle belongs
      to, and the lower 4 bytes are used to denote the id of a thread instance
      within that subsystem.  A reserved value 0 is used as a subsystem id for
      common handles as they don't belong to a particular subsystem.  The
      bytes 4-7 are currently reserved and must be zero.  In the future the
      number of bytes used for the subsystem or handle ids might be increased.
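
The handle layout described above can be sketched in C as follows. This is a minimal illustration, not the kernel's own definitions: the macro and function names are hypothetical, and the validity check simply assumes that every bit between the subsystem byte and the 4-byte instance id must be zero.

```c
#include <stdint.h>

/* Hypothetical names; layout taken from the description above. */
#define HANDLE_SUBSYSTEM_SHIFT 56
#define HANDLE_SUBSYSTEM_MASK  (0xffULL << HANDLE_SUBSYSTEM_SHIFT) /* top byte */
#define HANDLE_INSTANCE_MASK   0xffffffffULL                       /* lower 4 bytes */

/* Build a handle from a subsystem id and an instance id.
 * Returns 0 if either id does not fit in its field. */
static inline uint64_t make_handle(uint64_t subsystem_id, uint64_t instance_id)
{
    if (subsystem_id > 0xff || instance_id > HANDLE_INSTANCE_MASK)
        return 0;
    return (subsystem_id << HANDLE_SUBSYSTEM_SHIFT) | instance_id;
}

/* Extract the subsystem id (0 means a common handle). */
static inline unsigned int handle_subsystem(uint64_t handle)
{
    return (unsigned int)((handle & HANDLE_SUBSYSTEM_MASK) >> HANDLE_SUBSYSTEM_SHIFT);
}

/* Extract the per-subsystem thread instance id. */
static inline uint32_t handle_instance(uint64_t handle)
{
    return (uint32_t)(handle & HANDLE_INSTANCE_MASK);
}

/* A handle is well-formed only if the reserved bits are all zero. */
static inline int handle_is_valid(uint64_t handle)
{
    return (handle & ~(HANDLE_SUBSYSTEM_MASK | HANDLE_INSTANCE_MASK)) == 0;
}
```

For instance, make_handle(1, 5) yields 0x0100000000000005 under this layout, and a common handle is one whose subsystem id is 0.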
      
      When a particular userspace process collects coverage via a common
      handle, kcov will collect coverage for each code section that is
      annotated to use the common handle obtained as kcov_handle from the
      current task_struct.  Non-common handles, by contrast, make it possible
      to collect coverage selectively from different subsystems.
      
      Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: David Windsor <dwindsor@gmail.com>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Anders Roxell <anders.roxell@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eec028c9
    • M
      linux/scc.h: make uapi linux/scc.h self-contained · 1a18374f
      Masahiro Yamada committed
      Userspace cannot compile <linux/scc.h>:
      
          CC      usr/include/linux/scc.h.s
        In file included from <command-line>:32:0:
        usr/include/linux/scc.h:20:20: error: `SIOCDEVPRIVATE' undeclared here (not in a function)
          SIOCSCCRESERVED = SIOCDEVPRIVATE,
                            ^~~~~~~~~~~~~~
      
      Include <linux/sockios.h> to make it self-contained, and add it to the
      compile-test coverage.
      
      Link: http://lkml.kernel.org/r/20191108055809.26969-1-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a18374f
    • M
      Input: add privacy screen toggle keycode · 25b2f1b7
      Mathew King committed
      Add keycode for toggling electronic privacy screen to the keycodes
      definition. Some new laptops have a privacy screen which can be toggled
      with a key on the keyboard.
      Signed-off-by: Mathew King <mathewk@chromium.org>
      Link: https://lore.kernel.org/r/20191017163208.235518-1-mathewk@chromium.org
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      25b2f1b7
  11. 03 Dec, 2019 1 commit
  12. 28 Nov, 2019 1 commit
  13. 26 Nov, 2019 1 commit
    • J
      io_uring: add support for IORING_OP_CONNECT · f8e85cf2
      Jens Axboe committed
      This allows an application to call connect() in an async fashion. Like
      other opcodes, we first try a non-blocking connect, then punt to async
      context if we have to.
      
      Note that we can still return -EINPROGRESS, and in that case the caller
      should use IORING_OP_POLL_ADD to do an async wait for completion of the
      connect request (just like for regular connect(2), except we can do it
      async here too).
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f8e85cf2
  14. 25 Nov, 2019 1 commit
  15. 24 Nov, 2019 1 commit
  16. 22 Nov, 2019 2 commits
    • X
      net: sched: allow flower to match erspan options · 79b1011c
      Xin Long committed
      This patch allows flower to match erspan options.
      
      The options can be described in the form:
      VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
      When ver is set to 1, index will be applied while dir
      and hwid will be ignored, and when ver is set to 2,
      dir and hwid will be used while index will be ignored.
      
      Unlike geneve, only one option can be set. Also,
      geneve options, vxlan options and erspan options
      can't be set at the same time.
      
        # ip link add name erspan1 type erspan external
        # tc qdisc add dev erspan1 ingress
        # tc filter add dev erspan1 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              erspan_opts 1:12:0:0/1:ffff:0:0 \
              ip_proto udp \
              action mirred egress redirect dev eth0
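
The version rule described above (ver 1 uses index, ver 2 uses dir and hwid) can be modelled with a small C sketch. This is an illustration of the rule only, not the flower classifier's code: the struct, its field widths, and the function name are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical representation of one erspan option (key, mask or packet). */
struct erspan_opt {
    uint8_t  ver;
    uint32_t index;
    uint8_t  dir;
    uint8_t  hwid;
};

/* Compare two values under a mask. */
static inline int masked_eq(uint32_t a, uint32_t b, uint32_t mask)
{
    return ((a ^ b) & mask) == 0;
}

/*
 * Returns 1 if pkt matches key under mask, 0 otherwise.
 * ver 1: only index is compared; dir and hwid are ignored.
 * ver 2: only dir and hwid are compared; index is ignored.
 */
static inline int erspan_opt_match(const struct erspan_opt *pkt,
                                   const struct erspan_opt *key,
                                   const struct erspan_opt *mask)
{
    if (!masked_eq(pkt->ver, key->ver, mask->ver))
        return 0;

    switch (key->ver) {
    case 1:
        return masked_eq(pkt->index, key->index, mask->index);
    case 2:
        return masked_eq(pkt->dir, key->dir, mask->dir) &&
               masked_eq(pkt->hwid, key->hwid, mask->hwid);
    default:
        return 0;   /* this sketch treats other versions as no match */
    }
}
```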
      
      v1->v2:
        - improve some err msgs of extack.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79b1011c
    • X
      net: sched: allow flower to match vxlan options · d8f9dfae
      Xin Long committed
      This patch allows flower to match the gbp option in vxlan.
      
      The options can be described in the form GBP/GBP_MASK,
      where GBP is represented as a 32-bit hexadecimal value.
      Unlike geneve, only one option can be set. Also,
      geneve options and vxlan options can't be set at
      the same time.
      
        # ip link add name vxlan0 type vxlan dstport 0 external
        # tc qdisc add dev vxlan0 ingress
        # tc filter add dev vxlan0 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              vxlan_opts 01020304/ffffffff \
              ip_proto udp \
              action mirred egress redirect dev eth0
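
The GBP/GBP_MASK form used in the example above can be illustrated with a short C sketch that parses such a string and applies a masked comparison. This is not the tc or kernel parser: the struct and function names are hypothetical, and requiring both the value and the mask to be present is an assumption of the sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for a parsed GBP/GBP_MASK pair. */
struct gbp_match {
    uint32_t gbp;
    uint32_t mask;
};

/*
 * Parse a string of the form "GBP/GBP_MASK", both given as hexadecimal,
 * e.g. "01020304/ffffffff". Returns 0 on success, -1 on malformed input.
 */
static inline int parse_gbp_opt(const char *s, struct gbp_match *out)
{
    unsigned int gbp, mask;
    char trailing;

    /* Exactly two conversions must succeed; a third means trailing junk. */
    if (sscanf(s, "%x/%x%c", &gbp, &mask, &trailing) != 2)
        return -1;

    out->gbp = (uint32_t)gbp;
    out->mask = (uint32_t)mask;
    return 0;
}

/* Returns 1 if the packet's GBP value equals the key on every masked bit. */
static inline int gbp_matches(const struct gbp_match *m, uint32_t pkt_gbp)
{
    return ((pkt_gbp ^ m->gbp) & m->mask) == 0;
}
```

With the string from the example, 01020304/ffffffff, the mask selects every bit, so only a packet whose GBP value is exactly 0x01020304 would match in this model.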
      
      v1->v2:
        - add .strict_start_type for enc_opts_policy as Jakub noticed.
        - use Duplicate instead of Wrong in err msg for extack as Jakub
          suggested.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8f9dfae