1. 07 8月, 2014 1 次提交
  2. 06 8月, 2014 5 次提交
    • W
      net-timestamp: ACK timestamp for bytestreams · e1c8a607
      Willem de Bruijn 提交于
      Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
      in the send() call is acknowledged. It implements the feature for TCP.
      
      The timestamp is generated when the TCP socket cumulative ACK is moved
      beyond the tracked seqno for the first time. The feature ignores SACK
      and FACK, because those acknowledge the specific byte, but not
      necessarily the entire contents of the buffer up to that byte.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1c8a607
    • W
      net-timestamp: SCHED timestamp on entering packet scheduler · e7fd2885
      Willem de Bruijn 提交于
      Kernel transmit latency is often incurred in the packet scheduler.
      Introduce a new timestamp on transmission just before entering the
      scheduler. When data travels through multiple devices (bonding,
      tunneling, ...) each device will export an individual timestamp.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7fd2885
    • W
      net-timestamp: add key to disambiguate concurrent datagrams · 09c2d251
      Willem de Bruijn 提交于
      Datagrams timestamped on transmission can coexist in the kernel stack
      and be reordered in packet scheduling. When reading looped datagrams
      from the socket error queue it is not always possible to unique
      correlate looped data with original send() call (for application
      level retransmits). Even if possible, it may be expensive and complex,
      requiring packet inspection.
      
      Introduce a data-independent ID mechanism to associate timestamps with
      send calls. Pass an ID alongside the timestamp in field ee_data of
      sock_extended_err.
      
      The ID is a simple 32 bit unsigned int that is associated with the
      socket and incremented on each send() call for which software tx
      timestamp generation is enabled.
      
      The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
      avoid changing ee_data for existing applications that expect it 0.
      The counter is reset each time the flag is reenabled. Reenabling
      does not change the ID of already submitted data. It is possible
      to receive out of order IDs if the timestamp stream is not quiesced
      first.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09c2d251
    • W
      net-timestamp: extend SCM_TIMESTAMPING ancillary data struct · f24b9be5
      Willem de Bruijn 提交于
      Applications that request kernel tx timestamps with SO_TIMESTAMPING
      read timestamps as recvmsg() ancillary data. The response is defined
      implicitly as timespec[3].
      
      1) define struct scm_timestamping explicitly and
      
      2) add support for new tstamp types. On tx, scm_timestamping always
         accompanies a sock_extended_err. Define previously unused field
         ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
         define the existing behavior.
      
      The reception path is not modified. On rx, no struct similar to
      sock_extended_err is passed along with SCM_TIMESTAMPING.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f24b9be5
    • T
      random: introduce getrandom(2) system call · c6e9d6f3
      Theodore Ts'o 提交于
      The getrandom(2) system call was requested by the LibreSSL Portable
      developers.  It is analoguous to the getentropy(2) system call in
      OpenBSD.
      
      The rationale of this system call is to provide resiliance against
      file descriptor exhaustion attacks, where the attacker consumes all
      available file descriptors, forcing the use of the fallback code where
      /dev/[u]random is not available.  Since the fallback code is often not
      well-tested, it is better to eliminate this potential failure mode
      entirely.
      
      The other feature provided by this new system call is the ability to
      request randomness from the /dev/urandom entropy pool, but to block
      until at least 128 bits of entropy has been accumulated in the
      /dev/urandom entropy pool.  Historically, the emphasis in the
      /dev/urandom development has been to ensure that urandom pool is
      initialized as quickly as possible after system boot, and preferably
      before the init scripts start execution.
      
      This is because changing /dev/urandom reads to block represents an
      interface change that could potentially break userspace which is not
      acceptable.  In practice, on most x86 desktop and server systems, in
      general the entropy pool can be initialized before it is needed (and
      in modern kernels, we will printk a warning message if not).  However,
      on an embedded system, this may not be the case.  And so with this new
      interface, we can provide the functionality of blocking until the
      urandom pool has been initialized.  Any userspace program which uses
      this new functionality must take care to assure that if it is used
      during the boot process, that it will not cause the init scripts or
      other portions of the system startup to hang indefinitely.
      
      SYNOPSIS
      	#include <linux/random.h>
      
      	int getrandom(void *buf, size_t buflen, unsigned int flags);
      
      DESCRIPTION
      	The system call getrandom() fills the buffer pointed to by buf
      	with up to buflen random bytes which can be used to seed user
      	space random number generators (i.e., DRBG's) or for other
      	cryptographic uses.  It should not be used for Monte Carlo
      	simulations or other programs/algorithms which are doing
      	probabilistic sampling.
      
      	If the GRND_RANDOM flags bit is set, then draw from the
      	/dev/random pool instead of the /dev/urandom pool.  The
      	/dev/random pool is limited based on the entropy that can be
      	obtained from environmental noise, so if there is insufficient
      	entropy, the requested number of bytes may not be returned.
      	If there is no entropy available at all, getrandom(2) will
      	either block, or return an error with errno set to EAGAIN if
      	the GRND_NONBLOCK bit is set in flags.
      
      	If the GRND_RANDOM bit is not set, then the /dev/urandom pool
      	will be used.  Unlike using read(2) to fetch data from
      	/dev/urandom, if the urandom pool has not been sufficiently
      	initialized, getrandom(2) will block (or return -1 with the
      	errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).
      
      	The getentropy(2) system call in OpenBSD can be emulated using
      	the following function:
      
                  int getentropy(void *buf, size_t buflen)
                  {
                          int     ret;
      
                          if (buflen > 256)
                                  goto failure;
                          ret = getrandom(buf, buflen, 0);
                          if (ret < 0)
                                  return ret;
                          if (ret == buflen)
                                  return 0;
                  failure:
                          errno = EIO;
                          return -1;
                  }
      
      RETURN VALUE
             On success, the number of bytes that was filled in the buf is
             returned.  This may not be all the bytes requested by the
             caller via buflen if insufficient entropy was present in the
             /dev/random pool, or if the system call was interrupted by a
             signal.
      
             On error, -1 is returned, and errno is set appropriately.
      
      ERRORS
      	EINVAL		An invalid flag was passed to getrandom(2)
      
      	EFAULT		buf is outside the accessible address space.
      
      	EAGAIN		The requested entropy was not available, and
      			getentropy(2) would have blocked if the
      			GRND_NONBLOCK flag was not set.
      
      	EINTR		While blocked waiting for entropy, the call was
      			interrupted by a signal handler; see the description
      			of how interrupted read(2) calls on "slow" devices
      			are handled with and without the SA_RESTART flag
      			in the signal(7) man page.
      
      NOTES
      	For small requests (buflen <= 256) getrandom(2) will not
      	return EINTR when reading from the urandom pool once the
      	entropy pool has been initialized, and it will return all of
      	the bytes that have been requested.  This is the recommended
      	way to use getrandom(2), and is designed for compatibility
      	with OpenBSD's getentropy() system call.
      
      	However, if you are using GRND_RANDOM, then getrandom(2) may
      	block until the entropy accounting determines that sufficient
      	environmental noise has been gathered such that getrandom(2)
      	will be operating as a NRBG instead of a DRBG for those people
      	who are working in the NIST SP 800-90 regime.  Since it may
      	block for a long time, these guarantees do *not* apply.  The
      	user may want to interrupt a hanging process using a signal,
      	so blocking until all of the requested bytes are returned
      	would be unfriendly.
      
      	For this reason, the user of getrandom(2) MUST always check
      	the return value, in case it returns some error, or if fewer
      	bytes than requested was returned.  In the case of
      	!GRND_RANDOM and small request, the latter should never
      	happen, but the careful userspace code (and all crypto code
      	should be careful) should check for this anyway!
      
      	Finally, unless you are doing long-term key generation (and
      	perhaps not even then), you probably shouldn't be using
      	GRND_RANDOM.  The cryptographic algorithms used for
      	/dev/urandom are quite conservative, and so should be
      	sufficient for all purposes.  The disadvantage of GRND_RANDOM
      	is that it can block, and the increased complexity required to
      	deal with partially fulfilled getrandom(2) requests.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NZach Brown <zab@zabbo.net>
      c6e9d6f3
  3. 05 8月, 2014 1 次提交
  4. 03 8月, 2014 1 次提交
    • A
      net: filter: split 'struct sk_filter' into socket and bpf parts · 7ae457c1
      Alexei Starovoitov 提交于
      clean up names related to socket filtering and bpf in the following way:
      - everything that deals with sockets keeps 'sk_*' prefix
      - everything that is pure BPF is changed to 'bpf_*' prefix
      
      split 'struct sk_filter' into
      struct sk_filter {
      	atomic_t        refcnt;
      	struct rcu_head rcu;
      	struct bpf_prog *prog;
      };
      and
      struct bpf_prog {
              u32                     jited:1,
                                      len:31;
              struct sock_fprog_kern  *orig_prog;
              unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                                  const struct bpf_insn *filter);
              union {
                      struct sock_filter      insns[0];
                      struct bpf_insn         insnsi[0];
                      struct work_struct      work;
              };
      };
      so that 'struct bpf_prog' can be used independent of sockets and cleans up
      'unattached' bpf use cases
      
      split SK_RUN_FILTER macro into:
          SK_RUN_FILTER to be used with 'struct sk_filter *' and
          BPF_PROG_RUN to be used with 'struct bpf_prog *'
      
      __sk_filter_release(struct sk_filter *) gains
      __bpf_prog_release(struct bpf_prog *) helper function
      
      also perform related renames for the functions that work
      with 'struct bpf_prog *', since they're on the same lines:
      
      sk_filter_size -> bpf_prog_size
      sk_filter_select_runtime -> bpf_prog_select_runtime
      sk_filter_free -> bpf_prog_free
      sk_unattached_filter_create -> bpf_prog_create
      sk_unattached_filter_destroy -> bpf_prog_destroy
      sk_store_orig_filter -> bpf_prog_store_orig_filter
      sk_release_orig_filter -> bpf_release_orig_filter
      __sk_migrate_filter -> bpf_migrate_filter
      __sk_prepare_filter -> bpf_prepare_filter
      
      API for attaching classic BPF to a socket stays the same:
      sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
      and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
      which is used by sockets, tun, af_packet
      
      API for 'unattached' BPF programs becomes:
      bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
      and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
      which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ae457c1
  5. 31 7月, 2014 2 次提交
  6. 30 7月, 2014 1 次提交
  7. 29 7月, 2014 1 次提交
  8. 28 7月, 2014 2 次提交
    • A
      KVM: Allow KVM_CHECK_EXTENSION on the vm fd · 92b591a4
      Alexander Graf 提交于
      The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately
      on PPC some of the capabilities change depending on the way a VM was created.
      
      So instead we need a way to expose capabilities as VM ioctl, so that we can
      see which VM type we're using (HV or PR). To enable this, add the
      KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      92b591a4
    • P
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0
      Paul Mackerras 提交于
      This provides a way for userspace controls which sPAPR hcalls get
      handled in the kernel.  Each hcall can be individually enabled or
      disabled for in-kernel handling, except for H_RTAS.  The exception
      for H_RTAS is because userspace can already control whether
      individual RTAS functions are handled in-kernel or not via the
      KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
      H_RTAS is out of the normal sequence of hcall numbers.
      
      Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
      KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
      The args field of the struct kvm_enable_cap specifies the hcall number
      in args[0] and the enable/disable flag in args[1]; 0 means disable
      in-kernel handling (so that the hcall will always cause an exit to
      userspace) and 1 means enable.  Enabling or disabling in-kernel
      handling of an hcall is effective across the whole VM.
      
      The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
      on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
      capability advertises that this ability exists.
      
      When a VM is created, an initial set of hcalls are enabled for
      in-kernel handling.  The set that is enabled is the set that have
      an in-kernel implementation at this point.  Any new hcall
      implementations from this point onwards should not be added to the
      default set without a good reason.
      
      No distinction is made between real-mode and virtual-mode hcall
      implementations; the one setting controls them both.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      699a0ea0
  9. 26 7月, 2014 2 次提交
  10. 24 7月, 2014 1 次提交
  11. 22 7月, 2014 6 次提交
  12. 21 7月, 2014 1 次提交
    • T
      ALSA: pcm: Introduce protocol version field to sw_params · 58900810
      Takashi Iwai 提交于
      For controlling the new fields more strictly, add sw_params.proto
      field indicating the protocol version of the user-space.  User-space
      should fill the SNDRV_PCM_VERSION value it's built with, then kernel
      can know whether the new fields should be evaluated or not.
      
      And now tstamp_type field is evaluated only when the valid value is
      set there.  This avoids the wrong override of tstamp_type to zero,
      which is SNDRV_PCM_TSTAMP_TYPE_GETTIMEOFDAY.
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      58900810
  13. 19 7月, 2014 2 次提交
    • K
      seccomp: implement SECCOMP_FILTER_FLAG_TSYNC · c2e1f2e3
      Kees Cook 提交于
      Applying restrictive seccomp filter programs to large or diverse
      codebases often requires handling threads which may be started early in
      the process lifetime (e.g., by code that is linked in). While it is
      possible to apply permissive programs prior to process start up, it is
      difficult to further restrict the kernel ABI to those threads after that
      point.
      
      This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
      synchronizing thread group seccomp filters at filter installation time.
      
      When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
      filter) an attempt will be made to synchronize all threads in current's
      threadgroup to its new seccomp filter program. This is possible iff all
      threads are using a filter that is an ancestor to the filter current is
      attempting to synchronize to. NULL filters (where the task is running as
      SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
      transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
      ...) has been set on the calling thread, no_new_privs will be set for
      all synchronized threads too. On success, 0 is returned. On failure,
      the pid of one of the failing threads will be returned and no filters
      will have been applied.
      
      The race conditions against another thread are:
      - requesting TSYNC (already handled by sighand lock)
      - performing a clone (already handled by sighand lock)
      - changing its filter (already handled by sighand lock)
      - calling exec (handled by cred_guard_mutex)
      The clone case is assisted by the fact that new threads will have their
      seccomp state duplicated from their parent before appearing on the tasklist.
      
      Holding cred_guard_mutex means that seccomp filters cannot be assigned
      while in the middle of another thread's exec (potentially bypassing
      no_new_privs or similar). The call to de_thread() may kill threads waiting
      for the mutex.
      
      Changes across threads to the filter pointer includes a barrier.
      
      Based on patches by Will Drewry.
      Suggested-by: NJulien Tinnes <jln@chromium.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
      c2e1f2e3
    • K
      seccomp: add "seccomp" syscall · 48dc92b9
      Kees Cook 提交于
      This adds the new "seccomp" syscall with both an "operation" and "flags"
      parameter for future expansion. The third argument is a pointer value,
      used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
      be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
      
      In addition to the TSYNC flag later in this patch series, there is a
      non-zero chance that this syscall could be used for configuring a fixed
      argument area for seccomp-tracer-aware processes to pass syscall arguments
      in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
      for this syscall. Additionally, this syscall uses operation, flags,
      and user pointer for arguments because strictly passing arguments via
      a user pointer would mean seccomp itself would be unable to trivially
      filter the seccomp syscall itself.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
      48dc92b9
  14. 18 7月, 2014 2 次提交
    • Y
      serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers · aef9a7bd
      Yoshihiro YUNOMAE 提交于
      Add tunable RX interrupt trigger I/F of FIFO buffers.
      
      Serial devices are used as not only message communication devices but control
      or sending communication devices. For the latter uses, normally small data
      will be exchanged, so user applications want to receive data unit as soon as
      possible for real-time tendency. If we have a sensor which sends a 1 byte data
      each time and must control a device based on the sensor feedback, the RX
      interrupt should be triggered for each data.
      
      According to HW specification of serial UART devices, RX interrupt trigger
      can be changed, but the trigger is hard-coded. For example, RX interrupt trigger
      in 16550A can be set to 1, 4, 8, or 14 bytes for HW, but current driver sets
      the trigger to only 8bytes.
      
      This patch makes some devices change RX interrupt trigger from userland.
      
      <How to use>
      - Read current setting
       # cat /sys/class/tty/ttyS0/rx_trig_bytes
       8
      
      - Write user setting
       # echo 1 > /sys/class/tty/ttyS0/rx_trig_bytes
       # cat /sys/class/tty/ttyS0/rx_trig_bytes
       1
      
      <Support uart devices>
      - 16550A and Tegra (1, 4, 8, or 14 bytes)
      - 16650V2 (8, 16, 24, or 28 bytes)
      - 16654 (8, 16, 56, or 60 bytes)
      - 16750 (1, 16, 32, or 56 bytes)
      
      <Change log>
      Changes in V9:
       - Use attr_group instead of dev_spec_attr_group of uart_port structure
      
      Changes in V8:
       - Divide this patch from V7's patch based on Greg's comment
      
      Changes in V7:
       - Add Documentation
       - Change I/F name from rx_int_trig to rx_trig_bytes because the name
         rx_int_trig is hard to understand how users specify the value
      
      Changes in V6:
       - Move FCR_RX_TRIG_* definition in 8250.h to include/uapi/linux/serial_reg.h,
         rename those to UART_FCR_R_TRIG_*, and use UART_FCR_TRIGGER_MASK to
         UART_FCR_R_TRIG_BITS()
       - Change following function names:
          convert_fcr2val() => fcr_get_rxtrig_bytes()
          convert_val2rxtrig() => bytes_to_fcr_rxtrig()
       - Fix typo in serial8250_do_set_termios()
       - Delete the verbose error message pr_info() in bytes_to_fcr_rxtrig()
       - Rename *rx_int_trig/rx_trig* to *rxtrig* for several functions or variables
         (but UI remains rx_int_trig)
       - Change the meaningless variable name 'val' to 'bytes' following functions:
          fcr_get_rxtrig_bytes(), bytes_to_fcr_rxtrig(), do_set_rxtrig(),
          do_serial8250_set_rxtrig(), and serial8250_set_attr_rxtrig()
       - Use up->fcr in order to get rxtrig_bytes instead of rx_trig_raw in
         fcr_get_rxtrig_bytes()
       - Use conf_type->rxtrig_bytes[0] instead of switch statement for support check
         in register_dev_spec_attr_grp()
       - Delete the checking whether a user changed FCR or not when minimum buffer
         is needed in serial8250_do_set_termios()
      
      Changes in V5.1:
       - Fix FCR_RX_TRIG_MAX_STATE definition
      
      Changes in V5:
       - Support Tegra, 16650V2, 16654, and 16750
       - Store default FCR value to up->fcr when the port is first created
       - Add rx_trig_byte[] in uart_config[] for each device and use rx_trig_byte[]
         in convert_fcr2val() and convert_val2rxtrig()
      
      Changes in V4:
       - Introduce fifo_bug flag in uart_8250_port structure
         This is enabled only when parity is enabled and UART_BUG_PARITY is enabled
         for up->bugs. If this flag is enabled, user cannot set RX trigger.
       - Return -EOPNOTSUPP when it does not support device at convert_fcr2val() and
         at convert_val2rxtrig()
       - Set the nearest lower RX trigger when users input a meaningless value at
         convert_val2rxtrig()
       - Check whether p->fcr is existing at serial8250_clear_and_reinit_fifos()
       - Set fcr = up->fcr in the begging of serial8250_do_set_termios()
      
      Changes in V3:
       - Change I/F from ioctl(2) to sysfs(rx_int_trig)
      
      Changed in V2:
       - Use _IOW for TIOCSFIFORTRIG definition
       - Pass the interrupt trigger value itself
      Signed-off-by: NYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aef9a7bd
    • H
      [media] videodev2.h: add V4L2_FIELD_HAS_T_OR_B macro · e34c4db8
      Hans Verkuil 提交于
      Add a macro to test if the field consists of a single top
      or bottom field. Anyone who needs to work with fields as opposed to
      frame will need this.
      Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
      e34c4db8
  15. 17 7月, 2014 12 次提交