1. 26 Mar, 2016 1 commit
  2. 05 Mar, 2016 1 commit
    • D
      mm/pkeys: Fix siginfo ABI breakage caused by new u64 field · 49cd53bf
      Dave Hansen committed
      Stephen Rothwell reported this linux-next build failure:
      
      	http://lkml.kernel.org/r/20160226164406.065a1ffc@canb.auug.org.au
      
      ... caused by the Memory Protection Keys patches from the tip tree triggering
      a newly introduced build-time sanity check on an ARM build, because they changed
      the ABI of siginfo in an unexpected way.
      
      If u64 has a natural alignment of 8 bytes (which is the case on most mainstream
      platforms, with the notable exception of x86-32), then the leadup to the
      _sifields union matters:
      
      typedef struct siginfo {
              int si_signo;
              int si_errno;
              int si_code;
      
              union {
      	...
              } _sifields;
      } __ARCH_SI_ATTRIBUTES siginfo_t;
      
      Note how the first 3 fields give us 12 bytes, so _sifields is not
      naturally 8-byte aligned.
      
      Before the _pkey field addition the largest element of _sifields (on
      32-bit platforms) was 32 bits. With the u64 added, the minimum alignment
      requirement increased to 8 bytes on those (rare) 32-bit platforms. Thus
      GCC padded the space after si_code with 4 extra bytes, and shifted all
      _sifields offsets by 4 bytes - breaking the ABI of all of those
      remaining fields.
      
      On 64-bit platforms this problem was hidden due to _sifields already
      having numerous fields with natural 8 bytes alignment (pointers).
      
      To fix this, we replace the u64 with an '__u32'.  The __u32 does not
      increase the minimum alignment requirement of the union, and it is
      also large enough to store the 16-bit pkey we have today on x86.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-next@vger.kernel.org
      Fixes: cd0ea35f ("signals, pkeys: Notify userspace about protection key faults")
      Link: http://lkml.kernel.org/r/20160301125451.02C7426D@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      49cd53bf
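The padding effect described in this commit can be reproduced with two small stand-in structs. The sketch below is illustrative only: `info_u32` and `info_u64` are hypothetical types, not the kernel's siginfo, and the union deliberately contains no pointer members so that it behaves like the 32-bit layout discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: three ints, then a union whose widest member is 32 bits. */
struct info_u32 {
    int si_signo, si_errno, si_code;
    union { int pad[4]; uint32_t pkey; } fields;
};

/* Same layout, but the union now contains a 64-bit member. */
struct info_u64 {
    int si_signo, si_errno, si_code;
    union { int pad[4]; uint64_t pkey; } fields;
};

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Offset of the union when nothing in it needs more than 4-byte alignment. */
static size_t fields_offset_u32(void)
{
    return offsetof(struct info_u32, fields);
}

/* Offset of the union once a 64-bit member raises its alignment requirement. */
static size_t fields_offset_u64(void)
{
    return offsetof(struct info_u64, fields);
}
```

With a 4-byte int, `fields_offset_u32()` is 12. Where a 64-bit integer is 8-byte aligned, `fields_offset_u64()` becomes `round_up(12, 8)`, that is 16, which is the 4-byte shift that broke the ABI.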
  3. 26 Feb, 2016 1 commit
    • T
      net: Facility to report route quality of connected sockets · a87cb3e4
      Tom Herbert committed
      This patch adds the SO_CNX_ADVICE socket option (setsockopt only). The
      purpose is to allow an application to give feedback to the kernel about
      the quality of the network path for a connected socket. The value
      argument indicates the type of quality report. For this initial patch
      the only supported advice is a value of 1 which indicates "bad path,
      please reroute"-- the action taken by the kernel is to call
      dst_negative_advice which will attempt to choose a different ECMP route,
      reset the TX hash for flow label and UDP source port in encapsulation,
      etc.
      
      This facility should be useful for connected UDP sockets where only the
      application can provide any feedback about path quality. It could also
      be useful for TCP applications that have additional knowledge about the
      path outside of the normal TCP control loop.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a87cb3e4
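A minimal sketch of how an application might send this advice, assuming a Linux system. The function name `report_path_advice` and the constant `CNX_ADVICE_BAD_PATH` are hypothetical names chosen for illustration. The fallback value 53 is the one used in asm-generic/socket.h, and some architectures use a different number.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_CNX_ADVICE
#define SO_CNX_ADVICE 53        /* asm-generic value; assumption for older headers */
#endif

#define CNX_ADVICE_BAD_PATH 1   /* hypothetical name for "bad path, please reroute" */

/* Send path-quality advice for a connected socket.
 * Returns 0 on success or a negative errno value on failure. */
static int report_path_advice(int fd, int advice)
{
    /* The commit defines only one advice value so far. */
    assert(advice == CNX_ADVICE_BAD_PATH);

    if (setsockopt(fd, SOL_SOCKET, SO_CNX_ADVICE, &advice, sizeof(advice)) < 0)
        return -errno;
    return 0;
}
```

An application would typically call this after its own loss or latency detection decides the current path is unhealthy. A kernel without the option is expected to reject the call, so the return value should be checked.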
  4. 18 Feb, 2016 1 commit
  5. 23 Jan, 2016 1 commit
  6. 05 Jan, 2016 1 commit
  7. 15 Dec, 2015 1 commit
  8. 29 Oct, 2015 1 commit
  9. 16 Sep, 2015 1 commit
  10. 13 May, 2015 1 commit
  11. 10 Feb, 2015 1 commit
  12. 06 Jan, 2015 1 commit
  13. 06 Dec, 2014 1 commit
    • A
      net: sock: allow eBPF programs to be attached to sockets · 89aa0758
      Alexei Starovoitov committed
      introduce new setsockopt() command:
      
      setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
      
      where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
      and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
      
      setsockopt() calls bpf_prog_get() which increments refcnt of the program,
      so it doesn't get unloaded while socket is using the program.
      
      The same eBPF program can be attached to multiple sockets.
      
      User task exit automatically closes socket which calls sk_filter_uncharge()
      which decrements refcnt of eBPF program
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89aa0758
  14. 20 Nov, 2014 1 commit
  15. 18 Nov, 2014 1 commit
  16. 12 Nov, 2014 1 commit
    • E
      net: introduce SO_INCOMING_CPU · 2c8c56e1
      Eric Dumazet committed
      Alternative to RPS/RFS is to use hardware support for multiple
      queues.
      
      Then split a set of millions of sockets into worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering on socket structure which cpu delivered last packet
      is enough to solve the problem.
      
      After accept(), connect(), or even file descriptor passing around
      processes, applications can use :
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo
      for optimal performance, as all networking stack should run
      on the appropriate cpu, without need to send IPI (RPS/RFS).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c8c56e1
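The "right silo" step can be sketched as a small mapping from the reported CPU to a worker index. This is only one possible policy: `silo_for_cpu` and `silo_for_socket` are hypothetical helpers, the modulo mapping is an assumption, and the fallback value 49 for the option comes from asm-generic/socket.h.

```c
#include <assert.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49   /* asm-generic value; assumption for older headers */
#endif

/* Map a CPU number to one of nworkers silos.
 * A negative CPU means "unknown" and falls back to silo 0. */
static int silo_for_cpu(int cpu, int nworkers)
{
    assert(nworkers > 0);
    return cpu < 0 ? 0 : cpu % nworkers;
}

/* Ask the kernel which CPU last delivered a packet for this socket
 * and return the silo that should own it. */
static int silo_for_socket(int fd, int nworkers)
{
    int cpu = -1;
    socklen_t len = sizeof(cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
        cpu = -1;   /* option unsupported or bad descriptor: treat as unknown */
    return silo_for_cpu(cpu, nworkers);
}
```

With one worker thread pinned per RX queue CPU, the returned index would pick the epoll set to which the socket is added after accept() or connect().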
  17. 10 Oct, 2014 1 commit
  18. 16 Sep, 2014 1 commit
  19. 19 Aug, 2014 1 commit
  20. 07 Aug, 2014 1 commit
  21. 24 Jun, 2014 1 commit
  22. 20 May, 2014 1 commit
  23. 18 Apr, 2014 1 commit
  24. 29 Jan, 2014 1 commit
  25. 19 Jan, 2014 1 commit
    • M
      net: introduce SO_BPF_EXTENSIONS · ea02f941
      Michal Sekletar committed
      For user space packet capturing libraries such as libpcap, there's
      currently only one way to check which BPF extensions are supported
      by the kernel, that is, commit aa1113d9 ("net: filter: return
      -EINVAL if BPF_S_ANC* operation is not supported"). For querying all
      extensions at once this might be rather inconvenient.
      
      Therefore, this patch introduces a new option which can be used as
      an argument for getsockopt(), and allows one to obtain information
      about which BPF extensions are supported by the current kernel.
      
      As David Miller suggests, we do not need to define any bits right
      now and status quo can just return 0 in order to state that this
      version supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
      additions to BPF extensions need to add their bits to the
      bpf_tell_extensions() function, as documented in the comment.
      Signed-off-by: Michal Sekletar <msekleta@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea02f941
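A capture library could query the option along these lines. This is a sketch under assumptions: `query_bpf_extensions` is a hypothetical helper, and the fallback value 48 is the asm-generic one. How to interpret a non-zero result is left to the kernel documentation.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_BPF_EXTENSIONS
#define SO_BPF_EXTENSIONS 48   /* asm-generic value; assumption for older headers */
#endif

/* Return the value reported by SO_BPF_EXTENSIONS (zero or positive),
 * or a negative errno value if the query fails. */
static int query_bpf_extensions(int fd)
{
    int ext = 0;
    socklen_t len = sizeof(ext);

    if (getsockopt(fd, SOL_SOCKET, SO_BPF_EXTENSIONS, &ext, &len) < 0)
        return -errno;

    assert(ext >= 0);   /* negative values are reserved for errors in this sketch */
    return ext;
}
```

A failure such as ENOPROTOOPT would indicate a kernel that predates the option, in which case the caller has to fall back to probing extensions one by one.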
  26. 11 Dec, 2013 1 commit
  27. 29 Sep, 2013 1 commit
    • E
      net: introduce SO_MAX_PACING_RATE · 62748f32
      Eric Dumazet committed
      As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet
      scheduler"), this patch adds a new socket option.
      
      SO_MAX_PACING_RATE offers the application the ability to cap the
      rate computed by transport layer. Value is in bytes per second.
      
      u32 val = 1000000;
      setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
      
      To be effectively paced, a flow must use FQ packet scheduler.
      
      Note that a packet scheduler takes into account the headers for its
      computations. The effective payload rate depends on MSS and retransmits
      if any.
      
      I chose to make this pacing rate a SOL_SOCKET option instead of a
      TCP one because this can be used by other protocols.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Steinar H. Gunderson <sesse@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62748f32
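Because the option takes bytes per second while link speeds are usually quoted in bits per second, a small conversion helper avoids unit mistakes. The sketch below is illustrative: `mbit_to_bytes_per_sec` and `set_max_pacing_rate` are hypothetical helpers, and the fallback value 47 is the asm-generic one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47   /* asm-generic value; assumption for older headers */
#endif

/* Convert megabits per second to bytes per second (1 Mbit/s = 125000 bytes/s). */
static uint32_t mbit_to_bytes_per_sec(uint32_t mbit)
{
    assert(mbit <= UINT32_MAX / 125000u);   /* guard against 32-bit overflow */
    return mbit * 125000u;
}

/* Cap the pacing rate of a socket.
 * Returns 0 on success or a negative errno value on failure. */
static int set_max_pacing_rate(int fd, uint32_t bytes_per_sec)
{
    if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                   &bytes_per_sec, sizeof(bytes_per_sec)) < 0)
        return -errno;
    return 0;
}
```

For instance, `mbit_to_bytes_per_sec(8)` yields 1000000, the same value used in the commit's example. As the commit notes, the cap only takes effect when the flow goes through the FQ packet scheduler.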
  28. 11 Jul, 2013 1 commit
  29. 18 Jun, 2013 1 commit
  30. 28 Apr, 2013 1 commit
  31. 01 Apr, 2013 1 commit
    • K
      net: add option to enable error queue packets waking select · 7d4c04fc
      Keller, Jacob E committed
      Currently, when a socket receives something on the error queue it only wakes up
      the socket on select if it is in the "read" list, that is the socket has
      something to read. It is useful also to wake the socket if it is in the error
      list, which would enable software to wait on error queue packets without waking
      up for regular data on the socket. The main use case is for receiving
      timestamped transmit packets which return the timestamp to the socket via the
      error queue. This enables an application to select on the socket for the error
      queue only instead of for the regular traffic.
      
      -v2-
      * Added the SO_SELECT_ERR_QUEUE socket option to every architecture specific file
      * Modified every socket poll function that checks error queue
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Matthew Vick <matthew.vick@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d4c04fc
  32. 24 Jan, 2013 1 commit
  33. 17 Jan, 2013 1 commit
    • V
      sk-filter: Add ability to lock a socket filter program · d59577b6
      Vincent Bernat committed
      While a privileged program can open a raw socket, attach some
      restrictive filter and drop its privileges (or send the socket to an
      unprivileged program through some Unix socket), the filter can still
      be removed or modified by the unprivileged program. This commit adds a
      socket option to lock the filter (SO_LOCK_FILTER) preventing any
      modification of a socket filter program.
      
      This is similar to the OpenBSD BIOCLOCK ioctl on bpf sockets, except that even
      root is not allowed to change or drop the filter.
      
      The state of the lock can be read with getsockopt(). No error is
      triggered if the state is not changed. -EPERM is returned when a user
      tries to remove the lock or to change/remove the filter while the lock
      is active. The check is done directly in sk_attach_filter() and
      sk_detach_filter(), so it is not limited to the setsockopt() syscall.
      Signed-off-by: Vincent Bernat <bernat@luffy.cx>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d59577b6
  34. 04 Jan, 2013 1 commit
  35. 20 Dec, 2012 1 commit
  36. 01 Nov, 2012 1 commit
    • P
      sk-filter: Add ability to get socket filter program (v2) · a8fc9277
      Pavel Emelyanov committed
      The SO_ATTACH_FILTER option is set-only. I propose adding the ability
      to get the filter by using SO_ATTACH_FILTER in getsockopt. To be easier
      on the eyes, a SO_GET_FILTER alias for it is declared. This ability is
      required by the checkpoint-restore project to be able to save the full
      state of a socket.
      
      There are two issues with getting filter back.
      
      First, kernel modifies the sock_filter->code on filter load, thus in
      order to return the filter element back to user we have to decode it
      into user-visible constants. Fortunately the modification in question
      is interconvertible.
      
      Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
      speed up the run-time division by doing kernel_k = reciprocal(user_k).
      Bad news is that different user_k may result in same kernel_k, so we
      can't get the original user_k back. Good news is that we don't have
      to do it. What we need to do is calculate a user2_k such that
      
        reciprocal(user2_k) == reciprocal(user_k) == kernel_k
      
      i.e. if it is loaded back, the recompiled value will be exactly
      the same as it was. That said, user2_k can be calculated like this
      
        user2_k = reciprocal(kernel_k)
      
      with an exception, that if kernel_k == 0, then user2_k == 1.
      
      The optlen argument is treated like this -- when zero, kernel returns
      the amount of sock_fprog elements in filter, otherwise it should be
      large enough for the sock_fprog array.
      
      changes since v1:
      * Declared SO_GET_FILTER in all arch headers
      * Added decode of vlan-tag codes
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8fc9277
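The round trip on the division constant can be checked numerically. The sketch below models reciprocal() as ceil(2^32 / k) truncated to 32 bits, which is an assumption about the kernel helper of that era. `decode_div_k` is a hypothetical name for the decoding step described in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of the kernel-side helper: ceil(2^32 / k), truncated to 32 bits.
 * For k == 1 the result wraps to 0, which is why 0 needs special handling. */
static uint32_t reciprocal(uint32_t k)
{
    uint64_t val;

    assert(k != 0);
    val = (1ULL << 32) + (k - 1);
    return (uint32_t)(val / k);
}

/* Recover a user-visible divisor that compiles back to the same kernel_k.
 * It is not necessarily the divisor the user originally supplied. */
static uint32_t decode_div_k(uint32_t kernel_k)
{
    return kernel_k == 0 ? 1 : reciprocal(kernel_k);
}
```

For example, a user_k of 0xffffffff compiles to kernel_k == 2 and decodes to 0x80000000. That is a different divisor, but `reciprocal(0x80000000)` is again 2, so reloading the decoded filter reproduces the same kernel-side program.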
  37. 17 Oct, 2012 1 commit
    • D
      UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches · 0420c87e
      David Howells committed
      Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop
      the patch program from deleting it when it creates it.
      
      Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild
      files to use the generic instead.
      
      Should this perhaps instead be a #warning or #error that the facility is
      unsupported on this arch?
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Arnd Bergmann <arnd@arndb.de>
      cc: Avi Kivity <avi@redhat.com>
      cc: Marcelo Tosatti <mtosatti@redhat.com>
      cc: kvm@vger.kernel.org
      0420c87e
  38. 09 Oct, 2012 1 commit
  39. 03 Oct, 2012 1 commit