1. 06 Jan, 2015 2 commits
    • net: tcp: add RTAX_CC_ALGO fib handling · ea697639
      Daniel Borkmann committed
      This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
      control metric to be set up and dumped back to user space.
      
      While the internal representation of RTAX_CC_ALGO is handled as a u32
      key, we avoided exposing this implementation detail to user space.
      Instead, the netlink attribute exchanged with user space carries the
      actual congestion control algorithm name, similar to the setsockopt(2)
      API, in order to allow for maximum flexibility, even for 3rd party
      modules.
      
      It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot,
      as it should have been stored in RTAX_FEATURES instead. We first thought
      about reusing it for the congestion control key, but that brings more
      complications and/or confusion than it is worth.
      
      Joint work with Florian Westphal.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
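The split between an internal u32 key and a user-visible algorithm name, as described in the commit message above, can be pictured with a small user-space model. This is only a sketch: the table, the key values, `CC_KEY_UNSPEC`, and the helpers `cc_key_by_name` and `cc_name_by_key` are hypothetical and are not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical registry: names are what user space sees, keys stay internal. */
struct cc_algo {
    const char *name;
    uint32_t key;               /* illustrative values, not real kernel keys */
};

static const struct cc_algo cc_table[] = {
    { "reno",  1 },
    { "cubic", 2 },
};

#define CC_TABLE_LEN  (sizeof(cc_table) / sizeof(cc_table[0]))
#define CC_KEY_UNSPEC 0u        /* hypothetical "no such algorithm" key */

/* Set path: translate the name received from user space into a key. */
static uint32_t cc_key_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < CC_TABLE_LEN; i++)
        if (strcmp(cc_table[i].name, name) == 0)
            return cc_table[i].key;
    return CC_KEY_UNSPEC;
}

/* Dump path: translate the stored key back into a name. */
static const char *cc_name_by_key(uint32_t key)
{
    for (size_t i = 0; i < CC_TABLE_LEN; i++)
        if (cc_table[i].key == key)
            return cc_table[i].name;
    return NULL;
}
```

Because only names cross the boundary, a module registered later would get a key without any change to what user space sends or reads back.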
    • net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined · 6cb69742
      Hubert Sokolowski committed
      Add a check for whether the call to ndo_dflt_fdb_dump is needed.
      Some drivers (e.g. qlcnic or macvlan) that define their own
      ndo_fdb_dump do not expect ndo_dflt_fdb_dump to be called
      unconditionally. Other drivers define their own ndo_fdb_dump
      and don't want ndo_dflt_fdb_dump to be called at all.
      At the same time it is desirable to call the default dump
      function on a bridge device.
      Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
      Add extra checking in br_fdb_dump to avoid duplicate entries
      as now filter_dev can be NULL.
      
      The following filtering tests were performed before the change
      and after the patch was applied, to make sure the results are
      the same and the patch doesn't break the filtering algorithm.
      
      [root@localhost ~]# cd /root/iproute2-3.18.0/bridge
      [root@localhost bridge]# modprobe dummy
      [root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
      [root@localhost bridge]# brctl addbr br0
      [root@localhost bridge]# brctl addif  br0 dummy0
      [root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
      [root@localhost bridge]# # show all
      [root@localhost bridge]# ./bridge fdb show
      33:33:00:00:00:01 dev p2p1 self permanent
      01:00:5e:00:00:01 dev p2p1 self permanent
      33:33:ff:ac:ce:32 dev p2p1 self permanent
      33:33:00:00:02:02 dev p2p1 self permanent
      01:00:5e:00:00:fb dev p2p1 self permanent
      33:33:00:00:00:01 dev p7p1 self permanent
      01:00:5e:00:00:01 dev p7p1 self permanent
      33:33:ff:79:50:53 dev p7p1 self permanent
      33:33:00:00:02:02 dev p7p1 self permanent
      01:00:5e:00:00:fb dev p7p1 self permanent
      f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
      f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
      33:33:00:00:00:01 dev br0 self permanent
      02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
      02:00:00:12:01:04 dev br0 master br0 permanent
      [root@localhost bridge]# # filter by bridge
      [root@localhost bridge]# ./bridge fdb show br br0
      f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
      f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
      33:33:00:00:00:01 dev br0 self permanent
      02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
      02:00:00:12:01:04 dev br0 master br0 permanent
      [root@localhost bridge]# # filter by port
      [root@localhost bridge]# ./bridge fdb show brport dummy0
      f2:46:50:85:6d:d9 master br0 permanent
      f2:46:50:85:6d:d9 vlan 1 master br0 permanent
      33:33:00:00:00:01 self permanent
      f1:f2:f3:f4:f5:f6 self permanent
      [root@localhost bridge]# # filter by port + bridge
      [root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
      f2:46:50:85:6d:d9 master br0 permanent
      f2:46:50:85:6d:d9 vlan 1 master br0 permanent
      33:33:00:00:00:01 self permanent
      f1:f2:f3:f4:f5:f6 self permanent
      [root@localhost bridge]#
      Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
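The dispatch rule in the title of the commit above (use the driver's ndo_fdb_dump when it is defined, otherwise fall back to the default dump) can be sketched in user space as follows. The struct, the call counters, and `fdb_dump_one` are hypothetical stand-ins; the real code also deals with the bridge device and the filter arguments, which this model leaves out.

```c
#include <assert.h>
#include <stddef.h>

struct net_device_model;
typedef int (*fdb_dump_fn)(struct net_device_model *dev, int idx);

/* Hypothetical device: the hook is NULL when the driver has no own dump. */
struct net_device_model {
    fdb_dump_fn ndo_fdb_dump;
    int default_calls;          /* how often the default dump ran */
    int driver_calls;           /* how often the driver's dump ran */
};

/* Stand-in for the default dump function. */
static int dflt_fdb_dump(struct net_device_model *dev, int idx)
{
    dev->default_calls++;
    return idx + 1;
}

/* Stand-in for a driver-specific dump function. */
static int driver_fdb_dump(struct net_device_model *dev, int idx)
{
    dev->driver_calls++;
    return idx + 1;
}

/* Call exactly one of the two: the driver's hook wins when it exists. */
static int fdb_dump_one(struct net_device_model *dev, int idx)
{
    assert(dev != NULL);
    if (dev->ndo_fdb_dump)
        return dev->ndo_fdb_dump(dev, idx);
    return dflt_fdb_dump(dev, idx);
}
```

With this shape, a driver whose hook already calls the default dump internally no longer sees its entries reported twice.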
  2. 03 Jan, 2015 1 commit
  3. 27 Dec, 2014 2 commits
    • net: Generalize ndo_gso_check to ndo_features_check · 5f35227e
      Jesse Gross committed
      GSO isn't the only offload feature with restrictions that
      potentially can't be expressed with the current features mechanism.
      Checksum is another, although it's a general issue that could in
      theory apply to anything. Even if it may be possible to
      implement these restrictions in other ways, it can result in
      duplicate code or inefficient per-packet behavior.
      
      This generalizes ndo_gso_check so that drivers can remove any
      features that don't make sense for a given packet, similar to
      netif_skb_features(). It also converts existing driver
      restrictions to the new format, completing the work that was
      done to support tunnel protocols since the issues apply to
      checksums as well.
      
      By actually removing features from the set that are used to do
      offloading, it solves another problem with the existing
      interface. In these cases, GSO would run with the original set
      of features and not do anything because it appears that
      segmentation is not required.
      
      CC: Tom Herbert <therbert@google.com>
      CC: Joe Stringer <joestringer@nicira.com>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Tom Herbert <therbert@google.com>
      Fixes: 04ffcb25 ("net: Add ndo_gso_check")
      Tested-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
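The idea of letting a driver strip features per packet, as the commit above describes, might look roughly like this in a user-space model. The feature bits, the packet struct, the 64-byte limit, and both function names are assumptions made for illustration; they do not mirror the kernel's types or any real driver's restriction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t features_t;

#define F_CSUM (1u << 0)        /* hypothetical checksum offload bit */
#define F_GSO  (1u << 1)        /* hypothetical segmentation offload bit */

/* Hypothetical per-packet facts a driver might look at. */
struct pkt_model {
    int encapsulated;
    unsigned int inner_hdr_len;
};

/* Hypothetical driver hook: remove what the hardware cannot do here. */
static features_t drv_features_check(const struct pkt_model *pkt,
                                     features_t features)
{
    if (pkt->encapsulated && pkt->inner_hdr_len > 64)
        features &= ~(F_CSUM | F_GSO);
    return features;
}

/* Core side: start from the device features, then let the hook prune. */
static features_t pkt_features(const struct pkt_model *pkt,
                               features_t dev_features,
                               features_t (*check)(const struct pkt_model *,
                                                   features_t))
{
    assert(pkt != NULL);
    return check ? check(pkt, dev_features) : dev_features;
}
```

Since the hook returns a reduced feature set rather than a yes/no answer, the later offload code sees that the feature is missing and falls back to software for that packet.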
    • net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding · 2c26d34b
      Jay Vosburgh committed
      When using VXLAN tunnels and a sky2 device, I have experienced
      checksum failures of the following type:
      
      [ 4297.761899] eth0: hw csum failure
      [...]
      [ 4297.765223] Call Trace:
      [ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
      [ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
      [ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
      [ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
      [ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
      [ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
      
      	These are reliably reproduced in a network topology of:
      
      container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
      
      	When VXLAN encapsulated traffic is received from a similarly
      configured peer, the above warning is generated in the receive
      processing of the encapsulated packet.  Note that the warning is
      associated with the container eth0.
      
              The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
      because the packet is an encapsulated Ethernet frame, the checksum
      generated by the hardware includes the inner protocol and Ethernet
      headers.
      
      	The receive code is careful to update the skb->csum, except in
      __dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
      calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
      to skip over the Ethernet header, but does not update skb->csum when
      doing so.
      
      	This patch resolves the problem by adding a call to
      skb_postpull_rcsum to update the skb->csum after the call to
      eth_type_trans.
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
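Why the checksum has to be adjusted after the Ethernet header is pulled can be shown with plain ones' complement arithmetic. The sketch below is a user-space model, not the kernel's skb_postpull_rcsum: the `*_model` names are hypothetical, the sum is kept folded to 16 bits for simplicity, and an even pull length is assumed (ETH_HLEN is 14).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the buffer as big-endian 16-bit words (unfolded). */
static uint32_t csum_partial_model(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                          /* odd trailing byte */
        sum += (uint32_t)buf[i] << 8;
    return sum;
}

/* Fold carries back in until the sum fits in 16 bits. */
static uint16_t csum_fold_model(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Remove the contribution of `len` pulled bytes from a checksum that
 * covered them. Subtraction in ones' complement is addition of the
 * bitwise complement. `len` must be even in this model.
 */
static uint16_t postpull_rcsum_model(uint16_t csum, const uint8_t *pulled,
                                     size_t len)
{
    assert(len % 2 == 0);
    uint16_t pulled_sum = csum_fold_model(csum_partial_model(pulled, len));
    return csum_fold_model((uint32_t)csum + (uint16_t)~pulled_sum);
}
```

If the header is skipped without this adjustment, the stored sum still includes the 14 header bytes and no longer matches the data that later layers verify, which is the kind of mismatch the "hw csum failure" message reports.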
  4. 24 Dec, 2014 8 commits
  5. 17 Dec, 2014 1 commit
  6. 11 Dec, 2014 4 commits
    • net: sock: fix access via invalid file descriptor · 198bf1b0
      Alexei Starovoitov committed
      0day robot reported the following crash:
      [   21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
      [   21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2
      
      It's due to bpf_prog_get() returning ERR_PTR.
      Check it properly.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
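The class of bug fixed above, a function that reports failure through an encoded error pointer rather than NULL, can be reproduced in a user-space sketch. The helpers below re-implement the error-pointer idiom locally; `prog_get_model`, `attach_model`, and the rule that only fd 3 is valid are hypothetical and exist only to make the example testable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL 4095

/* Local re-implementations of the error-pointer idiom. */
static void *err_ptr(long err)      { return (void *)(intptr_t)err; }
static long ptr_err(const void *p)  { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO_MODEL);
}

struct prog_model { int id; };
static struct prog_model the_prog = { 42 };

/* Hypothetical lookup: fails with an error pointer, never with NULL. */
static struct prog_model *prog_get_model(int fd)
{
    if (fd != 3)
        return err_ptr(-EBADF);
    return &the_prog;
}

/* Attach path: a NULL check alone would let the error pointer through. */
static int attach_model(int fd, struct prog_model **slot)
{
    assert(slot != NULL);
    struct prog_model *prog = prog_get_model(fd);

    if (is_err(prog))
        return (int)ptr_err(prog);
    *slot = prog;
    return 0;
}
```

An error pointer is a small negative number cast to a pointer, so dereferencing it faults at an address that is not a valid object, which is why the check has to happen before any field access.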
    • net: introduce helper macro for_each_cmsghdr · f95b414e
      Gu Zheng committed
      Introduce the helper macro for_each_cmsghdr as a wrapper for
      enumerating the cmsghdr entries of a msghdr. This is just a cleanup.
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
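A user-space rendering of such a wrapper, built on the standard CMSG_FIRSTHDR and CMSG_NXTHDR macros from <sys/socket.h>, could look like the sketch below. The macro body here is an assumption written for illustration, and the kernel's own definition may differ; `count_cmsgs` and `demo_count_two` are hypothetical helpers, and the control messages are only built in memory, never sent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative user-space version of the iteration wrapper. */
#define for_each_cmsghdr(cmsg, msg)                 \
    for ((cmsg) = CMSG_FIRSTHDR(msg);               \
         (cmsg);                                    \
         (cmsg) = CMSG_NXTHDR((msg), (cmsg)))

/* Count the control messages attached to a msghdr. */
static int count_cmsgs(struct msghdr *msg)
{
    struct cmsghdr *cmsg;
    int n = 0;

    assert(msg != NULL);
    for_each_cmsghdr(cmsg, msg)
        n++;
    return n;
}

/* Build two control messages in a local buffer and count them. */
static int demo_count_two(void)
{
    union {
        char buf[2 * CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces suitable alignment */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int value = 7;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    /* The same macro also serves to fill in the headers one by one. */
    for_each_cmsghdr(cmsg, &msg) {
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &value, sizeof(value));
    }
    return count_cmsgs(&msg);
}
```

The benefit is purely readability: every open-coded three-part `for` over control messages collapses into one recognizable line.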
    • net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb · fd11a83d
      Alexander Duyck committed
      This change pulls the core functionality out of __netdev_alloc_skb and
      places it in a new function named __alloc_rx_skb.  The reason for doing
      this is to make these bits accessible to a new function __napi_alloc_skb.
      In addition __alloc_rx_skb now has a new flags value that is used to
      determine which page frag pool to allocate from.  If the SKB_ALLOC_NAPI
      flag is set then the NAPI pool is used.  The advantage of this is that we
      do not have to use local_irq_save/restore when accessing the NAPI pool from
      NAPI context.
      
      In my test setup I saw at least 11ns of savings using the napi_alloc_skb
      function versus the netdev_alloc_skb function, most of this being due to
      the fact that we didn't have to call local_irq_save/restore.
      
      The main use case for napi_alloc_skb would be for things such as copybreak
      or page fragment based receive paths where an skb is allocated after the
      data has been received instead of before.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag · ffde7328
      Alexander Duyck committed
      This patch splits the netdev_alloc_frag function up so that it can be used
      on one of two page frag pools instead of being fixed on the
      netdev_alloc_cache.  By doing this we can add a NAPI specific function
      __napi_alloc_frag that accesses a pool that is only used from softirq
      context.  The advantage to this is that we do not need to call
      local_irq_save/restore which can be a significant savings.
      
      I also took the opportunity to refactor the core bits that were placed in
      __alloc_page_frag.  First I updated the allocation to do either a 32K
      allocation or an order 0 page.  This is based on the changes in commit
      d9b2938a where it was found that latencies could be reduced in case of
      failures.  Then I also rewrote the logic to work from the end of the page to
      the start.  By doing this the size value doesn't have to be used unless we
      have run out of space for page fragments.  Finally I cleaned up the atomic
      bits so that we just do an atomic_sub_and_test and if that returns true then
      we set the page->_count via an atomic_set.  This way we can remove the extra
      conditional for the atomic_read since it would have led to an atomic_inc in
      the case of success anyway.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
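The "work from the end of the page to the start" idea from the commit above can be illustrated with a minimal user-space allocator. Everything here is a simplification: `frag_cache_model` and its functions are hypothetical, there is no reference counting, and exhaustion returns NULL where the real allocator would obtain a fresh page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment cache over one backing buffer. */
struct frag_cache_model {
    uint8_t *base;      /* start of the backing buffer */
    size_t size;        /* total size, only needed when (re)filling */
    size_t offset;      /* fragments are carved from here downward */
};

static void frag_cache_init(struct frag_cache_model *c, uint8_t *buf,
                            size_t size)
{
    c->base = buf;
    c->size = size;
    c->offset = size;   /* start at the end of the buffer */
}

/* Hand out `fragsz` bytes, moving toward the start of the buffer. */
static void *frag_alloc_model(struct frag_cache_model *c, size_t fragsz)
{
    assert(c != NULL);
    if (c->offset < fragsz)
        return NULL;    /* out of space: the real code would refill here */
    c->offset -= fragsz;
    return c->base + c->offset;
}
```

Counting down means the common path compares only the remaining offset with the request, and the total size is consulted solely when the cache has to be refilled.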
  7. 10 Dec, 2014 7 commits
    • rocker: remove swdev mode · 1d460b98
      Roopa Prabhu committed
      Remove use of 'swdev' mode in rocker. rocker dev offloads
      can use the BRIDGE_FLAGS_SELF to indicate offload to hardware.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid to call skb_queue_len again · e008f3f0
      Li RongQing committed
      The queue length of sd->input_pkt_queue has already been stored in
      qlen, and it cannot change while the lock is held.
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skb_copy_datagram_iovec() can die · d3a9632f
      Al Viro committed
      no callers other than itself.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives · e5a4b0bb
      Al Viro committed
      ... making both non-draining.  That means that tcp_recvmsg() becomes
      non-draining.  And _that_ would break iscsit_do_rx_data() unless we
      	a) make sure tcp_recvmsg() is uniformly non-draining (it is)
      	b) make sure it copes with arbitrary (including shifted)
      iov_iter (it does, all it uses is iov_iter primitives)
      	c) make iscsit_do_rx_data() initialize ->msg_iter only once.
      
      Fortunately, (c) is doable with minimal work and we are rid of one of
      the two places where kernel send/recvmsg users would be unhappy with
      non-draining behaviour.
      
      Actually, that makes all but one of ->recvmsg() instances iov_iter-clean.
      The exception is skcipher_recvmsg() and it also isn't hard to convert
      to primitives (iov_iter_get_pages() is needed there).  That'll wait
      a bit - there's some interplay with ->sendmsg() path for that one.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
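"Non-draining" in the commit above means that copying advances a separate cursor and leaves the caller's iovec array untouched. The following user-space sketch shows that property with a hypothetical mini-iterator; `iter_model` and `copy_to_iter_model` are illustrative names, and the real iov_iter carries more state than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical iterator: the position lives here, the iovec is read-only. */
struct iter_model {
    const struct iovec *iov;
    size_t nr_segs;
    size_t seg;         /* index of the current segment */
    size_t seg_off;     /* offset inside the current segment */
};

/* Copy up to `len` bytes into the segments; returns the bytes copied. */
static size_t copy_to_iter_model(const void *src, size_t len,
                                 struct iter_model *it)
{
    const char *p = src;
    size_t done = 0;

    assert(it != NULL);
    while (done < len && it->seg < it->nr_segs) {
        const struct iovec *v = &it->iov[it->seg];
        size_t room = v->iov_len - it->seg_off;
        size_t n = (len - done < room) ? len - done : room;

        memcpy((char *)v->iov_base + it->seg_off, p + done, n);
        done += n;
        it->seg_off += n;
        if (it->seg_off == v->iov_len) {    /* segment full: move on */
            it->seg++;
            it->seg_off = 0;
        }
    }
    return done;
}
```

Because iov_base and iov_len are never modified, a caller that set the array up once can keep using it across several calls, which is the requirement stated in point (c) of the commit message.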
    • dst: no need to take reference on DST_NOCACHE dsts · dbfc4fb7
      Hannes Frederic Sowa committed
      Since commit f8864972 ("ipv4: fix dst race in sk_dst_get()")
      DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
      reference on them when we are in rcu protected sections.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid two atomic operations in fast clones · 6ffe75eb
      Eric Dumazet committed
      Commit ce1a4ea3 ("net: avoid one atomic operation in skb_clone()")
      took the wrong way to save one atomic operation.
      
      It is actually possible to avoid two atomic operations, if we
      do not change skb->fclone values, and only rely on clone_ref
      content to signal if the clone is available or not.
      
      skb_clone() can simply use the fast clone if clone_ref is 1.
      
      kfree_skbmem() can avoid the atomic_dec_and_test() if clone_ref is 1.
      
      Note that because we usually free the clone before the original skb,
      this particular attempt is only done for the original skb to have better
      branch prediction.
      
      SKB_FCLONE_FREE is removed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
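The scheme in the commit above, where the reference count alone signals whether the companion clone is free, can be modelled in user space with C11 atomics. This is a sketch under assumptions: `fclones_model` and the three functions are hypothetical, the count is named clone_ref after the commit message, and "releasing the pair" is reduced to a return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct skb_model { int dummy; };

/* Hypothetical pair: an original buffer and its preallocated clone. */
struct fclones_model {
    struct skb_model orig;
    struct skb_model clone;
    atomic_int clone_ref;   /* 1: only orig is live, 2: clone is live too */
};

static void fclones_init(struct fclones_model *f)
{
    atomic_init(&f->clone_ref, 1);
}

/* Returns the companion clone, or NULL if the slow path is needed. */
static struct skb_model *fast_clone_model(struct fclones_model *f)
{
    assert(f != NULL);
    if (atomic_load(&f->clone_ref) != 1)
        return NULL;
    atomic_store(&f->clone_ref, 2);     /* plain store, no read-modify-write */
    return &f->clone;
}

/* Returns 1 when the whole pair can be released. */
static int free_skb_model(struct fclones_model *f, int is_orig)
{
    if (is_orig && atomic_load(&f->clone_ref) == 1)
        return 1;                       /* skip the atomic decrement */
    return atomic_fetch_sub(&f->clone_ref, 1) == 1;
}
```

The shortcut is attempted only for the original buffer because, as the message notes, the clone is usually freed first, so for the clone the count would rarely be 1.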
    • rtnetlink: delay RTM_DELLINK notification until after ndo_uninit() · 395eea6c
      Mahesh Bandewar committed
      The commit 56bfa7ee ("unregister_netdevice : move RTM_DELLINK to
      until after ndo_uninit") tried to do this earlier, but in doing so
      it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
      delayed the call to fill_info(). This translated into asking the
      driver to remove its private state and then querying that private
      state. This could have catastrophic consequences.
      
      This change breaks rtmsg_ifinfo() into two parts: the first takes a
      precise snapshot of the device by calling fill_info() before calling
      ndo_uninit(), and the second sends the notification using the
      collected snapshot.
      
      The problem was noticed when the last link is deleted from an ipvlan
      device: the device has already freed the port, and the subsequent
      .fill_info() call tries to get the info from the port.
      
      kernel: [  255.139429] ------------[ cut here ]------------
      kernel: [  255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
      kernel: [  255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
      kernel: [  255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
      kernel: [  255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
      kernel: [  255.139515]  0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
      kernel: [  255.139516]  0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
      kernel: [  255.139518]  00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
      kernel: [  255.139520] Call Trace:
      kernel: [  255.139527]  [<ffffffff815d87f4>] dump_stack+0x46/0x58
      kernel: [  255.139531]  [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
      kernel: [  255.139540]  [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
      kernel: [  255.139544]  [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
      kernel: [  255.139547]  [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
      kernel: [  255.139549]  [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
      kernel: [  255.139551]  [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
      kernel: [  255.139553]  [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
      kernel: [  255.139557]  [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
      kernel: [  255.139558]  [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
      kernel: [  255.139562]  [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
      kernel: [  255.139563]  [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
      kernel: [  255.139565]  [<ffffffff8152c398>] netlink_unicast+0x178/0x230
      kernel: [  255.139567]  [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
      kernel: [  255.139571]  [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
      kernel: [  255.139575]  [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
      kernel: [  255.139577]  [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
      kernel: [  255.139578]  [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
      kernel: [  255.139581]  [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
      kernel: [  255.139585]  [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
      kernel: [  255.139589]  [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
      kernel: [  255.139597]  [<ffffffff811e8336>] ? dput+0xb6/0x190
      kernel: [  255.139606]  [<ffffffff811f05f6>] ? mntput+0x26/0x40
      kernel: [  255.139611]  [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
      kernel: [  255.139613]  [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
      kernel: [  255.139615]  [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
      kernel: [  255.139617]  [<ffffffff815df092>] system_call_fastpath+0x12/0x17
      kernel: [  255.139619] ---[ end trace 5e6703e87d984f6b ]---
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
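The ordering that the commit above establishes (snapshot first, teardown second, notification last) can be sketched in user space. All names are hypothetical: `build_snapshot` stands in for the fill_info() step, `uninit_model` for ndo_uninit(), and `sent_value` for whatever listeners would receive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device whose private state is freed by uninit. */
struct dev_model { int *priv; };

/* What the notification will carry, captured while priv is alive. */
struct snapshot_model { int value; int valid; };

static struct snapshot_model build_snapshot(const struct dev_model *dev)
{
    struct snapshot_model s = { 0, 0 };

    if (dev->priv) {
        s.value = *dev->priv;
        s.valid = 1;
    }
    return s;
}

static void uninit_model(struct dev_model *dev)
{
    free(dev->priv);
    dev->priv = NULL;
}

static int sent_value = -1;     /* stands in for the delivered message */

static void send_snapshot(const struct snapshot_model *s)
{
    if (s->valid)
        sent_value = s->value;
}

static void unregister_model(struct dev_model *dev)
{
    assert(dev != NULL);
    struct snapshot_model s = build_snapshot(dev);  /* 1: priv still valid */
    uninit_model(dev);                              /* 2: priv is gone */
    send_snapshot(&s);                              /* 3: touches only s */
}
```

The notification is still sent after the teardown, as the subject line asks, but nothing in step 3 reads driver state that step 2 has already released.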
  8. 09 Dec, 2014 1 commit
    • ethtool: Support for configurable RSS hash function · 892311f6
      Eyal Perry committed
      This patch extends the set/get_rxfh ethtool-options for getting or
      setting the RSS hash function.
      
      It modifies the drivers' implementations of set/get_rxfh accordingly.
      
      This change also delegates the responsibility of checking whether a
      modification to a certain RX flow hash parameter is supported to the
      driver implementation of set_rxfh.
      
      The user-kernel API is provided through the new hfunc bitmask field in
      the ethtool_rxfh struct. A bit set in the hfunc field corresponds to an
      index in the new string-set ETH_SS_RSS_HASH_FUNCS.
      
      Got approval from most of the relevant driver maintainers that their
      driver is using Toeplitz, and for the few that didn't answer, also
      assumed it is Toeplitz.
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Ariel Elior <ariel.elior@qlogic.com>
      Cc: Prashant Sreedharan <prashant@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Sathya Perla <sathya.perla@emulex.com>
      Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
      Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Bruce Allan <bruce.w.allan@intel.com>
      Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Cc: Don Skidmore <donald.c.skidmore@intel.com>
      Cc: Greg Rose <gregory.v.rose@intel.com>
      Cc: Matthew Vick <matthew.vick@intel.com>
      Cc: John Ronciak <john.ronciak@intel.com>
      Cc: Mitch Williams <mitch.a.williams@intel.com>
      Cc: Amir Vadai <amirv@mellanox.com>
      Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
      Cc: Shradha Shah <sshah@solarflare.com>
      Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
      Cc: "VMware, Inc." <pv-drivers@vmware.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
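The correspondence between a bit in the hfunc field and an index in a string set, as described in the commit above, can be sketched as follows. The string table contents and the helper names are assumptions for illustration; only Toeplitz is mentioned in the commit message, and the second entry is there just to show a second bit position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative string set: the array index defines the bit position. */
static const char *const hash_func_names[] = { "toeplitz", "xor" };

#define HASH_FUNC_COUNT \
    (sizeof(hash_func_names) / sizeof(hash_func_names[0]))

/* Bit for the hash function stored at `index` in the string set. */
static uint8_t hfunc_bit(size_t index)
{
    assert(index < HASH_FUNC_COUNT);
    return (uint8_t)(1u << index);
}

/* Name for the lowest known bit set in `hfunc`, or NULL if none. */
static const char *hfunc_name(uint8_t hfunc)
{
    for (size_t i = 0; i < HASH_FUNC_COUNT; i++)
        if (hfunc & (1u << i))
            return hash_func_names[i];
    return NULL;
}
```

Tying bits to string-set indices lets a tool print the available hash functions by name without hard-coding the list.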
  9. 06 Dec, 2014 1 commit
    • net: sock: allow eBPF programs to be attached to sockets · 89aa0758
      Alexei Starovoitov committed
      introduce new setsockopt() command:
      
      setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
      
      where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
      and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
      
      setsockopt() calls bpf_prog_get() which increments refcnt of the program,
      so it doesn't get unloaded while socket is using the program.
      
      The same eBPF program can be attached to multiple sockets.
      
      When the user task exits, its sockets are closed automatically; closing
      a socket calls sk_filter_uncharge(), which decrements the refcnt of the
      eBPF program.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
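The reference counting described in the commit above (attach takes a reference, closing the socket drops it, and one program may serve several sockets) can be followed in a small user-space model. The structs and functions are hypothetical, and the initial count of 1 is an assumption that stands for the reference held through the program's file descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical program object with a reference count. */
struct ebpf_prog_model {
    int refcnt;
    int freed;      /* set once the last reference is dropped */
};

struct sock_model { struct ebpf_prog_model *filter; };

/* Drop one reference; mark the program freed when none remain. */
static void prog_put_model(struct ebpf_prog_model *prog)
{
    assert(prog != NULL && prog->refcnt > 0);
    if (--prog->refcnt == 0)
        prog->freed = 1;
}

/* Attach: the socket takes its own reference on the program. */
static void sock_attach_model(struct sock_model *sk,
                              struct ebpf_prog_model *prog)
{
    assert(sk != NULL && prog != NULL);
    prog->refcnt++;
    sk->filter = prog;
}

/* Close: the socket gives its reference back. */
static void sock_close_model(struct sock_model *sk)
{
    if (!sk->filter)
        return;
    prog_put_model(sk->filter);
    sk->filter = NULL;
}
```

In this model the program outlives the loader's own reference for as long as any socket still uses it, and is released only when the last socket closes.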
  10. 05 Dec, 2014 6 commits
  11. 03 Dec, 2014 5 commits
  12. 30 Nov, 2014 1 commit
  13. 27 Nov, 2014 1 commit