1. 07 Mar, 2015 4 commits
  2. 05 Mar, 2015 3 commits
  3. 03 Mar, 2015 13 commits
    • A
      at86rf230: restore trx len when needed · 263be332
      Alexander Aring committed
      In most cases the SPI messages have a length of two. Currently we
      always set the len field to two before transmitting an SPI message.
      For reading from or writing to the frame buffer we need a different
      len. This patch uses a trx len of two as the default. For the frame
      buffer cases we restore the trx len to two on success and on failure.
      This avoids setting len to two when it is already two.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • A
      at86rf230: remove multiple dereferencing for ctx · 31fa7434
      Alexander Aring committed
      This patch cleans up the dereferencing of the state change context
      variable. The state change context should be set only once, when a
      state change is initiated. The complete handler of the state change
      now uses the ctx context, which should always be the same as the
      initial state change context.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • A
      at86rf230: remove multiple dereferencing for irq · cca990c8
      Alexander Aring committed
      By holding the irq variable inside at86rf230_state_change we can drop
      several repeated dereferences needed to get the irq number.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • A
      at86rf230: refactor receive handling · 74de4c80
      Alexander Aring committed
      This patch refactors the receive handling into one function.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • A
      at86rf230: cleanup and squash stack variable · ef5428a1
      Alexander Aring committed
      I had this variable because I thought it would be protected by
      disabling/enabling the irq, but this is not true. It is protected by
      stopping/waking the netdev queue, which is done by
      ieee802154_xmit_complete.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • A
      at86rf230: add transmit retry support · ba6d2239
      Alexander Aring committed
      This patch introduces transmit retry handling into the at86rf230
      transmit path. The current behaviour is to wait the normal receive
      time if we want to go into STATE_TX_ON while the transceiver is in
      STATE_BUSY_RX_AACK, which indicates that a frame is currently being
      received. A non-forced state change will not interrupt the receiving
      state.

      Currently, after the normal receive time we start a forced change
      into STATE_TX_ON. With this patch we make seven retries to go into
      STATE_TX_ON without forcing. Once we hit AT86RF2XX_MAX_TX_RETRIES we
      start the forced state change. This is a polling-like method of
      going into STATE_TX_ON when the receive takes the maximum receiving
      time.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
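The retry policy described in this commit can be sketched in plain C as follows. This is a minimal user-space illustration, not the driver code: `tx_on_next_action` and `enum tx_action` are hypothetical names, and the value 7 for `AT86RF2XX_MAX_TX_RETRIES` is assumed from the "seven retries" mentioned in the commit message.

```c
#include <stdbool.h>

/* Assumption: 7, taken from "seven retries" in the commit message. */
#define AT86RF2XX_MAX_TX_RETRIES 7

/* Hypothetical outcome of one attempt to enter STATE_TX_ON. */
enum tx_action {
    TX_PROCEED, /* transceiver is not receiving, switch normally */
    TX_RETRY,   /* still receiving, wait and try again without forcing */
    TX_FORCE    /* retries exhausted, force the state change */
};

/*
 * Decide what to do for one attempt. busy_rx models the transceiver
 * being in STATE_BUSY_RX_AACK; *retries counts non-forced attempts so far.
 */
enum tx_action tx_on_next_action(bool busy_rx, unsigned int *retries)
{
    if (!busy_rx)
        return TX_PROCEED;

    if (*retries < AT86RF2XX_MAX_TX_RETRIES) {
        (*retries)++;
        return TX_RETRY;
    }

    return TX_FORCE;
}
```

A caller would reset the counter for every new frame, so that each transmission gets the full retry budget before a forced change is used.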
    • K
      Bluetooth: btusb: Add support for QCA ROME chipset family · 3267c884
      Kim, Ben Young Tae committed
      This patch adds support for the ROME Bluetooth family from Qualcomm
      Atheros, e.g. QCA61x4 or QCA6574.

      The new chipsets have a firmware download sequence similar to that
      of previous Atheros chipsets; however, they do not support vid/pid
      switching after the patch has been downloaded, so the firmware needs
      to be handled by the btusb module directly.

      ROME chipsets can be differentiated from previous versions by
      reading the ROM version.
      
      T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 16 Spd=12   MxCh= 0
      D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=0cf3 ProdID=e300 Rev= 0.01
      C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      
      T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  8 Spd=12   MxCh= 0
      D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=0cf3 ProdID=e360 Rev= 0.01
      C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      Signed-off-by: Ben Young Tae Kim <ytkim@qca.qualcomm.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • K
      Bluetooth: btusb: Add setup callback for chip init on USB · ace31982
      Kim, Ben Young Tae committed
      Some chipsets do not allow patch or config files to be sent through
      the HCI VS channel at an early stage, nor do they support sending
      USB patch files over any channel other than the USB bulk path.

      The new callback is for initializing the BT controller through USB.
      Signed-off-by: Ben Young Tae Kim <ytkim@qca.qualcomm.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • D
      filter: refactor common filter attach code into __sk_attach_prog · 49b31e57
      Daniel Borkmann committed
      Both sk_attach_filter() and sk_attach_bpf() are setting up sk_filter,
      charging skmem and attaching it to the socket after we got the eBPF
      prog up and ready. Let's refactor that into a common helper.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'for-upstream' of... · 70c836a4
      David S. Miller committed
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2015-03-02
      
      Here's the first bluetooth-next pull request targeting the 4.1 kernel:
      
       - ieee802154/6lowpan cleanups
       - SCO routing to host interface support for the btmrvl driver
       - AMP code cleanups
       - Fixes to AMP HCI init sequence
       - Refactoring of the HCI callback mechanism
       - Added shutdown routine for Intel controllers in the btusb driver
       - New config option to enable/disable Bluetooth debugfs information
       - Fix for early data reception on L2CAP fixed channels
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'sendmsg_recvmsg_iocb_removal' · b4844353
      David S. Miller committed
      Ying Xue says:

      ====================
      net: Remove iocb argument from sendmsg and recvmsg

      Currently there is only one user of the iocb argument: TIPC, whose
      sendmsg() instances use it. Meanwhile, no recvmsg() instance uses
      the iocb argument. Therefore, if we eliminate the weird usage of the
      iocb argument from TIPC, the iocb argument can be removed from all
      sendmsg() and recvmsg() instances in the whole networking stack.

      Reference:
      https://patchwork.ozlabs.org/patch/433960/

      Changes:

      v2:
       * Fix compile errors in the DCCP module pointed out by David
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      net: Remove iocb argument from sendmsg and recvmsg · 1b784140
      Ying Xue committed
      Now that TIPC no longer depends on the iocb argument in its internal
      implementations of the sendmsg() and recvmsg() hooks defined in the
      proto structure, nothing uses the iocb argument in them any more.
      We can therefore drop the redundant iocb argument completely from
      all implementations of both sendmsg() and recvmsg() in the entire
      networking stack.

      Cc: Christoph Hellwig <hch@lst.de>
      Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      tipc: Don't use iocb argument in socket layer · 39a0295f
      Ying Xue committed
      Currently the iocb argument is used to identify whether or not the
      socket lock is held before tipc_sendmsg()/tipc_send_stream() is
      called. But this usage prevents the iocb argument from being dropped
      from sendmsg() at the common socket layer. Therefore, in this commit
      we introduce two new functions called __tipc_sendmsg() and
      __tipc_send_stream(). They assume that their callers have taken the
      socket lock, thereby avoiding the weird usage of the iocb argument.

      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
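The split between a locking entry point and a helper that assumes the lock is already held can be illustrated with a small user-space sketch. Everything here is hypothetical (`struct demo_sock`, `demo_sendmsg`, `demo_sendmsg_locked`, and a boolean standing in for the socket lock); it only shows the calling convention the commit describes, not TIPC's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket: a flag stands in for the real socket lock. */
struct demo_sock {
    bool locked;
    size_t queued; /* bytes accepted for sending */
};

void demo_lock(struct demo_sock *sk)   { sk->locked = true; }
void demo_unlock(struct demo_sock *sk) { sk->locked = false; }

/*
 * Counterpart of the double-underscore helpers: the caller must already
 * hold the lock. Returns the number of bytes queued, or -1 if the
 * locking contract was violated.
 */
int demo_sendmsg_locked(struct demo_sock *sk, size_t len)
{
    if (!sk->locked)
        return -1;
    sk->queued += len;
    return (int)len;
}

/* Public entry point: takes the lock, delegates, releases the lock. */
int demo_sendmsg(struct demo_sock *sk, size_t len)
{
    int rc;

    demo_lock(sk);
    rc = demo_sendmsg_locked(sk, len);
    demo_unlock(sk);
    return rc;
}
```

Internal callers that already hold the lock call the `_locked` variant directly, so no extra argument is needed to signal the lock state.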
  4. 02 Mar, 2015 20 commits
    • D
      Merge branch 'dropcount' · 6556c385
      David S. Miller committed
      Eyal Birger says:
      
      ====================
      net: move skb->dropcount to skb->cb[]
      
      Commit 97775007 ("af_packet: add interframe drop cmsg (v6)")
      unionized skb->mark and skb->dropcount in order to allow recording
      of the socket drop count while maintaining struct sk_buff size.
      
      skb->dropcount was introduced since there was no available room
      in skb->cb[] in packet sockets. However, its introduction led to
      the inability to export skb->mark to userspace.
      
      Aliasing skb->priority instead of skb->mark was considered.
      However, that would lead to the inability to export skb->priority
      to userspace if desired. Such a change may also lead to hard-to-find
      issues, as skb->priority is assumed to be alias free and, as noted
      by Shmulik Ladkani, is not 'naturally orthogonal' with other skb
      fields.

      This patch series follows the suggestion made by Eric Dumazet of
      moving the dropcount metric to skb->cb[], eliminating this problem
      at the expense of 4 fewer bytes in skb->cb[] for protocol families
      using it.

      The patch series includes compaction of the bluetooth and packet
      use of skb->cb[] as well as the infrastructure for placing dropcount
      in skb->cb[].
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: move skb->dropcount to skb->cb[] · 744d5a3e
      Eyal Birger committed
      Commit 97775007 ("af_packet: add interframe drop cmsg (v6)")
      unionized skb->mark and skb->dropcount in order to allow recording
      of the socket drop count while maintaining struct sk_buff size.
      
      skb->dropcount was introduced since there was no available room
      in skb->cb[] in packet sockets. However, its introduction led to
      the inability to export skb->mark, or any other aliased field to
      userspace if so desired.
      
      Moving the dropcount metric to skb->cb[] eliminates this problem
      at the expense of 4 bytes less in skb->cb[] for protocol families
      using it.
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: add common accessor for setting dropcount on packets · 3bc3b96f
      Eyal Birger committed
      As part of an effort to move skb->dropcount to skb->cb[], use
      a common function in order to set dropcount in struct sk_buff.
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: use common macro for assering skb->cb[] available size in protocol families · b4772ef8
      Eyal Birger committed
      As part of an effort to move skb->dropcount to skb->cb[], use a common
      macro in protocol families using skb->cb[] for ancillary data to
      validate available room in skb->cb[].
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
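A compile-time check of this kind can be sketched with C11 `_Static_assert`. The names below (`CB_SIZE`, `struct drop_cb`, `CB_AVAILABLE`, `CB_CHECK_SIZE`, `struct my_proto_cb`) are hypothetical and are not the kernel's macro; the 48-byte buffer size is an assumption made for the illustration, with 4 bytes reserved at the end for a drop counter as the series describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the per-packet control buffer is 48 bytes. */
#define CB_SIZE 48

/* Hypothetical slot reserved at the end of the buffer for the drop count. */
struct drop_cb {
    uint32_t dropcount;
};

/* Room left for a protocol family's own ancillary data. */
#define CB_AVAILABLE (CB_SIZE - sizeof(struct drop_cb))

/* Fail the build if a protocol's control block no longer fits. */
#define CB_CHECK_SIZE(type) \
    _Static_assert(sizeof(type) <= CB_AVAILABLE, \
                   #type " does not fit in the control buffer")

/* Hypothetical per-protocol control block. */
struct my_proto_cb {
    uint8_t  pkt_type;
    uint16_t opcode;
    void    *chan;
};

CB_CHECK_SIZE(struct my_proto_cb);
```

Because the check runs at compile time, a protocol family that grows its control block past the available room is caught at build time rather than by memory corruption at run time.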
    • E
      net: packet: use sockaddr_ll fields as storage for skb original length in recvmsg path · 2472d761
      Eyal Birger committed
      As part of an effort to move skb->dropcount to skb->cb[], 4 bytes
      of additional room are needed in skb->cb[] in packet sockets.
      
      Store the skb original length in the first two fields of sockaddr_ll
      (sll_family and sll_protocol) as they can be derived from the skb when
      needed.
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
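The aliasing trick from this commit can be illustrated with a union. This is a sketch under assumptions: `struct ll_addr` is a simplified, hypothetical stand-in for `struct sockaddr_ll` (only its first two 16-bit fields matter here), and the helper names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct sockaddr_ll. */
struct ll_addr {
    uint16_t family;   /* plays the role of sll_family */
    uint16_t protocol; /* plays the role of sll_protocol */
    int32_t  ifindex;
    uint8_t  addr[8];
};

/* The original length shares storage with the first two fields. */
union ll_slot {
    uint32_t origlen;
    struct ll_addr ll;
};

/* Queue time: remember the original packet length in the aliased bytes. */
void slot_set_origlen(union ll_slot *s, uint32_t len)
{
    s->origlen = len;
}

/*
 * Receive time: read the length back, then rebuild the two overwritten
 * fields from values the caller can derive from the packet itself.
 */
uint32_t slot_take_origlen(union ll_slot *s, uint16_t family,
                           uint16_t protocol)
{
    uint32_t len = s->origlen;

    s->ll.family = family;
    s->ll.protocol = protocol;
    return len;
}
```

The union costs no extra space, which is the point: the 4 bytes previously used for the length become available for other data.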
    • E
      net: rxrpc: change call to sock_recv_ts_and_drops() on rxrpc recvmsg to sock_recv_timestamp() · 2cfdf9fc
      Eyal Birger committed
      Commit 3b885787 ("net: Generalize socket rx gap / receive queue overflow cmsg")
      allowed receiving packet dropcount information as a socket level option.
      RXRPC sockets recvmsg function was changed to support this by calling
      sock_recv_ts_and_drops() instead of sock_recv_timestamp().
      
      However, protocol families wishing to receive dropcount should call
      sock_queue_rcv_skb() or set the dropcount specifically (as done
      in packet_rcv()). This was not done for rxrpc and thus this feature
      never worked on these sockets.
      
      Formalize this by no longer calling sock_recv_ts_and_drops() in rxrpc,
      as part of an effort to move skb->dropcount into skb->cb[].
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit fields · 6368c235
      Eyal Birger committed
      Convert boolean fields incoming and req_start to bit fields and move
      force_active in order to save space in bt_skb_cb in an effort to use
      a portion of skb->cb[] for storing skb->dropcount.
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrl · 49a6fe05
      Eyal Birger committed
      struct hci_req_ctrl is never used outside of struct bt_skb_cb;
      inlining it frees 8 bytes in skb->cb[] on a 64-bit system, allowing
      the addition of more ancillary data.
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      pppoe: Use workqueue to die properly when a PADT is received · 287f3a94
      Simon Farnsworth committed
      When a PADT frame is received, the socket may not be in a good state to
      close down the PPP interface. The current implementation handles this by
      simply blocking all further PPP traffic, and hoping that the lack of traffic
      will trigger the user to investigate.
      
      Use schedule_work to get to a process context from which we clear down the
      PPP interface, in a fashion analogous to hangup on a TTY-based PPP
      interface. This causes pppd to disconnect immediately, and allows tools to
      take immediate corrective action.
      
      Note that pppd's rp_pppoe.so plugin has code in it to disable the session
      when it disconnects; however, as a consequence of this patch, the session is
      already disabled before rp_pppoe.so is asked to disable the session. The
      result is a harmless error message:
      
      Failed to disconnect PPPoE socket: 114 Operation already in progress
      
      This message is safe to ignore, as long as the error is 114 Operation
      already in progress; in that specific case, it means that the PPPoE session
      has already been disabled before pppd tried to disable it.
      Signed-off-by: Simon Farnsworth <simon@farnz.org.uk>
      Tested-by: Dan Williams <dcbw@redhat.com>
      Tested-by: Christoph Schulz <develop@kristov.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      bnx2: disable toggling of rxvlan if necessary · 26caa346
      Ivan Vecera committed
      The bnx2 driver uses .ndo_fix_features to force-enable Rx VLAN tag
      stripping when the card cannot disable it. The driver should instead
      remove the NETIF_F_HW_VLAN_CTAG_RX flag from hw_features so that the
      feature is shown as fixed by ethtool.

      Cc: Sony Chacko <sony.chacko@qlogic.com>
      Cc: Dept-HSGLinuxNICDev@qlogic.com
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: macb: Properly add DMACFG bit definitions · ea373041
      Arun Chandran committed
      Add *_SIZE macros for the bits ENDIA_DESC and ENDIA_PKT.
      Signed-off-by: Arun Chandran <achandran@mvista.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: macb: Add on the fly CPU endianness detection · 62f6924c
      Arun Chandran committed
      Program the management descriptor's access mode according to the
      dynamically detected CPU endianness.
      Signed-off-by: Arun Chandran <achandran@mvista.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Tested-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
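Detecting byte order at run time, rather than through a build-time option, can be sketched as below. This is a generic user-space illustration and an assumption about the general technique; it is not how the macb driver performs its check, and `cpu_is_big_endian` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a known 32-bit pattern and look at the byte at the lowest
 * address: a big-endian CPU keeps the most significant byte there.
 */
bool cpu_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    uint8_t first;

    memcpy(&first, &probe, sizeof(first));
    return first == 0x01;
}
```

A driver could consult such a check once at probe time and select the matching descriptor access mode in its configuration register.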
    • S
      Driver: Vmxnet3: Copy TCP header to mapped frame for IPv6 packets · 759c9359
      Shrikrishna Khare committed
      This allows packet parsing to be done by the fast path. This
      performance optimization already exists for IPv4. Add similar logic
      for IPv6.
      Signed-off-by: Amitabha Banerjee <banerjeea@vmware.com>
      Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'ebpf_support_for_cls_bpf' · 68932f71
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      eBPF support for cls_bpf
      
      This is the non-RFC version of my patchset posted before netdev01 [1]
      conference. It contains a couple of eBPF cleanups and preparation
      patches to get eBPF support into cls_bpf. The last patch adds the
      actual support. I'll post the iproute2 parts after the kernel bits
      are merged, an initial preview link to the code is mentioned in the
      last patch.
      
      Patch 4 and 5 were originally one patch, but I've split them into
      two parts upon request as patch 4 only is also needed for Alexei's
      tracing patches that go via tip tree.
      
      Tested with tc and all in-kernel available BPF test suites.
      
      I have configured and built LLVM with --enable-experimental-targets=BPF
      but as Alexei put it, the plan is to get rid of the experimental
      status in future [2].
      
      Thanks a lot!
      
      v1 -> v2:
       - Removed arch patches from this series
        - x86 is already queued in tip tree, under x86/mm
        - arm64 just reposted directly to arm folks
       - Rest is unchanged
      
        [1] http://thread.gmane.org/gmane.linux.network/350191
        [2] http://article.gmane.org/gmane.linux.kernel/1874969
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      cls_bpf: add initial eBPF support for programmable classifiers · e2e9b654
      Daniel Borkmann committed
      This work extends the "classic" BPF programmable tc classifier by
      extending its scope also to native eBPF code!
      
      This allows user space to implement its own custom, 'safe' C-like
      classifiers (or whatever other frontend language LLVM et al. may
      provide in future) that can then be compiled with the LLVM eBPF
      backend to an eBPF ELF file. The result can be loaded into the
      kernel via iproute2's tc. In the kernel, they can be JITed on
      major archs and thus run at native performance.
      
      Simple, minimal toy example to demonstrate the workflow:
      
        #include <linux/ip.h>
        #include <linux/if_ether.h>
        #include <linux/bpf.h>
      
        #include "tc_bpf_api.h"
      
        __section("classify")
        int cls_main(struct sk_buff *skb)
        {
          return (0x800 << 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos));
        }
      
        char __license[] __section("license") = "GPL";
      
      The classifier can then be compiled into eBPF opcodes and loaded
      via tc, for example:
      
        clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
        tc filter add dev em1 parent 1: bpf cls.o [...]
      
      As it has been demonstrated, the scope can even reach up to a fully
      fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c).
      
      For tc, maps are allowed to be used, but from kernel context only,
      in other words, eBPF code can keep state across filter invocations.
      In future, we perhaps may reattach from a different application to
      those maps e.g., to read out collected statistics/state.
      
      Similarly as in socket filters, we may extend functionality for eBPF
      classifiers over time depending on the use cases. For that purpose,
      cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so
      we can allow additional functions/accessors (e.g. an ABI compatible
      offset translation to skb fields/metadata). For an initial cls_bpf
      support, we allow the same set of helper functions as eBPF socket
      filters, but we could diverge at some point in time w/o problem.
      
      I was wondering whether cls_bpf and act_bpf could share C programs,
      I can imagine that at some point, we introduce i) further common
      handlers for both (or even beyond their scope), and/or if truly needed
      ii) some restricted function space for each of them. Both can be
      abstracted easily through struct bpf_verifier_ops in future.
      
      The context of cls_bpf versus act_bpf is slightly different though:
      a cls_bpf program will return a specific classid, whereas act_bpf
      returns a drop/non-drop return code; the latter may also mangle skbs
      in future. That said, we can surely have a "classify" and an "action"
      section in a single object file or, given the mentioned constraint,
      add the possibility of a shared section.
      
      The workflow for getting native eBPF running from tc [1] is as
      follows: for f_bpf, I've added a slightly modified ELF parser code
      from Alexei's kernel sample, which reads out the LLVM compiled
      object, sets up maps (and dynamically fixes up map fds) if any, and
      loads the eBPF instructions all centrally through the bpf syscall.
      
      The resulting fd from the loaded program itself is being passed down
      to cls_bpf, which looks up struct bpf_prog from the fd store, and
      holds reference, so that it stays available also after tc program
      lifetime. On tc filter destruction, it will then drop its reference.
      
      Moreover, I've also added the optional possibility to annotate an
      eBPF filter with a name (e.g. path to object file, or something
      else if preferred) so that when tc dumps currently installed filters,
      some more context can be given to an admin for a given instance (as
      opposed to just the file descriptor number).
      
      Last but not least, bpf_prog_get() and bpf_prog_put() needed to be
      exported, so that eBPF can be used from cls_bpf built as a module.
      Thanks to 60a3b225 ("net: bpf: make eBPF interpreter images
      read-only") I think this is of no concern since anything wanting to
      alter eBPF opcode after verification stage would crash the kernel.
      
        [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
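The toy classifier in this commit returns `(0x800 << 16) | tos`. A tc class handle is a 32-bit value with the major number in the upper 16 bits and the minor number in the lower 16 bits, so that return value selects major 0x800 with the TOS byte as minor. The helpers below are a hypothetical user-space sketch of that packing, not code from the commit.

```c
#include <stdint.h>

/* Pack a major:minor pair into a 32-bit class handle. */
uint32_t classid_make(uint16_t major, uint16_t minor)
{
    return ((uint32_t)major << 16) | minor;
}

/* Upper 16 bits: the major number. */
uint16_t classid_major(uint32_t classid)
{
    return (uint16_t)(classid >> 16);
}

/* Lower 16 bits: the minor number. */
uint16_t classid_minor(uint32_t classid)
{
    return (uint16_t)(classid & 0xffffu);
}
```

For example, a packet with TOS 0x10 would be mapped to the handle that tc prints as 800:10.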
    • D
      ebpf: move read-only fields to bpf_prog and shrink bpf_prog_aux · 24701ece
      Daniel Borkmann committed
      is_gpl_compatible and prog_type should be moved directly into bpf_prog
      as they stay immutable during bpf_prog's lifetime, are core attributes
      and they can be locked as read-only later on via bpf_prog_select_runtime().
      
      With a bit of rearranging, this also allows us to shrink bpf_prog_aux
      to exactly 1 cacheline.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      ebpf: add sched_cls_type and map it to sk_filter's verifier ops · 96be4325
      Daniel Borkmann committed
      As discussed recently and at netconf/netdev01, we want to prevent making
      bpf_verifier_ops registration available for modules, but have them at a
      controlled place inside the kernel instead.
      
      The reason for this is that out-of-tree modules can go crazy and define
      and register any verifier ops they want, doing all sorts of crap, even
      bypassing available GPLed eBPF helper functions. We don't want to offer
      such a shiny playground, of course, but keep strict control to ourselves
      inside the core kernel.
      
      This also encourages us to design eBPF user helpers carefully and
      generically, so they can be shared among various subsystems using eBPF.
      
      For the eBPF traffic classifier (cls_bpf), it's a good start to share
      the same helper facilities as we currently do in eBPF for socket filters.
      
      That way, we have BPF_PROG_TYPE_SCHED_CLS look like its own type, thus
      one day if there's a good reason to diverge the set of helper functions
      from the set available to socket filters, we keep ABI compatibility.
      
      In future, we could place all bpf_prog_type_list at a central place,
      perhaps.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      ebpf: remove CONFIG_BPF_SYSCALL ifdefs in socket filter code · d4052c4a
      Daniel Borkmann committed
      This gets rid of CONFIG_BPF_SYSCALL ifdefs in the socket filter code,
      now that the BPF internal header can deal with it.
      
      While going over it, I also changed eBPF related functions to a sk_filter
      prefix to be more consistent with the rest of the file.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      ebpf: make internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs · 0fc174de
      Daniel Borkmann committed
      Socket filter code and other subsystems with upcoming eBPF support should
      not need to deal with the fact that we have CONFIG_BPF_SYSCALL defined or
      not.
      
      Having the bpf syscall as a config option is a nice thing and I'd expect
      it to stay that way for expert users (I presume one day the default setting
      of it might change, though), but code making use of it should not care if
      it's actually enabled or not.
      
      Instead, hide this via header files and let the rest deal with it.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      ebpf: export BPF_PSEUDO_MAP_FD to uapi · f1a66f85
      Daniel Borkmann committed
      We need to export BPF_PSEUDO_MAP_FD to user space, as it's used in the
      ELF BPF loader where instructions are being loaded that need map fixups.
      
      An initial stage loads all maps into the kernel, and later on replaces
      related instructions in the eBPF blob with BPF_PSEUDO_MAP_FD as source
      register and the actual fd as immediate value.
      
      The kernel verifier recognizes this keyword and replaces the map fd with
      a real pointer internally.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
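The loader-side fixup described here can be sketched as follows. To stay self-contained, the example uses a local mirror of the instruction layout (`struct demo_insn`) instead of including the uapi header; `demo_fixup_map_fd` and `DEMO_OP_LD_IMM64` are hypothetical names, and the constant values are assumed to match the uapi definitions.

```c
#include <stdint.h>

/* Assumed to mirror the uapi value exported by this commit. */
#define BPF_PSEUDO_MAP_FD 1

/* Opcode of the 64-bit immediate load: BPF_LD | BPF_IMM | BPF_DW. */
#define DEMO_OP_LD_IMM64 0x18

/* Local mirror of the eBPF instruction layout, for illustration only. */
struct demo_insn {
    uint8_t code;
    uint8_t dst_reg:4;
    uint8_t src_reg:4;
    int16_t off;
    int32_t imm;
};

/*
 * Mark a 64-bit immediate load as a map reference: the source register
 * carries the BPF_PSEUDO_MAP_FD keyword and the immediate carries the
 * map's file descriptor. Returns 0 on success, -1 if the instruction
 * is not a 64-bit immediate load.
 */
int demo_fixup_map_fd(struct demo_insn *insn, int map_fd)
{
    if (insn->code != DEMO_OP_LD_IMM64)
        return -1;

    insn->src_reg = BPF_PSEUDO_MAP_FD;
    insn->imm = map_fd;
    return 0;
}
```

A loader would create its maps first, apply a fixup like this to each instruction that references a map, and only then pass the instruction array to the kernel, where the verifier replaces the fd with a real pointer.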