1. 02 9月, 2016 2 次提交
    • G
      rps: flow_dissector: Add the const for the parameter of flow_keys_have_l4 · 66fdd05e
      Gao Feng 提交于
      Add the const for the parameter of flow_keys_have_l4 for the readability.
      Signed-off-by: NGao Feng <fgao@ikuai8.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66fdd05e
    • D
      rxrpc: Don't expose skbs to in-kernel users [ver #2] · d001648e
      David Howells 提交于
      Don't expose skbs to in-kernel users, such as the AFS filesystem, but
      instead provide a notification hook the indicates that a call needs
      attention and another that indicates that there's a new call to be
      collected.
      
      This makes the following possibilities more achievable:
      
       (1) Call refcounting can be made simpler if skbs don't hold refs to calls.
      
       (2) skbs referring to non-data events will be able to be freed much sooner
           rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
           will be able to consult the call state.
      
       (3) We can shortcut the receive phase when a call is remotely aborted
           because we don't have to go through all the packets to get to the one
           cancelling the operation.
      
       (4) It makes it easier to do encryption/decryption directly between AFS's
           buffers and sk_buffs.
      
       (5) Encryption/decryption can more easily be done in the AFS's thread
           contexts - usually that of the userspace process that issued a syscall
           - rather than in one of rxrpc's background threads on a workqueue.
      
       (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.
      
      To make this work, the following interface function has been added:
      
           int rxrpc_kernel_recv_data(
      		struct socket *sock, struct rxrpc_call *call,
      		void *buffer, size_t bufsize, size_t *_offset,
      		bool want_more, u32 *_abort_code);
      
      This is the recvmsg equivalent.  It allows the caller to find out about the
      state of a specific call and to transfer received data into a buffer
      piecemeal.
      
      afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
      logic between them.  They don't wait synchronously yet because the socket
      lock needs to be dealt with.
      
      Five interface functions have been removed:
      
      	rxrpc_kernel_is_data_last()
          	rxrpc_kernel_get_abort_code()
          	rxrpc_kernel_get_error_number()
          	rxrpc_kernel_free_skb()
          	rxrpc_kernel_data_consumed()
      
      As a temporary hack, sk_buffs going to an in-kernel call are queued on the
      rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
      in-kernel user.  To process the queue internally, a temporary function,
      temp_deliver_data() has been added.  This will be replaced with common code
      between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a
      future patch.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d001648e
  2. 01 9月, 2016 1 次提交
  3. 31 8月, 2016 1 次提交
    • R
      net: lwtunnel: Handle fragmentation · 14972cbd
      Roopa Prabhu 提交于
      Today mpls iptunnel lwtunnel_output redirect expects the tunnel
      output function to handle fragmentation. This is ok but can be
      avoided if we did not do the mpls output redirect too early.
      ie we could wait until ip fragmentation is done and then call
      mpls output for each ip fragment.
      
      To make this work we will need,
      1) the lwtunnel state to carry encap headroom
      2) and do the redirect to the encap output handler on the ip fragment
      (essentially do the output redirect after fragmentation)
      
      This patch adds tunnel headroom in lwtstate to make sure we
      account for tunnel data in mtu calculations during fragmentation
      and adds new xmit redirect handler to redirect to lwtunnel xmit func
      after ip fragmentation.
      
      This includes IPV6 and some mtu fixes and testing from David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14972cbd
  4. 30 8月, 2016 3 次提交
  5. 29 8月, 2016 6 次提交
    • V
      net: ethtool: add support for 1000BaseX and missing 10G link modes · 5711a982
      Vidya Sagar Ravipati 提交于
      This patch enhances ethtool link mode bitmap to include
      missing interface modes for 1G/10G speeds
      
      Changes:
      1000baseX is the mode introduced to cover all 1G Fiber cases.
      All modes under 1000BaseX i.e. 1000BASE-SX, 1000BASE-LX, 1000BASE-LX10
      and 1000BASE-BX10 are not explicitly defined at this moment.
      10G CR,SR,LR and ER link modes are included for 10G speed..
      
      Issue:
      ethtool on  1G/10G SFP port reports Base-T
      as this port supports 1000baseX,10G CR, SR and LR modes.
      
      root@tor-02$ ethtool swp1
      Settings for swp1:
              Supported ports: [ FIBRE ]
              Supported link modes:   1000baseT/Full
                                      10000baseT/Full
              Supported pause frame use: Symmetric Receive-only
              Supports auto-negotiation: Yes
              Advertised link modes:  1000baseT/Full
              Advertised pause frame use: No
              Advertised auto-negotiation: No
              Speed: 10000Mb/s
              Duplex: Full
              Port: FIBRE
              PHYAD: 0
              Transceiver: external
              Auto-negotiation: off
              Current message level: 0x00000000 (0)
      
              Link detected: yes
      
      After fix:
      root@tor-02$ ethtool swp1
      Settings for swp1:
              Supported ports: [ FIBRE ]
              Supported link modes:   1000baseX/Full
                                      10000baseCR/Full
                                      10000baseSR/Full
                                      10000baseLR/Full
                                      10000baseER/Full
              Supported pause frame use: Symmetric Receive-only
              Supports auto-negotiation: Yes
              Advertised link modes:  1000baseT/Full
              Advertised pause frame use: No
              Advertised auto-negotiation: No
              Speed: 10000Mb/s
              Duplex: Full
              Port: FIBRE
              PHYAD: 0
              Transceiver: external
              Auto-negotiation: off
              Current message level: 0x00000000 (0)
              Link detected: yes
      Signed-off-by: NVidya Sagar Ravipati <vidya@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5711a982
    • E
      tcp: add tcp_add_backlog() · c9c33212
      Eric Dumazet 提交于
      When TCP operates in lossy environments (between 1 and 10 % packet
      losses), many SACK blocks can be exchanged, and I noticed we could
      drop them on busy senders, if these SACK blocks have to be queued
      into the socket backlog.
      
      While the main cause is the poor performance of RACK/SACK processing,
      we can try to avoid these drops of valuable information that can lead to
      spurious timeouts and retransmits.
      
      Cause of the drops is the skb->truesize overestimation caused by :
      
      - drivers allocating ~2048 (or more) bytes as a fragment to hold an
        Ethernet frame.
      
      - various pskb_may_pull() calls bringing the headers into skb->head
        might have pulled all the frame content, but skb->truesize could
        not be lowered, as the stack has no idea of each fragment truesize.
      
      The backlog drops are also more visible on bidirectional flows, since
      their sk_rmem_alloc can be quite big.
      
      Let's add some room for the backlog, as only the socket owner
      can selectively take action to lower memory needs, like collapsing
      receive queues or partial ofo pruning.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9c33212
    • R
      net: smc91x: fix SMC accesses · 2fb04fdf
      Russell King 提交于
      Commit b70661c7 ("net: smc91x: use run-time configuration on all ARM
      machines") broke some ARM platforms through several mistakes.  Firstly,
      the access size must correspond to the following rule:
      
      (a) at least one of 16-bit or 8-bit access size must be supported
      (b) 32-bit accesses are optional, and may be enabled in addition to
          the above.
      
      Secondly, it provides no emulation of 16-bit accesses, instead blindly
      making 16-bit accesses even when the platform specifies that only 8-bit
      is supported.
      
      Reorganise smc91x.h so we can make use of the existing 16-bit access
      emulation already provided - if 16-bit accesses are supported, use
      16-bit accesses directly, otherwise if 8-bit accesses are supported,
      use the provided 16-bit access emulation.  If neither, BUG().  This
      exactly reflects the driver behaviour prior to the commit being fixed.
      
      Since the conversion incorrectly cut down the available access sizes on
      several platforms, we also need to go through every platform and fix up
      the overly-restrictive access size: Arnd assumed that if a platform can
      perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
      size needed to be specified - not so, all available access sizes must
      be specified.
      
      This likely fixes some performance regressions in doing this: if a
      platform does not support 8-bit accesses, 8-bit accesses have been
      emulated by performing a 16-bit read-modify-write access.
      
      Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
      accesses, which was broken by the original commit.
      
      Fixes: b70661c7 ("net: smc91x: use run-time configuration on all ARM machines")
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Tested-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fb04fdf
    • T
      kcm: Remove TCP specific references from kcm and strparser · 96a59083
      Tom Herbert 提交于
      kcm and strparser need to work with any type of stream socket not just
      TCP. Eliminate references to TCP and call generic proto_ops functions of
      read_sock and peek_len. Also in strp_init check if the socket support
      the proto_ops read_sock and peek_len.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96a59083
    • T
      tcp: Set read_sock and peek_len proto_ops · 32035585
      Tom Herbert 提交于
      In inet_stream_ops we set read_sock to tcp_read_sock and peek_len to
      tcp_peek_len (which is just a stub function that calls tcp_inq).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32035585
    • T
      net: Add read_sock proto_op · 0294b625
      Tom Herbert 提交于
      Add new function in proto_ops structure. This includes moving the
      typedef got sk_read_actor into net.h and removing the definition from
      tcp.h.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0294b625
  6. 27 8月, 2016 6 次提交
  7. 26 8月, 2016 2 次提交
  8. 25 8月, 2016 3 次提交
    • L
      net: diag: allow socket bytecode filters to match socket marks · a52e95ab
      Lorenzo Colitti 提交于
      This allows a privileged process to filter by socket mark when
      dumping sockets via INET_DIAG_BY_FAMILY. This is useful on
      systems that use mark-based routing such as Android.
      
      The ability to filter socket marks requires CAP_NET_ADMIN, which
      is consistent with other privileged operations allowed by the
      SOCK_DIAG interface such as the ability to destroy sockets and
      the ability to inspect BPF filters attached to packet sockets.
      
      Tested: https://android-review.googlesource.com/261350Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a52e95ab
    • V
      net: dsa: rename switch operations structure · 9d490b4e
      Vivien Didelot 提交于
      Now that the dsa_switch_driver structure contains only function pointers
      as it is supposed to, rename it to the more appropriate dsa_switch_ops,
      uniformly to any other operations structure in the kernel.
      
      No functional changes here, basically just the result of something like:
      s/dsa_switch_driver *drv/dsa_switch_ops *ops/g
      
      However keep the {un,}register_switch_driver functions and their
      dsa_switch_drivers list as is, since they represent the -- likely to be
      deprecated soon -- legacy DSA registration framework.
      
      In the meantime, also fix the following checks from checkpatch.pl to
      make it happy with this patch:
      
          CHECK: Comparison to NULL could be written "!ops"
          #403: FILE: net/dsa/dsa.c:470:
          +	if (ops == NULL) {
      
          CHECK: Comparison to NULL could be written "ds->ops->get_strings"
          #773: FILE: net/dsa/slave.c:697:
          +		if (ds->ops->get_strings != NULL)
      
          CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats"
          #824: FILE: net/dsa/slave.c:785:
          +	if (ds->ops->get_ethtool_stats != NULL)
      
          CHECK: Comparison to NULL could be written "ds->ops->get_sset_count"
          #835: FILE: net/dsa/slave.c:798:
          +		if (ds->ops->get_sset_count != NULL)
      
          total: 0 errors, 0 warnings, 4 checks, 784 lines checked
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d490b4e
    • V
      xen: change the type of xen_vcpu_id to uint32_t · 55467dea
      Vitaly Kuznetsov 提交于
      We pass xen_vcpu_id mapping information to hypercalls which require
      uint32_t type so it would be cleaner to have it as uint32_t. The
      initializer to -1 can be dropped as we always do the mapping before using
      it and we never check the 'not set' value anyway.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      55467dea
  9. 24 8月, 2016 9 次提交
  10. 23 8月, 2016 7 次提交