1. 18 6月, 2013 9 次提交
    • Y
      tipc: convert topology server to use new server facility · 13a2e898
      Ying Xue 提交于
      As the new TIPC server infrastructure has been introduced, we can
      now convert the TIPC topology server to it.  We get two benefits
      from doing this:
      
      1) It simplifies the topology server locking policy.  In the
      original locking policy, we placed one spin lock pointer in the
      tipc_subscriber structure to reuse the lock of the subscriber's
      server port, controlling access to members of tipc_subscriber
      instance.  That is, we only used one lock to ensure both
      tipc_port and tipc_subscriber members were safely accessed.
      
      Now we introduce another spin lock for tipc_subscriber structure
      only protecting themselves, to get a finer granularity locking
      policy.  Moreover, the change will allow us to make the topology
      server code more readable and maintainable.
      
      2) It fixes a bug where sent subscription events may be lost when
      the topology port is congested.  Using the new service, the
      topology server now queues sent events into an outgoing buffer,
      and then wakes up a sender process which has been blocked in
      workqueue context.  The process will keep picking events from the
      buffer and send them to their respective subscribers, using the
      kernel socket interface, until the buffer is empty. Even if the
      socket is congested during transmission there is no risk that
      events may be dropped, since the sender process may block when
      needed.
      
      Some minor reordering of initialization is done, since we now
      have a scenario where the topology server must be started after
      socket initialization has taken place, as the former depends
      on the latter.  And overall, we see a simplification of the
      TIPC subscriber code in making this changeover.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13a2e898
    • Y
      tipc: introduce new TIPC server infrastructure · c5fa7b3c
      Ying Xue 提交于
      TIPC has two internal servers, one providing a subscription
      service for topology events, and another providing the
      configuration interface. These servers have previously been running
      in BH context, accessing the TIPC-port (aka native) API directly.
      Apart from these servers, even the TIPC socket implementation is
      partially built on this API.
      
      As this API may simultaneously be called via different paths and in
      different contexts, a complex and costly lock policiy is required
      in order to protect TIPC internal resources.
      
      To eliminate the need for this complex lock policiy, we introduce
      a new, generic service API that uses kernel sockets for message
      passing instead of the native API. Once the toplogy and configuration
      servers are converted to use this new service, all code pertaining
      to the native API can be removed. This entails a significant
      reduction in code amount and complexity, and opens up for a complete
      rework of the locking policy in TIPC.
      
      The new service also solves another problem:
      
      As the current topology server works in BH context, it cannot easily
      be blocked when sending of events fails due to congestion. In such
      cases events may have to be silently dropped, something that is
      unacceptable. Therefore, the new service keeps a dedicated outbound
      queue receiving messages from BH context. Once messages are
      inserted into this queue, we will immediately schedule a work from a
      special workqueue. This way, messages/events from the topology server
      are in reality sent in process context, and the server can block
      if necessary.
      
      Analogously, there is a new workqueue for receiving messages. Once a
      notification about an arriving message is received in BH context, we
      schedule a work from the receive workqueue to do the job of
      receiving the message in process context.
      
      As both sending and receive messages are now finished in processes,
      subscribed events cannot be dropped any more.
      
      As of this commit, this new server infrastructure is built, but
      not actually yet called by the existing TIPC code, but since the
      conversion changes required in order to use it are significant,
      the addition is kept here as a separate commit.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5fa7b3c
    • E
      tipc: allow implicit connect for stream sockets · 5d21cb70
      Erik Hugne 提交于
      TIPC's implied connect feature, aka piggyback connect, allows
      applications to save one syscall and all SYN/SYN-ACK signalling
      overhead when setting up a connection.  Until now, this has only
      been supported for SEQPACKET sockets.  Here, we make it possible
      to use this feature even with stream sockets.
      
      At the connecting side, the connection is completed when the
      first data message arrives from the accepting peer.  This means
      that we must allow the connecting user to call blocking recv()
      before the socket has reached state SS_CONNECTED.  So we must must
      relax the state machine check at recv_stream(), and allow the
      recv() call even if socket is in state SS_CONNECTING.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d21cb70
    • Y
      tipc: change socket buffer overflow control to respect sk_rcvbuf · cc79dd1b
      Ying Xue 提交于
      As per feedback from the netdev community, we change the buffer
      overflow protection algorithm in receiving sockets so that it
      always respects the nominal upper limit set in sk_rcvbuf.
      
      Instead of scaling up from a small sk_rcvbuf value, which leads to
      violation of the configured sk_rcvbuf limit, we now calculate the
      weighted per-message limit by scaling down from a much bigger value,
      still in the same field, according to the importance priority of the
      received message.
      
      To allow for administrative tunability of the socket receive buffer
      size, we create a tipc_rmem sysctl variable to allow the user to
      configure an even bigger value via sysctl command.  It is a size of
      three (min/default/max) to be consistent with things like tcp_rmem.
      
      By default, the value initialized in tipc_rmem[1] is equal to the
      receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
      This value is also set as the default value of sk_rcvbuf.
      Originally-by: NJon Maloy <jon.maloy@ericsson.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      [Ying: added sysctl variation to Jon's original patch]
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      [PG: don't compile sysctl.c if not config'd; add Documentation]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc79dd1b
    • Y
      tipc: update code comments to reflect new uapi header path · 8941bbcd
      Ying Xue 提交于
      Files tipc.h and tipc_config.h were moved to uapi directory, but
      the corresponding comments were not updated at the same time.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8941bbcd
    • E
      net: add socket option for low latency polling · dafcc438
      Eliezer Tamir 提交于
      adds a socket option for low latency polling.
      This allows overriding the global sysctl value with a per-socket one.
      Unexport sysctl_net_ll_poll since for now it's not needed in modules.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dafcc438
    • E
      net: remove NET_LL_RX_POLL config menue · 89bf1b5a
      Eliezer Tamir 提交于
      Remove NET_LL_RX_POLL from the config menu.
      Change default to y.
      Busy polling still needs to be enabled at run time.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89bf1b5a
    • E
      net: convert low latency sockets to sched_clock() · 9a3c71aa
      Eliezer Tamir 提交于
      Use sched_clock() instead of get_cycles().
      We can use sched_clock() because we don't care much about accuracy.
      Remove the dependency on X86_TSC
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a3c71aa
    • E
      net: change sysctl_net_ll_poll into an unsigned int · eb6db622
      Eliezer Tamir 提交于
      There is no reason for sysctl_net_ll_poll to be an unsigned long.
      Change it into an unsigned int.
      Fix the proc handler.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb6db622
  2. 15 6月, 2013 13 次提交
  3. 14 6月, 2013 5 次提交
    • R
      net/mlx4: Add VF link state support · 948e306d
      Rony Efraim 提交于
      Add support to change the link state of VF (vPort)
      Signed-off-by: NRony Efraim <ronye@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      948e306d
    • R
      net/core: Add VF link state control · 1d8faf48
      Rony Efraim 提交于
      Add netlink directives and ndo entry to allow for controling
      VF link, which can be in one of three states:
      
      Auto - VF link state reflects the PF link state (default)
      
      Up - VF link state is up, traffic from VF to VF works even if
      the actual PF link is down
      
      Down - VF link state is down, no traffic from/to this VF, can be of
      use while configuring the VF
      Signed-off-by: NRony Efraim <ronye@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d8faf48
    • F
      bcm63xx_enet: add support Broadcom BCM6345 Ethernet · 3dc6475c
      Florian Fainelli 提交于
      This patch adds support for the Broadcom BCM6345 SoC Ethernet. BCM6345
      has a slightly different and older DMA engine which requires the
      following modifications:
      
      - the width of the DMA channels on BCM6345 is 64 bytes vs 16 bytes,
        which means that the helpers enet_dma{c,s} need to account for this
        channel width and we can no longer use macros
      
      - BCM6345 DMA engine does not have any internal SRAM for transfering
        buffers
      
      - BCM6345 buffer allocation and flow control is not per-channel but
        global (done in RSET_ENETDMA)
      
      - the DMA engine bits are right-shifted by 3 compared to other DMA
        generations
      
      - the DMA enable/interrupt masks are a little different (we need to
        enabled more bits for 6345)
      
      - some register have the same meaning but are offsetted in the ENET_DMAC
        space so a lookup table is required to return the proper offset
      
      The MAC itself is identical and requires no modifications to work.
      Signed-off-by: NFlorian Fainelli <florian@openwrt.org>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dc6475c
    • E
      htb: reorder struct htb_class fields for performance · ca4ec90b
      Eric Dumazet 提交于
      htb_class structures are big, and source of false sharing on SMP.
      
      By carefully splitting them in two parts, we can improve performance.
      
      I got 9 % performance increase on a 24 threads machine, with 200
      concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca4ec90b
    • W
      net-rps: fixes for rps flow limit · 5f121b9a
      Willem de Bruijn 提交于
      Caught by sparse:
      - __rcu: missing annotation to sd->flow_limit
      - __user: direct access in cpumask_scnprintf
      
      Also
      - add endline character when printing bitmap if room in buffer
      - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY
      
      The last item warrants some explanation. The hashtable buckets are
      subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
      to bucket size, since all packets may end up in a single bucket. The
      current (rather arbitrary) history value of 256 happens to match the
      buffer size (u8).
      
      As a result, with a single flow, the first 128 packets are accepted
      (correct), the second 128 packets dropped (correct) and then the
      history[] array has filled, so that each subsequent new packet
      causes an increment in the bucket for new_flow plus a decrement
      for old_flow: a steady state.
      
      This is fine if packets are dropped, as the steady state goes away
      as soon as a mix of traffic reappears. But, because the 256th packet
      overflowed the bucket to 0: no packets are dropped.
      
      Instead of explicitly adding an overflow check, this patch changes
      FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      (first item)
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f121b9a
  4. 13 6月, 2013 13 次提交