1. 04 Jan, 2014 3 commits
    • netfilter: nf_conntrack: remove dead code · dcd93ed4
      stephen hemminger committed
      The following code is not used in current upstream code.
      Some of this seems to be old hooks, some might be used by some
      out-of-tree module (which I don't care about breaking), and
      need_ipv4_conntrack was used by old NAT code but is no longer
      called.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipset: remove unused code · 02eca9d2
      stephen hemminger committed
      Function never used in current upstream code.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_nat: add full port randomization support · 34ce3240
      Daniel Borkmann committed
      We currently use prandom_u32() for allocation of ports in tcp bind(0)
      and udp code. In case of plain SNAT we try to keep the ports as is
      or increment on collision.
      
      SNAT --random mode uses per-destination incrementing port
      allocation. A recent paper [1] pointed out that this mode of
      port allocation makes it possible for an off-path attacker to find
      the randomly allocated ports through a timing side-channel in a
      socket overloading attack.
      
      So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
      in regard to the attack described in this paper. As we need to keep
      compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
      that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
      selection algorithm with a simple prandom_u32() in order to mitigate
      this attack vector. Note that the lfsr113's internal state is
      periodically reseeded by the kernel through a local secure entropy
      source.
      
      More details can be found in [1], the basic idea is to send bursts
      of packets to a socket to overflow its receive queue and measure
      the latency to detect a possible retransmit when the port is found.
      Because ports increase per given destination and port, further
      allocations can be predicted. This information could then be used by
      an attacker, e.g. for cache-poisoning, NS pinning, and degradation
      of service attacks against DNS servers [1]:
      
        The best defense against the poisoning attacks is to properly
        deploy and validate DNSSEC; DNSSEC provides security not only
        against off-path attacker but even against MitM attacker. We hope
        that our results will help motivate administrators to adopt DNSSEC.
        However, full DNSSEC deployment may take significant time, and
        until that happens, we recommend short-term, non-cryptographic
        defenses. We recommend to support full port randomisation,
        according to practices recommended in [2], and to avoid
        per-destination sequential port allocation, which we show may be
        vulnerable to derandomisation attacks.
      
      Joint work between Hannes Frederic Sowa and Daniel Borkmann.
      
       [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
       [2] http://arxiv.org/pdf/1205.5190v1.pdf

      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
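The difference between the two allocation modes described in the nf_nat commit can be sketched in user space. This is a simplified Python model, not kernel code: the class and function names, the port range, and the use of Python's `secrets` module in place of prandom_u32() are all assumptions made for illustration.

```python
import secrets

PORT_MIN, PORT_MAX = 1024, 65535  # assumed range, for illustration only
RANGE = PORT_MAX - PORT_MIN + 1


class PerDestinationAllocator:
    """Models per-destination allocation: random start, then +1 per call."""

    def __init__(self):
        self._next = {}  # (dst_ip, dst_port) -> next offset into the range

    def allocate(self, dst):
        # The first port for a destination is random; later ones increment.
        off = self._next.get(dst)
        if off is None:
            off = secrets.randbelow(RANGE)
        self._next[dst] = (off + 1) % RANGE
        return PORT_MIN + off


def allocate_fully_random():
    """Models the fully random mode: every port is an independent draw."""
    return PORT_MIN + secrets.randbelow(RANGE)


alloc = PerDestinationAllocator()
dst = ("192.0.2.53", 53)
first = alloc.allocate(dst)
second = alloc.allocate(dst)
# An attacker who learns `first` can predict `second` in the incrementing mode.
predicted = PORT_MIN + (first - PORT_MIN + 1) % RANGE
```

In the incrementing model, learning one port reveals the next one for the same destination. In the fully random model each draw is independent, so one observed port says nothing about the next.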
  2. 27 Dec, 2013 1 commit
    • ipvs: Remove unused variable ret from sync_thread_master() · 9dcbe1b8
      Geert Uytterhoeven committed
      net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
      net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]
      
      Commit 35a2af94 ("sched/wait: Make the
      __wait_event*() interface more friendly") changed how the interruption
      state is returned. However, sync_thread_master() ignores this state,
      now causing a compile warning.
      
      According to Julian Anastasov <ja@ssi.bg>, this behavior is OK:
      
          "Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
           users were confused why IPVS threads increase the load average. So, we
           switched to _interruptible calls and later the socket polling was
           added."
      
      Document this, as requested by Peter Zijlstra, to avoid precious developers
      disappearing in this pitfall in the future.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  3. 24 Dec, 2013 1 commit
  4. 21 Dec, 2013 1 commit
  5. 20 Dec, 2013 2 commits
  6. 13 Dec, 2013 7 commits
    • net: reorder struct netns_ct for better cache-line usage · 8cf4d6a2
      Jesper Dangaard Brouer committed
      Reorder struct netns_ct so that atomic_t "count" changes don't
      slow down users of read-mostly fields.

      This is based on Eric Dumazet's proposed patch:
       "netfilter: conntrack: remove the central spinlock"
       http://thread.gmane.org/gmane.linux.network/268758/focus=47306

      The tricky part of cache-aligning this structure is that it is
      inlined in struct net (include/net/net_namespace.h), thus changes to
      other netns_xxx structures affect our alignment.
      
      Eric's original patch contained an ambiguity on 32-bit regarding
      alignment in struct net.  This patch also takes 32-bit into account,
      and in case of changed (struct net) alignment sysctl_xxx entries have
      been ordered according to how often they are accessed.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
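The cache-line concern in the netns_ct commit can be made visible with a small user-space model. The Python sketch below uses `ctypes` to compute which cache line each field starts in. The two layouts, the choice of fields, the 64-byte line size, and the explicit padding are assumptions for illustration (the commit itself reorders fields rather than padding them), and the model assumes the structure starts on a cache-line boundary.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes


class NetnsCtBefore(ctypes.Structure):
    # Hypothetical layout: the hot counter shares a line with read-mostly fields.
    _fields_ = [
        ("count", ctypes.c_int),            # written frequently
        ("sysctl_checksum", ctypes.c_int),  # read mostly
        ("htable_size", ctypes.c_uint),     # read mostly
    ]


class NetnsCtAfter(ctypes.Structure):
    # Hypothetical layout: padding pushes read-mostly fields to the next line.
    _fields_ = [
        ("count", ctypes.c_int),
        ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_int))),
        ("sysctl_checksum", ctypes.c_int),
        ("htable_size", ctypes.c_uint),
    ]


def cache_line_of(struct, field):
    """Return the index of the cache line that holds the start of `field`."""
    return getattr(struct, field).offset // CACHE_LINE


def shares_line(struct, a, b):
    """True if fields `a` and `b` start in the same cache line."""
    return cache_line_of(struct, a) == cache_line_of(struct, b)
```

With the first layout, every write to `count` invalidates the line that readers of `htable_size` need; with the second, the two fields live in different lines.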
    • ipv6: fix incorrect type in declaration · 68536053
      Florent Fourcot committed
      Introduced by 1397ed35
        "ipv6: add flowinfo for tcp6 pkt_options for all cases"
      Reported-by: kbuild test robot <fengguang.wu@intel.com>

      V2: fix the title, add empty line after the declaration (Sergei Shtylyov's
      feedback)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: eth: 8390: remove section warning in etherh.c · 335802d1
      Olof Johansson committed
      Commit c45f812f ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_*
      feature') ended up moving the printout of version[] from something that
      will be compiled out due to defines, to something that is now evaluated
      at runtime.
      
      That means that what always used to be an access to an __initdata string
      from non-__init code started showing up as a section mismatch when it
      didn't before.
      
      All other 8390 versions skip __initdata on the version string, and
      starting to annotate the whole chain of callers with __init seems like
      more churn than it's worth on this driver, so remove it from etherh.c as well.
      
      Fixes: c45f812f ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature')
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-gro: Prepare GRO stack for the upcoming tunneling support · 299603e8
      Jerry Chu committed
      This patch modifies the GRO stack to avoid the use of "network_header"
      and associated macros like ip_hdr() and ipv6_hdr() in order to allow
      an arbitrary number of IP hdrs (v4 or v6) to be used in the
      encapsulation chain. This lays the foundation for various IP
      tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.
      
      With this patch, the GRO stack traversing now is mostly based on
      skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
      skb->network_header). As a result all but the top layer (i.e., the
      transport layer) must have hdrs of the same length in order for
      a pkt to be considered for aggregation. Therefore when adding a new
      encap layer (e.g., for tunneling), one must check and skip flows
      (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
      different hdr length.
      
      Note that unlike the network header, the transport header can and
      will continue to be set by the GRO code since there will be at
      most one "transport layer" in the encap chain.
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
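The rule from the net-gro commit, that flows with a different header length at an encapsulation layer must be skipped, can be sketched as follows. This is a user-space Python model only; `HeldPacket`, `mark_same_flow`, and the flow key are hypothetical names, with `same_flow` standing in for NAPI_GRO_CB(p)->same_flow.

```python
from dataclasses import dataclass


@dataclass
class HeldPacket:
    """Hypothetical stand-in for a packet held on the GRO list."""
    flow_key: tuple
    encap_hdr_len: int
    same_flow: bool = True


def mark_same_flow(held, flow_key, encap_hdr_len):
    """Clear same_flow on held packets that cannot merge with the new one."""
    for p in held:
        if not p.same_flow:
            continue
        # A different flow or a different header length at this layer
        # disqualifies the held packet from aggregation.
        if p.flow_key != flow_key or p.encap_hdr_len != encap_hdr_len:
            p.same_flow = False
    return [p for p in held if p.same_flow]


key = ("10.0.0.1", "10.0.0.2")
held = [HeldPacket(key, 20), HeldPacket(key, 24)]
candidates = mark_same_flow(held, key, 20)
```

Only the held packet whose header length matches the incoming packet remains a merge candidate.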
    • Merge branch 'macvtap_capture' · a46dc748
      David S. Miller committed
      Vlad Yasevich says:
      
      ====================
      Add packet capture support on macvtap device
      
      Change from RFC:
        - moved to the rx_handler approach.
      
      This series adds support for packet capturing on macvtap device.
      The initial approach was to simply export the capturing code as
      a function from the core network.  While simple, it was not
      a very architecturally clean approach.
      
      The new approach is to provide macvtap with its own rx_handler, which
      is attached to the macvtap device itself.   Macvlan will simply requeue
      the packet with an updated skb->dev.  BTW, macvlan layer already does this
      for macvlan devices.  So, now macvtap and macvlan have almost the
      same exact input path.
      
      I've toyed with short-circuiting the input path for macvtap by returning
      RX_HANDLER_ANOTHER, but that just made the code more complicated and
      didn't provide any kind of measurable gain (at least according to
      netperf and perf runs on the host).
      
      To see if there was a performance regression, I ran 1, 2 and 4 netperf
      STREAM and MAERTS tests against the VM from both remote host and another
      guest on the same system.   The command run was
          netperf -H $host -t $test -l 20 -i 10 -I 95 -c -C
      
      The numbers I was getting with the new code were consistently very
      slightly (1-2%) better than the old code.  I don't consider this
      an improvement, but it's not a regression! :)
      
      Running 'perf record' on the host didn't show any new hot spots
      and cpu utilization stayed about the same.  This was better
      than I expected from simply looking at the code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: Remove custom recieve and forward handlers · 2f6a1b66
      Vlad Yasevich committed
      Since macvlan and macvtap now use the same receive and
      forward handlers, we can remove them completely and use
      netif_rx and dev_forward_skb() directly.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvtap: Add support of packet capture on macvtap device. · 6acf54f1
      Vlad Yasevich committed
      Macvtap device currently does not allow a user to capture
      traffic on it due to the fact that it steals the packets
      from the network stack before the skb->dev is set correctly
      on the receive side, and that it uses the macvlan transmit
      path directly on the send side.  As a result, we never
      get a chance to give traffic to the taps while the correct
      device is set in the skb.

      This patch makes macvtap device behave almost exactly like
      macvlan.  On the send side, we switch to using dev_queue_xmit().
      On the receive side, to deliver packets to macvtap, we now
      use netif_rx and dev_forward_skb just like macvlan.  The only
      difference now is that macvtap has its own rx_handler which is
      attached to the macvtap netdev.  It is here that we now steal
      the packet and provide it to the socket.
      
      As a result, we can now capture traffic on the macvtap device:
         tcpdump -i macvtap0
      
      It also gives us the ability to add tc actions to the macvtap
      device and actually utilize different bandwidth management
      queues on output.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 12 Dec, 2013 17 commits
    • Merge branch 'bpf' · 70f56132
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      bpf/filter updates
      
      This set adds just two minimal helper tools that complement the
      already available bpf_jit_disasm and complete BPF tooling; plus
      it adds an extensive documentation update of filter.txt.
      
      Please see individual descriptions for details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • filter: doc: improve BPF documentation · 7924cd5e
      Daniel Borkmann committed
      This patch significantly updates the BPF documentation and describes
      its internal architecture, Linux extensions, and handling of the
      kernel's BPF and JIT engine, plus documents how development can be
      facilitated with the help of bpf_dbg, bpf_asm, bpf_jit_disasm.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • filter: bpf_asm: add minimal bpf asm tool · 3f356385
      Daniel Borkmann committed
      There are a couple of valid use cases for a minimal low-level bpf asm
      like tool, for example: using/linking to libpcap is not an option; the
      required BPF filters use Linux extensions that are not supported by
      libpcap's compiler; a filter might be more complex and not cleanly
      implementable with libpcap's compiler; particular filter codes should
      be optimized differently than libpcap's internal BPF compiler does;
      or for security audits of emitted BPF JIT code for a prepared set of BPF
      instructions, or BPF JIT compiler development in general.
      
      Then, in such cases writing such a filter in low-level syntax can be
      a good alternative, for example, xt_bpf and cls_bpf users might have
      requirements that could result in more complex filter code, or one that
      cannot be expressed with libpcap (e.g. different return codes in
      cls_bpf for flowids on various BPF code paths).
      
      Moreover, BPF JIT implementors may wish to manually write test cases
      in order to verify the resulting JIT image, and thus need low-level
      access to BPF code generation as well. Therefore, complete the available
      toolchain for BPF with this small bpf_asm helper tool for the tools/net/
      directory. These 3 complementary minimal helper tools round up and
      facilitate BPF development.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • filter: bpf_dbg: add minimal bpf debugger · fd981e3c
      Daniel Borkmann committed
      This patch adds a minimal BPF debugger that "emulates" the kernel's
      BPF engine (w/o extensions) and allows for single stepping (forwards
      and backwards through BPF code) or running with >=1 breakpoints through
      selected or all packets from a pcap file with a provided user filter
      in order to facilitate verification of a BPF program. When a breakpoint
      is being hit, it dumps all register contents, decoded instructions and
      in case of branches both decoded branch targets as well as other useful
      information.
      
      Having this facility is in particular useful to verify BPF programs
      against given test traffic *before* attaching to a live system.
      
      With the general availability of cls_bpf, xt_bpf, socket filters,
      team driver and e.g. PTP code, all BPF users, quite often a single
      more complex BPF program is being used. Reasons for a more complex
      BPF program are primarily to optimize execution time for making a
      verdict when multiple simple BPF programs are combined into one in
      order to prevent parsing same headers multiple times. In particular,
      for cls_bpf that can have various return paths for encoding flowids,
      and xt_bpf to come to a fw verdict this can be the case.
      
      Therefore, as this can result in more complex and harder to debug
      code, it would be very useful to have this minimal tool for testing
      purposes. It can also be of help for BPF JIT developers as filters
      are "test attached" to the kernel on a temporary socket thus
      triggering a JIT image dump when enabled. The tool uses an interactive
      libreadline shell with auto-completion and history support.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
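The idea behind emulating the BPF engine and stepping through a filter, as in the bpf_dbg commit, can be illustrated with a tiny interpreter. The Python sketch below is not bpf_dbg and covers only three classic BPF opcodes. The opcode values follow linux/filter.h, while `run`, `eth_frame`, and the trace format are hypothetical names chosen for this example.

```python
import struct

# Classic BPF opcodes used below (values as in linux/filter.h).
LDH_ABS = 0x28   # A = 16-bit big-endian load at absolute offset k
JEQ_K = 0x15     # if A == k: skip jt instructions, else skip jf
RET_K = 0x06     # return constant k

# Filter "accept IPv4 only":
#   ldh [12]; jeq #0x800, jt 0, jf 1; ret #0xffffffff; ret #0
PROG = [
    (LDH_ABS, 0, 0, 12),
    (JEQ_K, 0, 1, 0x0800),
    (RET_K, 0, 0, 0xFFFFFFFF),
    (RET_K, 0, 0, 0),
]


def run(prog, pkt, trace=None):
    """Interpret `prog` over `pkt`, recording (pc, A) per step in `trace`."""
    a, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if trace is not None:
            trace.append((pc, a))
        if code == LDH_ABS:
            if k + 2 > len(pkt):
                return 0  # out-of-bounds load rejects the packet
            (a,) = struct.unpack_from(">H", pkt, k)
            pc += 1
        elif code == JEQ_K:
            pc += 1 + (jt if a == k else jf)
        elif code == RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")


def eth_frame(ethertype):
    """Build a 14-byte Ethernet header with zeroed MAC addresses."""
    return bytes(12) + struct.pack(">H", ethertype)
```

Passing a list as `trace` yields the program counter and accumulator at each step, a rough analogue of the register dump a debugger shows at a breakpoint.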
    • net: eth: cpsw: 64-bit phys_addr_t and sparse cleanup · 1a3b5056
      Olof Johansson committed
      Minor fix for printk format of a phys_addr_t, and the switch of two local
      functions to static since they're not used outside of the file.
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: eth: davinci_cpdma: Mark a local variable static · df784160
      Olof Johansson committed
      Only used locally. Found by sparse.
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: eth: davinci_cpdma: 64-bit phys/dma_addr_t cleanup · c767db51
      Olof Johansson committed
      Silences the below warnings when building with ARM_LPAE enabled, which
      gives longer dma_addr_t by default:
      
      drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_desc_pool_create':
      drivers/net/ethernet/ti/davinci_cpdma.c:182:3: warning: passing argument 3 of 'dma_alloc_attrs' from incompatible pointer type [enabled by default]
      drivers/net/ethernet/ti/davinci_cpdma.c: In function 'desc_phys':
      drivers/net/ethernet/ti/davinci_cpdma.c:222:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      drivers/net/ethernet/ti/davinci_cpdma.c:223:8: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature · c45f812f
      Matthew Whitehead committed
      Removed the shared ei_debug variable. Replaced it by adding u32 msg_enable to
      the private struct ei_device. Now each 8390 ethernet instance has a per-device
      logging variable.
      
      Changed older style printk() calls to more canonical forms.
      
      Tested on: ne, ne2k-pci, smc-ultra, and wd hardware.
      
      V4.0
      - Substituted pr_info() and pr_debug() for printk() KERN_INFO and KERN_DEBUG
      
      V3.0
      - Checked for cases where pr_cont() was most appropriate choice.
      - Changed module parameter from 'debug' to 'msg_enable' because debug was
      no longer the best description.
      
      V2.0
      - Changed netif_msg_(drv|probe|ifdown|rx_err|tx_err|tx_queued|intr|rx_status|hw)
      to netif_(dbg|info|warn|err) where possible.
      Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: router reachability probing · 7e980569
      Jiri Benc committed
      RFC 4191 states in 3.5:
      
         When a host avoids using any non-reachable router X and instead sends
         a data packet to another router Y, and the host would have used
         router X if router X were reachable, then the host SHOULD probe each
         such router X's reachability by sending a single Neighbor
         Solicitation to that router's address.  A host MUST NOT probe a
         router's reachability in the absence of useful traffic that the host
         would have sent to the router if it were reachable.  In any case,
         these probes MUST be rate-limited to no more than one per minute per
         router.
      
      Currently, when the neighbour corresponding to a router falls into
      NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
      value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
      should be probed with a single NS. The probe is ratelimited by the existing
      code. To better distinguish meanings of the failure values, rename
      RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
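The probing rules quoted from RFC 4191 (probe only when there is useful traffic, and at most once per minute per router) can be sketched as a small rate limiter. This Python model is illustrative only: `RouterProber` and `should_probe` are hypothetical names, and the kernel's existing rate limiting is implemented differently.

```python
PROBE_INTERVAL = 60.0  # seconds; RFC 4191 allows at most one probe per minute


class RouterProber:
    """Hypothetical model: decide whether a failed router may be probed now."""

    def __init__(self):
        self._last_probe = {}  # router address -> time of last probe

    def should_probe(self, router, now, have_traffic):
        # Without useful traffic for this router, the RFC forbids probing.
        if not have_traffic:
            return False
        last = self._last_probe.get(router)
        if last is not None and now - last < PROBE_INTERVAL:
            return False  # rate limit: a probe was sent less than a minute ago
        self._last_probe[router] = now
        return True


prober = RouterProber()
```

Each router address is tracked separately, so a recent probe to one router does not delay probing another.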
    • sctp: remove redundant null check on asoc · e4772668
      wangweidong committed
      In sctp_err_lookup, the code reaches 'goto out' only while asoc is not
      NULL, so remove the NULL check. Also, sctp_err_finish, which is called
      by sctp_v4_err and sctp_v6_err, is only passed asoc while it is not
      NULL, so remove the check there as well.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_htb: remove unnecessary NULL pointer judgment · 6b1dd856
      Yang Yingliang committed
      qdisc_put_rtab() already checks rtab for NULL.
      Remove the check outside of qdisc_put_rtab().
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fix wildcard search with inet_confirm_addr() · b601fa19
      Nicolas Dichtel committed
      Help of this function says: "in_dev: only on this interface, 0=any interface",
      but since commit 39a6d063 ("[NETNS]: Process inet_confirm_addr in the
      correct namespace."), the code supposes that it will never be NULL. This
      function is never called with in_dev == NULL, but it's exported and may be used
      by an external module.
      
      Because this patch restores the ability to call inet_confirm_addr() with in_dev
      == NULL, I partially revert the above commit, as suggested by Julian.
      
      CC: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: leave multicast group when vxlan device down · 95ab0991
      Gao feng committed
      vxlan_group_used only allows a device to leave the multicast group
      when the remote_ip of this vxlan device is different from
      other vxlan devices' remote_ip. This causes the device not to
      leave the multicast group until the vn_sock of this vxlan device
      is released.

      The check in vxlan_group_used is not quite precise, since even if
      the remote_ip is the same, these vxlan devices may use different
      lower devices, and they may use different vn_socks.

      Only when some vxlan devices use the same vn_sock, same lower
      device and same remote_ip should the mc_list of the vn_sock
      not be changed.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
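The stricter condition described in the vxlan commit (a multicast membership is still needed only if another device shares the same vn_sock, lower device, and remote_ip) can be written as a predicate. This Python sketch is a model, not the kernel function: `VxlanDev`, `group_used`, and the `running` flag are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanDev:
    """Hypothetical stand-in for a vxlan device; fields mirror the commit text."""
    name: str
    vn_sock: int      # identifier of the shared socket
    lowerdev: str     # lower (underlying) device
    remote_ip: str    # multicast group address
    running: bool = True


def group_used(devices, dev):
    """True if another running device still needs dev's multicast membership."""
    for other in devices:
        if other is dev or not other.running:
            continue
        # All three attributes must match for the membership to be shared.
        if (other.vn_sock, other.lowerdev, other.remote_ip) == (
            dev.vn_sock, dev.lowerdev, dev.remote_ip
        ):
            return True
    return False


a = VxlanDev("vxlan0", 1, "eth0", "239.1.1.1")
b = VxlanDev("vxlan1", 1, "eth1", "239.1.1.1")
c = VxlanDev("vxlan2", 1, "eth0", "239.1.1.1")
```

Here `a` and `b` share a group address but sit on different lower devices, so `a` can leave the group when it goes down; with `c` present, the membership must be kept.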
    • vxlan: remove vxlan_group_used in vxlan_open · 79d4a94f
      Gao feng committed
      In vxlan_open, vxlan_group_used always returns true,
      because the vxlan device which we want
      to open is already running and is already
      in vxlan_list.

      Since ip_mc_join_group takes care of the reference
      of struct ip_mc_list, removing vxlan_group_used here
      is safe.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 1a0ab767
    • bgmac: reset cached MAC state during chip reset · d469962f
      Rafał Miłecki committed
      Without this bgmac_adjust_link didn't know it should re-initialize MAC
      state. This led to the MAC not working after an if down & up routine.
      Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: expand control flow of macro SKIP_NONLOCAL · 4f8f61eb
      Yang Yingliang committed
      SKIP_NONLOCAL hides the control flow. The control flow should be
      inlined and expanded explicitly in code so that someone who reads
      it can tell the control flow can be changed by the statement.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 11 Dec, 2013 8 commits
    • net: macb: Fix build warning · 9319e47c
      Soren Brinkmann committed
      When adjusting the link speed, the target frequency is determined by a
      'switch (LINK_SPEED)' statement, that assigns the target rate only for
      valid and expected LINK_SPEED values. This incomplete switch statement
      leads to the following build warning:
           drivers/net/ethernet/cadence/macb.c: In function 'macb_handle_link_change':
        >> drivers/net/ethernet/cadence/macb.c:241:14: warning: 'rate' may be used uninitialized in this function [-Wmaybe-uninitialized]
              netdev_warn(dev, "unable to generate target frequency: %ld Hz\n",
                         ^
           drivers/net/ethernet/cadence/macb.c:215:13: note: 'rate' was declared here
             long ferr, rate, rate_rounded;
      
      Fix this by bailing out of that function in the switch's default case
      before the rate variable is used.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
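The fix described in the macb commit, leaving the function in the default case before the rate is used, has a direct analogue in other languages. The Python sketch below is a model under stated assumptions: `target_rate` and `RATE_FOR_SPEED` are hypothetical names, and the clock rates are example values rather than figures taken from the commit.

```python
SPEED_10, SPEED_100, SPEED_1000 = 10, 100, 1000

# Target clock rates in Hz; the values are assumptions for illustration.
RATE_FOR_SPEED = {
    SPEED_10: 2_500_000,
    SPEED_100: 25_000_000,
    SPEED_1000: 125_000_000,
}


def target_rate(link_speed):
    """Return the target rate in Hz, or None for an unexpected link speed."""
    rate = RATE_FOR_SPEED.get(link_speed)
    if rate is None:
        # Equivalent of the switch's default case: bail out before `rate`
        # could be used without a valid value.
        return None
    return rate
```

Callers check for `None` and skip the clock adjustment, so no code path ever uses a rate that was never assigned.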
    • Merge branch 'tipc' · fcfa1a17
      David S. Miller committed
      Jon Maloy says:
      
      ====================
      tipc: cleanups in media and bearer layer
      
      This commit series performs a number of cleanups in order to make the
      bearer and media part of the code more comprehensible and manageable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove unused 'blocked' flag from tipc_link struct · 77a7e07a
      Ying Xue committed
      In early versions of TIPC it was possible to administratively block
      individual links through the use of the member flag 'blocked'. This
      functionality was deemed redundant, and since commit 7368dd ("tipc:
      clean out all instances of #if 0'd unused code"), this flag has been
      unused.
      
      In the current code, a link only needs to be blocked for sending and
      reception if it is subject to an ongoing link failover. In that case,
      it is sufficient to check if the number of expected failover packets
      is non-zero, something which is done via the function 'link_blocked()'.
      
      This commit finally removes the redundant 'blocked' flag completely.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate code duplication in media layer · e4d050cb
      Ying Xue committed
      Currently TIPC supports two L2 media types, Ethernet and Infiniband.
      Because both these media are accessed through the common net_device API,
      several functions in the two media adaptation files turn out to be
      fully or almost identical, leading to unnecessary code duplication.
      
      In this commit we extract this common code from the two media files
      and move them to the generic bearer.c. Additionally, we change
      the function names to reflect their real role: to access L2 media,
      irrespective of type.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: relocate common functions from media to bearer · 6e967adf
      Ying Xue committed
      Currently, registering a TIPC stack handler in the network device layer
      is done twice, once each for Ethernet (eth_media) and Infiniband
      (ib_media). But, as this registration is not media specific, we can
      avoid some code duplication by moving the registering function to
      the generic bearer layer, to the file bearer.c, and call it only once.
      The same is true for the network device event notifier.
      
      As a side effect, the two workqueues we are using for setting up/
      cleaning up media can now be eliminated. Furthermore, the array for
      storing the specific media type structs, media_array[], can be entirely
      deleted.
      
      Note that the eth_started and ib_started flags were removed during the
      code relocation.  There is now only one call to bearer_setup and
      bearer_cleanup, and these can logically not race against each other.
      
      Despite its size, this cleanup work incurs no functional changes in TIPC.
      In particular, it should be noted that the sequence ordering of received
      packets is unaffected by this change, since packet reception never was
      subject to any work queue handling in the first place.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove TIPC usage of field af_packet_priv in struct net_device · 37cb0620
      Ying Xue committed
      TIPC is currently using the field 'af_packet_priv' in struct net_device
      as a handle to find the bearer instance associated to the given network
      device. But, by doing so it is blocking other networking cleanups, such
      as the one discussed here:
      
      http://patchwork.ozlabs.org/patch/178044/
      
      This commit removes this usage from TIPC. Instead, we introduce a new
      field, 'tipc_ptr', to the net_device structure, to serve this purpose.
      When TIPC bearer is enabled, the bearer object is associated to
      'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall
      from a networking device, the bearer object can now be obtained from
      'tipc_ptr'. When a bearer is disabled, the bearer object is detached
      from its underlying network device by setting 'tipc_ptr' to NULL.
      
      Additionally, an RCU lock is used to protect the new pointer.
      Henceforth, the existing tipc_net_lock is used in write mode to
      serialize write accesses to this pointer, while the new RCU lock is
      applied on the read side to ensure that the pointer is 100% valid
      within its wrapped area for all readers.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve naming and comment consistency in media layer · ef72a7e0
      Jon Paul Maloy committed
      struct 'tipc_media' represents the specific info that the media
      layer adaptors (eth_media and ib_media) expose to the generic
      bearer layer. We clarify this by improved commenting, and by giving
      the 'media_list' array the more appropriate name 'media_info_array'.
      
      There are no functional changes in this commit.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: initiate media type array at compile time · 5702dbab
      Jon Paul Maloy committed
      Communication media types are abstracted through the struct 'tipc_media',
      one per media type. These structs are allocated statically inside their
      respective media file.
      
      Furthermore, in order to be able to reach all instances from a central
      location, we keep a static array with pointers to these structs. This
      array is currently initialized at runtime, under protection of
      tipc_net_lock. However, since the contents of the array itself never
      changes after initialization, we can just as well initialize it at
      compile time and make it 'const', at the same time making it obvious
      that no lock protection is needed here.
      
      This commit makes the array constant and removes the redundant lock
      protection.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>