1. 10 Jan 2017, 14 commits
    • net: ethernet: ti: cpsw: extend limits for cpsw_get/set_ringparam · f89d21b9
      Ivan Khoronzhuk committed
      Allow the number of descriptors to be set close to the possible limits.
      The minimum limit equals the number of channels, so that at least one
      descriptor can be set per channel. The maximum limit leaves enough
      descriptors for the tx channels.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cls_u32: don't bother explicitly initializing ->divisor to zero · 58fa118f
      Alexandru Moise committed
      This struct member is already initialized to zero upon root_ht's
      allocation via kzalloc().
      Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
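The reasoning in the cls_u32 message can be illustrated with a small userspace sketch. It assumes calloc() as a stand-in for kzalloc(); the struct ht_node type and the ht_node_alloc() helper are hypothetical names for illustration, not the kernel's definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a hash-table node; only the field named in
 * the commit message (divisor) plus one extra field are modelled. */
struct ht_node {
    unsigned int divisor;
    unsigned int refcnt;
};

/* calloc() returns zero-filled memory, much like kzalloc() in the kernel,
 * so divisor needs no explicit "= 0" after allocation. */
struct ht_node *ht_node_alloc(void)
{
    struct ht_node *n = calloc(1, sizeof(*n));

    if (!n)
        return NULL;
    n->refcnt = 1; /* only non-zero fields need explicit setup */
    return n;
}
```

A caller that reads divisor straight after ht_node_alloc() sees zero without any further initialization.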
    • Merge branch 'siphash' · 17650f22
      David S. Miller committed
      Jason A. Donenfeld says:
      
      ====================
      Introduce The SipHash PRF
      
      This patch series introduces SipHash into the kernel. SipHash is a
      cryptographically secure PRF, which serves a variety of functions, and is
      introduced in patch #1. The following patch #2 introduces HalfSipHash,
      an optimization suitable for hash tables only. Finally, the last two patches
      in this series show two usages of the introduced siphash function family.
      It is expected that after this initial introduction, other usages will follow.
      
      Please read the extensive descriptions in patch #1 and patch #2 of what these
      functions do and the various levels of assurances. They're products of intense
      cryptographic research, and I believe they're suitable for the uses outlined
      herein.
      
      The use of SipHash is not limited to the networking subsystem -- indeed I
      would like to use it in other places too in the kernel. But after discussing
      with a few on this list and at Linus' suggestion, the initial import of these
      functions is coming through the networking tree. After these are merged, it
      will then be easier to expand use elsewhere.
      
      Changes v2->v3:
        - hsiphash keys now simply use an unsigned long, in order to avoid
          a cluttered ifdef and make it a bit more clear what's happening.
        - A typo in the documentation has been fixed.
        - The documentation has been augmented with an example relating to struct
          packing and passing.
        - The net_secret variable is now __read_mostly.
      
      Hopefully this is the last of the required revisions, and v3 can be merged
      into net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • syncookies: use SipHash in place of SHA1 · fe62d05b
      Jason A. Donenfeld committed
      SHA1 is slower and less secure than SipHash, and so replacing syncookie
      generation with SipHash makes natural sense. Some BSDs have been doing
      this for several years in fact.
      
      The speedup should be similar to, and even more impressive than, the
      speedup from the sequence number fix in this series.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • secure_seq: use SipHash in place of MD5 · 7cd23e53
      Jason A. Donenfeld committed
      This gives a clear speed and security improvement. SipHash is both
      faster and more solid crypto than the aging MD5.
      
      Rather than manually filling MD5 buffers, for IPv6, we simply create
      a layout by a simple anonymous struct, for which gcc generates
      rather efficient code. For IPv4, we pass the values directly to the
      short input convenience functions.
      
      64-bit x86_64:
      [    1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
      [    1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
      [    1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
      [    1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526
      
      32-bit x86:
      [    1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
      [    1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
      [    1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
      [    1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
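The IPv6 approach in the secure_seq message, laying the inputs out in a struct instead of filling a hash buffer by hand, might look roughly like the userspace sketch below. The struct flow6_key type, the FLOW6_KEY_LEN macro and make_flow6_key() are hypothetical names, the 8-byte alignment is an assumption, and the attribute syntax is GCC/Clang specific. The length handed to a keyed hash stops at the end of the last field, so alignment padding is never hashed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the values that feed the sequence-number hash.
 * The 8-byte alignment is an assumption made for this sketch. */
struct flow6_key {
    uint8_t  saddr[16];
    uint8_t  daddr[16];
    uint16_t sport;
    uint16_t dport;
} __attribute__((aligned(8)));

/* Number of meaningful bytes: up to the end of dport, excluding the
 * tail padding that the alignment adds to sizeof(struct flow6_key). */
#define FLOW6_KEY_LEN (offsetof(struct flow6_key, dport) + sizeof(uint16_t))

/* Build the key in one place; the compiler handles the field layout. */
struct flow6_key make_flow6_key(const uint8_t saddr[16],
                                const uint8_t daddr[16],
                                uint16_t sport, uint16_t dport)
{
    struct flow6_key k;

    memset(&k, 0, sizeof(k)); /* keep padding bytes deterministic */
    memcpy(k.saddr, saddr, sizeof(k.saddr));
    memcpy(k.daddr, daddr, sizeof(k.daddr));
    k.sport = sport;
    k.dport = dport;
    return k;
}
```

A caller would pass the address of the key and FLOW6_KEY_LEN to whatever keyed hash it uses, rather than sizeof(struct flow6_key).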
    • siphash: implement HalfSipHash1-3 for hash tables · 1ae2324f
      Jason A. Donenfeld committed
      HalfSipHash, or hsiphash, is a shortened version of SipHash, which
      generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
      security margins, and shouldn't be used for anything too sensitive, but
      it could be used as a hashtable key function replacement, if the output
      is never exposed, and if the security requirement is not too high.
      
      The goal is to make this something that performance-critical jhash users
      would be willing to use.
      
      On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so on those
      systems hsiphash() is aliased to SipHash1-3.
      
      64-bit x86_64:
      [    0.509409] test_siphash:     SipHash2-4 cycles: 4049181
      [    0.510650] test_siphash:     SipHash1-3 cycles: 2512884
      [    0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
      [    0.512904] test_siphash:    JenkinsHash cycles:  978267
      So, we map hsiphash() -> SipHash1-3
      
      32-bit x86:
      [    0.509868] test_siphash:     SipHash2-4 cycles: 14812892
      [    0.513601] test_siphash:     SipHash1-3 cycles:  9510710
      [    0.515263] test_siphash: HalfSipHash1-3 cycles:  3856157
      [    0.515952] test_siphash:    JenkinsHash cycles:  1148567
      So, we map hsiphash() -> HalfSipHash1-3
      
      hsiphash() is roughly 3 times slower than jhash(), but comes with a
      considerable security improvement.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
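The per-architecture mapping of hsiphash() described in the HalfSipHash message can be sketched as a compile-time choice on word size. This is only an illustration: the ULONG_MAX test is a userspace stand-in for a kernel word-size check, and hsiphash_backend() and cycle_ratio() are hypothetical helpers. The second helper lets a reader recompute the "roughly 3 times slower than jhash()" claim from the cycle counts quoted in the message.

```c
#include <limits.h>
#include <stdint.h>

/* Pick the backend by word size. Assumption: unsigned long is 64 bits
 * wide on the 64-bit targets of interest (true for LP64 systems). */
#if ULONG_MAX > 0xffffffffUL
#define HSIPHASH_BACKEND "SipHash1-3"
#else
#define HSIPHASH_BACKEND "HalfSipHash1-3"
#endif

/* Name of the function hsiphash() would map to on this build. */
const char *hsiphash_backend(void)
{
    return HSIPHASH_BACKEND;
}

/* Ratio of two cycle counts, e.g. hsiphash() cycles over jhash() cycles. */
double cycle_ratio(uint64_t cycles, uint64_t baseline_cycles)
{
    return (double)cycles / (double)baseline_cycles;
}
```

For example, cycle_ratio(2512884, 978267) compares the 64-bit SipHash1-3 figure with JenkinsHash, and cycle_ratio(3856157, 1148567) does the same for HalfSipHash1-3 on 32-bit x86.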
    • siphash: add cryptographically secure PRF · 2c956a60
      Jason A. Donenfeld committed
      SipHash is a 64-bit keyed hash function that is actually a
      cryptographically secure PRF, like HMAC. Except SipHash is super fast,
      and is meant to be used as a hashtable keyed lookup function, or as a
      general PRF for short input use cases, such as sequence numbers or RNG
      chaining.
      
      For the first usage:
      
      There are a variety of attacks known as "hashtable poisoning" in which an
      attacker forms some data such that the hash of that data will be the
      same, and then proceeds to fill up all entries of a hashbucket. This is
      a realistic and well-known denial-of-service vector. Currently
      hashtables use jhash, which is fast but not secure, and some kind of
      rotating key scheme (or none at all, which isn't good). SipHash is meant
      as a replacement for jhash in these cases.
      
      There are a handful of places in the kernel that are vulnerable to
      hashtable poisoning attacks, either via userspace vectors or network
      vectors, and there's not a reliable mechanism inside the kernel at the
      moment to fix it. The first step toward fixing these issues is actually
      getting a secure primitive into the kernel for developers to use. Then
      we can, bit by bit, port things over to it as deemed appropriate.
      
      While SipHash is extremely fast for a cryptographically secure function,
      it is likely a bit slower than the insecure jhash, and so replacements
      will be evaluated on a case-by-case basis based on whether or not the
      difference in speed is negligible and whether or not the current jhash usage
      poses a real security risk.
      
      For the second usage:
      
      A few places in the kernel are using MD5 or SHA1 for creating secure
      sequence numbers, syn cookies, port numbers, or fast random numbers.
      SipHash is a faster, more fitting, and more secure replacement for MD5
      in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
      obvious and straight-forward, and so is submitted along with this patch
      series. There shouldn't be much of a debate over its efficacy.
      
      Dozens of languages are already using this internally for their hash
      tables and PRFs. Some of the BSDs already use this in their kernels.
      SipHash is a widely known high-speed solution to a widely known set of
      problems, and it's time we catch up.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
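To make the keyed-hash idea in the SipHash message concrete, here is a minimal userspace sketch of SipHash-2-4 following the published algorithm. It is not the kernel's implementation or API: the function name siphash24(), its signature, and the split of the 128-bit key into two 64-bit words k0 and k1 are choices made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned int b)
{
    return (x << b) | (x >> (64 - b));
}

/* One SipRound over the four-word internal state. */
static void sipround(uint64_t v[4])
{
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* Read 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t x = 0;
    int i;

    for (i = 7; i >= 0; i--)
        x = (x << 8) | p[i];
    return x;
}

/* SipHash-2-4: 2 compression rounds per word, 4 finalization rounds. */
uint64_t siphash24(const uint8_t *in, size_t len, uint64_t k0, uint64_t k1)
{
    uint64_t v[4] = {
        0x736f6d6570736575ULL ^ k0,
        0x646f72616e646f6dULL ^ k1,
        0x6c7967656e657261ULL ^ k0,
        0x7465646279746573ULL ^ k1,
    };
    uint64_t last = (uint64_t)len << 56; /* length goes in the top byte */
    size_t tail = len % 8;
    const uint8_t *end = in + (len - tail);
    size_t i;

    for (; in != end; in += 8) {
        uint64_t m = load_le64(in);

        v[3] ^= m;
        sipround(v);
        sipround(v);
        v[0] ^= m;
    }
    for (i = 0; i < tail; i++)
        last |= (uint64_t)in[i] << (8 * i);
    v[3] ^= last;
    sipround(v);
    sipround(v);
    v[0] ^= last;

    v[2] ^= 0xff;
    for (i = 0; i < 4; i++)
        sipround(v);
    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

For the hashtable use case, a table would keep k0 and k1 secret and random, and derive a bucket index from the result, for instance siphash24(data, len, k0, k1) % nbuckets. Without the key, an attacker cannot precompute inputs that land in the same bucket.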
    • net: ipv4: remove disable of bottom half in inet_rtm_getroute · eafea739
      David Ahern committed
      Nothing about the route lookup requires bottom half to be disabled.
      Remove the local_bh_disable ... local_bh_enable around ip_route_input.
      This appears to be a vestige of days gone by as it has been there
      since the beginning of git time.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: intel: e100: use new api ethtool_{get|set}_link_ksettings · 6b0c06e0
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: ibmvnic: use new api ethtool_{get|set}_link_ksettings · 8a43379f
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: ibmveth: use new api ethtool_{get|set}_link_ksettings · 9ce8c2df
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: emac: use new api ethtool_{get|set}_link_ksettings · e4ccf764
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: ehea: use new api ethtool_{get|set}_link_ksettings · cecf62d6
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: change init_inodecache() return void · 1e911632
      yuan linyu committed
      sock_init() calls it but does not check its return value,
      so change it to return void and add an internal BUG_ON() check.
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Jan 2017, 14 commits
    • Merge branch 'tc-skb-diet' · 4289e60c
      David S. Miller committed
      Willem de Bruijn says:
      
      ====================
      convert tc_verd to integer bitfields
      
      The skb tc_verd field takes up two bytes but uses far fewer bits.
      Convert the remaining use cases to bitfields that fit in existing
      holes (depending on config options) and potentially save the two
      bytes in struct sk_buff.
      
      This patchset is based on an earlier set by Florian Westphal and its
      discussion (http://www.spinics.net/lists/netdev/msg329181.html).
      
      Patches 1 and 2 are low hanging fruit: removing the last traces of
        data that are no longer stored in tc_verd.
      
      Patches 3 and 4 convert tc_verd to individual bitfields (5 bits).
      
      Patch 5 reduces TC_AT to a single bitfield,
        as AT_STACK is not valid here (unlike in the case of TC_FROM).
      
      Patch 6 changes TC_FROM to two bitfields with clearly defined purpose.
      
      It may be possible to reduce storage further after this initial round.
      If tc_skip_classify is set only by IFB, testing skb_iif may suffice.
      The L2 header pushing/popping logic can perhaps be shared with
      AF_PACKET, which currently uses pkt_type for the same purpose.
      
      Changes:
        RFC -> v1
          - (patch 3): remove no longer needed label in tcf_action_exec
          - (patch 5): set tc_at_ingress at the same points as existing
                       SET_TC_AT calls
      
      Tested ingress mirred + netem + ifb:
      
        ip link set dev ifb0 up
        tc qdisc add dev eth0 ingress
        tc filter add dev eth0 parent ffff: \
          u32 match ip dport 8000 0xffff \
          action mirred egress redirect dev ifb0
        tc qdisc add dev ifb0 root netem delay 1000ms
        nc -u -l 8000 &
        ssh $otherhost nc -u $host 8000
      
      Tested egress mirred:
      
        ip link add veth1 type veth peer name veth2
        ip link set dev veth1 up
        ip link set dev veth2 up
        tcpdump -n -i veth2 udp and dst port 8000 &
      
        tc qdisc add dev eth0 root handle 1: prio
        tc filter add dev eth0 parent 1:0 \
          u32 match ip dport 8000 0xffff \
          action mirred egress redirect dev veth1
        tc qdisc add dev veth1 root netem delay 1000ms
        nc -u $otherhost 8000
      
      Tested ingress mirred:
      
        ip link add veth1 type veth peer name veth2
        ip link add veth3 type veth peer name veth4
      
        ip netns add ns0
        ip netns add ns1
      
        for i in 1 2 3 4; do \
          NS=ns$((${i}%2)); \
          ip link set dev veth${i} netns ${NS}; \
          ip netns exec ${NS} \
            ip addr add dev veth${i} 192.168.1.${i}/24; \
          ip netns exec ${NS} \
            ip link set dev veth${i} up; \
        done
      
        ip netns exec ns0 tc qdisc add dev veth2 ingress
        ip netns exec ns0 \
          tc filter add dev veth2 parent ffff: \
            u32 match ip dport 8000 0xffff \
            action mirred ingress redirect dev veth4
      
        ip netns exec ns0 \
          tcpdump -n -i veth4 udp and dst port 8000 &
        ip netns exec ns1 \
          nc -u 192.168.1.2 8000
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905
      Willem de Bruijn committed
      The tc_from field fulfills two roles. It encodes whether a packet was
      redirected by an act_mirred device and, if so, whether act_mirred was
      called on ingress or egress. Split it into separate fields.
      
      The information is needed by the special IFB loop, where packets are
      taken out of the normal path by act_mirred, forwarded to IFB, then
      reinjected at their original location (ingress or egress) by IFB.
      
      The IFB device cannot use skb->tc_at_ingress, because that may have
      been overwritten as the packet travels from act_mirred to ifb_xmit,
      when it passes through tc_classify on the IFB egress path. Cache this
      value in skb->tc_from_ingress.
      
      That field is valid only if a packet arriving at ifb_xmit came from
      act_mirred. Other packets can be crafted to reach ifb_xmit. These
      must be dropped. Set tc_redirected on redirection and drop all packets
      that do not have this bit set.
      
      Both fields are set only on cloned skbs in tc actions, so original
      packet sources do not have to clear the bit when reusing packets
      (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
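The split of tc_from into two single-bit fields, and the drop rule for packets that did not come from act_mirred, can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: struct pkt_tc_state, mark_redirected() and ifb_decide() are hypothetical names that mirror the fields named in the message, not the kernel's struct sk_buff or IFB code.

```c
/* Hypothetical per-packet state holding the single-bit fields named in
 * the commit message. */
struct pkt_tc_state {
    unsigned int tc_at_ingress:1;   /* where tc is processing right now */
    unsigned int tc_redirected:1;   /* set when a mirred-like action redirects */
    unsigned int tc_from_ingress:1; /* cached origin, valid if tc_redirected */
};

enum ifb_verdict {
    IFB_DROP,
    IFB_REINJECT_INGRESS,
    IFB_REINJECT_EGRESS
};

/* What a redirecting action records before handing the clone to IFB:
 * the origin is cached because tc_at_ingress may be overwritten later. */
void mark_redirected(struct pkt_tc_state *st)
{
    st->tc_redirected = 1;
    st->tc_from_ingress = st->tc_at_ingress;
}

/* What the IFB transmit path decides: packets without the redirect bit
 * are dropped, the rest go back to where they were taken from. */
enum ifb_verdict ifb_decide(const struct pkt_tc_state *st)
{
    if (!st->tc_redirected)
        return IFB_DROP;
    return st->tc_from_ingress ? IFB_REINJECT_INGRESS : IFB_REINJECT_EGRESS;
}
```

Keeping the origin in its own bit means the decision still works after tc_at_ingress has changed on the IFB egress path.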
    • net-tc: convert tc_at to tc_at_ingress · 8dc07fdb
      Willem de Bruijn committed
      Field tc_at is used only within tc actions to distinguish ingress from
      egress processing. A single bit is sufficient for this purpose.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: convert tc_verd to integer bitfields · a5135bcf
      Willem de Bruijn committed
      Extract the remaining two fields from tc_verd and remove the __u16
      completely. TC_AT and TC_FROM are converted to equivalent two-bit
      integer fields tc_at and tc_from. Where possible, use existing
      helper skb_at_tc_ingress when reading tc_at. Introduce helper
      skb_reset_tc to clear fields.
      
      Not documenting tc_from and tc_at, because they will be replaced
      with single bit fields in follow-on patches.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: extract skip classify bit from tc_verd · e7246e12
      Willem de Bruijn committed
      Packets sent by the IFB device skip subsequent tc classification.
      A single bit governs this state. Move it out of tc_verd in
      anticipation of removing that __u16 completely.
      
      The new bitfield tc_skip_classify temporarily uses one bit of a
      hole, until tc_verd is removed completely in a follow-up patch.
      
      Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
      With that many options, little value in documenting it.
      
      Introduce a helper function to deduplicate the logic in the two
      sites that check this bit.
      
      The field tc_skip_classify is set only in IFB on skbs cloned in
      act_mirred, so original packet sources do not have to clear the
      bit when reusing packets (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: make MAX_RECLASSIFY_LOOP local · d6264071
      Willem de Bruijn committed
      This field is no longer kept in tc_verd. Remove it from the global
      definition of that struct.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: remove unused tc_verd fields · aec745e2
      Willem de Bruijn committed
      Remove the last reference to tc_verd's munge and redirect ttl bits.
      These fields are no longer used.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mdio: Demote print from info to debug in mdio_device_register · 29b84f20
      Florian Fainelli committed
      While it is useful to know which MDIO device is being registered, demote
      the dev_info() to a dev_dbg().
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove useless memset's in drivers get_stats64 · 5944701d
      stephen hemminger committed
      In dev_get_stats() the statistic structure storage has already been
      zeroed. Therefore network drivers do not need to call memset() again.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make ndo_get_stats64 a void function · bc1f4470
      stephen hemminger committed
      The network device operation for reading statistics is only called
      in one place, and it ignores the return value. Having a structure
      return value is potentially confusing because some future driver could
      incorrectly assume that the return value was used.
      
      Fix all drivers with ndo_get_stats64 to have a void function.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
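The shape of the ndo_get_stats64 change can be sketched with a small userspace model: the single caller zeroes the storage, and the driver callback returns void and fills in only what it tracks. All names here (struct link_stats, struct dev_ops, read_stats(), demo_get_stats64()) are hypothetical and are not the kernel's types or signatures.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced statistics structure. */
struct link_stats {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

/* Hypothetical driver operations table: the callback returns void, so no
 * driver can assume a return value is used. */
struct dev_ops {
    void (*get_stats64)(const void *priv, struct link_stats *out);
};

/* The one caller: zero the storage once, then let the driver fill it. */
void read_stats(const struct dev_ops *ops, const void *priv,
                struct link_stats *out)
{
    memset(out, 0, sizeof(*out));
    if (ops->get_stats64)
        ops->get_stats64(priv, out);
}

/* Example driver: no memset() of its own, it only writes what it counts. */
struct demo_priv {
    uint64_t rx_seen;
};

void demo_get_stats64(const void *priv, struct link_stats *out)
{
    const struct demo_priv *p = priv;

    out->rx_packets = p->rx_seen;
}
```

Because read_stats() clears the structure first, fields that the driver never touches, such as tx_packets in this example, read back as zero.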
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 63c64de7
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2017-01-08
      
      This series contains updates to fm10k only.
      
      Ngai-Mint changes the driver to use the MAC pointer in the fm10k_mac_info
      structure for fm10k_get_host_state_generic().  Fixed a race condition
      where the mailbox interrupt request bits can be cleared before being
      handled, causing certain mailbox messages from the PF to go unhandled
      and the PF to enter an inactive state.
      
      Jake removes the typecast of u8 to char, and the extra variable that was
      created for the typecast.  Bumps the driver version.  Added back the
      receive descriptor timestamp value so that applications built on top
      of the IES API can function properly.  Cleaned up the debug statistics
      flag, since debug statistics were removed and the flag was missed in
      the removal.
      
      Scott limits the DMA sync for CPU to the actual length of the packet,
      instead of the entire buffer, since the DMA sync occurs every time a
      packet is received.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Remove flow arg from ip_mkroute_input · dc33da59
      David Ahern committed
      fl4 arg is not used; remove it.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipmr: Remove nowait arg to ipmr_get_route · 9f09eaea
      David Ahern committed
      ipmr_get_route has 1 caller and the nowait arg is 0. Remove the arg and
      simplify ipmr_get_route accordingly.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: simplify octeon_flush_iq() · 60889869
      Derek Chickles committed
      Because every call to octeon_flush_iq() has a hardcoded 1 for the
      pending_thresh argument, simplify that function by removing that argument.
      This avoids one atomic read as well.
      Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 08 Jan 2017, 12 commits