1. 29 Jun, 2018 25 commits
  2. 28 Jun, 2018 15 commits
    • Merge branch 'net-preserve-sock-reference-when-scrubbing-the-skb' · 16c0cd07
      Authored by David S. Miller
      Flavio Leitner says:
      
      ====================
      net: preserve sock reference when scrubbing the skb.
      
      The sock reference is lost when scrubbing the packet and that breaks
      TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
      performance impacts of about 50% in a single TCP stream when crossing
      network namespaces.
      
      XPS breaks because the queue mapping stored in the socket is not
      available, so another random queue might be selected when the stack
      needs to transmit something like a TCP ACK, or TCP Retransmissions.
      That causes packet re-ordering and/or performance issues.
      
      TSQ breaks because it orphans the packet while it is still in the
      host, so packets are queued contributing to the buffer bloat problem.
      
      Preserving the sock reference fixes both issues. The socket is
      orphaned anyway in the receive path before any relevant action,
      but the transmit side needs some extra checking, which is included
      in the first patch.
      
      The first patch updates netfilter to check that the socket
      netns is local before using it.
      
      The second patch removes the skb_orphan() call from
      skb_scrub_packet() and improves the documentation.
      
      ChangeLog:
      - split into two (Eric)
      - addressed Paolo's offline feedback to swap the checks in xt_socket.c
        to preserve original behavior.
      - improved ip-sysctl.txt (reported by Cong)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: preserve sock reference when scrubbing the skb. · 9c4c3252
      Authored by Flavio Leitner
      The sock reference is lost when scrubbing the packet and that breaks
      TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
      performance impacts of about 50% in a single TCP stream when crossing
      network namespaces.
      
      XPS breaks because the queue mapping stored in the socket is not
      available, so another random queue might be selected when the stack
      needs to transmit something like a TCP ACK, or TCP Retransmissions.
      That causes packet re-ordering and/or performance issues.
      
      TSQ breaks because it orphans the packet while it is still in the
      host, so packets are queued contributing to the buffer bloat problem.
      
      Preserving the sock reference fixes both issues. The socket is
      orphaned anyway in the receive path before any relevant action,
      and on the TX side netfilter checks that the reference is local
      before using it.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: check if the socket netns is correct. · f5646501
      Authored by Flavio Leitner
      Netfilter assumes that if the socket is present in the skb, then
      it can be used because that reference is cleaned up while the skb
      is crossing netns.
      
      We want to change that to preserve the socket reference in a future
      patch, so this is a preparation that updates netfilter to check
      that the socket netns matches before using it.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
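The check described in the netfilter commit above can be illustrated with a toy Python model. This is not kernel code: `Sock`, `Skb`, and `usable_sock` are hypothetical names invented for this sketch. The only idea taken from the commit message is that a socket attached to an skb should be used only when its netns matches the netns in which the hook runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sock:
    """Hypothetical stand-in for a socket; only its owning netns matters here."""
    netns: str


@dataclass
class Skb:
    """Hypothetical stand-in for an skb that may carry a socket reference."""
    sk: Optional[Sock] = None


def usable_sock(skb: Skb, hook_netns: str) -> Optional[Sock]:
    """Return the skb's socket only if it belongs to the hook's netns."""
    sk = skb.sk
    if sk is not None and sk.netns != hook_netns:
        # The reference survived a netns crossing, so treat it as absent.
        return None
    return sk


local = Skb(sk=Sock(netns="ns1"))
foreign = Skb(sk=Sock(netns="ns0"))
print(usable_sock(local, "ns1"))    # the socket is returned
print(usable_sock(foreign, "ns1"))  # None: the socket belongs to another netns
```

With this check in place, keeping the socket reference across a namespace boundary does not let a hook in the receiving netns act on a socket it does not own.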
    • Merge branch 'net-sched-actions-code-style-cleanup-and-fixes' · 003504a2
      Authored by David S. Miller
      Roman Mashak says:
      
      ====================
      net sched actions: code style cleanup and fixes
      
      The patchset fixes a few coding style issues and typos, as well as
      one issue detected by the sparse semantic checker.
      
      No functional changes introduced.
      
      Patch 1 & 2 fix coding style bits caught by the checkpatch.pl script
      Patch 3 fixes an issue with a shadowed variable
      Patch 4 adds sizeof() operator instead of magic number for buffer length
      Patch 5 fixes typos in diagnostics messages
      Patch 6 explicitly sets unsigned char for bitwise operation
      
      v2:
         - submit for net-next
         - added Reviewed-by tags
         - use u8* instead of char*, as per Davide Caratti's suggestion
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: avoid bitwise operation on signed value in pedit · 43052741
      Authored by Roman Mashak
      Since char can be unsigned or signed, and bitwise operators may have
      implementation-dependent results when performed on signed operands,
      declare 'u8 *' operand instead.
      Suggested-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
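The concern in the pedit commit above is easier to see with concrete numbers. The sketch below uses Python's `ctypes` to reinterpret the same byte as a signed and as an unsigned 8-bit value, then extracts the high nibble with a right shift. Python always performs an arithmetic shift on negative integers, so the signed result here models one common C outcome rather than a guaranteed one; the byte value `0xF0` is chosen arbitrarily for illustration.

```python
import ctypes

raw = 0xF0  # a byte with the top bit set

as_signed = ctypes.c_int8(raw).value     # the byte viewed as a signed char
as_unsigned = ctypes.c_uint8(raw).value  # the byte viewed as a u8

# Shift right by 4 to extract the high nibble.
signed_nibble = as_signed >> 4      # the sign bit is propagated
unsigned_nibble = as_unsigned >> 4  # plain logical shift

print(as_signed, as_unsigned)          # -16 240
print(signed_nibble, unsigned_nibble)  # -1 15
```

Only the unsigned view yields the intended nibble value, which is why declaring the operand as 'u8 *' removes the ambiguity.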
    • net sched actions: fix misleading text strings in pedit action · 95b0d2dc
      Authored by Roman Mashak
      Change "tc filter pedit .." to "tc actions pedit .." in error
      messages to clearly refer to pedit action.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: use sizeof operator for buffer length · 6ff7586e
      Authored by Roman Mashak
      Replace constant integer with sizeof() to clearly indicate
      the destination buffer length in skb_header_pointer() calls.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
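The idea behind the sizeof change above, deriving the read length from the destination rather than from a literal, can be sketched in Python with the `struct` module. The `header_pointer` helper and the two-field header layout are hypothetical and only loosely mirror a bounds-checked header read; they do not reproduce the signature of skb_header_pointer().

```python
import struct

# Hypothetical 4-byte header layout: two big-endian 16-bit fields.
HDR = struct.Struct("!HH")


def header_pointer(packet: bytes, offset: int, length: int):
    """Toy bounds-checked read: return `length` bytes at `offset`, or None."""
    if offset < 0 or offset + length > len(packet):
        return None
    return packet[offset:offset + length]


packet = bytes([0x12, 0x34, 0x56, 0x78, 0x9A])

# The length comes from the destination layout, not from a literal 4.
chunk = header_pointer(packet, 0, HDR.size)
fields = HDR.unpack(chunk)
print(fields)

# A read that would run past the end of the packet is rejected.
too_far = header_pointer(packet, 2, HDR.size)
print(too_far)
```

If the header layout ever changes, the length passed to the read follows automatically, which is the readability and safety benefit the commit is after.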
    • net sched actions: fix sparse warning · 544377cd
      Authored by Roman Mashak
      The symbol _data is already declared in include/asm-generic/sections.h
      for the linker sections, so the local variable of the same name in
      pedit triggers a sparse warning:
      
      net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one
      ./include/asm-generic/sections.h:36:13: originally declared here
      
      Therefore rename the variable.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: fix coding style in pedit headers · d020d455
      Authored by Roman Mashak
      Fix coding style issues in tc pedit headers detected by the
      checkpatch script.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: fix coding style in pedit action · 80f0f574
      Authored by Roman Mashak
      Fix coding style issues in tc pedit action detected by the
      checkpatch script.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netem: slotting with non-uniform distribution · 0a9fe5c3
      Authored by Yousuk Seung
      Extend slotting with support for non-uniform distributions. This is
      similar to netem's non-uniform distribution delay feature.
      
      Commit f043efeae2f1 ("netem: support delivering packets in delayed
      time slots") added the slotting feature to approximate the behaviors
      of media with packet aggregation but only supported a uniform
      distribution for delays between transmission attempts. Tests with TCP
      BBR with emulated wifi links with non-uniform distributions produced
      more useful results.
      
      Syntax:
         slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
            [bytes MAX_BYTES]
      
      The syntax and use of the distribution table is the same as in the
      non-uniform distribution delay feature. A file DISTRIBUTION must be
      present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
      NETEM_DIST_SCALE. A random value x is selected from the table, and the
      resulting delay is DELAY + ( x * JITTER ). Correlation between values
      is not supported.
      
      Examples:
        Normal distribution delay with mean = 800us and stdev = 100us.
        > tc qdisc add dev eth0 root netem slot dist normal 800us 100us
      
        Optionally set the max slot size in bytes and/or packets.
        > tc qdisc add dev eth0 root netem slot dist normal 800us 100us \
          bytes 64k packets 42
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
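The delay formula in the netem commit above can be sketched as follows. This is a floating-point illustration, not the kernel's integer implementation. It assumes that x is a table entry divided by NETEM_DIST_SCALE and that the scale is 8192; the five-entry table is made up for the example and is far smaller than a real distribution file.

```python
import random

NETEM_DIST_SCALE = 8192  # assumed scale factor applied to table entries


def slot_delay_us(delay_us, jitter_us, table, rng):
    """Pick a table entry and return DELAY + x * JITTER, in microseconds."""
    entry = rng.choice(table)
    x = entry / NETEM_DIST_SCALE  # undo the table scaling
    return delay_us + x * jitter_us


# Made-up table: entries correspond to x = -2, -1, 0, 1, 2.
table = [-16384, -8192, 0, 8192, 16384]

rng = random.Random(0)  # seeded so repeated runs give the same samples
samples = [slot_delay_us(800, 100, table, rng) for _ in range(5)]
print(samples)
```

With DELAY = 800us and JITTER = 100us, as in the first tc example, every sample from this toy table falls between 600us and 1000us. Each draw is independent, matching the statement that correlation between values is not supported.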
    • netlink: Return extack message if attribute validation fails · 7861552c
      Authored by David Ahern
      Have one extack message for parsing and validating.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: xgmiitorgmii: Check read_status results · 8d0752d1
      Authored by Brandon Maier
      We're ignoring the result of the attached phy device's read_status().
      Return it so we can detect errors.
      Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: xgmiitorgmii: Use correct mdio bus · cf31ea71
      Authored by Brandon Maier
      The xgmiitorgmii is using the mii_bus of the device it's attached to,
      instead of the bus it was given during probe.
      Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: xgmiitorgmii: Check phy_driver ready before accessing · ab4e6ee5
      Authored by Brandon Maier
      A phy_device is added to the global mdio_bus list during
      phy_device_register(), but its phy_driver is not attached until
      phy_probe(). It is therefore possible for of_phy_find_device() in
      xgmiitorgmii to return a valid phy with a NULL phy_driver, leading
      to a NULL pointer access during the memcpy().
      
      This fixes the following Oops:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.40 #1
      Hardware name: Xilinx Zynq Platform
      task: ce4c8d00 task.stack: ce4ca000
      PC is at memcpy+0x48/0x330
      LR is at xgmiitorgmii_probe+0x90/0xe8
      pc : [<c074bc68>]    lr : [<c0529548>]    psr: 20000013
      sp : ce4cbb54  ip : 00000000  fp : ce4cbb8c
      r10: 00000000  r9 : 00000000  r8 : c0c49178
      r7 : 00000000  r6 : cdc14718  r5 : ce762800  r4 : cdc14710
      r3 : 00000000  r2 : 00000054  r1 : 00000000  r0 : cdc14718
      Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 18c5387d  Table: 0000404a  DAC: 00000051
      Process swapper/0 (pid: 1, stack limit = 0xce4ca210)
      ...
      [<c074bc68>] (memcpy) from [<c0529548>] (xgmiitorgmii_probe+0x90/0xe8)
      [<c0529548>] (xgmiitorgmii_probe) from [<c0526a94>] (mdio_probe+0x28/0x34)
      [<c0526a94>] (mdio_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
      [<c04db98c>] (driver_probe_device) from [<c04dbd58>] (__device_attach_driver+0xac/0x10c)
      [<c04dbd58>] (__device_attach_driver) from [<c04d96f4>] (bus_for_each_drv+0x84/0xc8)
      [<c04d96f4>] (bus_for_each_drv) from [<c04db5bc>] (__device_attach+0xd0/0x134)
      [<c04db5bc>] (__device_attach) from [<c04dbdd4>] (device_initial_probe+0x1c/0x20)
      [<c04dbdd4>] (device_initial_probe) from [<c04da8fc>] (bus_probe_device+0x98/0xa0)
      [<c04da8fc>] (bus_probe_device) from [<c04d8660>] (device_add+0x43c/0x5d0)
      [<c04d8660>] (device_add) from [<c0526cb8>] (mdio_device_register+0x34/0x80)
      [<c0526cb8>] (mdio_device_register) from [<c0580b48>] (of_mdiobus_register+0x170/0x30c)
      [<c0580b48>] (of_mdiobus_register) from [<c05349c4>] (macb_probe+0x710/0xc00)
      [<c05349c4>] (macb_probe) from [<c04dd700>] (platform_drv_probe+0x44/0x80)
      [<c04dd700>] (platform_drv_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
      [<c04db98c>] (driver_probe_device) from [<c04dbc58>] (__driver_attach+0x10c/0x118)
      [<c04dbc58>] (__driver_attach) from [<c04d9600>] (bus_for_each_dev+0x8c/0xd0)
      [<c04d9600>] (bus_for_each_dev) from [<c04db1fc>] (driver_attach+0x2c/0x30)
      [<c04db1fc>] (driver_attach) from [<c04daa98>] (bus_add_driver+0x50/0x260)
      [<c04daa98>] (bus_add_driver) from [<c04dc440>] (driver_register+0x88/0x108)
      [<c04dc440>] (driver_register) from [<c04dd6b4>] (__platform_driver_register+0x50/0x58)
      [<c04dd6b4>] (__platform_driver_register) from [<c0b31248>] (macb_driver_init+0x24/0x28)
      [<c0b31248>] (macb_driver_init) from [<c010203c>] (do_one_initcall+0x60/0x1a4)
      [<c010203c>] (do_one_initcall) from [<c0b00f78>] (kernel_init_freeable+0x15c/0x1f8)
      [<c0b00f78>] (kernel_init_freeable) from [<c0763d10>] (kernel_init+0x18/0x124)
      [<c0763d10>] (kernel_init) from [<c0112d74>] (ret_from_fork+0x14/0x20)
      Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
      ---[ end trace 3e4ec21905820a1f ]---
      Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
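The ordering problem behind the Oops above can be modelled in a few lines of Python. The classes and the `converter_probe` function are hypothetical stand-ins, not the driver's code. The commit message only says that the phy_driver is checked before it is accessed; returning a deferral error (assumed here to be -EPROBE_DEFER, with the value 517) is one plausible way to handle the not-yet-ready case and is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

EPROBE_DEFER = 517  # assumed errno value meaning "retry this probe later"


@dataclass
class PhyDriver:
    """Hypothetical stand-in for a phy_driver."""
    name: str


@dataclass
class PhyDevice:
    """Hypothetical stand-in for a registered phy_device."""
    drv: Optional[PhyDriver] = None  # stays None until the phy is probed


def converter_probe(phy: PhyDevice) -> int:
    """Toy probe: refuse to copy the driver until it has been attached."""
    if phy.drv is None:
        # Without this check, the copy below would dereference a missing driver.
        return -EPROBE_DEFER
    driver_copy = PhyDriver(name=phy.drv.name)  # stands in for the memcpy()
    assert driver_copy.name == phy.drv.name
    return 0


print(converter_probe(PhyDevice()))                      # -517: not ready yet
print(converter_probe(PhyDevice(PhyDriver("attached")))) # 0: safe to copy
```

The first call corresponds to the window between phy_device_register() and phy_probe(), where the device is already visible but has no driver.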