1. 19 Feb, 2016 10 commits
    • Merge branch 'netlink-mmap-remove' · f169af2c
      David S. Miller committed
      Florian Westphal says:
      
      ====================
      netlink: remove mmapped netlink support
      
      As discussed during netconf 2016 in Seville, this series removes
      CONFIG_NETLINK_MMAP.
      
      Close to three years after it was merged it has retained several problems
      that do not appear to be fixable.
      
      No official netfilter libmnl release contains support for mmap backed netlink
      sockets. No openvswitch release makes use of it either.
      
      To use the mmap interface, userspace not only has to probe for mmap netlink
      support, it also has to implement a recv/socket receive path in order to
      handle messages that exceed the size of an rx ring element (NL_MMAP_STATUS_COPY).
      
      So if there are odd programs out there that attempt to use MMAP netlink
      they should continue to work as they already need a socket based code path
      to work properly.
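The fallback requirement described above can be sketched as a small decision function. This is only an illustrative model: the `toy_*` names and the three frame states are hypothetical stand-ins loosely based on the NL_MMAP_STATUS_* values mentioned in this series, not the removed kernel API.

```c
#include <assert.h>

/* Hypothetical frame states, loosely modeled on NL_MMAP_STATUS_* values. */
enum toy_frame_status { TOY_FRAME_UNUSED, TOY_FRAME_VALID, TOY_FRAME_COPY };

/* What a receiver should do next. */
enum toy_rx_action { TOY_RX_WAIT, TOY_RX_READ_RING, TOY_RX_RECV_SOCKET };

/*
 * Decide how to fetch the next message.
 * have_ring: 1 if the rx ring could be set up, 0 if probing for mmap
 *            support failed (always the case once the feature is removed).
 */
static enum toy_rx_action toy_rx_decide(int have_ring,
                                        enum toy_frame_status st)
{
    assert(have_ring == 0 || have_ring == 1);

    if (!have_ring)
        return TOY_RX_RECV_SOCKET;      /* plain socket receive path */

    switch (st) {
    case TOY_FRAME_VALID:
        return TOY_RX_READ_RING;        /* message fits in the ring slot */
    case TOY_FRAME_COPY:
        return TOY_RX_RECV_SOCKET;      /* message too large for the slot */
    case TOY_FRAME_UNUSED:
    default:
        return TOY_RX_WAIT;             /* nothing queued yet */
    }
}
```

Because the socket branch has to exist for oversized messages anyway, a program written this way keeps working when ring setup fails.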
      
      The actual revert (first patch) has a list of problems.
      The followup patches remove a couple of helpers that are no longer needed
      after the revert.
      
      I did a few tests with mmap vs. socket based interface on a 4.4 based
      kernel on an i7-4790 box and there are no performance advantages:
      
      loopback, single nfqueue, queueing in -t filter INPUT:
      traffic generated by 8 * ping -q -f localhost:
      socket backend:
      real    0m27.325s
      user    0m3.993s
      sys     0m23.292s
      
      with mmap ring backend:
      real    0m29.054s
      user    0m4.924s
      sys     0m24.127s
      
      with single tcp stream, unidirectional, loopback mtu set at 1500
      (nc localhost discard < /dev/zero > /dev/null):
      
      socket interface:
      time nfqdump -b $((8 * 1024 * 1024 * 1024)) -w /dev/null
      real    0m15.960s
      user    0m1.756s
      sys     0m11.143s
      
      mmap ring:
      real    0m16.441s
      user    0m3.040s
      sys     0m13.687s
      
      socket interface nfqdump[1] with --gso option (i.e. MTU is exceeded,
      no kernel-side segmentation and checksum fixups) completes in about 5s.
      
      I also tested dumping a conntrack table with 1m entries.
      On my box this takes about 2.4 seconds for both mmap and socket backend:
      
      time LD_PRELOAD=../../src/.libs/libmnl.so ./nfct-dump-sk > /dev/null
      mnl_cb_run: Success
      messages: 1000000
      real    0m2.485s
      user    0m1.085s
      sys     0m1.400s
      
      time LD_PRELOAD=../../src/.libs/libmnl.so ./nfct-dump-mmap > /dev/null
      messages: 1000000
      real    0m2.451s
      user    0m1.124s
      sys     0m1.328s
      
      [1] https://git.breakpoint.cc/cgit/fw/nfqdump.git/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfnetlink: Revert "nfnetlink: add support for memory mapped netlink" · c5b0db32
      Florian Westphal committed
      This reverts commit 3ab1f683 ("nfnetlink: add support for memory mapped
      netlink").
      
      Like previous commits in the series, remove wrappers that are not needed
      after mmapped netlink removal.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfnetlink: remove nfnetlink_alloc_skb · 905f0a73
      Florian Westphal committed
      Following mmapped netlink removal this code can be simplified by
      removing the alloc wrapper.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "genl: Add genlmsg_new_unicast() for unicast message allocation" · 263ea090
      Florian Westphal committed
      This reverts commit bb9b18fb ("genl: Add genlmsg_new_unicast() for
      unicast message allocation").
      
      Nothing wrong with it; it's no longer needed since this was only for
      mmapped netlink support.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Revert: "Enable memory mapped Netlink i/o" · 551ddc05
      Florian Westphal committed
      This reverts commit 795449d8 ("openvswitch: Enable memory mapped Netlink i/o").
      Following the mmapped netlink removal this code can be removed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: remove mmapped netlink support · d1b4c689
      Florian Westphal committed
      mmapped netlink has a number of unresolved issues:
      
      - TX zerocopy support had to be disabled more than a year ago via
        commit 4682a035 ("netlink: Always copy on mmap TX.")
        because the content of the mmapped area can change after netlink
        attribute validation but before message processing.
      
      - RX support was implemented mainly to speed up nfqueue dumping packet
        payload to userspace.  However, since commit ae08ce00
        ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
        with the socket-based interface too (via the skb_zerocopy helper).
      
      The other problem is that skbs attached to an mmapped netlink socket
      behave differently from normal skbs:
      
      - they don't have a shinfo area, so all functions that use skb_shinfo()
      (e.g. skb_clone) cannot be used.
      
      - reserving headroom prevents userspace from seeing the content, as
      it expects the message to start at skb->head.
      See for instance
      commit aa3a0220 ("netlink: not trim skb for mmaped socket when dump").
      
      - skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
      crash because it needs the sk to check if a tx ring is attached.
      
      This is also not obvious and leads to non-intuitive bug fixes such as 7c7bdf35
      ("netfilter: nfnetlink: use original skbuff when acking batches").
      
      mmapped netlink also didn't play nicely with the skb_zerocopy helper
      used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
      commit 6bb0fef4 ("netlink, mmap: fix edge-case leakages in nf queue
      zero-copy"), but at the cost of also needing to provide the remaining
      length to the allocation function.
      
      nfqueue also has problems when used with mmapped rx netlink:
      - mmapped netlink doesn't allow use of nfqueue batch verdict messages.
        The problem is that in the mmap case, the allocation time also determines
        the order in which the frame will be seen by userspace (A
        allocating before B means that A is located in an earlier ring slot),
        but this also means that B might get a lower sequence number than A
        since the seqno is decided later.  To fix this we would need to extend the
        spinlocked region to also cover the allocation and message setup, which
        isn't desirable.
      - nfqueue can now be configured to queue large (GSO) skbs to userspace.
        Queueing GSO packets is faster than having to force a software segmentation
        in the kernel, so this is a desirable option.  However, with an mmap-based
        ring one has to use 64 KB per ring slot element, else mmap has to fall back
        to the socket path (NL_MMAP_STATUS_COPY) for all large packets.
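The ordering problem in the first item can be reproduced with a toy model. The sketch below is not kernel code: `toy_ring` and its helpers are hypothetical, and it only assumes what the text states, namely that the ring slot is fixed at allocation time while the sequence number is assigned in a later, separate step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SLOTS 4

/* Hypothetical stand-in for an rx ring; slot_seq[i] is the seqno of the
 * message stored in slot i (0 means no seqno assigned yet). */
struct toy_ring {
    uint32_t slot_seq[TOY_RING_SLOTS];
    size_t   next_slot;  /* advanced at allocation time */
    uint32_t next_seq;   /* advanced later, at seqno assignment time */
};

/* Step 1: reserve a frame. The position in the ring is fixed here. */
static size_t toy_alloc(struct toy_ring *r)
{
    assert(r->next_slot < TOY_RING_SLOTS);
    return r->next_slot++;
}

/* Step 2: assign a sequence number to an already reserved frame. */
static uint32_t toy_assign_seq(struct toy_ring *r, size_t slot)
{
    r->slot_seq[slot] = ++r->next_seq;
    return r->slot_seq[slot];
}

/* Returns 1 if a reader walking the ring in slot order would see
 * sequence numbers going backwards. */
static int toy_ring_out_of_order(const struct toy_ring *r)
{
    for (size_t i = 1; i < r->next_slot; i++)
        if (r->slot_seq[i] < r->slot_seq[i - 1])
            return 1;
    return 0;
}
```

If A allocates first but B reaches `toy_assign_seq()` first, slot 0 holds seqno 2 and slot 1 holds seqno 1, which is the inversion described above. Performing both steps under one lock would prevent it, at the cost of the longer spinlocked region the message calls undesirable.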
      
      To use the mmap interface, userspace not only has to probe for mmap netlink
      support, it also has to implement a recv/socket receive path in order to
      handle messages that exceed the size of an rx ring element.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: switchdev: Offload VLAN flags to hardware bridge · 7fbac984
      Ido Schimmel committed
      When VLANs are created / destroyed on a VLAN filtering bridge (MASTER
      flag set), the configuration is passed down to the hardware. However,
      when only the flags (e.g. PVID) are toggled, the configuration is done
      in the software bridge alone.
      
      While it is possible to pass these flags to hardware when invoked with
      the SELF flag set, this creates inconsistency with regard to the way
      the VLANs are initially configured.
      
      Pass the flags down to the hardware even when the VLAN already exists
      and only the flags are toggled.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Add SGMII support for Marvell 88E1510/1512/1514/1518 · 930b37ee
      Stefan Roese committed
      Add code to select SGMII-to-copper mode upon SGMII interface selection.
      Signed-off-by: Stefan Roese <sr@denx.de>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: divamnt: use y2038-safe ktime_get_ts64() for trace data timestamps · 096f6262
      Alison Schofield committed
      divamnt stores a start_time at module init and uses it to calculate
      elapsed time. The elapsed time, stored in secs and usecs, is part of
      the trace data the driver maintains for the DIVA Server ISDN cards.
      No change to the format of that time data is required.
      
      To avoid overflow on 32-bit systems use ktime_get_ts64() to return
      the elapsed monotonic time since system boot.
      
      This is a change from real to monotonic time. Since the driver only
      stores elapsed time, monotonic time is sufficient and more robust
      against real time clock changes. These new monotonic values can be
      more useful for debugging because they can be easily compared to
      other monotonic timestamps.
      
      Note elapsed time values will now start at system boot time rather
      than module load time, so they will differ slightly from previously
      reported values.
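As a rough userspace analog of this change (not the driver code itself), the sketch below reads the monotonic clock and splits it into the secs/usecs pair described above. It assumes POSIX `clock_gettime()` with `CLOCK_MONOTONIC` as a stand-in for the kernel's ktime_get_ts64(); the `toy_*` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Elapsed time in the secs/usecs split used for the trace data. */
struct toy_elapsed {
    int64_t secs;   /* 64-bit seconds field, wide enough past 2038 */
    int32_t usecs;  /* always in [0, 999999] */
};

/* Pure conversion step: seconds + nanoseconds -> seconds + microseconds. */
static struct toy_elapsed toy_split(int64_t sec, long nsec)
{
    struct toy_elapsed e;

    assert(nsec >= 0 && nsec < 1000000000L);
    e.secs = sec;
    e.usecs = (int32_t)(nsec / 1000);
    return e;
}

/* Read the monotonic clock; it is unaffected by wall-clock changes. */
static struct toy_elapsed toy_elapsed_now(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        /* Fall back to zero if the clock is unavailable. */
        ts.tv_sec = 0;
        ts.tv_nsec = 0;
    }
    return toy_split((int64_t)ts.tv_sec, ts.tv_nsec);
}
```

Two successive calls to `toy_elapsed_now()` never go backwards, even if the wall clock is reset in between, which is the robustness property the message refers to.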
      
      Remove declaration and init of previously unused time constants:
      start_sec, start_usec.
      Signed-off-by: Alison Schofield <amsfield22@gmail.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 Feb, 2016 30 commits