1. 20 1月, 2015 7 次提交
  2. 18 1月, 2015 1 次提交
    • D
      net: sctp: fix race for one-to-many sockets in sendmsg's auto associate · 2061dcd6
      Daniel Borkmann 提交于
      I.e. one-to-many sockets in SCTP are not required to explicitly
      call into connect(2) or sctp_connectx(2) prior to data exchange.
      Instead, they can directly invoke sendmsg(2) and the SCTP stack
      will automatically trigger connection establishment through 4WHS
      via sctp_primitive_ASSOCIATE(). However, this in its current
      implementation is racy: INIT is being sent out immediately (as
      it cannot be bundled anyway) and the rest of the DATA chunks are
      queued up for later xmit when connection is established, meaning
      sendmsg(2) will return successfully. This behaviour can result
      in an undesired side-effect that the kernel made the application
      think the data has already been transmitted, although none of it
      has actually left the machine, worst case even after close(2)'ing
      the socket.
      
      Instead, when the association from client side has been shut down
      e.g. first gracefully through SCTP_EOF and then close(2), the
      client could afterwards still receive the server's INIT_ACK due
      to a connection with higher latency. This INIT_ACK is then considered
      out of the blue and hence responded with ABORT as there was no
      alive assoc found anymore. This can be easily reproduced f.e.
      with sctp_test application from lksctp. One way to fix this race
      is to wait for the handshake to actually complete.
      
      The fix defers waiting after sctp_primitive_ASSOCIATE() and
      sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
      from sctp_sendmsg() have already been placed into the output
      queue through the side-effect interpreter, and therefore can then
      be bundeled together with COOKIE_ECHO control chunks.
      
      strace from example application (shortened):
      
      socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
                 msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
      close(3) = 0
      
      tcpdump before patch (fooling the application):
      
      22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
      22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
      22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]
      
      tcpdump after patch:
      
      14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
      14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
      14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
      14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
      14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
      14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
      14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
      14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
      14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
      14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
      14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]
      
      Looks like this bug is from the pre-git history museum. ;)
      
      Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2061dcd6
  3. 17 1月, 2015 3 次提交
    • J
      genetlink: synchronize socket closing and family removal · ee1c2442
      Johannes Berg 提交于
      In addition to the problem Jeff Layton reported, I looked at the code
      and reproduced the same warning by subscribing and removing the genl
      family with a socket still open. This is a fairly tricky race which
      originates in the fact that generic netlink allows the family to go
      away while sockets are still open - unlike regular netlink which has
      a module refcount for every open socket so in general this cannot be
      triggered.
      
      Trying to resolve this issue by the obvious locking isn't possible as
      it will result in deadlocks between unregistration and group unbind
      notification (which incidentally lockdep doesn't find due to the home
      grown locking in the netlink table.)
      
      To really resolve this, introduce a "closing socket" reference counter
      (for generic netlink only, as it's the only affected family) in the
      core netlink code and use that in generic netlink to wait for all the
      sockets that are being closed at the same time as a generic netlink
      family is removed.
      
      This fixes the race that when a socket is closed, it will should call
      the unbind, but if the family is removed at the same time the unbind
      will not find it, leading to the warning. The real problem though is
      that in this case the unbind could actually find a new family that is
      registered to have a multicast group with the same ID, and call its
      mcast_unbind() leading to confusing.
      
      Also remove the warning since it would still trigger, but is now no
      longer a problem.
      
      This also moves the code in af_netlink.c to before unreferencing the
      module to avoid having the same problem in the normal non-genl case.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1c2442
    • J
      genetlink: disallow subscribing to unknown mcast groups · 5ad63005
      Johannes Berg 提交于
      Jeff Layton reported that he could trigger the multicast unbind warning
      in generic netlink using trinity. I originally thought it was a race
      condition between unregistering the generic netlink family and closing
      the socket, but there's a far simpler explanation: genetlink currently
      allows subscribing to groups that don't (yet) exist, and the warning is
      triggered when unsubscribing again while the group still doesn't exist.
      
      Originally, I had a warning in the subscribe case and accepted it out of
      userspace API concerns, but the warning was of course wrong and removed
      later.
      
      However, I now think that allowing userspace to subscribe to groups that
      don't exist is wrong and could possibly become a security problem:
      Consider a (new) genetlink family implementing a permission check in
      the mcast_bind() function similar to the like the audit code does today;
      it would be possible to bypass the permission check by guessing the ID
      and subscribing to the group it exists. This is only possible in case a
      family like that would be dynamically loaded, but it doesn't seem like a
      huge stretch, for example wireless may be loaded when you plug in a USB
      device.
      
      To avoid this reject such subscription attempts.
      
      If this ends up causing userspace issues we may need to add a workaround
      in af_netlink to deny such requests but not return an error.
      Reported-by: NJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ad63005
    • J
      genetlink: document parallel_ops · f555f3d7
      Johannes Berg 提交于
      The kernel-doc for the parallel_ops family struct member is
      missing, add it.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f555f3d7
  4. 16 1月, 2015 16 次提交
  5. 15 1月, 2015 13 次提交
    • A
      can: kvaser_usb: Don't dereference skb after a netif_rx() · a58518cc
      Ahmed S. Darwish 提交于
      We should not touch the packet after a netif_rx: it might
      get freed behind our back.
      Suggested-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      a58518cc
    • A
      can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels · 5e7e6e0c
      Ahmed S. Darwish 提交于
      Recent Leaf firmware versions (>= 3.1.557) do not allow to send
      commands for non-existing channels.  If a command is sent for a
      non-existing channel, the firmware crashes.
      Reported-by: NChristopher Storah <Christopher.Storah@invetech.com.au>
      Signed-off-by: NOlivier Sobrie <olivier@sobrie.be>
      Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      5e7e6e0c
    • A
      can: kvaser_usb: Reset all URB tx contexts upon channel close · 889b77f7
      Ahmed S. Darwish 提交于
      Flooding the Kvaser CAN to USB dongle with multiple reads and
      writes in very high frequency (*), closing the CAN channel while
      all the transmissions are on (#), opening the device again (@),
      then sending a small number of packets would make the driver
      enter an almost infinite loop of:
      
      [....]
      [15959.853988] kvaser_usb 4-3:1.0 can0: cannot find free context
      [15959.853990] kvaser_usb 4-3:1.0 can0: cannot find free context
      [15959.853991] kvaser_usb 4-3:1.0 can0: cannot find free context
      [15959.853993] kvaser_usb 4-3:1.0 can0: cannot find free context
      [15959.853994] kvaser_usb 4-3:1.0 can0: cannot find free context
      [15959.853995] kvaser_usb 4-3:1.0 can0: cannot find free context
      [....]
      
      _dragging the whole system down_ in the process due to the
      excessive logging output.
      
      Initially, this has caused random panics in the kernel due to a
      buggy error recovery path.  That got fixed in an earlier commit.(%)
      This patch aims at solving the root cause. -->
      
      16 tx URBs and contexts are allocated per CAN channel per USB
      device. Such URBs are protected by:
      
      a) A simple atomic counter, up to a value of MAX_TX_URBS (16)
      b) A flag in each URB context, stating if it's free
      c) The fact that ndo_start_xmit calls are themselves protected
         by the networking layers higher above
      
      After grabbing one of the tx URBs, if the driver noticed that all
      of them are now taken, it stops the netif transmission queue.
      Such queue is worken up again only if an acknowedgment was received
      from the firmware on one of our earlier-sent frames.
      
      Meanwhile, upon channel close (#), the driver sends a CMD_STOP_CHIP
      to the firmware, effectively closing all further communication.  In
      the high traffic case, the atomic counter remains at MAX_TX_URBS,
      and all the URB contexts remain marked as active.  While opening
      the channel again (@), it cannot send any further frames since no
      more free tx URB contexts are available.
      
      Reset all tx URB contexts upon CAN channel close.
      
      (*) 50 parallel instances of `cangen0 -g 0 -ix`
      (#) `ifconfig can0 down`
      (@) `ifconfig can0 up`
      (%) "can: kvaser_usb: Don't free packets when tight on URBs"
      Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      889b77f7
    • A
      can: kvaser_usb: Don't free packets when tight on URBs · b442723f
      Ahmed S. Darwish 提交于
      Flooding the Kvaser CAN to USB dongle with multiple reads and
      writes in high frequency caused seemingly-random panics in the
      kernel.
      
      On further inspection, it seems the driver erroneously freed the
      to-be-transmitted packet upon getting tight on URBs and returning
      NETDEV_TX_BUSY, leading to invalid memory writes and double frees
      at a later point in time.
      
      Note:
      
      Finding no more URBs/transmit-contexts and returning NETDEV_TX_BUSY
      is a driver bug in and out of itself: it means that our start/stop
      queue flow control is broken.
      
      This patch only fixes the (buggy) error handling code; the root
      cause shall be fixed in a later commit.
      Acked-by: NOlivier Sobrie <olivier@sobrie.be>
      Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      b442723f
    • R
      can: c_can: use regmap_update_bits() to modify RAMINIT register · 47e3485a
      Roger Quadros 提交于
      use of regmap_read() and regmap_write() in c_can_hw_raminit_syscon()
      is not safe as the RAMINIT register can be shared between different drivers
      at least for TI SoCs.
      
      To make the modification atomic we switch to using regmap_update_bits().
      
      regmap_update_bits() skips writing to the register if it's read content is the
      same as what is going to be written. This causes an issue for us when we
      need to clear the DONE bit with the initial condition START:0, DONE:1 as
      DONE bit must be written with 1 to clear it.
      
      So we defer the clearing of DONE bit to later when we set the START bit.
      There we are sure that START bit is changed from 0 to 1 so the write of
      1 to already set DONE bit will happen.
      Signed-off-by: NRoger Quadros <rogerq@ti.com>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      47e3485a
    • O
      can: m_can: tag current CAN FD controllers as non-ISO · 6cfda7fb
      Oliver Hartkopp 提交于
      During the CAN FD standardization process within the ISO it turned out that
      the failure detection capability has to be improved.
      
      The CAN in Automation organization (CiA) defined the already implemented CAN
      FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
      'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937
      
      Finally there will be three types of CAN FD controllers in the future:
      
      1. ISO compliant (fixed)
      2. non-ISO compliant (fixed, like the M_CAN IP v3.0.1 in m_can.c)
      3. ISO/non-ISO CAN FD controllers (switchable, like the PEAK USB FD)
      
      So the current M_CAN driver for the M_CAN IP v3.0.1 has to expose its non-ISO
      implementation by setting the CAN_CTRLMODE_FD_NON_ISO ctrlmode at startup.
      As this bit cannot be switched at configuration time CAN_CTRLMODE_FD_NON_ISO
      must not be set in ctrlmode_supported of the current M_CAN driver.
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      6cfda7fb
    • O
      can: dev: fix crtlmode_supported check · 9b1087aa
      Oliver Hartkopp 提交于
      When changing flags in the CAN drivers ctrlmode the provided new content has to
      be checked whether the bits are allowed to be changed. The bits that are to be
      changed are given as a bitfield in cm->mask. Therefore checking against
      cm->flags is wrong as the content can hold any kind of values.
      
      The iproute2 tool sets the bits in cm->mask and cm->flags depending on the
      detected command line options. To be robust against bogus user space
      applications additionally sanitize the provided flags with the provided mask.
      
      Cc: Wolfgang Grandegger <wg@grandegger.com>
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      9b1087aa
    • M
      MAINTAINERS: update linux-can git repositories · 870482a4
      Marc Kleine-Budde 提交于
      The linux-can upstream git repositories are now hosted on kernel.org, update
      MAINTAINERS accordingly.
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      870482a4
    • S
      be2net: Allow GRE to work concurrently while a VxLAN tunnel is configured · 16dde0d6
      Sriharsha Basavapatna 提交于
      Other tunnels like GRE break while VxLAN offloads are enabled in Skyhawk-R. To
      avoid this, we should restrict offload features on a per-packet basis in such
      conditions.
      Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16dde0d6
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a6391a92
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Don't use uninitialized data in IPVS, from Dan Carpenter.
      
       2) conntrack race fixes from Pablo Neira Ayuso.
      
       3) Fix TX hangs with i40e, from Jesse Brandeburg.
      
       4) Fix budget return from poll calls in dnet and alx, from Eric
          Dumazet.
      
       5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
          Jaeger.
      
       6) Fix bug introduced by conversion to list_head in TIPC retransmit
          code, from Jon Paul Maloy.
      
       7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
          Khoroshilov.
      
       8) Fix bridge build with INET disabled, from Arnd Bergmann.
      
       9) Fix netlink array overrun for PROBE attributes in openvswitch, from
          Thomas Graf.
      
      10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
          Prashant Sreedharan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        tg3: Release tp->lock before invoking synchronize_irq()
        tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
        tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
        team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
        openvswitch: packet messages need their own probe attribtue
        i40e: adds FCoE configure option
        cxgb4vf: Fix queue allocation for 40G adapter
        netdevice: Add missing parentheses in macro
        bridge: only provide proxy ARP when CONFIG_INET is enabled
        neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
        net: fec: fix MDIO bus assignement for dual fec SoC's
        xen-netfront: use different locks for Rx and Tx stats
        drivers: net: cpsw: fix multicast flush in dual emac mode
        cxgb4vf: Initialize mdio_addr before using it
        net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
        usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
        MAINTAINERS: add me as ibmveth maintainer
        tipc: fix bug in broadcast retransmit code
        update ip-sysctl.txt documentation (v2)
        net/at91_ether: prepare and unprepare clock
        ...
      a6391a92
    • D
      Merge branch 'tg3-net' · c637dbce
      David S. Miller 提交于
      Prashant Sreedharan says:
      
      ====================
      tg3: synchronize_irq() should be called without taking locks
      
      v2: Added Reported-by, Tested-by fields and reference to the thread that
          reported the problem
      
      This series addresses the problem reported by Peter Hurley in mail thread
      https://lkml.org/lkml/2015/1/12/1082
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c637dbce
    • P
      tg3: Release tp->lock before invoking synchronize_irq() · 932f19de
      Prashant Sreedharan 提交于
      synchronize_irq() can sleep waiting, for pending IRQ handlers so driver
      should release the tp->lock spin lock before invoking synchronize_irq()
      Reported-by: NPeter Hurley <peter@hurleysoftware.com>
      Tested-by: NPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: NPrashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      932f19de
    • P
      tg3: tg3_reset_task() needs to use rtnl_lock to synchronize · db84bf43
      Prashant Sreedharan 提交于
      Currently tg3_reset_task() uses only tp->lock for synchronizing with code
      paths like tg3_open() etc. But since tp->lock is released before doing
      synchronize_irq(), rtnl_lock should be taken in tg3_reset_task() to
      synchronize it with other code paths.
      Reported-by: NPeter Hurley <peter@hurleysoftware.com>
      Tested-by: NPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: NPrashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db84bf43