1. 07 Mar, 2014 16 commits
    • S
      bonding: correctly handle out of range parameters for lp_interval · 5bd4e4c1
      Sasha Levin committed
      We didn't correctly check cases where the value for lp_interval is not
      within the legal range due to a missing table terminator.
      
      This would let userspace trigger a kernel panic by specifying a value out
      of range:
      
      	echo -1 > /sys/devices/virtual/net/bond0/bonding/lp_interval
      
      Introduced by commit 4325b374 ("bonding: convert lp_interval to use
      the new option API").
      Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5bd4e4c1
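The role of a table terminator in the commit above can be illustrated in userspace. This is a minimal sketch, assuming a hypothetical `struct value_entry` table that a lookup walks until it reaches a NULL-named sentinel; the struct, the table contents, and the range are illustrative assumptions, not the bonding driver's definitions.

```c
#include <stddef.h>

/* Hypothetical option-value table entry; not the kernel's struct. */
struct value_entry {
	const char *name;	/* NULL marks the end of the table */
	long min;
	long max;
};

/* The final entry is the terminator. Without it, the loop in
 * value_in_range() would keep reading past the end of the array. */
static const struct value_entry lp_interval_table[] = {
	{ "lp_interval", 1, 2147483647L },	/* assumed legal range */
	{ NULL, 0, 0 },
};

/* Return 1 if val falls within some entry's range, 0 otherwise. */
static int value_in_range(const struct value_entry *tbl, long val)
{
	for (const struct value_entry *e = tbl; e->name != NULL; e++) {
		if (val >= e->min && val <= e->max)
			return 1;
	}
	return 0;
}
```

With the terminator in place, an out-of-range value such as -1 simply falls through every entry and is rejected.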
    • A
      ipv6: Fix exthdrs offload registration. · d2d273ff
      Anton Nayshtut committed
      Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
      offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).
      
      This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
      header.
      
      The issue was detected and the fix verified by running MS HCK Offload LSO test on
      top of QEMU Windows guests, as this test sends IPv6 packets with
      IPPROTO_DSTOPTS.
      Signed-off-by: Anton Nayshtut <anton@swortex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2d273ff
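The registration pattern the commit above relies on, where an init function must register two handlers and fail as a whole if either fails, might be sketched as follows. The `fake_register` and `fake_unregister` helpers and the `registered` array are hypothetical stand-ins for the kernel's offload registration, used only to keep the example self-contained.

```c
/* Hypothetical registry standing in for the kernel's offload table. */
static int registered[256];

#define PROTO_ROUTING 43	/* value of IPPROTO_ROUTING */
#define PROTO_DSTOPTS 60	/* value of IPPROTO_DSTOPTS */

/* Return 0 on success, -1 if the protocol is already registered. */
static int fake_register(int proto)
{
	if (registered[proto])
		return -1;
	registered[proto] = 1;
	return 0;
}

static void fake_unregister(int proto)
{
	registered[proto] = 0;
}

/* Register both handlers; undo the first if the second fails. */
static int exthdrs_offload_init(void)
{
	int ret = fake_register(PROTO_ROUTING);

	if (ret)
		return ret;

	ret = fake_register(PROTO_DSTOPTS);
	if (ret)
		fake_unregister(PROTO_ROUTING);

	return ret;
}
```

A return value of 0 then guarantees that both handlers are in place, which is the property the buggy version did not provide.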
    • A
      ibmveth: Fix endian issues with MAC addresses · d746ca95
      Anton Blanchard committed
      The code to load a MAC address into a u64 for passing to the
      hypervisor via a register is broken on little endian.
      
      Create a helper function called ibmveth_encode_mac_addr
      which does the right thing in both big and little endian.
      
      We were storing the MAC address in a long in struct ibmveth_adapter.
      It's never used so remove it - we don't need another place in the
      driver where we create endian issues with MAC addresses.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d746ca95
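An endian-independent way to pack a MAC address into a 64-bit value, as the commit above describes, can be sketched in plain C. The function name `encode_mac_addr` and the byte order chosen here (first byte most significant) are assumptions for illustration and are not taken from the driver source.

```c
#include <stdint.h>

#define MAC_LEN 6

/* Pack a 6-byte MAC address into the low 48 bits of a 64-bit value,
 * first byte most significant. Shifts operate on values rather than on
 * memory layout, so the result is the same on big- and little-endian
 * hosts. */
static uint64_t encode_mac_addr(const uint8_t mac[MAC_LEN])
{
	uint64_t encoded = 0;

	for (int i = 0; i < MAC_LEN; i++)
		encoded = (encoded << 8) | mac[i];

	return encoded;
}
```

By contrast, copying the six bytes into a `uint64_t` with `memcpy` would produce a value that depends on the host byte order.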
    • A
      net: unix socket code abuses csum_partial · 0a13404d
      Anton Blanchard committed
      The unix socket code is using the result of csum_partial to
      hash into a lookup table:
      
      	unix_hash_fold(csum_partial(sunaddr, len, 0));
      
      csum_partial is only guaranteed to produce something that can be
      folded into a checksum, as its prototype explains:
      
       * returns a 32-bit number suitable for feeding into itself
       * or csum_tcpudp_magic
      
      The 32bit value should not be used directly.
      
      Depending on the alignment, the ppc64 csum_partial will return
      different 32bit partial checksums that will fold into the same
      16bit checksum.
      
      This difference causes the following testcase (courtesy of
      Gustavo) to sometimes fail:
      
      #include <sys/socket.h>
      #include <stdio.h>
      
      int main()
      {
      	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	int i = 1;
      	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);
      
      	struct sockaddr addr;
      	addr.sa_family = AF_LOCAL;
      	bind(fd, &addr, 2);
      
      	listen(fd, 128);
      
      	struct sockaddr_storage ss;
      	socklen_t sslen = (socklen_t)sizeof(ss);
      	getsockname(fd, (struct sockaddr*)&ss, &sslen);
      
      	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
      		perror(NULL);
      		return 1;
      	}
      	printf("OK\n");
      	return 0;
      }
      
      As suggested by davem, fix this by using csum_fold to fold the
      partial 32bit checksum into a 16bit checksum before using it.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a13404d
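Why two different 32-bit partial checksums can be equally valid is easy to show with a simplified fold. The helper `fold_to_16` below is a hypothetical userspace sketch of one's-complement folding; the kernel's `csum_fold` additionally complements the result, which this sketch omits.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement partial sum into 16 bits by adding the
 * upper half into the lower half, twice to absorb a possible carry. */
static uint16_t fold_to_16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

For instance, `0x00010002` and `0x00020001` differ as 32-bit values but fold to the same 16-bit result, so hashing the unfolded value can place the same address in different buckets.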
    • A
      net: Improve SO_TIMESTAMPING documentation and fix a minor code bug · adca4767
      Andrew Lutomirski committed
      The original documentation was very unclear.
      
      The code fix is presumably related to the formerly unclear
      documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
      __sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
      from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
      set is pointless.  This should have no user-observable effect.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      adca4767
    • B
      phy: fix compiler array bounds warning on settings[] · 4ae6e50c
      Bjorn Helgaas committed
      With -Werror=array-bounds, gcc v4.7.x warns that in phy_find_valid(), the
      settings[] "array subscript is above array bounds", I think because idx is
      a signed integer and if the caller supplied idx < 0, we pass the guard but
      still reference out of bounds.
      
      Fix this by making idx unsigned here and elsewhere.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ae6e50c
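The signed-index concern raised in the commit above can be reproduced in isolation. In this sketch the `speeds` array and both helper functions are hypothetical; they only show how an upper-bound guard behaves for a signed and an unsigned index.

```c
#include <stddef.h>

static const int speeds[] = { 10000, 1000, 100, 10 };
#define NUM_SPEEDS (sizeof(speeds) / sizeof(speeds[0]))

/* Signed index: a negative idx satisfies the upper-bound guard, so the
 * guard alone does not keep a later array access in bounds. */
static int signed_guard_passes(int idx)
{
	return idx < (int)NUM_SPEEDS;
}

/* Unsigned index: -1 converts to UINT_MAX, which fails the same guard,
 * so a single comparison covers both ends of the range. */
static int speed_at(unsigned int idx)
{
	return idx < NUM_SPEEDS ? speeds[idx] : -1;
}
```

Making the index unsigned removes the need for a separate `idx >= 0` check, which is presumably why the compiler stops warning.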
    • F
      inet: frag: make sure forced eviction removes all frags · e588e2f2
      Florian Westphal committed
      Quoting Alexander Aring:
        While fragmentation and unloading of 6lowpan module I got this kernel Oops
        after few seconds:
      
        BUG: unable to handle kernel paging request at f88bbc30
        [..]
        Modules linked in: ipv6 [last unloaded: 6lowpan]
        Call Trace:
         [<c012af4c>] ? call_timer_fn+0x54/0xb3
         [<c012aef8>] ? process_timeout+0xa/0xa
         [<c012b66b>] run_timer_softirq+0x140/0x15f
      
      The problem is that incomplete frags are still around after unload; when
      their frag expire timer fires, we get a crash.
      
      When a netns is removed (also done when unloading module), inet_frag
      calls the evictor with 'force' argument to purge remaining frags.
      
      The evictor loop terminates when accounted memory ('work') drops to 0
      or the lru-list becomes empty.  However, the mem accounting is done
      via percpu counters and may not be accurate, i.e. loop may terminate
      prematurely.
      
      Alter evictor to only stop once the lru list is empty when force is
      requested.
      Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Reported-by: Alexander Aring <alex.aring@gmail.com>
      Tested-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e588e2f2
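The changed termination condition of the evictor loop can be sketched with a simple singly linked list. The `struct frag`, `struct frag_list`, and `evict` names are hypothetical; the point is only that a forced eviction ignores the possibly inaccurate `work` estimate and stops when the list is empty.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a fragment queue and its LRU list. */
struct frag {
	struct frag *next;
	int size;
};

struct frag_list {
	struct frag *head;
};

/* Evict queues from the head of the list and return how many were
 * removed. 'work' is an estimate of memory still to reclaim and may be
 * too low. When 'force' is set, only an empty list ends the loop. */
static int evict(struct frag_list *lru, int work, int force)
{
	int evicted = 0;

	while (lru->head != NULL) {
		if (!force && work <= 0)
			break;

		struct frag *q = lru->head;

		lru->head = q->next;
		work -= q->size;
		evicted++;
	}

	return evicted;
}
```

A non-forced call stops as soon as the estimate is exhausted, while a forced call drains every remaining entry even if the estimate is already zero.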
    • D
      Merge branch 'tipc' · 409e1456
      David S. Miller committed
      Erik Hugne says:
      
      ====================
      tipc: refcount and memory leak fixes
      
      v3: Remove error logging from data path completely. Rebased on top of
          latest net merge.
      
      v2: Drop specific -ENOMEM logging in patch #1 (tipc: allow connection
          shutdown callback to be invoked in advance) And add a general error
          message if an internal server tries to send a message on a
          closed/nonexisting connection.
      
      In addition to the fix for refcount leak and memory leak during
      module removal, we also fix a problem where the topology server
      listening socket was unexpectedly closed. We also eliminate an
      unnecessary context switch during accept()/recvmsg() for nonblocking
      sockets.
      
      It might be good to include this patchset in stable as well. After the
      v3 rebase on latest merge from net all patches apply cleanly on that
      tree.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      409e1456
    • E
      tipc: don't log disabled tasklet handler errors · 2892505e
      Erik Hugne committed
      Failure to schedule a TIPC tasklet with tipc_k_signal because the
      tasklet handler is disabled is not an error. It means TIPC is
      currently in the process of shutting down. We remove the error
      logging in this case.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2892505e
    • E
      tipc: fix memory leak during module removal · 1bb8dce5
      Erik Hugne committed
      When the TIPC module is removed, the tasklet handler is disabled
      before all other subsystems. This will cause lingering publications
      in the name table because the node_down tasklets responsible to
      clean up publications from an unreachable node will never run.
      When the name table is shut down, these publications are detected
      and an error message is logged:
      tipc: nametbl_stop(): orphaned hash chain detected
      This is actually a memory leak, introduced with commit
      993b858e ("tipc: correct the order
      of stopping services at rmmod")
      
      Instead of just logging an error and leaking memory, we free
      the orphaned entries during nametable shutdown.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1bb8dce5
    • E
      tipc: drop subscriber connection id invalidation · edcc0511
      Erik Hugne committed
      When a topology server subscriber is disconnected, the associated
      connection id is set to zero. A check vs zero is then done in the
      subscription timeout function to see if the subscriber has been
      shut down. This is unnecessary, because all subscription timers
      will be cancelled when a subscriber terminates. Setting the
      connection id to zero is actually harmful because id zero is the
      identity of the topology server listening socket, and can cause a
      race that leads to this socket being closed instead.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      edcc0511
    • Y
      tipc: avoid to unnecessary process switch under non-block mode · fe8e4649
      Ying Xue committed
      When messages are received via tipc socket under non-block mode,
      schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is,
      the process of receiving messages will be scheduled once although
      timeout value passed to schedule_timeout() is 0. The same issue
      exists in accept()/wait_for_accept(). To avoid this unnecessary
      process switch, we only call schedule_timeout() if the timeout
      value is non-zero.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe8e4649
    • Y
      tipc: fix connection refcount leak · 4652edb7
      Ying Xue committed
      When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a
      connection instance, its reference count value is increased if
      it's found. But subsequently if it's found that the connection is
      closed, the work of sending message is not queued into its server
      send workqueue, and the connection reference count is not decreased.
      This will cause a reference count leak. To reproduce this problem,
      an application would need to open and close topology server
      connections with high intensity.
      
      We fix this by immediately decrementing the connection reference
      count if a send fails due to the connection being closed.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4652edb7
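The reference-counting rule behind the commit above, that every successful lookup must be balanced by a put on each exit path, might look like this in a simplified model. The `struct conn` type and the `conn_lookup`, `conn_put`, and `conn_sendmsg` helpers are hypothetical and only mirror the shape of the problem; in this model a successful send leaves the reference held on behalf of the queued work.

```c
#include <stdbool.h>

/* Hypothetical connection object with a plain integer refcount. */
struct conn {
	int refcount;
	bool connected;
	int queued;	/* number of messages handed to the send queue */
};

/* Look up a connection and take a reference on it. */
static struct conn *conn_lookup(struct conn *c)
{
	c->refcount++;
	return c;
}

/* Drop a reference taken by conn_lookup(). */
static void conn_put(struct conn *c)
{
	c->refcount--;
}

/* Return 0 if the message was queued, -1 if the connection is closed. */
static int conn_sendmsg(struct conn *c)
{
	struct conn *con = conn_lookup(c);

	if (!con->connected) {
		/* Failure path: release the lookup reference right away,
		 * otherwise it would never be dropped. */
		conn_put(con);
		return -1;
	}

	/* Success path: the reference is kept for the queued work. */
	con->queued++;
	return 0;
}
```

Sending on a closed connection therefore leaves the reference count exactly where it started.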
    • Y
      tipc: allow connection shutdown callback to be invoked in advance · 6d4ebeb4
      Ying Xue committed
      Currently connection shutdown callback function is called when
      connection instance is released in tipc_conn_kref_release(), and
      receiving packets and sending packets are running in different
      threads. Even if connection is closed by the thread of receiving
      packets, its shutdown callback may not be called immediately as
      the connection reference count is non-zero at that moment. So,
      although the connection is shut down by the thread of receiving
      packets, the thread of sending packets doesn't know it. Before
      its shutdown callback is invoked to tell the sending thread its
      connection has been closed, the sending thread may deliver
      messages by tipc_conn_sendmsg(), this is why the following error
      information appears:
      
      "Sending subscription event failed, no memory"
      
      To eliminate it, allow connection shutdown callback function to
      be called before connection id is removed in tipc_close_conn(),
      which makes the sending thread know the truth in time that its
      socket is closed so that it doesn't send message to it. We also
      remove the "Sending XXX failed..." error reporting for topology
      and config services.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d4ebeb4
    • G
      l2tp: fix userspace reception on plain L2TP sockets · 9e9cb622
      Guillaume Nault committed
      As pppol2tp_recv() never queues up packets to plain L2TP sockets,
      pppol2tp_recvmsg() never returns data to userspace, thus making
      the recv*() system calls unusable.
      
      Instead of dropping packets when the L2TP socket isn't bound to a PPP
      channel, this patch adds them to its reception queue.
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e9cb622
    • G
      l2tp: fix manual sequencing (de)activation in L2TPv2 · bb5016ea
      Guillaume Nault committed
      Commit e0d4435f "l2tp: Update PPP-over-L2TP driver to work over L2TPv3"
      broke the PPPOL2TP_SO_SENDSEQ setsockopt. The L2TP header length was
      previously computed by pppol2tp_l2t_header_len() before each call to
      l2tp_xmit_skb(). Now that header length is retrieved from the hdr_len
      session field, this field must be updated every time the L2TP header
      format is modified, or l2tp_xmit_skb() won't push the right amount of
      data for the L2TP header.
      
      This patch uses l2tp_session_set_header_len() to adjust hdr_len every
      time sequencing is (de)activated from userspace (either by the
      PPPOL2TP_SO_SENDSEQ setsockopt or the L2TP_ATTR_SEND_SEQ netlink
      attribute).
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb5016ea
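Keeping a cached header length in sync with the sequencing flag, as the commit above requires, can be sketched as follows. The `struct session` type and both helpers are hypothetical, and the sizes assume an L2TPv2 data header of 6 bytes (flags/version, tunnel ID, session ID) that grows by 4 bytes when the Ns/Nr sequence fields are present.

```c
#include <stdbool.h>

/* Hypothetical session state; not the kernel's struct l2tp_session. */
struct session {
	bool send_seq;
	unsigned int hdr_len;	/* cached header length used on transmit */
};

/* Recompute the cached header length from the current flags. */
static void session_set_header_len(struct session *s)
{
	s->hdr_len = 6;		/* assumed base L2TPv2 data header */
	if (s->send_seq)
		s->hdr_len += 4;	/* Ns and Nr fields, 2 bytes each */
}

/* Every change to the sequencing flag refreshes the cached length, so
 * the transmit path never pushes a stale header size. */
static void session_set_send_seq(struct session *s, bool on)
{
	s->send_seq = on;
	session_set_header_len(s);
}
```

Routing both the setsockopt and the netlink path through one setter of this kind is one way to make sure neither forgets the recomputation.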
  2. 06 Mar, 2014 8 commits
    • H
      hyperv: Move state setting for link query · 1b07da51
      Haiyang Zhang committed
      It moves the state setting for query into rndis_filter_receive_response().
      All callbacks including query-complete and status-callback are synchronized
      by channel->inbound_lock. This prevents a potential race between them.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b07da51
    • S
      net: macb: DMA-unmap full rx-buffer · 48330e08
      Soren Brinkmann committed
      When allocating RX buffers a fixed size is used, while freeing is based
      on actually received bytes, resulting in the following kernel warning
      when CONFIG_DMA_API_DEBUG is enabled:
       WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1051 check_unmap+0x258/0x894()
       macb e000b000.ethernet: DMA-API: device driver frees DMA memory with different size [device address=0x000000002d170040] [map size=1536 bytes] [unmap size=60 bytes]
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc3-xilinx-00220-g49f84081ce4f #65
       [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14)
       [<c0011df8>] (show_stack) from [<c03c775c>] (dump_stack+0x7c/0xc8)
       [<c03c775c>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84)
       [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c)
       [<c0024670>] (warn_slowpath_fmt) from [<c0227d44>] (check_unmap+0x258/0x894)
       [<c0227d44>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70)
       [<c0228588>] (debug_dma_unmap_page) from [<c02ab78c>] (gem_rx+0x118/0x170)
       [<c02ab78c>] (gem_rx) from [<c02ac4d4>] (macb_poll+0x24/0x94)
       [<c02ac4d4>] (macb_poll) from [<c031222c>] (net_rx_action+0x6c/0x188)
       [<c031222c>] (net_rx_action) from [<c0028a28>] (__do_softirq+0x108/0x280)
       [<c0028a28>] (__do_softirq) from [<c0028e8c>] (irq_exit+0x84/0xf8)
       [<c0028e8c>] (irq_exit) from [<c000f360>] (handle_IRQ+0x68/0x8c)
       [<c000f360>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60)
       [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78)
       Exception stack(0xc056df20 to 0xc056df68)
       df20: 00000001 c0577430 00000000 c0577430 04ce8e0d 00000002 edfce238 00000000
       df40: 04e20f78 00000002 c05981f4 00000000 00000008 c056df68 c0064008 c02d7658
       df60: 20000013 ffffffff
       [<c0012904>] (__irq_svc) from [<c02d7658>] (cpuidle_enter_state+0x54/0xf8)
       [<c02d7658>] (cpuidle_enter_state) from [<c02d77dc>] (cpuidle_idle_call+0xe0/0x138)
       [<c02d77dc>] (cpuidle_idle_call) from [<c000f660>] (arch_cpu_idle+0x8/0x3c)
       [<c000f660>] (arch_cpu_idle) from [<c006bec4>] (cpu_startup_entry+0xbc/0x124)
       [<c006bec4>] (cpu_startup_entry) from [<c053daec>] (start_kernel+0x350/0x3b0)
       ---[ end trace d5fdc38641bd3a11 ]---
       Mapped at:
        [<c0227184>] debug_dma_map_page+0x48/0x11c
        [<c02ab32c>] gem_rx_refill+0x154/0x1f8
        [<c02ac7b4>] macb_open+0x270/0x3e0
        [<c03152e0>] __dev_open+0x7c/0xfc
        [<c031554c>] __dev_change_flags+0x8c/0x140
      
      Fix this by passing the unmap function the same size that was used when
      mapping the memory.
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48330e08
    • S
      net: macb: Check DMA mappings for error · 92030908
      Soren Brinkmann committed
      With CONFIG_DMA_API_DEBUG enabled the following warning is printed:
       WARNING: CPU: 0 PID: 619 at lib/dma-debug.c:1101 check_unmap+0x758/0x894()
       macb e000b000.ethernet: DMA-API: device driver failed to check map error[device address=0x000000002d171c02] [size=322 bytes] [mapped as single]
       Modules linked in:
       CPU: 0 PID: 619 Comm: udhcpc Not tainted 3.14.0-rc3-xilinx-00219-gd158fc7f #63
       [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14)
       [<c0011df8>] (show_stack) from [<c03c7714>] (dump_stack+0x7c/0xc8)
       [<c03c7714>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84)
       [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c)
       [<c0024670>] (warn_slowpath_fmt) from [<c0228244>] (check_unmap+0x758/0x894)
       [<c0228244>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70)
       [<c0228588>] (debug_dma_unmap_page) from [<c02aba64>] (macb_interrupt+0x1f8/0x2dc)
       [<c02aba64>] (macb_interrupt) from [<c006c6e4>] (handle_irq_event_percpu+0x2c/0x178)
       [<c006c6e4>] (handle_irq_event_percpu) from [<c006c86c>] (handle_irq_event+0x3c/0x5c)
       [<c006c86c>] (handle_irq_event) from [<c006f548>] (handle_fasteoi_irq+0xb8/0x100)
       [<c006f548>] (handle_fasteoi_irq) from [<c006c148>] (generic_handle_irq+0x20/0x30)
       [<c006c148>] (generic_handle_irq) from [<c000f35c>] (handle_IRQ+0x64/0x8c)
       [<c000f35c>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60)
       [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78)
       Exception stack(0xed197f60 to 0xed197fa8)
       7f60: 00000134 60000013 bd94362e bd94362e be96b37c 00000014 fffffd72 00000122
       7f80: c000ebe4 ed196000 00000000 00000011 c032c0d8 ed197fa8 c0064008 c000ea20
       7fa0: 60000013 ffffffff
       [<c0012904>] (__irq_svc) from [<c000ea20>] (ret_fast_syscall+0x0/0x48)
       ---[ end trace 478f921d0d542d1e ]---
       Mapped at:
        [<c0227184>] debug_dma_map_page+0x48/0x11c
        [<c02aaca0>] macb_start_xmit+0x184/0x2a8
        [<c03143c0>] dev_hard_start_xmit+0x334/0x470
        [<c032c09c>] sch_direct_xmit+0x78/0x2f8
        [<c0314814>] __dev_queue_xmit+0x318/0x708
      
      due to missing checks of the dma mapping. Add the appropriate checks to fix
      this.
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92030908
    • D
      net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk · c485658b
      Daniel Borkmann committed
      While working on ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to
      verify if we/peer is AUTH capable"), we noticed that there's a skb
      memory leakage in the error path.
      
      Running the same reproducer as in ec0223ec and by unconditionally
      jumping to the error label (to simulate an error condition) in
      sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
      the unfreed chunk->auth_chunk skb clone:
      
      Unreferenced object 0xffff8800b8f3a000 (size 256):
        comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00  ..u^..X.........
        backtrace:
          [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210
          [<ffffffff81566929>] skb_clone+0x49/0xb0
          [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
          [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp]
          [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp]
          [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210
          [<ffffffff815a64af>] nf_reinject+0xbf/0x180
          [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
          [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
          [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0
          [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
          [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250
          [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0
          [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0
          [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380
      
      What happens is that commit bbd0d598 clones the skb containing
      the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
      that an endpoint requires COOKIE-ECHO chunks to be authenticated:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        ------------------ AUTH; COOKIE-ECHO ---------------->
        <-------------------- COOKIE-ACK ---------------------
      
      When we enter sctp_sf_do_5_1D_ce() and before we actually get to
      the point where we process (and subsequently free) a non-NULL
      chunk->auth_chunk, we could hit the "goto nomem_init" path from
      an error condition and thus leave the cloned skb around w/o
      freeing it.
      
      The fix is to centrally free such clones in sctp_chunk_destroy()
      handler that is invoked from sctp_chunk_free() after all refs have
      dropped; and also move both kfree_skb(chunk->auth_chunk) there,
      so that chunk->auth_chunk is either NULL (since sctp_chunkify()
      allocs new chunks through kmem_cache_zalloc()) or non-NULL with
      a valid skb pointer. chunk->skb and chunk->auth_chunk are the
      only skbs in the sctp_chunk structure that need to be handled.
      
      While at it, we should use consume_skb() for both. It is the same
      as dev_kfree_skb() but more appropriately named as we are not
      a device but a protocol. Also, this effectively replaces the
      kfree_skb() from both invocations into consume_skb(). Functions
      are the same only that kfree_skb() assumes that the frame was
      being dropped after a failure (e.g. for tools like drop monitor),
      usage of consume_skb() seems more appropriate in function
      sctp_chunk_destroy() though.
      
      Fixes: bbd0d598 ("[SCTP]: Implement the receive and verification of AUTH chunk")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <yasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c485658b
    • H
      r8152: disable the ECM mode · 10c32717
      hayeswang committed
      There are known issues with switching the drivers between ECM mode and
      vendor mode. The interrupt transfer may become abnormal. The hardware
      may die if you change the configuration without unloading the current
      driver first, because all the control transfers of the current driver
      would fail after the command that switches the configuration.
      
      Although using the ECM driver and the vendor driver independently is
      fine, changing from one driver to the other by switching the
      configuration may cause problems. Additionally, the vendor mode driver
      is now more powerful than the ECM driver. Thus, disable the ECM mode
      driver, and let r8152 set the configuration to vendor mode and reset
      the device automatically.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      10c32717
    • G
      net/mlx4: Support shutdown() interface · 367d56f7
      Gavin Shan committed
      In a kexec scenario, we failed to load the mlx4 driver in the
      second kernel because the ownership bit was held by the first
      kernel without being released correctly.
      
      The patch adds a shutdown() interface so that the ownership can
      be released correctly in the first kernel. It also helps avoid
      EEH errors during the boot stage of the second kernel caused by
      undesired traffic, which can't be handled by hardware during
      that stage on the Power platform.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Tested-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      367d56f7
    • L
      bridge: multicast: add sanity check for query source addresses · 6565b9ee
      Linus Lüssing committed
      MLD queries are supposed to have an IPv6 link-local source address
      according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch
      adds a sanity check to ignore MLD queries that violate this requirement.
      
      Without this check, such malformed MLD queries can result in a
      denial of service: The queries are ignored by any MLD listener
      therefore they will not respond with an MLD report. However,
      without this patch these malformed MLD queries would enable the
      snooping part in the bridge code, potentially shutting down the
      according ports towards these hosts for multicast traffic as the
      bridge did not learn about these listeners.
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Linus Lüssing <linus.luessing@web.de>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6565b9ee
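The source-address check described in the commit above amounts to testing whether an IPv6 address lies in the link-local prefix fe80::/10. A minimal userspace sketch follows; the function name `ipv6_is_link_local` and the raw 16-byte array representation are assumptions for illustration, not the bridge code's helpers.

```c
#include <stdint.h>

/* Return 1 if the 16-byte IPv6 address is in fe80::/10, else 0.
 * The prefix covers the first 10 bits: 1111 1110 10. */
static int ipv6_is_link_local(const uint8_t addr[16])
{
	return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80;
}
```

A query whose source address fails this test would be dropped before it can influence the snooping state.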
    • N
      net: fix for a race condition in the inet frag code · 24b9bf43
      Nikolay Aleksandrov committed
      I stumbled upon this very serious bug while hunting for another one,
      it's a very subtle race condition between inet_frag_evictor,
      inet_frag_intern and the IPv4/6 frag_queue and expire functions
      (basically the users of inet_frag_kill/inet_frag_put).
      
      What happens is that after a fragment has been added to the hash chain
      but before it's been added to the lru_list (inet_frag_lru_add) in
      inet_frag_intern, it may get deleted (either by an expired timer if
      the system load is high or the timer sufficiently low, or by the
      frag_queue function for different reasons) before it's added to the
      lru_list, then after it gets added it's a matter of time for the
      evictor to get to a piece of memory which has been freed leading to a
      number of different bugs depending on what's left there.
      
      I've been able to trigger this on both IPv4 and IPv6 (which is normal
      as the frag code is the same), but it's been much more difficult to
      trigger on IPv4 due to the protocol differences about how fragments
      are treated.
      
      The setup I used to reproduce this is: 2 machines with 4 x 10G bonded
      in a RR bond, so the same flow can be seen on multiple cards at the
      same time. Then I used multiple instances of ping/ping6 to generate
      fragmented packets and flood the machines with them while running
      other processes to load the attacked machine.
      
      *It is very important to have the _same flow_ coming in on multiple CPUs
      concurrently. Usually the attacked machine would die in less than 30
      minutes, if configured properly to have many evictor calls and timeouts
      it could happen in 10 minutes or so.
      
      An important point to make is that any caller (frag_queue or timer) of
      inet_frag_kill will remove both the timer refcount and the
      original/guarding refcount thus removing everything that's keeping the
      frag from being freed at the next inet_frag_put.  All of this could
      happen before the frag was ever added to the LRU list, then it gets
      added and the evictor uses a freed fragment.
      
      An example for IPv6 would be if a fragment is being added and is at
      the stage of being inserted in the hash after the hash lock is
      released, but before inet_frag_lru_add executes (or is able to obtain
      the lru lock) another overlapping fragment for the same flow arrives
      at a different CPU which finds it in the hash, but since it's
      overlapping it drops it invoking inet_frag_kill and thus removing all
      guarding refcounts, and afterwards freeing it by invoking
      inet_frag_put which removes the last refcount added previously by
      inet_frag_find, then inet_frag_lru_add gets executed by
      inet_frag_intern and we have a freed fragment in the lru_list.
      
      The fix is simple: move the lru_add under the hash chain locked
      region, so that when a removing function is called it has to wait
      for the fragment to be added to the lru_list, and then removes it.
      This works because the hash chain removal is done before the
      lru_list one, and there is no window between the two list adds in
      which the frag can get dropped. With this fix applied I couldn't
      kill the same machine in 24 hours with the same setup.
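
      The ordering problem above can be replayed in a toy, single-threaded
      C model. This is only a sketch under stated assumptions: toy_frag,
      toy_kill, toy_put and toy_race are hypothetical names, not kernel
      code, and the model tracks just the references the description
      names (guarding, timer, and the one taken by the lookup).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one fragment queue. Illustrative only, not the kernel's
 * struct inet_frag_queue. */
struct toy_frag {
    int  refcnt;   /* references keeping the queue alive */
    bool in_hash;  /* linked into the hash chain */
    bool in_lru;   /* linked into the LRU list */
    bool killed;   /* kill already ran once */
    bool freed;    /* memory has been released */
};

/* Models the kill step: unlink from hash and LRU, then drop the timer
 * reference and the guarding reference. */
static void toy_kill(struct toy_frag *q)
{
    if (q->killed)
        return;
    q->in_hash = false;
    q->in_lru = false;   /* has no effect if the queue is not on the LRU yet */
    q->refcnt -= 2;
    q->killed = true;
}

/* Models the put step: drop one reference, free on the last one. */
static void toy_put(struct toy_frag *q)
{
    if (--q->refcnt == 0)
        q->freed = true;
}

/* Replays the interleaving described above and returns true if the LRU
 * ends up holding a freed queue. */
static bool toy_race(bool lru_add_under_hash_lock)
{
    struct toy_frag q = { .refcnt = 2 };   /* guarding ref + timer ref */

    /* CPU A: insert into the hash chain. */
    q.in_hash = true;
    if (lru_add_under_hash_lock)
        q.in_lru = true;   /* fixed ordering: both adds before unlock */

    /* CPU B: runs after the hash lock is released, finds the queue,
     * sees an overlapping fragment and drops the queue. */
    q.refcnt++;            /* lookup reference */
    toy_kill(&q);
    toy_put(&q);

    /* CPU A, old ordering: the LRU add happens only now. */
    if (!lru_add_under_hash_lock)
        q.in_lru = true;

    return q.in_lru && q.freed;
}
```

      toy_race(false) replays the old ordering and reports a freed queue
      on the LRU; toy_race(true) replays the ordering with the LRU add
      under the hash chain lock and reports none.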
      
      Fixes: 3ef0eb0d ("net: frag, move LRU list maintenance outside of
      rwlock")
      
      CC: Florian Westphal <fw@strlen.de>
      CC: Jesper Dangaard Brouer <brouer@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      24b9bf43
  3. 05 Mar, 2014 3 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c3bebc71
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Fix memory leak in ieee80211_prep_connection(), sta_info leaked on
          error.  From Eytan Lifshitz.
      
       2) Unintentional switch case fallthrough in nft_reject_inet_eval(),
          from Patrick McHardy.
      
       3) Must check if payload length is a power of 2 in
          nft_payload_select_ops(), from Nikolay Aleksandrov.
      
       4) Fix mis-checksumming in xen-netfront driver, ip_hdr() is not in the
          correct place when we invoke skb_checksum_setup().  From Wei Liu.
      
       5) TUN driver should not advertise HW vlan offload features in
          vlan_features.  Fix from Fernando Luis Vazquez Cao.
      
       6) IPV6_VTI needs to select NET_IP_TUNNEL to avoid build errors, fix
          from Steffen Klassert.
      
       7) Add missing locking in xfrm_migrate_state_find(), we must hold the
          per-namespace xfrm_state_lock while traversing the lists.  Fix from
          Steffen Klassert.
      
       8) Missing locking in ath9k driver, access to tid->sched must be done
          under ath_txq_lock().  Fix from Stanislaw Gruszka.
      
       9) Fix two bugs in TCP fastopen.  First respect the size argument given
          to tcp_sendmsg() in the fastopen path, and secondly prevent
          tcp_send_syn_data() from potentially using order-5 allocations.
          From Eric Dumazet.
      
      10) Fix handling of default neigh garbage collection params, from Jiri
          Pirko.
      
      11) Fix cwnd bloat and over-inflation of RTT when transmit segmentation
          is in use.  From Eric Dumazet.
      
      12) Missing initialization of Realtek r8169 driver's statistics
          seqlocks.  Fix from Kyle McMartin.
      
      13) Fix RTNL assertion failures in 802.3ad and AB ARP monitor of bonding
          driver, from Ding Tianhong.
      
      14) Bonding slave release race can cause divide by zero, fix from
          Nikolay Aleksandrov.
      
      15) Overzealous return from neigh_periodic_work() causes reachability
          time to not be computed.  Fix from Duan Jiong.
      
      16) Fix regression in ipv6_find_hdr(), it should not return -ENOENT when
          a specific target is specified and found.  From Hans Schillstrom.
      
      17) Fix VLAN tag stripping regression in BNA driver, from Ivan Vecera.
      
      18) Tail loss probe can calculate bogus RTTs due to missing packet
          marking on retransmit.  Fix from Yuchung Cheng.
      
      19) We cannot do skb_dst_drop() in iptunnel_pull_header() because
          multicast loopback detection in later code paths needs access to
          skb_rtable().  Fix from Xin Long.
      
      20) The macvlan driver regresses in that it propagates lower device
          offload support disables into itself, causing severe slowdowns when
          running over a bridge.  Provide the software offloads always on
          macvlan devices to deal with this and the regression is gone.  From
          Vlad Yasevich.
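
      As an illustration of item 3, a power-of-2 test on a length can be
      written as below. This is a sketch only: toy_is_power_of_2 is a
      hypothetical name, and the exact condition used in
      nft_payload_select_ops() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* A value is a power of 2 when exactly one bit is set. Clearing the
 * lowest set bit with n & (n - 1) must then leave zero. Zero itself is
 * rejected explicitly. */
static bool toy_is_power_of_2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

      With this check, lengths 1, 2 and 4 pass while 0 and 3 are rejected.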
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
        macvlan: Add support for 'always_on' offload features
        net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable
        ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer
        net: cpsw: fix cpdma rx descriptor leak on down interface
        be2net: isolate TX workarounds not applicable to Skyhawk-R
        be2net: Fix skb double free in be_xmit_wrokarounds() failure path
        be2net: clear promiscuous bits in adapter->flags while disabling promiscuous mode
        be2net: Fix to reset transparent vlan tagging
        qlcnic: dcb: a couple off by one bugs
        tcp: fix bogus RTT on special retransmission
        hsr: off by one sanity check in hsr_register_frame_in()
        can: remove CAN FD compatibility for CAN 2.0 sockets
        can: flexcan: factor out soft reset into seperate funtion
        can: flexcan: flexcan_remove(): add missing netif_napi_del()
        can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze
        can: flexcan: factor out transceiver {en,dis}able into seperate functions
        can: flexcan: fix transition from and to low power mode in chip_{en,dis}able
        can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() fails
        can: flexcan: fix shutdown: first disable chip, then all interrupts
        USB AX88179/178A: Support D-Link DUB-1312
        ...
      c3bebc71
    • L
      Merge tag 'regulator-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 16e3f539
      Linus Torvalds committed
      Pull regulator fixes from Mark Brown:
       "A couple of fixes here which ensure that regulators using the core
        support for GPIO enables work in all cases by ensuring that helpers
        are used consistently rather than open coding in places and hence not
        having GPIO support in some of them"
      
      * tag 'regulator-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: core: Replace direct ops->disable usage
        regulator: core: Replace direct ops->enable usage
      16e3f539
    • L
      Merge branch 'akpm' (patches from Andrew Morton) · 3f803abf
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness
        mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS
        MAINTAINERS: add and correct types of some "T:" entries
        MAINTAINERS: use tab for separator
        rapidio/tsi721: fix tasklet termination in dma channel release
        hfsplus: fix remount issue
        zram: avoid null access when fail to alloc meta
        sh: prefix sh-specific "CCR" and "CCR2" by "SH_"
        ocfs2: fix quota file corruption
        drivers/rtc/rtc-s3c.c: fix incorrect way of save/restore of S3C2410_TICNT for TYPE_S3C64XX
        kallsyms: fix absolute addresses for kASLR
        scripts/gen_initramfs_list.sh: fix flags for initramfs LZ4 compression
        mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
        memcg: reparent charges of children before processing parent
        memcg: fix endless loop in __mem_cgroup_iter_next()
        lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock
        dma debug: account for cachelines and read-only mappings in overlap tracking
        mm: close PageTail race
        MAINTAINERS: EDAC: add Mauro and Borislav as interim patch collectors
      3f803abf
  4. 04 Mar, 2014 13 commits