1. 18 12月, 2014 6 次提交
  2. 30 11月, 2014 1 次提交
  3. 27 11月, 2014 4 次提交
  4. 26 11月, 2014 3 次提交
  5. 25 11月, 2014 3 次提交
  6. 24 11月, 2014 2 次提交
    • L
      ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic · 20ea60ca
      lucien 提交于
      Now the vti_link_ops do not point the .dellink, for fb tunnel device
      (ip_vti0), the net_device will be removed as the default .dellink is
      unregister_netdevice_queue,but the tunnel still in the tunnel list,
      then if we add a new vti tunnel, in ip_tunnel_find():
      
              hlist_for_each_entry_rcu(t, head, hash_node) {
                      if (local == t->parms.iph.saddr &&
                          remote == t->parms.iph.daddr &&
                          link == t->parms.link &&
      ==>                 type == t->dev->type &&
                          ip_tunnel_key_match(&t->parms, flags, key))
                              break;
              }
      
      the panic will happen, cause dev of ip_tunnel *t is null:
      [ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
      [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
      [ 3835.073008] Oops: 0000 [#1] SMP
      .....
      [ 3835.073008] Stack:
      [ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
      [ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
      [ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
      [ 3835.073008] Call Trace:
      [ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
      [ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
      [ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
      [ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
      [ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
      [ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
      [ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
      [ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
      [ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
      [ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
      [ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
      ....
      
      modprobe ip_vti
      ip link del ip_vti0 type vti
      ip link add ip_vti0 type vti
      rmmod ip_vti
      
      do that one or more times, kernel will panic.
      
      fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
      which we skip the unregister of fb tunnel device. do the same on ip6_vti.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20ea60ca
    • A
      ipv6: Do not treat a GSO_TCPV4 request from UDP tunnel over IPv6 as invalid · b6fef4c6
      Alexander Duyck 提交于
      This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
      the IPv6 GSO offloads.  Without this change VXLAN tunnels running over IPv6
      do not currently handle IPv4 TCP TSO requests correctly and end up handing
      the non-segmented frame off to the device.
      
      Below is the before and after for a simple netperf TCP_STREAM test between
      two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
      a 1Gb/s network adapter.
      
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.29       0.88      Before
       87380  16384  16384    10.03     895.69      After
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6fef4c6
  7. 22 11月, 2014 2 次提交
    • C
      tcp: Restore RFC5961-compliant behavior for SYN packets · 0c228e83
      Calvin Owens 提交于
      Commit c3ae62af ("tcp: should drop incoming frames without ACK
      flag set") was created to mitigate a security vulnerability in which a
      local attacker is able to inject data into locally-opened sockets by
      using TCP protocol statistics in procfs to quickly find the correct
      sequence number.
      
      This broke the RFC5961 requirement to send a challenge ACK in response
      to spurious RST packets, which was subsequently fixed by commit
      7b514a88 ("tcp: accept RST without ACK flag").
      
      Unfortunately, the RFC5961 requirement that spurious SYN packets be
      handled in a similar manner remains broken.
      
      RFC5961 section 4 states that:
      
         ... the handling of the SYN in the synchronized state SHOULD be
         performed as follows:
      
         1) If the SYN bit is set, irrespective of the sequence number, TCP
            MUST send an ACK (also referred to as challenge ACK) to the remote
            peer:
      
            <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
      
            After sending the acknowledgment, TCP MUST drop the unacceptable
            segment and stop processing further.
      
         By sending an ACK, the remote peer is challenged to confirm the loss
         of the previous connection and the request to start a new connection.
         A legitimate peer, after restart, would not have a TCB in the
         synchronized state.  Thus, when the ACK arrives, the peer should send
         a RST segment back with the sequence number derived from the ACK
         field that caused the RST.
      
         This RST will confirm that the remote peer has indeed closed the
         previous connection.  Upon receipt of a valid RST, the local TCP
         endpoint MUST terminate its connection.  The local TCP endpoint
         should then rely on SYN retransmission from the remote end to
         re-establish the connection.
      
      This patch lets SYN packets through the discard added in c3ae62af,
      so that spurious SYN packets are properly dealt with as per the RFC.
      
      The challenge ACK is sent unconditionally and is rate-limited, so the
      original vulnerability is not reintroduced by this patch.
      Signed-off-by: NCalvin Owens <calvinowens@fb.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c228e83
    • E
      net: Revert "net: avoid one atomic operation in skb_clone()" · e7820e39
      Eric Dumazet 提交于
      Not sure what I was thinking, but doing anything after
      releasing a refcount is suicidal or/and embarrassing.
      
      By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
      could have released last reference and freed whole skb.
      
      We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.
      Reported-by: NChris Mason <clm@fb.com>
      Fixes: ce1a4ea3 ("net: avoid one atomic operation in skb_clone()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7820e39
  8. 21 11月, 2014 2 次提交
    • J
      ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg · 01462405
      Jiri Bohac 提交于
      This fixes an old regression introduced by commit
      b0d0d915 (ipx: remove the BKL).
      
      When a recvmsg syscall blocks waiting for new data, no data can be sent on the
      same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
      
      This breaks mars-nwe (NetWare emulator):
      - the ncpserv process reads the request using recvmsg
      - ncpserv forks and spawns nwconn
      - ncpserv calls a (blocking) recvmsg and waits for new requests
      - nwconn deadlocks in sendmsg on the same socket
      
      Commit b0d0d915 has simply replaced BKL locking with
      lock_sock/release_sock. Unlike now, BKL got unlocked while
      sleeping, so a blocking recvmsg did not block a concurrent
      sendmsg.
      
      Only keep the socket locked while actually working with the socket data and
      release it prior to calling skb_recv_datagram().
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01462405
    • J
      openvswitch: Don't validate IPv6 label masks. · d3052bb5
      Joe Stringer 提交于
      When userspace doesn't provide a mask, OVS datapath generates a fully
      unwildcarded mask for the flow by copying the flow and setting all bits
      in all fields. For IPv6 label, this creates a mask that matches on the
      upper 12 bits, causing the following error:
      
      openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)
      
      This patch ignores the label validation check for masks, avoiding this
      error.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3052bb5
  9. 20 11月, 2014 2 次提交
  10. 19 11月, 2014 1 次提交
  11. 17 11月, 2014 5 次提交
    • L
      bridge: fix netfilter/NF_BR_LOCAL_OUT for own, locally generated queries · f0b4eece
      Linus Lüssing 提交于
      Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
      for both locally generated IGMP and MLD queries. The IP header specific
      filter options are off by 14 Bytes for netfilter (actual output on
      interfaces is fine).
      
      NF_HOOK() expects the skb->data to point to the IP header, not the
      ethernet one (while dev_queue_xmit() does not). Luckily there is an
      br_dev_queue_push_xmit() helper function already - let's just use that.
      
      Introduced by eb1d1641
      ("bridge: Add core IGMP snooping support")
      
      Ebtables example:
      
      $ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
      	--log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP
      
      before (broken):
      
      ~EBT:  IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
      	MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
      	SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
      	DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
      	IPv6 priority=0x3, Next Header=2
      
      after (working):
      
      ~EBT:  IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
      	MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
      	SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
      	DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
      	IPv6 priority=0x0, Next Header=0
      Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f0b4eece
    • P
      netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind · 97840cb6
      Pablo Neira Ayuso 提交于
      Make sure the netlink group exists, otherwise you can trigger an out
      of bound array memory access from the netlink_bind() path. This splat
      can only be triggered only by superuser.
      
      [  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
      [  180.204249] index 9 is out of range for type 'int [9]'
      [  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
      [  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
      +04/01/2014
      [  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
      [  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
      [  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
      [  180.208639] Call Trace:
      [  180.208857] dump_stack (lib/dump_stack.c:52)
      [  180.209370] ubsan_epilogue (lib/ubsan.c:174)
      [  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
      [  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
      [  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
      [  180.211495] SYSC_bind (net/socket.c:1541)
      
      Moreover, define the missing nf_tables and nf_acct multicast groups too.
      Reported-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      97840cb6
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · feb91a02
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feb91a02
    • A
      dcbnl : Disable software interrupts before taking dcb_lock · 52cff74e
      Anish Bhatt 提交于
      Solves possible lockup issues that can be seen from firmware DCB agents calling
      into the DCB app api.
      
      DCB firmware event queues can be tied in with NAPI so that dcb events are
      generated in softIRQ context. This can results in calls to dcb_*app()
      functions which try to take the dcb_lock.
      
      If the the event triggers while we also have the dcb_lock because lldpad or
      some other agent happened to be issuing a  get/set command we could see a cpu
      lockup.
      
      This code was not originally written with firmware agents in mind, hence
      grabbing dcb_lock from softIRQ context was not considered.
      Signed-off-by: NAnish Bhatt <anish@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52cff74e
    • P
      ipv4: Fix incorrect error code when adding an unreachable route · 49dd18ba
      Panu Matilainen 提交于
      Trying to add an unreachable route incorrectly returns -ESRCH if
      if custom FIB rules are present:
      
      [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
      RTNETLINK answers: Network is unreachable
      [root@localhost ~]# ip rule add to 55.66.77.88 table 200
      [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
      RTNETLINK answers: No such process
      [root@localhost ~]#
      
      Commit 83886b6b ("[NET]: Change "not found"
      return value for rule lookup") changed fib_rules_lookup()
      to use -ESRCH as a "not found" code internally, but for user space it
      should be translated into -ENETUNREACH. Handle the translation centrally in
      ipv4-specific fib_lookup(), leaving the DECnet case alone.
      
      On a related note, commit b7a71b51
      ("ipv4: removed redundant conditional") removed a similar translation from
      ip_route_input_slow() prematurely AIUI.
      
      Fixes: b7a71b51 ("ipv4: removed redundant conditional")
      Signed-off-by: NPanu Matilainen <pmatilai@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49dd18ba
  12. 15 11月, 2014 6 次提交
  13. 14 11月, 2014 3 次提交