1. 30 9月, 2015 1 次提交
  2. 25 9月, 2015 1 次提交
    • P
      skbuff: Fix skb checksum flag on skb pull · 6ae459bd
      Pravin B Shelar 提交于
      VXLAN device can receive skb with checksum partial. But the checksum
      offset could be in outer header which is pulled on receive. This results
      in negative checksum offset for the skb. Such skb can cause the assert
      failure in skb_checksum_help(). Following patch fixes the bug by setting
      checksum-none while pulling outer header.
      
      Following is the kernel panic msg from old kernel hitting the bug.
      
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:1906!
      RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
      Call Trace:
      <IRQ>
      [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
      [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
      [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
      [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
      [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
      [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
      [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
      [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
      [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
      [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
      [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
      [<ffffffff81550128>] ip_local_deliver+0x88/0x90
      [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
      [<ffffffff81550365>] ip_rcv+0x235/0x300
      [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
      [<ffffffff8151c360>] netif_receive_skb+0x80/0x90
      [<ffffffff81459935>] virtnet_poll+0x555/0x6f0
      [<ffffffff8151cd04>] net_rx_action+0x134/0x290
      [<ffffffff810683d8>] __do_softirq+0xa8/0x210
      [<ffffffff8162fe6c>] call_softirq+0x1c/0x30
      [<ffffffff810161a5>] do_softirq+0x65/0xa0
      [<ffffffff810687be>] irq_exit+0x8e/0xb0
      [<ffffffff81630733>] do_IRQ+0x63/0xe0
      [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
      Reported-by: NAnupam Chanda <achanda@vmware.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ae459bd
  3. 15 9月, 2015 1 次提交
  4. 02 9月, 2015 6 次提交
  5. 22 8月, 2015 1 次提交
    • M
      mm: make page pfmemalloc check more robust · 2f064f34
      Michal Hocko 提交于
      Commit c48a11c7 ("netvm: propagate page->pfmemalloc to skb") added
      checks for page->pfmemalloc to __skb_fill_page_desc():
      
              if (page->pfmemalloc && !page->mapping)
                      skb->pfmemalloc = true;
      
      It assumes page->mapping == NULL implies that page->pfmemalloc can be
      trusted.  However, __delete_from_page_cache() can set set page->mapping
      to NULL and leave page->index value alone.  Due to being in union, a
      non-zero page->index will be interpreted as true page->pfmemalloc.
      
      So the assumption is invalid if the networking code can see such a page.
      And it seems it can.  We have encountered this with a NFS over loopback
      setup when such a page is attached to a new skbuf.  There is no copying
      going on in this case so the page confuses __skb_fill_page_desc which
      interprets the index as pfmemalloc flag and the network stack drops
      packets that have been allocated using the reserves unless they are to
      be queued on sockets handling the swapping which is the case here and
      that leads to hangs when the nfs client waits for a response from the
      server which has been dropped and thus never arrive.
      
      The struct page is already heavily packed so rather than finding another
      hole to put it in, let's do a trick instead.  We can reuse the index
      again but define it to an impossible value (-1UL).  This is the page
      index so it should never see the value that large.  Replace all direct
      users of page->pfmemalloc by page_is_pfmemalloc which will hide this
      nastiness from unspoiled eyes.
      
      The information will get lost if somebody wants to use page->index
      obviously but that was the case before and the original code expected
      that the information should be persisted somewhere else if that is
      really needed (e.g.  what SLAB and SLUB do).
      
      [akpm@linux-foundation.org: fix blooper in slub]
      Fixes: c48a11c7 ("netvm: propagate page->pfmemalloc to skb")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Debugged-by: NVlastimil Babka <vbabka@suse.com>
      Debugged-by: NJiri Bohac <jbohac@suse.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f064f34
  6. 11 8月, 2015 1 次提交
  7. 01 8月, 2015 1 次提交
  8. 30 7月, 2015 1 次提交
  9. 22 7月, 2015 1 次提交
    • T
      vxlan: Flow based tunneling · ee122c79
      Thomas Graf 提交于
      Allows putting a VXLAN device into a new flow-based mode in which
      skbs with a ip_tunnel_info dst metadata attached will be encapsulated
      according to the instructions stored in there with the VXLAN device
      defaults taken into consideration.
      
      Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
      set, the packet processing will populate a ip_tunnel_info struct for
      each packet received and attach it to the skb using the new metadata
      dst.  The metadata structure will contain the outer header and tunnel
      header fields which have been stripped off. Layers further up in the
      stack such as routing, tc or netfitler can later match on these fields
      and perform forwarding. It is the responsibility of upper layers to
      ensure that the flag is set if the metadata is needed. The flag limits
      the additional cost of metadata collecting based on demand.
      
      This prepares the VXLAN device to be steered by the routing and other
      subsystems which allows to support encapsulation for a large number
      of tunnel endpoints and tunnel ids through a single net_device which
      improves the scalability.
      
      It also allows for OVS to leverage this mode which in turn allows for
      the removal of the OVS specific VXLAN code.
      
      Because the skb is currently scrubed in vxlan_rcv(), the attachment of
      the new dst metadata is postponed until after scrubing which requires
      the temporary addition of a new member to vxlan_metadata. This member
      is removed again in a later commit after the indirect VXLAN receive API
      has been removed.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee122c79
  10. 21 7月, 2015 2 次提交
  11. 13 6月, 2015 1 次提交
  12. 12 6月, 2015 2 次提交
    • B
      netfilter: bridge: refactor frag_max_size · 411ffb4f
      Bernhard Thaler 提交于
      Currently frag_max_size is member of br_input_skb_cb and copied back and
      forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or
      used.
      
      Attach frag_max_size to nf_bridge_info and set value in pre_routing and
      forward functions. Use its value in forward and xmit functions.
      Signed-off-by: NBernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      411ffb4f
    • B
      netfilter: bridge: detect NAT66 correctly and change MAC address · 72b31f72
      Bernhard Thaler 提交于
      IPv4 iptables allows to REDIRECT/DNAT/SNAT any traffic over a bridge.
      
      e.g. REDIRECT
      $ sysctl -w net.bridge.bridge-nf-call-iptables=1
      $ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
        -j REDIRECT --to-ports 81
      
      This does not work with ip6tables on a bridge in NAT66 scenario
      because the REDIRECT/DNAT/SNAT is not correctly detected.
      
      The bridge pre-routing (finish) netfilter hook has to check for a possible
      redirect and then fix the destination mac address. This allows to use the
      ip6tables rules for local REDIRECT/DNAT/SNAT REDIRECT similar to the IPv4
      iptables version.
      
      e.g. REDIRECT
      $ sysctl -w net.bridge.bridge-nf-call-ip6tables=1
      $ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
        -j REDIRECT --to-ports 81
      
      This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested
      on a bridge with two interfaces using SNAT/DNAT NAT66 rules.
      Reported-by: NArtie Hamilton <artiemhamilton@yahoo.com>
      Signed-off-by: NSven Eckelmann <sven@open-mesh.com>
      [bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()]
      [bernhard.thaler@wvnet.at: rebased, split into separate patches]
      Signed-off-by: NBernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      72b31f72
  13. 05 6月, 2015 1 次提交
  14. 25 5月, 2015 2 次提交
  15. 20 5月, 2015 1 次提交
  16. 18 5月, 2015 1 次提交
    • E
      net: fix two sparse errors · c91d4606
      Eric Dumazet 提交于
      First one in __skb_checksum_validate_complete() fixes the following
      (and other callers)
      
      make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/tcp_ipv4.o
        CHECK   net/ipv4/tcp_ipv4.c
      include/linux/skbuff.h:3052:24: warning: incorrect type in return expression (different base types)
      include/linux/skbuff.h:3052:24:    expected restricted __sum16
      include/linux/skbuff.h:3052:24:    got int
      
      Second is fixing gso_make_checksum() :
      
        CHECK   net/ipv4/gre_offload.c
      include/linux/skbuff.h:3360:14: warning: incorrect type in assignment (different base types)
      include/linux/skbuff.h:3360:14:    expected unsigned short [unsigned] [usertype] csum
      include/linux/skbuff.h:3360:14:    got restricted __sum16
      include/linux/skbuff.h:3365:16: warning: incorrect type in return expression (different base types)
      include/linux/skbuff.h:3365:16:    expected restricted __sum16
      include/linux/skbuff.h:3365:16:    got unsigned short [unsigned] [usertype] csum
      
      Fixes: 5a212329 ("net: Support for csum_bad in skbuff")
      Fixes: 7e2b10c1 ("net: Support for multiple checksums with gso")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      CC: Tom Herbert <tom@herbertland.com>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91d4606
  17. 14 5月, 2015 6 次提交
  18. 12 5月, 2015 3 次提交
  19. 05 5月, 2015 1 次提交
    • L
      net: Export IGMP/MLD message validation code · 9afd85c9
      Linus Lüssing 提交于
      With this patch, the IGMP and MLD message validation functions are moved
      from the bridge code to IPv4/IPv6 multicast files. Some small
      refactoring was done to enhance readibility and to iron out some
      differences in behaviour between the IGMP and MLD parsing code (e.g. the
      skb-cloning of MLD messages is now only done if necessary, just like the
      IGMP part always did).
      
      Finally, these IGMP and MLD message validation functions are exported so
      that not only the bridge can use it but batman-adv later, too.
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9afd85c9
  20. 04 5月, 2015 1 次提交
  21. 26 4月, 2015 1 次提交
    • E
      net: fix crash in build_skb() · 2ea2f62c
      Eric Dumazet 提交于
      When I added pfmemalloc support in build_skb(), I forgot netlink
      was using build_skb() with a vmalloc() area.
      
      In this patch I introduce __build_skb() for netlink use,
      and build_skb() is a wrapper handling both skb->head_frag and
      skb->pfmemalloc
      
      This means netlink no longer has to hack skb->head_frag
      
      [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
      [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [ 1567.700067] Dumping ftrace buffer:
      [ 1567.700067]    (ftrace buffer empty)
      [ 1567.700067] Modules linked in:
      [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
      [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
      [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
      [ 1567.700067] RSP: 0018:ffff8802467779d8  EFLAGS: 00010202
      [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
      [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
      [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
      [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
      [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
      [ 1567.700067] FS:  00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
      [ 1567.700067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
      [ 1567.700067] Stack:
      [ 1567.700067]  ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
      [ 1567.700067]  ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
      [ 1567.700067]  ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
      [ 1567.700067] Call Trace:
      [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
      [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
      [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
      [ 1567.774369] sock_write_iter (net/socket.c:823)
      [ 1567.774369] ? sock_sendmsg (net/socket.c:806)
      [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
      [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
      [ 1567.774369] ? default_llseek (fs/read_write.c:487)
      [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
      [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
      [ 1567.774369] vfs_write (fs/read_write.c:539)
      [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
      [ 1567.774369] ? SyS_read (fs/read_write.c:577)
      [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
      [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
      [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
      [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
      
      Fixes: 79930f58 ("net: do not deplete pfmemalloc reserve")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ea2f62c
  22. 21 4月, 2015 1 次提交
  23. 08 4月, 2015 3 次提交
    • F
      netfilter: bridge: make BRNF_PKT_TYPE flag a bool · a1e67951
      Florian Westphal 提交于
      nf_bridge_info->mask is used for several things, for example to
      remember if skb->pkt_type was set to OTHER_HOST.
      
      For a bridge, OTHER_HOST is expected case. For ip forward its a non-starter
      though -- routing expects PACKET_HOST.
      
      Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook
      invocation and then un-does it after hook traversal.
      
      This information is irrelevant outside of br_netfilter.
      
      After this change, ->mask now only contains flags that need to be
      known outside of br_netfilter in fast-path.
      
      Future patch changes mask into a 2bit state field in sk_buff, so that
      we can remove skb->nf_bridge pointer for good and consider all remaining
      places that access nf_bridge info content a not-so fastpath.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a1e67951
    • F
      netfilter: bridge: start splitting mask into public/private chunks · 3eaf4025
      Florian Westphal 提交于
      ->mask is a bit info field that mixes various use cases.
      
      In particular, we have flags that are mutually exlusive, and flags that
      are only used within br_netfilter while others need to be exposed to
      other parts of the kernel.
      
      Remove BRNF_8021Q/PPPoE flags.  They're mutually exclusive and only
      needed within br_netfilter context.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3eaf4025
    • F
      netfilter: bridge: don't use nf_bridge_info data to store mac header · e70deecb
      Florian Westphal 提交于
      br_netfilter maintains an extra state, nf_bridge_info, which is attached
      to skb via skb->nf_bridge pointer.
      
      Amongst other things we use skb->nf_bridge->data to store the original
      mac header for every processed skb.
      
      This is required for ip refragmentation when using conntrack
      on top of bridge, because ip_fragment doesn't copy it from original skb.
      
      However there is no need anymore to do this unconditionally.
      
      Move this to the one place where its needed -- when br_netfilter calls
      ip_fragment().
      
      Also switch to percpu storage for this so we can handle fragmenting
      without accessing nf_bridge meta data.
      
      Only user left is neigh resolution when DNAT is detected, to hold
      the original source mac address (neigh resolution builds new mac header
      using bridge mac), so rename ->data and reduce its size to whats needed.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      e70deecb