1. 15 8月, 2013 1 次提交
    • N
      ipip: add x-netns support · 6c742e71
      Nicolas Dichtel 提交于
      This patch allows to switch the netns when packet is encapsulated or
      decapsulated. In other word, the encapsulated packet is received in a netns,
      where the lookup is done to find the tunnel. Once the tunnel is found, the
      packet is decapsulated and injecting into the corresponding interface which
      stands to another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c742e71
  2. 10 8月, 2013 1 次提交
  3. 02 7月, 2013 1 次提交
    • C
      gre: fix a regression in ioctl · 6c734fb8
      Cong Wang 提交于
      When testing GRE tunnel, I got:
      
       # ip tunnel show
       get tunnel gre0 failed: Invalid argument
       get tunnel gre1 failed: Invalid argument
      
      This is a regression introduced by commit c5441932
      ("GRE: Refactor GRE tunneling code.") because previously we
      only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL,
      after that commit, the check is moved for all commands.
      
      So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
      
      After this patch I got:
      
       # ip tunnel show
       gre0: gre/ip  remote any  local any  ttl inherit  nopmtudisc
       gre1: gre/ip  remote 192.168.122.101  local 192.168.122.45  ttl inherit
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c734fb8
  4. 20 6月, 2013 3 次提交
  5. 01 6月, 2013 1 次提交
  6. 20 5月, 2013 1 次提交
  7. 25 4月, 2013 1 次提交
  8. 09 4月, 2013 1 次提交
  9. 31 3月, 2013 1 次提交
    • A
      ip_gre: don't overwrite iflink during net_dev init · 537aadc3
      Antonio Quartulli 提交于
      iflink is currently set to 0 in __gre_tunnel_init(). This
      function is invoked in gre_tap_init() and
      ipgre_tunnel_init() which are both used to initialise the
      ndo_init field of the respective net_device_ops structs
      (ipgre.. and gre_tap..) used by GRE interfaces.
      
      However, in netdevice_register() iflink is first set to -1,
      then ndo_init is invoked and then iflink is assigned to a
      proper value if and only if it still was -1.
      
      Assigning 0 to iflink in ndo_init is therefore first
      preventing netdev_register() to correctly assign it a proper
      value and then breaking iflink at all since 0 has not
      correct meaning.
      
      Fix this by removing the iflink assignment in
      __gre_tunnel_init().
      
      Introduced by c5441932
      ("GRE: Refactor GRE tunneling code.")
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAntonio Quartulli <ordex@autistici.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      537aadc3
  10. 27 3月, 2013 1 次提交
    • P
      GRE: Refactor GRE tunneling code. · c5441932
      Pravin B Shelar 提交于
      Following patch refactors GRE code into ip tunneling code and GRE
      specific code. Common tunneling code is moved to ip_tunnel module.
      ip_tunnel module is written as generic library which can be used
      by different tunneling implementations.
      
      ip_tunnel module contains following components:
       - packet xmit and rcv generic code. xmit flow looks like
         (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
       - hash table of all devices.
       - lookup for tunnel devices.
       - control plane operations like device create, destroy, ioctl, netlink
         operations code.
       - registration for tunneling modules, like gre, ipip etc.
       - define single pcpu_tstats dev->tstats.
       - struct tnl_ptk_info added to pass parsed tunnel packet parameters.
      
      ipip.h header is renamed to ip_tunnel.h
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5441932
  11. 17 3月, 2013 1 次提交
  12. 10 3月, 2013 1 次提交
  13. 26 2月, 2013 3 次提交
    • P
      Revert "ip_gre: propogate target device GSO capability to the tunnel device" · 7992ae6d
      Pravin B Shelar 提交于
      This reverts commit eb6b9a8c.
      
      Above commit limits GSO capability of gre device to just TSO, but
      software GRE-GSO is capable of handling all GSO capabilities.
      
      This patch also fixes following panic which reverted commit introduced:-
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2
      IP: [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre]
      PGD 42bc19067 PUD 42bca9067 PMD 0
      Oops: 0000 [#1] SMP
      Pid: 2636, comm: ip Tainted: GF            3.8.0+ #83 Dell Inc. PowerEdge R620/0KCKR5
      RIP: 0010:[<ffffffffa0680fd1>]  [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre]
      RSP: 0018:ffff88042bfcb708  EFLAGS: 00010246
      RAX: 00000000000005b6 RBX: ffff88042d2fa000 RCX: 0000000000000044
      RDX: 0000000000000018 RSI: 0000000000000078 RDI: 0000000000000060
      RBP: ffff88042bfcb748 R08: 0000000000000018 R09: 000000000000000c
      R10: 0000000000000020 R11: 000000000101010a R12: ffff88042d2fa800
      R13: 0000000000000000 R14: ffff88042d2fa800 R15: ffff88042cd7f650
      FS:  00007fa784f55700(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000a2 CR3: 000000042d8b9000 CR4: 00000000000407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ip (pid: 2636, threadinfo ffff88042bfca000, task ffff88042d142a80)
      Stack:
       0000000100000000 002f000000000000 0a01010100000000 000000000b010101
       ffff88042d2fa800 ffff88042d2fa000 ffff88042bfcb858 ffff88042f418c00
       ffff88042bfcb798 ffffffffa068199a ffff88042bfcb798 ffff88042d2fa830
      Call Trace:
       [<ffffffffa068199a>] ipgre_newlink+0xca/0x160 [ip_gre]
       [<ffffffff8143b692>] rtnl_newlink+0x532/0x5f0
       [<ffffffff8143b2fc>] ? rtnl_newlink+0x19c/0x5f0
       [<ffffffff81438978>] rtnetlink_rcv_msg+0x2c8/0x340
       [<ffffffff814386b0>] ? rtnetlink_rcv+0x40/0x40
       [<ffffffff814560f9>] netlink_rcv_skb+0xa9/0xd0
       [<ffffffff81438695>] rtnetlink_rcv+0x25/0x40
       [<ffffffff81455ddc>] netlink_unicast+0x1ac/0x230
       [<ffffffff81456a45>] netlink_sendmsg+0x265/0x380
       [<ffffffff814138c0>] sock_sendmsg+0xb0/0xe0
       [<ffffffff8141141e>] ? move_addr_to_kernel+0x4e/0x90
       [<ffffffff81420445>] ? verify_iovec+0x85/0xf0
       [<ffffffff81414ffd>] __sys_sendmsg+0x3fd/0x420
       [<ffffffff8114b701>] ? handle_mm_fault+0x251/0x3b0
       [<ffffffff8114f39f>] ? vma_link+0xcf/0xe0
       [<ffffffff81415239>] sys_sendmsg+0x49/0x90
       [<ffffffff814ffd19>] system_call_fastpath+0x16/0x1b
      
      CC: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Acked-by: NDmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7992ae6d
    • P
      IP_GRE: Fix GRE_CSUM case. · 8f10098f
      Pravin B Shelar 提交于
      commit "ip_gre: allow CSUM capable devices to handle packets"
      aa0e51cd, broke GRE_CSUM case.
      GRE_CSUM needs checksum computed for inner packet. Therefore
      csum-calculation can not be offloaded if tunnel device requires
      GRE_CSUM.  Following patch fixes it by computing inner packet checksum
      for GRE_CSUM type, for all other type of GRE devices csum is offloaded.
      
      CC: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Acked-by: NDmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f10098f
    • P
      IP_GRE: Fix IP-Identification. · 490ab081
      Pravin B Shelar 提交于
      GRE-GSO generates ip fragments with id 0,2,3,4... for every
      GSO packet, which is not correct. Following patch fixes it
      by setting ip-header id unique id of fragments are allowed.
      As Eric Dumazet suggested it is optimized by using inner ip-header
      whenever inner packet is ipv4.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      490ab081
  14. 20 2月, 2013 1 次提交
  15. 19 2月, 2013 2 次提交
  16. 16 2月, 2013 1 次提交
    • P
      v4 GRE: Add TCP segmentation offload for GRE · 68c33163
      Pravin B Shelar 提交于
      Following patch adds GRE protocol offload handler so that
      skb_gso_segment() can segment GRE packets.
      SKB GSO CB is added to keep track of total header length so that
      skb_segment can push entire header. e.g. in case of GRE, skb_segment
      need to push inner and outer headers to every segment.
      New NETIF_F_GRE_GSO feature is added for devices which support HW
      GRE TSO offload. Currently none of devices support it therefore GRE GSO
      always fall backs to software GSO.
      
      [ Compute pkt_len before ip_local_out() invocation. -DaveM ]
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68c33163
  17. 30 1月, 2013 1 次提交
    • D
      ip_gre: When TOS is inherited, use configured TOS value for non-IP packets · 040468a0
      David Ward 提交于
      A GRE tunnel can be configured so that outgoing tunnel packets inherit
      the value of the TOS field from the inner IP header. In doing so, when
      a non-IP packet is transmitted through the tunnel, the TOS field will
      always be set to 0.
      
      Instead, the user should be able to configure a different TOS value as
      the fallback to use for non-IP packets. This is helpful when the non-IP
      packets are all control packets and should be handled by routers outside
      the tunnel as having Internet Control precedence. One example of this is
      the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are
      encapsulated directly by GRE and do not contain an inner IP header.
      
      Under the existing behavior, the IFLA_GRE_TOS parameter must be set to
      '1' for the TOS value to be inherited. Now, only the least significant
      bit of this parameter must be set to '1', and when a non-IP packet is
      sent through the tunnel, the upper 6 bits of this same parameter will be
      copied into the TOS field. (The ECN bits get masked off as before.)
      
      This behavior is backwards-compatible with existing configurations and
      iproute2 versions.
      Signed-off-by: NDavid Ward <david.ward@ll.mit.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      040468a0
  18. 28 1月, 2013 2 次提交
    • E
      net: fix possible wrong checksum generation · cef401de
      Eric Dumazet 提交于
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested to linearize skbs but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE is impacted by the extra allocation and
      copy, and because we use order-0 pages, while the TCP_STREAM uses
      bigger pages.
      Reported-by: NPravin Shelar <pshelar@nicira.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cef401de
    • P
      IP_GRE: Fix kernel panic in IP_GRE with GRE csum. · 5465740a
      Pravin B Shelar 提交于
      Due to IP_GRE GSO support, GRE can recieve non linear skb which
      results in panic in case of GRE_CSUM.  Following patch fixes it by
      using correct csum API.
      
      Bug introduced in commit 6b78f16e (gre: add GSO support)
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5465740a
  19. 27 12月, 2012 1 次提交
  20. 22 12月, 2012 2 次提交
  21. 19 11月, 2012 1 次提交
    • E
      net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
      Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
      arbitrary ip options.
      
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52e804c6
  22. 15 11月, 2012 2 次提交
  23. 02 10月, 2012 2 次提交
  24. 28 9月, 2012 3 次提交
    • S
      tunnel: drop packet if ECN present with not-ECT · eccc1bb8
      stephen hemminger 提交于
      Linux tunnels were written before RFC6040 and therefore never
      implemented the corner case of ECN getting set in the outer header
      and the inner header not being ready for it.
      
      Section 4.2.  Default Tunnel Egress Behaviour.
       o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
            propagate any other ECN codepoint onwards.  This is because the
            inner Not-ECT marking is set by transports that rely on dropped
            packets as an indication of congestion and would not understand or
            respond to any other ECN codepoint [RFC4774].  Specifically:
      
            *  If the inner ECN field is Not-ECT and the outer ECN field is
               CE, the decapsulator MUST drop the packet.
      
            *  If the inner ECN field is Not-ECT and the outer ECN field is
               Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
               outgoing packet with the ECN field cleared to Not-ECT.
      
      This patch moves the ECN decap logic out of the individual tunnels
      into a common place.
      
      It also adds logging to allow detecting broken systems that
      set ECN bits incorrectly when tunneling (or an intermediate
      router might be changing the header).
      
      Overloads rx_frame_error to keep track of ECN related error.
      
      Thanks to Chris Wright who caught this while reviewing the new VXLAN
      tunnel.
      
      This code was tested by injecting faulty logic in other end GRE
      to send incorrectly encapsulated packets.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eccc1bb8
    • S
      gre: remove unnecessary rcu_read_lock/unlock · 0c5794a6
      stephen hemminger 提交于
      The gre function pointers for receive and error handling are
      always called (from gre.c) with rcu_read_lock already held.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c5794a6
    • S
      gre: fix handling of key 0 · d2083287
      stephen hemminger 提交于
      GRE driver incorrectly uses zero as a flag value. Zero is a perfectly
      valid value for key, and the tunnel should match packets with no key only
      with tunnels created without key, and vice versa.
      
      This is a slightly visible  change since previously it might be possible to
      construct a working tunnel that sent key 0 and received only because
      of the key wildcard of zero.  I.e the sender sent key of zero, but tunnel
      was defined without key.
      
      Note: using gre key 0 requires iproute2 utilities v3.2 or later.
      The original utility code was broken as well.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2083287
  25. 20 9月, 2012 1 次提交
  26. 21 7月, 2012 1 次提交
  27. 17 7月, 2012 1 次提交
    • D
      net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller 提交于
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6700c270
  28. 12 7月, 2012 1 次提交
  29. 15 6月, 2012 1 次提交
    • D
      ipv4: Handle PMTU in all ICMP error handlers. · 36393395
      David S. Miller 提交于
      With ip_rt_frag_needed() removed, we have to explicitly update PMTU
      information in every ICMP error handler.
      
      Create two helper functions to facilitate this.
      
      1) ipv4_sk_update_pmtu()
      
         This updates the PMTU when we have a socket context to
         work with.
      
      2) ipv4_update_pmtu()
      
         Raw version, used when no socket context is available.  For this
         interface, we essentially just pass in explicit arguments for
         the flow identity information we would have extracted from the
         socket.
      
         And you'll notice that ipv4_sk_update_pmtu() is simply implemented
         in terms of ipv4_update_pmtu()
      
      Note that __ip_route_output_key() is used, rather than something like
      ip_route_output_flow() or ip_route_output_key().  This is because we
      absolutely do not want to end up with a route that does IPSEC
      encapsulation and the like.  Instead, we only want the route that
      would get us to the node described by the outermost IP header.
      Reported-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36393395