1. 01 11月, 2016 3 次提交
  2. 30 10月, 2016 3 次提交
  3. 28 10月, 2016 5 次提交
    • A
      net: skip genenerating uevents for network namespaces that are exiting · 002d8a1a
      Andrey Vagin 提交于
      No one can see these events, because a network namespace can not be
      destroyed, if it has sockets.
      
      Unlike other devices, uevent-s for network devices are generated
      only inside their network namespaces. They are filtered in
      kobj_bcast_filter()
      
      My experiments shows that net namespaces are destroyed more 30% faster
      with this optimization.
      
      Here is a perf output for destroying network namespaces without this
      patch.
      
      -   94.76%     0.02%  kworker/u48:1  [kernel.kallsyms]     [k] cleanup_net
         - 94.74% cleanup_net
            - 94.64% ops_exit_list.isra.4
               - 41.61% default_device_exit_batch
                  - 41.47% unregister_netdevice_many
                     - rollback_registered_many
                        - 40.36% netdev_unregister_kobject
                           - 14.55% device_del
                              + 13.71% kobject_uevent
                           - 13.04% netdev_queue_update_kobjects
                              + 12.96% kobject_put
                           - 12.72% net_rx_queue_update_kobjects
                                kobject_put
                              - kobject_release
                                 + 12.69% kobject_uevent
                        + 0.80% call_netdevice_notifiers_info
               + 19.57% nfsd_exit_net
               + 11.15% tcp_net_metrics_exit
               + 8.25% rpcsec_gss_exit_net
      
      It's very critical to optimize the exit path for network namespaces,
      because they are destroyed under net_mutex and many namespaces can be
      destroyed for one iteration.
      
      v2: use dev_set_uevent_suppress()
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      002d8a1a
    • A
      flow_dissector: fix vlan tag handling · bc72f3dd
      Arnd Bergmann 提交于
      gcc warns about an uninitialized pointer dereference in the vlan
      priority handling:
      
      net/core/flow_dissector.c: In function '__skb_flow_dissect':
      net/core/flow_dissector.c:281:61: error: 'vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      As pointed out by Jiri Pirko, the variable is never actually used
      without being initialized first as the only way it end up uninitialized
      is with skb_vlan_tag_present(skb)==true, and that means it does not
      get accessed.
      
      However, the warning hints at some related issues that I'm addressing
      here:
      
      - the second check for the vlan tag is different from the first one
        that tests the skb for being NULL first, causing both the warning
        and a possible NULL pointer dereference that was not entirely fixed.
      - The same patch that introduced the NULL pointer check dropped an
        earlier optimization that skipped the repeated check of the
        protocol type
      - The local '_vlan' variable is referenced through the 'vlan' pointer
        but the variable has gone out of scope by the time that it is
        accessed, causing undefined behavior
      
      Caching the result of the 'skb && skb_vlan_tag_present(skb)' check
      in a local variable allows the compiler to further optimize the
      later check. With those changes, the warning also disappears.
      
      Fixes: 3805a938 ("flow_dissector: Check skb for VLAN only if skb specified.")
      Fixes: d5709f7a ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NEric Garver <e@erig.me>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc72f3dd
    • J
      genetlink: mark families as __ro_after_init · 56989f6d
      Johannes Berg 提交于
      Now genl_register_family() is the only thing (other than the
      users themselves, perhaps, but I didn't find any doing that)
      writing to the family struct.
      
      In all families that I found, genl_register_family() is only
      called from __init functions (some indirectly, in which case
      I've add __init annotations to clarifly things), so all can
      actually be marked __ro_after_init.
      
      This protects the data structure from accidental corruption.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56989f6d
    • J
      genetlink: statically initialize families · 489111e5
      Johannes Berg 提交于
      Instead of providing macros/inline functions to initialize
      the families, make all users initialize them statically and
      get rid of the macros.
      
      This reduces the kernel code size by about 1.6k on x86-64
      (with allyesconfig).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      489111e5
    • J
      genetlink: no longer support using static family IDs · a07ea4d9
      Johannes Berg 提交于
      Static family IDs have never really been used, the only
      use case was the workaround I introduced for those users
      that assumed their family ID was also their multicast
      group ID.
      
      Additionally, because static family IDs would never be
      reserved by the generic netlink code, using a relatively
      low ID would only work for built-in families that can be
      registered immediately after generic netlink is started,
      which is basically only the control family (apart from
      the workaround code, which I also had to add code for so
      it would reserve those IDs)
      
      Thus, anything other than GENL_ID_GENERATE is flawed and
      luckily not used except in the cases I mentioned. Move
      those workarounds into a few lines of code, and then get
      rid of GENL_ID_GENERATE entirely, making it more robust.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a07ea4d9
  4. 27 10月, 2016 1 次提交
  5. 24 10月, 2016 1 次提交
  6. 23 10月, 2016 3 次提交
  7. 21 10月, 2016 1 次提交
    • S
      net: add recursion limit to GRO · fcd91dd4
      Sabrina Dubroca 提交于
      Currently, GRO can do unlimited recursion through the gro_receive
      handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
      to one level with encap_mark, but both VLAN and TEB still have this
      problem.  Thus, the kernel is vulnerable to a stack overflow, if we
      receive a packet composed entirely of VLAN headers.
      
      This patch adds a recursion counter to the GRO layer to prevent stack
      overflow.  When a gro_receive function hits the recursion limit, GRO is
      aborted for this skb and it is processed normally.  This recursion
      counter is put in the GRO CB, but could be turned into a percpu counter
      if we run out of space in the CB.
      
      Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
      
      Fixes: CVE-2016-7039
      Fixes: 9b174d88 ("net: Add Transparent Ethernet Bridging GRO support.")
      Fixes: 66e5133f ("vlan: Add GRO support for non hardware accelerated vlan")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcd91dd4
  8. 19 10月, 2016 3 次提交
    • I
      net: core: Correctly iterate over lower adjacency list · e4961b07
      Ido Schimmel 提交于
      Tamir reported the following trace when processing ARP requests received
      via a vlan device on top of a VLAN-aware bridge:
      
       NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
      [...]
       CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
       Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
       task: ffff88017edfea40 task.stack: ffff88017ee10000
       RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
      [...]
       Call Trace:
        <IRQ>
        [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
        [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
        [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
        [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
        [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
        [<ffffffff815f2eb6>] neigh_update+0x306/0x740
        [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
        [<ffffffff8165ea3f>] arp_process+0x66f/0x700
        [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
        [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
        [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
        [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
        [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
        [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
        [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
        [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
        [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
        [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
        [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
        [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
        [<ffffffff81374815>] ? find_next_bit+0x15/0x20
        [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
        [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
        [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
        [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
        [<ffffffff81091986>] tasklet_action+0xf6/0x110
        [<ffffffff81704556>] __do_softirq+0xf6/0x280
        [<ffffffff8109213f>] irq_exit+0xdf/0xf0
        [<ffffffff817042b4>] do_IRQ+0x54/0xd0
        [<ffffffff8170214c>] common_interrupt+0x8c/0x8c
      
      The problem is that netdev_all_lower_get_next_rcu() never advances the
      iterator, thereby causing the loop over the lower adjacency list to run
      forever.
      
      Fix this by advancing the iterator and avoid the infinite loop.
      
      Fixes: 7ce856aa ("mlxsw: spectrum: Add couple of lower device helper functions")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NTamir Winetroub <tamirw@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4961b07
    • E
      flow_dissector: Check skb for VLAN only if skb specified. · 3805a938
      Eric Garver 提交于
      Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
      
      Fixes: d5709f7a ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
      Signed-off-by: NEric Garver <e@erig.me>
      Reviewed-by: NJakub Sitnicki <jkbs@redhat.com>
      Acked-by: NAmir Vadai <amir@vadai.me>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3805a938
    • E
      soreuseport: do not export reuseport_add_sock() · 41ee9c55
      Eric Dumazet 提交于
      reuseport_add_sock() is not used from a module,
      no need to export it.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41ee9c55
  9. 18 10月, 2016 6 次提交
  10. 17 10月, 2016 1 次提交
    • E
      net: pktgen: remove rcu locking in pktgen_change_name() · 9a0b1e8b
      Eric Dumazet 提交于
      After Jesper commit back in linux-3.18, we trigger a lockdep
      splat in proc_create_data() while allocating memory from
      pktgen_change_name().
      
      This patch converts t->if_lock to a mutex, since it is now only
      used from control path, and adds proper locking to pktgen_change_name()
      
      1) pktgen_thread_lock to protect the outer loop (iterating threads)
      2) t->if_lock to protect the inner loop (iterating devices)
      
      Note that before Jesper patch, pktgen_change_name() was lacking proper
      protection, but lockdep was not able to detect the problem.
      
      Fixes: 8788370a ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
      Reported-by: NJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a0b1e8b
  11. 16 10月, 2016 1 次提交
  12. 14 10月, 2016 2 次提交
  13. 13 10月, 2016 1 次提交
    • J
      net: centralize net_device min/max MTU checking · 61e84623
      Jarod Wilson 提交于
      While looking into an MTU issue with sfc, I started noticing that almost
      every NIC driver with an ndo_change_mtu function implemented almost
      exactly the same range checks, and in many cases, that was the only
      practical thing their ndo_change_mtu function was doing. Quite a few
      drivers have either 68, 64, 60 or 46 as their minimum MTU value checked,
      and then various sizes from 1500 to 65535 for their maximum MTU value. We
      can remove a whole lot of redundant code here if we simple store min_mtu
      and max_mtu in net_device, and check against those in net/core/dev.c's
      dev_set_mtu().
      
      In theory, there should be zero functional change with this patch, it just
      puts the infrastructure in place. Subsequent patches will attempt to start
      using said infrastructure, with theoretically zero change in
      functionality.
      
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61e84623
  14. 11 10月, 2016 1 次提交
    • E
      latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy 提交于
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provide some level of
      latent entropy.
      Signed-off-by: NEmese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      0766f788
  15. 08 10月, 2016 1 次提交
  16. 04 10月, 2016 3 次提交
    • A
      net: Add netdev all_adj_list refcnt propagation to fix panic · 93409033
      Andrew Collins 提交于
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: NAndrew Collins <acollins@cradlepoint.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93409033
    • S
      net: skbuff: Limit skb_vlan_pop/push() to expect skb->data at mac header · b6a79208
      Shmulik Ladkani 提交于
      skb_vlan_pop/push were too generic, trying to support the cases where
      skb->data is at mac header, and cases where skb->data is arbitrarily
      elsewhere.
      
      Supporting an arbitrary skb->data was complex and bogus:
       - It failed to unwind skb->data to its original location post actual
         pop/push.
         (Also, semantic is not well defined for unwinding: If data was into
          the eth header, need to use same offset from start; But if data was
          at network header or beyond, need to adjust the original offset
          according to the push/pull)
       - It mangled the rcsum post actual push/pop, without taking into account
         that the eth bytes might already have been pulled out of the csum.
      
      Most callers (ovs, bpf) already had their skb->data at mac_header upon
      invoking skb_vlan_pop/push.
      Last caller that failed to do so (act_vlan) has been recently fixed.
      
      Therefore, to simplify things, no longer support arbitrary skb->data
      inputs for skb_vlan_pop/push().
      
      skb->data is expected to be exactly at mac_header; WARN otherwise.
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6a79208
    • A
      skb_splice_bits(): get rid of callback · 25869262
      Al Viro 提交于
      since pipe_lock is the outermost now, we don't need to drop/regain
      socket locks around the call of splice_to_pipe() from skb_splice_bits(),
      which kills the need to have a socket-specific callback; we can just
      call splice_to_pipe() and be done with that.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      25869262
  17. 03 10月, 2016 2 次提交
    • A
      net: rtnl: avoid uninitialized data in IFLA_VF_VLAN_LIST handling · fa34cd94
      Arnd Bergmann 提交于
      With the newly added support for IFLA_VF_VLAN_LIST netlink messages,
      we get a warning about potential uninitialized variable use in
      the parsing of the user input when enabling the -Wmaybe-uninitialized
      warning:
      
      net/core/rtnetlink.c: In function 'do_setvfinfo':
      net/core/rtnetlink.c:1756:9: error: 'ivvl$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      I have not been able to prove whether it is possible to arrive in
      this code with an empty IFLA_VF_VLAN_LIST block, but if we do,
      then ndo_set_vf_vlan gets called with uninitialized arguments.
      
      This adds an explicit check for an empty list, making it obvious
      to the reader and the compiler that this cannot happen.
      
      Fixes: 79aab093 ("net: Update API for VF vlan protocol 802.1ad support")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa34cd94
    • P
      net: pktgen: fix pkt_size · 63d75463
      Paolo Abeni 提交于
      The commit 879c7220 ("net: pktgen: Observe needed_headroom
      of the device") increased the 'pkt_overhead' field value by
      LL_RESERVED_SPACE.
      As a side effect the generated packet size, computed as:
      
      	/* Eth + IPh + UDPh + mpls */
      	datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
      		  pkt_dev->pkt_overhead;
      
      is decreased by the same value.
      The above changed slightly the behavior of existing pktgen users,
      and made the procfs interface somewhat inconsistent.
      Fix it by restoring the previous pkt_overhead value and using
      LL_RESERVED_SPACE as extralen in skb allocation.
      Also, change pktgen_alloc_skb() to only partially reserve
      the headroom to allow the caller to prefetch from ll header
      start.
      
      v1 -> v2:
       - fixed some typos in the comments
      
      Fixes: 879c7220 ("net: pktgen: Observe needed_headroom of the device")
      Suggested-by: NBen Greear <greearb@candelatech.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63d75463
  18. 29 9月, 2016 1 次提交
  19. 25 9月, 2016 1 次提交