1. 26 1月, 2017 4 次提交
  2. 25 1月, 2017 15 次提交
  3. 24 1月, 2017 4 次提交
    • J
      mac80211: don't try to sleep in rate_control_rate_init() · 115865fa
      Johannes Berg 提交于
      In my previous patch, I missed that rate_control_rate_init() is
      called from some places that cannot sleep, so it cannot call
      ieee80211_recalc_min_chandef(). Remove that call for now to fix
      the context bug, we'll have to find a different way to fix the
      minimum channel width issue.
      
      Fixes: 96aa2e7c ("mac80211: calculate min channel width correctly")
      Reported-by: NXiaolong Ye (via lkp-robot) <xiaolong.ye@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      115865fa
    • L
      netfilter: nf_tables: validate the name size when possible · b2fbd044
      Liping Zhang 提交于
      Currently, if the user add a stateful object with the name size exceed
      NFT_OBJ_MAXNAMELEN - 1 (i.e. 31), we truncate it down to 31 silently.
      This is not friendly, furthermore, this will cause duplicated stateful
      objects when the first 31 characters of the name is same. So limit the
      stateful object's name size to NFT_OBJ_MAXNAMELEN - 1.
      
      After apply this patch, error message will be printed out like this:
        # name_32=$(printf "%0.sQ" {1..32})
        # nft add counter filter $name_32
        <cmdline>:1:1-52: Error: Could not process rule: Numerical result out
        of range
        add counter filter QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      Also this patch cleans up the codes which missing the name size limit
      validation in nftables.
      
      Fixes: e5009240 ("netfilter: nf_tables: add stateful objects")
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b2fbd044
    • F
      net: dsa: Check return value of phy_connect_direct() · 4078b76c
      Florian Fainelli 提交于
      We need to check the return value of phy_connect_direct() in
      dsa_slave_phy_connect() otherwise we may be continuing the
      initialization of a slave network device with a PHY that already
      attached somewhere else and which will soon be in error because the PHY
      device is in error.
      
      The conditions for such an error to occur are that we have a port of our
      switch that is not disabled, and has the same port number as a PHY
      address (say both 5) that can be probed using the DSA slave MII bus. We
      end-up having this slave network device find a PHY at the same address
      as our port number, and we try to attach to it.
      
      A slave network (e.g: port 0) has already attached to our PHY device,
      and we try to re-attach it with a different network device, but since we
      ignore the error we would end-up initializating incorrect device
      references by the time the slave network interface is opened.
      
      The code has been (re)organized several times, making it hard to provide
      an exact Fixes tag, this is a bugfix nonetheless.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4078b76c
    • D
      net: mpls: Fix multipath selection for LSR use case · 9f427a0e
      David Ahern 提交于
      MPLS multipath for LSR is broken -- always selecting the first nexthop
      in the one label case. For example:
      
          $ ip -f mpls ro ls
          100
                  nexthop as to 200 via inet 172.16.2.2  dev virt12
                  nexthop as to 300 via inet 172.16.3.2  dev virt13
          101
                  nexthop as to 201 via inet6 2000:2::2  dev virt12
                  nexthop as to 301 via inet6 2000:3::2  dev virt13
      
      In this example incoming packets have a single MPLS labels which means
      BOS bit is set. The BOS bit is passed from mpls_forward down to
      mpls_multipath_hash which never processes the hash loop because BOS is 1.
      
      Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len
      tracks the total mpls header length on each pass (on pass N mpls_hdr_len
      is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set
      it verifies the skb has sufficient header for ipv4 or ipv6, and find the
      IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to
      advance past it.
      
      With these changes I have verified the code correctly sees the label,
      BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp
      traffic for ipv4 and ipv6 are distributed across the nexthops.
      
      Fixes: 1c78efa8 ("mpls: flow-based multipath selection")
      Acked-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f427a0e
  4. 21 1月, 2017 5 次提交
  5. 20 1月, 2017 2 次提交
    • A
      tcp: initialize max window for a new fastopen socket · 0dbd7ff3
      Alexey Kodanev 提交于
      Found that if we run LTP netstress test with large MSS (65K),
      the first attempt from server to send data comparable to this
      MSS on fastopen connection will be delayed by the probe timer.
      
      Here is an example:
      
           < S  seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32
           > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0
           < .  ack 1 win 342 length 0
      
      Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now',
      as well as in 'size_goal'. This results the segment not queued for
      transmition until all the data copied from user buffer. Then, inside
      __tcp_push_pending_frames(), it breaks on send window test and
      continues with the check probe timer.
      
      Fragmentation occurs in tcp_write_wakeup()...
      
      +0.2 > P. seq 1:43777 ack 1 win 342 length 43776
           < .  ack 43777, win 1365 length 0
           > P. seq 43777:65001 ack 1 win 342 options [...] length 21224
           ...
      
      This also contradicts with the fact that we should bound to the half
      of the window if it is large.
      
      Fix this flaw by correctly initializing max_window. Before that, it
      could have large values that affect further calculations of 'size_goal'.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0dbd7ff3
    • K
      ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock · 03e4deff
      Kefeng Wang 提交于
      Just like commit 4acd4945 ("ipv6: addrconf: Avoid calling
      netdevice notifiers with RCU read-side lock"), it is unnecessary
      to make addrconf_disable_change() use RCU iteration over the
      netdev list, since it already holds the RTNL lock, or we may meet
      Illegal context switch in RCU read-side critical section.
      Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03e4deff
  6. 19 1月, 2017 6 次提交
    • F
      netfilter: conntrack: refine gc worker heuristics, redux · e5072053
      Florian Westphal 提交于
      This further refines the changes made to conntrack gc_worker in
      commit e0df8cae ("netfilter: conntrack: refine gc worker heuristics").
      
      The main idea of that change was to reduce the scan interval when evictions
      take place.
      
      However, on the reporters' setup, there are 1-2 million conntrack entries
      in total and roughly 8k new (and closing) connections per second.
      
      In this case we'll always evict at least one entry per gc cycle and scan
      interval is always at 1 jiffy because of this test:
      
       } else if (expired_count) {
           gc_work->next_gc_run /= 2U;
           next_run = msecs_to_jiffies(1);
      
      being true almost all the time.
      
      Given we scan ~10k entries per run its clearly wrong to reduce interval
      based on nonzero eviction count, it will only waste cpu cycles since a vast
      majorities of conntracks are not timed out.
      
      Thus only look at the ratio (scanned entries vs. evicted entries) to make
      a decision on whether to reduce or not.
      
      Because evictor is supposed to only kick in when system turns idle after
      a busy period, pick a high ratio -- this makes it 50%.  We thus keep
      the idea of increasing scan rate when its likely that table contains many
      expired entries.
      
      In order to not let timed-out entries hang around for too long
      (important when using event logging, in which case we want to timely
      destroy events), we now scan the full table within at most
      GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
      timed-out entries sit in same slot.
      
      I tested this with a vm under synflood (with
      sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
      
      While flood is ongoing, interval now stays at its max rate
      (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
      
      With feedback from Nicolas Dichtel.
      Reported-by: NDenys Fedoryshchenko <nuclearcat@nuclearcat.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Fixes: b87a2f91 ("netfilter: conntrack: add gc worker to remove timed-out entries")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Tested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Tested-by: NDenys Fedoryshchenko <nuclearcat@nuclearcat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      e5072053
    • F
      netfilter: conntrack: remove GC_MAX_EVICTS break · 524b698d
      Florian Westphal 提交于
      Instead of breaking loop and instant resched, don't bother checking
      this in first place (the loop calls cond_resched for every bucket anyway).
      Suggested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      524b698d
    • D
      lwtunnel: fix autoload of lwt modules · 9ed59592
      David Ahern 提交于
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
      root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
      root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with rtnl lock
      held preventing mpls_init from registering.
      
      Given the potential references held by the time lwtunnel_build_state it
      can not drop the rtnl lock to the load module. So, extract the module
      loading code from lwtunnel_build_state into a new function to validate
      the encap type. The new function is called while converting the user
      request into a fib_config which is well before any table, device or
      fib entries are examined.
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ed59592
    • E
      net: fix harmonize_features() vs NETIF_F_HIGHDMA · 7be2c82c
      Eric Dumazet 提交于
      Ashizuka reported a highmem oddity and sent a patch for freescale
      fec driver.
      
      But the problem root cause is that core networking stack
      must ensure no skb with highmem fragment is ever sent through
      a device that does not assert NETIF_F_HIGHDMA in its features.
      
      We need to call illegal_highdma() from harmonize_features()
      regardless of CSUM checks.
      
      Fixes: ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Reported-by: N"Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7be2c82c
    • A
      netfilter: ipt_CLUSTERIP: fix build error without procfs · 3fd0b634
      Arnd Bergmann 提交于
      We can't access c->pde if CONFIG_PROC_FS is disabled:
      
      net/ipv4/netfilter/ipt_CLUSTERIP.c: In function 'clusterip_config_find_get':
      net/ipv4/netfilter/ipt_CLUSTERIP.c:147:9: error: 'struct clusterip_config' has no member named 'pde'
      
      This moves the check inside of another #ifdef.
      
      Fixes: 6c5d5cfb ("netfilter: ipt_CLUSTERIP: check duplicate config when initializing")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3fd0b634
    • E
      net: ethtool: Initialize buffer when querying device channel settings · 31a86d13
      Eran Ben Elisha 提交于
      Ethtool channels respond struct was uninitialized when querying device
      channel boundaries settings. As a result, unreported fields by the driver
      hold garbage.  This may cause sending unsupported params to driver.
      
      Fixes: 8bf36862 ('ethtool: ensure channel counts are within bounds ...')
      Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      CC: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31a86d13
  7. 17 1月, 2017 4 次提交
    • J
      net sched actions: fix refcnt when GETing of action after bind · 0faa9cb5
      Jamal Hadi Salim 提交于
      Demonstrating the issue:
      
      .. add a drop action
      $sudo $TC actions add action drop index 10
      
      .. retrieve it
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 29 sec used 29 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... bug 1 above: reference is two.
          Reference is actually 1 but we forget to subtract 1.
      
      ... do a GET again and we see the same issue
          try a few times and nothing changes
      ~$ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 31 sec used 31 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... lets try to bind the action to a filter..
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      ... and now a few GETs:
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 3 bind 1 installed 204 sec used 204 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 4 bind 1 installed 206 sec used 206 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 5 bind 1 installed 235 sec used 235 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      .... as can be observed the reference count keeps going up.
      
      After the fix
      
      $ sudo $TC actions add action drop index 10
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 4 sec used 4 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 6 sec used 6 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 32 sec used 32 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 33 sec used 33 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      Fixes: aecc5cef ("net sched actions: fix GETing actions")
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0faa9cb5
    • B
      ax25: Fix segfault after sock connection timeout · 8a367e74
      Basil Gunn 提交于
      The ax.25 socket connection timed out & the sock struct has been
      previously taken down ie. sock struct is now a NULL pointer. Checking
      the sock_flag causes the segfault.  Check if the socket struct pointer
      is NULL before checking sock_flag. This segfault is seen in
      timed out netrom connections.
      
      Please submit to -stable.
      Signed-off-by: NBasil Gunn <basil@pacabunga.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a367e74
    • D
      bpf: rework prog_digest into prog_tag · f1f7714e
      Daniel Borkmann 提交于
      Commit 7bd509e3 ("bpf: add prog_digest and expose it via
      fdinfo/netlink") was recently discussed, partially due to
      admittedly suboptimal name of "prog_digest" in combination
      with sha1 hash usage, thus inevitably and rightfully concerns
      about its security in terms of collision resistance were
      raised with regards to use-cases.
      
      The intended use cases are for debugging resp. introspection
      only for providing a stable "tag" over the instruction sequence
      that both kernel and user space can calculate independently.
      It's not usable at all for making a security relevant decision.
      So collisions where two different instruction sequences generate
      the same tag can happen, but ideally at a rather low rate. The
      "tag" will be dumped in hex and is short enough to introspect
      in tracepoints or kallsyms output along with other data such
      as stack trace, etc. Thus, this patch performs a rename into
      prog_tag and truncates the tag to a short output (64 bits) to
      make it obvious it's not collision-free.
      
      Should in future a hash or facility be needed with a security
      relevant focus, then we can think about requirements, constraints,
      etc that would fit to that situation. For now, rework the exposed
      parts for the current use cases as long as nothing has been
      released yet. Tested on x86_64 and s390x.
      
      Fixes: 7bd509e3 ("bpf: add prog_digest and expose it via fdinfo/netlink")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1f7714e
    • P
      tipc: allocate user memory with GFP_KERNEL flag · 57d5f64d
      Parthasarathy Bhuvaragan 提交于
      Until now, we allocate memory always with GFP_ATOMIC flag.
      When the system is under memory pressure and a user tries to send,
      the send fails due to low memory. However, the user application
      can wait for free memory if we allocate it using GFP_KERNEL flag.
      
      In this commit, we use allocate memory with GFP_KERNEL for all user
      allocation.
      Reported-by: NRune Torgersen <runet@innovsys.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57d5f64d