1. 27 7月, 2015 5 次提交
  2. 25 7月, 2015 5 次提交
    • K
      cgroup: net_cls: fix false-positive "suspicious RCU usage" · cc9f4daa
      Konstantin Khlebnikov 提交于
      In dev_queue_xmit() net_cls protected with rcu-bh.
      
      [  270.730026] ===============================
      [  270.730029] [ INFO: suspicious RCU usage. ]
      [  270.730033] 4.2.0-rc3+ #2 Not tainted
      [  270.730036] -------------------------------
      [  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
      [  270.730041] other info that might help us debug this:
      [  270.730043] rcu_scheduler_active = 1, debug_locks = 1
      [  270.730045] 2 locks held by dhclient/748:
      [  270.730046]  #0:  (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960
      [  270.730085]  #1:  (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960
      [  270.730090] stack backtrace:
      [  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
      [  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
      [  270.730100]  0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
      [  270.730103]  ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
      [  270.730105]  ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
      [  270.730108] Call Trace:
      [  270.730121]  [<ffffffff817ad487>] dump_stack+0x4c/0x65
      [  270.730148]  [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120
      [  270.730153]  [<ffffffff816a62d2>] task_cls_state+0x92/0xa0
      [  270.730158]  [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
      [  270.730164]  [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0
      [  270.730166]  [<ffffffff816ab573>] tc_classify+0x33/0x90
      [  270.730170]  [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb]
      [  270.730172]  [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960
      [  270.730174]  [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960
      [  270.730176]  [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20
      [  270.730185]  [<ffffffff81787770>] dev_queue_xmit+0x10/0x20
      [  270.730187]  [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760
      [  270.730190]  [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0
      [  270.730203]  [<ffffffff81665245>] ? sock_def_readable+0x5/0x190
      [  270.730210]  [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40
      [  270.730216]  [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640
      [  270.730219]  [<ffffffff8165f367>] sock_sendmsg+0x47/0x50
      [  270.730221]  [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0
      [  270.730232]  [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0
      [  270.730234]  [<ffffffff811fe5b8>] vfs_write+0xb8/0x190
      [  270.730236]  [<ffffffff811fe8c2>] SyS_write+0x52/0xb0
      [  270.730239]  [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc9f4daa
    • W
      77e62da6
    • W
      sch_plug: purge buffered packets during reset · fe6bea7f
      WANG Cong 提交于
      Otherwise the skbuff related structures are not correctly
      refcount'ed.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe6bea7f
    • J
      ipv4: consider TOS in fib_select_default · 2392debc
      Julian Anastasov 提交于
      fib_select_default considers alternative routes only when
      res->fi is for the first alias in res->fa_head. In the
      common case this can happen only when the initial lookup
      matches the first alias with highest TOS value. This
      prevents the alternative routes to require specific TOS.
      
      This patch solves the problem as follows:
      
      - routes that require specific TOS should be returned by
      fib_select_default only when TOS matches, as already done
      in fib_table_lookup. This rule implies that depending on the
      TOS we can have many different lists of alternative gateways
      and we have to keep the last used gateway (fa_default) in first
      alias for the TOS instead of using single tb_default value.
      
      - as the aliases are ordered by many keys (TOS desc,
      fib_priority asc), we restrict the possible results to
      routes with matching TOS and lowest metric (fib_priority)
      and routes that match any TOS, again with lowest metric.
      
      For example, packet with TOS 8 can not use gw3 (not lowest
      metric), gw4 (different TOS) and gw6 (not lowest metric),
      all other gateways can be used:
      
      tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
      tos 8 via gw2 metric 2
      tos 8 via gw3 metric 3
      tos 4 via gw4
      tos 0 via gw5
      tos 0 via gw6 metric 1
      Reported-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2392debc
    • J
      ipv4: fib_select_default should match the prefix · 18a912e9
      Julian Anastasov 提交于
      fib_trie starting from 4.1 can link fib aliases from
      different prefixes in same list. Make sure the alternative
      gateways are in same table and for same prefix (0) by
      checking tb_id and fa_slen.
      
      Fixes: 79e5ad2c ("fib_trie: Remove leaf_info")
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18a912e9
  3. 22 7月, 2015 5 次提交
    • J
      netfilter: nf_conntrack: Support expectations in different zones · 4b31814d
      Joe Stringer 提交于
      When zones were originally introduced, the expectation functions were
      all extended to perform lookup using the zone. However, insertion was
      not modified to check the zone. This means that two expectations which
      are intended to apply for different connections that have the same tuple
      but exist in different zones cannot both be tracked.
      
      Fixes: 5d0aa2cc (netfilter: nf_conntrack: add support for "conntrack zones")
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4b31814d
    • C
      openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes · bac541e4
      Chris J Arges 提交于
      Some architectures like POWER can have a NUMA node_possible_map that
      contains sparse entries. This causes memory corruption with openvswitch
      since it allocates flow_cache with a multiple of num_possible_nodes() and
      assumes the node variable returned by for_each_node will index into
      flow->stats[node].
      
      Use nr_node_ids to allocate a maximal sparse array instead of
      num_possible_nodes().
      
      The crash was noticed after 3af229f2 was applied as it changed the
      node_possible_map to match node_online_map on boot.
      Fixes: 3af229f2Signed-off-by: NChris J Arges <chris.j.arges@canonical.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Acked-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bac541e4
    • F
      netlink: don't hold mutex in rcu callback when releasing mmapd ring · 0470eb99
      Florian Westphal 提交于
      Kirill A. Shutemov says:
      
      This simple test-case trigers few locking asserts in kernel:
      
      int main(int argc, char **argv)
      {
              unsigned int block_size = 16 * 4096;
              struct nl_mmap_req req = {
                      .nm_block_size          = block_size,
                      .nm_block_nr            = 64,
                      .nm_frame_size          = 16384,
                      .nm_frame_nr            = 64 * block_size / 16384,
              };
              unsigned int ring_size;
      	int fd;
      
      	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                      exit(1);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                      exit(1);
      
      	ring_size = req.nm_block_nr * req.nm_block_size;
      	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	return 0;
      }
      
      +++ exited with 0 +++
      BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
      in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
      3 locks held by init/1:
       #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
       #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
       #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
      Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
      
      CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
       ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
       0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
       ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
      Call Trace:
       <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
       [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
       [<ffffffff81085bed>] __might_sleep+0x4d/0x90
       [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
       [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
       [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
       [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
       [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
       [<ffffffff817e484d>] __sk_free+0x1d/0x160
       [<ffffffff817e49a9>] sk_free+0x19/0x20
      [..]
      
      Cong Wang says:
      
      We can't hold mutex lock in a rcu callback, [..]
      
      Thomas Graf says:
      
      The socket should be dead at this point. It might be simpler to
      add a netlink_release_ring() function which doesn't require
      locking at all.
      Reported-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
      Diagnosed-by: NCong Wang <cwang@twopensource.com>
      Suggested-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0470eb99
    • E
      tcp: suppress a division by zero warning · 89e478a2
      Eric Dumazet 提交于
      Andrew Morton reported following warning on one ARM build
      with gcc-4.4 :
      
      net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
      net/ipv4/inet_hashtables.c:617: warning: division by zero
      
      Even guarded with a test on sizeof(spinlock_t), compiler does not
      like current construct on a !CONFIG_SMP build.
      
      Remove the warning by using a temporary variable.
      
      Fixes: 095dc8e0 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
      Reported-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89e478a2
    • E
      inet: frags: fix defragmented packet's IP header for af_packet · 0848f642
      Edward Hyunkoo Jee 提交于
      When ip_frag_queue() computes positions, it assumes that the passed
      sk_buff does not contain L2 headers.
      
      However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
      functions can be called on outgoing packets that contain L2 headers.
      
      Also, IPv4 checksum is not corrected after reassembly.
      
      Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.")
      Signed-off-by: NEdward Hyunkoo Jee <edjee@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0848f642
  4. 21 7月, 2015 6 次提交
  5. 20 7月, 2015 1 次提交
    • P
      netfilter: fix netns dependencies with conntrack templates · 0838aa7f
      Pablo Neira Ayuso 提交于
      Quoting Daniel Borkmann:
      
      "When adding connection tracking template rules to a netns, f.e. to
      configure netfilter zones, the kernel will endlessly busy-loop as soon
      as we try to delete the given netns in case there's at least one
      template present, which is problematic i.e. if there is such bravery that
      the priviledged user inside the netns is assumed untrusted.
      
      Minimal example:
      
        ip netns add foo
        ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
        ip netns del foo
      
      What happens is that when nf_ct_iterate_cleanup() is being called from
      nf_conntrack_cleanup_net_list() for a provided netns, we always end up
      with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
      don't get a soft-lockup as we still have a schedule() point, but the
      serving CPU spins on 100% from that point onwards.
      
      Since templates are normally allocated with nf_conntrack_alloc(), we
      also bump net->ct.count. The issue why they are not yet nf_ct_put() is
      because the per netns .exit() handler from x_tables (which would eventually
      invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
      called in the dependency chain at a *later* point in time than the per
      netns .exit() handler for the connection tracker.
      
      This is clearly a chicken'n'egg problem: after the connection tracker
      .exit() handler, we've teared down all the connection tracking
      infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
      invoked at a later point in time during the netns cleanup, as that would
      lead to a use-after-free. At the same time, we cannot make x_tables depend
      on the connection tracker module, so that the xt_ct_tg_destroy() would
      be invoked earlier in the cleanup chain."
      
      Daniel confirms this has to do with the order in which modules are loaded or
      having compiled nf_conntrack as modules while x_tables built-in. So we have no
      guarantees regarding the order in which netns callbacks are executed.
      
      Fix this by allocating the templates through kmalloc() from the respective
      SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
      Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
      is marked as unlikely since conntrack templates are rarely allocated and only
      from the configuration plane path.
      
      Note that templates are not kept in any list to avoid further dependencies with
      nf_conntrack anymore, thus, the tmpl larval list is removed.
      Reported-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Tested-by: NDaniel Borkmann <daniel@iogearbox.net>
      0838aa7f
  6. 17 7月, 2015 8 次提交
    • A
      cfg80211: use RTNL locked reg_can_beacon for IR-relaxation · 923b352f
      Arik Nemtsov 提交于
      The RTNL is required to check for IR-relaxation conditions that allow
      more channels to beacon. Export an RTNL locked version of reg_can_beacon
      and use it where possible in AP/STA interface type flows, where
      IR-relaxation may be applicable.
      
      Fixes: 06f207fc ("cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA")
      Signed-off-by: NArik Nemtsov <arikx.nemtsov@intel.com>
      Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      923b352f
    • B
      mac80211: add missing length check for confirm frames · b3e7de87
      Bob Copeland 提交于
      Although mesh_rx_plink_frame() already checks that frames have enough
      bytes for the action code plus another two bytes for capability/reason
      code, it doesn't take into account that confirm frames also have an
      additional two-byte aid.  As a result, a corrupt frame could cause a
      subsequent subtraction to wrap around to ill effect.  Add another
      check for this case.
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      b3e7de87
    • B
      mac80211: correct aid location in peering frames · 2ea752cd
      Bob Copeland 提交于
      According to 802.11-2012 8.5.16.3.2 AID comes directly after the
      capability bytes in mesh peering confirm frames.  The existing
      code, however, was adding a 2 byte offset to this location,
      resulting in garbage data going out over the air.  Remove the
      offset to fix it.
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      2ea752cd
    • T
      wireless: regulatory: reduce log level of CRDA related messages · 042ab5fc
      Thomas Petazzoni 提交于
      With a basic Linux userspace, the messages "Calling CRDA to update
      world regulatory domain" appears 10 times after boot every second or
      so, followed by a final "Exceeded CRDA call max attempts. Not calling
      CRDA". For those of us not having the corresponding userspace parts,
      having those messages repeatedly displayed at boot time is a bit
      annoying, so this commit reduces their log level to pr_debug().
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      042ab5fc
    • J
      mac80211: shut down interfaces before destroying interface list · d8d9008c
      Johannes Berg 提交于
      If the hardware is unregistered while interfaces are up, mac80211 will
      unregister all interfaces, which in turns causes mac80211 to be called
      again to remove them all from the driver and eventually shut down the
      hardware.
      
      During this shutdown, however, it's currently already unsafe to iterate
      the list of interfaces atomically, as the list is manipulated in an
      unsafe manner. This puts an undue burden on the driver - it must stop
      all its activities before calling ieee80211_unregister_hw(), while in
      the normal stop path it can do all cleanup in the stop method. If, for
      example, it's using the iteration during RX for some reason, it would
      have to stop RX before unregistering to avoid crashes.
      
      Fix this problem by closing all interfaces before unregistering them.
      This will cause the driver stop to have completed before we manipulate
      the interface list, and after the driver is stopped *and* has called
      ieee80211_unregister_hw() it really musn't be iterating any more as
      the memory will be freed as well.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      d8d9008c
    • C
      mac80211: wowlan: enable powersave if suspend while ps-polling · 541b6ed7
      Chaitanya T K 提交于
      If for any reason we're in the middle of PS-polling or awake after
      TX due to dynamic powersave while going to suspend, go back to save
      power. This might cause a response frame to get lost, but since we
      can't really wait for it while going to suspend that's still better
      than not enabling powersave which would cause higher power usage
      during (and possibly even after) suspend.
      
      Note that this really only affects the very few drivers that use
      the powersave implementation in mac80211.
      Signed-off-by: NChaitanya T K <chaitanya.mgit@gmail.com>
      [rewrite misleading commit log]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      541b6ed7
    • M
      mac80211: don't clear all tx flags when requeing · e9de0190
      Michal Kazior 提交于
      When acting as AP and a PS-Poll frame is received
      associated station is marked as one in a Service
      Period. This state is kept until Tx status for
      released frame is reported. While a station is in
      Service Period PS-Poll frames are ignored.
      
      However if PS-Poll was received during A-MPDU
      teardown it was possible to have the to-be
      released frame re-queued back to pending queue.
      In such case the frame was stripped of 2 important
      flags:
      
       (a) IEEE80211_TX_CTL_NO_PS_BUFFER
       (b) IEEE80211_TX_STATUS_EOSP
      
      Stripping of (a) led to the frame that was to be
      released to be queued back to ps_tx_buf queue. If
      station remained to use only PS-Poll frames the
      re-queued frame (and new ones) was never actually
      transmitted because mac80211 would ignore
      subsequent PS-Poll frames due to station being in
      Service Period. There was nothing left to clear
      the Service Period bit (no xmit -> no tx status ->
      no SP end), i.e. the AP would have the station
      stuck in Service Period. Beacon TIM would
      repeatedly prompt station to poll for frames but
      it would get none.
      
      Once (a) is not stripped (b) becomes important
      because it's the main condition to clear the
      Service Period bit of the station when Tx status
      for the released frame is reported back.
      
      This problem was observed with ath9k acting as P2P
      GO in some testing scenarios but isn't limited to
      it. AP operation with mac80211 based Tx A-MPDU
      control combined with clients using PS-Poll frames
      is subject to this race.
      Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      e9de0190
    • T
      mac80211: clear subdir_stations when removing debugfs · 4479004e
      Tom Hughes 提交于
      If we don't do this, and we then fail to recreate the debugfs
      directory during a mode change, then we will fail later trying
      to add stations to this now bogus directory:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000006c
      IP: [<c0a92202>] mutex_lock+0x12/0x30
      Call Trace:
      [<c0678ab4>] start_creating+0x44/0xc0
      [<c0679203>] debugfs_create_dir+0x13/0xf0
      [<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]
      
      Cc: stable@kernel.org
      Signed-off-by: NTom Hughes <tom@compton.nu>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      4479004e
  7. 16 7月, 2015 10 次提交