1. 02 Mar, 2017 1 commit
    • KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() · 0837e49a
      David Howells committed
      rcu_dereference_key() and user_key_payload() are currently being used in
      two different, incompatible ways:
      
       (1) As a wrapper to rcu_dereference() - when only the RCU read lock is used
           to protect the key.
      
       (2) As a wrapper to rcu_dereference_protected() - when the key semaphore is
           used to protect the key and the key may be being modified.
      
      Fix this by splitting both of the key wrappers to produce:
      
       (1) RCU accessors for keys when caller has the key semaphore locked:
      
      	dereference_key_locked()
      	user_key_payload_locked()
      
       (2) RCU accessors for keys when caller holds the RCU read lock:
      
      	dereference_key_rcu()
      	user_key_payload_rcu()
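
The split can be pictured with a small user-space model. This is only a sketch: the `toy_` names, the flag-based lock tracking and the warning counter are all hypothetical stand-ins, not the kernel's RCU or lockdep machinery. Each accessor checks the one protection it is documented for, so using the locked variant under only the RCU read lock gets flagged.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: lock state is tracked with plain flags, not real locks. */
struct toy_key {
    const char *payload;   /* stands in for the RCU-managed key payload */
    int sem_held;          /* non-zero while the key semaphore is held */
};

static int toy_rcu_read_held;     /* non-zero inside a read-side section */
static int toy_suspicious_count;  /* number of failed protection checks */

/* For callers that hold only the RCU read lock. */
static const char *toy_dereference_key_rcu(const struct toy_key *key)
{
    assert(key != NULL);
    if (!toy_rcu_read_held)
        toy_suspicious_count++;
    return key->payload;
}

/* For callers that hold the key semaphore. */
static const char *toy_dereference_key_locked(const struct toy_key *key)
{
    assert(key != NULL);
    if (!key->sem_held)
        toy_suspicious_count++;
    return key->payload;
}
```

With `toy_rcu_read_held` set and the semaphore not held, the `_rcu` accessor passes its check while the `_locked` accessor bumps `toy_suspicious_count`, which is the situation the warning in this commit reports.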
      
      This should fix the following warning in the NFS idmapper:
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        4.10.0 #1 Tainted: G        W
        -------------------------------
        ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
        other info that might help us debug this:
        rcu_scheduler_active = 2, debug_locks = 0
        1 lock held by mount.nfs/5987:
          #0:  (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
        stack backtrace:
        CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G        W       4.10.0 #1
        Call Trace:
          dump_stack+0xe8/0x154 (unreliable)
          lockdep_rcu_suspicious+0x140/0x190
          nfs_idmap_get_key+0x380/0x420 [nfsv4]
          nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
          decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
          decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
          nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
          rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
          call_decode+0x29c/0x910 [sunrpc]
          __rpc_execute+0x140/0x8f0 [sunrpc]
          rpc_run_task+0x170/0x200 [sunrpc]
          nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
          _nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
          nfs4_lookup_root+0xe0/0x350 [nfsv4]
          nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
          nfs4_find_root_sec+0xc4/0x100 [nfsv4]
          nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
          nfs4_get_rootfh+0x6c/0x190 [nfsv4]
          nfs4_server_common_setup+0xc4/0x260 [nfsv4]
          nfs4_create_server+0x278/0x3c0 [nfsv4]
          nfs4_remote_mount+0x50/0xb0 [nfsv4]
          mount_fs+0x74/0x210
          vfs_kern_mount+0x78/0x220
          nfs_do_root_mount+0xb0/0x140 [nfsv4]
          nfs4_try_mount+0x60/0x100 [nfsv4]
          nfs_fs_mount+0x5ec/0xda0 [nfs]
          mount_fs+0x74/0x210
          vfs_kern_mount+0x78/0x220
          do_mount+0x254/0xf70
          SyS_mount+0x94/0x100
          system_call+0x38/0xe0
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
  2. 28 Feb, 2017 5 commits
  3. 27 Feb, 2017 11 commits
    • l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv · 51fb60eb
      Paul Hüber committed
      l2tp_ip_backlog_recv must not return -1 if the packet gets dropped.
      The return value is passed up to ip_local_deliver_finish, which treats
      negative values as an IP protocol number for resubmission.
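
A toy model of why the return value matters, assuming the caller resubmits on any negative return as described above. All `toy_` names are hypothetical and the "packet" is just a pair of counters; this is a sketch of the hazard, not the l2tp or IP input code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy packet: records whether it was freed and used afterwards. */
struct toy_pkt {
    bool freed;
    int uses_after_free;
};

typedef int (*toy_handler_fn)(struct toy_pkt *pkt);

/* Drops the packet but reports -1 to the caller. */
static int toy_drop_returning_minus_one(struct toy_pkt *pkt)
{
    pkt->freed = true;
    return -1;
}

/* Drops the packet and reports 0, so the caller does nothing further. */
static int toy_drop_returning_zero(struct toy_pkt *pkt)
{
    pkt->freed = true;
    return 0;
}

/* Whatever protocol the packet is resubmitted to will read it. */
static void toy_resubmit_target(struct toy_pkt *pkt)
{
    if (pkt->freed)
        pkt->uses_after_free++;
}

/* Models the caller: a negative return means "resubmit as protocol -ret". */
static int toy_deliver(struct toy_pkt *pkt, toy_handler_fn handler)
{
    int ret;

    assert(pkt != NULL && handler != NULL);
    ret = handler(pkt);
    if (ret < 0)
        toy_resubmit_target(pkt);
    return pkt->uses_after_free;
}
```

In this model `toy_deliver()` reports one use after free for the handler that returns -1 and none for the handler that returns 0.
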
      Signed-off-by: Paul Hüber <phueber@kernsp.in>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: provide correct dst in xfrm_neigh_lookup · 1ecc9ad0
      Julian Anastasov committed
      Fix xfrm_neigh_lookup to provide dst->path to the
      neigh_lookup dst_ops method.
      
      When skb is provided, the IP address in packet should already
      match the dst->path address family. But for the non-skb case,
      we should consider the last tunnel address as nexthop address.
      
      Fixes: f894cbf8 ("net: Add optional SKB arg to dst_ops->neigh_lookup().")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: do not overwrite status of action creation. · 37f1c63e
      Roman Mashak committed
      nla_memdup_cookie() was overwriting the err value, declared at function
      scope and earlier initialized with the result of ->init(). On success
      nla_memdup_cookie() returns 0, and thus the module refcnt was decremented,
      although the action was installed.
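
The pattern can be sketched in user space as follows. Everything here is a hypothetical model: the `toy_` functions, the `TOY_ACT_P_CREATED` value and the plain integer refcount are assumptions for illustration, not the tc action code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ACT_P_CREATED 1   /* assumed "new action was created" status */

struct toy_module { int refcnt; };

static int toy_init(void)          { return TOY_ACT_P_CREATED; }
static int toy_memdup_cookie(void) { return 0; }   /* 0 means success */

/* Buggy shape: the cookie helper's result overwrites the init status. */
static int toy_action_init_buggy(struct toy_module *mod, int has_cookie)
{
    int err;

    assert(mod != NULL);
    mod->refcnt++;                    /* reference taken on lookup */
    err = toy_init();
    if (has_cookie) {
        err = toy_memdup_cookie();    /* clobbers TOY_ACT_P_CREATED with 0 */
        if (err < 0) {
            mod->refcnt--;
            return err;
        }
    }
    if (err != TOY_ACT_P_CREATED)     /* wrongly true after the clobber */
        mod->refcnt--;
    return 0;
}

/* Fixed shape: the cookie result is checked without touching err. */
static int toy_action_init_fixed(struct toy_module *mod, int has_cookie)
{
    int err;

    assert(mod != NULL);
    mod->refcnt++;
    err = toy_init();
    if (has_cookie && toy_memdup_cookie() < 0) {
        mod->refcnt--;
        return -1;
    }
    if (err != TOY_ACT_P_CREATED)
        mod->refcnt--;
    return 0;
}
```

With a cookie supplied, the buggy variant leaves the refcount at 0 even though the action exists, while the fixed variant leaves it at 1.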
      
      $ sudo tc actions add action pass index 1 cookie 1234
      $ sudo tc actions ls action gact
      
              action order 0: gact action pass
               random type none pass val 0
               index 1 ref 1 bind 0
      $
      $ lsmod
      Module                  Size  Used by
      act_gact               16384  0
      ...
      $
      $ sudo rmmod act_gact
      [   52.310283] ------------[ cut here ]------------
      [   52.312551] WARNING: CPU: 1 PID: 455 at kernel/module.c:1113
      module_put+0x99/0xa0
      [   52.316278] Modules linked in: act_gact(-) crct10dif_pclmul crc32_pclmul
      ghash_clmulni_intel psmouse pcbc evbug aesni_intel aes_x86_64 crypto_simd
      serio_raw glue_helper pcspkr cryptd
      [   52.322285] CPU: 1 PID: 455 Comm: rmmod Not tainted 4.10.0+ #11
      [   52.324261] Call Trace:
      [   52.325132]  dump_stack+0x63/0x87
      [   52.326236]  __warn+0xd1/0xf0
      [   52.326260]  warn_slowpath_null+0x1d/0x20
      [   52.326260]  module_put+0x99/0xa0
      [   52.326260]  tcf_hashinfo_destroy+0x7f/0x90
      [   52.326260]  gact_exit_net+0x27/0x40 [act_gact]
      [   52.326260]  ops_exit_list.isra.6+0x38/0x60
      [   52.326260]  unregister_pernet_operations+0x90/0xe0
      [   52.326260]  unregister_pernet_subsys+0x21/0x30
      [   52.326260]  tcf_unregister_action+0x68/0xa0
      [   52.326260]  gact_cleanup_module+0x17/0xa0f [act_gact]
      [   52.326260]  SyS_delete_module+0x1ba/0x220
      [   52.326260]  entry_SYSCALL_64_fastpath+0x1e/0xad
      [   52.326260] RIP: 0033:0x7f527ffae367
      [   52.326260] RSP: 002b:00007ffeb402a598 EFLAGS: 00000202 ORIG_RAX:
      00000000000000b0
      [   52.326260] RAX: ffffffffffffffda RBX: 0000559b069912a0 RCX: 00007f527ffae367
      [   52.326260] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559b06991308
      [   52.326260] RBP: 0000000000000003 R08: 00007f5280264420 R09: 00007ffeb4029511
      [   52.326260] R10: 000000000000087b R11: 0000000000000202 R12: 00007ffeb4029580
      [   52.326260] R13: 0000000000000000 R14: 0000000000000000 R15: 0000559b069912a0
      [   52.354856] ---[ end trace 90d89401542b0db6 ]---
      $
      
      With the fix:
      
      $ sudo modprobe act_gact
      $ lsmod
      Module                  Size  Used by
      act_gact               16384  0
      ...
      $ sudo tc actions add action pass index 1 cookie 1234
      $ sudo tc actions ls action gact
      
              action order 0: gact action pass
               random type none pass val 0
               index 1 ref 1 bind 0
      $
      $ lsmod
      Module                  Size  Used by
      act_gact               16384  1
      ...
      $ sudo rmmod act_gact
      rmmod: ERROR: Module act_gact is in use
      $
      $ sudo /home/mrv/bin/tc actions del action gact index 1
      $ sudo rmmod act_gact
      $ lsmod
      Module                  Size  Used by
      $
      
      Fixes: 1045ba77 ("net sched actions: Add support for user cookies")
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Kernel calls get stuck in recvmsg · d7e15835
      David Howells committed
      Calls made through the in-kernel interface can end up getting stuck because
      of a missed variable update in a loop in rxrpc_recvmsg_data().  The problem
      is like this:
      
       (1) A new packet comes in and doesn't cause a notification to be given to
           the client as there's still another packet in the ring - the
           assumption being that the client will keep drawing off data until
           the ring is empty.
      
       (2) The client is in rxrpc_recvmsg_data(), inside the big while loop that
           iterates through the packets.  This copies the window pointers into
           variables rather than using the information in the call struct
           because:
      
           (a) MSG_PEEK might be in effect;
      
           (b) we need a barrier after reading call->rx_top to pair with the
           	 barrier in the softirq routine that loads the buffer.
      
       (3) The reading of call->rx_top is done outside of the loop, and top is
           never updated whilst we're in the loop.  This means that even though
           there's a new packet available, we don't see it and may return -EFAULT
           to the caller - who will happily return to the scheduler and await the
           next notification.
      
       (4) No further notifications are forthcoming until there's an abort as the
           ring isn't empty.
      
      The fix is to move the read of call->rx_top inside the loop - but it needs
      to be done before the condition is checked.
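
A single-threaded sketch of the difference, under stated assumptions: the `toy_` names are hypothetical, the "softirq" is a plain function call made while one packet is being processed, and sequence numbers are compared with `<=` (real sequence arithmetic has to cope with wraparound). C11 atomics stand in for the barrier pairing mentioned in (2)(b).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy receive ring: only the two sequence counters are modelled. */
struct toy_call {
    atomic_uint rx_top;      /* highest sequence number queued */
    unsigned int hard_ack;   /* highest sequence number consumed */
};

static void toy_call_init(struct toy_call *call, unsigned int queued)
{
    atomic_init(&call->rx_top, queued);
    call->hard_ack = 0;
}

/* Stand-in for the softirq side queueing one more packet. */
static void toy_queue_packet(struct toy_call *call)
{
    atomic_fetch_add_explicit(&call->rx_top, 1, memory_order_release);
}

/* Pre-fix shape: rx_top is sampled once, before the loop. */
static unsigned int toy_recv_stale(struct toy_call *call, unsigned int arrive_at)
{
    unsigned int consumed = 0, seq, top;

    assert(call != NULL);
    seq = call->hard_ack + 1;
    top = atomic_load_explicit(&call->rx_top, memory_order_acquire);
    while (seq <= top) {
        if (seq == arrive_at)
            toy_queue_packet(call);   /* new packet lands mid-loop */
        call->hard_ack = seq++;
        consumed++;
    }
    return consumed;
}

/* Post-fix shape: rx_top is re-read before every condition check. */
static unsigned int toy_recv_fresh(struct toy_call *call, unsigned int arrive_at)
{
    unsigned int consumed = 0, seq, top;

    assert(call != NULL);
    seq = call->hard_ack + 1;
    while (top = atomic_load_explicit(&call->rx_top, memory_order_acquire),
           seq <= top) {
        if (seq == arrive_at)
            toy_queue_packet(call);
        call->hard_ack = seq++;
        consumed++;
    }
    return consumed;
}
```

Starting with one queued packet and a second arriving while the first is processed, the stale variant consumes one packet and leaves the other stranded in the ring, whereas the fresh variant consumes both.
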
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: decrement module reference count after table flush. · edb9d1bf
      Roman Mashak committed
      When tc actions are loaded as a module and no actions have been installed,
      flushing them would result in the actions being removed from memory, but the
      module's reference count not being decremented, so that the module could not
      be unloaded.
      
      The following is an example with the GACT action:
      
      % sudo modprobe act_gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions ls action gact
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  1
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  2
      % sudo rmmod act_gact
      rmmod: ERROR: Module act_gact is in use
      ....
      
      After the fix:
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions add action pass index 1
      % sudo tc actions add action pass index 2
      % sudo tc actions add action pass index 3
      % lsmod
      Module                  Size  Used by
      act_gact               16384  3
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      % sudo rmmod act_gact
      % lsmod
      Module                  Size  Used by
      %
      
      Fixes: f97017cd ("net-sched: Fix actions flushing")
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt · 99253eb7
      Xin Long committed
      Commit 5e1859fb ("ipv4: ipmr: various fixes and cleanups") fixed
      the issue for ipv4 ipmr:
      
        ip_mroute_setsockopt() & ip_mroute_getsockopt() should not
        access/set raw_sk(sk)->ipmr_table before making sure the socket
        is a raw socket, and protocol is IGMP
      
      The same fix should be done for ipv6 ipmr as well.
      
      This patch fixes the panic caused when ip_mroute_setsockopt() is called
      on a socket of another type and overwrites whatever that socket keeps
      at the offset where raw_sk(sk) holds ipmr_table.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: set sin_port for addr param when checking duplicate address · 2e3ce5bc
      Xin Long committed
      Commit b8607805 ("sctp: not copying duplicate addrs to the assoc's
      bind address list") tried to check for duplicate address before copying
      to asoc's bind_addr list from global addr list.
      
      But all the addrs' sin_ports in global addr list are 0 while the addrs'
      sin_ports are bp->port in asoc's bind_addr list. It means even if it's
      a duplicate address, af->cmp_addr will still return 0 as their
      sin_ports are different.
      
      This patch is to fix it by setting the sin_port for addr param with
      bp->port before comparing the addrs.
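
A user-space sketch of the comparison problem, using ordinary `struct sockaddr_in` values. The `toy_` helpers are hypothetical, and `toy_cmp_addr()` only assumes what the text above states: that the comparison takes the port into account and returns 0 when the addresses differ.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 when family, port and address all match, else 0. */
static int toy_cmp_addr(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_family == b->sin_family &&
           a->sin_port == b->sin_port &&
           a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/* Builds an IPv4 socket address from host-order values. */
static struct sockaddr_in toy_make_addr(uint32_t host_order_ip, uint16_t port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(host_order_ip);
    return sa;
}

/*
 * Is `candidate` (taken from a list whose ports are all 0) already in
 * `bound`? With set_port non-zero, the candidate's port is filled in
 * from bound_port before comparing, which is the idea of the fix.
 */
static int toy_is_duplicate(struct sockaddr_in candidate,
                            const struct sockaddr_in *bound, size_t n,
                            uint16_t bound_port, int set_port)
{
    size_t i;

    assert(bound != NULL);
    if (set_port)
        candidate.sin_port = htons(bound_port);
    for (i = 0; i < n; i++)
        if (toy_cmp_addr(&candidate, &bound[i]))
            return 1;
    return 0;
}
```

For 127.0.0.1 bound on port 5000 and a candidate 127.0.0.1 with port 0, the duplicate is missed unless the port is set first.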
      
      Fixes: b8607805 ("sctp: not copying duplicate addrs to the assoc's bind address list")
      Reported-by: Wei Chen <weichen@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nft_set_bitmap: incorrect bitmap size · 13aa5a8f
      Pablo Neira Ayuso committed
      priv->bitmap_size should store the real bitmap size, not the size of the
      full struct nft_bitmap object.
      
      Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value. · 4b86c459
      Jarno Rajahalme committed
      Commit 4dee62b1 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
      returns void") inadvertently changed the successful return value of
      nf_ct_expect_related_report() from 0 to 1 due to
      __nf_ct_expect_check() returning 1 on success.  Prevent this
      regression in the future by changing the return value of
      __nf_ct_expect_check() to 0 on success.
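
A minimal sketch of how a helper's "1 on success" can leak out of its caller. The `toy_` names are hypothetical and the control flow is an assumption made for illustration, not the conntrack code: the caller simply passes the helper's non-negative result through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int toy_inserted;   /* counts successful insertions */

/* Old convention: 1 on success, negative errno on failure. */
static int toy_check_old(int slot_free) { return slot_free ? 1 : -EBUSY; }

/* New convention: 0 on success, negative errno on failure. */
static int toy_check_new(int slot_free) { return slot_free ? 0 : -EBUSY; }

/* Caller that returns whatever the check returned once it has inserted. */
static int toy_related_report(int (*check)(int), int slot_free)
{
    int ret;

    assert(check != NULL);
    ret = check(slot_free);
    if (ret < 0)
        return ret;
    toy_inserted++;
    return ret;   /* leaks 1 with the old check, 0 with the new one */
}
```

Any user of `toy_related_report()` that tests for a non-zero result would treat the old success value as an error, which is the kind of regression the changed return value guards against.
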
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipv4: mask tos for input route · 6e28099d
      Julian Anastasov committed
      Restore the lost masking of TOS in input route code to
      allow ip rules to match it properly.
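
A small sketch of why masking matters for rule matching. The mask values are written out locally as assumptions (0x1E for the TOS bits, with the low two bits additionally cleared for routing), and the `toy_` helpers are hypothetical rather than the routing code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask values, defined locally for the sketch. */
#define TOY_IPTOS_TOS_MASK 0x1E
#define TOY_IPTOS_RT_MASK  (TOY_IPTOS_TOS_MASK & ~3)

/* A rule TOS of 0 matches any flow; otherwise the values must be equal. */
static int toy_rule_matches(uint8_t rule_tos, uint8_t flow_tos)
{
    return rule_tos == 0 || rule_tos == flow_tos;
}

/* TOS value placed in the flow key, with or without masking. */
static uint8_t toy_input_flow_tos(uint8_t header_tos, int apply_mask)
{
    assert(apply_mask == 0 || apply_mask == 1);
    return apply_mask ? (uint8_t)(header_tos & TOY_IPTOS_RT_MASK) : header_tos;
}
```

With a rule for TOS 0x10 and a packet whose header byte is 0x12 (the same TOS with a low bit set), the unmasked value fails to match and the masked value matches.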
      
      Problem [1] noticed by Shmulik Ladkani <shmulik.ladkani@gmail.com>
      
      [1] http://marc.info/?t=137331755300040&r=1&w=2
      
      Fixes: 89aef892 ("ipv4: Delete routing cache.")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: add missing initialization for flowi4_uid · 8bcfd092
      Julian Anastasov committed
      Avoid matching of random stack value for uid when rules
      are looked up on input route or when RP filter is used.
      Problem should affect only setups that use ip rules with
      uid range.
      
      Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 25 Feb, 2017 13 commits
  5. 24 Feb, 2017 4 commits
  6. 23 Feb, 2017 4 commits
    • tcp: account for ts offset only if tsecr not zero · eee2faab
      Alexey Kodanev committed
      We can get a SYN with zero tsecr; don't apply the offset in this case.
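
The guard can be sketched as below. The `toy_` function names are hypothetical and the values are plain 32-bit integers; the point is only that subtracting an offset from a zero echo reply turns "no echo" into a large bogus timestamp.

```c
#include <assert.h>
#include <stdint.h>

/* Unguarded: always removes the per-connection offset. */
static uint32_t toy_adjust_tsecr_unguarded(uint32_t rcv_tsecr, uint32_t tsoffset)
{
    return rcv_tsecr - tsoffset;   /* wraps around when rcv_tsecr is 0 */
}

/* Guarded: a zero tsecr means "nothing echoed", so leave it alone. */
static uint32_t toy_adjust_tsecr(uint32_t rcv_tsecr, uint32_t tsoffset)
{
    if (rcv_tsecr)
        rcv_tsecr -= tsoffset;
    return rcv_tsecr;
}
```

For an offset of 1000, the guarded helper maps 1500 to 500 and keeps 0 as 0, while the unguarded one turns 0 into 4294966296.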
      
      Fixes: ee684b6f ("tcp: send packets with a socket timestamp")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: setup timestamp offset when write_seq already set · 00355fa5
      Alexey Kodanev committed
      Found that when randomized tcp offsets are enabled (by default), a
      TCP client can still start new connections without them. Later,
      if the server does an active close and re-uses sockets in TIME-WAIT
      state, a new SYN from the client can be rejected by the PAWS check inside
      tcp_timewait_state_process(), because either tw_ts_recent or
      rcv_tsval doesn't really have an offset set.
      
      Here is how to reproduce it with LTP netstress tool:
          netstress -R 1 &
          netstress -H 127.0.0.1 -lr 1000000 -a1
      
          [...]
          < S  seq 1956977072 win 43690 TS val 295618 ecr 459956970
          > .  ack 1956911535 win 342 TS val 459967184 ecr 1547117608
          < R  seq 1956911535 win 0 length 0
      +1. < S  seq 1956977072 win 43690 TS val 296640 ecr 459956970
          > S. seq 657450664 ack 1956977073 win 43690 TS val 459968205 ecr 296640
      
      Fixes: 95a22cae ("tcp: randomize tcp timestamp offsets for each connection")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/dccp: fix use after free in tw_timer_handler() · ec7cb62d
      Andrey Ryabinin committed
      DCCP doesn't purge timewait sockets on network namespace shutdown.
      So, after the net namespace is destroyed we could still have an active timer
      which will trigger a use after free in tw_timer_handler():
      
          BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10
          Read of size 8 by task swapper/1/0
          Call Trace:
           __asan_load8+0x54/0x90
           tw_timer_handler+0x4a/0xa0
           call_timer_fn+0x127/0x480
           expire_timers+0x1db/0x2e0
           run_timer_softirq+0x12f/0x2a0
           __do_softirq+0x105/0x5b4
           irq_exit+0xdd/0xf0
           smp_apic_timer_interrupt+0x57/0x70
           apic_timer_interrupt+0x90/0xa0
      
          Object at ffff88010e0d1bc0, in cache net_namespace size: 6848
          Allocated:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xee/0x180
           kasan_slab_alloc+0x12/0x20
           kmem_cache_alloc+0x134/0x310
           copy_net_ns+0x8d/0x280
           create_new_namespaces+0x23f/0x340
           unshare_nsproxy_namespaces+0x75/0xf0
           SyS_unshare+0x299/0x4f0
           entry_SYSCALL_64_fastpath+0x18/0xad
          Freed:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xae/0x180
           kmem_cache_free+0xb4/0x350
           net_drop_ns+0x3f/0x50
           cleanup_net+0x3df/0x450
           process_one_work+0x419/0xbb0
           worker_thread+0x92/0x850
           kthread+0x192/0x1e0
           ret_from_fork+0x2e/0x40
      
      Add an .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
      timewait sockets on net namespace destruction and prevent the above issue.
      
      Fixes: f2bf415c ("mib: add net to NET_ADD_STATS_BH")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: Avoid schedule while atomic in exit_net · 12d656af
      Ridge Kennedy committed
      While destroying a network namespace that contains an L2TP tunnel, a
      "BUG: scheduling while atomic" can be observed.
      
      Enabling lockdep shows that this is happening because l2tp_exit_net()
      is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from
      within an RCU critical section.
      
      l2tp_exit_net() takes rcu_read_lock_bh()
        << list_for_each_entry_rcu() >>
        l2tp_tunnel_delete()
          l2tp_tunnel_closeall()
            __l2tp_session_unhash()
              synchronize_rcu() << Illegal inside RCU critical section >>
      
      BUG: sleeping function called from invalid context
      in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2
      INFO: lockdep is turned off.
      CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G        W  O    4.4.6-at1 #2
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016
      Workqueue: netns cleanup_net
       0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0
       ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8
       0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024
      Call Trace:
       [<ffffffff812b0013>] dump_stack+0x85/0xc2
       [<ffffffff8107aee8>] ___might_sleep+0x148/0x240
       [<ffffffff8107b024>] __might_sleep+0x44/0x80
       [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0
       [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0
       [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40
       [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220
       [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220
       [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140
       [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60
       [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270
       [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270
       [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60
       [<ffffffff81501166>] cleanup_net+0x1b6/0x280
       ...
      
      This bug can easily be reproduced with a few steps:
      
       $ sudo unshare -n bash  # Create a shell in a new namespace
       # ip link set lo up
       # ip addr add 127.0.0.1 dev lo
       # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \
          peer_tunnel_id 1 udp_sport 50000 udp_dport 50000
       # ip l2tp add session name foo tunnel_id 1 session_id 1 \
          peer_session_id 1
       # ip link set foo up
       # exit  # Exit the shell, in turn exiting the namespace
       $ dmesg
       ...
       [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200
       ...
      
      To fix this, move the call to l2tp_tunnel_closeall() out of the RCU
      critical section, and instead call it from l2tp_tunnel_del_work(), which
      is running from the l2tp_wq workqueue.
      
      Fixes: 2b551c6e ("l2tp: close sessions before initiating tunnel delete")
      Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz>
      Acked-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 22 Feb, 2017 2 commits