1. 05 Oct, 2018 2 commits
    • net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Cong Wang committed
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      trouble:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.
      
      2. The chain refcnt is already 0, so we can't simply hold it again.
         We could check refcnt==1, but that is ugly.
      
      3. The chain with refcnt 0 is still visible in its block, which means
         it could still be found and used!
      
      4. The block has a refcnt too, and we can't hold it without
         introducing a proper API either.
      
      We could make it work, but the end result would be ugly. Instead of
      wasting time on reviewing it, let's just convert the troublesome
      spinlock to a mutex, which allows us to use non-atomic allocations too.
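
A toy model of problem 1 above, in Python rather than kernel C. It is only a sketch: the dictionary standing in for the chain, the `deque` standing in for the workqueue, and the `run_scenario` helper are all hypothetical and do not correspond to kernel APIs. It shows why deferring the cleanup lets the chain be destroyed before the work runs, whereas running the blocking cleanup inline under a sleepable lock does not.

```python
import threading
from collections import deque


def run_scenario(deferred):
    """Return what the cleanup observed when it finally ran."""
    events = []
    workqueue = deque()          # hypothetical deferred-work queue
    lock = threading.Lock()      # stands in for a sleepable mutex
    chain = {"refcnt": 1, "destroyed": False}

    def destroy_tmplt():
        # The cleanup still dereferences the chain.
        events.append("use-after-free" if chain["destroyed"] else "ok")

    chain["refcnt"] -= 1         # last reference is dropped
    if deferred:
        workqueue.append(destroy_tmplt)   # cleanup postponed
    else:
        with lock:                        # blocking is fine under a mutex
            destroy_tmplt()

    chain["destroyed"] = True    # the chain is destroyed right away
    while workqueue:             # deferred work only runs afterwards
        workqueue.popleft()()
    return events


print(run_scenario(deferred=True))
print(run_scenario(deferred=False))
```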
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95278dda
    • tc: Add support for configuring the taprio scheduler · 5a781ccb
      Vinicius Costa Gomes committed
      This traffic scheduler allows the states of traffic classes
      (transmission allowed/not allowed, in the simplest case) to be
      scheduled according to a pre-generated time sequence. This is the
      basis of the IEEE 802.1Qbv specification.
      
      Example configuration:
      
      tc qdisc replace dev enp3s0 parent root handle 100 taprio \
                num_tc 3 \
                map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
                queues 1@0 1@1 2@2 \
                base-time 1528743495910289987 \
                sched-entry S 01 300000 \
                sched-entry S 02 300000 \
                sched-entry S 04 300000 \
                clockid CLOCK_TAI
      
      The configuration format is similar to mqprio. The main difference is
      the presence of a schedule, built from multiple "sched-entry"
      definitions. Each entry has the following format:
      
           sched-entry <CMD> <GATE MASK> <INTERVAL>
      
      The only supported <CMD> is "S", which means "SetGateStates",
      following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
      is a bitmask where each bit is associated with a traffic class, so
      bit 0 (the least significant bit) being "on" means that traffic class
      0 is "active" for that schedule entry. <INTERVAL> is a time duration
      in nanoseconds that specifies how long the state defined by <CMD>
      and <GATE MASK> should be held before moving to the next entry.
      
      This schedule is circular, that is, after the last entry is executed
      it starts from the first one, indefinitely.
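
A minimal sketch of how the entries of the example configuration could be interpreted. The helper names (`parse_entry`, `open_classes`, `entry_at`) are made up for illustration, and the sketch assumes the gate mask is written in hexadecimal; it is not the qdisc's implementation.

```python
def parse_entry(text):
    """Parse 'sched-entry S 01 300000' into (cmd, gate_mask, interval_ns)."""
    keyword, cmd, mask, interval = text.split()
    if keyword != "sched-entry" or cmd != "S":
        raise ValueError("unsupported entry: " + text)
    # Assumption: the mask is hexadecimal, the interval is in nanoseconds.
    return cmd, int(mask, 16), int(interval)


def open_classes(gate_mask, num_tc):
    """Traffic classes whose bit is set in the gate mask (bit 0 = class 0)."""
    return [tc for tc in range(num_tc) if gate_mask & (1 << tc)]


def entry_at(entries, offset_ns):
    """Entry in effect offset_ns after the schedule started (circular)."""
    cycle_time = sum(interval for _, _, interval in entries)
    offset = offset_ns % cycle_time      # wrap around after the last entry
    for entry in entries:
        if offset < entry[2]:
            return entry
        offset -= entry[2]


entries = [parse_entry(line) for line in (
    "sched-entry S 01 300000",
    "sched-entry S 02 300000",
    "sched-entry S 04 300000",
)]
for offset in (0, 300000, 899999, 900000):
    _, mask, _ = entry_at(entries, offset)
    print(offset, open_classes(mask, num_tc=3))
```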
      
      The other parameters can be defined as follows:
      
       - base-time: specifies the instant when the schedule starts; if
         'base-time' is a time in the past, the schedule will start at
      
              base-time + (N * cycle-time)
      
         where N is the smallest integer such that the resulting time is
         greater than "now", and "cycle-time" is the sum of all the
         intervals of the entries in the schedule;
      
       - clockid: specifies the reference clock to be used;
      
      The parameters should be similar to what the IEEE 802.1Q family of
      specifications defines.
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
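
The base-time rule described above can be worked through with a short sketch. `schedule_start` is a hypothetical helper, not a kernel or tc function, and treating a base-time equal to "now" as being in the past is an assumption of this sketch.

```python
def schedule_start(base_time, now, intervals):
    """Start instant: base_time, or the next cycle boundary after now."""
    cycle_time = sum(intervals)          # sum of all entry intervals
    if base_time > now:
        return base_time                 # base-time is still in the future
    # Smallest integer N with base_time + N * cycle_time > now.
    n = (now - base_time) // cycle_time + 1
    return base_time + n * cycle_time


# Values from the example configuration; "now" is an arbitrary later instant.
base = 1528743495910289987
intervals = [300000, 300000, 300000]
now = base + 2_000_000
print(schedule_start(base, now, intervals) - base)
```

With a 900000 ns cycle and "now" 2000000 ns after base-time, the start lands on the third cycle boundary.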
      5a781ccb
  2. 02 Oct, 2018 3 commits
  3. 29 Sep, 2018 1 commit
  4. 26 Sep, 2018 8 commits
  5. 25 Sep, 2018 1 commit
  6. 22 Sep, 2018 3 commits
    • net_sched: sch_fq: remove dead code dealing with retransmits · 90caf67b
      Eric Dumazet committed
      With the earliest departure time model, we no longer plan
      to special-case TCP retransmits. We therefore remove dead
      code (since most compilers understood skb_is_retransmit()
      was false).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90caf67b
    • tcp: switch tcp and sch_fq to new earliest departure time model · ab408b6d
      Eric Dumazet committed
      TCP keeps track of tcp_wstamp_ns by itself, meaning sch_fq
      no longer has to do it.
      
      Thanks to this model, TCP can get more accurate RTT samples,
      since pacing no longer inflates them.
      
      This has the nice effect of removing some delays caused by the FQ
      quantum mechanism, which inflated max/P99 latencies.
      
      Also, we might relax the tight TCP Small Queue limits in the future,
      since this new model allows TCP to build bigger batches, as
      sch_fq (or a device with earliest departure time offload) ensures
      these packets will be delivered on time.
      
      Note that other protocols are not converted (they probably never
      will be), so sch_fq still has support for SO_MAX_PACING_RATE.
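
As a rough illustration of the earliest departure time idea, the sketch below has the sender stamp each packet with the time at which it may leave, so that the scheduler only has to compare stamps with the clock. The function names and the bytes-per-second rate parameter are assumptions made for this example; they are not the kernel's data structures.

```python
NSEC_PER_SEC = 1_000_000_000


def assign_departure_times(packet_sizes, rate_bytes_per_sec, now_ns):
    """Sender side: stamp each packet with its earliest departure time."""
    next_departure = now_ns
    stamps = []
    for size in packet_sizes:
        stamps.append(next_departure)
        # The next packet may leave once this one has been paced out.
        next_departure += size * NSEC_PER_SEC // rate_bytes_per_sec
    return stamps


def releasable(stamps, now_ns):
    """Scheduler side: indexes of packets whose departure time has come."""
    return [i for i, stamp in enumerate(stamps) if stamp <= now_ns]


stamps = assign_departure_times([1250, 1250, 1250], 1_250_000, now_ns=0)
print(stamps)
print(releasable(stamps, now_ns=1_000_000))
```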
      
      Tested:
      
      Test showing FQ pacing quantum artifact for low-rate flows,
      adding unexpected throttles for RPC flows, inflating max and P99 latencies.
      
      The parameters chosen here show what typically happens when
      a TCP flow has a reduced pacing rate (this can be caused by a reduced
      cwin after a few losses, and/or an rtt above a few ms).
      
      MIBS="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY"
      Before :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
       Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      19,82.78,5279,3825,482.02
      
      After :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
      Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      20,49.94,128,63,3.18
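
To put the two result rows side by side, the following sketch parses the CSV lines quoted above and computes a before/after ratio per column. Only the header and the two data rows come from the test output; the helper names are illustrative.

```python
import csv
import io

HEADER = ("Minimum Latency Microseconds,Mean Latency Microseconds,"
          "Maximum Latency Microseconds,99th Percentile Latency Microseconds,"
          "Stddev Latency Microseconds")
BEFORE = "19,82.78,5279,3825,482.02"
AFTER = "20,49.94,128,63,3.18"


def parse(row):
    """Map each column name to its numeric value for one result row."""
    names, values = list(csv.reader(io.StringIO(HEADER + "\n" + row)))
    return {name.strip(): float(value) for name, value in zip(names, values)}


def improvement(before, after):
    """Ratio before/after per column (above 1 means latency went down)."""
    return {name: before[name] / after[name] for name in before}


ratios = improvement(parse(BEFORE), parse(AFTER))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```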
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab408b6d
    • net_sched: sch_fq: switch to CLOCK_TAI · 142537e4
      Eric Dumazet committed
      TCP will soon provide a per-skb earliest departure time in
      skb->tstamp, so that sch_fq does not have to determine the departure
      time by looking at the socket sk_pacing_rate.
      
      In linux-4.19 we chose CLOCK_TAI as the clock base for transports,
      qdiscs, and NIC offloads.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      142537e4
  7. 21 Sep, 2018 1 commit
    • net_sched: change tcf_del_walker() to take idrinfo->lock · ec3ed293
      Vlad Buslov committed
      The action API was changed to work with actions and action_idr in a
      concurrency-safe manner; however, tcf_del_walker() still uses actions
      without taking a reference or idrinfo->lock first, and deletes them
      directly, disregarding a possible concurrent delete.
      
      Change tcf_del_walker() to take idrinfo->lock while iterating over actions
      and use the new tcf_idr_release_unsafe() to release them while holding
      the lock.
      
      And the blocking function fl_hw_destroy_tmplt() could be called when we
      put a filter chain, so defer it to a work queue.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      [xiyou.wangcong@gmail.com: heavily modify the code and changelog]
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec3ed293
  8. 20 Sep, 2018 1 commit
  9. 17 Sep, 2018 2 commits
  10. 14 Sep, 2018 2 commits
    • net/sched: act_sample: fix NULL dereference in the data path · 34043d25
      Davide Caratti committed
      Matteo reported the following splat, testing the datapath of TC 'sample':
      
       BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310
       Read of size 8 at addr 0000000000000000 by task nc/433
      
       CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
       Call Trace:
        kasan_report.cold.6+0x6c/0x2fa
        tcf_sample_act+0xc4/0x310
        ? dev_hard_start_xmit+0x117/0x180
        tcf_action_exec+0xa3/0x160
        tcf_classify+0xdd/0x1d0
        htb_enqueue+0x18e/0x6b0
        ? deref_stack_reg+0x7a/0xb0
        ? htb_delete+0x4b0/0x4b0
        ? unwind_next_frame+0x819/0x8f0
        ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
        __dev_queue_xmit+0x722/0xca0
        ? unwind_get_return_address_ptr+0x50/0x50
        ? netdev_pick_tx+0xe0/0xe0
        ? save_stack+0x8c/0xb0
        ? kasan_kmalloc+0xbe/0xd0
        ? __kmalloc_track_caller+0xe4/0x1c0
        ? __kmalloc_reserve.isra.45+0x24/0x70
        ? __alloc_skb+0xdd/0x2e0
        ? sk_stream_alloc_skb+0x91/0x3b0
        ? tcp_sendmsg_locked+0x71b/0x15a0
        ? tcp_sendmsg+0x22/0x40
        ? __sys_sendto+0x1b0/0x250
        ? __x64_sys_sendto+0x6f/0x80
        ? do_syscall_64+0x5d/0x150
        ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ? __sys_sendto+0x1b0/0x250
        ? __x64_sys_sendto+0x6f/0x80
        ? do_syscall_64+0x5d/0x150
        ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ip_finish_output2+0x495/0x590
        ? ip_copy_metadata+0x2e0/0x2e0
        ? skb_gso_validate_network_len+0x6f/0x110
        ? ip_finish_output+0x174/0x280
        __tcp_transmit_skb+0xb17/0x12b0
        ? __tcp_select_window+0x380/0x380
        tcp_write_xmit+0x913/0x1de0
        ? __sk_mem_schedule+0x50/0x80
        tcp_sendmsg_locked+0x49d/0x15a0
        ? tcp_rcv_established+0x8da/0xa30
        ? tcp_set_state+0x220/0x220
        ? clear_user+0x1f/0x50
        ? iov_iter_zero+0x1ae/0x590
        ? __fget_light+0xa0/0xe0
        tcp_sendmsg+0x22/0x40
        __sys_sendto+0x1b0/0x250
        ? __ia32_sys_getpeername+0x40/0x40
        ? _copy_to_user+0x58/0x70
        ? poll_select_copy_remaining+0x176/0x200
        ? __pollwait+0x1c0/0x1c0
        ? ktime_get_ts64+0x11f/0x140
        ? kern_select+0x108/0x150
        ? core_sys_select+0x360/0x360
        ? vfs_read+0x127/0x150
        ? kernel_write+0x90/0x90
        __x64_sys_sendto+0x6f/0x80
        do_syscall_64+0x5d/0x150
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fefef2b129d
       Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
       RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d
       RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003
       RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
       R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8
      
      tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
      forgot to allocate them, because tcf_idr_create() was called with a wrong
      value of 'cpustats'. Setting it to true proved to fix the reported crash.
      Reported-by: Matteo Croce <mcroce@redhat.com>
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34043d25
    • net_sched: notify filter deletion when deleting a chain · f5b9bac7
      Cong Wang committed
      When we delete a chain of filters, we need to notify
      user-space that we are deleting each filter in this chain
      too.
      
      Fixes: 32a4f5ec ("net: sched: introduce chain object to uapi")
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f5b9bac7
  11. 11 Sep, 2018 6 commits
  12. 09 Sep, 2018 2 commits
  13. 08 Sep, 2018 1 commit
  14. 06 Sep, 2018 1 commit
    • net/sched: fix memory leak in act_tunnel_key_init() · ee28bb56
      Davide Caratti committed
      If users try to install act_tunnel_key 'set' rules with duplicate values
      of 'index', the tunnel metadata are allocated, but never released. Then,
      kmemleak complains as follows:
      
       # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
       # echo clear > /sys/kernel/debug/kmemleak
       # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
       Error: TC IDR already exists.
       We have an error talking to the kernel
       # echo scan > /sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
       unreferenced object 0xffff8800574e6c80 (size 256):
         comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
         hex dump (first 32 bytes):
           00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff  ................
           81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00  .$..............
         backtrace:
           [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
           [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
           [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
           [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
           [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
           [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
           [<00000000edab656f>] netlink_unicast+0x40f/0x5d0
           [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
           [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
           [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
           [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
           [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
           [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
           [<00000000fac1b476>] 0xffffffffffffffff
      
      This problem can theoretically also happen when users attempt to set
      up a geneve rule with wrong configuration data, or when the kernel
      fails to allocate 'params_new'. Ensure that tunnel_key_init() also
      releases the tunnel metadata in the above conditions.
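
The leak pattern can be modelled outside the kernel with a small sketch: a resource is allocated first, a later check fails, and the error path has to release what was allocated. The `Metadata` class, the `installed` dictionary and the `release_on_error` switch are hypothetical stand-ins chosen for illustration, not the kernel code.

```python
class Metadata:
    """Toy stand-in for the tunnel metadata allocation."""
    live = 0  # objects allocated and not yet released

    def __init__(self):
        Metadata.live += 1

    def release(self):
        Metadata.live -= 1


def tunnel_key_init(index, installed, release_on_error):
    metadata = Metadata()            # allocated before the index check
    if index in installed:
        if release_on_error:
            metadata.release()       # release on the error path
        raise FileExistsError("TC IDR already exists.")
    installed[index] = metadata


def live_after_duplicate(release_on_error):
    """Install index 111 twice and count the metadata objects left alive."""
    Metadata.live = 0
    installed = {}
    tunnel_key_init(111, installed, release_on_error)
    try:
        tunnel_key_init(111, installed, release_on_error)
    except FileExistsError:
        pass
    return Metadata.live


print("without release:", live_after_duplicate(False))
print("with release:", live_after_duplicate(True))
```

Without the release, two objects stay alive although only one rule is installed; with it, the count matches the single installed rule.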
      
      Addresses-Coverity-ID: 1373974 ("Resource leak")
      Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key")
      Fixes: 0ed5269f ("net/sched: add tunnel option support to act_tunnel_key")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee28bb56
  15. 05 Sep, 2018 2 commits
    • net: sched: action_ife: take reference to meta module · 84cb8eb2
      Vlad Buslov committed
      A recent refactoring of add_metainfo() caused use_all_metadata() to add
      metainfo to the ife action metalist without taking a reference to the
      module. This causes a warning in module_put() called from the ife action
      cleanup function.
      
      Implement an add_metainfo_and_get_ops() function that returns with a
      reference to the module taken if the metainfo was added successfully, and
      call it from use_all_metadata() instead of calling __add_metainfo() directly.
      
      Example warning:
      
      [  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
      [  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
      [  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
      [  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  646.440595] RIP: 0010:module_put+0x1cb/0x230
      [  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
      [  646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
      [  646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
      [  646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
      [  646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
      [  646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
      [  646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
      [  646.508213] FS:  00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
      [  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
      [  646.530595] Call Trace:
      [  646.533408]  ? find_symbol_in_section+0x260/0x260
      [  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
      [  646.543695]  tcf_action_cleanup+0x29/0xa0
      [  646.548078]  __tcf_action_put+0x5a/0xb0
      [  646.552289]  ? nla_put+0x65/0xe0
      [  646.555889]  __tcf_idr_release+0x48/0x60
      [  646.560187]  tcf_generic_walker+0x448/0x6b0
      [  646.564764]  ? tcf_action_dump_1+0x450/0x450
      [  646.569411]  ? __lock_is_held+0x84/0x110
      [  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
      [  646.578982]  tca_action_gd+0x972/0xc40
      [  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
      [  646.588471]  ? mark_lock+0xcf/0x980
      [  646.592324]  ? check_chain_key+0x140/0x1f0
      [  646.596832]  ? debug_show_all_locks+0x240/0x240
      [  646.601839]  ? memset+0x1f/0x40
      [  646.605350]  ? nla_parse+0xca/0x1a0
      [  646.609217]  tc_ctl_action+0x215/0x230
      [  646.613339]  ? tcf_action_add+0x220/0x220
      [  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.626466]  netlink_rcv_skb+0x18d/0x200
      [  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.634959]  ? netlink_ack+0x500/0x500
      [  646.639106]  netlink_unicast+0x2d0/0x370
      [  646.643409]  ? netlink_attachskb+0x340/0x340
      [  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
      [  646.652870]  ? import_iovec+0x11e/0x1c0
      [  646.657083]  netlink_sendmsg+0x3b9/0x6a0
      [  646.661388]  ? netlink_unicast+0x370/0x370
      [  646.665877]  ? netlink_unicast+0x370/0x370
      [  646.670351]  sock_sendmsg+0x6b/0x80
      [  646.674212]  ___sys_sendmsg+0x4a1/0x520
      [  646.678443]  ? copy_msghdr_from_user+0x210/0x210
      [  646.683463]  ? lock_downgrade+0x320/0x320
      [  646.687849]  ? debug_show_all_locks+0x240/0x240
      [  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
      [  646.697418]  ? _raw_spin_unlock+0x24/0x30
      [  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
      [  646.706619]  ? __pmd_alloc+0x320/0x320
      [  646.710738]  ? debug_show_all_locks+0x240/0x240
      [  646.715649]  ? restore_nameidata+0x7b/0xa0
      [  646.720117]  ? check_chain_key+0x140/0x1f0
      [  646.724590]  ? check_chain_key+0x140/0x1f0
      [  646.729070]  ? __fget_light+0xbc/0xd0
      [  646.733121]  ? __sys_sendmsg+0xd7/0x150
      [  646.737329]  __sys_sendmsg+0xd7/0x150
      [  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
      [  646.746003]  ? up_read+0x53/0x90
      [  646.749601]  ? __do_page_fault+0x484/0x780
      [  646.754105]  ? do_syscall_64+0x1e/0x2c0
      [  646.758320]  do_syscall_64+0x72/0x2c0
      [  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  646.767776] RIP: 0033:0x7f4163872150
      [  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
      [  646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
      [  646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
      [  646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
      [  646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
      [  646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
      [  646.837360] irq event stamp: 6083
      [  646.841043] hardirqs last  enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500
      [  646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  646.859775] softirqs last  enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
      [  646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
      [  646.878845] ---[ end trace b1b8c12ffe51e657 ]---
      
      Fixes: 5ffe57da ("act_ife: fix a potential deadlock")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84cb8eb2
    • act_ife: fix a potential use-after-free · 6d784f16
      Cong Wang committed
      Immediately after module_put(), a user could remove this
      module, so e->ops could already be freed before we call
      e->ops->release().
      
      Fix this by moving module_put() after ops->release().
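
The ordering issue can be shown with a toy model in which dropping the last module reference makes the module's ops unusable at once. The `Module` and `Ops` classes and both cleanup functions are hypothetical illustrations of the ordering, not kernel interfaces.

```python
class Module:
    """Toy module: its code goes away once the last reference is dropped."""

    def __init__(self):
        self.refcnt = 1
        self.loaded = True

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.loaded = False   # model: the user removes it immediately


class Ops:
    def __init__(self, owner):
        self.owner = owner

    def release(self):
        if not self.owner.loaded:
            raise RuntimeError("use-after-free: ops of a removed module")
        return "released"


def cleanup_put_first(ops):
    ops.owner.put()               # reference dropped first...
    return ops.release()          # ...so ops may already be gone


def cleanup_release_first(ops):
    result = ops.release()        # call into the module while it is pinned
    ops.owner.put()               # only then drop the reference
    return result


print(cleanup_release_first(Ops(Module())))
try:
    cleanup_put_first(Ops(Module()))
except RuntimeError as err:
    print(err)
```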
      
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d784f16
  16. 04 Sep, 2018 1 commit
    • net: sched: null actions array pointer before releasing action · c10bbfae
      Vlad Buslov committed
      Currently, tcf_action_delete() nulls the actions array pointer after
      putting and deleting the action. However, if tcf_idr_delete_index()
      returns an error, the pointer to the action is not set to null. This
      results in it being released a second time in the error handling code
      of tca_action_gd().
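
A simplified model of the double release, assuming a delete step that can fail after the action has already been put. The `Action` class, `put_many` and `action_gd` are hypothetical helpers written for this sketch and only mirror the shape of the problem.

```python
class Action:
    def __init__(self, name):
        self.name = name
        self.freed = False

    def put(self):
        if self.freed:
            raise RuntimeError("double release of " + self.name)
        self.freed = True


def put_many(actions):
    """Error handling: release every action still present in the array."""
    for action in actions:
        if action is not None:
            action.put()


def action_gd(names, failing, null_first):
    actions = [Action(name) for name in names]
    try:
        for i, action in enumerate(actions):
            if null_first:
                actions[i] = None     # clear the slot before releasing
            action.put()
            if action.name == failing:
                raise LookupError(action.name)   # delete step fails
            actions[i] = None         # only reached when delete succeeds
    except LookupError:
        put_many(actions)             # must not see the released action
        return "delete failed, cleanup ok"
    return "deleted"


print(action_gd(["a", "b"], failing="b", null_first=True))
try:
    action_gd(["a", "b"], failing="b", null_first=False)
except RuntimeError as err:
    print(err)
```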
      
      Kasan error:
      
      [  807.367755] ==================================================================
      [  807.375844] BUG: KASAN: use-after-free in tc_setup_cb_call+0x14e/0x250
      [  807.382763] Read of size 8 at addr ffff88033e636000 by task tc/2732
      
      [  807.391289] CPU: 0 PID: 2732 Comm: tc Tainted: G        W         4.19.0-rc1+ #799
      [  807.399542] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  807.407948] Call Trace:
      [  807.410763]  dump_stack+0x92/0xeb
      [  807.414456]  print_address_description+0x70/0x360
      [  807.419549]  kasan_report+0x14d/0x300
      [  807.423582]  ? tc_setup_cb_call+0x14e/0x250
      [  807.428150]  tc_setup_cb_call+0x14e/0x250
      [  807.432539]  ? nla_put+0x65/0xe0
      [  807.436146]  fl_dump+0x394/0x3f0 [cls_flower]
      [  807.440890]  ? fl_tmplt_dump+0x140/0x140 [cls_flower]
      [  807.446327]  ? lock_downgrade+0x320/0x320
      [  807.450702]  ? lock_acquire+0xe2/0x220
      [  807.454819]  ? is_bpf_text_address+0x5/0x140
      [  807.459475]  ? memcpy+0x34/0x50
      [  807.462980]  ? nla_put+0x65/0xe0
      [  807.466582]  tcf_fill_node+0x341/0x430
      [  807.470717]  ? tcf_block_put+0xe0/0xe0
      [  807.474859]  tcf_node_dump+0xdb/0xf0
      [  807.478821]  fl_walk+0x8e/0x170 [cls_flower]
      [  807.483474]  tcf_chain_dump+0x35a/0x4d0
      [  807.487703]  ? tfilter_notify+0x170/0x170
      [  807.492091]  ? tcf_fill_node+0x430/0x430
      [  807.496411]  tc_dump_tfilter+0x362/0x3f0
      [  807.500712]  ? tc_del_tfilter+0x850/0x850
      [  807.505104]  ? kasan_unpoison_shadow+0x30/0x40
      [  807.509940]  ? __mutex_unlock_slowpath+0xcf/0x410
      [  807.515031]  netlink_dump+0x263/0x4f0
      [  807.519077]  __netlink_dump_start+0x2a0/0x300
      [  807.523817]  ? tc_del_tfilter+0x850/0x850
      [  807.528198]  rtnetlink_rcv_msg+0x46a/0x6d0
      [  807.532671]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  807.536878]  ? tc_del_tfilter+0x850/0x850
      [  807.541280]  netlink_rcv_skb+0x18d/0x200
      [  807.545570]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  807.549773]  ? netlink_ack+0x500/0x500
      [  807.553913]  netlink_unicast+0x2d0/0x370
      [  807.558212]  ? netlink_attachskb+0x340/0x340
      [  807.562855]  ? _copy_from_iter_full+0xe9/0x3e0
      [  807.567677]  ? import_iovec+0x11e/0x1c0
      [  807.571890]  netlink_sendmsg+0x3b9/0x6a0
      [  807.576192]  ? netlink_unicast+0x370/0x370
      [  807.580684]  ? netlink_unicast+0x370/0x370
      [  807.585154]  sock_sendmsg+0x6b/0x80
      [  807.589015]  ___sys_sendmsg+0x4a1/0x520
      [  807.593230]  ? copy_msghdr_from_user+0x210/0x210
      [  807.598232]  ? do_wp_page+0x174/0x880
      [  807.602276]  ? __handle_mm_fault+0x749/0x1c10
      [  807.607021]  ? __handle_mm_fault+0x1046/0x1c10
      [  807.611849]  ? __pmd_alloc+0x320/0x320
      [  807.615973]  ? check_chain_key+0x140/0x1f0
      [  807.620450]  ? check_chain_key+0x140/0x1f0
      [  807.624929]  ? __fget_light+0xbc/0xd0
      [  807.628970]  ? __sys_sendmsg+0xd7/0x150
      [  807.633172]  __sys_sendmsg+0xd7/0x150
      [  807.637201]  ? __ia32_sys_shutdown+0x30/0x30
      [  807.641846]  ? up_read+0x53/0x90
      [  807.645442]  ? __do_page_fault+0x484/0x780
      [  807.649949]  ? do_syscall_64+0x1e/0x2c0
      [  807.654164]  do_syscall_64+0x72/0x2c0
      [  807.658198]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  807.663625] RIP: 0033:0x7f42e9870150
      [  807.667568] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
      [  807.687328] RSP: 002b:00007ffdbf595b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  807.695564] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f42e9870150
      [  807.703083] RDX: 0000000000000000 RSI: 00007ffdbf595b80 RDI: 0000000000000003
      [  807.710605] RBP: 00007ffdbf599d90 R08: 0000000000679bc0 R09: 000000000000000f
      [  807.718127] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdbf599d88
      [  807.725651] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      [  807.735048] Allocated by task 2687:
      [  807.738902]  kasan_kmalloc+0xa0/0xd0
      [  807.742852]  __kmalloc+0x118/0x2d0
      [  807.746615]  tcf_idr_create+0x44/0x320
      [  807.750738]  tcf_nat_init+0x41e/0x530 [act_nat]
      [  807.755638]  tcf_action_init_1+0x4e0/0x650
      [  807.760104]  tcf_action_init+0x1ce/0x2d0
      [  807.764395]  tcf_exts_validate+0x1d8/0x200
      [  807.768861]  fl_change+0x55a/0x26b4 [cls_flower]
      [  807.773845]  tc_new_tfilter+0x748/0xa20
      [  807.778051]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  807.782517]  netlink_rcv_skb+0x18d/0x200
      [  807.786804]  netlink_unicast+0x2d0/0x370
      [  807.791095]  netlink_sendmsg+0x3b9/0x6a0
      [  807.795387]  sock_sendmsg+0x6b/0x80
      [  807.799240]  ___sys_sendmsg+0x4a1/0x520
      [  807.803445]  __sys_sendmsg+0xd7/0x150
      [  807.807473]  do_syscall_64+0x72/0x2c0
      [  807.811506]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  807.818776] Freed by task 2728:
      [  807.822283]  __kasan_slab_free+0x122/0x180
      [  807.826752]  kfree+0xf4/0x2f0
      [  807.830080]  __tcf_action_put+0x5a/0xb0
      [  807.834281]  tcf_action_put_many+0x46/0x70
      [  807.838747]  tca_action_gd+0x232/0xc40
      [  807.842862]  tc_ctl_action+0x215/0x230
      [  807.846977]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  807.851444]  netlink_rcv_skb+0x18d/0x200
      [  807.855731]  netlink_unicast+0x2d0/0x370
      [  807.860021]  netlink_sendmsg+0x3b9/0x6a0
      [  807.864312]  sock_sendmsg+0x6b/0x80
      [  807.868166]  ___sys_sendmsg+0x4a1/0x520
      [  807.872372]  __sys_sendmsg+0xd7/0x150
      [  807.876401]  do_syscall_64+0x72/0x2c0
      [  807.880431]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  807.887704] The buggy address belongs to the object at ffff88033e636000
                      which belongs to the cache kmalloc-256 of size 256
      [  807.900909] The buggy address is located 0 bytes inside of
                      256-byte region [ffff88033e636000, ffff88033e636100)
      [  807.913155] The buggy address belongs to the page:
      [  807.918322] page:ffffea000cf98d80 count:1 mapcount:0 mapping:ffff88036f80ee00 index:0x0 compound_mapcount: 0
      [  807.928831] flags: 0x5fff8000008100(slab|head)
      [  807.933647] raw: 005fff8000008100 ffffea000db44f00 0000000400000004 ffff88036f80ee00
      [  807.942050] raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
      [  807.950456] page dumped because: kasan: bad access detected
      
      [  807.958240] Memory state around the buggy address:
      [  807.963405]  ffff88033e635f00: fc fc fc fc fb fb fb fb fb fb fb fc fc fc fc fb
      [  807.971288]  ffff88033e635f80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      [  807.979166] >ffff88033e636000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  807.994882]                    ^
      [  807.998477]  ffff88033e636080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  808.006352]  ffff88033e636100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      [  808.014230] ==================================================================
      [  808.022108] Disabling lock debugging due to kernel taint
      
      Fixes: edfaf94f ("net_sched: improve and refactor tcf_action_put_many()")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c10bbfae
  17. 01 Sep, 2018 2 commits
  18. 30 Aug, 2018 1 commit