1. 06 8月, 2019 1 次提交
    • D
      net: sched: use temporary variable for actions indexes · 7be8ef2c
      Dmytro Linkin 提交于
      Currently init call of all actions (except ipt) init their 'parm'
      structure as a direct pointer to nla data in skb. This leads to race
      condition when some of the filter actions were initialized successfully
      (and were assigned with idr action index that was written directly
      into nla data), but then were deleted and retried (due to following
      action module missing or classifier-initiated retry), in which case
      action init code tries to insert action to idr with index that was
      assigned on previous iteration. During retry the index can be reused
      by another action that was inserted concurrently, which causes
      unintended action sharing between filters.
      To fix described race condition, save action idr index to temporary
      stack-allocated variable instead on nla data.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: NDmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7be8ef2c
  2. 31 5月, 2019 1 次提交
  3. 28 4月, 2019 1 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
  4. 22 3月, 2019 2 次提交
    • D
      net/sched: act_csum: validate the control action inside init() · f5c29d83
      Davide Caratti 提交于
      the following script:
      
       # tc qdisc add dev crash0 clsact
       # tc filter add dev crash0 egress matchall action csum icmp pass index 90
       # tc actions replace action csum icmp goto chain 42 index 90 \
       > cookie c1a0c1a0
       # tc actions show action csum
      
      had the following output:
      
      Error: Failed to init TC action chain.
      We have an error talking to the kernel
      total acts 1
      
              action order 0: csum (icmp) action goto chain 42
              index 90 ref 2 bind 1
              cookie c1a0c1a0
      
      Then, the first packet transmitted by crash0 made the kernel crash:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       #PF error: [normal kernel read fault]
       PGD 8000000074692067 P4D 8000000074692067 PUD 2e210067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc4.gotochain_crash+ #533
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffff93153da03be0 EFLAGS: 00010246
       RAX: 000000002000002a RBX: ffff9314ee40f700 RCX: 0000000000003a00
       RDX: 0000000000000000 RSI: ffff931537c87828 RDI: ffff931537c87818
       RBP: ffff93153da03c80 R08: 00000000527cffff R09: 0000000000000003
       R10: 000000000000003f R11: 0000000000000028 R12: ffff9314edf68400
       R13: ffff9314edf68408 R14: 0000000000000001 R15: ffff9314ed67b600
       FS:  0000000000000000(0000) GS:ffff93153da00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000073e32003 CR4: 00000000001606f0
       Call Trace:
        <IRQ>
        tcf_classify+0x58/0x120
        __dev_queue_xmit+0x40a/0x890
        ? ip6_finish_output2+0x369/0x590
        ip6_finish_output2+0x369/0x590
        ? ip6_output+0x68/0x110
        ip6_output+0x68/0x110
        ? nf_hook.constprop.35+0x79/0xc0
        mld_sendpack+0x16f/0x220
        mld_ifc_timer_expire+0x195/0x2c0
        ? igmp6_timer_handler+0x70/0x70
        call_timer_fn+0x2b/0x130
        run_timer_softirq+0x3e8/0x440
        ? tick_sched_timer+0x37/0x70
        __do_softirq+0xe3/0x2f5
        irq_exit+0xf0/0x100
        smp_apic_timer_interrupt+0x6c/0x130
        apic_timer_interrupt+0xf/0x20
        </IRQ>
       RIP: 0010:native_safe_halt+0x2/0x10
       Code: 66 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
       RSP: 0018:ffffffff9a803e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
       RAX: ffffffff99e184f0 RBX: 0000000000000000 RCX: 0000000000000001
       RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 000eb5c4572376b3 R09: 0000000000000000
       R10: ffffa53e806a3ca0 R11: 00000000000f4240 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        ? __sched_text_end+0x1/0x1
        default_idle+0x1c/0x140
        do_idle+0x1c4/0x280
        cpu_startup_entry+0x19/0x20
        start_kernel+0x49e/0x4be
        secondary_startup_64+0xa4/0xb0
       Modules linked in: act_csum veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel mbcache snd_hda_codec jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt net_failover fb_sys_fops virtio_console virtio_blk ttm failover drm ata_piix crc32c_intel floppy virtio_pci serio_raw libata virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
       CR2: 0000000000000000
      
      Validating the control action within tcf_csum_init() proved to fix the
      above issue. A TDC selftest is added to verify the correct behavior.
      
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Fixes: 97763dc0 ("net_sched: reject unknown tcfa_action values")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5c29d83
    • D
      net/sched: prepare TC actions to properly validate the control action · 85d0966f
      Davide Caratti 提交于
      - pass a pointer to struct tcf_proto in each actions's init() handler,
        to allow validating the control action, checking whether the chain
        exists and (eventually) refcounting it.
      - remove code that validates the control action after a successful call
        to the action's init() handler, and replace it with a test that forbids
        addition of actions having 'goto_chain' and NULL goto_chain pointer at
        the same time.
      - add tcf_action_check_ctrlact(), that will validate the control action
        and eventually allocate the action 'goto_chain' within the init()
        handler.
      - add tcf_action_set_ctrlact(), that will assign the control action and
        swap the current 'goto_chain' pointer with the new given one.
      
      This disallows 'goto_chain' on actions that don't initialize it properly
      in their init() handler, i.e. calling tcf_action_check_ctrlact() after
      successful IDR reservation and then calling tcf_action_set_ctrlact()
      to assign 'goto_chain' and 'tcf_action' consistently.
      
      By doing this, the kernel does not leak anymore refcounts when a valid
      'goto chain' handle is replaced in TC actions, causing kmemleak splats
      like the following one:
      
       # tc chain add dev dd0 chain 42 ingress protocol ip flower \
       > ip_proto tcp action drop
       # tc chain add dev dd0 chain 43 ingress protocol ip flower \
       > ip_proto udp action drop
       # tc filter add dev dd0 ingress matchall \
       > action gact goto chain 42 index 66
       # tc filter replace dev dd0 ingress matchall \
       > action gact goto chain 43 index 66
       # echo scan >/sys/kernel/debug/kmemleak
       <...>
       unreferenced object 0xffff93c0ee09f000 (size 1024):
       comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
       hex dump (first 32 bytes):
         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
         00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00  ................
       backtrace:
         [<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
         [<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
         [<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
         [<000000006126a348>] netlink_unicast+0x1a0/0x250
         [<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
         [<00000000a25a2171>] sock_sendmsg+0x36/0x40
         [<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
         [<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
         [<000000007a6c61f9>] do_syscall_64+0x5b/0x180
         [<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
         [<0000000013eaa334>] 0xffffffffffffffff
      
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Fixes: 97763dc0 ("net_sched: reject unknown tcfa_action values")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85d0966f
  5. 28 2月, 2019 1 次提交
  6. 11 2月, 2019 1 次提交
  7. 01 9月, 2018 1 次提交
  8. 22 8月, 2018 1 次提交
  9. 20 8月, 2018 1 次提交
    • V
      net: sched: always disable bh when taking tcf_lock · 653cd284
      Vlad Buslov 提交于
      Recently, ops->init() and ops->dump() of all actions were modified to
      always obtain tcf_lock when accessing private action state. Actions that
      don't depend on tcf_lock for synchronization with their data path use
      non-bh locking API. However, tcf_lock is also used to protect rate
      estimator stats in softirq context by timer callback.
      
      Change ops->init() and ops->dump() of all actions to disable bh when using
      tcf_lock to prevent deadlock reported by following lockdep warning:
      
      [  105.470398] ================================
      [  105.475014] WARNING: inconsistent lock state
      [  105.479628] 4.18.0-rc8+ #664 Not tainted
      [  105.483897] --------------------------------
      [  105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
      [  105.509696] {SOFTIRQ-ON-W} state was registered at:
      [  105.514925]   _raw_spin_lock+0x2c/0x40
      [  105.519022]   tcf_bpf_init+0x579/0x820 [act_bpf]
      [  105.523990]   tcf_action_init_1+0x4e4/0x660
      [  105.528518]   tcf_action_init+0x1ce/0x2d0
      [  105.532880]   tcf_exts_validate+0x1d8/0x200
      [  105.537416]   fl_change+0x55a/0x268b [cls_flower]
      [  105.542469]   tc_new_tfilter+0x748/0xa20
      [  105.546738]   rtnetlink_rcv_msg+0x56a/0x6d0
      [  105.551268]   netlink_rcv_skb+0x18d/0x200
      [  105.555628]   netlink_unicast+0x2d0/0x370
      [  105.559990]   netlink_sendmsg+0x3b9/0x6a0
      [  105.564349]   sock_sendmsg+0x6b/0x80
      [  105.568271]   ___sys_sendmsg+0x4a1/0x520
      [  105.572547]   __sys_sendmsg+0xd7/0x150
      [  105.576655]   do_syscall_64+0x72/0x2c0
      [  105.580757]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  105.586243] irq event stamp: 489296
      [  105.590084] hardirqs last  enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
      [  105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
      [  105.609277] softirqs last  enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
      [  105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
      [  105.626813]
                     other info that might help us debug this:
      [  105.633976]  Possible unsafe locking scenario:
      
      [  105.640526]        CPU0
      [  105.643325]        ----
      [  105.646125]   lock(&(&p->tcfa_lock)->rlock);
      [  105.650747]   <Interrupt>
      [  105.653717]     lock(&(&p->tcfa_lock)->rlock);
      [  105.658514]
                      *** DEADLOCK ***
      
      [  105.665349] 1 lock held by swapper/16/0:
      [  105.669629]  #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
      [  105.678200]
                     stack backtrace:
      [  105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
      [  105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  105.698626] Call Trace:
      [  105.701421]  <IRQ>
      [  105.703791]  dump_stack+0x92/0xeb
      [  105.707461]  print_usage_bug+0x336/0x34c
      [  105.711744]  mark_lock+0x7c9/0x980
      [  105.715500]  ? print_shortest_lock_dependencies+0x2e0/0x2e0
      [  105.721424]  ? check_usage_forwards+0x230/0x230
      [  105.726315]  __lock_acquire+0x923/0x26f0
      [  105.730597]  ? debug_show_all_locks+0x240/0x240
      [  105.735478]  ? mark_lock+0x493/0x980
      [  105.739412]  ? check_chain_key+0x140/0x1f0
      [  105.743861]  ? __lock_acquire+0x836/0x26f0
      [  105.748323]  ? lock_acquire+0x12e/0x290
      [  105.752516]  lock_acquire+0x12e/0x290
      [  105.756539]  ? est_fetch_counters+0x3c/0xa0
      [  105.761084]  _raw_spin_lock+0x2c/0x40
      [  105.765099]  ? est_fetch_counters+0x3c/0xa0
      [  105.769633]  est_fetch_counters+0x3c/0xa0
      [  105.773995]  est_timer+0x87/0x390
      [  105.777670]  ? est_fetch_counters+0xa0/0xa0
      [  105.782210]  ? lock_acquire+0x12e/0x290
      [  105.786410]  call_timer_fn+0x161/0x550
      [  105.790512]  ? est_fetch_counters+0xa0/0xa0
      [  105.795055]  ? del_timer_sync+0xd0/0xd0
      [  105.799249]  ? __lock_is_held+0x93/0x110
      [  105.803531]  ? mark_held_locks+0x20/0xe0
      [  105.807813]  ? _raw_spin_unlock_irq+0x29/0x40
      [  105.812525]  ? est_fetch_counters+0xa0/0xa0
      [  105.817069]  ? est_fetch_counters+0xa0/0xa0
      [  105.821610]  run_timer_softirq+0x3c4/0x9f0
      [  105.826064]  ? lock_acquire+0x12e/0x290
      [  105.830257]  ? __bpf_trace_timer_class+0x10/0x10
      [  105.835237]  ? __lock_is_held+0x25/0x110
      [  105.839517]  __do_softirq+0x11d/0x7bf
      [  105.843542]  irq_exit+0x140/0x190
      [  105.847208]  smp_apic_timer_interrupt+0xac/0x3b0
      [  105.852182]  apic_timer_interrupt+0xf/0x20
      [  105.856628]  </IRQ>
      [  105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
      [  105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
      [  105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [  105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
      [  105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
      [  105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
      [  105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
      [  105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
      [  105.929988]  ? trace_hardirqs_on_caller+0x141/0x2d0
      [  105.935232]  do_idle+0x28e/0x320
      [  105.938817]  ? arch_cpu_idle_exit+0x40/0x40
      [  105.943361]  ? mark_lock+0x8c1/0x980
      [  105.947295]  ? _raw_spin_unlock_irqrestore+0x32/0x60
      [  105.952619]  cpu_startup_entry+0xc2/0xd0
      [  105.956900]  ? cpu_in_idle+0x20/0x20
      [  105.960830]  ? _raw_spin_unlock_irqrestore+0x32/0x60
      [  105.966146]  ? trace_hardirqs_on_caller+0x141/0x2d0
      [  105.971391]  start_secondary+0x2b5/0x360
      [  105.975669]  ? set_cpu_sibling_map+0x1330/0x1330
      [  105.980654]  secondary_startup_64+0xa5/0xb0
      
      Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
      warning regarding possible irq lock inversion dependency between tcf_lock,
      and psample_groups_lock that is taken when holding tcf_lock in sample init:
      
      [  162.108959]  Possible interrupt unsafe locking scenario:
      
      [  162.116386]        CPU0                    CPU1
      [  162.121277]        ----                    ----
      [  162.126162]   lock(psample_groups_lock);
      [  162.130447]                                local_irq_disable();
      [  162.136772]                                lock(&(&p->tcfa_lock)->rlock);
      [  162.143957]                                lock(psample_groups_lock);
      [  162.150813]   <Interrupt>
      [  162.153808]     lock(&(&p->tcfa_lock)->rlock);
      [  162.158608]
                      *** DEADLOCK ***
      
      In order to prevent potential lock inversion dependency between tcf_lock
      and psample_groups_lock, extract call to psample_group_get() from tcf_lock
      protected section in sample action init function.
      
      Fixes: 4e232818 ("net: sched: act_mirred: remove dependency on rtnl lock")
      Fixes: 764e9a24 ("net: sched: act_vlan: remove dependency on rtnl lock")
      Fixes: 729e0126 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
      Fixes: d7728495 ("net: sched: act_sample: remove dependency on rtnl lock")
      Fixes: e8917f43 ("net: sched: act_gact: remove dependency on rtnl lock")
      Fixes: b6a2b971 ("net: sched: act_csum: remove dependency on rtnl lock")
      Fixes: 2142236b ("net: sched: act_bpf: remove dependency on rtnl lock")
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      653cd284
  10. 14 8月, 2018 1 次提交
  11. 12 8月, 2018 1 次提交
  12. 31 7月, 2018 1 次提交
    • P
      tc/act: remove unneeded RCU lock in action callback · 7fd4b288
      Paolo Abeni 提交于
      Each lockless action currently does its own RCU locking in ->act().
      This allows using plain RCU accessor, even if the context
      is really RCU BH.
      
      This change drops the per action RCU lock, replace the accessors
      with the _bh variant, cleans up a bit the surrounding code and
      documents the RCU status in the relevant header.
      No functional nor performance change is intended.
      
      The goal of this patch is clarifying that the RCU critical section
      used by the tc actions extends up to the classifier's caller.
      
      v1 -> v2:
       - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
         as pointed out by Daniel
      
      v3 -> v4:
       - fixed some typos in the commit message (JiriP)
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fd4b288
  13. 08 7月, 2018 5 次提交
  14. 07 7月, 2018 1 次提交
    • D
      net/sched: act_csum: fix NULL dereference when 'goto chain' is used · 11a245e2
      Davide Caratti 提交于
      the control action in the common member of struct tcf_csum must be a valid
      value, as it can contain the chain index when 'goto chain' is used. Ensure
      that the control action can be read as x->tcfa_action, when x is a pointer
      to struct tc_action and x->ops->type is TCA_ACT_CSUM, to prevent the
      following command:
      
        # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
        > $tcflags dst_mac $h2mac action csum ip or tcp or udp or sctp goto chain 1
      
      from triggering a NULL pointer dereference when a matching packet is
      received.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 800000010416b067 P4D 800000010416b067 PUD 1041be067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 3072 Comm: mausezahn Tainted: G            E     4.18.0-rc2.auguri+ #421
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffffa020dea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffffa020d7ccef00 RCX: 0000000000000054
       RDX: 0000000000000000 RSI: ffffa020ca5ae000 RDI: ffffa020d7ccef00
       RBP: ffffa020dea03e60 R08: 0000000000000000 R09: ffffa020dea03c9c
       R10: ffffa020dea03c78 R11: 0000000000000008 R12: ffffa020d3fe4f00
       R13: ffffa020d3fe4f08 R14: 0000000000000001 R15: ffffa020d53ca300
       FS:  00007f5a46942740(0000) GS:ffffa020dea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000104218002 CR4: 00000000001606f0
       Call Trace:
        <IRQ>
        fl_classify+0x1ad/0x1c0 [cls_flower]
        ? arp_rcv+0x121/0x1b0
        ? __x2apic_send_IPI_dest+0x40/0x40
        ? smp_reschedule_interrupt+0x1c/0xd0
        ? reschedule_interrupt+0xf/0x20
        ? reschedule_interrupt+0xa/0x20
        ? device_is_rmrr_locked+0xe/0x50
        ? iommu_should_identity_map+0x49/0xd0
        ? __intel_map_single+0x30/0x140
        ? e1000e_update_rdt_wa.isra.52+0x22/0xb0 [e1000e]
        ? e1000_alloc_rx_buffers+0x233/0x250 [e1000e]
        ? kmem_cache_alloc+0x38/0x1c0
        tcf_classify+0x89/0x140
        __netif_receive_skb_core+0x5ea/0xb70
        ? enqueue_task_fair+0xb6/0x7d0
        ? process_backlog+0x97/0x150
        process_backlog+0x97/0x150
        net_rx_action+0x14b/0x3e0
        __do_softirq+0xde/0x2b4
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        do_softirq.part.18+0x49/0x50
        __local_bh_enable_ip+0x49/0x50
        __dev_queue_xmit+0x4ab/0x8a0
        ? wait_woken+0x80/0x80
        ? packet_sendmsg+0x38f/0x810
        ? __dev_queue_xmit+0x8a0/0x8a0
        packet_sendmsg+0x38f/0x810
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        ? do_vfs_ioctl+0xa4/0x630
        ? syscall_trace_enter+0x1df/0x2e0
        ? __audit_syscall_exit+0x22a/0x290
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f5a45cbec93
       Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
       RSP: 002b:00007ffd0ee6d748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000001161010 RCX: 00007f5a45cbec93
       RDX: 0000000000000062 RSI: 0000000001161322 RDI: 0000000000000003
       RBP: 00007ffd0ee6d780 R08: 00007ffd0ee6d760 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
       R13: 0000000001161322 R14: 00007ffd0ee6d760 R15: 0000000000000003
       Modules linked in: act_csum act_gact cls_flower sch_ingress vrf veth act_tunnel_key(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek kvm snd_hda_codec_generic hp_wmi iTCO_wdt sparse_keymap rfkill mei_wdt iTCO_vendor_support wmi_bmof gpio_ich irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_intel crypto_simd cryptd snd_hda_codec glue_helper snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr i2c_i801 snd_timer snd sg lpc_ich soundcore wmi mei_me
        mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ahci libahci crc32c_intel i915 ixgbe serio_raw libata video dca i2c_algo_bit sfc drm_kms_helper syscopyarea mtd sysfillrect mdio sysimgblt fb_sys_fops drm e1000e i2c_core
       CR2: 0000000000000000
       ---[ end trace 3c9e9d1a77df4026 ]---
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffffa020dea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffffa020d7ccef00 RCX: 0000000000000054
       RDX: 0000000000000000 RSI: ffffa020ca5ae000 RDI: ffffa020d7ccef00
       RBP: ffffa020dea03e60 R08: 0000000000000000 R09: ffffa020dea03c9c
       R10: ffffa020dea03c78 R11: 0000000000000008 R12: ffffa020d3fe4f00
       R13: ffffa020d3fe4f08 R14: 0000000000000001 R15: ffffa020d53ca300
       FS:  00007f5a46942740(0000) GS:ffffa020dea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000104218002 CR4: 00000000001606f0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x26400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fixes: 9c5f69bb ("net/sched: act_csum: don't use spinlock in the fast path")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11a245e2
  15. 03 5月, 2018 1 次提交
  16. 28 3月, 2018 1 次提交
  17. 18 3月, 2018 1 次提交
    • D
      net/sched: fix NULL dereference in the error path of tcf_csum_init() · aab378a7
      Davide Caratti 提交于
      when the following command
      
       # tc action add action csum udp continue index 100
      
      is run for the first time, and tcf_csum_init() fails allocating struct
      tcf_csum, tcf_csum_cleanup() calls kfree_rcu(NULL,...). This causes the
      following error:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
       IP: __call_rcu+0x23/0x2b0
       PGD 80000000740b4067 P4D 80000000740b4067 PUD 32e7f067 PMD 0
       Oops: 0002 [#1] SMP PTI
       Modules linked in: act_csum(E) act_vlan ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd glue_helper cryptd snd joydev pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel libata virtio_pci serio_raw i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
       CPU: 2 PID: 5763 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:__call_rcu+0x23/0x2b0
       RSP: 0018:ffffb275803e77c0 EFLAGS: 00010246
       RAX: ffffffffc057b080 RBX: ffff9674bc6f5240 RCX: 00000000ffffffff
       RDX: ffffffff928a5f00 RSI: 0000000000000008 RDI: 0000000000000008
       RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000044
       R10: 0000000000000220 R11: ffff9674b9ab4821 R12: 0000000000000000
       R13: ffffffff928a5f00 R14: 0000000000000000 R15: 0000000000000001
       FS:  00007fa6368d8740(0000) GS:ffff9674bfd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000010 CR3: 0000000073dec001 CR4: 00000000001606e0
       Call Trace:
        __tcf_idr_release+0x79/0xf0
        tcf_csum_init+0xfb/0x180 [act_csum]
        tcf_action_init_1+0x2cc/0x430
        tcf_action_init+0xd3/0x1b0
        tc_ctl_action+0x18b/0x240
        rtnetlink_rcv_msg+0x29c/0x310
        ? _cond_resched+0x15/0x30
        ? __kmalloc_node_track_caller+0x1b9/0x270
        ? rtnl_calcit.isra.28+0x100/0x100
        netlink_rcv_skb+0xd2/0x110
        netlink_unicast+0x17c/0x230
        netlink_sendmsg+0x2cd/0x3c0
        sock_sendmsg+0x30/0x40
        ___sys_sendmsg+0x27a/0x290
        ? filemap_map_pages+0x34a/0x3a0
        ? __handle_mm_fault+0xbfd/0xe20
        __sys_sendmsg+0x51/0x90
        do_syscall_64+0x6e/0x1a0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
       RIP: 0033:0x7fa635ce9ba0
       RSP: 002b:00007ffc185b0fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffc185b10f0 RCX: 00007fa635ce9ba0
       RDX: 0000000000000000 RSI: 00007ffc185b1040 RDI: 0000000000000003
       RBP: 000000005aaa85e0 R08: 0000000000000002 R09: 0000000000000000
       R10: 00007ffc185b0a20 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007ffc185b1104 R14: 0000000000000001 R15: 0000000000669f60
       Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
       RIP: __call_rcu+0x23/0x2b0 RSP: ffffb275803e77c0
       CR2: 0000000000000010
      
      fix this in tcf_csum_cleanup(), ensuring that kfree_rcu(param, ...) is
      called only when param is not NULL.
      
      Fixes: 9c5f69bb ("net/sched: act_csum: don't use spinlock in the fast path")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aab378a7
  18. 10 3月, 2018 1 次提交
  19. 28 2月, 2018 1 次提交
    • K
      net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations · 685ecfb1
      Kirill Tkhai 提交于
      These pernet_operations are from net/sched directory, and they call only
      tc_action_net_init() and tc_action_net_exit():
      
      bpf_net_ops
      connmark_net_ops
      csum_net_ops
      gact_net_ops
      ife_net_ops
      ipt_net_ops
      xt_net_ops
      mirred_net_ops
      nat_net_ops
      pedit_net_ops
      police_net_ops
      sample_net_ops
      simp_net_ops
      skbedit_net_ops
      skbmod_net_ops
      tunnel_key_net_ops
      vlan_net_ops
      
      1)tc_action_net_init() just allocates and initializes per-net memory.
      2)There should not be in-flight packets at the time of tc_action_net_exit()
      call, or another pernet_operations send packets to dying net (except
      netlink). So, it seems they can be marked as async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      685ecfb1
  20. 17 2月, 2018 4 次提交
  21. 24 1月, 2018 2 次提交
  22. 14 12月, 2017 1 次提交
  23. 24 11月, 2017 1 次提交
    • W
      net: accept UFO datagrams from tuntap and packet · 0c19f846
      Willem de Bruijn 提交于
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -p -u 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c19f846
  24. 09 11月, 2017 1 次提交
  25. 03 11月, 2017 1 次提交
  26. 31 8月, 2017 1 次提交
  27. 18 7月, 2017 1 次提交
  28. 20 5月, 2017 1 次提交
  29. 14 4月, 2017 1 次提交
  30. 24 3月, 2017 1 次提交
    • D
      sched: act_csum: don't mangle TCP and UDP GSO packets · add641e7
      Davide Caratti 提交于
      after act_csum computes the checksum on skbs carrying GSO TCP/UDP packets,
      subsequent segmentation fails because skb_needs_check(skb, true) returns
      true. Because of that, skb_warn_bad_offload() is invoked and the following
      message is displayed:
      
      WARNING: CPU: 3 PID: 28 at net/core/dev.c:2553 skb_warn_bad_offload+0xf0/0xfd
      <...>
      
        [<ffffffff8171f486>] skb_warn_bad_offload+0xf0/0xfd
        [<ffffffff8161304c>] __skb_gso_segment+0xec/0x110
        [<ffffffff8161340d>] validate_xmit_skb+0x12d/0x2b0
        [<ffffffff816135d2>] validate_xmit_skb_list+0x42/0x70
        [<ffffffff8163c560>] sch_direct_xmit+0xd0/0x1b0
        [<ffffffff8163c760>] __qdisc_run+0x120/0x270
        [<ffffffff81613b3d>] __dev_queue_xmit+0x23d/0x690
        [<ffffffff81613fa0>] dev_queue_xmit+0x10/0x20
      
      Since GSO is able to compute checksum on individual segments of such skbs,
      we can simply skip mangling the packet.
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      add641e7
  31. 10 1月, 2017 1 次提交