1. 08 7月, 2018 5 次提交
  2. 20 6月, 2018 2 次提交
    • D
      net/sched: act_ife: preserve the action control in case of error · cbf56c29
      Davide Caratti 提交于
      in the following script
      
       # tc actions add action ife encode allow prio pass index 42
       # tc actions replace action ife encode allow tcindex drop index 42
      
      the action control should remain equal to 'pass', if the kernel failed
      to replace the TC action. Pospone the assignment of the action control,
      to ensure it is not overwritten in the error path of tcf_ife_init().
      
      Fixes: ef6980b6 ("introduce IFE action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbf56c29
    • D
      net/sched: act_ife: fix recursive lock and idr leak · 0a889b94
      Davide Caratti 提交于
      a recursive lock warning [1] can be observed with the following script,
      
       # $TC actions add action ife encode allow prio pass index 42
       IFE type 0xED3E
       # $TC actions replace action ife encode allow tcindex pass index 42
      
      in case the kernel was unable to run the last command (e.g. because of
      the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
      the kernel can leak idr in the error path of tcf_ife_init(), because
      tcf_idr_release() is not called after successful idr reservation:
      
       # $TC actions add action ife encode allow tcindex index 47
       IFE type 0xED3E
       RTNETLINK answers: No such file or directory
       We have an error talking to the kernel
       # $TC actions add action ife encode allow tcindex index 47
       IFE type 0xED3E
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
       IFE type 0xFEFE
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
      
      Since tcfa_lock is already taken when the action is being edited, a call
      to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
      again. On the other hand, tcf_idr_release() needs to be called in the
      error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
      Fix both problems in tcf_ife_init().
      Since the cleanup() routine can now be called when ife->params is NULL,
      also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).
      
       [1]
       ============================================
       WARNING: possible recursive locking detected
       4.17.0-rc4.kasan+ #417 Tainted: G            E
       --------------------------------------------
       tc/3932 is trying to acquire lock:
       000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]
      
       but task is already holding lock:
       000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&(&p->tcfa_lock)->rlock);
         lock(&(&p->tcfa_lock)->rlock);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       2 locks held by tc/3932:
        #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
        #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]
      
       stack backtrace:
       CPU: 3 PID: 3932 Comm: tc Tainted: G            E     4.17.0-rc4.kasan+ #417
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       Call Trace:
        dump_stack+0x9a/0xeb
        __lock_acquire+0xf43/0x34a0
        ? debug_check_no_locks_freed+0x2b0/0x2b0
        ? debug_check_no_locks_freed+0x2b0/0x2b0
        ? debug_check_no_locks_freed+0x2b0/0x2b0
        ? __mutex_lock+0x62f/0x1240
        ? kvm_sched_clock_read+0x1a/0x30
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1d0
        ? lock_acquire+0x10b/0x330
        lock_acquire+0x10b/0x330
        ? tcf_ife_cleanup+0x19/0x80 [act_ife]
        _raw_spin_lock_bh+0x38/0x70
        ? tcf_ife_cleanup+0x19/0x80 [act_ife]
        tcf_ife_cleanup+0x19/0x80 [act_ife]
        __tcf_idr_release+0xff/0x350
        tcf_ife_init+0xdde/0x13c0 [act_ife]
        ? ife_exit_net+0x290/0x290 [act_ife]
        ? __lock_is_held+0xb4/0x140
        tcf_action_init_1+0x67b/0xad0
        ? tcf_action_dump_old+0xa0/0xa0
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? kvm_sched_clock_read+0x1a/0x30
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? memset+0x1f/0x40
        tcf_action_init+0x30f/0x590
        ? tcf_action_init_1+0xad0/0xad0
        ? memset+0x1f/0x40
        tc_ctl_action+0x48e/0x5e0
        ? mutex_lock_io_nested+0x1160/0x1160
        ? tca_action_gd+0x990/0x990
        ? sched_clock+0x5/0x10
        ? find_held_lock+0x39/0x1d0
        rtnetlink_rcv_msg+0x4da/0x990
        ? validate_linkmsg+0x680/0x680
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1d0
        netlink_rcv_skb+0x127/0x350
        ? validate_linkmsg+0x680/0x680
        ? netlink_ack+0x970/0x970
        ? __kmalloc_node_track_caller+0x304/0x3a0
        netlink_unicast+0x40f/0x5d0
        ? netlink_attachskb+0x580/0x580
        ? _copy_from_iter_full+0x187/0x760
        ? import_iovec+0x90/0x390
        netlink_sendmsg+0x67f/0xb50
        ? netlink_unicast+0x5d0/0x5d0
        ? copy_msghdr_from_user+0x206/0x340
        ? netlink_unicast+0x5d0/0x5d0
        sock_sendmsg+0xb3/0xf0
        ___sys_sendmsg+0x60a/0x8b0
        ? copy_msghdr_from_user+0x340/0x340
        ? lock_downgrade+0x5e0/0x5e0
        ? tty_write_lock+0x18/0x50
        ? kvm_sched_clock_read+0x1a/0x30
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1d0
        ? lock_downgrade+0x5e0/0x5e0
        ? lock_acquire+0x10b/0x330
        ? __audit_syscall_entry+0x316/0x690
        ? current_kernel_time64+0x6b/0xd0
        ? __fget_light+0x55/0x1f0
        ? __sys_sendmsg+0xd2/0x170
        __sys_sendmsg+0xd2/0x170
        ? __ia32_sys_shutdown+0x70/0x70
        ? syscall_trace_enter+0x57a/0xd60
        ? rcu_read_lock_sched_held+0xdc/0x110
        ? __bpf_trace_sys_enter+0x10/0x10
        ? do_syscall_64+0x22/0x480
        do_syscall_64+0xa5/0x480
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7fd646988ba0
       RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
       RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
       RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100
      
      Fixes: 4e8c8615 ("net sched: net sched: ife action fix late binding")
      Fixes: ef6980b6 ("introduce IFE action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a889b94
  3. 23 4月, 2018 2 次提交
  4. 28 3月, 2018 1 次提交
  5. 28 2月, 2018 1 次提交
    • K
      net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations · 685ecfb1
      Kirill Tkhai 提交于
      These pernet_operations are from net/sched directory, and they call only
      tc_action_net_init() and tc_action_net_exit():
      
      bpf_net_ops
      connmark_net_ops
      csum_net_ops
      gact_net_ops
      ife_net_ops
      ipt_net_ops
      xt_net_ops
      mirred_net_ops
      nat_net_ops
      pedit_net_ops
      police_net_ops
      sample_net_ops
      simp_net_ops
      skbedit_net_ops
      skbmod_net_ops
      tunnel_key_net_ops
      vlan_net_ops
      
      1)tc_action_net_init() just allocates and initializes per-net memory.
      2)There should not be in-flight packets at the time of tc_action_net_exit()
      call, or another pernet_operations send packets to dying net (except
      netlink). So, it seems they can be marked as async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      685ecfb1
  6. 17 2月, 2018 4 次提交
  7. 14 12月, 2017 1 次提交
  8. 06 12月, 2017 1 次提交
  9. 09 11月, 2017 1 次提交
  10. 03 11月, 2017 1 次提交
  11. 15 10月, 2017 1 次提交
  12. 13 10月, 2017 5 次提交
  13. 31 8月, 2017 1 次提交
  14. 30 8月, 2017 1 次提交
  15. 14 4月, 2017 1 次提交
  16. 17 3月, 2017 1 次提交
  17. 04 2月, 2017 2 次提交
  18. 09 1月, 2017 1 次提交
  19. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  20. 27 9月, 2016 2 次提交
    • Y
      act_ife: Fix false encoding · c006da0b
      Yotam Gigi 提交于
      On ife encode side, the action stores the different tlvs inside the ife
      header, where each tlv length field should refer to the length of the
      whole tlv (without additional padding) and not just the data length.
      
      On ife decode side, the action iterates over the tlvs in the ife header
      and parses them one by one, where in each iteration the current pointer is
      advanced according to the tlv size.
      
      Before, the encoding encoded only the data length inside the tlv, which led
      to false parsing of ife the header. In addition, due to the fact that the
      loop counter was unsigned, it could lead to infinite parsing loop.
      
      This fix changes the loop counter to be signed and fixes the encoding to
      take into account the tlv type and size.
      
      Fixes: 28a10c42 ("net sched: fix encoding to use real length")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NYotam Gigi <yotamg@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c006da0b
    • Y
      act_ife: Fix external mac header on encode · 4b1d488a
      Yotam Gigi 提交于
      On ife encode side, external mac header is copied from the original packet
      and may be overridden if the user requests. Before, the mac header copy
      was done from memory region that might not be accessible anymore, as
      skb_cow_head might free it and copy the packet. This led to random values
      in the external mac header once the values were not set by user.
      
      This fix takes the internal mac header from the packet, after the call to
      skb_cow_head.
      
      Fixes: ef6980b6 ("net sched: introduce IFE action")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NYotam Gigi <yotamg@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b1d488a
  21. 20 9月, 2016 1 次提交
  22. 23 8月, 2016 1 次提交
    • J
      net sched: fix encoding to use real length · 28a10c42
      Jamal Hadi Salim 提交于
      Encoding of the metadata was using the padded length as opposed to
      the real length of the data which is a bug per specification.
      This has not been an issue todate because all metadatum specified
      so far has been 32 bit where aligned and data length are the same width.
      This also includes a bug fix for validating the length of a u16 field.
      But since there is no metadata of size u16 yes we are fine to include it
      here.
      
      While at it get rid of magic numbers.
      
      Fixes: ef6980b6 ("net sched: introduce IFE action")
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28a10c42
  23. 26 7月, 2016 1 次提交
    • W
      net_sched: move tc_action into tcf_common · a85a970a
      WANG Cong 提交于
      struct tc_action is confusing, currently we use it for two purposes:
      1) Pass in arguments and carry out results from helper functions
      2) A generic representation for tc actions
      
      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of this use, by moving
      tc_action into tcf_common, so that they are allocated together
      in hashtable and can be cast'ed easily.
      
      And together with the following patch, we could really make
      tc_action a generic representation for all tc actions and each
      type of action can inherit from it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a85a970a
  24. 24 6月, 2016 2 次提交