1. 18 1月, 2018 1 次提交
  2. 17 1月, 2018 2 次提交
    • D
      net, sched: fix panic when updating miniq {b,q}stats · 81d947e2
      Daniel Borkmann 提交于
      While working on fixing another bug, I ran into the following panic
      on arm64 by simply attaching clsact qdisc, adding a filter and running
      traffic on ingress to it:
      
        [...]
        [  178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000
        [  178.197314] Mem abort info:
        [  178.200121]   ESR = 0x96000004
        [  178.203168]   Exception class = DABT (current EL), IL = 32 bits
        [  178.209095]   SET = 0, FnV = 0
        [  178.212157]   EA = 0, S1PTW = 0
        [  178.215288] Data abort info:
        [  178.218175]   ISV = 0, ISS = 0x00000004
        [  178.222019]   CM = 0, WnR = 0
        [  178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 0000000023cb3f33
        [  178.231531] [0000810fb501f000] *pgd=0000000000000000
        [  178.236508] Internal error: Oops: 96000004 [#1] SMP
        [...]
        [  178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G        W        4.15.0-rc7+ #5
        [  178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
        [  178.326887] pstate: 60400005 (nZCv daif +PAN -UAO)
        [  178.331685] pc : __netif_receive_skb_core+0x49c/0xac8
        [  178.336728] lr : __netif_receive_skb+0x28/0x78
        [  178.341161] sp : ffff00002344b750
        [  178.344465] x29: ffff00002344b750 x28: ffff810fbdfd0580
        [  178.349769] x27: 0000000000000000 x26: ffff000009378000
        [...]
        [  178.418715] x1 : 0000000000000054 x0 : 0000000000000000
        [  178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4)
        [  178.430537] Call trace:
        [  178.432976]  __netif_receive_skb_core+0x49c/0xac8
        [  178.437670]  __netif_receive_skb+0x28/0x78
        [  178.441757]  process_backlog+0x9c/0x160
        [  178.445584]  net_rx_action+0x2f8/0x3f0
        [...]
      
      Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
      which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
      Problem is that this cannot work since they are actually set up right after
      the qdisc ->init() callback in qdisc_create(), so first packet going into
      sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
      therefore panic.
      
      In order to fix this, allocation of {b,q}stats needs to happen before we
      call into ->init(). In net-next, there's already such option through commit
      d59f5ffa ("net: sched: a dflt qdisc may be used with per cpu stats").
      However, the bug needs to be fixed in net still for 4.15. Thus, include
      these bits to reduce any merge churn and reuse the static_flags field to
      set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
      there is no other user left. Prashant Bhole ran into the same issue but
      for net-next, thus adding him below as well as co-author. Same issue was
      also reported by Sandipan Das when using bcc.
      
      Fixes: 46209401 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
      Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.htmlReported-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Co-authored-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Co-authored-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81d947e2
    • A
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan 提交于
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96890d62
  3. 03 1月, 2018 1 次提交
  4. 29 12月, 2017 1 次提交
  5. 22 12月, 2017 9 次提交
  6. 16 12月, 2017 1 次提交
  7. 09 12月, 2017 2 次提交
  8. 06 12月, 2017 2 次提交
  9. 29 10月, 2017 1 次提交
    • C
      net_sched: avoid matching qdisc with zero handle · 50317fce
      Cong Wang 提交于
      Davide found the following script triggers a NULL pointer
      dereference:
      
      ip l a name eth0 type dummy
      tc q a dev eth0 parent :1 handle 1: htb
      
      This is because for a freshly created netdevice noop_qdisc
      is attached and when passing 'parent :1', kernel actually
      tries to match the major handle which is 0 and noop_qdisc
      has handle 0 so is matched by mistake. Commit 69012ae4
      tries to fix a similar bug but still misses this case.
      
      Handle 0 is not a valid one, should be just skipped. In
      fact, kernel uses it as TC_H_UNSPEC.
      
      Fixes: 69012ae4 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
      Fixes: 59cc1f61 ("net: sched:convert qdisc linked list to hashtable")
      Reported-by: NDavide Caratti <dcaratti@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50317fce
  10. 17 10月, 2017 1 次提交
  11. 19 9月, 2017 1 次提交
  12. 01 9月, 2017 1 次提交
    • C
      net_sched: add reverse binding for tc class · 07d79fc7
      Cong Wang 提交于
      TC filters when used as classifiers are bound to TC classes.
      However, there is a hidden difference when adding them in different
      orders:
      
      1. If we add tc classes before its filters, everything is fine.
         Logically, the classes exist before we specify their ID's in
         filters, it is easy to bind them together, just as in the current
         code base.
      
      2. If we add tc filters before the tc classes they bind, we have to
         do dynamic lookup in fast path. What's worse, this happens all
         the time not just once, because on fast path tcf_result is passed
         on stack, there is no way to propagate back to the one in tc filters.
      
      This hidden difference hurts performance silently if we have many tc
      classes in hierarchy.
      
      This patch intends to close this gap by doing the reverse binding when
      we create a new class, in this case we can actually search all the
      filters in its parent, match and fixup by classid. And because
      tcf_result is specific to each type of tc filter, we have to introduce
      a new ops for each filter to tell how to bind the class.
      
      Note, we still can NOT totally get rid of those class lookup in
      ->enqueue() because cgroup and flow filters have no way to determine
      the classid at setup time, they still have to go through dynamic lookup.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07d79fc7
  13. 26 8月, 2017 3 次提交
  14. 25 8月, 2017 1 次提交
    • E
      net_sched: fix a refcount_t issue with noop_qdisc · 551143d8
      Eric Dumazet 提交于
      syzkaller reported a refcount_t warning [1]
      
      Issue here is that noop_qdisc refcnt was never really considered as
      a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
      
      if (qdisc->flags & TCQ_F_BUILTIN ||
          !refcount_dec_and_test(&qdisc->refcnt)))
      	return;
      
      Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
      really needed, but harmless until refcount_t came.
      
      To fix this problem, we simply need to not increment noop_qdisc.refcnt,
      since we never decrement it.
      
      [1]
      refcount_t: increment on 0; use-after-free.
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
      RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
      RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
      RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
      RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
      R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
       attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
       dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
       __dev_open+0x227/0x330 net/core/dev.c:1380
       __dev_change_flags+0x695/0x990 net/core/dev.c:6726
       dev_change_flags+0x88/0x140 net/core/dev.c:6792
       dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
       dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
       sock_do_ioctl+0x94/0xb0 net/socket.c:968
       sock_ioctl+0x2c2/0x440 net/socket.c:1058
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
      
      Fixes: 7b936405 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Reshetova, Elena <elena.reshetova@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      551143d8
  15. 23 8月, 2017 1 次提交
  16. 17 8月, 2017 1 次提交
  17. 16 8月, 2017 1 次提交
  18. 10 8月, 2017 1 次提交
  19. 05 7月, 2017 1 次提交
  20. 30 6月, 2017 1 次提交
    • G
      net: sched: Fix one possible panic when no destroy callback · c1a4872e
      Gao Feng 提交于
      When qdisc fail to init, qdisc_create would invoke the destroy callback
      to cleanup. But there is no check if the callback exists really. So it
      would cause the panic if there is no real destroy callback like the qdisc
      codel, fq, and so on.
      
      Take codel as an example following:
      When a malicious user constructs one invalid netlink msg, it would cause
      codel_init->codel_change->nla_parse_nested failed.
      Then kernel would invoke the destroy callback directly but qdisc codel
      doesn't define one. It causes one panic as a result.
      
      Now add one the check for destroy to avoid the possible panic.
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1a4872e
  21. 18 5月, 2017 2 次提交
  22. 12 5月, 2017 1 次提交
    • E
      net: sched: optimize class dumps · cb395b20
      Eric Dumazet 提交于
      In commit 59cc1f61 ("net: sched: convert qdisc linked list to
      hashtable") we missed the opportunity to considerably speed up
      tc_dump_tclass_root() if a qdisc handle is provided by user.
      
      Instead of iterating all the qdiscs, use qdisc_match_from_root()
      to directly get the one we look for.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb395b20
  23. 18 4月, 2017 2 次提交
  24. 14 4月, 2017 1 次提交
  25. 13 3月, 2017 1 次提交