1. 31 8月, 2017 5 次提交
    • N
      sch_cbq: fix null pointer dereferences on init failure · 3501d059
      Nikolay Aleksandrov 提交于
      CBQ can fail on ->init by wrong nl attributes or simply for missing any,
      f.e. if it's set as a default qdisc then TCA_OPTIONS (opt) will be NULL
      when it is activated. The first thing init does is parse opt but it will
      dereference a null pointer if used as a default qdisc, also since init
      failure at default qdisc invokes ->reset() which cancels all timers then
      we'll also dereference two more null pointers (timer->base) as they were
      never initialized.
      
      To reproduce:
      $ sysctl net.core.default_qdisc=cbq
      $ ip l set ethX up
      
      Crash log of the first null ptr deref:
      [44727.907454] BUG: unable to handle kernel NULL pointer dereference at (null)
      [44727.907600] IP: cbq_init+0x27/0x205
      [44727.907676] PGD 59ff4067
      [44727.907677] P4D 59ff4067
      [44727.907742] PUD 59c70067
      [44727.907807] PMD 0
      [44727.907873]
      [44727.907982] Oops: 0000 [#1] SMP
      [44727.908054] Modules linked in:
      [44727.908126] CPU: 1 PID: 21312 Comm: ip Not tainted 4.13.0-rc6+ #60
      [44727.908235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [44727.908477] task: ffff88005ad42700 task.stack: ffff880037214000
      [44727.908672] RIP: 0010:cbq_init+0x27/0x205
      [44727.908838] RSP: 0018:ffff8800372175f0 EFLAGS: 00010286
      [44727.909018] RAX: ffffffff816c3852 RBX: ffff880058c53800 RCX: 0000000000000000
      [44727.909222] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff8800372175f8
      [44727.909427] RBP: ffff880037217650 R08: ffffffff81b0f380 R09: 0000000000000000
      [44727.909631] R10: ffff880037217660 R11: 0000000000000020 R12: ffffffff822a44c0
      [44727.909835] R13: ffff880058b92000 R14: 00000000ffffffff R15: 0000000000000001
      [44727.910040] FS:  00007ff8bc583740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
      [44727.910339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [44727.910525] CR2: 0000000000000000 CR3: 00000000371e5000 CR4: 00000000000406e0
      [44727.910731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [44727.910936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [44727.911141] Call Trace:
      [44727.911291]  ? lockdep_init_map+0xb6/0x1ba
      [44727.911461]  ? qdisc_alloc+0x14e/0x187
      [44727.911626]  qdisc_create_dflt+0x7a/0x94
      [44727.911794]  ? dev_activate+0x129/0x129
      [44727.911959]  attach_one_default_qdisc+0x36/0x63
      [44727.912132]  netdev_for_each_tx_queue+0x3d/0x48
      [44727.912305]  dev_activate+0x4b/0x129
      [44727.912468]  __dev_open+0xe7/0x104
      [44727.912631]  __dev_change_flags+0xc6/0x15c
      [44727.912799]  dev_change_flags+0x25/0x59
      [44727.912966]  do_setlink+0x30c/0xb3f
      [44727.913129]  ? check_chain_key+0xb0/0xfd
      [44727.913294]  ? check_chain_key+0xb0/0xfd
      [44727.913463]  rtnl_newlink+0x3a4/0x729
      [44727.913626]  ? rtnl_newlink+0x117/0x729
      [44727.913801]  ? ns_capable_common+0xd/0xb1
      [44727.913968]  ? ns_capable+0x13/0x15
      [44727.914131]  rtnetlink_rcv_msg+0x188/0x197
      [44727.914300]  ? rcu_read_unlock+0x3e/0x5f
      [44727.914465]  ? rtnl_newlink+0x729/0x729
      [44727.914630]  netlink_rcv_skb+0x6c/0xce
      [44727.914796]  rtnetlink_rcv+0x23/0x2a
      [44727.914956]  netlink_unicast+0x103/0x181
      [44727.915122]  netlink_sendmsg+0x326/0x337
      [44727.915291]  sock_sendmsg_nosec+0x14/0x3f
      [44727.915459]  sock_sendmsg+0x29/0x2e
      [44727.915619]  ___sys_sendmsg+0x209/0x28b
      [44727.915784]  ? do_raw_spin_unlock+0xcd/0xf8
      [44727.915954]  ? _raw_spin_unlock+0x27/0x31
      [44727.916121]  ? __handle_mm_fault+0x651/0xdb1
      [44727.916290]  ? check_chain_key+0xb0/0xfd
      [44727.916461]  __sys_sendmsg+0x45/0x63
      [44727.916626]  ? __sys_sendmsg+0x45/0x63
      [44727.916792]  SyS_sendmsg+0x19/0x1b
      [44727.916950]  entry_SYSCALL_64_fastpath+0x23/0xc2
      [44727.917125] RIP: 0033:0x7ff8bbc96690
      [44727.917286] RSP: 002b:00007ffc360991e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [44727.917579] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007ff8bbc96690
      [44727.917783] RDX: 0000000000000000 RSI: 00007ffc36099230 RDI: 0000000000000003
      [44727.917987] RBP: ffff880037217f98 R08: 0000000000000001 R09: 0000000000000003
      [44727.918190] R10: 00007ffc36098fb0 R11: 0000000000000246 R12: 0000000000000006
      [44727.918393] R13: 000000000066f1a0 R14: 00007ffc360a12e0 R15: 0000000000000000
      [44727.918597]  ? trace_hardirqs_off_caller+0xa7/0xcf
      [44727.918774] Code: 41 5f 5d c3 66 66 66 66 90 55 48 8d 56 04 45 31 c9
      49 c7 c0 80 f3 b0 81 48 89 e5 41 55 41 54 53 48 89 fb 48 8d 7d a8 48 83
      ec 48 <0f> b7 0e be 07 00 00 00 83 e9 04 e8 e6 f7 d8 ff 85 c0 0f 88 bb
      [44727.919332] RIP: cbq_init+0x27/0x205 RSP: ffff8800372175f0
      [44727.919516] CR2: 0000000000000000
      
      Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3501d059
    • N
      sch_hfsc: fix null pointer deref and double free on init failure · 3bdac362
      Nikolay Aleksandrov 提交于
      Depending on where ->init fails we can get a null pointer deref due to
      uninitialized hires timer (watchdog) or a double free of the qdisc hash
      because it is already freed by ->destroy().
      
      Fixes: 8d553738 ("net/sched/hfsc: allocate tcf block for hfsc root class")
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bdac362
    • N
      sch_hhf: fix null pointer dereference on init failure · 32db864d
      Nikolay Aleksandrov 提交于
      If sch_hhf fails in its ->init() function (either due to wrong
      user-space arguments as below or memory alloc failure of hh_flows) it
      will do a null pointer deref of q->hh_flows in its ->destroy() function.
      
      To reproduce the crash:
      $ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000
      
      Crash log:
      [  690.654882] BUG: unable to handle kernel NULL pointer dereference at (null)
      [  690.655565] IP: hhf_destroy+0x48/0xbc
      [  690.655944] PGD 37345067
      [  690.655948] P4D 37345067
      [  690.656252] PUD 58402067
      [  690.656554] PMD 0
      [  690.656857]
      [  690.657362] Oops: 0000 [#1] SMP
      [  690.657696] Modules linked in:
      [  690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57
      [  690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [  690.659255] task: ffff880058578000 task.stack: ffff88005acbc000
      [  690.659747] RIP: 0010:hhf_destroy+0x48/0xbc
      [  690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246
      [  690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
      [  690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0
      [  690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000
      [  690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea
      [  690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000
      [  690.663769] FS:  00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
      [  690.667069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0
      [  690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  690.671003] Call Trace:
      [  690.671743]  qdisc_create+0x377/0x3fd
      [  690.672534]  tc_modify_qdisc+0x4d2/0x4fd
      [  690.673324]  rtnetlink_rcv_msg+0x188/0x197
      [  690.674204]  ? rcu_read_unlock+0x3e/0x5f
      [  690.675091]  ? rtnl_newlink+0x729/0x729
      [  690.675877]  netlink_rcv_skb+0x6c/0xce
      [  690.676648]  rtnetlink_rcv+0x23/0x2a
      [  690.677405]  netlink_unicast+0x103/0x181
      [  690.678179]  netlink_sendmsg+0x326/0x337
      [  690.678958]  sock_sendmsg_nosec+0x14/0x3f
      [  690.679743]  sock_sendmsg+0x29/0x2e
      [  690.680506]  ___sys_sendmsg+0x209/0x28b
      [  690.681283]  ? __handle_mm_fault+0xc7d/0xdb1
      [  690.681915]  ? check_chain_key+0xb0/0xfd
      [  690.682449]  __sys_sendmsg+0x45/0x63
      [  690.682954]  ? __sys_sendmsg+0x45/0x63
      [  690.683471]  SyS_sendmsg+0x19/0x1b
      [  690.683974]  entry_SYSCALL_64_fastpath+0x23/0xc2
      [  690.684516] RIP: 0033:0x7f8ae529d690
      [  690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690
      [  690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003
      [  690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000
      [  690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002
      [  690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000
      [  690.688475]  ? trace_hardirqs_off_caller+0xa7/0xcf
      [  690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83
      c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02
      00 00 <49> 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1
      [  690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0
      [  690.690636] CR2: 0000000000000000
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Fixes: 10239edf ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32db864d
    • N
      sch_multiq: fix double free on init failure · e89d469e
      Nikolay Aleksandrov 提交于
      The below commit added a call to ->destroy() on init failure, but multiq
      still frees ->queues on error in init, but ->queues is also freed by
      ->destroy() thus we get double free and corrupted memory.
      
      Very easy to reproduce (eth0 not multiqueue):
      $ tc qdisc add dev eth0 root multiq
      RTNETLINK answers: Operation not supported
      $ ip l add dumdum type dummy
      (crash)
      
      Trace log:
      [ 3929.467747] general protection fault: 0000 [#1] SMP
      [ 3929.468083] Modules linked in:
      [ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56
      [ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000
      [ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be
      [ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246
      [ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df
      [ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020
      [ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000
      [ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564
      [ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00
      [ 3929.471869] FS:  00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
      [ 3929.472286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0
      [ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3929.474873] Call Trace:
      [ 3929.475337]  ? kstrdup_const+0x23/0x25
      [ 3929.475863]  kstrdup+0x2e/0x4b
      [ 3929.476338]  kstrdup_const+0x23/0x25
      [ 3929.478084]  __kernfs_new_node+0x28/0xbc
      [ 3929.478478]  kernfs_new_node+0x35/0x55
      [ 3929.478929]  kernfs_create_link+0x23/0x76
      [ 3929.479478]  sysfs_do_create_link_sd.isra.2+0x85/0xd7
      [ 3929.480096]  sysfs_create_link+0x33/0x35
      [ 3929.480649]  device_add+0x200/0x589
      [ 3929.481184]  netdev_register_kobject+0x7c/0x12f
      [ 3929.481711]  register_netdevice+0x373/0x471
      [ 3929.482174]  rtnl_newlink+0x614/0x729
      [ 3929.482610]  ? rtnl_newlink+0x17f/0x729
      [ 3929.483080]  rtnetlink_rcv_msg+0x188/0x197
      [ 3929.483533]  ? rcu_read_unlock+0x3e/0x5f
      [ 3929.483984]  ? rtnl_newlink+0x729/0x729
      [ 3929.484420]  netlink_rcv_skb+0x6c/0xce
      [ 3929.484858]  rtnetlink_rcv+0x23/0x2a
      [ 3929.485291]  netlink_unicast+0x103/0x181
      [ 3929.485735]  netlink_sendmsg+0x326/0x337
      [ 3929.486181]  sock_sendmsg_nosec+0x14/0x3f
      [ 3929.486614]  sock_sendmsg+0x29/0x2e
      [ 3929.486973]  ___sys_sendmsg+0x209/0x28b
      [ 3929.487340]  ? do_raw_spin_unlock+0xcd/0xf8
      [ 3929.487719]  ? _raw_spin_unlock+0x27/0x31
      [ 3929.488092]  ? __handle_mm_fault+0x651/0xdb1
      [ 3929.488471]  ? check_chain_key+0xb0/0xfd
      [ 3929.488847]  __sys_sendmsg+0x45/0x63
      [ 3929.489206]  ? __sys_sendmsg+0x45/0x63
      [ 3929.489576]  SyS_sendmsg+0x19/0x1b
      [ 3929.489901]  entry_SYSCALL_64_fastpath+0x23/0xc2
      [ 3929.490172] RIP: 0033:0x7f0b6fb93690
      [ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690
      [ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003
      [ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000
      [ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002
      [ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000
      [ 3929.492352]  ? trace_hardirqs_off_caller+0xa7/0xcf
      [ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44
      89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d
      8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01
      [ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Fixes: f07d1501 ("multiq: Further multiqueue cleanup")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e89d469e
    • N
      sch_htb: fix crash on init failure · 88c2ace6
      Nikolay Aleksandrov 提交于
      The commit below added a call to the ->destroy() callback for all qdiscs
      which failed in their ->init(), but some were not prepared for such
      change and can't handle partially initialized qdisc. HTB is one of them
      and if any error occurs before the qdisc watchdog timer and qdisc work are
      initialized then we can hit either a null ptr deref (timer->base) when
      canceling in ->destroy or lockdep error info about trying to register
      a non-static key and a stack dump. So to fix these two move the watchdog
      timer and workqueue init before anything that can err out.
      To reproduce userspace needs to send broken htb qdisc create request,
      tested with a modified tc (q_htb.c).
      
      Trace log:
      [ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
      [ 2710.897977] IP: hrtimer_active+0x17/0x8a
      [ 2710.898174] PGD 58fab067
      [ 2710.898175] P4D 58fab067
      [ 2710.898353] PUD 586c0067
      [ 2710.898531] PMD 0
      [ 2710.898710]
      [ 2710.899045] Oops: 0000 [#1] SMP
      [ 2710.899232] Modules linked in:
      [ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
      [ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
      [ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
      [ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
      [ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
      [ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
      [ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
      [ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
      [ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
      [ 2710.901907] FS:  00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
      [ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
      [ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2710.903180] Call Trace:
      [ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
      [ 2710.903504]  hrtimer_cancel+0x15/0x20
      [ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
      [ 2710.903866]  htb_destroy+0x2e/0xf7
      [ 2710.904097]  qdisc_create+0x377/0x3fd
      [ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
      [ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
      [ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
      [ 2710.904849]  ? rtnl_newlink+0x729/0x729
      [ 2710.905017]  netlink_rcv_skb+0x6c/0xce
      [ 2710.905183]  rtnetlink_rcv+0x23/0x2a
      [ 2710.905345]  netlink_unicast+0x103/0x181
      [ 2710.905511]  netlink_sendmsg+0x326/0x337
      [ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
      [ 2710.905847]  sock_sendmsg+0x29/0x2e
      [ 2710.906010]  ___sys_sendmsg+0x209/0x28b
      [ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
      [ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
      [ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
      [ 2710.906685]  ? check_chain_key+0xb0/0xfd
      [ 2710.906855]  __sys_sendmsg+0x45/0x63
      [ 2710.907018]  ? __sys_sendmsg+0x45/0x63
      [ 2710.907185]  SyS_sendmsg+0x19/0x1b
      [ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2
      
      Note that probably this bug goes further back because the default qdisc
      handling always calls ->destroy on init failure too.
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88c2ace6
  2. 25 8月, 2017 1 次提交
    • E
      net_sched: fix a refcount_t issue with noop_qdisc · 551143d8
      Eric Dumazet 提交于
      syzkaller reported a refcount_t warning [1]
      
      Issue here is that noop_qdisc refcnt was never really considered as
      a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
      
      if (qdisc->flags & TCQ_F_BUILTIN ||
          !refcount_dec_and_test(&qdisc->refcnt)))
      	return;
      
      Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
      really needed, but harmless until refcount_t came.
      
      To fix this problem, we simply need to not increment noop_qdisc.refcnt,
      since we never decrement it.
      
      [1]
      refcount_t: increment on 0; use-after-free.
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
      RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
      RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
      RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
      RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
      R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
       attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
       dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
       __dev_open+0x227/0x330 net/core/dev.c:1380
       __dev_change_flags+0x695/0x990 net/core/dev.c:6726
       dev_change_flags+0x88/0x140 net/core/dev.c:6792
       dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
       dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
       sock_do_ioctl+0x94/0xb0 net/socket.c:968
       sock_ioctl+0x2c2/0x440 net/socket.c:1058
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
      
      Fixes: 7b936405 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Reshetova, Elena <elena.reshetova@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      551143d8
  3. 23 8月, 2017 2 次提交
    • J
      net: sched: don't do tcf_chain_flush from tcf_chain_destroy · 30d65e8f
      Jiri Pirko 提交于
      tcf_chain_flush needs to be called with RTNL. However, on
      free_tcf->
       tcf_action_goto_chain_fini->
        tcf_chain_put->
         tcf_chain_destroy->
          tcf_chain_flush
      callpath, it is called without RTNL.
      This issue was notified by following warning:
      
      [  155.599052] WARNING: suspicious RCU usage
      [  155.603165] 4.13.0-rc5jiri+ #54 Not tainted
      [  155.607456] -----------------------------
      [  155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage!
      
      Since on this callpath, the chain is guaranteed to be already empty
      by check in tcf_chain_put, move the tcf_chain_flush call out and call it
      only where it is needed - into tcf_block_put.
      
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30d65e8f
    • J
      net: sched: fix use after free when tcf_chain_destroy is called multiple times · 744a4cf6
      Jiri Pirko 提交于
      The goto_chain termination action takes a reference of a chain. In that
      case, there is an issue when block_put is called tcf_chain_destroy
      directly. The follo-up call of tcf_chain_put by goto_chain action free
      works with memory that is already freed. This was caught by kasan:
      
      [  220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50
      [  220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261
      [  220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54
      [  220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
      [  220.371784] Call Trace:
      [  220.374290]  <IRQ>
      [  220.376355]  dump_stack+0xd5/0x150
      [  220.391485]  print_address_description+0x86/0x410
      [  220.396308]  kasan_report+0x181/0x4c0
      [  220.415211]  tcf_chain_put+0x1b/0x50
      [  220.418949]  free_tcf+0x95/0xc0
      
      So allow tcf_chain_destroy to be called multiple times, free only in
      case the reference count drops to 0.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      744a4cf6
  4. 19 8月, 2017 2 次提交
  5. 16 8月, 2017 3 次提交
  6. 12 8月, 2017 1 次提交
  7. 10 8月, 2017 1 次提交
  8. 09 8月, 2017 1 次提交
    • X
      net: sched: set xt_tgchk_param par.net properly in ipt_init_target · ec0acb09
      Xin Long 提交于
      Now xt_tgchk_param par in ipt_init_target is a local varibale,
      par.net is not initialized there. Later when xt_check_target
      calls target's checkentry in which it may access par.net, it
      would cause kernel panic.
      
      Jaroslav found this panic when running:
      
        # ip link add TestIface type dummy
        # tc qd add dev TestIface ingress handle ffff:
        # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
          action xt -j CONNMARK --set-mark 4
      
      This patch is to pass net param into ipt_init_target and set
      par.net with it properly in there.
      
      v1->v2:
        As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
        it by also passing net_id to __tcf_ipt_init.
      v2->v3:
        Missed the fixes tag, so add it.
      
      Fixes: ecb2421b ("netfilter: add and use nf_ct_netns_get/put")
      Reported-by: NJaroslav Aster <jaster@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec0acb09
  9. 14 7月, 2017 1 次提交
  10. 13 7月, 2017 1 次提交
    • M
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko 提交于
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
  11. 05 7月, 2017 1 次提交
  12. 01 7月, 2017 2 次提交
  13. 30 6月, 2017 1 次提交
    • G
      net: sched: Fix one possible panic when no destroy callback · c1a4872e
      Gao Feng 提交于
      When qdisc fail to init, qdisc_create would invoke the destroy callback
      to cleanup. But there is no check if the callback exists really. So it
      would cause the panic if there is no real destroy callback like the qdisc
      codel, fq, and so on.
      
      Take codel as an example following:
      When a malicious user constructs one invalid netlink msg, it would cause
      codel_init->codel_change->nla_parse_nested failed.
      Then kernel would invoke the destroy callback directly but qdisc codel
      doesn't define one. It causes one panic as a result.
      
      Now add one the check for destroy to avoid the possible panic.
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1a4872e
  14. 22 6月, 2017 1 次提交
  15. 16 6月, 2017 2 次提交
  16. 15 6月, 2017 2 次提交
  17. 08 6月, 2017 1 次提交
  18. 07 6月, 2017 1 次提交
  19. 05 6月, 2017 2 次提交
  20. 26 5月, 2017 2 次提交
  21. 25 5月, 2017 1 次提交
  22. 23 5月, 2017 3 次提交
  23. 20 5月, 2017 1 次提交
  24. 18 5月, 2017 2 次提交