1. 20 10月, 2017 4 次提交
  2. 19 10月, 2017 9 次提交
    • X
      sctp: do not peel off an assoc from one netns to another one · df80cd9b
      Xin Long 提交于
      Now when peeling off an association to the sock in another netns, all
      transports in this assoc are not to be rehashed and keep use the old
      key in hashtable.
      
      As a transport uses sk->net as the hash key to insert into hashtable,
      it would miss removing these transports from hashtable due to the new
      netns when closing the sock and all transports are being freeed, then
      later an use-after-free issue could be caused when looking up an asoc
      and dereferencing those transports.
      
      This is a very old issue since very beginning, ChunYu found it with
      syzkaller fuzz testing with this series:
      
        socket$inet6_sctp()
        bind$inet6()
        sendto$inet6()
        unshare(0x40000000)
        getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
        getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()
      
      This patch is to block this call when peeling one assoc off from one
      netns to another one, so that the netns of all transport would not
      go out-sync with the key in hashtable.
      
      Note that this patch didn't fix it by rehashing transports, as it's
      difficult to handle the situation when the tuple is already in use
      in the new netns. Besides, no one would like to peel off one assoc
      to another netns, considering ipaddrs, ifaces, etc. are usually
      different.
      Reported-by: NChunYu Wang <chunwang@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df80cd9b
    • D
      Merge branch 'bpf-Fix-for-BPF-devmap-percpu-allocation-splat' · 4bbb5083
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      bpf: Fix for BPF devmap percpu allocation splat
      
      The set fixes a splat in devmap percpu allocation when we alloc
      the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
      patch 1 is rather small, so if this could be routed via -net, for
      example, with Tejun's Ack that would be good. Patch 3 gets rid of
      remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
      internals and should not be used.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bbb5083
    • D
      bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations · bc6d5031
      Daniel Borkmann 提交于
      PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
      allocator. Given we support __GFP_NOWARN now, lets just let
      the allocation request fail naturally instead. The two call
      sites from BPF mistakenly assumed __GFP_NOWARN would work, so
      no changes needed to their actual __alloc_percpu_gfp() calls
      which use the flag already.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc6d5031
    • D
      bpf: fix splat for illegal devmap percpu allocation · 82f8dd28
      Daniel Borkmann 提交于
      It was reported that syzkaller was able to trigger a splat on
      devmap percpu allocation due to illegal/unsupported allocation
      request size passed to __alloc_percpu():
      
        [   70.094249] illegal size (32776) or align (8) for percpu allocation
        [   70.094256] ------------[ cut here ]------------
        [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
        [...]
        [   70.094325] Call Trace:
        [   70.094328]  __alloc_percpu_gfp+0x12/0x20
        [   70.094330]  dev_map_alloc+0x134/0x1e0
        [   70.094331]  SyS_bpf+0x9bc/0x1610
        [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
        [   70.094334]  ? security_task_setrlimit+0x43/0x60
        [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
      
      This was due to too large max_entries for the map such that we
      surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
      fail naturally here, so switch to __alloc_percpu_gfp() and pass
      __GFP_NOWARN instead.
      
      Fixes: 11393cc9 ("xdp: Add batching support to redirect map")
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Reported-by: NShankara Pailoor <sp3485@columbia.edu>
      Reported-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82f8dd28
    • D
      mm, percpu: add support for __GFP_NOWARN flag · 0ea7eeec
      Daniel Borkmann 提交于
      Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
      Currently, we always throw a warning when size or alignment
      is unsupported (and also dump stack on failed allocation
      requests). The warning itself is harmless since we return
      NULL anyway for any failed request, which callers are
      required to handle anyway. However, it becomes harmful when
      panic_on_warn is set.
      
      The rationale for the WARN() in pcpu_alloc() is that it can
      be tracked when larger than supported allocation requests are
      made such that allocations limits can be tweaked if warranted.
      This makes sense for in-kernel users, however, there are users
      of pcpu allocator where allocation size is derived from user
      space requests, e.g. when creating BPF maps. In these cases,
      the requests should fail gracefully without throwing a splat.
      
      The current work-around was to check allocation size against
      the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
      bailing out prior to a call to pcpu_alloc() in order to
      avoid throwing the WARN(). This is bad in multiple ways since
      PCPU_MIN_UNIT_SIZE is an implementation detail, and having
      the checks on call-sites only complicates the code for no
      good reason. Thus, lets fix it generically by supporting the
      __GFP_NOWARN flag that users can then use with calling the
      __alloc_percpu_gfp() helper instead.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ea7eeec
    • D
      Merge branch 'ena-fixes' · 3fd3b03b
      David S. Miller 提交于
      Netanel Belgazal says:
      
      ====================
      ENA ethernet driver bug fixes
      
      Some fixes for ENA ethernet driver
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fd3b03b
    • N
      net: ena: fix wrong max Tx/Rx queues on ethtool · a59df396
      Netanel Belgazal 提交于
      ethtool ena_get_channels() expose the max number of queues as the max
      number of queues ENA supports (128 queues) and not the actual number
      of created queues.
      Signed-off-by: NNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a59df396
    • N
      net: ena: fix rare kernel crash when bar memory remap fails · 411838e7
      Netanel Belgazal 提交于
      This failure is rare and only found on testing where deliberately fail
      devm_ioremap()
      
      [  451.170464] ena 0000:04:00.0: failed to remap regs bar
      451.170549] Workqueue: pciehp-1 pciehp_power_thread
      [  451.170551] task: ffff88085a5f2d00 task.stack: ffffc9000756c000
      [  451.170552] RIP: 0010:devm_iounmap+0x2d/0x40
      [  451.170553] RSP: 0018:ffffc9000756fac0 EFLAGS: 00010282
      [  451.170554] RAX: 00000000fffffffe RBX: 0000000000000000 RCX:
      0000000000000000
      [  451.170555] RDX: ffffffff813a7e00 RSI: 0000000000000282 RDI:
      0000000000000282
      [  451.170556] RBP: ffffc9000756fac8 R08: 00000000fffffffe R09:
      00000000000009b7
      [  451.170557] R10: 0000000000000005 R11: 00000000000009b6 R12:
      ffff880856c9d0a0
      [  451.170558] R13: ffffc9000f5c90c0 R14: ffff880856c9d0a0 R15:
      0000000000000028
      [  451.170559] FS:  0000000000000000(0000) GS:ffff88085f400000(0000)
      knlGS:0000000000000000
      [  451.170560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  451.170561] CR2: 00007f169038b000 CR3: 0000000001c09000 CR4:
      00000000003406f0
      [  451.170562] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [  451.170562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      0000000000000400
      [  451.170563] Call Trace:
      [  451.170572]  ena_release_bars.isra.48+0x34/0x60 [ena]
      [  451.170574]  ena_probe+0x144/0xd90 [ena]
      [  451.170579]  ? ida_simple_get+0x98/0x100
      [  451.170585]  ? kernfs_next_descendant_post+0x40/0x50
      [  451.170591]  local_pci_probe+0x45/0xa0
      [  451.170592]  pci_device_probe+0x157/0x180
      [  451.170599]  driver_probe_device+0x2a8/0x460
      [  451.170600]  __device_attach_driver+0x7e/0xe0
      [  451.170602]  ? driver_allows_async_probing+0x30/0x30
      [  451.170603]  bus_for_each_drv+0x68/0xb0
      [  451.170605]  __device_attach+0xdd/0x160
      [  451.170607]  device_attach+0x10/0x20
      [  451.170610]  pci_bus_add_device+0x4f/0xa0
      [  451.170611]  pci_bus_add_devices+0x39/0x70
      [  451.170613]  pciehp_configure_device+0x96/0x120
      [  451.170614]  pciehp_enable_slot+0x1b3/0x290
      [  451.170616]  pciehp_power_thread+0x3b/0xb0
      [  451.170622]  process_one_work+0x149/0x360
      [  451.170623]  worker_thread+0x4d/0x3c0
      [  451.170626]  kthread+0x109/0x140
      [  451.170627]  ? rescuer_thread+0x380/0x380
      [  451.170628]  ? kthread_park+0x60/0x60
      [  451.170632]  ret_from_fork+0x25/0x30
      Signed-off-by: NNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      411838e7
    • N
      net: ena: reduce the severity of some printouts · cd7aea18
      Netanel Belgazal 提交于
      Decrease log level of checksum errors as these messages can be
      triggered remotely by bad packets.
      Signed-off-by: NNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd7aea18
  3. 18 10月, 2017 4 次提交
    • J
      bpf: disallow arithmetic operations on context pointer · 28e33f9d
      Jakub Kicinski 提交于
      Commit f1174f77 ("bpf/verifier: rework value tracking")
      removed the crafty selection of which pointer types are
      allowed to be modified.  This is OK for most pointer types
      since adjust_ptr_min_max_vals() will catch operations on
      immutable pointers.  One exception is PTR_TO_CTX which is
      now allowed to be offseted freely.
      
      The intent of aforementioned commit was to allow context
      access via modified registers.  The offset passed to
      ->is_valid_access() verifier callback has been adjusted
      by the value of the variable offset.
      
      What is missing, however, is taking the variable offset
      into account when the context register is used.  Or in terms
      of the code adding the offset to the value passed to the
      ->convert_ctx_access() callback.  This leads to the following
      eBPF user code:
      
           r1 += 68
           r0 = *(u32 *)(r1 + 8)
           exit
      
      being translated to this in kernel space:
      
         0: (07) r1 += 68
         1: (61) r0 = *(u32 *)(r1 +180)
         2: (95) exit
      
      Offset 8 is corresponding to 180 in the kernel, but offset
      76 is valid too.  Verifier will "accept" access to offset
      68+8=76 but then "convert" access to offset 8 as 180.
      Effective access to offset 248 is beyond the kernel context.
      (This is a __sk_buff example on a debug-heavy kernel -
      packet mark is 8 -> 180, 76 would be data.)
      
      Dereferencing the modified context pointer is not as easy
      as dereferencing other types, because we have to translate
      the access to reading a field in kernel structures which is
      usually at a different offset and often of a different size.
      To allow modifying the pointer we would have to make sure
      that given eBPF instruction will always access the same
      field or the fields accessed are "compatible" in terms of
      offset and size...
      
      Disallow dereferencing modified context pointers and add
      to selftests the test case described here.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28e33f9d
    • J
      netlink: fix netlink_ack() extack race · 48044eb4
      Johannes Berg 提交于
      It seems that it's possible to toggle NETLINK_F_EXT_ACK
      through setsockopt() while another thread/CPU is building
      a message inside netlink_ack(), which could then trigger
      the WARN_ON()s I added since if it goes from being turned
      off to being turned on between allocating and filling the
      message, the skb could end up being too small.
      
      Avoid this whole situation by storing the value of this
      flag in a separate variable and using that throughout the
      function instead.
      
      Fixes: 2d4bc933 ("netlink: extended ACK reporting")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48044eb4
    • T
      ibmvnic: Fix calculation of number of TX header descriptors · 2de09681
      Thomas Falcon 提交于
      This patch correctly sets the number of additional header descriptors
      that will be sent in an indirect SCRQ entry.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2de09681
    • I
      mlxsw: core: Fix possible deadlock · d965465b
      Ido Schimmel 提交于
      When an EMAD is transmitted, a timeout work item is scheduled with a
      delay of 200ms, so that another EMAD will be retried until a maximum of
      five retries.
      
      In certain situations, it's possible for the function waiting on the
      EMAD to be associated with a work item that is queued on the same
      workqueue (`mlxsw_core`) as the timeout work item. This results in
      flushing a work item on the same workqueue.
      
      According to commit e159489b ("workqueue: relax lockdep annotation
      on flush_work()") the above may lead to a deadlock in case the workqueue
      has only one worker active or if the system in under memory pressure and
      the rescue worker is in use. The latter explains the very rare and
      random nature of the lockdep splats we have been seeing:
      
      [   52.730240] ============================================
      [   52.736179] WARNING: possible recursive locking detected
      [   52.742119] 4.14.0-rc3jiri+ #4 Not tainted
      [   52.746697] --------------------------------------------
      [   52.752635] kworker/1:3/599 is trying to acquire lock:
      [   52.758378]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0
      [   52.767837]
                     but task is already holding lock:
      [   52.774360]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.784495]
                     other info that might help us debug this:
      [   52.791794]  Possible unsafe locking scenario:
      [   52.798413]        CPU0
      [   52.801144]        ----
      [   52.803875]   lock(mlxsw_core_driver_name);
      [   52.808556]   lock(mlxsw_core_driver_name);
      [   52.813236]
                      *** DEADLOCK ***
      [   52.819857]  May be due to missing lock nesting notation
      [   52.827450] 3 locks held by kworker/1:3/599:
      [   52.832221]  #0:  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.842846]  #1:  ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.854537]  #2:  (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20
      [   52.863021]
                     stack backtrace:
      [   52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4
      [   52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
      [   52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum]
      [   52.894060] Call Trace:
      [   52.909122]  __lock_acquire+0xf6f/0x2a10
      [   53.025412]  lock_acquire+0x158/0x440
      [   53.047557]  flush_work+0x3c4/0x5e0
      [   53.087571]  __cancel_work_timer+0x3ca/0x5e0
      [   53.177051]  cancel_delayed_work_sync+0x13/0x20
      [   53.182142]  mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core]
      [   53.194571]  mlxsw_core_reg_access+0x586/0x990 [mlxsw_core]
      [   53.225365]  mlxsw_reg_query+0x10/0x20 [mlxsw_core]
      [   53.230882]  mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum]
      [   53.237801]  process_one_work+0x8f1/0x12f0
      [   53.321804]  worker_thread+0x1fd/0x10c0
      [   53.435158]  kthread+0x28e/0x370
      [   53.448703]  ret_from_fork+0x2a/0x40
      [   53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774)
      [   53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770)
      [   53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed))
      [   53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications
      
      Fix this by creating another workqueue for EMAD timeouts, thereby
      preventing the situation of a work item trying to flush a work item
      queued on the same workqueue.
      
      Fixes: caf7297e ("mlxsw: core: Introduce support for asynchronous EMAD register access")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d965465b
  4. 17 10月, 2017 12 次提交
  5. 16 10月, 2017 1 次提交
    • J
      mac80211: accept key reinstall without changing anything · fdf7cb41
      Johannes Berg 提交于
      When a key is reinstalled we can reset the replay counters
      etc. which can lead to nonce reuse and/or replay detection
      being impossible, breaking security properties, as described
      in the "KRACK attacks".
      
      In particular, CVE-2017-13080 applies to GTK rekeying that
      happened in firmware while the host is in D3, with the second
      part of the attack being done after the host wakes up. In
      this case, the wpa_supplicant mitigation isn't sufficient
      since wpa_supplicant doesn't know the GTK material.
      
      In case this happens, simply silently accept the new key
      coming from userspace but don't take any action on it since
      it's the same key; this keeps the PN replay counters intact.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      fdf7cb41
  6. 15 10月, 2017 10 次提交