  1. 19 Oct, 2017 7 commits
    • bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations · bc6d5031
      Daniel Borkmann committed
      PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
      allocator. Given we support __GFP_NOWARN now, let's just let
      the allocation request fail naturally instead. The two call
      sites from BPF mistakenly assumed __GFP_NOWARN would work, so
      no changes needed to their actual __alloc_percpu_gfp() calls
      which use the flag already.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix splat for illegal devmap percpu allocation · 82f8dd28
      Daniel Borkmann committed
      It was reported that syzkaller was able to trigger a splat on
      devmap percpu allocation due to illegal/unsupported allocation
      request size passed to __alloc_percpu():
      
        [   70.094249] illegal size (32776) or align (8) for percpu allocation
        [   70.094256] ------------[ cut here ]------------
        [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
        [...]
        [   70.094325] Call Trace:
        [   70.094328]  __alloc_percpu_gfp+0x12/0x20
        [   70.094330]  dev_map_alloc+0x134/0x1e0
        [   70.094331]  SyS_bpf+0x9bc/0x1610
        [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
        [   70.094334]  ? security_task_setrlimit+0x43/0x60
        [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
      
      This was due to too large max_entries for the map such that we
      surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
      fail naturally here, so switch to __alloc_percpu_gfp() and pass
      __GFP_NOWARN instead.
      
      Fixes: 11393cc9 ("xdp: Add batching support to redirect map")
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Shankara Pailoor <sp3485@columbia.edu>
      Reported-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
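The size in the log above (32776 bytes, alignment 8) can be reproduced with a little arithmetic. The following is a rough user-space sketch, assuming the percpu request is a bitmap with one bit per map entry, rounded up to 64-bit words, and assuming a 32 KiB upper limit for a single percpu allocation. The `model_*` names and the example value of 262145 entries are hypothetical and do not come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound for one percpu allocation (32 KiB). This mirrors the
 * limit implied by the log, but the constant here is only a model. */
#define MODEL_PCPU_MIN_UNIT_SIZE ((size_t)32 << 10)

/* Bytes needed for a bitmap with one bit per map entry, rounded up to
 * whole 64-bit words (models unsigned long on a 64-bit kernel). */
static size_t model_bitmap_bytes(uint64_t max_entries)
{
    assert(max_entries <= UINT64_MAX - 63); /* guard the round-up below */
    uint64_t words = (max_entries + 63) / 64;
    return (size_t)(words * sizeof(uint64_t));
}

/* Returns 1 when the request would exceed the modeled percpu limit. */
static int model_exceeds_percpu_limit(uint64_t max_entries)
{
    return model_bitmap_bytes(max_entries) > MODEL_PCPU_MIN_UNIT_SIZE;
}
```

Under these assumptions, 262144 entries need exactly 32768 bytes and still fit, while one more entry needs 4097 words, which is the 32776 bytes seen in the warning.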
    • mm, percpu: add support for __GFP_NOWARN flag · 0ea7eeec
      Daniel Borkmann committed
      Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
      Currently, we always throw a warning when size or alignment
      is unsupported (and also dump stack on failed allocation
      requests). The warning itself is harmless since we return
      NULL anyway for any failed request, which callers are
      required to handle anyway. However, it becomes harmful when
      panic_on_warn is set.
      
      The rationale for the WARN() in pcpu_alloc() is that it can
      be tracked when larger than supported allocation requests are
      made such that allocation limits can be tweaked if warranted.
      This makes sense for in-kernel users, however, there are users
      of pcpu allocator where allocation size is derived from user
      space requests, e.g. when creating BPF maps. In these cases,
      the requests should fail gracefully without throwing a splat.
      
      The current work-around was to check allocation size against
      the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
      bailing out prior to a call to pcpu_alloc() in order to
      avoid throwing the WARN(). This is bad in multiple ways since
      PCPU_MIN_UNIT_SIZE is an implementation detail, and having
      the checks on call-sites only complicates the code for no
      good reason. Thus, let's fix it generically by supporting the
      __GFP_NOWARN flag that users can then use with calling the
      __alloc_percpu_gfp() helper instead.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
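The behavior described in this commit can be modeled in user space. The sketch below is not the kernel's pcpu_alloc(); it is a simplified stand-in that assumes a 32 KiB size limit and a 4096-byte alignment limit, and uses a hypothetical `MODEL_GFP_NOWARN` bit in place of __GFP_NOWARN. It only shows the idea that an oversized request fails with NULL either way, and that the flag decides whether a warning is raised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_GFP_NOWARN 0x1u             /* hypothetical stand-in flag */
#define MODEL_MAX_SIZE   ((size_t)32 << 10) /* assumed size limit */
#define MODEL_MAX_ALIGN  ((size_t)4096)     /* assumed alignment limit */

/* Counts warnings so that callers can observe whether one was raised. */
static unsigned int model_warn_count;

static int model_is_power_of_two(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns NULL for unsupported requests. Warns only when the caller did
 * not pass MODEL_GFP_NOWARN. */
static void *model_percpu_alloc(size_t size, size_t align, unsigned int flags)
{
    int do_warn = !(flags & MODEL_GFP_NOWARN);

    if (size == 0 || size > MODEL_MAX_SIZE ||
        align > MODEL_MAX_ALIGN || !model_is_power_of_two(align)) {
        if (do_warn) {
            model_warn_count++;
            fprintf(stderr, "illegal size (%zu) or align (%zu)\n",
                    size, align);
        }
        return NULL;
    }
    return malloc(size);
}
```

A caller whose size comes from user space would pass the no-warn flag and simply handle the NULL return, which is the pattern the commit message recommends for BPF map creation.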
    • Merge branch 'ena-fixes' · 3fd3b03b
      David S. Miller committed
      Netanel Belgazal says:
      
      ====================
      ENA ethernet driver bug fixes
      
      Some fixes for ENA ethernet driver
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: fix wrong max Tx/Rx queues on ethtool · a59df396
      Netanel Belgazal committed
      ethtool ena_get_channels() exposes the max number of queues as the max
      number of queues ENA supports (128 queues) and not the actual number
      of created queues.
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: fix rare kernel crash when bar memory remap fails · 411838e7
      Netanel Belgazal committed
      This failure is rare and was only found in testing, where devm_ioremap()
      was deliberately made to fail:
      
      [  451.170464] ena 0000:04:00.0: failed to remap regs bar
      [  451.170549] Workqueue: pciehp-1 pciehp_power_thread
      [  451.170551] task: ffff88085a5f2d00 task.stack: ffffc9000756c000
      [  451.170552] RIP: 0010:devm_iounmap+0x2d/0x40
      [  451.170553] RSP: 0018:ffffc9000756fac0 EFLAGS: 00010282
      [  451.170554] RAX: 00000000fffffffe RBX: 0000000000000000 RCX: 0000000000000000
      [  451.170555] RDX: ffffffff813a7e00 RSI: 0000000000000282 RDI: 0000000000000282
      [  451.170556] RBP: ffffc9000756fac8 R08: 00000000fffffffe R09: 00000000000009b7
      [  451.170557] R10: 0000000000000005 R11: 00000000000009b6 R12: ffff880856c9d0a0
      [  451.170558] R13: ffffc9000f5c90c0 R14: ffff880856c9d0a0 R15: 0000000000000028
      [  451.170559] FS:  0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000
      [  451.170560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  451.170561] CR2: 00007f169038b000 CR3: 0000000001c09000 CR4: 00000000003406f0
      [  451.170562] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  451.170562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  451.170563] Call Trace:
      [  451.170572]  ena_release_bars.isra.48+0x34/0x60 [ena]
      [  451.170574]  ena_probe+0x144/0xd90 [ena]
      [  451.170579]  ? ida_simple_get+0x98/0x100
      [  451.170585]  ? kernfs_next_descendant_post+0x40/0x50
      [  451.170591]  local_pci_probe+0x45/0xa0
      [  451.170592]  pci_device_probe+0x157/0x180
      [  451.170599]  driver_probe_device+0x2a8/0x460
      [  451.170600]  __device_attach_driver+0x7e/0xe0
      [  451.170602]  ? driver_allows_async_probing+0x30/0x30
      [  451.170603]  bus_for_each_drv+0x68/0xb0
      [  451.170605]  __device_attach+0xdd/0x160
      [  451.170607]  device_attach+0x10/0x20
      [  451.170610]  pci_bus_add_device+0x4f/0xa0
      [  451.170611]  pci_bus_add_devices+0x39/0x70
      [  451.170613]  pciehp_configure_device+0x96/0x120
      [  451.170614]  pciehp_enable_slot+0x1b3/0x290
      [  451.170616]  pciehp_power_thread+0x3b/0xb0
      [  451.170622]  process_one_work+0x149/0x360
      [  451.170623]  worker_thread+0x4d/0x3c0
      [  451.170626]  kthread+0x109/0x140
      [  451.170627]  ? rescuer_thread+0x380/0x380
      [  451.170628]  ? kthread_park+0x60/0x60
      [  451.170632]  ret_from_fork+0x25/0x30
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: reduce the severity of some printouts · cd7aea18
      Netanel Belgazal committed
      Decrease log level of checksum errors as these messages can be
      triggered remotely by bad packets.
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 Oct, 2017 4 commits
    • bpf: disallow arithmetic operations on context pointer · 28e33f9d
      Jakub Kicinski committed
      Commit f1174f77 ("bpf/verifier: rework value tracking")
      removed the crafty selection of which pointer types are
      allowed to be modified.  This is OK for most pointer types
      since adjust_ptr_min_max_vals() will catch operations on
      immutable pointers.  One exception is PTR_TO_CTX which is
      now allowed to be offset freely.
      
      The intent of aforementioned commit was to allow context
      access via modified registers.  The offset passed to
      ->is_valid_access() verifier callback has been adjusted
      by the value of the variable offset.
      
      What is missing, however, is taking the variable offset
      into account when the context register is used.  Or in terms
      of the code adding the offset to the value passed to the
      ->convert_ctx_access() callback.  This leads to the following
      eBPF user code:
      
           r1 += 68
           r0 = *(u32 *)(r1 + 8)
           exit
      
      being translated to this in kernel space:
      
         0: (07) r1 += 68
         1: (61) r0 = *(u32 *)(r1 +180)
         2: (95) exit
      
      Offset 8 is corresponding to 180 in the kernel, but offset
      76 is valid too.  Verifier will "accept" access to offset
      68+8=76 but then "convert" access to offset 8 as 180.
      Effective access to offset 248 is beyond the kernel context.
      (This is a __sk_buff example on a debug-heavy kernel -
      packet mark is 8 -> 180, 76 would be data.)
      
      Dereferencing the modified context pointer is not as easy
      as dereferencing other types, because we have to translate
      the access to reading a field in kernel structures which is
      usually at a different offset and often of a different size.
      To allow modifying the pointer we would have to make sure
      that given eBPF instruction will always access the same
      field or the fields accessed are "compatible" in terms of
      offset and size...
      
      Disallow dereferencing modified context pointers and add
      to selftests the test case described here.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
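The offset mismatch in this commit message can be worked through with a small model. The sketch below is purely illustrative: it assumes a one-entry translation table taken from the example in the message (user offset 8 maps to kernel offset 180), and the `model_*` helpers are hypothetical names, not verifier functions. It shows that the check sees 68 + 8 = 76, while the rewritten load lands at 68 + 180 = 248.

```c
#include <assert.h>

/* One-field translation table, taken from the example in the commit
 * message: user-visible offset 8 corresponds to kernel offset 180. */
#define MODEL_USER_OFF_MARK 8
#define MODEL_KERN_OFF_MARK 180

/* Models the rewrite step: it only sees the offset encoded in the
 * instruction, not what was previously added to the register. */
static int model_convert_insn_off(int insn_off)
{
    return insn_off == MODEL_USER_OFF_MARK ? MODEL_KERN_OFF_MARK : -1;
}

/* Offset that gets validated: register adjustment plus instruction offset. */
static int model_checked_off(int reg_adjust, int insn_off)
{
    return reg_adjust + insn_off;
}

/* Offset actually dereferenced at run time after the rewrite. */
static int model_effective_off(int reg_adjust, int insn_off)
{
    int kern_off = model_convert_insn_off(insn_off);

    assert(kern_off >= 0); /* the model only knows one field */
    return reg_adjust + kern_off;
}

/* The fix, modeled: refuse context loads through a modified pointer. */
static int model_ctx_access_allowed(int reg_adjust)
{
    return reg_adjust == 0;
}
```

With an unmodified pointer the two views agree on the same field (8 becomes 180). Once 68 has been added to the register they diverge, which is why the model, like the fix, rejects any non-zero adjustment.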
    • netlink: fix netlink_ack() extack race · 48044eb4
      Johannes Berg committed
      It seems that it's possible to toggle NETLINK_F_EXT_ACK
      through setsockopt() while another thread/CPU is building
      a message inside netlink_ack(), which could then trigger
      the WARN_ON()s I added since if it goes from being turned
      off to being turned on between allocating and filling the
      message, the skb could end up being too small.
      
      Avoid this whole situation by storing the value of this
      flag in a separate variable and using that throughout the
      function instead.
      
      Fixes: 2d4bc933 ("netlink: extended ACK reporting")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
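The pattern behind this fix, reading a flag once and reusing the stored value, can be sketched in a single-threaded model. Everything here is hypothetical: the struct, the lengths, and the `model_race_hook` callback, which stands in for a concurrent setsockopt() that flips the flag between the sizing step and the fill step. It is a sketch of the idea, not the netlink_ack() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_EXT_ACK 0x1u
#define MODEL_BASE_LEN  16u   /* assumed size of the basic message */
#define MODEL_EXT_LEN   32u   /* assumed extra size for extended data */

struct model_sock {
    unsigned int flags;       /* may change at any time in the real case */
};

/* Stand-in for another thread toggling the flag mid-function. */
typedef void (*model_race_hook)(struct model_sock *sk);

struct model_ack_result {
    size_t allocated;         /* bytes reserved for the message */
    size_t written;           /* bytes the fill step wants to write */
};

/* Racy variant: the flag is read twice and may differ between reads. */
static struct model_ack_result
model_ack_racy(struct model_sock *sk, model_race_hook hook)
{
    struct model_ack_result r;

    r.allocated = MODEL_BASE_LEN +
                  ((sk->flags & MODEL_F_EXT_ACK) ? MODEL_EXT_LEN : 0);
    if (hook)
        hook(sk);
    r.written = MODEL_BASE_LEN +
                ((sk->flags & MODEL_F_EXT_ACK) ? MODEL_EXT_LEN : 0);
    return r;
}

/* Fixed variant: the flag is read once and the copy is used throughout. */
static struct model_ack_result
model_ack_snapshot(struct model_sock *sk, model_race_hook hook)
{
    bool ext_ack = (sk->flags & MODEL_F_EXT_ACK) != 0;
    struct model_ack_result r;

    r.allocated = MODEL_BASE_LEN + (ext_ack ? MODEL_EXT_LEN : 0);
    if (hook)
        hook(sk);
    r.written = MODEL_BASE_LEN + (ext_ack ? MODEL_EXT_LEN : 0);
    return r;
}

/* Example hook: turns the flag on, the direction described in the commit. */
static void model_enable_ext_ack(struct model_sock *sk)
{
    sk->flags |= MODEL_F_EXT_ACK;
}
```

In the racy variant the fill step can ask for more bytes than were reserved when the flag goes from off to on. In the snapshot variant both steps agree regardless of what the hook does.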
    • ibmvnic: Fix calculation of number of TX header descriptors · 2de09681
      Thomas Falcon committed
      This patch correctly sets the number of additional header descriptors
      that will be sent in an indirect SCRQ entry.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Fix possible deadlock · d965465b
      Ido Schimmel committed
      When an EMAD is transmitted, a timeout work item is scheduled with a
      delay of 200ms, so that another EMAD will be retried until a maximum of
      five retries.
      
      In certain situations, it's possible for the function waiting on the
      EMAD to be associated with a work item that is queued on the same
      workqueue (`mlxsw_core`) as the timeout work item. This results in
      flushing a work item on the same workqueue.
      
      According to commit e159489b ("workqueue: relax lockdep annotation
      on flush_work()") the above may lead to a deadlock in case the workqueue
      has only one worker active or if the system is under memory pressure and
      the rescue worker is in use. The latter explains the very rare and
      random nature of the lockdep splats we have been seeing:
      
      [   52.730240] ============================================
      [   52.736179] WARNING: possible recursive locking detected
      [   52.742119] 4.14.0-rc3jiri+ #4 Not tainted
      [   52.746697] --------------------------------------------
      [   52.752635] kworker/1:3/599 is trying to acquire lock:
      [   52.758378]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0
      [   52.767837]
                     but task is already holding lock:
      [   52.774360]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.784495]
                     other info that might help us debug this:
      [   52.791794]  Possible unsafe locking scenario:
      [   52.798413]        CPU0
      [   52.801144]        ----
      [   52.803875]   lock(mlxsw_core_driver_name);
      [   52.808556]   lock(mlxsw_core_driver_name);
      [   52.813236]
                      *** DEADLOCK ***
      [   52.819857]  May be due to missing lock nesting notation
      [   52.827450] 3 locks held by kworker/1:3/599:
      [   52.832221]  #0:  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.842846]  #1:  ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
      [   52.854537]  #2:  (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20
      [   52.863021]
                     stack backtrace:
      [   52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4
      [   52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
      [   52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum]
      [   52.894060] Call Trace:
      [   52.909122]  __lock_acquire+0xf6f/0x2a10
      [   53.025412]  lock_acquire+0x158/0x440
      [   53.047557]  flush_work+0x3c4/0x5e0
      [   53.087571]  __cancel_work_timer+0x3ca/0x5e0
      [   53.177051]  cancel_delayed_work_sync+0x13/0x20
      [   53.182142]  mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core]
      [   53.194571]  mlxsw_core_reg_access+0x586/0x990 [mlxsw_core]
      [   53.225365]  mlxsw_reg_query+0x10/0x20 [mlxsw_core]
      [   53.230882]  mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum]
      [   53.237801]  process_one_work+0x8f1/0x12f0
      [   53.321804]  worker_thread+0x1fd/0x10c0
      [   53.435158]  kthread+0x28e/0x370
      [   53.448703]  ret_from_fork+0x2a/0x40
      [   53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774)
      [   53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770)
      [   53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed))
      [   53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications
      
      Fix this by creating another workqueue for EMAD timeouts, thereby
      preventing the situation of a work item trying to flush a work item
      queued on the same workqueue.
      
      Fixes: caf7297e ("mlxsw: core: Introduce support for asynchronous EMAD register access")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 17 Oct, 2017 12 commits
  4. 16 Oct, 2017 1 commit
    • mac80211: accept key reinstall without changing anything · fdf7cb41
      Johannes Berg committed
      When a key is reinstalled we can reset the replay counters
      etc. which can lead to nonce reuse and/or replay detection
      being impossible, breaking security properties, as described
      in the "KRACK attacks".
      
      In particular, CVE-2017-13080 applies to GTK rekeying that
      happened in firmware while the host is in D3, with the second
      part of the attack being done after the host wakes up. In
      this case, the wpa_supplicant mitigation isn't sufficient
      since wpa_supplicant doesn't know the GTK material.
      
      In case this happens, simply silently accept the new key
      coming from userspace but don't take any action on it since
      it's the same key; this keeps the PN replay counters intact.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
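The mitigation described here, accepting an identical key without touching the replay counters, can be illustrated with a small model. This is a sketch under stated assumptions, not the mac80211 code: `struct model_key`, `model_install_key()` and the 32-byte limit are hypothetical, and the plain memcmp() is for illustration only, since it is not a constant-time comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_KEY_LEN 32   /* assumed limit for the model */

struct model_key {
    uint8_t  material[MODEL_MAX_KEY_LEN];
    size_t   len;
    uint64_t rx_pn;            /* replay counter for received frames */
    int      installed;
};

/* Installs a key into the slot. If the same key is already installed,
 * report success but leave the replay counter untouched. Returns 0 on
 * success and -1 for an invalid length. */
static int model_install_key(struct model_key *slot,
                             const uint8_t *material, size_t len)
{
    if (len == 0 || len > MODEL_MAX_KEY_LEN)
        return -1;

    /* Reinstall of identical material: accept silently, change nothing. */
    if (slot->installed && slot->len == len &&
        memcmp(slot->material, material, len) == 0)
        return 0;

    /* Genuinely new key: store it and start counting from zero. */
    memcpy(slot->material, material, len);
    slot->len = len;
    slot->rx_pn = 0;
    slot->installed = 1;
    return 0;
}
```

The caller cannot tell the two success cases apart, which matches the "silently accept" wording above, while a replayed install can no longer rewind the counter.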
  5. 15 Oct, 2017 10 commits
  6. 14 Oct, 2017 2 commits
  7. 13 Oct, 2017 2 commits
    • Merge tag 'wireless-drivers-for-davem-2017-10-13' of... · db5972c9
      David S. Miller committed
      Merge tag 'wireless-drivers-for-davem-2017-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.14
      
      Nothing really special standing out, all of these are important fixes
      which should go to 4.14.
      
      iwlwifi
      
      * fix support for 3168 device series
      
      * fix a potential crash when using FW debugging recording
      
      * improve channel flags parsing to avoid warnings on too long traces
      
      * return -ENODATA when the temperature is not available, since the
        -EIO we were returning was causing fatal errors in userspace
      
      * avoid printing too many messages in dmesg when using monitor mode,
        since this can become very noisy and completely flood the logs
      
      brcmsmac
      
      * reduce stack usage to avoid frame size warnings with KASAN
      
      brcmfmac
      
      * add a check to avoid copying uninitialised memory
      
      rtlwifi
      
      * fix a regression with rtl8821ae starting from v4.11 where
        connections were frequently lost
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: update policy routing config help · 12ed3772
      Stephen Hemminger committed
      The kernel config help for policy routing was still pointing at
      an ancient document from 2000 that refers to Linux 2.1. Update it
      to point to something that is at least occasionally updated.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 12 Oct, 2017 2 commits
    • net/ncsi: Don't limit vids based on hot_channel · 6e9c0075
      Samuel Mendoza-Jonas committed
      Currently we drop any new VLAN ids if there are more than the current
      (or last used) channel can support. Most importantly this is a problem
      if no channel has been selected yet, resulting in a segfault.
      
      Secondly this does not necessarily reflect the capabilities of any other
      channels. Instead only drop a new VLAN id if we are already tracking the
      maximum allowed by the NCSI specification. Per-channel limits are
      already handled by ncsi_add_filter(), but add a message to set_one_vid()
      to make it obvious that the channel can not support any more VLAN ids.
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: only enable PCI wakeups when WOL is active · bde135a6
      Daniel Drake committed
      rtl_init_one() currently enables PCI wakeups if the ethernet device
      is found to be WOL-capable. There is no need to do this when
      rtl8169_set_wol() will correctly enable or disable the same wakeup flag
      when WOL is activated/deactivated.
      
      This works around an ACPI DSDT bug which prevents the Acer laptop models
      Aspire ES1-533, Aspire ES1-732, PackardBell ENTE69AP and Gateway NE533
      from entering S3 suspend - even when no ethernet cable is connected.
      
      On these platforms, the DSDT says that GPE08 is a wakeup source for
      ethernet, but this GPE fires as soon as the system goes into suspend,
      waking the system up immediately. Having the wakeup normally disabled
      avoids this issue in the default case.
      
      With this change, WOL will continue to be unusable on these platforms
      (it will instantly wake up if WOL is later enabled by the user) but we
      do not expect this to be a commonly used feature on these consumer
      laptops. We have separately determined that WOL works fine without any
      ACPI GPEs enabled during sleep, so a DSDT fix or override would be
      possible to make WOL work.
      Signed-off-by: Daniel Drake <drake@endlessm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>