1. 11 4月, 2019 6 次提交
    • I
      mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2 · d5949d92
      Ido Schimmel 提交于
      In Spectrum-1, when a multicast packet is admitted to the shared buffer
      it increases the quotas of all the ports and {port, TC} to which it is
      forwarded to.
      
      The above means that multicast packets are accounted multiple times in
      the shared buffer and can therefore cause the associated shared buffer
      pool to fill up very quickly.
      
      To work around this issue, commit e83c045e ("mlxsw:
      spectrum_buffers: Configure MC pool") added a dedicated multicast pool
      in which multicast packets are accounted.
      
      The issue is not present in Spectrum-2, but in order to be backward
      compatible with Spectrum-1, its default behavior is to allow a multicast
      packet to increase multiple egress quotas instead of one.
      
      Until the new (non-backward compatible) mode is supported, configure a
      dedicated multicast pool as in Spectrum-1.
      
      Fixes: fe099bf6 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NPetr Machata <petrm@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5949d92
    • I
      mlxsw: spectrum_router: Do not check VRF MAC address · 972fae68
      Ido Schimmel 提交于
      Commit 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
      addresses") enabled the driver to veto router interface (RIF) MAC
      addresses that it cannot support.
      
      This check should only be performed for interfaces for which the driver
      actually configures a RIF. A VRF upper is not one of them, so ignore it.
      
      Without this patch it is not possible to set an IP address on the VRF
      device and use it as a loopback.
      
      Fixes: 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NAlexander Petrovskiy <alexpe@mellanox.com>
      Tested-by: NAlexander Petrovskiy <alexpe@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      972fae68
    • I
      mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue · b442fed1
      Ido Schimmel 提交于
      The workqueue is used to periodically update the networking stack about
      activity / statistics of various objects such as neighbours and TC
      actions.
      
      It should not be called as part of memory reclaim path, so remove the
      WQ_MEM_RECLAIM flag.
      
      Fixes: 3d5479e9 ("mlxsw: core: Remove deprecated create_workqueue")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b442fed1
    • I
      mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue · 4af06997
      Ido Schimmel 提交于
      The ordered workqueue is used to offload various objects such as routes
      and neighbours in the order they are notified.
      
      It should not be called as part of memory reclaim path, so remove the
      WQ_MEM_RECLAIM flag. This can also result in a warning [1], if a worker
      tries to flush a non-WQ_MEM_RECLAIM workqueue.
      
      [1]
      [97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
      [97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
      ...
      [97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      [97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
      [97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
      ...
      [97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
      [97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
      [97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
      [97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
      [97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
      [97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
      [97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
      [97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
      [97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [97703.543115] Call Trace:
      [97703.543129] __flush_work+0xbd/0x1e0
      [97703.543137] ? __cancel_work_timer+0x136/0x1b0
      [97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0
      [97703.543154] __cancel_work_timer+0x136/0x1b0
      [97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
      [97703.543184] cancel_work_sync+0x10/0x20
      [97703.543191] rhashtable_free_and_destroy+0x23/0x140
      [97703.543198] rhashtable_destroy+0xd/0x10
      [97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
      [97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
      [97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
      [97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
      [97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
      [97703.543484] process_one_work+0x1fd/0x400
      [97703.543493] worker_thread+0x34/0x410
      [97703.543500] kthread+0x121/0x140
      [97703.543507] ? process_one_work+0x400/0x400
      [97703.543512] ? kthread_park+0x90/0x90
      [97703.543523] ret_from_fork+0x35/0x40
      
      Fixes: a3832b31 ("mlxsw: core: Create an ordered workqueue for FIB offload")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NSemion Lisyansky <semionl@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4af06997
    • I
      mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue · a8c133b0
      Ido Schimmel 提交于
      The EMAD workqueue is used to handle retransmission of EMAD packets that
      contain configuration data for the device's firmware.
      
      Given the workers need to allocate these packets and that the code is
      not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM
      flag.
      
      Fixes: d965465b ("mlxsw: core: Fix possible deadlock")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8c133b0
    • I
      mlxsw: spectrum_switchdev: Add MDB entries in prepare phase · d4d0e409
      Ido Schimmel 提交于
      The driver cannot guarantee in the prepare phase that it will be able to
      write an MDB entry to the device. In case the driver returned success
      during the prepare phase, but then failed to add the entry in the commit
      phase, a WARNING [1] will be generated by the switchdev core.
      
      Fix this by doing the work in the prepare phase instead.
      
      [1]
      [  358.544486] swp12s0: Commit of object (id=2) failed.
      [  358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
      [  358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
      [  358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  358.580582] Workqueue: events switchdev_deferred_process_work
      [  358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
      ...
      [  358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
      [  358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
      [  358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
      [  358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
      [  358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
      [  358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
      [  358.659790] FS:  0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
      [  358.668820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
      [  358.683188] Call Trace:
      [  358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
      [  358.691655]  switchdev_deferred_process+0x6b/0xf0
      [  358.696907]  switchdev_deferred_process_work+0xa/0x10
      [  358.702548]  process_one_work+0x1f5/0x3f0
      [  358.707022]  worker_thread+0x28/0x3c0
      [  358.711099]  ? process_one_work+0x3f0/0x3f0
      [  358.715768]  kthread+0x10d/0x130
      [  358.719369]  ? __kthread_create_on_node+0x180/0x180
      [  358.724815]  ret_from_fork+0x35/0x40
      
      Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NAlex Kushnarov <alexanderk@mellanox.com>
      Tested-by: NAlex Kushnarov <alexanderk@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4d0e409
  2. 09 4月, 2019 9 次提交
  3. 08 4月, 2019 3 次提交
    • F
      mac80211: make ieee80211_schedule_txq schedule empty TXQs · 2b4a6698
      Felix Fietkau 提交于
      Currently there is no way for the driver to signal to mac80211 that it should
      schedule a TXQ even if there are no packets on the mac80211 part of that queue.
      This is problematic if the driver has an internal retry queue to deal with
      software A-MPDU retry.
      
      This patch changes the behavior of ieee80211_schedule_txq to always schedule
      the queue, as its only user (ath9k) seems to expect such behavior already:
      it calls this function on tx status and on powersave wakeup whenever its
      internal retry queue is not empty.
      
      Also add an extra argument to ieee80211_return_txq to get the same behavior.
      
      This fixes an issue on ath9k where tx queues with packets to retry (and no
      new packets in mac80211) would not get serviced.
      
      Fixes: 89cea749 ("ath9k: Switch to mac80211 TXQ scheduling and airtime APIs")
      Signed-off-by: NFelix Fietkau <nbd@nbd.name>
      Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      2b4a6698
    • J
      mac80211_hwsim: calculate if_combination.max_interfaces · 45fcef8b
      Johannes Berg 提交于
      If we just set this to 2048, and have multiple limits you
      can select from, the total number might run over and cause
      a warning in cfg80211. This doesn't make sense, so we just
      calculate the total max_interfaces now.
      
      Reported-by: syzbot+8f91bd563bbff230d0ee@syzkaller.appspotmail.com
      Fixes: 99e3a44b ("mac80211_hwsim: allow setting iftype support")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      45fcef8b
    • M
      net: vrf: Fix ping failed when vrf mtu is set to 0 · 5055376a
      Miaohe Lin 提交于
      When the mtu of a vrf device is set to 0, it would cause ping
      failed. So I think we should limit vrf mtu in a reasonable range
      to solve this problem. I set dev->min_mtu to IPV6_MIN_MTU, so it
      will works for both ipv4 and ipv6. And if dev->max_mtu still be 0
      can be confusing, so I set dev->max_mtu to ETH_MAX_MTU.
      
      Here is the reproduce step:
      
      1.Config vrf interface and set mtu to 0:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
      
      2.Ping peer:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.1/16 scope global enp4s0
             valid_lft forever preferred_lft forever
      connect: Network is unreachable
      
      3.Set mtu to default value, ping works:
      PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
      64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms
      
      Fixes: ad49bc63 ("net: vrf: remove MTU limits for vrf device")
      Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5055376a
  4. 07 4月, 2019 4 次提交
    • H
      Revert: parisc: Use F_EXTEND() macro in iosapic code · c2f8d7cb
      Helge Deller 提交于
      Revert parts of commit 97d7e2e3 ("parisc: Use F_EXTEND() macro in
      iosapic code"). It breaks booting the 32-bit kernel on some machines.
      Reported-by: NSven Schnelle <svens@stackframe.org>
      Tested-by: NSven Schnelle <svens@stackframe.org>
      Fixes: 97d7e2e3 ("parisc: Use F_EXTEND() macro in iosapic code")
      Signed-off-by: NHelge Deller <deller@gmx.de>
      c2f8d7cb
    • K
      fs: stream_open - opener for stream-like files so that read and write can run... · 10dce8af
      Kirill Smelkov 提交于
      fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
      
      Commit 9c225f26 ("vfs: atomic f_pos accesses as per POSIX") added
      locking for file.f_pos access and in particular made concurrent read and
      write not possible - now both those functions take f_pos lock for the
      whole run, and so if e.g. a read is blocked waiting for data, write will
      deadlock waiting for that read to complete.
      
      This caused regression for stream-like files where previously read and
      write could run simultaneously, but after that patch could not do so
      anymore. See e.g. commit 581d21a2 ("xenbus: fix deadlock on writes
      to /proc/xen/xenbus") which fixes such regression for particular case of
      /proc/xen/xenbus.
      
      The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
      safety for read/write/lseek and added the locking to file descriptors of
      all regular files. In 2014 that thread-safety problem was not new as it
      was already discussed earlier in 2006.
      
      However even though 2006'th version of Linus's patch was adding f_pos
      locking "only for files that are marked seekable with FMODE_LSEEK (thus
      avoiding the stream-like objects like pipes and sockets)", the 2014
      version - the one that actually made it into the tree as 9c225f26 -
      is doing so irregardless of whether a file is seekable or not.
      
      See
      
          https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
          https://lwn.net/Articles/180387
          https://lwn.net/Articles/180396
      
      for historic context.
      
      The reason that it did so is, probably, that there are many files that
      are marked non-seekable, but e.g. their read implementation actually
      depends on knowing current position to correctly handle the read. Some
      examples:
      
      	kernel/power/user.c		snapshot_read
      	fs/debugfs/file.c		u32_array_read
      	fs/fuse/control.c		fuse_conn_waiting_read + ...
      	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
      	arch/s390/hypfs/inode.c		hypfs_read_iter
      	...
      
      Despite that, many nonseekable_open users implement read and write with
      pure stream semantics - they don't depend on passed ppos at all. And for
      those cases where read could wait for something inside, it creates a
      situation similar to xenbus - the write could be never made to go until
      read is done, and read is waiting for some, potentially external, event,
      for potentially unbounded time -> deadlock.
      
      Besides xenbus, there are 14 such places in the kernel that I've found
      with semantic patch (see below):
      
      	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
      	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
      	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
      	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
      	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
      	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
      	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
      	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
      	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
      	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
      	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
      	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
      	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
      	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
      
      In addition to the cases above another regression caused by f_pos
      locking is that now FUSE filesystems that implement open with
      FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
      stream-like files - for the same reason as above e.g. read can deadlock
      write locking on file.f_pos in the kernel.
      
      FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990 ("fuse:
      implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
      in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
      write routines not depending on current position at all, and with both
      read and write being potentially blocking operations:
      
      See
      
          https://github.com/libfuse/osspd
          https://lwn.net/Articles/308445
      
          https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
          https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
          https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
      
      Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
      "somewhat pipe-like files ..." with read handler not using offset.
      However that test implements only read without write and cannot exercise
      the deadlock scenario:
      
          https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
          https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
          https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
      
      I've actually hit the read vs write deadlock for real while implementing
      my FUSE filesystem where there is /head/watch file, for which open
      creates separate bidirectional socket-like stream in between filesystem
      and its user with both read and write being later performed
      simultaneously. And there it is semantically not easy to split the
      stream into two separate read-only and write-only channels:
      
          https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
      
      Let's fix this regression. The plan is:
      
      1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
         doing so would break many in-kernel nonseekable_open users which
         actually use ppos in read/write handlers.
      
      2. Add stream_open() to kernel to open stream-like non-seekable file
         descriptors. Read and write on such file descriptors would never use
         nor change ppos. And with that property on stream-like files read and
         write will be running without taking f_pos lock - i.e. read and write
         could be running simultaneously.
      
      3. With semantic patch search and convert to stream_open all in-kernel
         nonseekable_open users for which read and write actually do not
         depend on ppos and where there is no other methods in file_operations
         which assume @offset access.
      
      4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
         steam_open if that bit is present in filesystem open reply.
      
         It was tempting to change fs/fuse/ open handler to use stream_open
         instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
         grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
         and in particular GVFS which actually uses offset in its read and
         write handlers
      
      	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
      
         so if we would do such a change it will break a real user.
      
      5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
         from v3.14+ (the kernel where 9c225f26 first appeared).
      
         This will allow to patch OSSPD and other FUSE filesystems that
         provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
         in their open handler and this way avoid the deadlock on all kernel
         versions. This should work because fs/fuse/ ignores unknown open
         flags returned from a filesystem and so passing FOPEN_STREAM to a
         kernel that is not aware of this flag cannot hurt. In turn the kernel
         that is not aware of FOPEN_STREAM will be < v3.14 where just
         FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
         write deadlock.
      
      This patch adds stream_open, converts /proc/xen/xenbus to it and adds
      semantic patch to automatically locate in-kernel places that are either
      required to be converted due to read vs write deadlock, or that are just
      safe to be converted because read and write do not use ppos and there
      are no other funky methods in file_operations.
      
      Regarding semantic patch I've verified each generated change manually -
      that it is correct to convert - and each other nonseekable_open instance
      left - that it is either not correct to convert there, or that it is not
      converted due to current stream_open.cocci limitations.
      
      The script also does not convert files that should be valid to convert,
      but that currently have .llseek = noop_llseek or generic_file_llseek for
      unknown reason despite file being opened with nonseekable_open (e.g.
      drivers/input/mousedev.c)
      
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yongzhi Pan <panyongzhi@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Nikolaus Rath <Nikolaus@rath.org>
      Cc: Han-Wen Nienhuys <hanwen@google.com>
      Signed-off-by: NKirill Smelkov <kirr@nexedi.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      10dce8af
    • G
      xsysace: Fix error handling in ace_setup · 47b16820
      Guenter Roeck 提交于
      If xace hardware reports a bad version number, the error handling code
      in ace_setup() calls put_disk(), followed by queue cleanup. However, since
      the disk data structure has the queue pointer set, put_disk() also
      cleans and releases the queue. This results in blk_cleanup_queue()
      accessing an already released data structure, which in turn may result
      in a crash such as the following.
      
      [   10.681671] BUG: Kernel NULL pointer dereference at 0x00000040
      [   10.681826] Faulting instruction address: 0xc0431480
      [   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
      [   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
      [   10.682387] Modules linked in:
      [   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.0.0-rc6-next-20190218+ #2
      [   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
      [   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: G        W          (5.0.0-rc6-next-20190218+)
      [   10.683065] MSR:  00029000 <CE,EE,ME>  CR: 22000222  XER: 00000000
      [   10.683236] DEAR: 00000040 ESR: 00000000
      [   10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000
      [   10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000
      [   10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000
      [   10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800
      [   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
      [   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
      [   10.684602] Call Trace:
      [   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
      [   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
      [   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
      [   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
      [   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
      [   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
      [   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
      [   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
      [   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
      [   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
      [   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
      [   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
      [   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
      [   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
      [   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
      [   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
      [   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
      [   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
      [   10.687349] Instruction dump:
      [   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008
      [   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008
      [   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---
      
      Fix the problem by setting the disk queue pointer to NULL before calling
      put_disk(). A more comprehensive fix might be to rearrange the code
      to check the hardware version before initializing data structures,
      but I don't know if this would have undesirable side effects, and
      it would increase the complexity of backporting the fix to older kernels.
      
      Fixes: 74489a91 ("Add support for Xilinx SystemACE CompactFlash interface")
      Acked-by: NMichal Simek <michal.simek@xilinx.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      47b16820
    • J
      null_blk: prevent crash from bad home_node value · 7ff684a6
      John Pittman 提交于
      At module load, if the selected home_node value is greater than
      the available numa nodes, the system will crash in
      __alloc_pages_nodemask() due to a bad paging request.  Prevent this
      user error crash by detecting the bad value, logging an error, and
      setting g_home_node back to the default of NUMA_NO_NODE.
      Signed-off-by: NJohn Pittman <jpittman@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7ff684a6
  5. 06 4月, 2019 2 次提交
  6. 05 4月, 2019 14 次提交
    • G
      tty: mark Siemens R3964 line discipline as BROKEN · c7084edc
      Greg Kroah-Hartman 提交于
      The n_r3964 line discipline driver was written in a different time, when
      SMP machines were rare, and users were trusted to do the right thing.
      Since then, the world has moved on but not this code, it has stayed
      rooted in the past with its lovely hand-crafted list structures and
      loads of "interesting" race conditions all over the place.
      
      After attempting to clean up most of the issues, I just gave up and am
      now marking the driver as BROKEN so that hopefully someone who has this
      hardware will show up out of the woodwork (I know you are out there!)
      and will help with debugging a raft of changes that I had laying around
      for the code, but was too afraid to commit as odds are they would break
      things.
      
      Many thanks to Jann and Linus for pointing out the initial problems in
      this codebase, as well as many reviews of my attempts to fix the issues.
      It was a case of whack-a-mole, and as you can see, the mole won.
      Reported-by: NJann Horn <jannh@google.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7084edc
    • Y
      paride/pcd: Fix potential NULL pointer dereference and mem leak · f0d17625
      YueHaibing 提交于
      Syzkaller report this:
      
      pcd: pcd version 1.07, major 46, nice 0
      pcd0: Autoprobe failed
      pcd: No CD-ROM drive found
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      CPU: 1 PID: 4525 Comm: syz-executor.0 Not tainted 5.1.0-rc3+ #8
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:pcd_init+0x95c/0x1000 [pcd]
      Code: c4 ab f7 48 89 d8 48 c1 e8 03 80 3c 28 00 74 08 48 89 df e8 56 a3 da f7 4c 8b 23 49 8d bc 24 80 05 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 74 05 e8 39 a3 da f7 49 8b bc 24 80 05 00 00 e8 cc b2
      RSP: 0018:ffff8881e84df880 EFLAGS: 00010202
      RAX: 00000000000000b0 RBX: ffffffffc155a088 RCX: ffffffffc1508935
      RDX: 0000000000040000 RSI: ffffc900014f0000 RDI: 0000000000000580
      RBP: dffffc0000000000 R08: ffffed103ee658b8 R09: ffffed103ee658b8
      R10: 0000000000000001 R11: ffffed103ee658b7 R12: 0000000000000000
      R13: ffffffffc155a778 R14: ffffffffc155a4a8 R15: 0000000000000003
      FS:  00007fe71bee3700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055a7334441a8 CR3: 00000001e9674003 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       ? 0xffffffffc1508000
       ? 0xffffffffc1508000
       do_one_initcall+0xbc/0x47d init/main.c:901
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fe71bee2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 00007fe71bee2c70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe71bee36bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      Modules linked in: pcd(+) paride solos_pci atm ts_fsm rtc_mt6397 mac80211 nhc_mobility nhc_udp nhc_ipv6 nhc_hop nhc_dest nhc_fragment nhc_routing 6lowpan rtc_cros_ec memconsole intel_xhci_usb_role_switch roles rtc_wm8350 usbcore industrialio_triggered_buffer kfifo_buf industrialio asc7621 dm_era dm_persistent_data dm_bufio dm_mod tpm gnss_ubx gnss_serial serdev gnss max2165 cpufreq_dt hid_penmount hid menf21bmc_wdt rc_core n_tracesink ide_gd_mod cdns_csi2tx v4l2_fwnode videodev media pinctrl_lewisburg pinctrl_intel iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd
       ide_pci_generic piix input_leds cryptd glue_helper psmouse ide_core intel_agp serio_raw intel_gtt ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: bmc150_magn]
      Dumping ftrace buffer:
         (ftrace buffer empty)
      ---[ end trace d873691c3cd69f56 ]---
      
      If alloc_disk fails in pcd_init_units, cd->disk will be
      NULL, however in pcd_detect and pcd_exit, it's not check
      this before free.It may result a NULL pointer dereference.
      
      Also when register_blkdev failed, blk_cleanup_queue() and
      blk_mq_free_tag_set() should be called to free resources.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: 81b74ac6 ("paride/pcd: cleanup queues when detection fails")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f0d17625
    • A
      xen: use struct_size() helper in kzalloc() · ad94dc3a
      Andrea Righi 提交于
      struct privcmd_buf_vma_private has a zero-sized array at the end
      (pages), use the new struct_size() helper to determine the proper
      allocation size and avoid potential type mistakes.
      Signed-off-by: NAndrea Righi <andrea.righi@canonical.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      ad94dc3a
    • T
      ibmvnic: Fix completion structure initialization · bbd669a8
      Thomas Falcon 提交于
      Fix device initialization completion handling for vNIC adapters.
      Initialize the completion structure on probe and reinitialize when needed.
      This also fixes a race condition during kdump where the driver can attempt
      to access the completion struct before it is initialized:
      
      Unable to handle kernel paging request for data at address 0x00000000
      Faulting instruction address: 0xc0000000081acbe0
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
      CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
      NIP:  c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
      REGS: c000000027f3f990 TRAP: 0300   Not tainted  (4.18.0-64.el8.ppc64le)
      MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288  XER: 00000006
      CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
      GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
      GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
      GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
      GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
      GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
      GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
      GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
      GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
      NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
      LR [c0000000081ad964] complete+0x64/0xa0
      Call Trace:
      [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
      [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
      [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
      [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
      [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
      [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
      [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
      [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
      [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
      [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
      [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120
      Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbd669a8
    • V
      libcxgb: fix incorrect ppmax calculation · cc5a726c
      Varun Prakash 提交于
      BITS_TO_LONGS() uses DIV_ROUND_UP() because of
      this ppmax value can be greater than available
      per cpu page pods.
      
      This patch removes BITS_TO_LONGS() to fix this
      issue.
      Signed-off-by: NVarun Prakash <varun@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc5a726c
    • L
      mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer · d9b8a67b
      Liu Jian 提交于
      In function do_write_buffer(), in the for loop, there is a case
      chip_ready() returns 1 while chip_good() returns 0, so it never
      break the loop.
      To fix this, chip_good() is enough and it should timeout if it stay
      bad for a while.
      
      Fixes: dfeae107("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
      Signed-off-by: NYi Huaijie <yihuaijie@huawei.com>
      Signed-off-by: NLiu Jian <liujian56@huawei.com>
      Reviewed-by: NTokunori Ikegami <ikegami_to@yahoo.co.jp>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      d9b8a67b
    • M
      dm: disable DISCARD if the underlying storage no longer supports it · bcb44433
      Mike Snitzer 提交于
      Storage devices which report supporting discard commands like
      WRITE_SAME_16 with unmap, but reject discard commands sent to the
      storage device.  This is a clear storage firmware bug but it doesn't
      change the fact that should a program cause discards to be sent to a
      multipath device layered on this buggy storage, all paths can end up
      failed at the same time from the discards, causing possible I/O loss.
      
      The first discard to a path will fail with Illegal Request, Invalid
      field in cdb, e.g.:
       kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
       kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
       kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
       kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
       kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
      
      The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
      device disables its support for discard on this path, and because of the
      BLK_STS_TARGET error multipath fails the discard without failing any
      path or retrying down a different path.  But subsequent discards can
      cause path failures.  Any discards sent to the path which already failed
      a discard ends up failing with EIO from blk_cloned_rq_check_limits with
      an "over max size limit" error since the discard limit was set to 0 by
      the sd driver for the path.  As the error is EIO, this now fails the
      path and multipath tries to send the discard down the next path.  This
      cycle continues as discards are sent until all paths fail.
      
      Fix this by training DM core to disable DISCARD if the underlying
      storage already did so.
      
      Also, fix branching in dm_done() and clone_endio() to reflect the
      mutually exclussive nature of the IO operations in question.
      
      Cc: stable@vger.kernel.org
      Reported-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      bcb44433
    • L
      net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop · 2ec1ed2a
      Lorenzo Bianconi 提交于
      When a bpf program is uploaded, the driver computes the number of
      xdp tx queues resulting in the allocation of additional qsets.
      Starting from commit '2ecbe4f4 ("net: thunderx: replace global
      nicvf_rx_mode_wq work queue for all VFs to private for each of them")'
      the driver runs link state polling for each VF resulting in the
      following NULL pointer dereference:
      
      [   56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
      [   56.178032] Mem abort info:
      [   56.180834]   ESR = 0x96000005
      [   56.183877]   Exception class = DABT (current EL), IL = 32 bits
      [   56.189792]   SET = 0, FnV = 0
      [   56.192834]   EA = 0, S1PTW = 0
      [   56.195963] Data abort info:
      [   56.198831]   ISV = 0, ISS = 0x00000005
      [   56.202662]   CM = 0, WnR = 0
      [   56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0
      [   56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000
      [   56.219094] Internal error: Oops: 96000005 [#1] SMP
      [   56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3
      [   56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
      [   56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [   56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20
      [   56.283312] lr : mutex_lock+0x2c/0x50
      [   56.286962] sp : ffff0000219af1b0
      [   56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0
      [   56.295565] x27: 0000000000000000 x26: 0000000000000015
      [   56.300865] x25: 0000000000000000 x24: 0000000000000000
      [   56.306165] x23: 0000000000000000 x22: ffff000011117000
      [   56.311465] x21: ffff800f64dfc080 x20: 0000000000000020
      [   56.316766] x19: 0000000000000020 x18: 0000000000000001
      [   56.322066] x17: 0000000000000000 x16: ffff800f2e077080
      [   56.327367] x15: 0000000000000004 x14: 0000000000000000
      [   56.332667] x13: ffff000010964438 x12: 0000000000000002
      [   56.337967] x11: 0000000000000000 x10: 0000000000000c70
      [   56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50
      [   56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84
      [   56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480
      [   56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080
      [   56.364469] x1 : 0000000000000000 x0 : 0000000000000020
      [   56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a)
      [   56.376110] Call trace:
      [   56.378546]  __ll_sc___cmpxchg_case_acq_64+0x4/0x20
      [   56.383414]  drain_workqueue+0x34/0x198
      [   56.387247]  nicvf_open+0x48/0x9e8 [nicvf]
      [   56.391334]  nicvf_open+0x898/0x9e8 [nicvf]
      [   56.395507]  nicvf_xdp+0x1bc/0x238 [nicvf]
      [   56.399595]  dev_xdp_install+0x68/0x90
      [   56.403333]  dev_change_xdp_fd+0xc8/0x240
      [   56.407333]  do_setlink+0x8e0/0xbe8
      [   56.410810]  __rtnl_newlink+0x5b8/0x6d8
      [   56.414634]  rtnl_newlink+0x54/0x80
      [   56.418112]  rtnetlink_rcv_msg+0x22c/0x2f8
      [   56.422199]  netlink_rcv_skb+0x60/0x120
      [   56.426023]  rtnetlink_rcv+0x28/0x38
      [   56.429587]  netlink_unicast+0x1c8/0x258
      [   56.433498]  netlink_sendmsg+0x1b4/0x350
      [   56.437410]  sock_sendmsg+0x4c/0x68
      [   56.440887]  ___sys_sendmsg+0x240/0x280
      [   56.444711]  __sys_sendmsg+0x68/0xb0
      [   56.448275]  __arm64_sys_sendmsg+0x2c/0x38
      [   56.452361]  el0_svc_handler+0x9c/0x128
      [   56.456186]  el0_svc+0x8/0xc
      [   56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10)
      [   56.465166] ---[ end trace 4a57fdc27b0a572c ]---
      [   56.469772] Kernel panic - not syncing: Fatal exception
      
      Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop
      
      Fixes: 2ecbe4f4 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")
      Fixes: 2c632ad8 ("net: thunderx: move link state polling function to VF")
      Reported-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Tested-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ec1ed2a
    • Y
      net: hns: Fix sparse: some warnings in HNS drivers · 15400663
      Yonglong Liu 提交于
      There are some sparse warnings in the HNS drivers:
      
      warning: incorrect type in assignment (different address spaces)
          expected void [noderef] <asn:2> *io_base
          got void *vaddr
      warning: cast removes address space '<asn:2>' of expression
      [...]
      
      Add __iomem and change all the u8 __iomem to void __iomem to
      fix these kind of  warnings.
      
      warning: incorrect type in argument 1 (different address spaces)
          expected void [noderef] <asn:2> *base
          got unsigned char [usertype] *base_addr
      warning: cast to restricted __le16
      warning: incorrect type in assignment (different base types)
          expected unsigned int [usertype] tbl_tcam_data_high
          got restricted __le32 [usertype]
      warning: cast to restricted __le32
      [...]
      
      These variables used u32/u16 as their type, and finally as a
      parameter of writel(), writel() will do the cpu_to_le32 coversion
      so remove the little endian covert code to fix these kind of warnings.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15400663
    • Y
      net: hns: Fix WARNING when remove HNS driver with SMMU enabled · 8601a99d
      Yonglong Liu 提交于
      When enable SMMU, remove HNS driver will cause a WARNING:
      
      [  141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8
      [  141.954673] Modules linked in: hns_enet_drv(-)
      [  141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G        W         5.0.0-rc1-28723-gb729c57de95c-dirty #32
      [  141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017
      [  142.000244] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  142.009886] pc : __iommu_dma_unmap+0xc0/0xc8
      [  142.018476] lr : __iommu_dma_unmap+0xc0/0xc8
      [  142.027066] sp : ffff000013533b90
      [  142.033728] x29: ffff000013533b90 x28: ffff8013e6983600
      [  142.044420] x27: 0000000000000000 x26: 0000000000000000
      [  142.055113] x25: 0000000056000000 x24: 0000000000000015
      [  142.065806] x23: 0000000000000028 x22: ffff8013e66eee68
      [  142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000
      [  142.087192] x19: 0000000000001000 x18: 0000000000000007
      [  142.097885] x17: 000000000000000e x16: 0000000000000001
      [  142.108578] x15: 0000000000000019 x14: 363139343a70616d
      [  142.119270] x13: 6e75656761705f67 x12: 0000000000000000
      [  142.129963] x11: 00000000ffffffff x10: 0000000000000006
      [  142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0
      [  142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8
      [  142.162042] x5 : 0000000000000000 x4 : 0000000000000000
      [  142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500
      [  142.183427] x1 : 0000000000000000 x0 : 0000000000000035
      [  142.194120] Call trace:
      [  142.199030]  __iommu_dma_unmap+0xc0/0xc8
      [  142.206920]  iommu_dma_unmap_page+0x20/0x28
      [  142.215335]  __iommu_unmap_page+0x40/0x60
      [  142.223399]  hnae_unmap_buffer+0x110/0x134
      [  142.231639]  hnae_free_desc+0x6c/0x10c
      [  142.239177]  hnae_fini_ring+0x14/0x34
      [  142.246540]  hnae_fini_queue+0x2c/0x40
      [  142.254080]  hnae_put_handle+0x38/0xcc
      [  142.261619]  hns_nic_dev_remove+0x54/0xfc [hns_enet_drv]
      [  142.272312]  platform_drv_remove+0x24/0x64
      [  142.280552]  device_release_driver_internal+0x17c/0x20c
      [  142.291070]  driver_detach+0x4c/0x90
      [  142.298259]  bus_remove_driver+0x5c/0xd8
      [  142.306148]  driver_unregister+0x2c/0x54
      [  142.314037]  platform_driver_unregister+0x10/0x18
      [  142.323505]  hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv]
      [  142.335248]  __arm64_sys_delete_module+0x214/0x25c
      [  142.344891]  el0_svc_common+0xb0/0x10c
      [  142.352430]  el0_svc_handler+0x24/0x80
      [  142.359968]  el0_svc+0x8/0x7c0
      [  142.366104] ---[ end trace 60ad1cd58e63c407 ]---
      
      The tx ring buffer map when xmit and unmap when xmit done. So in
      hnae_init_ring() did not map tx ring buffer, but in hnae_fini_ring()
      have a unmap operation for tx ring buffer, which is already unmapped
      when xmit done, than cause this WARNING.
      
      The hnae_alloc_buffers() is called in hnae_init_ring(),
      so the hnae_free_buffers() should be in hnae_fini_ring(), not in
      hnae_free_desc().
      
      In hnae_fini_ring(), adds a check is_rx_ring() as in hnae_init_ring().
      When the ring buffer is tx ring, adds a piece of code to ensure that
      the tx ring is unmap.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NPeng Li <lipeng321@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8601a99d
    • Y
      net: hns: fix ICMP6 neighbor solicitation messages discard problem · f058e468
      Yonglong Liu 提交于
      ICMP6 neighbor solicitation messages will be discard by the Hip06
      chips, because of not setting forwarding pool. Enable promisc mode
      has the same problem.
      
      This patch fix the wrong forwarding table configs for the multicast
      vague matching when enable promisc mode, and add forwarding pool
      for the forwarding table.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f058e468
    • Y
      net: hns: Fix probabilistic memory overwrite when HNS driver initialized · c0b09844
      Yonglong Liu 提交于
      When reboot the system again and again, may cause a memory
      overwrite.
      
      [   15.638922] systemd[1]: Reached target Swap.
      [   15.667561] tun: Universal TUN/TAP device driver, 1.6
      [   15.676756] Bridge firewalling registered
      [   17.344135] Unable to handle kernel paging request at virtual address 0000000200000040
      [   17.352179] Mem abort info:
      [   17.355007]   ESR = 0x96000004
      [   17.358105]   Exception class = DABT (current EL), IL = 32 bits
      [   17.364112]   SET = 0, FnV = 0
      [   17.367209]   EA = 0, S1PTW = 0
      [   17.370393] Data abort info:
      [   17.373315]   ISV = 0, ISS = 0x00000004
      [   17.377206]   CM = 0, WnR = 0
      [   17.380214] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
      [   17.386926] [0000000200000040] pgd=0000000000000000
      [   17.391878] Internal error: Oops: 96000004 [#1] SMP
      [   17.396824] CPU: 23 PID: 95 Comm: kworker/u130:0 Tainted: G            E     4.19.25-1.2.78.aarch64 #1
      [   17.414175] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.54 08/16/2018
      [   17.425615] Workqueue: events_unbound async_run_entry_fn
      [   17.435151] pstate: 00000005 (nzcv daif -PAN -UAO)
      [   17.444139] pc : __mutex_lock.isra.1+0x74/0x540
      [   17.453002] lr : __mutex_lock.isra.1+0x3c/0x540
      [   17.461701] sp : ffff000100d9bb60
      [   17.469146] x29: ffff000100d9bb60 x28: 0000000000000000
      [   17.478547] x27: 0000000000000000 x26: ffff802fb8945000
      [   17.488063] x25: 0000000000000000 x24: ffff802fa32081a8
      [   17.497381] x23: 0000000000000002 x22: ffff801fa2b15220
      [   17.506701] x21: ffff000009809000 x20: ffff802fa23a0888
      [   17.515980] x19: ffff801fa2b15220 x18: 0000000000000000
      [   17.525272] x17: 0000000200000000 x16: 0000000200000000
      [   17.534511] x15: 0000000000000000 x14: 0000000000000000
      [   17.543652] x13: ffff000008d95db8 x12: 000000000000000d
      [   17.552780] x11: ffff000008d95d90 x10: 0000000000000b00
      [   17.561819] x9 : ffff000100d9bb90 x8 : ffff802fb89d6560
      [   17.570829] x7 : 0000000000000004 x6 : 00000004a1801d05
      [   17.579839] x5 : 0000000000000000 x4 : 0000000000000000
      [   17.588852] x3 : ffff802fb89d5a00 x2 : 0000000000000000
      [   17.597734] x1 : 0000000200000000 x0 : 0000000200000000
      [   17.606631] Process kworker/u130:0 (pid: 95, stack limit = 0x(____ptrval____))
      [   17.617438] Call trace:
      [   17.623349]  __mutex_lock.isra.1+0x74/0x540
      [   17.630927]  __mutex_lock_slowpath+0x24/0x30
      [   17.638602]  mutex_lock+0x50/0x60
      [   17.645295]  drain_workqueue+0x34/0x198
      [   17.652623]  __sas_drain_work+0x7c/0x168
      [   17.659903]  sas_drain_work+0x60/0x68
      [   17.666947]  hisi_sas_scan_finished+0x30/0x40 [hisi_sas_main]
      [   17.676129]  do_scsi_scan_host+0x70/0xb0
      [   17.683534]  do_scan_async+0x20/0x228
      [   17.690586]  async_run_entry_fn+0x4c/0x1d0
      [   17.697997]  process_one_work+0x1b4/0x3f8
      [   17.705296]  worker_thread+0x54/0x470
      
      Every time the call trace is not the same, but the overwrite address
      is always the same:
      Unable to handle kernel paging request at virtual address 0000000200000040
      
      The root cause is, when write the reg XGMAC_MAC_TX_LF_RF_CONTROL_REG,
      didn't use the io_base offset.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0b09844
    • Y
      net: hns: Use NAPI_POLL_WEIGHT for hns driver · acb1ce15
      Yonglong Liu 提交于
      When the HNS driver loaded, always have an error print:
      "netif_napi_add() called with weight 256"
      
      This is because the kernel checks the NAPI polling weights
      requested by drivers and it prints an error message if a driver
      requests a weight bigger than 64.
      
      So use NAPI_POLL_WEIGHT to fix it.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NPeng Li <lipeng321@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acb1ce15
    • L
      net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw() · 3a39a12a
      Liubin Shu 提交于
      This patch is trying to fix the issue due to:
      [27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv]
      
      After hnae_queue_xmit() in hns_nic_net_xmit_hw(), can be
      interrupted by interruptions, and than call hns_nic_tx_poll_one()
      to handle the new packets, and free the skb. So, when turn back to
      hns_nic_net_xmit_hw(), calling skb->len will cause use-after-free.
      
      This patch update tx ring statistics in hns_nic_tx_poll_one() to
      fix the bug.
      Signed-off-by: NLiubin Shu <shuliubin@huawei.com>
      Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NPeng Li <lipeng321@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a39a12a
  7. 04 4月, 2019 2 次提交