1. 25 Nov 2017: 4 commits
    • net: thunderbolt: Stop using zero to mean no valid DMA mapping · 540c1115
      Mika Westerberg committed
      Commit 86dabda4 ("net: thunderbolt: Clear finished Tx frame bus
      address in tbnet_tx_callback()") fixed a DMA-API violation where the
      driver called dma_unmap_page() in tbnet_free_buffers() for a bus address
      that might already be unmapped. The fix was to zero out the bus address
      of a frame in tbnet_tx_callback().
      
      However, as pointed out by David Miller, zero might well be a valid
      mapping (at least in theory), so it is not a good idea to use it here.
      
      It turns out that we don't need the whole map/unmap dance for Tx buffers
      at all. Instead we can map the buffers when they are initially allocated
      and unmap them when the interface is brought down. In between we just
      DMA sync the buffers for the CPU or device as needed.
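A minimal user-space model of the buffer lifecycle described above. It is a sketch, not the driver's code: struct tx_frame, enum buf_owner and the frame_* helpers are hypothetical, and each helper only stands in for a kernel DMA-API step (map once at allocation, a sync such as dma_sync_single_for_device() or dma_sync_single_for_cpu() around each transmission, unmap at interface teardown). Tracking validity with an explicit flag is this sketch's own choice; the point is that no bus-address value, including 0, is used to mean "not mapped".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Tx frame buffer; not the driver's real struct. */
enum buf_owner { OWNER_CPU, OWNER_DEVICE };

struct tx_frame {
	uint64_t dma_addr;     /* any value, including 0, may be a valid mapping */
	bool mapped;           /* validity tracked explicitly, never via dma_addr */
	enum buf_owner owner;  /* who may currently touch the buffer */
	int unmap_count;       /* counts simulated unmaps */
};

/* Map once, when the buffer is first allocated. */
static void frame_map(struct tx_frame *f, uint64_t addr)
{
	f->dma_addr = addr;
	f->mapped = true;
	f->owner = OWNER_CPU;
	f->unmap_count = 0;
}

/* Before handing the frame to hardware (stands in for a sync-for-device). */
static void frame_sync_for_device(struct tx_frame *f)
{
	assert(f->mapped && f->owner == OWNER_CPU);
	f->owner = OWNER_DEVICE;
}

/* In the Tx completion path (stands in for a sync-for-CPU); no unmap here. */
static void frame_sync_for_cpu(struct tx_frame *f)
{
	assert(f->mapped && f->owner == OWNER_DEVICE);
	f->owner = OWNER_CPU;
}

/* When the interface is brought down: unmap exactly once. */
static void frame_unmap(struct tx_frame *f)
{
	if (!f->mapped)
		return;
	f->mapped = false;
	f->unmap_count++;
}
```

Because the completion path only syncs, each frame is unmapped once, at teardown, whatever its bus address happens to be.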
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts · fa6d7cb5
      Sunil Goutham committed
      Don't offload IP header checksum to NIC.
      
      This fixes a previous patch that enabled checksum offloading for both
      IPv4 and IPv6 packets, so L3 checksum offload was also being enabled
      for IPv6 packets. The HW drops these packets because it assumes a
      packet is IPv4 whenever IP checksum offload is set in the SQ
      descriptor.
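The hardware rule above can be modelled in a few lines of C. This is a sketch under stated assumptions: the SQ_CSUM_* bits, enum l3_proto and the helper functions are hypothetical, and hw_accepts() encodes only the behaviour described in this message (L3 offload makes the NIC parse the packet as IPv4). Following the subject line, the fixed policy requests only L4 (TCP/UDP) offload; the Linux stack computes the IPv4 header checksum in software in any case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real SQ descriptor layout differs. */
#define SQ_CSUM_L3 0x1u /* ask HW to fill in the IPv4 header checksum */
#define SQ_CSUM_L4 0x2u /* ask HW to fill in the TCP/UDP checksum */

enum l3_proto { L3_IPV4, L3_IPV6 };

/* Model of the HW behaviour: with L3 offload set, the packet is parsed as
 * IPv4, so an IPv6 packet is dropped. */
static bool hw_accepts(enum l3_proto proto, unsigned int flags)
{
	return !((flags & SQ_CSUM_L3) && proto == L3_IPV6);
}

/* Before the fix: the same L3 + L4 offload bits for every packet. */
static unsigned int flags_before_fix(enum l3_proto proto)
{
	(void)proto;
	return SQ_CSUM_L3 | SQ_CSUM_L4;
}

/* After the fix: only the L4 checksum is offloaded. */
static unsigned int flags_after_fix(enum l3_proto proto)
{
	(void)proto;
	return SQ_CSUM_L4;
}
```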
      
      Fixes: 3a9024f5 ("net: thunderx: Enable TSO and checksum offloads for ipv6")
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: Aleksey Makarov <aleksey.makarov@auriga.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • forcedeth: replace pci_unmap_page with dma_unmap_page · ca43a0c7
      Zhu Yanjun committed
      The function pci_unmap_page() is obsolete, so replace it with
      dma_unmap_page().
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Joe Jin <joe.jin@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 5f109b94
      David S. Miller committed
      David Howells says:
      
      ====================
      rxrpc: Fixes and improvements
      
      Here's a set of patches that fix and improve some stuff in the AF_RXRPC
      protocol:
      
      The patches are:
      
       (1) Unlock mutex returned by rxrpc_accept_call().
      
       (2) Don't set connection upgrade by default.
      
       (3) Differentiate the call->user_mutex used by the kernel from that used
           by userspace calling sendmsg() to avoid lockdep warnings.
      
       (4) Delay terminal ACK transmission to a work queue so that it can be
           replaced by the next call if there is one.
      
       (5) Split the call parameters from the connection parameters so that more
           call-specific parameters can be passed through.
      
       (6) Fix the call timeouts to work the same as for other RxRPC/AFS
           implementations.
      
       (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly
           so that they can be discarded or can represent more packets.
      
       (8) Use RTT to calculate certain protocol timeouts.
      
       (9) Add a timeout to detect lost ACK/DATA packets.
      
      (10) Add a keepalive function so that we ping the peer if we haven't
           transmitted for a short while, thereby keeping intervening firewall
           routes open.
      
      (11) Make service endpoints expire like they're supposed to so that the UDP
           port can be reused.
      
      (12) Fix connection expiry timers to make cleanup happen in a more timely
           fashion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Nov 2017: 36 commits
    • rxrpc: Fix conn expiry timers · 3d18cbb7
      David Howells committed
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
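The difference between points (1) and (2) is easiest to see in a small simulation. In this sketch struct sim_timer, sim_timer_reduce(), sim_queue_delayed() and conn_arm_expiry() are hypothetical stand-ins, not kernel APIs; tick_before() follows the wrap-safe signed comparison used by the kernel's time_before(). sim_timer_reduce() models what timer_reduce() is used for here: arming an idle timer, or moving a pending one earlier but never later.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated timer; "expires" is a jiffies-like tick count that may wrap. */
struct sim_timer {
	unsigned long expires;
	bool pending;
};

/* Wrap-safe "a is earlier than b", in the style of time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Arm an idle timer, or pull a pending one earlier; never push it later.
 * Returns whether the timer was already pending (this sketch's convention). */
static bool sim_timer_reduce(struct sim_timer *t, unsigned long expires)
{
	bool was_pending = t->pending;

	if (!t->pending || tick_before(expires, t->expires))
		t->expires = expires;
	t->pending = true;
	return was_pending;
}

/* Point (2): once queued, a new (shorter) delay is simply ignored. */
static void sim_queue_delayed(struct sim_timer *t, unsigned long expires)
{
	if (t->pending)
		return;
	t->expires = expires;
	t->pending = true;
}

/* Point (3): never (re)arm expiry for a dead network namespace. */
static void conn_arm_expiry(struct sim_timer *t, bool net_live,
			    unsigned long expires)
{
	if (!net_live)
		return;
	sim_timer_reduce(t, expires);
}
```

Calling sim_queue_delayed() with an earlier deadline leaves the original, later expiry in place, which is why a closed socket's connections lingered.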
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix service endpoint expiry · f859ab61
      David Howells committed
      Make RxRPC service endpoints expire as they're supposed to, by the
      following means:
      
       (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
           global service conn timeout, otherwise the first rxrpc_net struct to
           die will cause connections on all others to expire immediately from
           then on.
      
       (2) Mark local service endpoints for which the socket has been closed
           (->service_closed) so that the expiration timeout can be much
           shortened for service and client connections going through that
           endpoint.
      
       (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
           count reaches 1, not 0, as idle conns have a 1 count.
      
       (4) The accumulator for the earliest time we might want to schedule for
           should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
           the comparison functions use signed arithmetic.
      
       (5) Simplify the expiration handling, adding the expiration value to the
           idle timestamp each time rather than keeping track of the time in the
           past before which the idle timestamp must go to be expired.  This is
           much easier to read.
      
       (6) Ignore the timeouts if the net namespace is dead.
      
       (7) Restart the service reaper work item rather than the client reaper.
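Point (4) comes down to signed, wrap-safe time comparison. The sketch below is a user-space illustration under stated assumptions: earliest_expiry() is hypothetical, tick_before() mirrors the kernel's time_before() idiom, and SIM_MAX_JIFFY_OFFSET is assumed to match the kernel's MAX_JIFFY_OFFSET definition. It also uses the simpler expiry form from point (5): idle timestamp plus expiry period.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the kernel's MAX_JIFFY_OFFSET. */
#define SIM_MAX_JIFFY_OFFSET ((LONG_MAX >> 1) - 1)

/* Wrap-safe "a is earlier than b". */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Earliest expiry among idle connections, each expiring at
 * idle_at[i] + expiry. "init" is the accumulator's starting value. */
static unsigned long earliest_expiry(const unsigned long *idle_at, size_t n,
				     unsigned long expiry, unsigned long init)
{
	unsigned long earliest = init;

	for (size_t i = 0; i < n; i++) {
		unsigned long expire_at = idle_at[i] + expiry;

		if (tick_before(expire_at, earliest))
			earliest = expire_at;
	}
	return earliest;
}
```

With ULONG_MAX as the starting value, every real expiry time compares as later than the accumulator under signed arithmetic, so the minimum is never updated; starting from now + SIM_MAX_JIFFY_OFFSET avoids that.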
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add keepalive for a call · 415f44e4
      David Howells committed
      We need to transmit a packet every so often to act as a keepalive for the
      peer (which has a timeout from the last time it received a packet) and also
      to prevent any intervening firewalls from closing the route.
      
      Do this by resetting a timer every time we transmit a packet.  If the timer
      ever expires, we transmit a PING ACK packet and thereby also elicit a PING
      RESPONSE ACK from the other side - which prevents our last-rx timeout from
      expiring.
      
      The timer is set to 1/6 of the last-rx timeout so that we can detect the
      other side going away if it misses 6 replies in a row.
      
      This is particularly necessary for servers where the processing of the
      service function may take a significant amount of time.
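The keepalive rule can be sketched as follows. struct ka_call and the ka_* functions are hypothetical, times are milliseconds on a simulated clock that is assumed not to wrap, and the PING ACK is modelled only as a counter. The interval is one sixth of the last-rx timeout, and any transmission, including the PING itself, pushes the next keepalive out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-call state; times in ms on a non-wrapping clock. */
struct ka_call {
	unsigned long rx_timeout;    /* peer gives up after this long without Rx */
	unsigned long keepalive_at;  /* when to send a PING ACK if still idle */
	unsigned int pings_sent;
};

/* One sixth of the last-rx timeout. */
static unsigned long ka_interval(const struct ka_call *c)
{
	return c->rx_timeout / 6;
}

/* Called for every transmitted packet: push the keepalive out. */
static void ka_on_transmit(struct ka_call *c, unsigned long now)
{
	c->keepalive_at = now + ka_interval(c);
}

/* Called as time advances: if nothing was sent for a whole interval, send
 * a PING ACK, which is itself a transmission and so re-arms the timer. */
static void ka_tick(struct ka_call *c, unsigned long now)
{
	if (now >= c->keepalive_at) {
		c->pings_sent++;
		ka_on_transmit(c, now);
	}
}
```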
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add a timeout for detecting lost ACKs/lost DATA · bd1fdf8c
      David Howells committed
      Add an extra timeout that is set/updated when we send a DATA packet that
      has the request-ack flag set.  This allows us to detect if we don't get an
      ACK in response to the latest flagged packet.
      
      The ACK packet is adjudged to have been lost if it doesn't turn up within
      2*RTT of the transmission.
      
      If the timeout occurs, we schedule the sending of a PING ACK to find out
      the state of the other side.  If a new DATA packet is ready to go sooner,
      we cancel the sending of the ping and set the request-ack flag on that
      instead.
      
      If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
      we had at the time of the ping transmission, we adjudge all the DATA
      packets sent between the response tx_top and the ping-time tx_top to have
      been lost and retransmit immediately.
      
      Rather than sending a PING ACK, we could just pick a DATA packet and
      speculatively retransmit that with request-ack set.  It should result in
      either a REQUESTED ACK or a DUPLICATE ACK, which we can then use in lieu of
      the PING-RESPONSE ACK mentioned above.
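The loss inference described above can be sketched in C. lost_seqs() and ack_lost_deadline() are hypothetical helpers; the sketch assumes the packets judged lost are those with sequence numbers in (resp_tx_top, ping_tx_top], and it ignores sequence-number wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* An ACK for a request-ack DATA packet is judged lost if it has not
 * arrived by this time (2 * RTT after transmission). */
static unsigned long ack_lost_deadline(unsigned long sent_at, unsigned long rtt)
{
	return sent_at + 2 * rtt;
}

/* Given tx_top when the PING was sent and the tx_top reported by the
 * PING-RESPONSE, store the sequence numbers judged lost in out[] and
 * return how many there are (0 if the peer is not behind). */
static size_t lost_seqs(unsigned int ping_tx_top, unsigned int resp_tx_top,
			unsigned int *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	for (unsigned int seq = resp_tx_top + 1; seq <= ping_tx_top && n < max; seq++)
		out[n++] = seq;
	return n;
}
```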
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Express protocol timeouts in terms of RTT · beb8e5e4
      David Howells committed
      Express protocol timeouts for data retransmission and deferred ack
      generation in terms of RTT rather than specified timeouts once we have
      sufficient RTT samples.
      
      For the moment, this requires just one RTT sample to be able to use this
      for ack deferral and two for data retransmission.
      
      The data retransmission timeout is set at RTT*1.5 and the ACK deferral
      timeout is set at RTT.
      
      Note that the calculated timeout is limited to a minimum of 4ns to make
      sure it doesn't happen too quickly.
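A sketch of the timeout selection, using only the rules stated above: RTT for ACK deferral once one sample exists, RTT * 1.5 for retransmission once two exist, and a lower bound on the result. struct rtt_state and both functions are hypothetical, the fixed fallback timeouts are passed in because the message does not give their values, and the minimum is a parameter (the message quotes 4ns).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RTT state; all times in nanoseconds. */
struct rtt_state {
	uint64_t rtt_ns;          /* current RTT estimate */
	unsigned int nr_samples;  /* RTT samples gathered so far */
};

static uint64_t clamp_min(uint64_t t, uint64_t min_ns)
{
	return t < min_ns ? min_ns : t;
}

/* ACK deferral: RTT once at least one sample exists, else the fixed value. */
static uint64_t ack_delay_ns(const struct rtt_state *s, uint64_t fixed_ns,
			     uint64_t min_ns)
{
	if (s->nr_samples < 1)
		return fixed_ns;
	return clamp_min(s->rtt_ns, min_ns);
}

/* Data retransmission: RTT * 1.5 once at least two samples exist. */
static uint64_t resend_timeout_ns(const struct rtt_state *s, uint64_t fixed_ns,
				  uint64_t min_ns)
{
	if (s->nr_samples < 2)
		return fixed_ns;
	return clamp_min(s->rtt_ns + s->rtt_ns / 2, min_ns);
}
```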
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Don't transmit DELAY ACKs immediately on proposal · 8637abaa
      David Howells committed
      Don't transmit a DELAY ACK immediately on proposal when the Rx window is
      rotated, but rather defer it to the work function.  This means that we have
      a chance to queue/consume more received packets before we actually send the
      DELAY ACK, or even cancel it entirely, thereby reducing the number of
      packets transmitted.
      
      We do, however, want to continue sending other types of packet immediately,
      particularly REQUESTED ACKs, as they may be used for RTT calculation by the
      other side.
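The effect of deferring DELAY ACKs can be modelled as a pending flag that the work function flushes. This is a sketch: struct ack_state, propose_ack() and ack_work() are hypothetical, and it assumes an immediately sent ACK (such as a REQUESTED ACK) also covers whatever a pending DELAY ACK would have acknowledged, so the pending one is dropped.

```c
#include <assert.h>
#include <stdbool.h>

enum ack_reason { ACK_DELAY, ACK_REQUESTED };

/* Hypothetical per-call ACK state. */
struct ack_state {
	bool delay_pending;  /* a DELAY ACK was proposed but not yet sent */
	unsigned int sent;   /* ACK packets actually transmitted */
};

/* DELAY ACKs are only recorded for the work function; other kinds go out
 * immediately and make any pending DELAY ACK redundant. */
static void propose_ack(struct ack_state *s, enum ack_reason why)
{
	if (why == ACK_DELAY) {
		s->delay_pending = true;  /* coalesces with earlier proposals */
		return;
	}
	s->delay_pending = false;
	s->sent++;
}

/* Deferred work: send at most one DELAY ACK for everything proposed so far. */
static void ack_work(struct ack_state *s)
{
	if (s->delay_pending) {
		s->delay_pending = false;
		s->sent++;
	}
}
```

Three proposals followed by one run of the work function produce a single packet instead of three.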
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix call timeouts · a158bdd3
      David Howells committed
      Fix the rxrpc call expiration timeouts and make them settable from
      userspace.  By analogy with other rx implementations, there should be three
      timeouts:
      
       (1) "Normal timeout"
      
           This is set for all calls and is triggered if we haven't received any
           packets from the peer in a while.  It is measured from the last time
           we received any packet on that call.  This is not reset by any
           connection packets (such as CHALLENGE/RESPONSE packets).
      
           If a service operation takes a long time, the server should generate
           PING ACKs at a duration that's substantially less than the normal
           timeout so as to keep both sides alive.  This is set at 1/6 of the
           normal timeout.
      
       (2) "Idle timeout"
      
           This is set only for a service call and is triggered if we stop
           receiving the DATA packets that comprise the request data.  It is
           measured from the last time we received a DATA packet.
      
       (3) "Hard timeout"
      
           This can be set for a call and specifies the maximum lifetime of that
           call.  It should not be specified by default, as some operations
           (such as volume transfer) take a long time.
      
      Allow userspace to set/change the timeouts on a call with sendmsg, using a
      control message:
      
      	RXRPC_SET_CALL_TIMEOUTS
      
      The data to the message is a number of 32-bit words, not all of which need
      be given:
      
      	u32 hard_timeout;	/* sec from first packet */
      	u32 idle_timeout;	/* msec from data Rx */
      	u32 normal_timeout;	/* msec from packet Rx */
      
      This can be set in combination with any other sendmsg() that affects a
      call.
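The variable-length payload can be handled by treating the word count as significant. The sketch below is hypothetical (parse_call_timeouts() and struct call_timeouts are not the kernel's code): it accepts 1 to 3 u32 words in the order listed above, rejects any other length with -EINVAL, and returns how many words were supplied so the caller knows which timeouts to apply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same word order as the control-message payload above; name hypothetical. */
struct call_timeouts {
	uint32_t hard;    /* seconds */
	uint32_t idle;    /* milliseconds */
	uint32_t normal;  /* milliseconds */
};

/* Returns the number of words supplied (1..3), or -EINVAL. Words that were
 * not supplied are zeroed; the return value says which ones are valid. */
static int parse_call_timeouts(const void *data, size_t len,
			       struct call_timeouts *out)
{
	assert(out != NULL);
	if (len == 0 || len % 4 != 0 || len > sizeof(*out))
		return -EINVAL;
	memset(out, 0, sizeof(*out));
	memcpy(out, data, len);
	return (int)(len / 4);
}
```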
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Split the call params from the operation params · 48124178
      David Howells committed
      When rxrpc_sendmsg() parses the control message buffer, it places the
      parameters extracted into a structure, but lumps together call parameters
      (such as user call ID) with operation parameters (such as whether to send
      data, send an abort or accept a call).
      
      Split the call parameters out into their own structure, a copy of which is
      then embedded in the operation parameters struct.
      
      The call parameters struct is then passed down into the places that need it
      instead of passing the individual parameters.  This allows for extra call
      parameters to be added.
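The shape of this split can be shown with two structs. All names here are hypothetical, chosen to illustrate the pattern rather than to reproduce rxrpc's definitions: the per-call parameters live in their own struct, a copy of which is embedded in the per-operation struct, and call setup takes only the call struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-call parameters: passed down to call setup as one unit. */
struct call_params {
	unsigned long user_call_id;
	uint32_t timeouts[3];
	unsigned int nr_timeouts;
	bool upgrade;
};

enum op_command { OP_SEND_DATA, OP_SEND_ABORT, OP_ACCEPT };

/* Per-sendmsg operation parameters, embedding a copy of the call ones. */
struct op_params {
	struct call_params call;
	enum op_command command;
	uint32_t abort_code;
};

/* Call setup needs only the call parameters; adding a new call parameter
 * changes struct call_params, not this signature. */
static unsigned long begin_call(const struct call_params *cp)
{
	assert(cp != NULL);
	return cp->user_call_id;
}
```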
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Delay terminal ACK transmission on a client call · 3136ef49
      David Howells committed
      Delay terminal ACK transmission on a client call by deferring it to the
      connection processor.  This allows it to be skipped if we can send the next
      call instead, the first DATA packet of which will implicitly ack this call.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls · 9faaff59
      David Howells committed
      Provide a different lockdep key for rxrpc_call::user_mutex when the call is
      made on a kernel socket, such as by the AFS filesystem.
      
      The problem is that lockdep reports a false positive: on a user socket,
      the sendmsg syscall holds call->user_mutex whilst it accesses userspace
      memory, whereas the AFS filesystem may perform operations with mmap_sem
      already held by the caller.
      
      In such a case, the following warning is produced.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-fscache+ #243 Tainted: G            E
      ------------------------------------------------------
      modpost/16701 is trying to acquire lock:
       (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
      
      but task is already holding lock:
       (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&mm->mmap_sem){++++}:
             __might_fault+0x61/0x89
             _copy_from_iter_full+0x40/0x1fa
             rxrpc_send_data+0x8dc/0xff3
             rxrpc_do_sendmsg+0x62f/0x6a1
             rxrpc_sendmsg+0x166/0x1b7
             sock_sendmsg+0x2d/0x39
             ___sys_sendmsg+0x1ad/0x22b
             __sys_sendmsg+0x41/0x62
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #2 (&call->user_mutex){+.+.}:
             __mutex_lock+0x86/0x7d2
             rxrpc_new_client_call+0x378/0x80e
             rxrpc_kernel_begin_call+0xf3/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_vl_get_capabilities+0x193/0x198 [kafs]
             afs_vl_lookup_vldb+0x5f/0x151 [kafs]
             afs_create_volume+0x2e/0x2f4 [kafs]
             afs_mount+0x56a/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #1 (k-sk_lock-AF_RXRPC){+.+.}:
             lock_sock_nested+0x74/0x8a
             rxrpc_kernel_begin_call+0x8a/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_fs_get_capabilities+0x17a/0x17f [kafs]
             afs_probe_fileserver+0xf7/0x2f0 [kafs]
             afs_select_fileserver+0x83f/0x903 [kafs]
             afs_fetch_status+0x89/0x11d [kafs]
             afs_iget+0x16f/0x4f8 [kafs]
             afs_mount+0x6c6/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #0 (&vnode->io_lock){+.+.}:
             lock_acquire+0x174/0x19f
             __mutex_lock+0x86/0x7d2
             afs_begin_vnode_operation+0x33/0x77 [kafs]
             afs_fetch_data+0x80/0x12a [kafs]
             afs_readpages+0x314/0x405 [kafs]
             __do_page_cache_readahead+0x203/0x2ba
             filemap_fault+0x179/0x54d
             __do_fault+0x17/0x60
             __handle_mm_fault+0x6d7/0x95c
             handle_mm_fault+0x24e/0x2a3
             __do_page_fault+0x301/0x486
             do_page_fault+0x236/0x259
             page_fault+0x22/0x30
             __clear_user+0x3d/0x60
             padzero+0x1c/0x2b
             load_elf_binary+0x785/0xdc7
             search_binary_handler+0x81/0x1ff
             do_execveat_common.isra.14+0x600/0x888
             do_execve+0x1f/0x21
             SyS_execve+0x28/0x2f
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      other info that might help us debug this:
      
      Chain exists of:
        &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&mm->mmap_sem);
                                     lock(&call->user_mutex);
                                     lock(&mm->mmap_sem);
        lock(&vnode->io_lock);
      
       *** DEADLOCK ***
      
      1 lock held by modpost/16701:
       #0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      stack backtrace:
      CPU: 0 PID: 16701 Comm: modpost Tainted: G            E   4.14.0-fscache+ #243
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Call Trace:
       dump_stack+0x67/0x8e
       print_circular_bug+0x341/0x34f
       check_prev_add+0x11f/0x5d4
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? __lock_acquire+0xf77/0x10b4
       __lock_acquire+0xf77/0x10b4
       lock_acquire+0x174/0x19f
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       __mutex_lock+0x86/0x7d2
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       ? filemap_fault+0x179/0x54d
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
      RIP: 0010:__clear_user+0x3d/0x60
      RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
      RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
      R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7fdb6009ee07
      RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
      RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
      RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
      R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Don't set upgrade by default in sendmsg() · 48ca2463
      David Howells committed
      Don't set upgrade by default when creating a call from sendmsg().  This is
      a holdover from when I was testing the code.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing · 03a6c822
      David Howells committed
      The caller of rxrpc_accept_call() must release the lock on call->user_mutex
      returned by that function.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1d3b78bb
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho.
      
       2) bpf offload bug fixes from Jakub Kicinski.
      
       3) Fix bpf verifier to NOP out code which is dead at run time because,
          due to branch pruning, the verifier will not explore such
          instructions. From Alexei Starovoitov.
      
       4) Fix crash when deleting secondary chains in packet scheduler
          classifier. From Roman Kapl.
      
       5) Fix buffer management bugs in smc, from Ursula Braun.
      
       6) Fix regression in anycast route handling, from David Ahern.
      
       7) Fix link settings regression in r8169, from Tobias Jakobi.
      
       8) Add back enough UFO support so that live migration still works, from
          Willem de Bruijn.
      
       9) Linearize enough packet data for the full extent to which the ipvlan
          code will inspect the packet headers, from Gao Feng.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
        ipvlan: Fix insufficient skb linear check for ipv6 icmp
        ipvlan: Fix insufficient skb linear check for arp
        geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
        net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
        net: accept UFO datagrams from tuntap and packet
        net: realtek: r8169: implement set_link_ksettings()
        net: ipv6: Fixup device for anycast routes during copy
        net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
        net/smc: use sk_rcvbuf as start for rmb creation
        ipv6: Do not consider linkdown nexthops during multipath
        net: sched: fix crash when deleting secondary chains
        net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
        bpf: fix branch pruning logic
        bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO
        bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO
        bpf: remove explicit handling of 0 for arg2 in bpf_probe_read
        bpf: introduce ARG_PTR_TO_MEM_OR_NULL
        i40evf: Use smp_rmb rather than read_barrier_depends
        fm10k: Use smp_rmb rather than read_barrier_depends
        igb: Use smp_rmb rather than read_barrier_depends
        ...
    • Merge tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86 · 36f20ee2
      Linus Torvalds committed
      Pull x86 platform driver fixes from Darren Hart:
       "Fix two issues resulting from the dell-smbios refactoring and
        introduction of the dell-smbios-wmi dispatcher.
      
        The first ensures a proper error code is returned when kzalloc fails.
      
        The second avoids an issue in older Dell BIOS implementations which
        would fail if the more complex calls were made by limiting those
        platforms to the simple calls such as those used by the existing
        dell-laptop and dell-wmi drivers, preserving their functionality prior
        to the addition of the dell-smbios-wmi dispatcher"
      
      * tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/x86: dell-laptop: fix error return code in dell_init()
        platform/x86: dell-smbios-wmi: Disable userspace interface if missing hotfix
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 06c94400
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "Two basic fixes: one for the sparse problem with the blacklist flags
        and another for a hang forever in bnx2fc"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: Use 'blist_flags_t' for scsi_devinfo flags
        scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort
    • Merge tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · b64f26c6
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "All commits found here are small fixes for regression or stable:
      
         - PCM timestamp behavior fix that could be seen as a regression
      
         - Remove spurious WARN_ON() from ALSA timer 32bit compat ioctl
      
         - HD-audio HDMI/DP channel mapping fix for 32bit archs
      
         - Fix the previous fix for HD-audio initialization code
      
         - More hardening USB-audio against malicious USB descriptors
      
         - HD-audio quirks/fixes (Realtek codec, AMD controller)
      
         - Missing help text for the recent Intel SST kconfig change"
      
      * tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda: Add Raven PCI ID
        ALSA: hda/realtek - Fix ALC700 family no sound issue
        ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization
        ALSA: usb-audio: Add sanity checks in v2 clock parsers
        ALSA: usb-audio: Fix potential zero-division at parsing FU
        ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
        ALSA: usb-audio: Add sanity checks to FE parser
        ALSA: timer: Remove kernel warning at compat ioctl error paths
        ALSA: pcm: update tstamp only if audio_tstamp changed
        ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon
        ALSA: hda: Fix too short HDMI/DP chmap reporting
        ALSA: usb-audio: uac1: Invalidate ctl on interrupt
        ALSA: hda/realtek - Fix ALC275 no sound issue
        ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL
    • Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux · c353bfc6
      Linus Torvalds committed
      Pull more drm updates from Dave Airlie:
       "Fixes/cleanups for rc1, non-desktop flags for VR
      
         - remove the MSM dt-bindings file Rob managed to push in the previous
           pull.
      
         - add a property/edid quirk to denote HMD devices, I had these
           hanging around for a few weeks and Keith had done some work on
           them, they are fairly self contained and small, and only affect
           people using HTC Vive VR headsets so far.
      
         - amdgpu, tegra, tilcdc, fsl fixes
      
         - some imx-drm cleanups I missed, these seemed pretty small, and no
           reason to hold off.
      
        I have one TTM regression fix (fixes bochs-vga in qemu) sitting
        locally awaiting review; I'll probably send that in a separate pull
        request tomorrow"
      
      * tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits)
        dt-bindings: remove file that was added accidentally
        drm/edid: quirk HTC vive headset as non-desktop. [v2]
        drm/fb: add support for not enabling fbcon on non-desktop displays [v2]
        drm: add connector info/property for non-desktop displays [v2]
        drm/amdgpu: fix rmmod KCQ disable failed error
        drm/amdgpu: fix kernel hang when starting VNC server
        drm/amdgpu: don't skip attributes when powerplay is enabled
        drm/amd/pp: fix typecast error in powerplay.
        drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support
        drm/tegra: sor: Reimplement pad clock
        Revert "drm/radeon: dont switch vt on suspend"
        drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence
        drm/amd/powerplay: fix unfreeze level smc message for smu7
        drm/amdgpu:fix memleak
        drm/amdgpu:fix memleak in takedown
        drm/amd/pp: fix dpm randomly failed on Vega10
        drm/amdgpu: set f_mapping on exported DMA-bufs
        drm/amdgpu: Properly allocate VM invalidate eng v2
        drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
        drm/fsl-dcu: avoid disabling pixel clock twice on suspend
        ...
    • Merge tag 'docs-4.15-2' of git://git.lwn.net/linux · 1d3bc636
      Linus Torvalds committed
      Pull documentation updates from Jonathan Corbet:
       "A few late-arriving docs updates that have no real reason to wait.
      
        There's a new "Co-Developed-by" tag described by Greg, and a build
        enhancement from Willy to generate docs warnings during a kernel build
        (but only when additional warnings have been requested in general)"
      
      * tag 'docs-4.15-2' of git://git.lwn.net/linux:
        Add optional check for bad kernel-doc comments
        Documentation: fix profile= options in kernel-parameters.txt
        documentation/svga.txt: update outdated file
        kokr/memory-barriers.txt: Fix typo in paring example
        kokr/memory-barriers/txt: Replace uses of "transitive"
        Documentation/process: add Co-Developed-by: tag for patches with multiple authors
    • Merge branch 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · dab0badc
      Linus Torvalds committed
      Pull keys update from James Morris:
       "There's nothing too controversial here:
      
         - Doc fix for keyctl_read().
      
         - time_t -> time64_t replacement.
      
         - Set the module licence on things to prevent tainting"
      
      * 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        pkcs7: Set the module licence to prevent tainting
        security: keys: Replace time_t with time64_t for struct key_preparsed_payload
        security: keys: Replace time_t/timespec with time64_t
        KEYS: fix in-kernel documentation for keyctl_read()
    • Merge tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor · 26064dea
      Linus Torvalds committed
      Pull apparmor updates from John Johansen:
       "No features this time, just minor cleanups and bug fixes.
      
        Cleanups:
         - fix spelling mistake: "resoure" -> "resource"
         - remove unused redundant variable stop
         - Fix bool initialization/comparison
      
        Bug Fixes:
         - initialized returned struct aa_perms
         - fix leak of null profile name if profile allocation fails
         - ensure that undecidable profile attachments fail
         - fix profile attachment for special unconfined profiles
         - fix locking when creating a new complain profile.
         - fix possible recursive lock warning in __aa_create_ns"
      
      * tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: fix possible recursive lock warning in __aa_create_ns
        apparmor: fix locking when creating a new complain profile.
        apparmor: fix profile attachment for special unconfined profiles
        apparmor: ensure that undecidable profile attachments fail
        apparmor: fix leak of null profile name if profile allocation fails
        apparmor: remove unused redundant variable stop
        apparmor: Fix bool initialization/comparison
        apparmor: initialized returned struct aa_perms
        apparmor: fix spelling mistake: "resoure" -> "resource"
      26064dea
    • Merge tag 'keys-next-20171123' of... · ce44cd8d
      James Morris committed
      Merge tag 'keys-next-20171123' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next-keys
      
      Merge keys subsystem changes from David Howells, for v4.15.
      ce44cd8d
    • Merge branch 'ipvlan-Fix-insufficient-skb-linear-check' · 9ed33805
      David S. Miller committed
      Gao Feng says:
      
      ====================
      ipvlan: Fix insufficient skb linear check
      
      The current ipvlan code uses pskb_may_pull to get the skb linear header in
      ipvlan_get_L3_hdr, but the size isn't enough for arp and ipv6 icmp.
      So ipvlan_addr_lookup may access unexpected memory.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ed33805
    • ipvlan: Fix insufficient skb linear check for ipv6 icmp · 747a7135
      Gao Feng committed
      In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to
      make sure the skb header has enough linear room for the ipv6 header. But
      when the packet is icmp, it then reads the memory after that header
      directly, without a linear check. So it may still access unexpected
      memory in ipvlan_addr_lookup.
      
      Now invoke pskb_may_pull again if it is ipv6 icmp.
      Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      747a7135
    • ipvlan: Fix insufficient skb linear check for arp · 5fc9220a
      Gao Feng committed
      In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to
      make sure the skb header has enough linear room for the arp header. But
      ipvlan_addr_lookup also accesses the arp payload, so it may still access
      unexpected memory.
      
      Now use arp_hdr_len(port->dev) instead of the arp header size as the
      parameter.
      Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fc9220a
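Both ipvlan fixes apply the same rule: every byte the parser reads must first be covered by a length check on the linear area. The following is a minimal userspace sketch of that rule, not the ipvlan code. `struct pkt` is a hypothetical stand-in for an skb's linear data, and `may_pull()` is a hypothetical stand-in for pskb_may_pull() that only checks the length and never pulls data. The ARP length formula is assumed to mirror the non-FireWire case of arp_hdr_len(): an 8-byte ARP header plus two hardware addresses and two IPv4 addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: only the linear part is readable. */
struct pkt {
    const uint8_t *data;
    size_t linear_len;
};

/* Hypothetical stand-in for pskb_may_pull(): true if len bytes are linear. */
static bool may_pull(const struct pkt *p, size_t len)
{
    return len <= p->linear_len;
}

/* ARP header (8 bytes) + 2 hardware addresses + 2 IPv4 addresses. */
static size_t arp_len_for(size_t hw_addr_len)
{
    return 8 + (hw_addr_len + 4) * 2;
}

/*
 * Read the ARP sender IPv4 address, which follows the fixed header and the
 * sender hardware address. Returns false instead of reading past the
 * linear area. Checking only the 8-byte fixed header would not be enough.
 */
static bool arp_sender_ip(const struct pkt *p, size_t hw_addr_len,
                          uint32_t *out)
{
    if (!may_pull(p, arp_len_for(hw_addr_len)))
        return false;
    const uint8_t *ip = p->data + 8 + hw_addr_len;
    *out = ((uint32_t)ip[0] << 24) | ((uint32_t)ip[1] << 16) |
           ((uint32_t)ip[2] << 8) | (uint32_t)ip[3];
    return true;
}
```

For Ethernet (6-byte hardware addresses), the check covers 28 bytes. A packet whose linear area holds only the fixed header is rejected, where the unchecked read would have run past it.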
    • geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6 · f9094b76
      Hangbin Liu committed
      Stefano pointed out that configuring or showing UDP_ZERO_CSUM6_RX/TX info
      doesn't make sense if CONFIG_IPV6 isn't enabled. Fix it by adding an
      IS_ENABLED(CONFIG_IPV6) check.
      
      Fixes: abe492b4 ("geneve: UDP checksum configuration via netlink")
      Fixes: fd7eafd0 ("geneve: fix fill_info when link down")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9094b76
    • Merge tag 'wireless-drivers-for-davem-2017-11-22' of... · d6efab62
      David S. Miller committed
      Merge tag 'wireless-drivers-for-davem-2017-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.15
      
      First set of fixes for 4.15. Most important here is the iwlwifi fix
      for scan command firmware interface change.
      
      ath10k
      
      * fix CCMP-256, GCMP and GCMP-256 in raw mode, which never worked
      
      wcn36xx
      
      * fix device tree node search
      
      iwlwifi
      
      * fix a regression with firmware API change of scan cmd (introduced in
        firmware version 34)
      
      * add a bunch of PCI IDs and fix configuration structs for A000 devices
      
      * fix the exported firmware name strings for 9000 and A000 devices
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6efab62
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 003cd770
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Fixes 2017-11-21
      
      This series contains fixes for igb/vf, ixgbe/vf, i40e/vf and fm10k.
      
      Jake fixes a regression issue with older firmware, where we were using
      the NVM lock to synchronize NVM reads for all devices and firmware
      versions, yet this caused issues with older firmware prior to version
      1.5.  Fixed this by only grabbing the lock for newer devices and firmware
      version 1.5 or newer.
      
      Zijie Pan fixes the calculation of the i40e VF MAC addresses, where it was
      possible to increment to the next MAC entry without calling
      i40e_add_mac_filter().
      
      Amritha removes the upper limit of 64 queues on a channel VSI since the
      upper bound is determined by the VSI's num_queue_pairs.
      
      Filip fixes an issue during FLR resets, where we should have been checking
      for an upcoming core reset and, if so, just returned I40E_ERR_NOT_READY.
      
      Alan fixes the notification of L2 parameters to clients by copying the
      parameters to the client instance struct, and re-organizes the priority
      in which the client tasks fire so that if the flag for notifying L2
      params is set, it will trigger before the client open task.  Also fixed
      the promiscuous settings after reset for all the VSIs.
      
      Brian King from IBM fixes an issue seen on Power systems which would
      result in skb list corruption and eventual kernel oops.  Brian
      provides the same fix for nearly all our drivers, to replace the
      read_barrier_depends with smp_rmb() to ensure loads are ordered with
      respect to the load of tx_buffer->next_to_watch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      003cd770
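The smp_rmb() fix orders the load of tx_buffer->next_to_watch before the later loads that depend on it. The following is a userspace analogue using C11 fences, not the driver code. `struct slot`, `publish()` and `consume()` are hypothetical names. The release fence stands in for the producer's write barrier, and the acquire fence stands in for smp_rmb() on the cleanup side.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ring slot: `len` is written before the slot is published. */
struct slot {
    int len;                  /* plain data, written before publish */
    atomic_int next_to_watch; /* 0 = not published, 1 = published */
};

/* Producer: write the data, then publish it (write-barrier analogue). */
static void publish(struct slot *s, int len)
{
    s->len = len;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->next_to_watch, 1, memory_order_relaxed);
}

/*
 * Consumer: check the flag first, then read the data. The acquire fence
 * (smp_rmb() analogue) keeps the load of s->len from being satisfied
 * before the flag load on weakly ordered CPUs such as Power.
 */
static bool consume(struct slot *s, int *len)
{
    if (!atomic_load_explicit(&s->next_to_watch, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *len = s->len;
    return true;
}
```

A dependency-only barrier, such as the read_barrier_depends() it replaces, would not be enough here. On the path the fix addresses, the later loads are not reliably data-dependent on the flag load.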
    • net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY · 4b52d010
      Florian Fainelli committed
      The PHY on BCM7278 has an additional bit that needs to be cleared:
      IDDQ_GLOBAL_PWR. Without clearing it, the PHY remains stuck in reset
      when coming out of a suspend/resume cycle.
      
      Fixes: 0fe99338 ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b52d010
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e4be7bab
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2017-11-23
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Several BPF offloading fixes, from Jakub. Among others:
      
          - Limit offload to cls_bpf and XDP program types only.
          - Move device validation into the driver and don't make
            any assumptions about the device in the classifier due
            to shared blocks semantics.
          - Don't pass offloaded XDP program into the driver when
            it should be run in native XDP instead. Offloaded ones
            are not JITed for the host in such cases.
          - Don't destroy device offload state when moved to
            another namespace.
          - Revert dumping offload info into user space for now,
            since ifindex alone is not sufficient. This will be
            redone properly for bpf-next tree.
      
      2) Fix test_verifier to avoid using bpf_probe_write_user()
         helper in test cases, since it's dumping a warning into
         kernel log which may confuse users when only running tests.
         Switch to use bpf_trace_printk() instead, from Yonghong.
      
      3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
         before it becomes uabi, from Gianluca. More specifically:
      
          - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
            by bpf_csum_diff(), where the argument is either a
            valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
            then enforces a valid pointer in case of non-0 size
            or a valid pointer or NULL in case of size 0. Given
            that, the semantics for ARG_PTR_TO_MEM in combination
            with ARG_CONST_SIZE_OR_ZERO are now such that in case
            of size 0, the pointer must always be valid and cannot
            be NULL. This fix in semantics allows for bpf_probe_read()
            to drop the recently added size == 0 check in the helper
            that would become part of uabi otherwise once released.
            At the same time we can then fix bpf_probe_read_str() and
            bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
            instead of ARG_CONST_SIZE in order to fix recently
            reported issues by Arnaldo et al, where LLVM optimizes
            two boundary checks into a single one for unknown
            variables where the verifier loses track of the variable
            bounds and thus rejects valid programs otherwise.
      
      4) A fix for the verifier for the case when it detects
         comparison of two constants where the branch is guaranteed
         to not be taken at runtime. Verifier will rightfully prune
         the exploration of such paths, but we still pass the program
         to JITs, where they would complain about using reserved
         fields, etc. Track such dead instructions and sanitize
         them with mov r0,r0. Rejection is not possible since LLVM
         may generate them for valid C code and doesn't do as much
         data flow analysis as verifier. For bpf-next we might
         implement removal of such dead code and adjust branches
         instead. Fix from Alexei.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4be7bab
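Item 3 fixes what a (pointer, size) argument pair may look like when the size is ARG_CONST_SIZE_OR_ZERO. The following is a toy model of just those acceptance rules, not verifier code. The enum values are illustrative names that mirror the kernel's, and "valid pointer" is simplified here to "non-NULL", ignoring bounds checking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative names modelling the two pointer argument kinds. */
enum ptr_arg_kind {
    PTR_TO_MEM,         /* models ARG_PTR_TO_MEM */
    PTR_TO_MEM_OR_NULL, /* models ARG_PTR_TO_MEM_OR_NULL */
};

/*
 * Acceptance rule for a pointer paired with an ARG_CONST_SIZE_OR_ZERO size:
 *  - PTR_TO_MEM: the pointer must be valid even when size == 0.
 *  - PTR_TO_MEM_OR_NULL: size > 0 needs a valid pointer; size == 0
 *    accepts either a valid pointer or NULL.
 */
static bool size_or_zero_pair_ok(enum ptr_arg_kind kind, bool ptr_is_null,
                                 size_t size)
{
    if (!ptr_is_null)
        return true;   /* a valid pointer is accepted in every case */
    if (kind == PTR_TO_MEM)
        return false;  /* NULL is never accepted for ARG_PTR_TO_MEM */
    return size == 0;  /* NULL is accepted only together with size 0 */
}
```

Under this model only bpf_csum_diff()'s NULL-with-size-0 case gets the relaxed rule. That is why bpf_probe_read() can drop its own size == 0 check.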
    • net: accept UFO datagrams from tuntap and packet · 0c19f846
      Willem de Bruijn committed
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -u -p 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c19f846
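Software UFO segmentation turns one oversized UDP datagram into IP fragments, which is why the tap-to-tap test sees packets "arriving fragmented". The following sketch covers only the fragment arithmetic, not the kernel's segmentation path. It assumes IPv4 without options (20-byte header), an 8-byte UDP header carried in the first fragment, and a hypothetical helper name `ufo_fragments()`.

```c
#include <stddef.h>

/* Sizes assumed for this sketch: IPv4 without options, 8-byte UDP header. */
#define IPV4_HDR_LEN 20
#define UDP_HDR_LEN 8

/*
 * Split a UDP datagram of `payload` bytes into IPv4 fragments for `mtu`.
 * Every fragment except the last carries a multiple of 8 bytes, because
 * the IPv4 fragment offset field counts 8-byte units. Writes each
 * fragment's data length and byte offset and returns the fragment count,
 * or 0 if `max` is too small or the MTU leaves no room for 8 data bytes.
 */
static size_t ufo_fragments(size_t payload, size_t mtu,
                            size_t *lens, size_t *offs, size_t max)
{
    if (mtu < IPV4_HDR_LEN + 8)
        return 0;
    size_t total = UDP_HDR_LEN + payload;               /* IP payload bytes */
    size_t per_frag = ((mtu - IPV4_HDR_LEN) / 8) * 8;   /* round down to 8 */
    size_t n = 0;
    for (size_t off = 0; off < total; off += per_frag) {
        if (n == max)
            return 0;
        size_t left = total - off;
        lens[n] = left < per_frag ? left : per_frag;
        offs[n] = off;
        n++;
    }
    return n;
}
```

With the 2000-byte payload from the test above and an assumed MTU of 1500, the 2008 bytes of IP payload split into fragments of 1480 and 528 bytes. The second fragment's offset field is 1480 / 8 = 185.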
    • net: realtek: r8169: implement set_link_ksettings() · 9e77d7a5
      Tobias Jakobi committed
      Commit 6fa1ba61 partially implemented the new ethtool API by
      replacing get_settings() with get_link_ksettings(). This breaks
      ethtool, since the userspace tool (according to the new API specs)
      never tries the legacy set() call when the new get() call succeeds.
      
      All attempts to change a setting from userspace result in:
      > Cannot set new settings: Operation not supported
      
      Implement the missing set() call.
      Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e77d7a5
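The failure follows from the rule that once the new-style get succeeds, only the new-style set is tried. The following toy model shows why a driver with only half of the new API reports "Operation not supported". The ops table, `change_speed()` and the `fake_*` driver callbacks are all hypothetical and much simpler than the real ethtool_ops and ethtool_link_ksettings. The legacy fallback path is deliberately omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down settings and ops table for this sketch. */
struct link_settings { int speed; };

struct drv_ops {
    int (*get_link_ksettings)(struct link_settings *ls);
    int (*set_link_ksettings)(const struct link_settings *ls);
};

/* Models the caller's rule: new get succeeded -> only the new set is tried. */
static int change_speed(const struct drv_ops *ops, int speed)
{
    struct link_settings ls;
    if (!ops->get_link_ksettings || ops->get_link_ksettings(&ls) != 0)
        return -EOPNOTSUPP;  /* legacy path omitted in this sketch */
    if (!ops->set_link_ksettings)
        return -EOPNOTSUPP;  /* the "Cannot set new settings" case */
    ls.speed = speed;
    return ops->set_link_ksettings(&ls);
}

/* Hypothetical driver callbacks used to exercise the model. */
static int fake_get(struct link_settings *ls) { ls->speed = 1000; return 0; }
static int fake_set(const struct link_settings *ls)
{
    return ls->speed > 0 ? 0 : -EINVAL;
}
```

Filling in `set_link_ksettings`, which is what the patch does for r8169, is enough to make `change_speed()` succeed.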
    • net: ipv6: Fixup device for anycast routes during copy · 98d11291
      David Ahern committed
      Florian reported a breakage with anycast routes due to commit
      4832c30d ("net: ipv6: put host and anycast routes on device with
      address"). Prior to this commit anycast routes were added against the
      loopback device causing repetitive route entries with no insight into
      why they existed. e.g.:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
      
      The point of commit 4832c30d is to add the routes using the device
      with the address which is causing the route to be added, e.g.:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth1 proto kernel metric 0 pref medium
      
      For traffic to work as it did before, the dst device needs to be switched
      to the loopback when the copy is created similar to local routes.
      
      Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98d11291
    • Merge branch 'smc-fixes-for-smc-buffer-handling' · 9477fef4
      David S. Miller committed
      Ursula Braun says:
      
      ====================
      net/smc: fixes for smc buffer handling
      
      Here are 2 cleanup patches for SMC buffer handling.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9477fef4
    • net/smc: Fix preinitialization of buf_desc in __smc_buf_create() · 68870370
      Geert Uytterhoeven committed
      With gcc-4.1.2:
      
          net/smc/smc_core.c: In function ‘__smc_buf_create’:
          net/smc/smc_core.c:567: warning: ‘bufsize’ may be used uninitialized in this function
      
      Indeed, if the for-loop is never executed, bufsize is used
      uninitialized.  In addition, buf_desc is stored for later use, while it
      is still a NULL pointer.
      
      Before, error handling was done by checking if buf_desc is non-NULL.
      The cleanup changed this to an error check, but forgot to update the
      preinitialization of buf_desc to an error pointer.
      
      Update the preinitialization of buf_desc to fix this.
      
      Fixes: b33982c3 ("net/smc: cleanup function __smc_buf_create()")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68870370
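The bug pattern is a result variable that is preinitialized to NULL while the caller tests for an error pointer. The following userspace imitation of the kernel's ERR_PTR/IS_ERR convention shows why the preinitialization must be an error pointer. `err_ptr()`, `ptr_err()`, `is_err()` and `create_buf()` are hypothetical names, and the loop only mirrors the shape the commit describes, not __smc_buf_create() itself. The integer/pointer casts rely on the usual behaviour of Linux ABIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Error pointers: small negative errno values in the top page of memory. */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Try candidate sizes until an allocation succeeds. buf starts as an error
 * pointer, so an empty candidate list is reported as an error; starting
 * from NULL would make is_err(buf) false and the caller would use NULL.
 * *bufsize is set up front so it is never read uninitialized.
 */
static void *create_buf(const size_t *sizes, size_t n, size_t *bufsize)
{
    void *buf = err_ptr(-ENOMEM);
    *bufsize = 0;
    for (size_t i = 0; i < n; i++) {
        buf = malloc(sizes[i]);
        if (buf) {
            *bufsize = sizes[i];
            return buf;
        }
        buf = err_ptr(-ENOMEM);
    }
    return buf;
}
```

Callers then need only an `is_err()` check. A single test covers both "every allocation failed" and "the loop never ran".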
    • net/smc: use sk_rcvbuf as start for rmb creation · 4e1061f4
      Ursula Braun committed
      Commit 3e034725 ("net/smc: common functions for RMBs and send buffers")
      merged handling of SMC receive and send buffers. It introduced sk_buf_size
      as merged start value for size determination. But since sk_buf_size is not
      used at all, sk_sndbuf is erroneously used as start for rmb creation.
      This patch makes sure sk_buf_size is really used as intended, and
      sk_rcvbuf is used as start value for rmb creation.
      
      Fixes: 3e034725 ("net/smc: common functions for RMBs and send buffers")
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e1061f4
    • ipv6: Do not consider linkdown nexthops during multipath · bbfcd776
      Ido Schimmel committed
      When the 'ignore_routes_with_linkdown' sysctl is set, we should not
      consider linkdown nexthops during route lookup.
      
      While the code correctly verifies that the initially selected route
      ('match') has a carrier, it does not perform the same check in the
      subsequent multipath selection, resulting in a potential packet loss.
      
      In case the chosen route does not have a carrier and the sysctl is set,
      choose the initially selected route.
      
      Fixes: 35103d11 ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbfcd776
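The fix adds one fallback at the end of multipath selection. The following model covers only that decision, not the ipv6 routing code. `struct nexthop` and `pick_nexthop()` are hypothetical. `match` is the initially selected route, already known to have a carrier, and `hashed` is the sibling chosen by the multipath hash.

```c
#include <stdbool.h>

/* Hypothetical, minimal nexthop record for this sketch. */
struct nexthop {
    const char *dev;
    bool has_carrier;
};

/*
 * With ignore_routes_with_linkdown set, a link-down sibling picked by the
 * multipath hash is not used; fall back to the initially selected route
 * instead of sending packets out of a link without carrier.
 */
static const struct nexthop *pick_nexthop(const struct nexthop *match,
                                          const struct nexthop *hashed,
                                          bool ignore_routes_with_linkdown)
{
    if (ignore_routes_with_linkdown && !hashed->has_carrier)
        return match;
    return hashed;
}
```

When the sysctl is unset, the hashed sibling is returned unchanged, which matches the behaviour before the fix.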