1. 19 August 2017 (2 commits)
  2. 17 August 2017 (7 commits)
  3. 16 August 2017 (2 commits)
  4. 15 August 2017 (1 commit)
  5. 14 August 2017 (1 commit)
    • J
      net: export some generic xdp helpers · 7c497478
      Jason Wang authored
      This patch exports some generic XDP helpers to drivers so that a
      driver can run XDP on a specific skb. This is useful when the packet
      is hard to process directly at the page level (e.g. a jumbo/GSO
      frame).
      
      With this patch, a driver no longer needs to reject attaching an XDP
      program when its configuration is not suitable. Instead, it can defer
      XDP for such packets until after the skb has been created.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c497478
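      The following is an editorial sketch (not part of the commit) of how a
      driver receive path might defer to such an skb-level helper.  The
      helper name and signature, do_xdp_generic(prog, skb), are assumed from
      contemporary in-tree callers, and my_priv/my_rx are hypothetical:

      #include <linux/filter.h>
      #include <linux/netdevice.h>

      /* Hypothetical driver private data holding an attached XDP program. */
      struct my_priv {
              struct bpf_prog __rcu *xdp_prog;
      };

      static int my_rx(struct my_priv *priv, struct sk_buff *skb)
      {
              struct bpf_prog *xdp_prog;
              int act = XDP_PASS;

              rcu_read_lock();
              xdp_prog = rcu_dereference(priv->xdp_prog);
              if (xdp_prog)
                      /* Run XDP at skb level for frames that were too hard to
                       * handle at page level (e.g. jumbo/GSO). */
                      act = do_xdp_generic(xdp_prog, skb);
              rcu_read_unlock();

              if (act != XDP_PASS)
                      return NET_RX_DROP;     /* skb already consumed by XDP */

              return netif_receive_skb(skb);
      }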
  6. 11 August 2017 (5 commits)
    • M
      mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem · 99baac21
      Minchan Kim authored
      Nadav reported that parallel MADV_DONTNEED on the same range has a
      stale-TLB problem; Mel fixed it [1] and found the same problem in
      MADV_FREE [2].
      
      Quote from Mel Gorman:
       "The race in question is CPU 0 running madv_free and updating some PTEs
        while CPU 1 is also running madv_free and looking at the same PTEs.
        CPU 1 may have writable TLB entries for a page but fail the pte_dirty
        check (because CPU 0 has updated it already) and potentially fail to
        flush.
      
        Hence, when madv_free on CPU 1 returns, there are still potentially
        writable TLB entries and the underlying PTE is still present so that a
        subsequent write does not necessarily propagate the dirty bit to the
        underlying PTE any more. Reclaim at some unknown time at the future
        may then see that the PTE is still clean and discard the page even
        though a write has happened in the meantime. I think this is possible
        but I could have missed some protection in madv_free that prevents it
        happening."
      
      This patch aims to solve both problems at once and also prepares for
      another problem involving KSM, MADV_FREE and the soft-dirty story [3].
      
      The TLB batch API (tlb_[gather|finish]_mmu) uses
      [inc|dec]_tlb_flush_pending and mm_tlb_flush_pending so that when
      tlb_finish_mmu is called we can detect that parallel threads are
      operating on the same range.  In that case the TLB is flushed
      unconditionally, to prevent userspace from accessing memory through a
      stale TLB entry even though no page table entries were gathered.
      
      I confirmed this patch works with the test program Nadav gave [4], so
      this patch supersedes "mm: Always flush VMA ranges affected by
      zap_page_range v2" in the current mmotm.
      
      NOTE:
      
      This patch modifies the arch-specific TLB gathering interface (x86,
      ia64, s390, sh, um).  Most architectures are straightforward, but s390
      needs care because tlb_flush_mmu works only if mm->context.flush_mm is
      set to non-zero, which happens only when a pte entry is actually
      cleared by ptep_get_and_clear and friends.  This problem, however,
      never changes the pte entries, yet still needs a flush to prevent
      memory access through stale TLB entries.
      
      [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
      [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
      [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
      [4] https://patchwork.kernel.org/patch/9861621/
      
      [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
        Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
      Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reported-by: Nadav Amit <namit@vmware.com>
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      99baac21
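      For orientation, here is a minimal userspace sketch (editorial, not the
      test program from [4]) of the racing pattern in question: two threads
      run MADV_DONTNEED over the same mapping while a third path keeps
      writing to it.  Build with gcc -pthread.

      #include <pthread.h>
      #include <string.h>
      #include <sys/mman.h>

      #define LEN (64UL << 20)        /* 64 MiB shared target range */

      static char *buf;

      static void *madv_thread(void *arg)
      {
              (void)arg;
              for (int i = 0; i < 1000; i++)
                      madvise(buf, LEN, MADV_DONTNEED); /* batched PTE clear + TLB flush */
              return NULL;
      }

      int main(void)
      {
              pthread_t t1, t2;

              buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (buf == MAP_FAILED)
                      return 1;

              pthread_create(&t1, NULL, madv_thread, NULL);
              pthread_create(&t2, NULL, madv_thread, NULL);

              /* Writes racing with the madvise() callers: with the bug, a write
               * could go through a stale writable TLB entry and later be lost
               * when reclaim sees a clean PTE. */
              for (int i = 0; i < 1000; i++)
                      memset(buf, 0xaa, LEN);

              pthread_join(t1, NULL);
              pthread_join(t2, NULL);
              return 0;
      }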
    • M
      mm: make tlb_flush_pending global · 0a2dd266
      Minchan Kim authored
      Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING|
      COMPACTION], but upcoming patches that solve the subtle TLB flush
      batching problem will use it regardless of compaction/NUMA, so this
      patch removes that dependency.
      
      [akpm@linux-foundation.org: remove more ifdefs from world's ugliest printk statement]
      Link: http://lkml.kernel.org/r/20170802000818.4760-6-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a2dd266
    • M
      mm: refactor TLB gathering API · 56236a59
      Minchan Kim authored
      This patch is a preparatory patch for solving the race problems caused
      by TLB batching.  For that, we will increase/decrease the TLB flush
      pending count of mm_struct whenever tlb_[gather|finish]_mmu is called.
      
      To keep that change simple, this patch first separates out the
      architecture-specific part, renames it to arch_tlb_[gather|finish]_mmu,
      and has the generic part just call it.
      
      It shouldn't change any behavior.
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56236a59
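      Schematically (an editorial sketch, with parameter lists abbreviated
      rather than copied from the tree), the refactor turns the generic entry
      point into a thin wrapper so the pending-count handling from the later
      patches has exactly one place to live:

      void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                          unsigned long start, unsigned long end)
      {
              /* later patch in the series: inc_tlb_flush_pending(mm); */
              arch_tlb_gather_mmu(tlb, mm, start, end);
      }
      /* tlb_finish_mmu() is wrapped the same way around arch_tlb_finish_mmu(),
       * with the matching dec_tlb_flush_pending() added later. */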
    • N
      mm: migrate: fix barriers around tlb_flush_pending · 0a2c4048
      Nadav Amit authored
      Reading tlb_flush_pending while the page-table lock is held does not
      require a barrier, since the lock/unlock already acts as one, so the
      barrier in mm_tlb_flush_pending() is removed for that case.
      
      However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
      after the page-table lock has already been released, which may be a
      problem on architectures with a weak memory model (e.g. PPC).  To deal
      with this case, a new parameter is added to mm_tlb_flush_pending() to
      indicate whether it is called without the page-table lock held, and
      smp_mb__after_unlock_lock() is issued in that case.
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-3-namit@vmware.com
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a2c4048
    • N
      mm: migrate: prevent racy access to tlb_flush_pending · 16af97dc
      Nadav Amit authored
      Patch series "fixes of TLB batching races", v6.
      
      It turns out that Linux TLB batching mechanism suffers from various
      races.  Races that are caused due to batching during reclamation were
      recently handled by Mel and this patch-set deals with others.  The more
      fundamental issue is that concurrent updates of the page-tables allow
      for TLB flushes to be batched on one core, while another core changes
      the page-tables.  This other core may assume a PTE change does not
      require a flush based on the updated PTE value, while it is unaware that
      TLB flushes are still pending.
      
      This behavior affects KSM (which may result in memory corruption) and
      MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior).  A
      proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
      Memory corruption in KSM is harder to produce in practice, but was
      observed by hacking the kernel and adding a delay before flushing and
      replacing the KSM page.
      
      Finally, there is also one memory barrier missing, which may affect
      architectures with weak memory model.
      
      This patch (of 7):
      
      Setting and clearing mm->tlb_flush_pending can be performed by multiple
      threads, since mmap_sem may only be acquired for read in
      task_numa_work().  If this happens, tlb_flush_pending might be cleared
      while one of the threads still changes PTEs and batches TLB flushes.
      
      This can lead to the same race between migration and
      change_protection_range() that led to the introduction of
      tlb_flush_pending.  The result of this race was data corruption, which
      means that this patch also addresses a theoretically possible data
      corruption.
      
      An actual data corruption was not observed, yet the race was confirmed
      by adding an assertion to check that tlb_flush_pending is not set by
      two threads, adding artificial latency in change_protection_range(),
      and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
      Fixes: 20841405 ("mm: fix TLB flush race between migration, and change_protection_range")
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      16af97dc
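      As a rough userspace analogue of the fix (this is not the kernel code),
      the boolean flag becomes a counter: each thread increments it before it
      starts batching PTE changes, decrements it after its flush, and readers
      only ask whether it is non-zero.

      #include <stdatomic.h>
      #include <stdbool.h>

      struct mm_like {
              atomic_int tlb_flush_pending;   /* counter, not a bool */
      };

      static void inc_flush_pending(struct mm_like *mm)
      {
              atomic_fetch_add(&mm->tlb_flush_pending, 1);
      }

      static void dec_flush_pending(struct mm_like *mm)
      {
              atomic_fetch_sub(&mm->tlb_flush_pending, 1);
      }

      static bool flush_pending(struct mm_like *mm)
      {
              /* Non-zero: some thread may still hold unflushed, stale TLB
               * entries for PTEs it has already modified. */
              return atomic_load(&mm->tlb_flush_pending) != 0;
      }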
  7. 10 August 2017 (2 commits)
    • J
      nvmet_fc: add defer_req callback for deferment of cmd buffer return · 0fb228d3
      James Smart authored
      At queue creation, the transport allocates a local job struct
      (struct nvmet_fc_fcp_iod) for each possible element of the queue.
      When a new CMD is received from the wire, a job struct is allocated
      from the queue and then used for the duration of the command.
      The job struct contains buffer space for the wire command iu.  Thus,
      upon allocation of the job struct, the cmd iu buffer is copied into
      the job struct and the LLDD may immediately free/reuse the CMD IU
      buffer passed in the call.
      
      However, because of the packetized nature of FC and the API of the FC
      LLDD, the LLDD may issue a hw command to send the wire response but
      not receive the hw completion for that command (and thus not upcall
      the nvmet_fc layer) before a new command is asynchronously received
      on the wire.  In other words, the initiator can get the response from
      the wire, believe a command slot is free, and send a new command iu.
      The new command iu may be received by the LLDD and passed to the
      transport before the LLDD has serviced the hw completion and made the
      teardown calls for the original job struct.  As such, no job struct
      is available for the new io: it looks as if the host sent more queue
      elements than the queue size, although by its own accounting it did
      not.
      
      Rather than treat this as a hard connection failure, queue the new
      request until a job struct does free up.  Since the buffer is not
      copied (there is no job struct yet), a special return value must be
      returned to the LLDD to signify that it should hold off on recycling
      the cmd iu buffer.  Later, when a job struct is allocated and the
      buffer copied, a new LLDD callback is invoked to notify the LLDD and
      allow it to recycle its command iu buffer.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      0fb228d3
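      The deferral scheme amounts to the pattern below.  This is an editorial
      analogue with hypothetical names, not the nvmet_fc/LLDD interfaces
      themselves: with no job struct free, the request is parked and the
      caller is told to keep its buffer; once a job struct is available and
      the buffer copied, a callback releases the buffer for reuse.

      struct job;                                     /* per-queue-element job struct */

      extern struct job *job_alloc(void);             /* hypothetical pool */
      extern void queue_deferred(const void *buf, int len);  /* hypothetical list */
      extern void copy_to_job(struct job *j, const void *buf, int len);

      #define RCV_IN_PROGRESS 1       /* "accepted, but keep the buffer for now" */

      int receive_cmd(const void *cmd_buf, int len,
                      void (*buf_released)(const void *buf))
      {
              struct job *j = job_alloc();

              if (!j) {
                      /* Pool exhausted: park the request and ask the caller to
                       * hold cmd_buf until buf_released() fires later. */
                      queue_deferred(cmd_buf, len);
                      return RCV_IN_PROGRESS;
              }
              copy_to_job(j, cmd_buf, len);
              buf_released(cmd_buf);  /* buffer may be recycled immediately */
              return 0;
      }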
    • W
      sock: fix zerocopy_success regression with msg_zerocopy · 0a4a060b
      Willem de Bruijn authored
      Do not use uarg->zerocopy outside msg_zerocopy. In other paths the
      field is not explicitly initialized and aliases another field.
      
      Those paths have only one reference so do not need this intermediate
      variable. Call uarg->callback directly.
      
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a4a060b
  8. 09 August 2017 (2 commits)
    • E
      bpf/verifier: track signed and unsigned min/max values · b03c9f9f
      Edward Cree authored
      Allows us to, sometimes, combine information from a signed check of one
       bound and an unsigned check of the other.
      We now track the full range of possible values, rather than restricting
       ourselves to [0, 1<<30) and considering anything beyond that as
       unknown.  While this is probably not necessary, it makes the code more
       straightforward and symmetrical between signed and unsigned bounds.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b03c9f9f
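      A self-contained illustration of the idea (editorial, not the verifier
      code): carry both unsigned and signed min/max for a value and widen to
      "unknown" in a domain whenever an addition might wrap there.

      #include <stdint.h>

      struct bounds {
              uint64_t umin, umax;    /* unsigned view of the value */
              int64_t  smin, smax;    /* signed view of the same value */
      };

      static void bounds_add(struct bounds *dst, const struct bounds *a,
                             const struct bounds *b)
      {
              /* Unsigned domain: if the max sum wraps, we know nothing. */
              if (a->umax + b->umax < a->umax) {
                      dst->umin = 0;
                      dst->umax = UINT64_MAX;
              } else {
                      dst->umin = a->umin + b->umin;
                      dst->umax = a->umax + b->umax;
              }
              /* Signed domain: same idea, detected via checked addition. */
              if (__builtin_add_overflow(a->smin, b->smin, &dst->smin) ||
                  __builtin_add_overflow(a->smax, b->smax, &dst->smax)) {
                      dst->smin = INT64_MIN;
                      dst->smax = INT64_MAX;
              }
      }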
    • E
      bpf/verifier: rework value tracking · f1174f77
      Edward Cree authored
      Unifies adjusted and unadjusted register value types (e.g. FRAME_POINTER is
       now just a PTR_TO_STACK with zero offset).
      Tracks value alignment by means of tracking known & unknown bits.  This
       also replaces the 'reg->imm' (leading zero bits) calculations for (what
       were) UNKNOWN_VALUEs.
      If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES,
       treat the pointer as an unknown scalar and try again, because we might be
       able to conclude something about the result (e.g. pointer & 0x40 is either
       0 or 0x40).
      Verifier hooks in the netronome/nfp driver were changed to match the new
       data structures.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1174f77
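      The commit's own example ("pointer & 0x40 is either 0 or 0x40") can be
      reproduced with a small (value, unknown-mask) pair; this editorial
      sketch mirrors the technique, not the verifier's actual structures.

      #include <stdint.h>
      #include <stdio.h>

      struct knownbits {
              uint64_t value;         /* bit value where the mask bit is 0 (known) */
              uint64_t mask;          /* 1 = this bit is unknown */
      };

      static struct knownbits kb_and(struct knownbits a, struct knownbits b)
      {
              uint64_t could_a = a.value | a.mask;    /* bits that may be 1 in a */
              uint64_t could_b = b.value | b.mask;    /* bits that may be 1 in b */
              uint64_t known1  = a.value & b.value;   /* bits known to be 1 in both */
              struct knownbits r = {
                      .value = known1,
                      .mask  = (could_a & could_b) & ~known1,
              };
              return r;
      }

      int main(void)
      {
              struct knownbits ptr = { .value = 0, .mask = ~0ULL };  /* fully unknown */
              struct knownbits c40 = { .value = 0x40, .mask = 0 };   /* constant 0x40 */
              struct knownbits r = kb_and(ptr, c40);

              /* Prints value=0 mask=0x40: only bit 6 can be set, i.e. 0 or 0x40. */
              printf("value=%#llx mask=%#llx\n",
                     (unsigned long long)r.value, (unsigned long long)r.mask);
              return 0;
      }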
  9. 08 August 2017 (9 commits)
  10. 07 August 2017 (9 commits)
    • L
      pinctrl: generic: update references to Documentation/pinctrl.txt · 0cca6c89
      Ludovic Desroches authored
      Update deprecated references to Documentation/pinctrl.txt since it has been
      moved to Documentation/driver-api/pinctl.rst.
      Signed-off-by: Ludovic Desroches <ludovic.desroches@o2linux.fr>
      Fixes: 5a9b7383 ("pinctrl.txt: move it to the driver-api book")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      0cca6c89
    • R
      net/mlx5: Increase the maximum flow counters supported · a8ffcc74
      Rabie Loulou authored
      Read the new NIC capability field which represents the 16 MSBs of the
      maximum number of flow counters supported (max_flow_counter_31_16).
      
      Backward compatibility with older firmware is preserved: the modified
      driver reads max_flow_counter_31_16 as 0 from older firmware and uses
      up to 64K counters.
      
      Change the flow counter id from 16 bits to 32 bits.  Backward
      compatibility with older firmware is preserved, as we keep the 16 LSBs
      of the counter id in place and add the 16 MSBs from a reserved field.
      
      Change the background bulk reading of flow counters to work in chunks
      of at most 32K counters, to make sure we don't attempt to allocate
      very large buffers.
      Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a8ffcc74
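      The chunking change boils down to the loop below; the helper
      query_flow_counters_chunk() is a hypothetical stand-in for one firmware
      bulk-query command, not a driver function.

      #include <stdint.h>

      #define COUNTER_CHUNK 32768u    /* at most 32K counters per query */

      extern int query_flow_counters_chunk(uint32_t first_id, uint32_t count);

      static int bulk_read_counters(uint32_t first_id, uint32_t num_counters)
      {
              while (num_counters) {
                      uint32_t n = num_counters < COUNTER_CHUNK ?
                                   num_counters : COUNTER_CHUNK;
                      int err = query_flow_counters_chunk(first_id, n);

                      if (err)
                              return err;     /* fail fast, staging buffer stays small */
                      first_id += n;
                      num_counters -= n;
              }
              return 0;
      }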
    • R
      net/mlx5: Fix counter list hardware structure · 61690e09
      Rabie Loulou authored
      The counter list hardware structure does not contain clear and
      num_of_counters fields, so remove them.
      
      These bogus fields were never used by the driver, hence no other
      driver changes are needed.
      
      Fixes: a351a1b0 ("net/mlx5: Introduce bulk reading of flow counters")
      Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      61690e09
    • E
      net/mlx5: Delay events till ib registration ends · 97834eba
      Erez Shitrit authored
      When mlx5_ib registers itself with mlx5_core as an interface, mlx5_core
      calls mlx5_add_device, which invokes the mlx5_ib interface add
      callback; only if the latter returns successfully does mlx5_core add
      it to the interface list, after which async events are forwarded to
      mlx5_ib.  Between the mlx5_ib add callback returning and mlx5_core
      adding the mlx5_ib interface to its device list, arriving mlx5_core
      events can be missed by the newly registering interface.
      
      In other words:
      thread 1: mlx5_ib: mlx5_register_interface(dev)
      thread 1: mlx5_core: mlx5_add_device(dev)
      thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add
      thread 2: mlx5_core_event: **new event arrives, forward to dev_list
      thread 1: mlx5_core: add_ctx_to_dev_list(ctx)
      /* previous event was missed by the new interface.*/
      It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device
      but not after.
      
      We fix this race by accumulating the events that arrive between
      ib_register_device (inside mlx5_add_device->(dev->add)) and the
      completion of the addition to the list, and firing them at the newly
      registered interface after that.
      
      Fixes: f1ee87fe ("net/mlx5: Organize device list API in one place")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      97834eba
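      The shape of the fix is roughly the following (an editorial analogue
      using a plain mutex and array, not the mlx5 implementation): events that
      arrive while the interface is still being added are parked, then
      replayed in order once the interface is on the device list.

      #include <pthread.h>
      #include <stdbool.h>

      #define MAX_PENDING 64

      struct iface {
              pthread_mutex_t lock;
              bool on_list;                   /* added to the device list yet? */
              int pending[MAX_PENDING];       /* events that arrived during add */
              int npending;
      };

      extern void deliver(struct iface *i, int event);        /* the real handler */

      /* Event path: deliver now, or park the event until the add completes. */
      static void iface_event(struct iface *i, int event)
      {
              pthread_mutex_lock(&i->lock);
              if (!i->on_list && i->npending < MAX_PENDING) {
                      i->pending[i->npending++] = event;
                      pthread_mutex_unlock(&i->lock);
                      return;
              }
              pthread_mutex_unlock(&i->lock);
              deliver(i, event);
      }

      /* Registration path: flush parked events once the interface is listed. */
      static void iface_added(struct iface *i)
      {
              pthread_mutex_lock(&i->lock);
              i->on_list = true;
              for (int k = 0; k < i->npending; k++)
                      deliver(i, i->pending[k]);
              i->npending = 0;
              pthread_mutex_unlock(&i->lock);
      }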
    • S
      net/mlx5: Separate between E-Switch and MPFS · eeb66cdb
      Saeed Mahameed authored
      The Multi-Physical Function Switch (MPFS) is required when a multi-PF
      configuration is enabled, to allow passing user-configured unicast MAC
      addresses to the requesting PF.
      
      Before this patch, eswitch.c managed the HW MPFS L2 table: E-Switch
      always (regardless of SRIOV) enabled vport(0) (the NIC PF vport)
      context updates on unicast MAC address list changes, to populate the
      PF's MPFS L2 table accordingly.
      
      In a downstream patch we would like to allow compiling the driver
      without E-Switch functionality.  For that, we move the MPFS L2 table
      logic out of eswitch.c into its own file and provide a Kconfig flag
      (MLX5_MPFS) to allow compiling out MPFS for those who don't want
      multi-PF support.
      
      The NIC PF netdevice will now update the MPFS L2 table directly via
      the new MPFS API.  A VF netdevice has no access to the MPFS L2 table,
      so E-Switch remains responsible for updating its MPFS L2 table on
      behalf of its VFs.
      
      Due to this change, we also no longer need to enable vport(0) (PF
      vport) unicast MAC change events when SRIOV is not enabled, which
      means E-Switch is now activated only when SRIOV is activated and is
      not required otherwise.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jes Sorensen <jsorensen@fb.com>
      Cc: kernel-team@fb.com
      eeb66cdb
    • Y
      tcp: fix cwnd undo in Reno and HTCP congestion controls · 4faf7839
      Yuchung Cheng authored
      Using ssthresh to revert cwnd is less reliable when ssthresh is
      bounded to 2 packets.  This patch instead uses an existing TCP
      variable, "prior_cwnd", which snapshots the cwnd right before entering
      fast recovery and RTO recovery in Reno.  This fixes the issue
      discussed in the netdev thread "A buggy behavior for Linux TCP Reno
      and HTCP":
      https://www.spinics.net/lists/netdev/msg444955.html
      Suggested-by: Neal Cardwell <ncardwell@google.com>
      Reported-by: Wei Sun <unlcsewsun@gmail.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4faf7839
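      Numerically (an editorial illustration; the exact kernel expressions may
      differ), the difference is that the old Reno undo restored roughly
      2 * ssthresh, which is tiny once ssthresh has been clamped to 2, while
      the new undo restores the snapshotted prior_cwnd:

      #include <stdio.h>

      static unsigned int undo_old(unsigned int cwnd, unsigned int ssthresh)
      {
              return cwnd > 2 * ssthresh ? cwnd : 2 * ssthresh;
      }

      static unsigned int undo_new(unsigned int cwnd, unsigned int prior_cwnd)
      {
              return cwnd > prior_cwnd ? cwnd : prior_cwnd;
      }

      int main(void)
      {
              /* Hypothetical numbers: cwnd was 40 before a spurious loss event;
               * during recovery cwnd dropped to 3 and ssthresh was clamped to 2. */
              printf("old undo -> %u\n", undo_old(3, 2));     /* 4: prior cwnd lost  */
              printf("new undo -> %u\n", undo_new(3, 40));    /* 40: fully restored  */
              return 0;
      }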
    • R
      phylink: add module EEPROM support · 770a1ad5
      Russell King authored
      Add support for reading module EEPROMs through phylink.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      770a1ad5
    • R
    • R
      phylink: add phylink infrastructure · 9525ae83
      Russell King authored
      The link between the ethernet MAC and its PHY has become more complex
      as the interface evolves.  This is especially true with serdes links,
      where the part of the PHY is effectively integrated into the MAC.
      
      Serdes links can be connected to a variety of devices, including SFF
      modules soldered down onto the board with the MAC, a SFP cage with
      a hotpluggable SFP module which may contain a PHY or directly modulate
      the serdes signals onto optical media with or without a PHY, or even
      a classical PHY connection.
      
      Moreover, the negotiation information on serdes links comes in two
      varieties - SGMII mode, where the PHY provides its speed/duplex/flow
      control information to the MAC, and 1000base-X mode where both ends
      exchange their abilities and each resolve the link capabilities.
      
      This means we need a more flexible means to support these arrangements,
      particularly with the hotpluggable nature of SFP, where the PHY can
      be attached or detached after the network device has been brought up.
      
      Ethtool information can come from multiple sources:
      - we may have a PHY operating in either SGMII or 1000base-X mode, in
        which case we take ethtool/mii data directly from the PHY.
      - we may have an optical SFP module without a PHY, with the MAC
        operating in 1000base-X mode - the ethtool/mii data needs to come
        from the MAC.
      - we may have a copper SFP module with a PHY which can't be accessed,
        which means we need to take ethtool/mii data from the MAC.
      
      Phylink aims to solve this by providing an intermediary between the
      MAC and PHY, providing a safe way for PHYs to be hotplugged, and
      allowing a SFP driver to reconfigure the serdes connection.
      
      Phylink also takes over support of fixed link connections, where the
      speed/duplex/flow control are fixed, but link status may be controlled
      by a GPIO signal.  By avoiding the fixed-phy implementation, phylink
      can provide a faster response to link events: fixed-phy has to wait for
      phylib to operate its state machine, which can take several seconds.
      In comparison, phylink takes milliseconds.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      
      - remove sync status
      - rework supported and advertisement handling
      - add 1000base-x speed for fixed links
      - use functionality exported from phy-core, reworking
        __phylink_ethtool_ksettings_set for it
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9525ae83