1. 30 Sep, 2016 27 commits
    • D
      Merge branch 'net_proc_perf' · bcdc6efa
      David S. Miller committed
      Jia He says:
      
      ====================
      Reduce cache miss for snmp_fold_field
      
      On a PowerPC server with a large number of CPUs (160), besides commit
      a3a77372 ("net: Optimize snmp stat aggregation by walking all
      the percpu data at once"), I observed several other snmp_fold_field
      call sites that cause a high cache miss rate.
      
      test source code:
      ================
      My simple test case, which reads from the procfs items endlessly:
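      /***********************************************************/
      #include <stdio.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>

      #define LINELEN 1024    /* assumed value; not given in the original message */

      int main(int argc, char **argv)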
      {
              int i;
              int fd = -1 ;
              int rdsize = 0;
              char buf[LINELEN+1];
      
              buf[LINELEN] = 0;
              memset(buf,0,LINELEN);
      
              if(1 >= argc) {
                      printf("file name empty\n");
                      return -1;
              }
      
              fd = open(argv[1], O_RDWR, 0644);
              if(0 > fd){
                      printf("open error\n");
                      return -2;
              }
      
              for(i=0;i<0xffffffff;i++) {
                      while(0 < (rdsize = read(fd,buf,LINELEN))){
                              //nothing here
                      }
      
                      lseek(fd, 0, SEEK_SET);
              }
      
              close(fd);
              return 0;
      }
      /**********************************************************/
      
      compile and run:
      ================
      gcc test.c -o test
      
      perf stat -d -e cache-misses ./test /proc/net/snmp
      perf stat -d -e cache-misses ./test /proc/net/snmp6
      perf stat -d -e cache-misses ./test /proc/net/sctp/snmp
      perf stat -d -e cache-misses ./test /proc/net/xfrm_stat
      
      before the patch set:
      ====================
       Performance counter stats for 'system wide':
      
               355911097      cache-misses                                                 [40.08%]
              2356829300      L1-dcache-loads                                              [60.04%]
               355642645      L1-dcache-load-misses     #   15.09% of all L1-dcache hits   [60.02%]
               346544541      LLC-loads                                                    [59.97%]
                  389763      LLC-load-misses           #    0.11% of all LL-cache hits    [40.02%]
      
             6.245162638 seconds time elapsed
      
      After the patch set:
      ===================
       Performance counter stats for 'system wide':
      
               194992476      cache-misses                                                 [40.03%]
              6718051877      L1-dcache-loads                                              [60.07%]
               194871921      L1-dcache-load-misses     #    2.90% of all L1-dcache hits   [60.11%]
               187632232      LLC-loads                                                    [60.04%]
                  464466      LLC-load-misses           #    0.25% of all LL-cache hits    [39.89%]
      
             6.868422769 seconds time elapsed
      The L1-dcache load-miss rate is reduced from 15.09% to 2.90%.
      
      changelog
      =========
      v6:
      - correct v5
      v5:
      - order local variables from longest to shortest line
      v4:
      - move memset into one block of if statement in snmp6_seq_show_item
      - remove the changes in netstat_seq_show because the stack usage would be too large
      v3:
      - introduce generic interface (suggested by Marcelo Ricardo Leitner)
      - use max_t instead of self defined macro (suggested by David Miller)
      v2:
      - fix bug in udplite statistics.
      - snmp_seq_show is split into 2 parts
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bcdc6efa
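The percentages in the perf output above follow from dividing L1-dcache-load-misses by L1-dcache-loads. A minimal sketch of that arithmetic, assuming a hypothetical helper name `miss_rate_percent` (it is not part of the patch set):

```c
#include <assert.h>

/* Hypothetical helper: L1-dcache load misses as a percentage of loads. */
double miss_rate_percent(unsigned long long misses, unsigned long long loads)
{
    assert(misses <= loads);    /* sanity check for this sketch only */
    if (loads == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)loads;
}
```

With the counters quoted in the message, 355642645 misses out of 2356829300 loads gives roughly 15.09%, and 194871921 out of 6718051877 gives roughly 2.90%.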
    • J
      net: Suppress the "Comparison to NULL could be written" warnings · 6d4a741c
      Jia He committed
      This is to suppress the checkpatch.pl warning "Comparison to NULL
      could be written". No functional changes here.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d4a741c
    • J
      ipv6: Remove useless parameter in __snmp6_fill_statsdev · aca05671
      Jia He committed
      The parameter items (which is always ICMP6_MIB_MAX) is useless for __snmp6_fill_statsdev.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aca05671
    • J
      proc: Reduce cache miss in xfrm_statistics_seq_show · 07613873
      Jia He committed
      This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
      aggregate the data by going through all the items of each cpu sequentially.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07613873
    • J
      proc: Reduce cache miss in sctp_snmp_seq_show · 7d64a94b
      Jia He committed
      This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
      aggregate the data by going through all the items of each cpu sequentially.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d64a94b
    • J
      proc: Reduce cache miss in snmp6_seq_show · 4a4857b1
      Jia He committed
      This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
      aggregate the data by going through all the items of each cpu sequentially.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a4857b1
    • J
      proc: Reduce cache miss in snmp_seq_show · f22d5c49
      Jia He committed
      This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
      aggregate the data by going through all the items of each cpu sequentially.
      Then snmp_seq_show is split into 2 parts to avoid the build warning that
      the frame size is larger than 1024.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f22d5c49
    • J
      net:snmp: Introduce generic interfaces for snmp_get_cpu_field{, 64} · 6348ef2d
      Jia He committed
      This introduces the generic interfaces for snmp_get_cpu_field{,64}.
      It swaps the order of the two for-loops that collect the percpu statistics
      data, so that the data is aggregated by going through all the items of
      each cpu sequentially.
      Signed-off-by: Jia He <hejianet@gmail.com>
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6348ef2d
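The loop exchange described in 6348ef2d can be illustrated in plain userspace C. This is only a sketch under assumptions: `NR_CPUS_DEMO`, `NR_ITEMS_DEMO`, `fold_per_item` and `fold_batch` are hypothetical names, and a two-dimensional array stands in for the kernel's percpu MIB storage. It is not the kernel's implementation of the batch interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_DEMO  4   /* hypothetical CPU count */
#define NR_ITEMS_DEMO 8   /* hypothetical number of MIB items */

/* Old pattern: for each item, visit every CPU, hopping between rows. */
void fold_per_item(unsigned long percpu[][NR_ITEMS_DEMO], unsigned long *out)
{
    assert(percpu != NULL && out != NULL);
    for (size_t item = 0; item < NR_ITEMS_DEMO; item++) {
        out[item] = 0;
        for (size_t cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
            out[item] += percpu[cpu][item];
    }
}

/* Batch pattern: for each CPU, walk all of its items sequentially. */
void fold_batch(unsigned long percpu[][NR_ITEMS_DEMO], unsigned long *out)
{
    assert(percpu != NULL && out != NULL);
    memset(out, 0, NR_ITEMS_DEMO * sizeof(*out));
    for (size_t cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        for (size_t item = 0; item < NR_ITEMS_DEMO; item++)
            out[item] += percpu[cpu][item];
}
```

Both functions produce the same totals. The second one touches each CPU's row once, front to back, which is the access pattern the commit message credits with the lower miss rate.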
    • D
      Merge tag 'rxrpc-rewrite-20160929' of... · fa140354
      David S. Miller committed
      Merge tag 'rxrpc-rewrite-20160929' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Fixes and adjustments
      
      This set of patches contains some fixes and adjustments:
      
       (1) Connections for exclusive calls are being reused because the check to
           work out whether to set RXRPC_CONN_DONT_REUSE is checking where the
           parameters will be copied to (but haven't yet).
      
       (2) Make Tx loss-injection go through the normal return, so the state gets
           set correctly for what the code thinks it has done.
      
           Note lost Tx packets in the tx_data trace rather than the skb
           tracepoint.
      
       (3) Activate channels according to the current state from within the
           channel_lock to avoid someone changing it on us.
      
       (4) Reduce the local endpoint's services list to a single pointer as we
           don't allow service AF_RXRPC sockets to share UDP ports with other
           AF_RXRPC sockets, so there can't be more than one element in the list.
      
       (5) Request more ACKs in slow-start mode to help monitor the state driving
           the window configuration.
      
       (6) Display the serial number of the packet being ACK'd rather than the
           ACK packet's own serial number in the congestion trace as this can be
           related to other entries in the trace.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa140354
    • D
      Merge tag 'wireless-drivers-next-for-davem-2016-09-29' of... · 4e9f4b39
      David S. Miller committed
      Merge tag 'wireless-drivers-next-for-davem-2016-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      wireless-drivers-next patches for 4.9
      
      Major changes:
      
      iwlwifi
      
      * work for new hardware support continues
      * dynamic queue allocation stabilization
      * improvements in the MSIx code
      * multiqueue support work continues
      * new firmware version support (API 26)
      * add 8275 series support
      * add 9560 series support
      * add support for MU-MIMO sniffer
      * add support for RRM by scan
      * add support for "reverse" rx packet injection faking hw descriptors
      * migrate to devm memory allocation handling
      * Remove support for older firmwares (API older than -17 and -22)
      
      wl12xx
      
      * support booting the same rootfs with both wl12xx and wl18xx
      
      hostap
      
      * mark the driver as obsolete
      
      ath9k
      
      * disable RNG by default
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e9f4b39
    • D
      Merge branch 'dsa-global-cosmetics' · df90a497
      David S. Miller committed
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6xxx: Global (1) cosmetics
      
      The Global (1) internal SMI device of Marvell switches is a set of
      registers providing support to different units for MAC addresses (ATU),
      VLANs (VTU), PHY polling (PPU), etc.
      
      Chips (like 88E6060) may use a different address for it, or have
      subtleties in the units (e.g. different number of databases, changing
      how registers must be accessed), making it hard to maintain properly.
      
      This patchset is a first step to polish the Global (1) support, with no
      functional changes though. Here's basically what it does:
      
        - add helpers to access Global1 registers (same for Global2)
        - remove a few family checks (VTU/STU FID registers)
        - s/mv88e6xxx_vtu_stu_entry/mv88e6xxx_vtu_entry/
        - add a per-chip mv88e6xxx_ops structure of function pointers:
      
        struct mv88e6xxx_ops {
              int (*get_eeprom)(struct mv88e6xxx_chip *chip,
                                struct ethtool_eeprom *eeprom, u8 *data);
              int (*set_eeprom)(struct mv88e6xxx_chip *chip,
                                struct ethtool_eeprom *eeprom, u8 *data);
      
              int (*set_switch_mac)(struct mv88e6xxx_chip *chip, u8 *addr);
      
              int (*phy_read)(struct mv88e6xxx_chip *chip, int addr, int reg,
                              u16 *val);
              int (*phy_write)(struct mv88e6xxx_chip *chip, int addr, int reg,
                               u16 val);
        };
      
      Future patchsets will add ATU/VTU ops, software reset, etc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df90a497
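The idea of a per-chip structure of function pointers can be sketched outside the kernel as follows. All `demo_` names are hypothetical and the table is reduced to a single operation. The point is only that callers test a function pointer instead of a feature flag or chip family, not that the driver is written this way.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_chip;

/* Per-chip table of optional operations (one op only, for brevity). */
struct demo_ops {
    int (*set_switch_mac)(struct demo_chip *chip, const unsigned char *addr);
};

struct demo_chip {
    const struct demo_ops *ops;
    unsigned char mac[6];
};

/* One possible variant: record the address in the chip structure. */
int demo_store_switch_mac(struct demo_chip *chip, const unsigned char *addr)
{
    for (size_t i = 0; i < sizeof(chip->mac); i++)
        chip->mac[i] = addr[i];
    return 0;
}

/* A model that supports the operation and one that does not. */
const struct demo_ops demo_with_mac_ops = { .set_switch_mac = demo_store_switch_mac };
const struct demo_ops demo_without_mac_ops = { .set_switch_mac = NULL };

/* Callers dispatch through the table; a NULL entry means "unsupported". */
int demo_set_addr(struct demo_chip *chip, const unsigned char *addr)
{
    assert(chip != NULL && chip->ops != NULL && addr != NULL);
    if (!chip->ops->set_switch_mac)
        return -EOPNOTSUPP;
    return chip->ops->set_switch_mac(chip, addr);
}
```

Adding a new chip model then means picking or writing an ops table, without touching the callers.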
    • V
      net: dsa: mv88e6xxx: add eeprom ops · ee4dc2e7
      Vivien Didelot committed
      Remove EEPROM flags in favor of new {get,set}_eeprom chip-wide
      functions in the mv88e6xxx_ops structure.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee4dc2e7
    • V
      net: dsa: mv88e6xxx: add set_switch_mac to ops · b073d4e2
      Vivien Didelot committed
      Add a set_switch_mac chip-wide function to mv88e6xxx_ops and remove
      MV88E6XXX_FLAG_G2_SWITCH_MAC flags.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b073d4e2
    • V
      net: dsa: mv88e6xxx: add chip-wide ops · b3469dd8
      Vivien Didelot committed
      Introduce a mv88e6xxx_ops structure to describe supported chip-wide
      functions and assign the correct variant to the chip models.
      
      For the moment, add only PHY access routines. This allows us to get rid of
      the PHY ops structures and the usage of PHY flags.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3469dd8
    • V
      net: dsa: mv88e6xxx: rename mv88e6xxx_ops · c08026ab
      Vivien Didelot committed
      The mv88e6xxx_ops is used to describe how to access the chip registers.
      It can be through SMI (via an MDIO bus), or via another interface such
      as crafted remote management frames.
      
      The correct BUS operations structure is chosen at runtime, depending on
      the chip address and connectivity.
      
      We will need the mv88e6xxx_ops name for a future chip-wide operation
      structure, thus rename mv88e6xxx_ops to the more explicit mv88e6xxx_bus_ops.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c08026ab
    • V
      net: dsa: mv88e6xxx: rename mv88e6xxx_vtu_stu_entry · b4e47c0f
      Vivien Didelot committed
      The STU (if the switch has one) is abstracted and accessed through the
      VTU operations and data registers.
      
      Thus rename the mv88e6xxx_vtu_stu_entry struct to mv88e6xxx_vtu_entry.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4e47c0f
    • V
      net: dsa: mv88e6xxx: add mv88e6xxx_num_ports helper · 370b4ffb
      Vivien Didelot committed
      Add an mv88e6xxx_num_ports helper instead of digging in the chip info
      structure.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      370b4ffb
    • V
      net: dsa: mv88e6xxx: expose mv88e6xxx_num_databases · de33376b
      Vivien Didelot committed
      The mv88e6xxx_num_databases will be used by shared code, so move it
      inline to the header file.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de33376b
    • V
      net: dsa: mv88e6xxx: add flags for FID registers · 6dc10bbc
      Vivien Didelot committed
      Add flags to describe the presence of Global 1 ATU FID register (0x01)
      and VTU FID register (0x02), instead of checking families.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6dc10bbc
    • V
      net: dsa: mv88e6xxx: abstract REG_GLOBAL2 · 9fe850fb
      Vivien Didelot committed
      Similarly to the ports, phys, and Global SMI devices, abstract the SMI
      device address of the Global 2 registers in a few g2 static helpers.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fe850fb
    • V
      net: dsa: mv88e6xxx: add global1 helpers · a935c052
      Vivien Didelot committed
      The Global (1) internal SMI device is an extended set of registers
      containing ATU, PPU, VTU, STU, etc.
      
      It is present on every switch, usually at SMI address 0x1B. But old
      models such as 88E6060 access it at address 0xF, thus using REG_GLOBAL
      is erroneous.
      
      Add a global1_addr info member used by mv88e6xxx_g1_{read,write} and
      mv88e6xxx_g1_wait helpers in a new global1.c file.
      
      This patch finally removes _mv88e6xxx_reg_{read,write}, in favor of the
      appropriate helpers. No functional changes here.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a935c052
    • D
      rxrpc: Note serial number being ACK'd in the congestion management trace · ed1e8679
      David Howells committed
      Note the serial number of the packet being ACK'd in the congestion
      management trace rather than the serial number of the ACK packet.  Whilst
      the serial number of the ACK packet is useful for matching ACK packet in
      the output of wireshark, the serial number that the ACK is in response to
      is of more use in working out how different trace lines relate.
      Signed-off-by: David Howells <dhowells@redhat.com>
      ed1e8679
    • D
      rxrpc: Request more ACKs in slow-start mode · b112a670
      David Howells committed
      Set the request-ACK on more DATA packets whilst we're in slow start mode so
      that we get sufficient ACKs back to supply information to configure the
      window.
      Signed-off-by: David Howells <dhowells@redhat.com>
      b112a670
    • D
      rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95
      David Howells committed
      Reduce the rxrpc_local::services list to just a pointer as we don't permit
      multiple service endpoints to bind to a single transport endpoint (this is
      excluded by rxrpc_lookup_local()).
      
      The reason we don't allow this is that if you send a request to an AFS
      filesystem service, it will try to talk back to your cache manager on the
      port you sent from (this is how file change notifications are handled).  To
      prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
      sockets share a UDP socket if at least one of them has a service bound.
      Signed-off-by: David Howells <dhowells@redhat.com>
      1e9e5c95
    • D
      rxrpc: When activating client conn channels, do state check inside lock · 2629c7fa
      David Howells committed
      In rxrpc_activate_channels(), the connection cache state is checked outside
      of the lock, which means it can change whilst we're waking calls up,
      thereby changing whether or not we're allowed to wake calls up.
      
      Fix this by moving the check inside the locked region.  The check to see if
      all the channels are currently busy can stay outside of the locked region.
      
      Whilst we're at it:
      
       (1) Split the locked section out into its own function so that we can call
           it from other places in a later patch.
      
       (2) Determine the mask of channels dependent on the state as we're going
           to add another state in a later patch that will restrict the number of
           simultaneous calls to 1 on a connection.
      Signed-off-by: David Howells <dhowells@redhat.com>
      2629c7fa
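The general shape of the fix in 2629c7fa, checking state only while holding the lock that protects it, can be sketched with a pthread mutex. The `demo_` names, the two-state enum and the `woken` counter are assumptions made for illustration; the rxrpc code uses its own lock and state machine.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_ACTIVE, DEMO_CULLED };

struct demo_conn {
    pthread_mutex_t channel_lock;   /* protects state and woken */
    enum demo_state state;
    unsigned int woken;             /* number of calls woken so far */
};

/*
 * Wake a waiting call only if the connection is still active.
 * The state is read inside the locked region, so it cannot change
 * between the check and the wake-up.
 */
bool demo_activate_channel(struct demo_conn *conn)
{
    bool allowed;

    assert(conn != NULL);
    pthread_mutex_lock(&conn->channel_lock);
    allowed = (conn->state == DEMO_ACTIVE);
    if (allowed)
        conn->woken++;
    pthread_mutex_unlock(&conn->channel_lock);
    return allowed;
}
```

Reading `conn->state` before taking `channel_lock` would reintroduce the window the commit message describes.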
    • D
      rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077
      David Howells committed
      In rxrpc_send_data_packet() make the loss-injection path return through the
      same code as the transmission path so that the RTT determination is
      initiated and any future timer shuffling will be done, despite the packet
      having been binned.
      
      Whilst we're at it:
      
       (1) Add to the tx_data tracepoint an indication of whether or not we're
           retransmitting a data packet.
      
       (2) When we're deciding whether or not to request an ACK, rather than
           checking if we're in fast-retransmit mode check instead if we're
           retransmitting.
      
       (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
           not altering the sk_buff refcount nor are we just seeing it after
           getting it off the Tx list.
      
       (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.
      
       (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a1767077
    • D
      rxrpc: Fix exclusive client connections · 8732db67
      David Howells committed
      Exclusive connections are currently reusable (which they shouldn't be)
      because rxrpc_alloc_client_connection() checks the exclusive flag in the
      rxrpc_connection struct before it's initialised from the function
      parameters.  This means that the DONT_REUSE flag doesn't get set.
      
      Fix this by checking the function parameters for the exclusive flag.
      Signed-off-by: David Howells <dhowells@redhat.com>
      8732db67
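The bug class fixed in 8732db67, testing a field of a freshly allocated object before it has been copied from the caller's parameters, can be shown in a few lines. The struct and function names here are hypothetical and much simpler than the rxrpc ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_params {
    bool exclusive;
};

struct demo_conn {
    struct demo_params params;
    bool dont_reuse;
};

/* Allocate a connection; the caller frees it with free(). */
struct demo_conn *demo_alloc_conn(const struct demo_params *cp)
{
    struct demo_conn *conn;

    assert(cp != NULL);
    conn = calloc(1, sizeof(*conn));
    if (!conn)
        return NULL;

    /*
     * Test the caller's parameters, not conn->params: the latter is
     * still zeroed at this point, so conn->params.exclusive would
     * always read as false and dont_reuse would never be set.
     */
    if (cp->exclusive)
        conn->dont_reuse = true;

    conn->params = *cp;
    return conn;
}
```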
  2. 29 Sep, 2016 6 commits
    • D
      Merge branch 'qcom-emac-acpi' · 31fbe81f
      David S. Miller committed
      Timur Tabi says:
      
      ====================
      Add basic ACPI support to the Qualcomm Technologies EMAC driver
      
      This patch series adds support to the EMAC driver for extracting addresses,
      interrupts, and some _DSDs (properties) from ACPI.  The first two patches
      clean up the code, and the third patch adds ACPI-specific functionality.
      
      The first patch fixes a bug with handling the platform_device for the
      internal PHY.  This phy is treated as a separate device in both DT and
      ACPI, but since the platform is not released automatically when the
      driver unloads, managed functions like devm_ioremap_resource cannot be
      used.
      
      The second patch replaces of_get_mac_address with its platform-independent
      equivalent device_get_mac_address.
      
      The third patch parses the ACPI tables to obtain the platform_device for
      the primary EMAC node ("QCOM8070") and the internal phy node ("QCOM8071").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      31fbe81f
    • T
      net: qcom/emac: initial ACPI support · 5f3d3807
      Timur Tabi committed
      Add support for reading addresses, interrupts, and _DSD properties
      from ACPI tables, just like with device tree.  The HID for the
      EMAC device itself is QCOM8070.  The internal PHY is represented
      by a child node with a HID of QCOM8071.
      
      The EMAC also has some complex clock initialization requirements
      that are not represented by this patch.  This will be addressed
      in a future patch.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5f3d3807
    • T
      net: qcom/emac: use device_get_mac_address · 0de709ac
      Timur Tabi committed
      Replace the DT-specific of_get_mac_address() function with
      device_get_mac_address, which works on both DT and ACPI platforms.  This
      change makes it easier to add ACPI support.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0de709ac
    • T
      net: qcom/emac: do not use devm on internal phy pdev · 54e19bc7
      Timur Tabi committed
      The platform_device returned by of_find_device_by_node() is not
      automatically released when the driver unprobes.  Therefore,
      managed calls like devm_ioremap_resource() should not be used.
      Instead, we manually allocate the resources and then free them
      on driver release.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54e19bc7
    • J
      bpf: allow access into map value arrays · 48461135
      Josef Bacik committed
      Suppose you have a map array value that is something like this
      
      struct foo {
      	unsigned iter;
      	int array[SOME_CONSTANT];
      };
      
      You can easily insert this into an array, but you cannot modify the contents of
      foo->array[] after the fact.  This is because we have no way to verify we won't
      go off the end of the array at verification time.  This patch provides a start
      for this work.  We accomplish this by keeping track of a minimum and maximum
      value a register could be while we're checking the code.  Then at the time we
      try to do an access into a MAP_VALUE we verify that the maximum offset into that
      region is a valid access into that memory region.  So in practice, code such as
      this
      
      unsigned index = 0;
      
      if (foo->iter >= SOME_CONSTANT)
      	foo->iter = index;
      else
      	index = foo->iter++;
      foo->array[index] = bar;
      
      would be allowed, as we can verify that index will always be between 0 and
      SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
      check to make sure the index isn't less than 0, or do something like index %=
      SOME_CONSTANT.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48461135
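The min/max tracking described in 48461135 can be modelled, very loosely, as a pair of bounds per register that branches narrow and that a map-value access is checked against. The names `demo_reg`, `demo_learn_less_than` and `demo_map_value_access_ok` are hypothetical. This is a toy model of the reasoning, not the verifier's data structures or algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-register bounds, following the commit description. */
struct demo_reg {
    int64_t  min_value;
    uint64_t max_value;
};

/* On the path where `reg >= bound` was false, reg < bound is known. */
void demo_learn_less_than(struct demo_reg *reg, uint64_t bound)
{
    assert(reg != NULL && bound > 0);
    if (reg->max_value > bound - 1)
        reg->max_value = bound - 1;
}

/*
 * Accept an access of access_size bytes at base_off + index * elem_size
 * only if the largest possible index still lands inside the map value.
 */
bool demo_map_value_access_ok(const struct demo_reg *index, uint64_t base_off,
                              uint64_t elem_size, uint64_t access_size,
                              uint64_t value_size)
{
    assert(index != NULL && elem_size > 0);
    if (index->min_value < 0)
        return false;   /* a signed index could be negative */
    if (index->max_value > value_size / elem_size)
        return false;   /* effectively unbounded; also avoids overflow below */
    return base_off + index->max_value * elem_size + access_size <= value_size;
}
```

For a value shaped like struct foo with a 4-byte iter followed by 16 ints (68 bytes in total, assuming 4-byte int), an index with unknown upper bound is rejected, while the same index after an `index < 16` check is accepted. A possibly negative index is rejected, which matches the extra check the message asks for with signed values.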
    • E
      net: do not export sk_stream_write_space · 7836667c
      Eric Dumazet committed
      Since commit 900f65d3 ("tcp: move duplicate code from
      tcp_v4_init_sock()/tcp_v6_init_sock()") we no longer need
      to export sk_stream_write_space()
      
      From: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7836667c
  3. 28 Sep, 2016 7 commits