1. 18 Aug, 2016 6 commits
    • D
      Merge branch 'strparser' · 48433419
      David S. Miller committed
      Tom Herbert says:
      
      ====================
      strp: Stream parser for messages
      
      This patch set introduces a utility for parsing application layer
      protocol messages in a TCP stream. This is a generalization of the
      mechanism implemented in the Kernel Connection Multiplexor (KCM).
      
      This patch set adapts KCM to use the strparser. We expect that kTLS
      can use this mechanism also. RDS would probably be another candidate
      to use a common stream parsing mechanism.
      
      The API includes a context structure, a set of callbacks, utility
      functions, and a data ready function. The callbacks include
      a parse_msg function that is called to perform parsing (e.g.
      BPF parsing in case of KCM), and a rcv_msg function that is called
      when a full message has been completed.
      
      For strparser we specify the return codes from the parser to allow
      the backend to indicate that control of the socket should be
      transferred back to userspace to handle some exceptions in the
      stream. The return values are:
      
            >0 : indicates length of successfully parsed message
             0  : indicates more data must be received to parse the message
             -ESTRPIPE : current message should not be processed by the
                kernel, return control of the socket to userspace which
                can proceed to read the messages itself
             other < 0 : Error in parsing, give control back to userspace
                assuming that synchronization is lost and the stream
                is unrecoverable (application expected to close TCP socket)
      
      There is one issue I haven't been able to fully resolve. If parse_msg
      returns -ESTRPIPE (wants control back to userspace) the parser may
      already have consumed some bytes of the message. There is no way to
      put bytes back into the TCP receive queue and tcp_read_sock does not
      allow an easy way to peek messages. In lieu of a better solution, we
      return ENODATA on the socket to indicate that the data stream is
      unrecoverable (application needs to close socket). This condition
      should only happen if an application layer message header is split
      across two skbuffs and parsing just the first skbuff wasn't sufficient
      to determine that transfer to userspace is needed.
      
      This patch set contains:
      
        - strparser implementation
        - changes to kcm to use strparser
        - strparser.txt documentation
      
      v2:
        - Add copyright notice to C files
        - Remove GPL module license from strparser.c
        - Add report of rxpause
      
      v3:
        - Restore GPL module license
        - Use EXPORT_SYMBOL_GPL
      
      v4:
        - Removed unused function, changed another to be static as suggested
          by davem
        - Reworked data_ready to be called from upper layer, no longer requires
          taking over socket data_ready callback as suggested by Lance Chao
      
      Tested:
        - Ran a KCM thrash test for 24 hours. No behavioral or performance
          differences observed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48433419
    • T
      strparser: Documentation · adcce4d5
      Tom Herbert committed
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      adcce4d5
    • T
      kcm: Use stream parser · 9b73896a
      Tom Herbert committed
      Adapt KCM to use the stream parser. This mostly involves removing
      the RX handling and setting up the strparser using the interface.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b73896a
    • T
      strparser: Stream parser for messages · 43a0c675
      Tom Herbert committed
      This patch introduces a utility for parsing application layer protocol
      messages in a TCP stream. This is a generalization of the mechanism
      implemented in the Kernel Connection Multiplexor (KCM).
      
      The API includes a context structure, a set of callbacks, utility
      functions, and a data ready function.
      
      A stream parser instance is defined by a strparser structure that
      is bound to a TCP socket. The function to initialize the structure
      is:
      
      int strp_init(struct strparser *strp, struct sock *csk,
                    struct strp_callbacks *cb);
      
      csk is the TCP socket being bound to and cb are the parser callbacks.
      
      The upper layer calls strp_tcp_data_ready when data is ready on the lower
      socket for strparser to process. This should be called from a data_ready
      callback that is set on the socket:
      
      void strp_tcp_data_ready(struct strparser *strp);
      
      A parser is bound to a TCP socket by setting the data_ready function to
      strp_tcp_data_ready so that all receive indications on the socket
      go through the parser. This assumes that sk_user_data is set to
      the strparser structure.
      
      There are four callbacks.
       - parse_msg is called to parse the message (returns length or error).
       - rcv_msg is called when a complete message has been received
       - read_sock_done is called when data_ready function exits
       - abort_parser is called to abort the parser
      
      The input to parse_msg is an skbuff which contains the next message under
      construction. The backend processing of parse_msg will parse the
      application layer protocol headers to determine the length of
      the message in the stream. The possible return values are:
      
         >0 : indicates length of successfully parsed message
         0  : indicates more data must be received to parse the message
         -ESTRPIPE : current message should not be processed by the
            kernel, return control of the socket to userspace which
            can proceed to read the messages itself
         other < 0 : Error in parsing, give control back to userspace
            assuming that synchronization is lost and the stream
            is unrecoverable (application expected to close TCP socket)
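      
      The return-code convention above can be illustrated with a small
      userspace sketch. This is not the kernel code: the real callback
      receives an skbuff, whereas this hypothetical demo_parse_msg() takes
      a plain byte buffer. The 3-byte header (one type byte followed by a
      16-bit big-endian payload length) and the type values 'D' and 'C'
      are assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86 /* Linux value, used as a fallback on other platforms */
#endif

/* Hypothetical framing: 1 type byte + 2-byte big-endian payload length. */
#define DEMO_HDR_LEN 3

/*
 * Userspace model of a parse_msg callback.
 * Returns:
 *   > 0       total message length (header + payload)
 *   0         header incomplete, more data is needed
 *   -ESTRPIPE message should be handled by userspace
 *   other < 0 stream is not parseable
 */
int demo_parse_msg(const uint8_t *buf, size_t avail)
{
    assert(buf != NULL || avail == 0);

    if (avail < DEMO_HDR_LEN)
        return 0;

    switch (buf[0]) {
    case 'D': /* data message: length is known once the header is complete */
        return DEMO_HDR_LEN + ((buf[1] << 8) | buf[2]);
    case 'C': /* control message: hand the stream back to userspace */
        return -ESTRPIPE;
    default:  /* unknown type: synchronization is considered lost */
        return -EBADMSG;
    }
}
```

      For example, a buffer holding only { 'D', 0x00 } yields 0, while
      { 'D', 0x00, 0x05, ... } yields 8 even if the payload has not fully
      arrived yet; collecting the remaining bytes is the parser's job.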
      
      In the case of an error return (< 0) strparser will stop the parser
      and report an error to userspace. The application must deal
      with the error. To handle the error the strparser is unbound
      from the TCP socket. If the error indicates that the TCP
      stream is at a recoverable point (ESTRPIPE) then the application
      can read the TCP socket to process the stream. Once the application
      has dealt with the exceptions in the stream, it may again bind the
      socket to a strparser to continue data operations.
      
      Note that ENODATA may be returned to the application. In this case
      parse_msg returned -ESTRPIPE, however strparser was unable to maintain
      synchronization of the stream (i.e. some of the message in question
      was already read by the parser).
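      
      A minimal sketch of the ENODATA rule described above, assuming the
      parser tracks how many bytes of the current message it has already
      consumed. The function name demo_error_for_userspace() and its
      parameters are hypothetical, and how the error is actually delivered
      on the socket is not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86 /* Linux value, used as a fallback on other platforms */
#endif
#ifndef ENODATA
#define ENODATA 61  /* Linux value, used as a fallback on other platforms */
#endif

/*
 * Map a negative parse_msg return value to the error the application sees.
 * parse_ret: negative value returned by the parse callback
 * consumed:  bytes of the current message already taken from the stream
 */
int demo_error_for_userspace(int parse_ret, size_t consumed)
{
    assert(parse_ret < 0);

    /*
     * The callback asked to hand the message to userspace, but part of it
     * is already gone from the receive queue, so the stream cannot be
     * resumed by the application.
     */
    if (parse_ret == -ESTRPIPE && consumed > 0)
        return ENODATA;

    return -parse_ret;
}
```

      With nothing consumed, -ESTRPIPE is reported as ESTRPIPE and the
      application can read the socket itself; with bytes already consumed
      the same return value becomes ENODATA.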
      
      strp_pause and strp_unpause are used to provide flow control. For
      instance, if rcv_msg is called but the upper layer can't immediately
      consume the message it can hold the message and pause strparser.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43a0c675
    • T
      net: ipconfig: Fix more use after free · d2d371ae
      Thierry Reding committed
      While commit 9c706a49 ("net: ipconfig: fix use after free") avoids
      the use after free, the resulting code still ends up calling both the
      ic_setup_if() and ic_setup_routes() after calling ic_close_devs(), and
      access to the device is still required.
      
      Move the call to ic_close_devs() to the very end of the function.
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2d371ae
    • D
      Merge tag 'batadv-next-for-davem-20160816' of git://git.open-mesh.org/linux-merge · 00062a93
      David S. Miller committed
      Simon Wunderlich says:
      
      ====================
      pull request for net-next: batman-adv 2016-08-16
      
      This feature patchset is all about adding netlink support, which should
      supersede our debugfs configuration interface in the long run. It is
      especially necessary when batman-adv should be used in different
      namespaces, since debugfs can not differentiate between those.
      
      More specifically, the following changes are included:
      
       - Two fixes for namespace handling by Andrew Lunn, also checking the
         namespaces for parent interfaces, and suppressing debugfs entries
         for non-default netns
      
       - Implement various netlink commands for the new interface, by
         Matthias Schiffer, Andrew Lunn, Sven Eckelmann and Simon Wunderlich
         (13 patches):
          * routing algorithm list
          * hardif list
          * translation tables (local and global)
          * TTVN for the translation tables
          * originator and neighbor tables for B.A.T.M.A.N. IV
            and B.A.T.M.A.N. V
          * gateway dump functionality for B.A.T.M.A.N. IV
            and B.A.T.M.A.N. V
          * Bridge Loop Avoidance claims, and corresponding BLA group
          * Bridge Loop Avoidance backbone tables
      
       - Finally, mark batman-adv as netns compatible, by Andrew Lunn (1 patch)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00062a93
  2. 16 Aug, 2016 20 commits
  3. 15 Aug, 2016 10 commits
    • W
      net: dsa: b53: remove .owner and .bus fields for driver · cfad65c7
      Wei Yongjun committed
      Remove .owner and .bus fields since module_spi_driver() is used
      which sets them automatically.
      
      Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfad65c7
    • W
      net: macb: add missing free_netdev() on error in macb_probe() · b22ae0b4
      Wei Yongjun committed
      Add the missing free_netdev() before return from function macb_probe()
      in the platform_get_irq() error handling case.
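      
      The pattern behind this kind of fix can be sketched in userspace C.
      This is not the macb driver code: demo_alloc_netdev(),
      demo_free_netdev(), demo_get_irq() and demo_probe() are hypothetical
      stand-ins, and the allocation counter exists only so the leak can be
      checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_netdev {
    int irq;
};

int demo_live_allocs; /* number of devices allocated and not yet freed */

struct demo_netdev *demo_alloc_netdev(void)
{
    struct demo_netdev *dev = calloc(1, sizeof(*dev));

    if (dev)
        demo_live_allocs++;
    return dev;
}

void demo_free_netdev(struct demo_netdev *dev)
{
    assert(dev != NULL);
    free(dev);
    demo_live_allocs--;
}

/* Stand-in for an IRQ lookup: returns an IRQ number or a negative errno. */
int demo_get_irq(int available)
{
    return available ? 42 : -ENXIO;
}

int demo_probe(int irq_available, struct demo_netdev **out)
{
    struct demo_netdev *dev = demo_alloc_netdev();
    int err;

    if (!dev)
        return -ENOMEM;

    err = demo_get_irq(irq_available);
    if (err < 0)
        goto err_free_netdev; /* without this jump the device would leak */

    dev->irq = err;
    *out = dev;
    return 0;

err_free_netdev:
    demo_free_netdev(dev);
    return err;
}
```

      Routing every failure after the allocation through a single cleanup
      label keeps the error paths from drifting apart as the probe
      function grows.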
      
      Fixes: c69618b3 ("net/macb: fix probe sequence to setup clocks earlier")
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b22ae0b4
    • W
      qed: Fix possible memory leak in qed_dcbnl_get_ieee_pfc() · 02ee9b18
      Wei Yongjun committed
      'dcbx_info' is allocated in qed_dcbnl_get_ieee_pfc() and should be freed
      before returning from the error handling paths, otherwise it will cause
      a memory leak.
      
      Fixes: a1d8d8a5 ("qed: Add dcbnl support.")
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02ee9b18
    • D
      Merge branch 'cxgb4-IFLA_VF_MAC' · 2fb876b2
      David S. Miller committed
      Hariprasad Shenai says:
      
      ====================
      cxgb4: Add support for IFLA_VF_MAC
      
      We're struggling to implement the PCI SR-IOV management features for
      administering Virtual Functions which represent networking devices using
      the current Linux APIs. The problem is that these APIs incorporate all
      sorts of assumptions which don't match Chelsio networking cards.
      
      For instance, the current APIs assume a 1-to-1 mapping of Network Ports,
      Physical Functions and the SR-IOV Virtual Functions of those Physical
      Functions. This is not the case with our cards, where any Virtual Function
      can be hooked up to any Port or any number of Ports. The current Linux
      APIs also assume only 1 Network Interface/Port can be accessed per Virtual
      Function.
      
      Another issue is that these APIs assume that the Administrative Driver is
      attached to the Physical Function Associated with a Virtual Function. This
      is not the case with our card where all administration is performed by a
      Driver which is not attached to any of the Physical Functions which have
      SR-IOV PCI Capabilities.
      
      Another consequence of these assumptions is the inability to utilize all
      of the card's SR-IOV resources. For instance, our cards have SR-IOV
      Capabilities on Physical Functions 0..3 and the administrative Driver
      attaches to Physical Function 4. Each of the Physical Functions 0..3 can
      support up to 16 Virtual Functions. With the current Linux APIs, a 2-Port
      card would only be able to use the Virtual Functions on Physical
      Function 0..1 and not allow the Virtual Functions on Physical
      Functions 2..3 to be used since there are no Ports 2..3 on a 2-Port card.
      
      Patch 1/2 adds support for creating a management interface for each PF to
      control its corresponding VFs. Patch 2/2 adds support for ndo_set_vf_mac.
      
      This patch series has been created against net-next tree.
      
      We have included all the maintainers of respective drivers. Kindly review
      the change and let us know in case of any review comments.
      
      V5: Fix warning reported by kbuild bot when CONFIG_PCI_IOV isn't defined.
      
      V4: Handle memory allocation failure for adapter->mbox_log in init_one().
          Based on review comment by Yuval Mintz <Yuval.Mintz@qlogic.com>
      
      V3: Based on review comment by Yuval Mintz <Yuval.Mintz@qlogic.com>,
          removed extra parameter pf added to IFLA_VF APIs and created a
          net_device corresponding to each PF for controlling its VFs.
      
      V2: Fixed check for MAC address in Patch 2/2, based on review comment by
          Yuval Mintz <Yuval.Mintz@qlogic.com>
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fb876b2
    • H
      cxgb4/cxgb4vf: Add set VF mac address support · 858aa65c
      Hariprasad Shenai committed
      Add ndo_set_vf_mac support, which allows setting the MAC address
      of cxgb4vf interfaces from the host.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      858aa65c
    • H
      cxgb4: Add control net_device for configuring PCIe VF · 7829451c
      Hariprasad Shenai committed
      Issue:
      The current APIs assume a 1-to-1 mapping of Network Ports,
      Physical Functions and the SR-IOV Virtual Functions of those Physical
      Functions. This is not the case with our cards, where any Virtual
      Function can be hooked up to any Port or any number of Ports. The
      current Linux APIs also assume only 1 Network Interface/Port can be
      accessed per Virtual Function.
      
      Another issue is that these APIs assume that the Administrative Driver
      is attached to the Physical Function Associated with a Virtual Function.
      This is not the case with our card where all administration is performed
      by a Driver which is not attached to any of the Physical Functions which
      have SR-IOV PCI Capabilities.
      
      Another consequence of these assumptions is the inability to utilize all
      of the card's SR-IOV resources. For instance, our cards have SR-IOV
      Capabilities on Physical Functions 0..3 and the administrative Driver
      attaches to Physical Function 4. Each of the Physical Functions 0..3 can
      support up to 16 Virtual Functions. With the current Linux APIs, a
      2-Port card would only be able to use the Virtual Functions on Physical
      Function 0..1 and not allow the Virtual Functions on Physical Functions
      2..3 to be used since there are no Ports 2..3 on a 2-Port card.
      
      Fix:
      Since the control node is always the netdevice for all VF ACL commands,
      create a dummy netdevice for each Physical Function from 0 to 3 through
      which one can control its VFs. The device won't be associated with
      any port, since it doesn't need to transmit/receive. It is used purely
      for VF management. The device is registered only when VFs for a
      particular PF are configured using the PCI sysfs interface and
      unregistered when pci_disable_sriov() is called for the PF.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7829451c
    • D
      Merge branch 'proc-per-ns' · a878c020
      David S. Miller committed
      Dmitry Torokhov says:
      
      ====================
      Make /proc per net namespace objects belong to container
      
      Currently [almost] all /proc objects belong to the global root, even if
      data belongs to a given namespace within a container and (at least for
      sysctls) we work around permission checks to allow container's root to
      access the data.
      
      This series changes ownership of net namespace /proc objects
      (/proc/net/self/* and /proc/sys/net/*) to be the container's root and not
      the global root when a mapping for the container's root exists in the user
      namespace.
      
      This helps when running Android CTS in a container, but I think it makes
      sense regardless.
      
      Changes from V1:
      
      - added fix for crash when !CONFIG_NET_NS (new patch #1)
      - addressed Eric's comments for error handling style in patch #3 and
        added his Ack
      - adjusted patch #2 to use the same style of error handling
      - sent out as series instead of separate patches
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a878c020
    • D
      net: make net namespace sysctls belong to container's owner · e79c6a4f
      Dmitry Torokhov committed
      If a net namespace is attached to a user namespace, let's make the
      container's root the owner of sysctls affecting said network namespace
      instead of global root.
      
      This also allows us to clean up net_ctl_permissions() because we do not
      need to fudge permissions anymore for the container's owner since it now
      owns the objects in question.
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e79c6a4f
    • D
      proc: make proc entries inherit ownership from parent · c110486f
      Dmitry Torokhov committed
      There are certain parameters that belong to net namespace and that are
      exported in /proc. They should be controllable by the container's owner,
      but are currently owned by global root and thus not available.
      
      Let's change the proc code to inherit ownership from the parent entry, and
      when creating the per-ns "net" proc entry set it up as owned by the
      container's owner.
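      
      The inheritance rule can be sketched with a toy tree in userspace C.
      struct demo_proc_entry, demo_proc_create() and demo_proc_set_owner()
      are hypothetical names for illustration and do not reflect the
      actual proc internals; plain unsigned integers stand in for kernel
      uid and gid types.

```c
#include <assert.h>
#include <stddef.h>

struct demo_proc_entry {
    const char *name;
    unsigned int uid;
    unsigned int gid;
    const struct demo_proc_entry *parent;
};

/* Create an entry; ownership is copied from the parent when there is one. */
void demo_proc_create(struct demo_proc_entry *e, const char *name,
                      const struct demo_proc_entry *parent)
{
    assert(e != NULL && name != NULL);

    e->name = name;
    e->parent = parent;
    e->uid = parent ? parent->uid : 0; /* 0 models global root */
    e->gid = parent ? parent->gid : 0;
}

/* Explicitly assign ownership, e.g. for a per-namespace "net" entry. */
void demo_proc_set_owner(struct demo_proc_entry *e,
                         unsigned int uid, unsigned int gid)
{
    assert(e != NULL);

    e->uid = uid;
    e->gid = gid;
}
```

      Setting the owner once on the per-namespace "net" entry is then
      enough: every entry created beneath it afterwards picks up the same
      uid and gid without further special cases.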
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c110486f
    • D
      netns: do not call pernet ops for not yet set up init_net namespace · f8c46cb3
      Dmitry Torokhov committed
      When CONFIG_NET_NS is disabled, registering pernet operations causes
      init() to be called immediately with init_net as an argument. Unfortunately
      this leads to some pernet ops, such as proc_net_ns_init(), being called too
      early, when the init_net namespace has not been fully initialized. This
      causes issues when we want to change pernet ops to use more data from the
      net namespace in question, for example to reference the user namespace
      that owns our network namespace.
      
      To fix this we could either play a game of musical chairs and rearrange init
      order, or we could do the same as when CONFIG_NET_NS is enabled, and
      postpone calling pernet ops->init() until namespace is set up properly.
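      
      The deferral idea can be modeled in a few lines of userspace C. This
      is only a sketch of the approach, not the netns code: the demo_*
      names, the fixed-size pending array and the init_calls counter are
      all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_OPS 8

struct demo_net {
    bool initialized;
    int init_calls;
};

struct demo_pernet_ops {
    int (*init)(struct demo_net *net);
};

struct demo_net demo_init_net;
static struct demo_pernet_ops *demo_pending[DEMO_MAX_OPS];
static size_t demo_pending_count;

/* Register ops: run init() now if the namespace is ready, else queue it. */
int demo_register_pernet_ops(struct demo_pernet_ops *ops)
{
    assert(ops != NULL && ops->init != NULL);

    if (demo_init_net.initialized)
        return ops->init(&demo_init_net);
    if (demo_pending_count == DEMO_MAX_OPS)
        return -1;
    demo_pending[demo_pending_count++] = ops;
    return 0;
}

/* Finish namespace setup, then run every init() that was postponed. */
int demo_setup_init_net(void)
{
    size_t i;

    demo_init_net.initialized = true;
    for (i = 0; i < demo_pending_count; i++) {
        int err = demo_pending[i]->init(&demo_init_net);

        if (err)
            return err;
    }
    demo_pending_count = 0; /* the pending list is not used after setup */
    return 0;
}

/* Example init callback that relies on a fully set up namespace. */
int demo_count_init(struct demo_net *net)
{
    assert(net->initialized);
    net->init_calls++;
    return 0;
}
```

      Registering demo_count_init before demo_setup_init_net() leaves
      init_calls at 0 until setup runs, while registering it afterwards
      calls it immediately.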
      
      Note that we can not simply undo commit ed160e83 ("[NET]: Cleanup
      pernet operation without CONFIG_NET_NS") and use the same implementations
      for __register_pernet_operations() and __unregister_pernet_operations(),
      because many pernet ops are marked as __net_initdata and will be discarded,
      which wreaks havoc on our ops lists. Here we rely on the fact that we only
      use lists until init_net is fully initialized, which happens much earlier
      than discarding __net_initdata sections.
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8c46cb3
  4. 14 Aug, 2016 4 commits